latex.divide

Divide LaTeX files into parts

from fastcore.test import ExceptionExpected, test_eq

from trouver.helper.tests import _test_directory# , non_utf8_chars_in_file

Divide LaTeX file into parts

To make Obsidian notes from a LaTeX file, I use sections/subsections, and environments as places to make new notes.

Things to think about:
- Sections/subsections
- environments, including theorems, corollaries, propositions, lemmas, definitions, notations
- citations
- Macros defined in the preamble?

LatexMacroNodes include: sections/subsections, citations, references, and labels, e.g.

> \section{Introduction}
\cite{ellenberg2nilpotent}
\subsection{The section conjecture}
\'e
\ref{fundamental-exact-sequence}
\cite{stix2010period}
\ref{fundamental-exact-sequence}
\cite{stix2012rational}
\cite[Appendix C]{stix2010period}
\subsection{The tropical section conjecture}
\label{subsec:tropical-section-conjecture}

Get the Document Node

NoDocumentNodeError

 NoDocumentNodeError (text)

*Exception raised when a LatexEnvironmentNode corresponding to the document environment is expected in a LaTeX string, but no such node exists.

Attributes - text - str - The text in which the document environment is not found.*

find_document_node

 find_document_node (text:str, document_environment_name:str='document')

*Find the LatexNode object for the main document in text.

Raises - NoDocumentNodeError - If document environment node is not detected.*

	Type	Default	Details
text	str		LaTeX str
document_environment_name	str	document	The name of the document environment.
Returns	LatexEnvironmentNode

The main content of virtually all LaTeX math articles belongs to a document environment, which pylatexenc can often detect. The find_document_node function returns this LatexEnvironmentNode object:

latex_file_path = _test_directory() / 'latex_examples' / 'latex_example_1' / 'main.tex'
text = text_from_file(latex_file_path)
document_node = find_document_node(text)

If the LaTeX file has no document environment, then a NoDocumentNodeError is raised:

# This latex document has its `document` environment commented out.
latex_file_path = _test_directory() / 'latex_examples' / 'latex_example_2' / 'main.tex'
text = text_from_file(latex_file_path)
with ExceptionExpected(NoDocumentNodeError):
    document_node = find_document_node(text)
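
The two behaviours above can be approximated without pylatexenc. The following is a minimal sketch, not the implementation of find_document_node: it drops commented-out text with a simple regular expression and then searches for the document environment. The names `MissingDocumentError`, `strip_comments`, and `find_document_body` are hypothetical and exist only for this illustration.

```python
import re


class MissingDocumentError(Exception):
    """Hypothetical stand-in for NoDocumentNodeError; stores the offending text."""

    def __init__(self, text: str):
        super().__init__("no document environment found")
        self.text = text


def strip_comments(text: str) -> str:
    # Drop everything from an unescaped % to the end of its line.
    # Simplification: an escaped backslash directly before % is not handled.
    return re.sub(r"(?<!\\)%.*", "", text)


def find_document_body(text: str, environment_name: str = "document") -> str:
    """Return the text between the begin and end of the named environment."""
    name = re.escape(environment_name)
    pattern = re.compile(
        r"\\begin\{" + name + r"\}(.*?)\\end\{" + name + r"\}", re.DOTALL)
    match = pattern.search(strip_comments(text))
    if match is None:
        raise MissingDocumentError(text)
    return match.group(1)
```

A file whose `\begin{document}` line is commented out leads to the exception in this sketch, which mirrors the second example above.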

At the time of this writing, a NoDocumentNodeError may be raised even if the LaTeX file has a proper document environment:

latex_file_path = _test_directory() / 'latex_examples' / 'example_with_a_command_with_begin.tex'
text = text_from_file(latex_file_path)

# Perhaps in the future, pylatexenc will be able to find the document node for this file.
# When that time comes, delete this example.
with ExceptionExpected(NoDocumentNodeError):
    find_document_node(text)

The divide_preamble function can be used to circumvent this problem:

preamble, document = divide_preamble(text)
document_node = find_document_node(document)
test_eq(document_node.environmentname, 'document')
assert document_node.isNodeType(LatexEnvironmentNode)
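
The idea behind this workaround can be sketched as a plain string split at the first `\begin{document}`, so that definitions in the preamble never reach the parser. This is only an assumption about the general approach, not the code of divide_preamble; `split_preamble` is a hypothetical name.

```python
def split_preamble(text: str, marker: str = r"\begin{document}") -> tuple[str, str]:
    """Split text at the first occurrence of marker.

    The marker stays with the document part, so the second piece still
    contains a complete document environment.
    """
    index = text.find(marker)
    if index == -1:
        # Assumption of this sketch: with no marker, everything is preamble.
        return text, ""
    return text[:index], text[index:]
```

Note that this sketch would be fooled by a commented-out `\begin{document}` in the preamble.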

Detect environment names used in a file

environment_names_used

 environment_names_used (text:str)

Return the set of all environment names used in the main document of the latex code.

	Type	Details
text	str	LaTeX document
Returns	set	The set of all environment names used in the main document.

Writers often use different environment names. For example, writers often use theorem, thm, or theo for theorem environments, or lemma or lem for lemma environments. The environment_names_used function returns the environment names actually used in the tex file.

In the example below, note that only the environments that are actually used are returned. For instance, the preamble of the document defines the theorem-like environments problem and lemma (among others), but these are not actually used in the document itself.

latex_file_path = _test_directory() / 'latex_examples' / 'has_fully_written_out_environment_names.tex'
sample_text_1 = text_from_file(latex_file_path)
sample_output_1 = environment_names_used(sample_text_1)
test_eq({'corollary', 'proof', 'maincorollary', 'abstract', 'proposition'}, sample_output_1)

The document in the example below uses shorter names for theorem environments:

latex_file_path = _test_directory() / 'latex_examples' / 'has_shorter_environment_names.tex'
sample_text_2 = text_from_file(latex_file_path)
sample_output_2 = environment_names_used(sample_text_2)
test_eq({'conj', 'notation', 'corollary', 'defn'}, sample_output_2)
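
To illustrate what "actually used" means, here is a rough regex-based sketch that collects the names appearing in `\begin{...}` after the start of the document environment. It is not how environment_names_used is implemented; the function name `environment_names` is hypothetical, and the sketch ignores edge cases such as verbatim text.

```python
import re


def environment_names(text: str) -> set[str]:
    r"""Collect the names appearing in \begin{...} inside the document environment."""
    # Remove commented-out text so that hidden environments are not counted.
    uncommented = re.sub(r"(?<!\\)%.*", "", text)
    # Keep only what follows \begin{document}; the preamble is discarded.
    _, _, body = uncommented.partition(r"\begin{document}")
    return set(re.findall(r"\\begin\{([^{}]+)\}", body))
```

Environments that are only defined in the preamble, such as lemma in the test below, do not appear in the result.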

Identify the numbering convention of a LaTeX document

LaTeX documents follow various numbering conventions. Here are some examples of papers on the arXiv and notes on their numbering schemes. Note that the source code of these articles is publicly available on the arXiv.

Ellenberg, Venkatesh, and Westerland, Homological stability for Hurwitz spaces and the Cohen-Lenstra conjecture over function fields,
- The subsections and theorem-like environments of each section share a numbering scheme, e.g. section 1 has subsection 1.1 The Cohen-Lenstra heuristics, 1.2 Theorem, 1.3 Hurwitz spaces. This is accomplished by defining theorem-like environments using the subsection counter, e.g.
```
\theoremstyle{plain}
\newtheorem{thm}[subsection]{Theorem}
\newtheorem{prop}[subsection]{Proposition}
\newtheorem{cor}[subsection]{Corollary}
\newtheorem{remark}{Remark}
\newtheorem{conj}[subsection]{Conjecture}
\newtheorem*{conj*}{Conjecture}
```
  defines the thm, prop, cor, and conj environments to be numbered using the subsection counter, the remark environment to be numbered using its own counter, and the conj* environment to be an unnumbered environment with a different name than the conj environment.
- The \swapnumbers command is included in the preamble to change the way that theorems are numbered in the document, e.g. the article has 1.2 Theorem as opposed to Theorem 1.2.
- The equations are numbered along the subsections - this is accomplished by the lines
```
\numberwithin{equation}{subsection}
\renewcommand{\theequation}{\thesubsection.\arabic{equation}}
```
  in the preamble.
Hoyois, A quadratic refinement of the Grothendieck-Lefschetz-Verdier Trace Formula
- The theorem-like environments are numbered Theorem 1.1, Theorem 1.3, Corollary 1.4, Theorem 1.5, etc.
  - The theorem-like environments that are numbered are assigned the equation counter. In particular, the equation environments share their numberings with the theorem-like environments. For example, section 1 has Equation (1.2)
  - This equation counter is reset at the beginning of each section and the section number is included in the numbering via
```
\numberwithin{equation}{section}
```
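
The second scheme above (theorem-like environments sharing the equation counter, which restarts in each section) can be simulated in a few lines. This is a toy model written for illustration only; `number_items` is a hypothetical helper and is not part of this module.

```python
def number_items(items: list[str]) -> list[str]:
    """Label a sequence of 'section', 'theorem' and 'equation' items.

    Theorems and equations share one counter that restarts in each section.
    """
    section = 0
    shared = 0
    labels = []
    for item in items:
        if item == "section":
            section += 1
            shared = 0  # models the reset caused by numbering within `section`
            labels.append(f"Section {section}")
        elif item == "theorem":
            shared += 1
            labels.append(f"Theorem {section}.{shared}")
        elif item == "equation":
            shared += 1
            labels.append(f"({section}.{shared})")
        else:
            raise ValueError(f"unknown item: {item}")
    return labels
```

Feeding it a section, a theorem, an equation, and another theorem yields Theorem 1.1, (1.2), and Theorem 1.3, matching the pattern described for the Hoyois article.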

# TODO: consider different arxiv articles to see how they are numbered

numbered_newtheorems_counters_in_preamble

 numbered_newtheorems_counters_in_preamble (document:str,
                                            add_equation_counter:Optional[
                                            bool]=None)

*Return the dict specifying the numbered \newtheorem command invocations

Assumes that

invocations of the \newtheorem command are exclusively in the preamble of the LaTeX document.
theorem-like environments are defined using the \newtheorem command.
no environments of the same name are defined twice.
There is at most one invocation of \theoremstyle or \newtheorem in each line.

This function does not take into account invocations of \numberwithin. The numberwithins_in_preamble function accounts for invocations of the \numberwithin command instead.

The equation environment (and other related environments, such as eqnarray) seems to be available in documents of the class amsart or article (i.e. documents which invoke \documentclass{amsart} or \documentclass{article}, possibly with some optional arguments). The equation environment (and other related environments) is accordingly included in the output of this function if the document is of the class amsart or article and add_equation_counter is left as None.

This function uses two separate regex patterns, one to detect the invocations of \newtheorem in which the optional parameter is the second parameter and one to detect those in which the optional parameter is the third parameter.*

	Type	Default	Details
document	str		The LaTeX document
add_equation_counter	Optional	None	Determines whether or not the `equation` environment will have a counter added when a `newtheorem` command for the `equation` environment is not explicitly invoked in the preamble. If `None`, then the counter is added if the article is of class `amsart` or `article`. If `True`, then the counter is added. If `False`, then the counter is not added.
Returns	dict		The keys are the names of the environments. The value of each key is a tuple `(<counter>, <reset_by_counter>)`, where `<counter>` is the counter that the environment belongs to, which can be custom defined or predefined in LaTeX, and `<reset_by_counter>` is a counter whose incrementation resets the counter of the environment, if available.

text = r"""\theoremstyle{definition}                 \newtheorem{conj}{Conjecture}
\newtheorem*{example}{Example}            \newtheorem{defn}{Definition}
\newtheorem{remark}{Remark} \newtheorem*{notation}{Notation}
\begin{document}
\end{document}"""
numbered_newtheorems_counters_in_preamble(text)

{'conj': ('conj', None), 'defn': ('defn', None), 'remark': ('remark', None)}

The numbered_newtheorems_counters_in_preamble function parses the preamble of a LaTeX document for invocations of the \newtheorem command and returns the counter that each theorem-like environment belongs to.
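
The docstring mentions two regex patterns, one for each position of the optional argument. The exact patterns are not shown here, so the following is only a guess at what such a pair could look like; `SHARED`, `OWN`, and `numbered_counters` are hypothetical names. The sketch does not strip comments and does not add the equation counter.

```python
import re
from typing import Optional

# \newtheorem{name}[shared_counter]{Display}
SHARED = re.compile(r"\\newtheorem\{(\w+)\}\[(\w+)\]\{[^{}]*\}")
# \newtheorem{name}{Display}, optionally followed by [reset_by]
OWN = re.compile(r"\\newtheorem\{(\w+)\}\{[^{}]*\}(?:\[(\w+)\])?")


def numbered_counters(preamble: str) -> dict[str, tuple[str, Optional[str]]]:
    """Map each numbered environment to (counter, reset_by_counter)."""
    counters: dict[str, tuple[str, Optional[str]]] = {}
    for name, shared in SHARED.findall(preamble):
        counters[name] = (shared, None)
    for match in OWN.finditer(preamble):
        name, reset_by = match.group(1), match.group(2)
        counters[name] = (name, reset_by)
    return counters
```

Starred invocations such as `\newtheorem*{conjecture*}{Conjecture}` match neither pattern, because both require `{` directly after `\newtheorem`, so unnumbered environments are left out.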

text = text_from_file(_test_directory() / 'latex_examples' / 'newtheorem_example.tex') 
print(text)

counters = numbered_newtheorems_counters_in_preamble(text)
test_eq(counters,
   {'theorem': ('theorem', None), 'lemma': ('theorem', None), 'definition': ('theorem', None), 'corollary': ('corollary', None), 'remark': ('remark', 'theorem'), 'equation': ('equation', None), 'eqnarray': ('equation', None)}
)

\documentclass{article}
\usepackage{amsthm}

\newtheorem{theorem}{Theorem}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{definition}[theorem]{Definition} % Note that `theorem`, `lemma`, and `definition` all have `theorem` as their counter.
\newtheorem{corollary}{Corollary} % Note that `corollary` has its own counter.
\newtheorem{remark}{Remark}[theorem] % `remark` has `theorem` as its counter
\newtheorem*{conjecture*}{Conjecture} % `conjecture*` has no counter

\begin{document}

\section{Introduction}

\begin{theorem}
This is Theorem 1.
\end{theorem}

\begin{lemma}
This is Lemma 2.
\end{lemma}

\begin{definition}
This is Definition 3.
\end{definition}

\end{document}

text = r"""
\theoremstyle{plain}
\newtheorem{thm}[subsection]{Theorem}
\newtheorem{prop}[subsection]{Proposition}
\newtheorem{cor}[subsection]{Corollary}
\newtheorem{remark}{Remark}
\newtheorem{conj}[subsection]{Conjecture}
\newtheorem*{conj*}{Conjecture}
\begin{document}
\end{document}
"""
counters = numbered_newtheorems_counters_in_preamble(text)
test_eq(
    counters,
    {'thm': ('subsection', None), 'prop': ('subsection', None), 'cor': ('subsection', None), 'remark': ('remark', None), 'conj': ('subsection', None)})

numbered_newtheorems_counters_in_preamble ignores commented out text:

text = r"""
\theoremstyle{plain}
\newtheorem{thm}[subsection]{Theorem}
\newtheorem{prop}[subsection]{Proposition}
\newtheorem{cor}[subsection]{Corollary}

\newtheorem{conj}[subsection]{Conjecture}
\newtheorem*{conj*}{Conjecture} %\newtheorem{fakeenv}{This won't be picked up!}
\begin{document}
\end{document}
"""
counters = numbered_newtheorems_counters_in_preamble(text)
test_eq(
    counters,
    {'thm': ('subsection', None), 'prop': ('subsection', None), 'cor': ('subsection', None), 'conj': ('subsection', None)})
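
Ignoring commented-out text usually comes down to deleting everything after an unescaped `%` on each line. Here is one way to sketch that with a regular expression; this is an assumption about the general technique rather than the code used by this module, and `COMMENT` and `remove_comments` are hypothetical names. Verbatim environments are not treated specially.

```python
import re

# An unescaped % starts a comment. The captured group keeps pairs of
# backslashes (escaped backslashes) that sit directly before the %.
COMMENT = re.compile(r"(?<!\\)((?:\\\\)*)%.*")


def remove_comments(text: str) -> str:
    r"""Remove LaTeX comments while keeping escaped percent signs such as 50\%."""
    return COMMENT.sub(r"\1", text)
```

Applied to the last `\newtheorem*` line of the example above, this leaves the `conj*` definition and discards the commented-out `fakeenv` definition.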

numbered_newtheorems_counters_in_preamble does not account for \numberwithin command invocations. The numberwithins_in_preamble function accounts for invocations of \numberwithin instead.

text = text_from_file(_test_directory() / 'latex_examples' / 'numbering_example_3_theorem_like_environments_share_counter_with_equation_and_reset_at_each_section' / 'main.tex')
print(text)
# So `numbered_newtheorems_counters_in_preamble` only considers
# the theorem-like environments as being counted by 'equation'.
# Note that the command  `\numberwithin{equation}{section}`
# resets the equation counter
# every time the `section` counter is incremented.
test_eq(numbered_newtheorems_counters_in_preamble(text), 
       {'theorem': ('equation', None), 'proposition': ('equation', None), 'lemma': ('equation', None), 'corollary': ('equation', None), 'definition': ('equation', None), 'example': ('equation', None), 'remark': ('equation', None), 'equation': ('equation', None), 'eqnarray': ('equation', None)}
        )

\documentclass{amsart}
\usepackage[utf8]{inputenc}
\usepackage{amsmath, amsfonts, amssymb, amsthm, amsopn}

\numberwithin{equation}{section}

\theoremstyle{plain}
\newtheorem*{theorem*}{Theorem}
\newtheorem*{theoremA}{Theorem A}
\newtheorem*{theoremB}{Theorem B}
\newtheorem{theorem}[equation]{Theorem}
\newtheorem{proposition}[equation]{Proposition}
\newtheorem{lemma}[equation]{Lemma}
\newtheorem{corollary}[equation]{Corollary}

\theoremstyle{definition}
\newtheorem{definition}[equation]{Definition}
\newtheorem{example}[equation]{Example}
\newtheorem*{acknowledgements}{Acknowledgements}
\newtheorem*{conventions}{Conventions}

\theoremstyle{remark}
\newtheorem{remark}[equation]{Remark}

\begin{document}

\section{Introduction}

\begin{theorem}
This is Theorem 1.1. This is because the \verb|\numberwithin{equation}{section}| makes the section number included in the equation counter and because the \\
\verb|\newtheorem{theorem}[equation]{Theorem}| command makes the environment \verb|theorem| be counted by the equation counter.
\end{theorem}

The following makes an equation labeled 1.2; 
\begin{equation}
5 + 7 = 12
\end{equation}

\begin{theorem*}
This Theorem is unnumbered
\end{theorem*}

\begin{corollary}
This is Corollary 1.3.
\end{corollary}

\section{Another section}
\begin{theorem}
This is theorem 2.1
\end{theorem}

The following is labeled 2.2:
\begin{equation}
3+5 = 8.
\end{equation}

\end{document}

The \newtheorem command can also specify that the counter of the newly defined theorem-like environment is reset whenever another counter is incremented; for example, \newtheorem{theorem}{Theorem}[section] defines a new environment named theorem (with display text Theorem) whose counter is reset whenever the section counter is incremented.

text = text_from_file(_test_directory() / 'latex_examples' / 'numbering_example_7_newtheorem_command_restarts_counter_by_section' / 'main.tex') 
print(text)
# Here, `theorem` and `proposition` each have their own counter, which is reset
# every time the `section` counter is incremented; `corollary` has its own counter,
# which is reset every time the `theorem` counter is incremented; and `lemma`
# shares the `theorem` counter.

test_eq(numbered_newtheorems_counters_in_preamble(text), 
        {'lemma': ('theorem', None), 'theorem': ('theorem', 'section'), 'corollary': ('corollary', 'theorem'), 'proposition': ('proposition', 'section'), 'equation': ('equation', None), 'eqnarray': ('equation', None)}

        )


% Based on an example from https://www.overleaf.com/learn/latex/Theorems_and_proofs#Numbered_theorems.2C_definitions.2C_corollaries_and_lemmas

\documentclass[12 pt]{amsart}

\newtheorem{theorem}{Theorem}[section]
\newtheorem{corollary}{Corollary}[theorem]
\newtheorem{lemma}[theorem]{Lemma}
% Note that the below invocation of \newtheorem is invalid:
% \newtheorem{proposition}[theorem]{Proposition}[section]
\newtheorem{proposition}{Proposition}[section]

\begin{document}
\section{Introduction}
Theorems can easily be defined:

\begin{theorem}
Let \(f\) be a function whose derivative exists in every point, then \(f\) is 
a continuous function.
\end{theorem}

\begin{theorem}[Pythagorean theorem]
\label{pythagorean}
This is a theorem about right triangles and can be summarised in the next 
equation 
\[ x^2 + y^2 = z^2 \]
\end{theorem}

And a consequence of theorem \ref{pythagorean} is the statement in the next 
corollary.

\begin{corollary}
There's no right rectangle whose sides measure 3cm, 4cm, and 6cm.
\end{corollary}

You can reference theorems such as \ref{pythagorean} when a label is assigned.

\begin{lemma}
Given two line segments whose lengths are \(a\) and \(b\) respectively there is a 
real number \(r\) such that \(b=ra\).
\end{lemma}

\end{document}

For the following test, we have multiple theorems defined in the same line:

text = r"""\theoremstyle{definition}                 \newtheorem{conj}{Conjecture}
\newtheorem*{example}{Example}            \newtheorem{defn}{Definition}
\newtheorem{remark}{Remark} \newtheorem*{notation}{Notation}
\begin{document}
\end{document}"""
numbered_newtheorems_counters_in_preamble(text)

{'conj': ('conj', None), 'defn': ('defn', None), 'remark': ('remark', None)}

numberwithins_in_preamble

 numberwithins_in_preamble (document:str)

*Return the dict describing \numberwithin commands invoked in the preamble of document.

Assumes that \numberwithin commands are invoked exclusively in the preamble.

See also the numbered_newtheorems_counters_in_preamble function, which parses invocations of the \newtheorem command.*

	Type	Details
document	str	The LaTeX document
Returns	dict	The keys are the first arguments of `numberwithin` invocations and the values are the second arguments of `numberwithin` invocations.

The numberwithins_in_preamble function returns a dict describing invocations of the \numberwithin command. See also the numbered_newtheorems_counters_in_preamble function, which parses invocations of the \newtheorem command.

In the following example, there is an invocation of the \numberwithin command; for the LaTeX document in the example below, the equation counter is reset every time the section counter is incremented.

The numberwithins_in_preamble function returns a dict that is used by the divide_latex_text function to account for this fact.

text = text_from_file(_test_directory() / 'latex_examples' / 'numbering_example_3_theorem_like_environments_share_counter_with_equation_and_reset_at_each_section' / 'main.tex')
print(text)
test_eq(numberwithins_in_preamble(text), {'equation': 'section'})
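
A dict of this shape can be produced with a single regular expression over the preamble. The sketch below is a simplified illustration and not the implementation of numberwithins_in_preamble; `NUMBERWITHIN` and `parse_numberwithins` are hypothetical names, and comments are not stripped.

```python
import re

# Matches \numberwithin{child}{parent}
NUMBERWITHIN = re.compile(r"\\numberwithin\{([^{}]+)\}\{([^{}]+)\}")


def parse_numberwithins(preamble: str) -> dict[str, str]:
    """Map each counter to the counter it is numbered within."""
    return {child: parent for child, parent in NUMBERWITHIN.findall(preamble)}
```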

\documentclass{amsart}
\usepackage[utf8]{inputenc}
\usepackage{amsmath, amsfonts, amssymb, amsthm, amsopn}

\numberwithin{equation}{section}

\theoremstyle{plain}
\newtheorem*{theorem*}{Theorem}
\newtheorem*{theoremA}{Theorem A}
\newtheorem*{theoremB}{Theorem B}
\newtheorem{theorem}[equation]{Theorem}
\newtheorem{proposition}[equation]{Proposition}
\newtheorem{lemma}[equation]{Lemma}
\newtheorem{corollary}[equation]{Corollary}

\theoremstyle{definition}
\newtheorem{definition}[equation]{Definition}
\newtheorem{example}[equation]{Example}
\newtheorem*{acknowledgements}{Acknowledgements}
\newtheorem*{conventions}{Conventions}

\theoremstyle{remark}
\newtheorem{remark}[equation]{Remark}

\begin{document}

\section{Introduction}

\begin{theorem}
This is Theorem 1.1. This is because the \verb|\numberwithin{equation}{section}| makes the section number included in the equation counter and because the \\
\verb|\newtheorem{theorem}[equation]{Theorem}| command makes the environment \verb|theorem| be counted by the equation counter.
\end{theorem}

The following makes an equation labeled 1.2; 
\begin{equation}
5 + 7 = 12
\end{equation}

\begin{theorem*}
This Theorem is unnumbered
\end{theorem*}

\begin{corollary}
This is Corollary 1.3.
\end{corollary}

\section{Another section}
\begin{theorem}
This is theorem 2.1
\end{theorem}

The following is labeled 2.2:
\begin{equation}
3+5 = 8.
\end{equation}

\end{document}

Getting the display names of environments

For example, \newtheorem{theorem}{Theorem} defines a theorem-like environment called theorem whose display name is Theorem.

display_names_of_environments

 display_names_of_environments (document:str)

*Return the dict specifying the display names for each theorem-like environment.

Assumes that
- invocations of the \newtheorem command are exclusively in the preamble of the LaTeX document.
- theorem-like environments are defined using the \newtheorem command.
- no environments of the same name are defined twice.*

	Type	Details
document	str	The LaTeX document
Returns	dict

Basic examples:

text = text_from_file(_test_directory() / 'latex_examples' / 'newtheorem_example.tex') 
print(text)
display_names = display_names_of_environments(text)
test_eq(display_names,{'theorem': 'Theorem', 'lemma': 'Lemma', 'definition': 'Definition', 'corollary': 'Corollary', 'conjecture*': 'Conjecture', 'remark': 'Remark'})
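
The mapping from environment name to display name can be pictured with one regular expression that tolerates the optional star and the optional shared-counter argument. This is a rough sketch under the same assumptions as the docstring, not the implementation of display_names_of_environments; `NEWTHEOREM` and `display_names_sketch` are hypothetical names. Display names containing nested braces are not handled.

```python
import re

# \newtheorem or \newtheorem*, then {name}, an optional [counter], then {Display}
NEWTHEOREM = re.compile(
    r"\\newtheorem\*?\{([^{}]+)\}(?:\[[^\]]*\])?\{([^{}]*)\}")


def display_names_sketch(preamble: str) -> dict[str, str]:
    """Map each environment name to the display name given to newtheorem."""
    return {name: display for name, display in NEWTHEOREM.findall(preamble)}
```

Because the pattern is applied with `findall` over the whole preamble, several invocations on one line are all picked up.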

\documentclass{article}
\usepackage{amsthm}

\newtheorem{theorem}{Theorem}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{definition}[theorem]{Definition} % Note that `theorem`, `lemma`, and `definition` all have `theorem` as their counter.
\newtheorem{corollary}{Corollary} % Note that `corollary` has its own counter.
\newtheorem{remark}{Remark}[theorem] % `remark` has `theorem` as its counter
\newtheorem*{conjecture*}{Conjecture} % `conjecture*` has no counter

\begin{document}

\section{Introduction}

\begin{theorem}
This is Theorem 1.
\end{theorem}

\begin{lemma}
This is Lemma 2.
\end{lemma}

\begin{definition}
This is Definition 3.
\end{definition}

\end{document}

file = _test_directory() / 'latex_examples' / 'numbering_example_1_consecutive_numbering_scheme' / 'main.tex'
print(text)
display_names = display_names_of_environments(text)
print(display_names)

\documentclass{article}
\usepackage{amsthm}

\newtheorem{theorem}{Theorem}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{definition}[theorem]{Definition} % Note that `theorem`, `lemma`, and `definition` all have `theorem` as their counter.
\newtheorem{corollary}{Corollary} % Note that `corollary` has its own counter.
\newtheorem{remark}{Remark}[theorem] % `remark` has `theorem` as its counter
\newtheorem*{conjecture*}{Conjecture} % `conjecture*` has no counter

\begin{document}

\section{Introduction}

\begin{theorem}
This is Theorem 1.
\end{theorem}

\begin{lemma}
This is Lemma 2.
\end{lemma}

\begin{definition}
This is Definition 3.
\end{definition}

\end{document}
{'theorem': 'Theorem', 'lemma': 'Lemma', 'definition': 'Definition', 'corollary': 'Corollary', 'conjecture*': 'Conjecture', 'remark': 'Remark'}

text = text_from_file(_test_directory() / 'latex_examples' / 'numbering_example_7_newtheorem_command_restarts_counter_by_section' / 'main.tex') 
print(text)
display_names = display_names_of_environments(text)
test_eq(display_names,
{'theorem': 'Theorem',
 'corollary': 'Corollary',
 'lemma': 'Lemma',
 'proposition': 'Proposition',})


% Based on an example from https://www.overleaf.com/learn/latex/Theorems_and_proofs#Numbered_theorems.2C_definitions.2C_corollaries_and_lemmas

\documentclass[12 pt]{amsart}

\newtheorem{theorem}{Theorem}[section]
\newtheorem{corollary}{Corollary}[theorem]
\newtheorem{lemma}[theorem]{Lemma}
% Note that the below invocation of \newtheorem is invalid:
% \newtheorem{proposition}[theorem]{Proposition}[section]
\newtheorem{proposition}{Proposition}[section]

\begin{document}
\section{Introduction}
Theorems can easily be defined:

\begin{theorem}
Let \(f\) be a function whose derivative exists in every point, then \(f\) is 
a continuous function.
\end{theorem}

\begin{theorem}[Pythagorean theorem]
\label{pythagorean}
This is a theorem about right triangles and can be summarised in the next 
equation 
\[ x^2 + y^2 = z^2 \]
\end{theorem}

And a consequence of theorem \ref{pythagorean} is the statement in the next 
corollary.

\begin{corollary}
There's no right rectangle whose sides measure 3cm, 4cm, and 6cm.
\end{corollary}

You can reference theorems such as \ref{pythagorean} when a label is assigned.

\begin{lemma}
Given two line segments whose lengths are \(a\) and \(b\) respectively there is a 
real number \(r\) such that \(b=ra\).
\end{lemma}

\end{document}

In the following example, there are multiple \newtheorem commands defined in a single line.

text = r"""\theoremstyle{definition}                 \newtheorem{conj}{Conjecture}
\newtheorem*{example}{Example}            \newtheorem{defn}{Definition}
\newtheorem{remark}{Remark} \newtheorem*{notation}{Notation}
\begin{document}
\end{document}"""
test_eq(display_names_of_environments(text), {'conj': 'Conjecture', 'example': 'Example', 'defn': 'Definition', 'remark': 'Remark', 'notation': 'Notation'})

Divide latex text into parts

text_from_node

 text_from_node (node:pylatexenc.latexwalker.LatexNode)

Return the str representing node.

get_node_from_simple_text

 get_node_from_simple_text (text:str)

Return the (first) LatexNode object from a str.

node = get_node_from_simple_text(
    r"""\begin{theorem}
lalalala
\begin{equation}
\end{equation}
\end{theorem}"""
)
text_from_node(node)

'\\begin{theorem}\nlalalala\n\\begin{equation}\n\\end{equation}\n\\end{theorem}'

text = r"""\begin{thm}This is a theorem. \end{thm}"""
node = get_node_from_simple_text(text)
assert isinstance(node, LatexEnvironmentNode)
test_eq(node.environmentname, 'thm')


text = r"""\begin{thm}This is a theorem. \end{thm} \begin{proof} This is a proof. It is not captured by the `get_node_from_simple_text` function \end{proof}"""
node = get_node_from_simple_text(text)
assert isinstance(node, LatexEnvironmentNode)
test_eq(node.environmentname, 'thm')

text = r"""\begin{thm}This is a theorem. \end{thm}"""
node = get_node_from_simple_text(text)
# Test a theorem being counted by its own counter.
numbertheorem_counters = {'thm': ('thm', None)}
numberwithins = {}
counters = {'thm': 1}
sample_numbering = _node_numbering(
    node, numbertheorem_counters, numberwithins, counters)
test_eq(sample_numbering, '1')
# Test a theorem being counted by the equation counter.
numbertheorem_counters = {'thm': ('equation', None)}
numberwithins = {}
counters = {'equation': 2}
sample_numbering = _node_numbering(
    node, numbertheorem_counters, numberwithins, counters)
test_eq(sample_numbering, '2')

text = r"""\begin{corollary}This is a corollary. \end{corollary}"""
node = get_node_from_simple_text(text)
# Test a theorem-like environment being counted by the counter of
# another theorem-like environment
numbertheorem_counters = {'corollary': ('theorem', None), 'theorem': ('theorem', None)}
numberwithins = {}
counters = {'theorem': 0}
sample_numbering = _node_numbering(
    node, numbertheorem_counters, numberwithins, counters)
test_eq(sample_numbering, '0')

# Test a theorem-like environment whose counter is numbered within
# the section counter.
# First, see what happens when a theorem is invoked
text = r"""\begin{theorem}This is a theorem. \end{theorem}"""
node = get_node_from_simple_text(text)
numbertheorem_counters = {'theorem': ('theorem', None)}
numberwithins = {'theorem': 'section'}
counters = {'section': 1, 'theorem': 0}
sample_numbering = _node_numbering(
    node, numbertheorem_counters, numberwithins, counters)
test_eq(sample_numbering, '1.0')

# Next, see what happens when a new section is invoked:
text = r"""\section{New section! The theorem counter should be reset}"""
node = get_node_from_simple_text(text)
sample_numbering = _node_numbering(
    node, numbertheorem_counters, numberwithins, counters)
test_eq(sample_numbering, '1')

# Test a theorem-like environment sharing a counter with equation
# and in turn equation is numbered within section.
text = r"""\begin{theorem}This is a theorem. \end{theorem}"""
node = get_node_from_simple_text(text)
numbertheorem_counters = {'theorem': ('equation', None)}
numberwithins = {'equation': 'section'}
counters = {'section': 1, 'equation': 0}
sample_numbering = _node_numbering(
    node, numbertheorem_counters, numberwithins, counters)
test_eq(sample_numbering, '1.0')
# Next, see what happens when a new section is invoked:
text = r"""\section{New section! The theorem counter should be reset}"""
node = get_node_from_simple_text(text)
sample_numbering = _node_numbering(
    node, numbertheorem_counters, numberwithins, counters)
test_eq(sample_numbering, '1')
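
The tests above suggest that a numbering string is assembled from the current counter values by following the counter of the environment and then the chain of "numbered within" relations. The following sketch reproduces that idea on environment names instead of nodes. It is my own simplified model and not the code of `_node_numbering`; `format_numbering` is a hypothetical name.

```python
from typing import Optional


def format_numbering(
        env: str,
        newtheorem_counters: dict[str, tuple[str, Optional[str]]],
        numberwithins: dict[str, str],
        counters: dict[str, int]) -> str:
    """Build a dotted numbering such as '1.0' from current counter values."""
    # An environment without an entry (e.g. 'section') is its own counter.
    counter, reset_by = newtheorem_counters.get(env, (env, None))
    parts = [str(counters.get(counter, 0))]
    parent = numberwithins.get(counter, reset_by)
    seen = {counter}
    # Walk up the chain of parents, guarding against cycles.
    while parent is not None and parent not in seen:
        parts.append(str(counters.get(parent, 0)))
        seen.add(parent)
        parent = numberwithins.get(parent)
    return ".".join(reversed(parts))
```

With `{'theorem': ('equation', None)}`, `{'equation': 'section'}`, and counters `{'section': 1, 'equation': 0}`, this sketch produces `'1.0'`, the same string as in the corresponding test above.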

swap_numbers_invoked

 swap_numbers_invoked (preamble:str)

*Returns True if \swapnumbers is in the preamble.

Assume that a mention of \swapnumbers is an actual invocation.*

	Type	Details
preamble	str
Returns	bool

assert swap_numbers_invoked(r'\swapnumbers')
assert not swap_numbers_invoked(r'''
\documentclass{article}
\usepackage{amsthm}


\newtheorem{theorem}{Theorem} % \swapnumbers
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{definition}[theorem]{Definition}
\newtheorem*{remark*}{Remark}''')
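
A check of this kind can be sketched line by line: cut each line at the first unescaped `%` and look for the command in what remains. This is an illustrative approximation, not the implementation of swap_numbers_invoked; `swapnumbers_present` is a hypothetical name.

```python
import re


def swapnumbers_present(preamble: str) -> bool:
    r"""Return True if \swapnumbers occurs outside of a comment."""
    for line in preamble.splitlines():
        # Keep only the part of the line before the first unescaped %.
        code = re.split(r"(?<!\\)%", line, maxsplit=1)[0]
        # The lookahead rejects longer command names such as \swapnumbersfoo.
        if re.search(r"\\swapnumbers(?![A-Za-z])", code):
            return True
    return False
```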

divide_latex_text

 divide_latex_text (document:str, dir:Optional[os.PathLike],
                    environments_to_not_divide_along:list[str]=['align',
                    'align*', 'diagram', 'displaymath', 'displaymath*',
                    'enumerate', 'eqnarray', 'eqnarray*', 'equation',
                    'equation*', 'gather', 'gather*', 'itemize', 'label',
                    'multiline', 'multiline*', 'multline', 'multline*',
                    'proof', 'quote', 'tabular', 'table'],
                    replace_commands_in_document_first:bool=True,
                    repeat_replacing_commands:int=-1)

*Divide LaTeX text to convert into Obsidian.md notes.

Assumes that the counters in the LaTeX document are either the predefined ones or specified by the \newtheorem command.

Proof environments are assigned to the same parts as their preceding theorem-like environments, if available.

TODO: Implement counters specified by \newcounter, cf. https://www.overleaf.com/learn/latex/Counters#LaTeX_commands_for_working_with_counters.*

	Type	Default	Details
document	str
dir	Optional		The directory where the included files and style files are to be found.
environments_to_not_divide_along	list	['align', 'align*', 'diagram', 'displaymath', 'displaymath*', 'enumerate', 'eqnarray', 'eqnarray*', 'equation', 'equation*', 'gather', 'gather*', 'itemize', 'label', 'multiline', 'multiline*', 'multline', 'multline*', 'proof', 'quote', 'tabular', 'table']	A list of the names of the environments along which to not make a new note, unless the environment starts a section (or the entire document).
replace_commands_in_document_first	bool	True	If `True`, invoke `replace_commands_in_latex_document` on `document` to first replace custom commands (in the document minus the preamble) before starting to divide the document.
repeat_replacing_commands	int	-1	If `replace_commands_in_document_first` is `True`, then this is passed as the `repeat` argument into the invocation of `replace_commands_in_latex_document`.
Returns	list		Each item is of the form `(note_title, text)`, where `note_title` often encapsulates the note type (i.e. section/subsection/display text of a theorem-like environment) along with the numbering and `text` is the text of the part. Sometimes `note_title` is just a number, which means that `text` is not of a `\section` or `\subsection` command and not of a theorem-like environment.

dir = _test_directory() / 'latex_examples' / 'divide_latex_example_text_with_gather_environment'
file =  dir / 'main.tex'
sample_latex_text = text_from_file(file)
# print(sample_latex_text)
preamble, document = divide_preamble(sample_latex_text)
parts = divide_latex_text(sample_latex_text, dir)
print(parts)

[['1. Introduction', '\\section{Introduction}\n\nThere is an equation\n\\begin{align*}\nasdf\n\\end{align*}\nbut this equation should not get to start its own part.\n\n']]

Examples for the `divide_latex_text` function

In the following example, we take a basic LaTeX file and divide it into parts:

dir = _test_directory() / 'latex_examples' / 'divide_latex_example_proof_preceded_by_theorem'
file = dir / 'main.tex'
sample_latex_text = text_from_file(file)
print(sample_latex_text)


\documentclass[10pt]{article}

\theoremstyle{plain}
\newtheorem*{theorem*}{Theorem}
\newtheorem*{theoremA}{Theorem A}
\newtheorem*{theoremB}{Theorem B}
\newtheorem{theorem}[equation]{Theorem}
\newtheorem{proposition}[equation]{Proposition}
\newtheorem{lemma}[equation]{Lemma}
\newtheorem{corollary}[equation]{Corollary}

\theoremstyle{definition}
\newtheorem{definition}[equation]{Definition}
\newtheorem{example}[equation]{Example}
\newtheorem*{acknowledgements}{Acknowledgements}
\newtheorem*{conventions}{Conventions}

\theoremstyle{remark}
\newtheorem{remark}[equation]{Remark}



\begin{document}
\section{Some section}

\begin{theorem}
This is a theorem.
\end{theorem}

\begin{proof}
This is a proof
\end{proof}

\end{document}

The divide_preamble function recognizes where the preamble ends and where the document begins.

preamble, document = divide_preamble(sample_latex_text)

print(preamble)


\documentclass[10pt]{article}

\theoremstyle{plain}
\newtheorem*{theorem*}{Theorem}
\newtheorem*{theoremA}{Theorem A}
\newtheorem*{theoremB}{Theorem B}
\newtheorem{theorem}[equation]{Theorem}
\newtheorem{proposition}[equation]{Proposition}
\newtheorem{lemma}[equation]{Lemma}
\newtheorem{corollary}[equation]{Corollary}

\theoremstyle{definition}
\newtheorem{definition}[equation]{Definition}
\newtheorem{example}[equation]{Example}
\newtheorem*{acknowledgements}{Acknowledgements}
\newtheorem*{conventions}{Conventions}

\theoremstyle{remark}
\newtheorem{remark}[equation]{Remark}

print(document)

\begin{document}
\section{Some section}

\begin{theorem}
This is a theorem.
\end{theorem}

\begin{proof}
This is a proof
\end{proof}

\end{document}
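
The split shown above can be imitated with plain string operations. The following is only a minimal sketch under the assumption that the first literal `\begin{document}` marks the boundary; `split_preamble` is a hypothetical stand-in and not how `divide_preamble` is actually implemented. A commented-out `\begin{document}` in the preamble would fool it.

```python
def split_preamble(text: str, marker: str = r'\begin{document}') -> tuple:
    """Hypothetical stand-in for divide_preamble: split at the first marker."""
    index = text.find(marker)
    if index == -1:
        raise ValueError('no document environment found')
    # The marker itself stays with the document half, as in the output above.
    return text[:index], text[index:]


sample = '\\documentclass{article}\n\\begin{document}\nHello\n\\end{document}\n'
preamble, document = split_preamble(sample)
```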

The divide_latex_text function divides the LaTeX document into parts, generally based on sections and theorem-like environments:

parts = divide_latex_text(sample_latex_text, dir)
print(parts)
test_eq(len(parts), 2)

[['1. Some section', '\\section{Some section}\n\n'], ['Theorem 1.', '\\begin{theorem}\nThis is a theorem.\n\\end{theorem}\\begin{proof}\nThis is a proof\n\\end{proof}']]
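
Note how the proof ends up in the same part as the theorem. One way to picture this behavior is as a post-processing step that folds a proof into the environment part before it. The sketch below is an assumption about the effect, not the library's code: `attach_proofs` and the intermediate `raw_parts` list are hypothetical.

```python
def attach_proofs(parts):
    """Hypothetical sketch: fold each proof part into a preceding environment part."""
    merged = []
    for title, text in parts:
        is_proof = text.lstrip().startswith(r'\begin{proof}')
        # Only attach when the previous part is itself an environment.
        if is_proof and merged and merged[-1][1].startswith(r'\begin{'):
            merged[-1][1] += text
        else:
            merged.append([title, text])
    return merged


# Hypothetical intermediate state in which the proof is still its own part.
raw_parts = [
    ['1. Some section', '\\section{Some section}\n\n'],
    ['Theorem 1.', '\\begin{theorem}\nThis is a theorem.\n\\end{theorem}'],
    ['1', '\\begin{proof}\nThis is a proof\n\\end{proof}'],
]
merged = attach_proofs(raw_parts)
```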

In the next example, the document uses enumerate environments to list some items. The divide_latex_text function does not create a new part for an enumerate environment.

dir = _test_directory() / 'latex_examples' / 'divide_latex_example_text_preceded_by_undivided_environment'
file = dir / 'main.tex'
sample_latex_text = text_from_file(file)
print(sample_latex_text)

% In this example, there are enumerate environments, which should not get their
% own `part`, cf. `divide_latex_text` in `16_latex.convert.ipynb`.
\documentclass[10pt]{article}
\usepackage{amsmath}
\usepackage{amsfonts}
\begin{document}

\section{Introduction}

Blahblahblah, this document has some lists.
The `divide_latex_text` should not create a separate part for the below `enumerate`
environment; after all, it seems better to include the list in the same file/note
as the text that provides context for the list.

\begin{enumerate}
  \item Rings
  \item Fields
\end{enumerate}

And here is another list, perhaps a grocery list:

\begin{enumerate}
  \setcounter{enumi}{3}
  \item apples
  \item bananas
  \item milk
\end{enumerate}

Lalalala

\end{document}

preamble, document = divide_preamble(sample_latex_text)
parts = divide_latex_text(document, dir)
print(parts)
test_eq(len(parts), 1)

[['1. Introduction', '\\section{Introduction}\n\nBlahblahblah, this document has some lists.\nThe `divide_latex_text` should not create a separate part for the below `enumerate`\nenvironment; after all, it seems better to include the list in the same file/note\nas the text that provides context for the list.\n\n\\begin{enumerate}\n  \\item Rings\n  \\item Fields\n\\end{enumerate}\n\nAnd here is another list, perhaps a grocery list:\n\n\\begin{enumerate}\n  \\setcounter{enumi}{3}\n  \\item apples\n  \\item bananas\n  \\item milk\n\\end{enumerate}\n\nLalalala\n\n']]

dir = _test_directory() / 'latex_examples' / 'divide_latex_example_2'
file = dir / 'main.tex'
sample_latex_text = text_from_file(file)
preamble, document = divide_preamble(sample_latex_text)
parts = divide_latex_text(document, dir)
print(parts)

[['1', '\\maketitle'], ['abstract', '\\begin{abstract}\nThis is an abstract\n\\end{abstract}']]

The divide_latex_text function by default divides along a LaTeX environment (something which is invoked by \begin{...} \end{...}). One can use the optional environments_to_not_divide_along parameter in the function to specify which environments to not divide along. By default, this list is set as follows:

DEFAULT_ENVIRONMENTS_TO_NOT_DIVIDE_ALONG

['align',
 'align*',
 'diagram',
 'displaymath',
 'displaymath*',
 'enumerate',
 'eqnarray',
 'eqnarray*',
 'equation',
 'equation*',
 'gather',
 'gather*',
 'itemize',
 'label',
 'multiline',
 'multiline*',
 'multline',
 'multline*',
 'proof',
 'quote',
 'tabular',
 'table']
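
A custom list can be derived from the default one. The sketch below uses an abridged local copy of the default list so that it runs on its own; in real use one would presumably start from `DEFAULT_ENVIRONMENTS_TO_NOT_DIVIDE_ALONG`. The choice of `figure` and `quote` is purely illustrative.

```python
# Abridged local copy of the default list shown above (an assumption made
# only so that this sketch is self-contained).
default_environments = [
    'align', 'align*', 'enumerate', 'equation', 'equation*',
    'itemize', 'proof', 'quote', 'tabular', 'table']

# Keep figures inside the surrounding note, but let quotes start their own note.
custom_environments = [
    name for name in default_environments + ['figure'] if name != 'quote']
```

The resulting list would then be passed as the `environments_to_not_divide_along` argument of `divide_latex_text`.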

In the following example, the theorem, corollary, and definition environments share a counter, which is not reset even when a new section begins.

dir = _test_directory() / 'latex_examples' / 'numbering_example_1_consecutive_numbering_scheme'
file = dir / 'main.tex'
text = text_from_file(file)
print(text)

\documentclass{article}
\usepackage{amsthm}
%\usepackage{amsmath}

\newtheorem{theorem}{Theorem}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{definition}[theorem]{Definition}
\newtheorem*{remark*}{Remark}

%\numberwithin{theorem}{part}

\begin{document}
For this document, the `theorem` counter is not reset whenever a new section begins.

A similar numbering scheme can be accomplished by importing \verb|amsmath| and invoking the code \verb|\numberwithin{theorem}{part}| in the preamble.

\section{Introduction}

\begin{theorem}
This is Theorem 1.
\end{theorem}

\begin{corollary}
This is Corollary 2.
\end{corollary}

\begin{remark*}
This is a remark. It is unnumbered and it does not affect the numberings of other environments.
\end{remark*}

\begin{definition}
This is Definition 3.
\end{definition}



\section{Another Section}

\begin{theorem}
This is Theorem 4.
\end{theorem}

And we might get a corollary!

\begin{corollary}
This is Corollary 5.
\end{corollary}

\begin{definition}
This is Definition 6.
\end{definition}

\end{document}

sample_output = divide_latex_text(text, dir)
sample_output

[['1',
  'For this document, the `theorem` counter is not reset whenever a new section begins.\n\nA similar numbering scheme can be accomplished by importing \\verb|amsmath| and invoking the code \\verb|\\numberwithin{theorem}{part}| in the preamble.'],
 ['1. Introduction', '\\section{Introduction}\n\n'],
 ['Theorem 1.', '\\begin{theorem}\nThis is Theorem 1.\n\\end{theorem}'],
 ['Corollary 2.',
  '\\begin{corollary}\nThis is Corollary 2.\n\\end{corollary}'],
 ['Remark',
  '\\begin{remark*}\nThis is a remark. It is unnumbered and it does not affect the numberings of other environments.\n\\end{remark*}'],
 ['Definition 3.',
  '\\begin{definition}\nThis is Definition 3.\n\\end{definition}'],
 ['2. Another Section', '\\section{Another Section}\n\n'],
 ['Theorem 4.', '\\begin{theorem}\nThis is Theorem 4.\n\\end{theorem}'],
 ['2', 'And we might get a corollary!'],
 ['Corollary 5.',
  '\\begin{corollary}\nThis is Corollary 5.\n\\end{corollary}'],
 ['Definition 6.',
  '\\begin{definition}\nThis is Definition 6.\n\\end{definition}']]

assert sample_output[0][0] == '1'
assert sample_output[1][0] == '1. Introduction'
assert sample_output[2][0] == 'Theorem 1.'
assert sample_output[3][0] == 'Corollary 2.'
assert sample_output[4][0] == 'Remark'
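
The consecutive numbering above follows from the `\newtheorem` declarations: `corollary` and `definition` are declared with `[theorem]` and so share the `theorem` counter, while the starred `remark*` has no counter. The following standalone sketch reproduces the titles with a regular expression. It is a simplification under stated assumptions: `parse_newtheorems` and `number_environments` are hypothetical names, the regex ignores the trailing `[section]`-style argument, and commented-out declarations would also be matched.

```python
import re
from typing import Dict, List

# \newtheorem{env}[shared counter]{Display} or \newtheorem*{env}{Display}
NEWTHEOREM = re.compile(
    r'\\newtheorem(\*?)\{([^}]+)\}(?:\[([^\]]+)\])?\{([^}]+)\}')


def parse_newtheorems(preamble: str) -> Dict[str, tuple]:
    """Map environment name -> (display name, counter name or None)."""
    table = {}
    for match in NEWTHEOREM.finditer(preamble):
        star, env, shared, display = match.groups()
        # Starred environments are unnumbered; others use the shared
        # counter if one is given, else a counter named after themselves.
        counter = None if star else (shared or env)
        table[env] = (display, counter)
    return table


def number_environments(table: Dict[str, tuple], env_names: List[str]) -> List[str]:
    """Produce a title for each environment in document order."""
    counts: Dict[str, int] = {}
    titles = []
    for env in env_names:
        display, counter = table[env]
        if counter is None:
            titles.append(display)
            continue
        counts[counter] = counts.get(counter, 0) + 1
        titles.append(f'{display} {counts[counter]}.')
    return titles


preamble = r'''
\newtheorem{theorem}{Theorem}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{definition}[theorem]{Definition}
\newtheorem*{remark*}{Remark}
'''
table = parse_newtheorems(preamble)
titles = number_environments(
    table, ['theorem', 'corollary', 'remark*', 'definition', 'theorem'])
```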

In the following example, the \numberwithin command is used to number the theorem-like environments within sections. These environments are first numbered 1.1, 1.2, 1.3, etc., and then numbered 2.1, 2.2, 2.3, etc. once a new section starts.

dir = _test_directory() / 'latex_examples' / 'numbering_example_2_numbering_scheme_reset_at_each_section'
file = dir / 'main.tex'
text = text_from_file(file)
print(text)

% This is an example of a LaTeX document whose theorem-like environments are numbered with sections.

\documentclass{article}
\usepackage{amsthm}
\usepackage{amsmath}

\newtheorem{theorem}{Theorem}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{definition}[theorem]{Definition}
\newtheorem*{remark*}{Remark}

\numberwithin{theorem}{section}

\begin{document}

This document resets its `theorem` counter whenever a new section begins.

\section{Introduction}

\begin{theorem}
This is Theorem 1.1.
\end{theorem}

\begin{corollary}
This is Corollary 1.2.
\end{corollary}

\begin{remark*}
This is a remark. It is unnumbered and it does not affect the numberings of other environments.
\end{remark*}


\begin{definition}
This is Definition 1.3.
\end{definition}



\section{Another Section}

\begin{theorem}
This is Theorem 2.1.
\end{theorem}

\begin{corollary}
This is Corollary 2.2.
\end{corollary}

\begin{definition}
This is Definition 2.3.
\end{definition}

\end{document}

divide_latex_text(text, dir)

[['1',
  'This document resets its `theorem` counter whenever a new section begins.'],
 ['1. Introduction', '\\section{Introduction}\n\n'],
 ['Theorem 1.1.', '\\begin{theorem}\nThis is Theorem 1.1.\n\\end{theorem}'],
 ['Corollary 1.2.',
  '\\begin{corollary}\nThis is Corollary 1.2.\n\\end{corollary}'],
 ['Remark',
  '\\begin{remark*}\nThis is a remark. It is unnumbered and it does not affect the numberings of other environments.\n\\end{remark*}'],
 ['Definition 1.3.',
  '\\begin{definition}\nThis is Definition 1.3.\n\\end{definition}'],
 ['2. Another Section', '\\section{Another Section}\n\n'],
 ['Theorem 2.1.', '\\begin{theorem}\nThis is Theorem 2.1.\n\\end{theorem}'],
 ['Corollary 2.2.',
  '\\begin{corollary}\nThis is Corollary 2.2.\n\\end{corollary}'],
 ['Definition 2.3.',
  '\\begin{definition}\nThis is Definition 2.3.\n\\end{definition}']]
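
The effect of `\numberwithin{theorem}{section}` can be sketched as a counter that is reset whenever a section starts and whose value is prefixed by the section number. The function below is an illustrative model only: `number_within_sections` and the `events` encoding are assumptions, and unnumbered environments are left out for brevity.

```python
def number_within_sections(events):
    """events: ('section', title) or ('env', display name) in document order."""
    section, counter, titles = 0, 0, []
    for kind, name in events:
        if kind == 'section':
            section += 1
            counter = 0  # the shared theorem counter restarts in each section
            titles.append(f'{section}. {name}')
        else:
            counter += 1
            titles.append(f'{name} {section}.{counter}.')
    return titles


events = [
    ('section', 'Introduction'), ('env', 'Theorem'), ('env', 'Corollary'),
    ('section', 'Another Section'), ('env', 'Theorem'),
]
titles = number_within_sections(events)
```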

In this example, the various theorem-like environments share a counter with equation environments and this counter is reset at the start of each new section.

dir = _test_directory() / 'latex_examples' / 'numbering_example_3_theorem_like_environments_share_counter_with_equation_and_reset_at_each_section' 
file = dir / 'main.tex'
text = text_from_file(file)
print(text)

\documentclass{amsart}
\usepackage[utf8]{inputenc}
\usepackage{amsmath, amsfonts, amssymb, amsthm, amsopn}

\numberwithin{equation}{section}

\theoremstyle{plain}
\newtheorem*{theorem*}{Theorem}
\newtheorem*{theoremA}{Theorem A}
\newtheorem*{theoremB}{Theorem B}
\newtheorem{theorem}[equation]{Theorem}
\newtheorem{proposition}[equation]{Proposition}
\newtheorem{lemma}[equation]{Lemma}
\newtheorem{corollary}[equation]{Corollary}

\theoremstyle{definition}
\newtheorem{definition}[equation]{Definition}
\newtheorem{example}[equation]{Example}
\newtheorem*{acknowledgements}{Acknowledgements}
\newtheorem*{conventions}{Conventions}

\theoremstyle{remark}
\newtheorem{remark}[equation]{Remark}

\begin{document}

\section{Introduction}

\begin{theorem}
This is Theorem 1.1. This is because the \verb|\numberwithin{equation}{section}| makes the section number included in the equation counter and because the \\
\verb|\newtheorem{theorem}[equation]{Theorem}| command makes the environment \verb|theorem| be counted by the equation counter.
\end{theorem}

The following makes an equation labeled 1.2; 
\begin{equation}
5 + 7 = 12
\end{equation}

\begin{theorem*}
This Theorem is unnumbered
\end{theorem*}

\begin{corollary}
This is Corollary 1.3.
\end{corollary}

\section{Another section}
\begin{theorem}
This is theorem 2.1
\end{theorem}

The following is labeled 2.2:
\begin{equation}
3+5 = 8.
\end{equation}

\end{document}

dir = _test_directory() / 'latex_examples' / 'numbering_example_3_theorem_like_environments_share_counter_with_equation_and_reset_at_each_section'
file = dir / 'main.tex'
text = text_from_file(file)
divide_latex_text(text, dir)

[['1. Introduction', '\\section{Introduction}\n\n'],
 ['Theorem 1.1.',
  '\\begin{theorem}\nThis is Theorem 1.1. This is because the \\verb|\\numberwithin{equation}{section}| makes the section number included in the equation counter and because the \\\\\n\\verb|\\newtheorem{theorem}[equation]{Theorem}| command makes the environment \\verb|theorem| be counted by the equation counter.\n\\end{theorem}'],
 ['1',
  'The following makes an equation labeled 1.2; \n\\begin{equation}\n5 + 7 = 12\n\\end{equation}'],
 ['Theorem', '\\begin{theorem*}\nThis Theorem is unnumbered\n\\end{theorem*}'],
 ['Corollary 1.3.',
  '\\begin{corollary}\nThis is Corollary 1.3.\n\\end{corollary}'],
 ['2. Another section', '\\section{Another section}\n'],
 ['Theorem 2.1.', '\\begin{theorem}\nThis is theorem 2.1\n\\end{theorem}'],
 ['2',
  'The following is labeled 2.2:\n\\begin{equation}\n3+5 = 8.\n\\end{equation}']]

dir = _test_directory() / 'latex_examples' / 'numbering_example_4_unnumbered_section'
file = dir / 'main.tex'
text = text_from_file(file)
print(divide_latex_text(text, dir))

[['1. This is section 1', '\\section{This is section 1}\n\n'], ['Theorem 1.1.', '\\begin{theorem}\nThis is Theorem 1.1.\n\\end{theorem}'], ['1.1. This is a subsection 1.1', '\\subsection{This is a subsection 1.1}\n\nThe following makes an equation labeled 1; \n\\begin{equation}\n5 + 7 = 12\n\\end{equation}\n\n'], ['Theorem', '\\begin{theorem*}\nThis Theorem is unnumbered\n\\end{theorem*}'], ['1.2. This is subsection 1.2', '\\subsection{This is subsection 1.2}\n\n'], ['Corollary 1.2.', '\\begin{corollary}\nThis is Corollary 1.2.\n\\end{corollary}'], ['1. Unnumbered section', '\\section*{Unnumbered section}\n\n'], ['1.1. This is subsection 1.3', '\\subsection{This is subsection 1.3}\n'], ['1.1.1. This is subsubsection 1.3.1', '\\subsubsection{This is subsubsection 1.3.1}\n\n'], ['Theorem 1.1.', '\\begin{theorem}\nThis is Theorem 1.3.\n\\end{theorem}'], ['2. Another section', '\\section{Another section}\n\n'], ['2.1. This is subsection 2.1', '\\subsection{This is subsection 2.1}\n\n'], ['Theorem 2.1.', '\\begin{theorem}\nThis is Theorem 2.1\n\\end{theorem}'], ['1', 'The following is labeled 2; note that the equation counter was not reset by section:\n\\begin{equation}\n3+5 = 8.\n\\end{equation}']]

dir = _test_directory() / 'latex_examples' / 'numbering_example_5_subsections_and_theorem_like_environments_share_counter'
file = dir / 'main.tex'
text = text_from_file(file)
sample_output = divide_latex_text(text, dir)
print(sample_output)
test_eq(sample_output[4][0], '1. Remark.')
test_eq(sample_output[5][0], 'Remark')

[['1. This is section 1', '\\section{This is section 1}\n\n'], ['1.1. Theorem.', '\\begin{thm}\nThis is 1.1. Theorem. Note that the \\verb|\\swapnumbers| command is invoked in the preamble.\n\\end{thm}'], ['1.2. This is 1.2. subsection.', '\\subsection{This is 1.2. subsection.}\n\nNote that the equation counter is numbered within the subsection counter and that the theorem-like environments are numbered with the equation counter.\n\n'], ['1.2.1. This is 1.2.1. Subsubsection', '\\subsubsection{This is 1.2.1. Subsubsection}\n\n'], ['1. Remark.', '\\begin{remark}\nThis is an 1. Remark. Note that \\verb|\\remark| has a counter separate from those of many of the other theorem-like environments.\n\\end{remark}'], ['Remark', '\\begin{rem*}\nThis is an unnumbered Remark.\n\\end{rem*}'], ['1.3. Proposition.', '\\begin{prop}\nThis is 1.3. Proposition.\n\\end{prop}'], ['1. Unnumbered section', '\\section*{Unnumbered section}\n\n'], ['1.1. Theorem.', '\\begin{thm}\nThis is 1.4. Theorem.\n\\end{thm}'], ['2. This is Section 2', '\\section{This is Section 2}\n\n'], ['2.1. Theorem.', '\\begin{thm}\nThis is 2.1. Theorem\n\\end{thm}']]

In the below example, the theorem counter is specified to reset at every new section and the corollary counter is specified to reset at every new theorem.

In particular, note that there is a Theorem 1.2 and a subsequent Corollary 1.2.1 in the example:

dir = _test_directory() / 'latex_examples' / 'numbering_example_7_newtheorem_command_restarts_counter_by_section'
file = dir / 'main.tex'
text = text_from_file(file) 
print(text)
sample_output = divide_latex_text(text, dir)
print(sample_output)
test_eq(sample_output[4][0], 'Corollary 1.2.1.')


% Based on an example from https://www.overleaf.com/learn/latex/Theorems_and_proofs#Numbered_theorems.2C_definitions.2C_corollaries_and_lemmas

\documentclass[12 pt]{amsart}

\newtheorem{theorem}{Theorem}[section]
\newtheorem{corollary}{Corollary}[theorem]
\newtheorem{lemma}[theorem]{Lemma}
% Note that the below invocation of \newtheorem is invalid:
% \newtheorem{proposition}[theorem]{Proposition}[section]
\newtheorem{proposition}{Proposition}[section]

\begin{document}
\section{Introduction}
Theorems can easily be defined:

\begin{theorem}
Let \(f\) be a function whose derivative exists in every point, then \(f\) is 
a continuous function.
\end{theorem}

\begin{theorem}[Pythagorean theorem]
\label{pythagorean}
This is a theorem about right triangles and can be summarised in the next 
equation 
\[ x^2 + y^2 = z^2 \]
\end{theorem}

And a consequence of theorem \ref{pythagorean} is the statement in the next 
corollary.

\begin{corollary}
There's no right rectangle whose sides measure 3cm, 4cm, and 6cm.
\end{corollary}

You can reference theorems such as \ref{pythagorean} when a label is assigned.

\begin{lemma}
Given two line segments whose lengths are \(a\) and \(b\) respectively there is a 
real number \(r\) such that \(b=ra\).
\end{lemma}

\end{document}


[['1. Introduction', '\\section{Introduction}\nTheorems can easily be defined:\n\n'], ['Theorem 1.1.', '\\begin{theorem}\nLet \\(f\\) be a function whose derivative exists in every point, then \\(f\\) is \na continuous function.\n\\end{theorem}'], ['Theorem 1.2.', '\\begin{theorem}[Pythagorean theorem]\n\\label{pythagorean}\nThis is a theorem about right triangles and can be summarised in the next \nequation \n\\[ x^2 + y^2 = z^2 \\]\n\\end{theorem}'], ['1', 'And a consequence of theorem \\ref{pythagorean} is the statement in the next \ncorollary.'], ['Corollary 1.2.1.', "\\begin{corollary}\nThere's no right rectangle whose sides measure 3cm, 4cm, and 6cm.\n\\end{corollary}"], ['2', 'You can reference theorems such as \\ref{pythagorean} when a label is assigned.'], ['Lemma 1.3.', '\\begin{lemma}\nGiven two line segments whose lengths are \\(a\\) and \\(b\\) respectively there is a \nreal number \\(r\\) such that \\(b=ra\\).\n\\end{lemma}']]

Note that part titles are stripped and are single-lined:

# TODO: fill in the following example
# part = parts[...]
# assert part[0].strip() == part[0]
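
Until that example is filled in, the property can be illustrated with a small standalone sketch. `single_line_title` is a hypothetical helper that shows one way to obtain a stripped, single-lined title; it is not claimed to be what `divide_latex_text` does internally, and the input string is made up.

```python
def single_line_title(title: str) -> str:
    """Hypothetical helper: collapse all whitespace runs and strip the ends."""
    return ' '.join(title.split())


title = single_line_title('  1. Exceptional maximal\n   subgroups \n')
```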

In the following example, the subsubsections and the theorem-like environments share a counter:

dir = _test_directory() / 'latex_examples' / 'numbering_example_8_subsubsections_and_theorems_share_counter'
file = dir / 'main.tex'
text = text_from_file(file) 
print(text)
sample_output = divide_latex_text(text, dir)
print(sample_output)
test_eq(sample_output[-1][0], 'Theorem 1.1.2.')
test_eq(sample_output[-2][0], '1.1.1. section 1.1.1')


% Based on an example from https://www.overleaf.com/learn/latex/Theorems_and_proofs#Numbered_theorems.2C_definitions.2C_corollaries_and_lemmas

\documentclass[12 pt]{amsart}

\newtheorem{cor}[subsubsection]{Corollary}
\newtheorem{lem}[subsubsection]{Lemma}
\newtheorem{prop}[subsubsection]{Proposition}
\newtheorem{propconstr}[subsubsection]{Proposition-Construction}
\newtheorem{lemconstr}[subsubsection]{Lemma-Construction}
\newtheorem{ax}[subsubsection]{Axiom}
\newtheorem{conj}[subsubsection]{Conjecture}
\newtheorem{thm}[subsubsection]{Theorem}
\newtheorem{qthm}[subsubsection]{Quasi-Theorem}
\newtheorem{qlem}[subsubsection]{Quasi-Lemma}
\newtheorem{defn}[subsubsection]{Definition}
\newtheorem{quest}[subsubsection]{Question}
\newtheorem{claim}[subsubsection]{Claim}

\begin{document}
\section{Introduction}

\subsection{section 1.1}

\subsubsection{section 1.1.1}

\begin{thm}
This is theorem 1.1.2
\end{thm}

\end{document}


[['1. Introduction', '\\section{Introduction}\n\n'], ['1.1. section 1.1', '\\subsection{section 1.1}\n\n'], ['1.1.1. section 1.1.1', '\\subsubsection{section 1.1.1}\n\n'], ['Theorem 1.1.2.', '\\begin{thm}\nThis is theorem 1.1.2\n\\end{thm}']]

In the below example, some text immediately follows the subsubsection command; the part for the subsubsection heading also contains this text:

dir = _test_directory() / 'latex_examples' / 'divide_latex_example_text_after_subsubsection'
file = dir / 'main.tex'
sample_latex_text = text_from_file(file)
print(sample_latex_text)
preamble, document = divide_preamble(sample_latex_text)
parts = divide_latex_text(sample_latex_text, dir)
print(parts)
test_eq(parts[2], ['1.1.1. section 1.1.1', '\\subsubsection{section 1.1.1}\nSome text beneath subsubsection\n'])


% Based on an example from https://www.overleaf.com/learn/latex/Theorems_and_proofs#Numbered_theorems.2C_definitions.2C_corollaries_and_lemmas

\documentclass[12 pt]{amsart}

\newtheorem{cor}[subsubsection]{Corollary}
\newtheorem{lem}[subsubsection]{Lemma}
\newtheorem{prop}[subsubsection]{Proposition}
\newtheorem{propconstr}[subsubsection]{Proposition-Construction}
\newtheorem{lemconstr}[subsubsection]{Lemma-Construction}
\newtheorem{ax}[subsubsection]{Axiom}
\newtheorem{conj}[subsubsection]{Conjecture}
\newtheorem{thm}[subsubsection]{Theorem}
\newtheorem{qthm}[subsubsection]{Quasi-Theorem}
\newtheorem{qlem}[subsubsection]{Quasi-Lemma}
\newtheorem{defn}[subsubsection]{Definition}
\newtheorem{quest}[subsubsection]{Question}
\newtheorem{claim}[subsubsection]{Claim}

\begin{document}
\section{Introduction}

\subsection{section 1.1}

\subsubsection{section 1.1.1}
Some text beneath subsubsection
\begin{thm}
This is theorem 1.1.2
\end{thm}

\end{document}


[['1. Introduction', '\\section{Introduction}\n\n'], ['1.1. section 1.1', '\\subsection{section 1.1}\n\n'], ['1.1.1. section 1.1.1', '\\subsubsection{section 1.1.1}\nSome text beneath subsubsection\n'], ['Theorem 1.1.2.', '\\begin{thm}\nThis is theorem 1.1.2\n\\end{thm}']]

In the below example, theorem-like environments and equation environments share a counter and there is an equation within a theorem:

dir = _test_directory() / 'latex_examples' / 'divide_latex_example_theorems_and_equations_share_counter_and_equation_in_theorem'
file = dir / 'main.tex'
sample_latex_text = text_from_file(file)
print(sample_latex_text)
preamble, document = divide_preamble(sample_latex_text)
parts = divide_latex_text(sample_latex_text, dir)
print(parts)
test_eq(parts[2], ['Corollary 1.3.', '\\begin{cor}\nThis is Corollary 1.3\n\\end{cor}'])


\documentclass[12pt]{amsart}
\usepackage{amsmath}
\usepackage{amsfonts}


\numberwithin{equation}{section}
\numberwithin{figure}{section}

\newtheorem{lemma}[equation]{Lemma}
\newtheorem{theorem}[equation]{Theorem}
\newtheorem{conjecture}[equation]{Conjecture}
\newtheorem{cor}[equation]{Corollary}
\newtheorem{prop}[equation]{Proposition}

\begin{document}

\section{Introduction}

\begin{theorem}
\begin{equation}
asdf
\end{equation}
\end{theorem}

\begin{cor}
This is Corollary 1.3
\end{cor}

\end{document}
[['1. Introduction', '\\section{Introduction}\n\n'], ['Theorem 1.1.', '\\begin{theorem}\n\\begin{equation}\nasdf\n\\end{equation}\n\\end{theorem}'], ['Corollary 1.3.', '\\begin{cor}\nThis is Corollary 1.3\n\\end{cor}']]
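
The corollary is numbered 1.3 rather than 1.2 because the equation inside the theorem also advances the shared counter. The sketch below models this by scanning `\begin{...}` commands in document order. It is a simplified illustration for a single section: `titles_with_shared_counter` is a hypothetical name, and only the `equation` environment is treated as an equation-like environment.

```python
import re


def titles_with_shared_counter(document, display_names, section=1):
    """Hypothetical sketch: theorem-like environments and `equation` share a
    counter that is numbered within the given section."""
    counter, titles = 0, []
    for match in re.finditer(r'\\begin\{([^}]+)\}', document):
        env = match.group(1)
        if env == 'equation' or env in display_names:
            counter += 1  # equations consume a number too
        if env in display_names:
            titles.append(f'{display_names[env]} {section}.{counter}.')
    return titles


document = r'''
\begin{theorem}
\begin{equation}
asdf
\end{equation}
\end{theorem}

\begin{cor}
This is Corollary 1.3
\end{cor}
'''
titles = titles_with_shared_counter(
    document, {'theorem': 'Theorem', 'cor': 'Corollary'})
```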

In the below example, theorem-like environments and equation environments share a counter and there is an equation within the proof of a proposition:

dir = _test_directory() / 'latex_examples' / 'divide_latex_example_equation_in_proof'
file = dir / 'main.tex'
sample_latex_text = text_from_file(file)
print(sample_latex_text)
preamble, document = divide_preamble(sample_latex_text)
parts = divide_latex_text(sample_latex_text, dir)
print(parts)
test_eq(parts[2], ['Corollary 1.3.', '\\begin{cor}\nThis is Corollary 1.3\n\\end{cor}'])


\documentclass[12pt]{amsart}
\usepackage{amsmath}
\usepackage{amsfonts}


\numberwithin{equation}{section}
\numberwithin{figure}{section}

\newtheorem{lemma}[equation]{Lemma}
\newtheorem{theorem}[equation]{Theorem}
\newtheorem{conjecture}[equation]{Conjecture}
\newtheorem{cor}[equation]{Corollary}
\newtheorem{prop}[equation]{Proposition}

\begin{document}

\section{Introduction}

\begin{prop}
This is Proposition 1.1
\end{prop}
\begin{proof}
\begin{equation}
\end{equation}
\end{proof}

\begin{cor}
This is Corollary 1.3
\end{cor}

\end{document}
[['1. Introduction', '\\section{Introduction}\n\n'], ['Proposition 1.1.', '\\begin{prop}\nThis is Proposition 1.1\n\\end{prop}\\begin{proof}\n\\begin{equation}\n\\end{equation}\n\\end{proof}'], ['Corollary 1.3.', '\\begin{cor}\nThis is Corollary 1.3\n\\end{cor}']]

In the below example, theorem-like environments and equation environments share a counter and there is an eqnarray after the start of a section:

dir = _test_directory() / 'latex_examples' / 'divide_latex_example_eqnarray_after_start_of_section'
file = dir / 'main.tex'
sample_latex_text = text_from_file(file)
print(sample_latex_text)
preamble, document = divide_preamble(sample_latex_text)
parts = divide_latex_text(sample_latex_text, dir)
print(parts)
test_eq(parts[1], ['Proposition 1.2.', '\\begin{prop}\nThis is Proposition 1.2\n\\end{prop}'])


\documentclass[12pt]{amsart}
\usepackage{amsmath}
\usepackage{amsfonts}


\numberwithin{equation}{section}
\numberwithin{figure}{section}

\newtheorem{lemma}[equation]{Lemma}
\newtheorem{theorem}[equation]{Theorem}
\newtheorem{conjecture}[equation]{Conjecture}
\newtheorem{cor}[equation]{Corollary}
\newtheorem{prop}[equation]{Proposition}

\begin{document}

\section{Introduction}
lalalala some stuff $5$

Hello I am saying stuff
\begin{eqnarray}
\end{eqnarray}
\begin{prop}
This is Proposition 1.2
\end{prop}

\end{document}
[['1. Introduction', '\\section{Introduction}\nlalalala some stuff $5$\n\nHello I am saying stuff\n\\begin{eqnarray}\n\\end{eqnarray}\n'], ['Proposition 1.2.', '\\begin{prop}\nThis is Proposition 1.2\n\\end{prop}']]

In the below example, there are many custom commands defined using the \def command. Note the use of \u in the LaTeX file, which causes problems for pylatexenc.

The divide_latex_text function had difficulties parsing the LaTeX document for the below example, specifically because this \u command made pylatexenc unable to find the \section following the \u. Now that divide_latex_text provides the option to replace custom commands in the LaTeX document (with their underlying “meaning/definitions” via the replace_commands_in_latex_document function) before parsing the LaTeX document, this is no longer a problem.

dir = _test_directory() / 'latex_examples' / 'divide_latex_example_unknown_section_division_problem'
file = dir / 'main.tex'
sample_latex_text = text_from_file(file)
# print(sample_latex_text)
preamble, document = divide_preamble(sample_latex_text)
parts = divide_latex_text(sample_latex_text, dir)
# test_eq(parts[1], ['Proposition 1.2.', '\\begin{prop}\nThis is Proposition 1.2\n\\end{prop}'])
test_eq(len(parts), 3)
print(parts)

[['1. Background and Notation', '\\section{Background and Notation}\n\n'], ['1.1. Unitary groups', '\\subsection{Unitary groups}\n\\label{subsecunitary}\n\nwhere $\\bm{{\\rm R}}_{{\\mathcal O}_E/{\\mathbb Z}}$ is the restriction of scalars functor.\nThen $SU$ is the derived group of $GU$ and of ${\\rm U}$,\n\n'], ['2. Mumford-Tate groups and endomorphism rings', '\\section{Mumford-Tate groups and endomorphism rings}\n\n\\label{secmt}\n\nCarlson and Toledo have \n\n\\bibliographystyle{hamsplain}\n\\bibliography{jda}\n\n']]
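
To give a feel for what replacing custom commands means, here is a toy sketch that expands argument-free `\def` macros with a regular expression. It is a deliberately limited assumption-laden illustration: `replace_simple_defs` is hypothetical, it handles only definitions whose body contains no braces, and it is not the implementation of `replace_commands_in_latex_document`.

```python
import re


def replace_simple_defs(preamble: str, document: str) -> str:
    """Expand macros of the form \\def\\name{body} (no arguments, no braces in body)."""
    definitions = re.findall(r'\\def\\([A-Za-z]+)\{([^{}]*)\}', preamble)
    for name, body in definitions:
        # Match \name only when it is not the prefix of a longer command name.
        pattern = r'\\' + name + r'(?![A-Za-z])'
        # A function replacement avoids backslash escapes being interpreted.
        document = re.sub(pattern, lambda _m, body=body: body, document)
    return document


preamble = r'\def\ZZ{\mathbb Z}'
document = r'$\ZZ$ and $\ZZ_p$, but not $\ZZZ$'
replaced = replace_simple_defs(preamble, document)
```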

# TODO: example with a multilined section title forced to single-lined:
# e.g. `\section{Exceptional maximal subgroups of 
# \texorpdfstring{\(\GSp_4(\ff_\ell)\)}{GSp4Fell}}`

# TODO: Find a list of environment names commonly used.

# TODO: examples with different numbering convention and different numbered environments

# Here are some latex files with different conventions:
# - Different environment types have different counts and the counts do not show the section number.
#   - vankataramana_imbrd https://arxiv.org/abs/1205.6543: 
#       - e.g. section 1 has Theorem 1, Remark 1, Remark 2, Remark 3, subsection 1.1.3 has Remark 4, Subsection 2.2 has Definition 1
