import glob
import shutil
import tempfile
from fastcore.test import ExceptionExpected, test_eq
from pathvalidate import validate_filename
from trouver.helper import _test_directory  # , non_utf8_chars_in_file
latex.convert
This module contains functions and methods to automatically make Obsidian notes from LaTeX files of mathematical papers, most notably those on arXiv.
See the Potential Problems section below for some common errors that arise from this module and how to circumvent them.
Potential problems
The following are some frequent problems that arise when using this module:
UnicodeDecodeErrors arise when reading LaTeX files
By default, the text_from_file function in trouver.helper reads files and attempts to decode them as utf-8. If a LaTeX file has characters that cannot be decoded as utf-8, then a UnicodeDecodeError may be raised. In this case, one can identify these characters using the trouver.helper.non_utf8_chars_in_file function and modify the LaTeX file manually. It may be useful to use a text editor to jump to the positions of these characters and to change the encoding of the LaTeX file to utf-8; for example, the author of trouver has opened some ANSI-encoded LaTeX documents in Notepad++ and converted their encoding to UTF-8.
NoDocumentNodeErrors arise even though the LaTeX file has a document environment (i.e. \begin{document}...\end{document})
The find_document_node function in this module is sometimes unable to detect the document environment of a LaTeX file. This error is known to arise when there are macros (including commands) defined to expand to text containing \begin{...}...\end{...}. For example, the test file example_with_a_command_with_begin.tex used in the Get the Document Node section below triggers this error; that section also shows a workaround using divide_preamble.
# TODO in the above explanation, include an example.
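To check whether a preamble contains macros of this kind, one can scan it for macro definitions whose replacement text mentions \begin or \end. The following is a minimal sketch of such a check, not part of trouver: the helper name suspicious_macro_lines is hypothetical, and since it looks at one line at a time, definitions split across several lines may only be partially detected.

```python
import re

# Hypothetical helper (not part of trouver). Recognizes \newcommand,
# \renewcommand (with optional *), \def, and \DeclareRobustCommand.
MACRO_DEFINITION = re.compile(r'\\(?:re)?newcommand\*?|\\def\b|\\DeclareRobustCommand\*?')
ENVIRONMENT_DELIMITER = re.compile(r'\\(?:begin|end)\{')
COMMENT = re.compile(r'(?<!\\)%.*')


def suspicious_macro_lines(preamble: str) -> list[str]:
    """Return the lines of `preamble` that define a macro mentioning \\begin{ or \\end{."""
    flagged = []
    for line in preamble.splitlines():
        code = COMMENT.sub('', line)  # ignore the commented-out part of the line
        if MACRO_DEFINITION.search(code) and ENVIRONMENT_DELIMITER.search(code):
            flagged.append(line.strip())
    return flagged


preamble = r"""
\newcommand{\beq}{\begin{equation}}
\newcommand{\R}{\mathbb{R}}
% \newcommand{\bal}{\begin{align}}
"""
print(suspicious_macro_lines(preamble))
```

Only the \beq definition is flagged; the commented-out \bal definition is ignored.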
LaTeX comments
remove_comments
remove_comments (text:str)
text = r"""% Commands with parameters
\newcommand{\field}[1]{\mathbb{#1}}
\newcommand{\mat}[4]{\left[\begin{array}{cc}#1 & #2 \\
#3 & #4\end{array}\right]}
\newcommand{\dual}[1]{#1^{\vee}}
\newcommand{\compl}[1]{\hat{#1}}
"""
assert '%' not in remove_comments(text)
print(remove_comments(text))
text = r"""Hi. I'm not commented. %But I am!"""
test_eq(remove_comments(text), "Hi. I'm not commented. ")
\newcommand{\field}[1]{\mathbb{#1}}
\newcommand{\mat}[4]{\left[\begin{array}{cc}#1 & #2 \\
#3 & #4\end{array}\right]}
\newcommand{\dual}[1]{#1^{\vee}}
\newcommand{\compl}[1]{\hat{#1}}
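A comment in LaTeX starts at an unescaped % and runs to the end of the line. The following minimal sketch strips such comments with a regular expression; remove_comments_sketch is a hypothetical name, this is not trouver's implementation, and its whitespace handling may differ from that of remove_comments.

```python
import re

# A '%' not preceded by a backslash starts a comment that runs to the end of the
# line. Simplification: '\\%' (a line break followed by a comment) is mistaken
# for an escaped percent sign, so that comment is kept.
COMMENT = re.compile(r'(?<!\\)%.*')


def remove_comments_sketch(text: str) -> str:
    """Delete LaTeX comments; the newline ending each commented line is kept."""
    return COMMENT.sub('', text)


print(remove_comments_sketch(r"50\% of this line stays. % This part does not."))
```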
Divide LaTeX file into parts
To make Obsidian notes from a LaTeX file, I use sections/subsections and environments as places to make new notes.
Things to think about:
- sections/subsections
- environments, including theorems, corollaries, propositions, lemmas, definitions, and notations
- citations
- macros defined in the preamble?
LatexMacroNodes include sections/subsections, citations, references, and labels, e.g.
\section{Introduction}
> \cite{ellenberg2nilpotent}
\subsection{The section conjecture}
\'e
\ref{fundamental-exact-sequence}
\cite{stix2010period}
\ref{fundamental-exact-sequence}
\cite{stix2012rational}
\cite[Appendix C]{stix2010period}
\subsection{The tropical section conjecture}
\label{subsec:tropical-section-conjecture}
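pylatexenc represents these macros as LatexMacroNode objects. For a rough picture of what they capture without depending on pylatexenc, here is a regex-based sketch; structural_macros is a hypothetical helper (not part of trouver or pylatexenc), and it does not handle nested braces inside arguments.

```python
import re
from typing import Optional

# Hypothetical sketch: find \section, \subsection, \cite, \ref, and \label
# invocations (a starred form is allowed), capturing an optional [...]
# argument and the first {...} argument. Nested braces are not supported.
STRUCTURAL_MACRO = re.compile(
    r'\\(section|subsection|cite|ref|label)\*?'
    r'(?:\[([^\]]*)\])?'
    r'\{([^{}]*)\}'
)


def structural_macros(text: str) -> list[tuple[str, Optional[str], str]]:
    """Return (macro name, optional argument or None, argument) triples."""
    return [m.groups() for m in STRUCTURAL_MACRO.finditer(text)]


sample = r"""\subsection{The section conjecture}
See \cite[Appendix C]{stix2010period} and \ref{fundamental-exact-sequence}.
\label{subsec:tropical-section-conjecture}"""
for name, optional, argument in structural_macros(sample):
    print(name, optional, argument)
```

Unlike a parser, the regex does not know about macros outside its list, so, for example, \citep is simply skipped.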
Divide the preamble from the rest of the document
Some macros and commands defined in the preamble seem to prevent the pylatexenc methods from properly identifying the document environment/node in a LaTeX document. To circumvent this, we define a function to divide the preamble from the rest of the document:
divide_preamble
divide_preamble (text:str, document_environment_name:str='document')
Divide the preamble from the rest of a LaTeX document.
| | Type | Default | Details |
|---|---|---|---|
| text | str | | LaTeX document |
| document_environment_name | str | document | |
| Returns | tuple | | (preamble, document) |
latex_file_path = _test_directory() / 'latex_examples' / 'example_with_a_command_with_begin.tex'
text = text_from_file(latex_file_path)
preamble, document = divide_preamble(text)
assert r'\begin{displaymath}' in preamble
assert r'Hyun Jong Kim' in preamble
assert r'Hyun Jong Kim' not in document
assert document.startswith(r'\begin{document}')
assert document.endswith('\\end{document}')
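The split itself can be sketched with plain string search, which sidesteps the parsing problems entirely. This is a minimal sketch under stated assumptions, not trouver's implementation: divide_preamble_sketch is a hypothetical name, it raises a ValueError (an arbitrary choice) when no document environment is found, and it does not skip a commented-out \begin{document}.

```python
def divide_preamble_sketch(text: str, document_environment_name: str = 'document') -> tuple[str, str]:
    """Split `text` into (preamble, document environment) by plain string search.

    Hypothetical sketch: commented-out \\begin{document} lines are not ignored,
    so comments should be removed first if they might contain one.
    """
    begin = '\\begin{' + document_environment_name + '}'
    end = '\\end{' + document_environment_name + '}'
    start = text.find(begin)   # first \begin{document}
    stop = text.rfind(end)     # last \end{document}
    if start == -1 or stop < start:
        raise ValueError(f'No {document_environment_name} environment found.')
    return text[:start], text[start:stop + len(end)]


text = '\\documentclass{article}\n\\author{A. Author}\n\\begin{document}\nHello.\n\\end{document}\n'
preamble, document = divide_preamble_sketch(text)
```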
Get the Document Node
NoDocumentNodeError
NoDocumentNodeError (text)
Exception raised when a LatexEnvironmentNode corresponding to the document environment is expected in a LaTeX string, but no such node exists.
Attributes:
- text (str): the text in which the document environment is not found.
find_document_node
find_document_node (text:str, document_environment_name:str='document')
Find the LatexNode object for the main document in text.
Raises:
- NoDocumentNodeError: if the document environment node is not detected.
| | Type | Default | Details |
|---|---|---|---|
| text | str | | LaTeX str |
| document_environment_name | str | document | The name of the document environment. |
| Returns | LatexEnvironmentNode | | |
The main content of virtually all LaTeX math articles belongs to a document environment, which pylatexenc can often detect. The find_document_node function returns this LatexEnvironmentNode object:
latex_file_path = _test_directory() / 'latex_examples' / 'latex_example_1' / 'main.tex'
text = text_from_file(latex_file_path)
document_node = find_document_node(text)
If the LaTeX file has no document environment, then a NoDocumentNodeError is raised:
# This latex document has its `document` environment commented out.
latex_file_path = _test_directory() / 'latex_examples' / 'latex_example_2' / 'main.tex'
text = text_from_file(latex_file_path)
with ExceptionExpected(NoDocumentNodeError):
    document_node = find_document_node(text)
At the time of this writing, a NoDocumentNodeError may be raised even if the LaTeX file has a proper document environment:
latex_file_path = _test_directory() / 'latex_examples' / 'example_with_a_command_with_begin.tex'
text = text_from_file(latex_file_path)
# Perhaps in the future, pylatexenc will be able to find the document node for this file.
# When that time comes, delete this example.
with ExceptionExpected(NoDocumentNodeError):
    find_document_node(text)
The divide_preamble function can be used to circumvent this problem:
preamble, document = divide_preamble(text)
document_node = find_document_node(document)
test_eq(document_node.environmentname, 'document')
assert document_node.isNodeType(LatexEnvironmentNode)
Detect environment names used in a file
environment_names_used
environment_names_used (text:str)
Return the set of all environment names used in the main document of the latex code.
| | Type | Details |
|---|---|---|
| text | str | LaTeX document |
| Returns | set | The set of all environment names used in the main document. |
Writers often use different environment names. For example, writers often use theorem, thm, or theo for theorem environments and lemma or lem for lemma environments. The environment_names_used function returns the environment names actually used in the tex file.
In the example below, note that only the environments that are actually used are returned. For instance, the preamble of the document defines the theorem environments problem and lemma (among other things), but these are not actually used in the document itself.
latex_file_path = _test_directory() / 'latex_examples' / 'has_fully_written_out_environment_names.tex'
sample_text_1 = text_from_file(latex_file_path)
sample_output_1 = environment_names_used(sample_text_1)
test_eq({'corollary', 'proof', 'maincorollary', 'abstract', 'proposition'}, sample_output_1)
The document in the example below uses shorter names for theorem environments:
latex_file_path = _test_directory() / 'latex_examples' / 'has_shorter_environment_names.tex'
sample_text_2 = text_from_file(latex_file_path)
sample_output_2 = environment_names_used(sample_text_2)
test_eq({'conj', 'notation', 'corollary', 'defn'}, sample_output_2)
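Roughly speaking, this amounts to collecting the names in the \begin{...} commands that appear in the body of the document. The following is a minimal regex-based sketch of that idea; environment_names_used_sketch is a hypothetical name, not trouver's implementation, and it may disagree with environment_names_used on unusual input (for example, verbatim text that mentions \begin).

```python
import re

COMMENT = re.compile(r'(?<!\\)%.*')
BEGIN = re.compile(r'\\begin\{([^{}]+)\}')


def environment_names_used_sketch(text: str) -> set[str]:
    """Names of environments opened inside the document environment (hypothetical sketch)."""
    text = COMMENT.sub('', text)  # ignore commented-out environments
    start = text.find(r'\begin{document}')
    end = text.rfind(r'\end{document}')
    if start != -1 and end != -1:
        text = text[start + len(r'\begin{document}'):end]  # drop the preamble
    return set(BEGIN.findall(text))


sample = r"""\documentclass{article}
\newtheorem{lemma}{Lemma}
\begin{document}
\begin{theorem}A.\end{theorem}
\begin{proof}B.\end{proof}
% \begin{lemma}C.\end{lemma}
\end{document}"""
print(environment_names_used_sketch(sample))
```

As in the examples above, lemma is defined in the preamble but not reported, because it is never actually used.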
Identify the numbering convention of a LaTeX document
LaTeX documents have various numbering conventions. Here are some examples of papers on the arXiv and notes on their numbering schemes. Note that the source code of these articles is publicly available on the arXiv.
- Ellenberg, Venkatesh, and Westerland, Homological stability for Hurwitz spaces and the Cohen-Lenstra conjecture over function fields
  - The subsections and theorem-like environments of each section share a numbering scheme, e.g. section 1 has subsection 1.1 The Cohen-Lenstra heuristics, 1.2 Theorem, and subsection 1.3 Hurwitz spaces. This is accomplished by defining theorem-like environments using the subsection counter, e.g.
    \theoremstyle{plain}
    \newtheorem{thm}[subsection]{Theorem}
    \newtheorem{prop}[subsection]{Proposition}
    \newtheorem{cor}[subsection]{Corollary}
    \newtheorem{remark}{Remark}
    \newtheorem{conj}[subsection]{Conjecture}
    \newtheorem*{conj*}{Conjecture}
    defines the thm, prop, cor, and conj environments to be numbered using the subsection counter, the remark environment to be numbered by its own remark counter, and the conj* environment to be an unnumbered environment with the same display name (Conjecture) as the conj environment.
  - The \swapnumbers command is included in the preamble to change the way that theorems are numbered in the document, e.g. the article has 1.2 Theorem as opposed to Theorem 1.2.
  - The equations are numbered along the subsections; this is accomplished by the lines
    \numberwithin{equation}{subsection}
    \renewcommand{\theequation}{\thesubsection.\arabic{equation}}
    in the preamble.
- Hoyois, A quadratic refinement of the Grothendieck-Lefschetz-Verdier Trace Formula
  - The theorem-like environments are numbered Theorem 1.1, Theorem 1.3, Corollary 1.4, Theorem 1.5, etc.
  - The theorem-like environments that are numbered are assigned the equation counter. In particular, the equation environments share their numberings with the theorem-like environments. For example, section 1 has Equation (1.2).
  - This equation counter is reset at the beginning of each section and the section number is included in the numbering via \numberwithin{equation}{section}.
# TODO: consider different arxiv articles to see how they are numbered
numbered_newtheorems_counters_in_preamble
numbered_newtheorems_counters_in_preamble (document:str)
Return the dict specifying the numbered \newtheorem command invocations.
Assumes that
- invocations of the \newtheorem command are exclusively in the preamble of the LaTeX document,
- theorem-like environments are defined using the \newtheorem command, and
- no environments of the same name are defined twice.
This function does not take into account \numberwithin commands being used.
This function uses two separate regex patterns, one to detect the invocations of \newtheorem in which the optional parameter is the second parameter and one to detect those in which the optional parameter is the third parameter.
| | Type | Details |
|---|---|---|
| document | str | The LaTeX document |
| Returns | dict | The keys are the command names of the environments. The value of a key is a tuple (counter, reset_by_counter), where counter is the counter that the environment belongs to, which can be custom defined or predefined in LaTeX, and reset_by_counter is the counter whose incrementation resets counter, or None if there is no such counter. |
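The two cases that the regex patterns distinguish can be sketched as follows. This is a hypothetical reimplementation for illustration only (newtheorem_counters_sketch is not trouver's function): it expects just the preamble, skips starred (unnumbered) environments, and, like the function above, ignores \numberwithin.

```python
import re
from typing import Optional

COMMENT = re.compile(r'(?<!\\)%.*')
# \newtheorem{env}[shared_counter]{Display}: env is counted by shared_counter.
SHARED_COUNTER = re.compile(r'\\newtheorem\{([^{}]+)\}\[([^\]]+)\]\{[^{}]*\}')
# \newtheorem{env}{Display}[reset_by]: env gets its own counter, optionally
# reset whenever the counter reset_by is incremented.
OWN_COUNTER = re.compile(r'\\newtheorem\{([^{}]+)\}\{[^{}]*\}(?:\[([^\]]+)\])?')


def newtheorem_counters_sketch(preamble: str) -> dict[str, tuple[str, Optional[str]]]:
    """Map numbered environment names to (counter, reset_by_counter)."""
    preamble = COMMENT.sub('', preamble)
    counters = {}
    for env, counter in SHARED_COUNTER.findall(preamble):
        counters[env] = (counter, None)
    for match in OWN_COUNTER.finditer(preamble):
        env, reset_by = match.groups()  # reset_by is None without a trailing [...]
        counters[env] = (env, reset_by)
    return counters


preamble = r"""\newtheorem{theorem}{Theorem}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{definition}[theorem]{Definition}
\newtheorem{corollary}{Corollary}
\newtheorem{remark}{Remark}[theorem]
\newtheorem*{conjecture*}{Conjecture}
% \newtheorem{fakeenv}{Fake}"""
print(newtheorem_counters_sketch(preamble))
```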
The numbered_newtheorems_counters_in_preamble function parses the preamble of a LaTeX document for invocations of the \newtheorem command and returns the counter that each theorem-like environment belongs to.
text = text_from_file(_test_directory() / 'latex_examples' / 'newtheorem_example.tex')
print(text)
counters = numbered_newtheorems_counters_in_preamble(text)
test_eq(counters, {'theorem': ('theorem', None), 'lemma': ('theorem', None), 'definition': ('theorem', None), 'corollary': ('corollary', None), 'remark': ('remark', 'theorem')})
\documentclass{article}
\usepackage{amsthm}
\newtheorem{theorem}{Theorem}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{definition}[theorem]{Definition} % Note that `theorem`, `lemma`, and `definition` all have `theorem` as their counter.
\newtheorem{corollary}{Corollary} % Note that `corollary` has its own counter.
\newtheorem{remark}{Remark}[theorem] % `remark` has `theorem` as its counter
\newtheorem*{conjecture*}{Conjecture} % `conjecture*` has no counter
\begin{document}
\section{Introduction}
\begin{theorem}
This is Theorem 1.
\end{theorem}
\begin{lemma}
This is Lemma 2.
\end{lemma}
\begin{definition}
This is Definition 3.
\end{definition}
\end{document}
text = r"""
\theoremstyle{plain}
\newtheorem{thm}[subsection]{Theorem}
\newtheorem{prop}[subsection]{Proposition}
\newtheorem{cor}[subsection]{Corollary}
\newtheorem{remark}{Remark}
\newtheorem{conj}[subsection]{Conjecture}
\newtheorem*{conj*}{Conjecture}
\begin{document}
\end{document}
"""
counters = numbered_newtheorems_counters_in_preamble(text)
test_eq(
    counters,
    {'thm': ('subsection', None), 'prop': ('subsection', None), 'cor': ('subsection', None), 'remark': ('remark', None), 'conj': ('subsection', None)})
numbered_newtheorems_counters_in_preamble ignores commented-out text:
text = r"""
\theoremstyle{plain}
\newtheorem{thm}[subsection]{Theorem}
\newtheorem{prop}[subsection]{Proposition}
\newtheorem{cor}[subsection]{Corollary}
\newtheorem{conj}[subsection]{Conjecture}
\newtheorem*{conj*}{Conjecture} %\newtheorem{fakeenv}{This won't be picked up!}
\begin{document}
\end{document}
"""
counters = numbered_newtheorems_counters_in_preamble(text)
test_eq(
    counters,
    {'thm': ('subsection', None), 'prop': ('subsection', None), 'cor': ('subsection', None), 'conj': ('subsection', None)})
numbered_newtheorems_counters_in_preamble does not account for \numberwithin command invocations:
text = text_from_file(_test_directory() / 'latex_examples' / 'numbering_example_3_theorem_like_environments_share_counter_with_equation_and_reset_at_each_section' / 'main.tex')
print(text)
# So `numbered_newtheorems_counters_in_preamble` only considers the theorem-like
# environments as being counted by 'equation'.
# Note that the command `\numberwithin{equation}{section}` resets the equation counter
# every time the `section` counter is incremented.
test_eq(numbered_newtheorems_counters_in_preamble(text), {'theorem': ('equation', None), 'proposition': ('equation', None), 'lemma': ('equation', None), 'corollary': ('equation', None), 'definition': ('equation', None), 'example': ('equation', None), 'remark': ('equation', None)})
\documentclass{amsart}
\usepackage[utf8]{inputenc}
\usepackage{amsmath, amsfonts, amssymb, amsthm, amsopn}
\numberwithin{equation}{section}
\theoremstyle{plain}
\newtheorem*{theorem*}{Theorem}
\newtheorem*{theoremA}{Theorem A}
\newtheorem*{theoremB}{Theorem B}
\newtheorem{theorem}[equation]{Theorem}
\newtheorem{proposition}[equation]{Proposition}
\newtheorem{lemma}[equation]{Lemma}
\newtheorem{corollary}[equation]{Corollary}
\theoremstyle{definition}
\newtheorem{definition}[equation]{Definition}
\newtheorem{example}[equation]{Example}
\newtheorem*{acknowledgements}{Acknowledgements}
\newtheorem*{conventions}{Conventions}
\theoremstyle{remark}
\newtheorem{remark}[equation]{Remark}
\begin{document}
\section{Introduction}
\begin{theorem}
This is Theorem 1.1. This is because the \verb|\numberwithin{equation}{section}| makes the section number included in the equation counter and because the \\
\verb|\newtheorem{theorem}[equation]{Theorem}| command makes the environment \verb|theorem| be counted by the equation counter.
\end{theorem}
The following makes an equation labeled 1.2;
\begin{equation}
5 + 7 = 12
\end{equation}
\begin{theorem*}
This Theorem is unnumbered
\end{theorem*}
\begin{corollary}
This is Corollary 1.3.
\end{corollary}
\section{Another section}
\begin{theorem}
This is theorem 2.1
\end{theorem}
The following is labeled 2.2:
\begin{equation}
3+5 = 8.
\end{equation}
\end{document}
The \newtheorem command can also specify that the counter of the newly defined theorem-like environment is reset whenever another counter is incremented; for example, \newtheorem{theorem}{Theorem}[section] defines a new environment named theorem (with display text Theorem) whose counter is reset whenever the section counter is incremented.
# TODO: reimplement the numbered_newtheorems_counters_in_preamble function to
# account for this example.
text = text_from_file(_test_directory() / 'latex_examples' / 'numbering_example_7_newtheorem_command_restarts_counter_by_section' / 'main.tex')
print(text)
# `theorem` and `proposition` are reset by `section`, `corollary` is reset by
# `theorem`, and `lemma` shares the `theorem` counter.
test_eq(numbered_newtheorems_counters_in_preamble(text), {'lemma': ('theorem', None), 'theorem': ('theorem', 'section'), 'corollary': ('corollary', 'theorem'), 'proposition': ('proposition', 'section')})
% Based on an example from https://www.overleaf.com/learn/latex/Theorems_and_proofs#Numbered_theorems.2C_definitions.2C_corollaries_and_lemmas
\documentclass[12 pt]{amsart}
\newtheorem{theorem}{Theorem}[section]
\newtheorem{corollary}{Corollary}[theorem]
\newtheorem{lemma}[theorem]{Lemma}
% Note that the below invocation of \newtheorem is invalid:
% \newtheorem{proposition}[theorem]{Proposition}[section]
\newtheorem{proposition}{Proposition}[section]
\begin{document}
\section{Introduction}
Theorems can easily be defined:
\begin{theorem}
Let \(f\) be a function whose derivative exists in every point, then \(f\) is
a continuous function.
\end{theorem}
\begin{theorem}[Pythagorean theorem]
\label{pythagorean}
This is a theorem about right triangles and can be summarised in the next
equation
\[ x^2 + y^2 = z^2 \]
\end{theorem}
And a consequence of theorem \ref{pythagorean} is the statement in the next
corollary.
\begin{corollary}
There's no right rectangle whose sides measure 3cm, 4cm, and 6cm.
\end{corollary}
You can reference theorems such as \ref{pythagorean} when a label is assigned.
\begin{lemma}
Given two line segments whose lengths are \(a\) and \(b\) respectively there is a
real number \(r\) such that \(b=ra\).
\end{lemma}
\end{document}
numberwithins_in_preamble
numberwithins_in_preamble (document:str)
Return the dict describing the \numberwithin commands invoked in the preamble of document.
| | Type | Details |
|---|---|---|
| document | str | The LaTeX document |
| Returns | dict | The keys are the first arguments of \numberwithin invocations and the values are the second arguments of \numberwithin invocations. |
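A minimal sketch of this mapping with a single regular expression, assuming the same comment handling as the earlier sketches; numberwithins_sketch is a hypothetical name, not trouver's implementation. It also tolerates the optional [...] format argument that amsmath's \numberwithin accepts.

```python
import re

COMMENT = re.compile(r'(?<!\\)%.*')
# \numberwithin[format]{counter}{parent}; the [format] part is optional.
NUMBERWITHIN = re.compile(r'\\numberwithin(?:\[[^\]]*\])?\{([^{}]+)\}\{([^{}]+)\}')


def numberwithins_sketch(preamble: str) -> dict[str, str]:
    """Map the first argument of each \\numberwithin to its second argument."""
    return dict(NUMBERWITHIN.findall(COMMENT.sub('', preamble)))


print(numberwithins_sketch(r"""\numberwithin{equation}{section}
% \numberwithin{figure}{section}"""))
```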
text = text_from_file(_test_directory() / 'latex_examples' / 'numbering_example_3_theorem_like_environments_share_counter_with_equation_and_reset_at_each_section' / 'main.tex')
print(text)
test_eq(numberwithins_in_preamble(text), {'equation': 'section'})
\documentclass{amsart}
\usepackage[utf8]{inputenc}
\usepackage{amsmath, amsfonts, amssymb, amsthm, amsopn}
\numberwithin{equation}{section}
\theoremstyle{plain}
\newtheorem*{theorem*}{Theorem}
\newtheorem*{theoremA}{Theorem A}
\newtheorem*{theoremB}{Theorem B}
\newtheorem{theorem}[equation]{Theorem}
\newtheorem{proposition}[equation]{Proposition}
\newtheorem{lemma}[equation]{Lemma}
\newtheorem{corollary}[equation]{Corollary}
\theoremstyle{definition}
\newtheorem{definition}[equation]{Definition}
\newtheorem{example}[equation]{Example}
\newtheorem*{acknowledgements}{Acknowledgements}
\newtheorem*{conventions}{Conventions}
\theoremstyle{remark}
\newtheorem{remark}[equation]{Remark}
\begin{document}
\section{Introduction}
\begin{theorem}
This is Theorem 1.1. This is because the \verb|\numberwithin{equation}{section}| makes the section number included in the equation counter and because the \\
\verb|\newtheorem{theorem}[equation]{Theorem}| command makes the environment \verb|theorem| be counted by the equation counter.
\end{theorem}
The following makes an equation labeled 1.2;
\begin{equation}
5 + 7 = 12
\end{equation}
\begin{theorem*}
This Theorem is unnumbered
\end{theorem*}
\begin{corollary}
This is Corollary 1.3.
\end{corollary}
\section{Another section}
\begin{theorem}
This is theorem 2.1
\end{theorem}
The following is labeled 2.2:
\begin{equation}
3+5 = 8.
\end{equation}
\end{document}
Getting the display names of environments
The display name of a theorem-like environment is the text that LaTeX prints for it. For example, \newtheorem{theorem}{Theorem} defines a theorem-like environment called theorem whose display name is Theorem.
display_names_of_environments
display_names_of_environments (document:str)
Return the dict specifying the display names for each theorem-like environment.
This function uses two separate regex patterns, one to detect the invocations of \newtheorem in which the optional parameter is the second parameter and one to detect those in which the optional parameter is the third parameter.
Assumes that
- invocations of the \newtheorem command are exclusively in the preamble of the LaTeX document,
- theorem-like environments are defined using the \newtheorem command, and
- no environments of the same name are defined twice.
| | Type | Details |
|---|---|---|
| document | str | The LaTeX document |
| Returns | dict | |
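The same information can be extracted with a single pattern whose optional [...] group covers the case where the counter is the second argument. This is a hypothetical sketch (display_names_sketch is not trouver's function, which uses two patterns as described above); unlike the counter sketch, it also picks up starred environments, matching the conjecture* entry in the example below.

```python
import re

COMMENT = re.compile(r'(?<!\\)%.*')
# One pattern for \newtheorem{env}{Display}, \newtheorem*{env}{Display}, and
# \newtheorem{env}[counter]{Display}; a trailing [reset_by] is simply not matched.
NEWTHEOREM = re.compile(r'\\newtheorem\*?\{([^{}]+)\}(?:\[[^\]]*\])?\{([^{}]*)\}')


def display_names_sketch(preamble: str) -> dict[str, str]:
    """Map each theorem-like environment name to its display name."""
    return dict(NEWTHEOREM.findall(COMMENT.sub('', preamble)))


preamble = r"""\newtheorem{theorem}{Theorem}[section]
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem*{conjecture*}{Conjecture}
% \newtheorem{proposition}[theorem]{Proposition}[section]
\newtheorem{proposition}{Proposition}[section]"""
print(display_names_sketch(preamble))
```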
text = text_from_file(_test_directory() / 'latex_examples' / 'newtheorem_example.tex')
print(text)
display_names = display_names_of_environments(text)
test_eq(display_names, {'theorem': 'Theorem',
                        'lemma': 'Lemma',
                        'definition': 'Definition',
                        'corollary': 'Corollary',
                        'conjecture*': 'Conjecture',
                        'remark': 'Remark'})
\documentclass{article}
\usepackage{amsthm}
\newtheorem{theorem}{Theorem}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{definition}[theorem]{Definition} % Note that `theorem`, `lemma`, and `definition` all have `theorem` as their counter.
\newtheorem{corollary}{Corollary} % Note that `corollary` has its own counter.
\newtheorem{remark}{Remark}[theorem] % `remark` has `theorem` as its counter
\newtheorem*{conjecture*}{Conjecture} % `conjecture*` has no counter
\begin{document}
\section{Introduction}
\begin{theorem}
This is Theorem 1.
\end{theorem}
\begin{lemma}
This is Lemma 2.
\end{lemma}
\begin{definition}
This is Definition 3.
\end{definition}
\end{document}
text = text_from_file(_test_directory() / 'latex_examples' / 'numbering_example_7_newtheorem_command_restarts_counter_by_section' / 'main.tex')
print(text)
display_names = display_names_of_environments(text)
test_eq(display_names, {'theorem': 'Theorem',
                        'corollary': 'Corollary',
                        'lemma': 'Lemma',
                        'proposition': 'Proposition'})
% Based on an example from https://www.overleaf.com/learn/latex/Theorems_and_proofs#Numbered_theorems.2C_definitions.2C_corollaries_and_lemmas
\documentclass[12 pt]{amsart}
\newtheorem{theorem}{Theorem}[section]
\newtheorem{corollary}{Corollary}[theorem]
\newtheorem{lemma}[theorem]{Lemma}
% Note that the below invocation of \newtheorem is invalid:
% \newtheorem{proposition}[theorem]{Proposition}[section]
\newtheorem{proposition}{Proposition}[section]
\begin{document}
\section{Introduction}
Theorems can easily be defined:
\begin{theorem}
Let \(f\) be a function whose derivative exists in every point, then \(f\) is
a continuous function.
\end{theorem}
\begin{theorem}[Pythagorean theorem]
\label{pythagorean}
This is a theorem about right triangles and can be summarised in the next
equation
\[ x^2 + y^2 = z^2 \]
\end{theorem}
And a consequence of theorem \ref{pythagorean} is the statement in the next
corollary.
\begin{corollary}
There's no right rectangle whose sides measure 3cm, 4cm, and 6cm.
\end{corollary}
You can reference theorems such as \ref{pythagorean} when a label is assigned.
\begin{lemma}
Given two line segments whose lengths are \(a\) and \(b\) respectively there is a
real number \(r\) such that \(b=ra\).
\end{lemma}
\end{document}
Divide latex text into parts
sample_counters = _setup_counters(
    {'thm': ('subsection', None), 'prop': ('subsection', None), 'cor': ('subsection', None), 'remark': ('remark', None), 'conj': ('subsection', None)})
assert 'remark' in sample_counters
test_eq(sample_counters['remark'], 0)
assert 'thm' not in sample_counters  # 'thm' is an environment name, but not a counter.
get_node_from_simple_text
get_node_from_simple_text (text:str)
Return the (first) LatexNode object from a str.
text = r"""\begin{thm}This is a theorem. \end{thm}"""
node = get_node_from_simple_text(text)
assert isinstance(node, LatexEnvironmentNode)
test_eq(node.environmentname, 'thm')
text = r"""\begin{thm}This is a theorem. \end{thm} \begin{proof} This is a proof. It is not captured by the `get_node_from_simple_text` function \end{proof}"""
node = get_node_from_simple_text(text)
assert isinstance(node, LatexEnvironmentNode)
test_eq(node.environmentname, 'thm')
text = r"""\begin{thm}This is a theorem. \end{thm}"""
node = get_node_from_simple_text(text)
# Test a theorem being counted by its own counter.
numbertheorem_counters = {'thm': ('thm', None)}
numberwithins = {}
counters = {'thm': 1}
sample_numbering = _node_numbering(
    node, numbertheorem_counters, numberwithins, counters)
test_eq(sample_numbering, '1')
# Test a theorem being counted by the equation counter.
numbertheorem_counters = {'thm': ('equation', None)}
numberwithins = {}
counters = {'equation': 2}
sample_numbering = _node_numbering(
    node, numbertheorem_counters, numberwithins, counters)
test_eq(sample_numbering, '2')
text = r"""\begin{corollary}This is a corollary. \end{corollary}"""
node = get_node_from_simple_text(text)
# Test a theorem-like environment being counted by the counter of
# another theorem-like environment
numbertheorem_counters = {'corollary': ('theorem', None), 'theorem': ('theorem', None)}
numberwithins = {}
counters = {'theorem': 0}
sample_numbering = _node_numbering(
    node, numbertheorem_counters, numberwithins, counters)
test_eq(sample_numbering, '0')
# Test a theorem-like environment whose counter is numbered within
# the section counter.
# First, see what happens when a theorem is called
text = r"""\begin{theorem}This is a theorem. \end{theorem}"""
node = get_node_from_simple_text(text)
numbertheorem_counters = {'theorem': ('theorem', None)}
numberwithins = {'theorem': 'section'}
counters = {'section': 1, 'theorem': 0}
sample_numbering = _node_numbering(
    node, numbertheorem_counters, numberwithins, counters)
test_eq(sample_numbering, '1.0')

# Next, see what happens when a new section is invoked:
text = r"""\section{New section! The theorem counter should be reset}"""
node = get_node_from_simple_text(text)
sample_numbering = _node_numbering(
    node, numbertheorem_counters, numberwithins, counters)
test_eq(sample_numbering, '1')
# Test a theorem-like environment sharing a counter with equation
# and in turn equation is numbered within section.
text = r"""\begin{theorem}This is a theorem. \end{theorem}"""
node = get_node_from_simple_text(text)
numbertheorem_counters = {'theorem': ('equation', None)}
numberwithins = {'equation': 'section'}
counters = {'section': 1, 'equation': 0}
sample_numbering = _node_numbering(
    node, numbertheorem_counters, numberwithins, counters)
test_eq(sample_numbering, '1.0')
# Next, see what happens when a new section is invoked:
text = r"""\section{New section! The theorem counter should be reset}"""
node = get_node_from_simple_text(text)
sample_numbering = _node_numbering(
    node, numbertheorem_counters, numberwithins, counters)
test_eq(sample_numbering, '1')
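The tests above exercise the bookkeeping behind numbering: a counter is stepped when its environment or sectioning command appears, it is reset when a counter it is numbered within is stepped, and its printed label is prefixed by its parent's label. The following is a hypothetical Python model of that bookkeeping (CounterSketch is not part of trouver, and _node_numbering may organize these steps differently). The usage reproduces the numbering described in numbering_example_3 above, where theorem-like environments share the equation counter and equation is numbered within section.

```python
from dataclasses import dataclass, field


@dataclass
class CounterSketch:
    """Hypothetical model of LaTeX counters; not trouver's implementation."""
    values: dict[str, int] = field(default_factory=dict)
    within: dict[str, str] = field(default_factory=dict)  # counter -> parent counter

    def step(self, counter: str) -> None:
        """Increment `counter`, then reset the counters numbered within it."""
        self.values[counter] = self.values.get(counter, 0) + 1
        self._reset_children(counter)

    def _reset_children(self, counter: str) -> None:
        for child, parent in self.within.items():
            if parent == counter:
                self.values[child] = 0
                self._reset_children(child)  # reset grandchildren too

    def label(self, counter: str) -> str:
        """Printed number, e.g. '1.2' for a counter numbered within section."""
        own = str(self.values.get(counter, 0))
        parent = self.within.get(counter)
        return f'{self.label(parent)}.{own}' if parent else own


counters = CounterSketch(within={'equation': 'section'})
counters.step('section')   # \section{Introduction}
counters.step('equation')  # \begin{theorem}: Theorem 1.1
print(counters.label('equation'))
counters.step('equation')  # \begin{equation}: (1.2)
counters.step('equation')  # \begin{corollary}: Corollary 1.3
counters.step('section')   # \section{Another section} resets equation
counters.step('equation')  # \begin{theorem}: Theorem 2.1
print(counters.label('equation'))
```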
swap_numbers_invoked
swap_numbers_invoked (preamble:str)
Returns True if \swapnumbers is in the preamble.
Assumes that any mention of \swapnumbers outside of a comment is an actual invocation.
| | Type | Details |
|---|---|---|
| preamble | str | |
| Returns | bool | |
assert swap_numbers_invoked(r'\swapnumbers')
assert not swap_numbers_invoked(r'''
\documentclass{article}
\usepackage{amsthm}
\newtheorem{theorem}{Theorem} % \swapnumbers
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{definition}[theorem]{Definition}
\newtheorem*{remark*}{Remark}''')
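The second assertion shows that a commented-out \swapnumbers does not count. A minimal sketch of such a check first strips comments and then searches for the macro; swap_numbers_invoked_sketch is a hypothetical name, not trouver's implementation.

```python
import re

COMMENT = re.compile(r'(?<!\\)%.*')
# \b keeps longer macro names such as \swapnumbersfoo from counting.
SWAPNUMBERS = re.compile(r'\\swapnumbers\b')


def swap_numbers_invoked_sketch(preamble: str) -> bool:
    """True if \\swapnumbers appears outside a comment (hypothetical sketch)."""
    return SWAPNUMBERS.search(COMMENT.sub('', preamble)) is not None


print(swap_numbers_invoked_sketch(r'\swapnumbers'))
```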
divide_latex_text
divide_latex_text (document:str, environments_to_not_divide_along:list[str]=['equation', 'equation*', 'proof', 'align', 'align*', 'enumerate', 'itemize', 'label', 'eqnarray', 'quote', 'tabular', 'table'])
Divide LaTeX text to convert into Obsidian.md notes.
Assumes that the counters in the LaTeX document are either the predefined ones or specified by the \newtheorem command.
This function does not divide out \subsubsection commands.
Proof environments are assigned to the same parts as their preceding theorem-like environments, if available.
TODO: Implement counters specified by \newcounter, cf. https://www.overleaf.com/learn/latex/Counters#LaTeX_commands_for_working_with_counters.
| | Type | Default | Details |
|---|---|---|---|
| document | str | | |
| environments_to_not_divide_along | list | ['equation', 'equation*', 'proof', 'align', 'align*', 'enumerate', 'itemize', 'label', 'eqnarray', 'quote', 'tabular', 'table'] | A list of the names of the environments along which to not make a new note, unless the environment starts a section (or the entire document). |
| Returns | list | Each element is of the form (note_title, text), where note_title often encapsulates the note type (i.e. section/subsection/display text of a theorem-like environment) along with the numbering, and text is the text of the part. Sometimes note_title is just a number, which means that text is not of a \section or \subsection command and not of a theorem-like environment. |
# TODO: explain examples
file = _test_directory() / 'latex_examples' / 'numbering_example_6' / 'main.tex'
sample_latex_text = text_from_file(file)
preamble, document = divide_preamble(sample_latex_text)
parts = divide_latex_text(sample_latex_text)
file = _test_directory() / 'latex_examples' / 'divide_latex_example_proof_preceded_by_theorem' / 'main.tex'
sample_latex_text = text_from_file(file)
parts = divide_latex_text(sample_latex_text)
print(parts)
test_eq(len(parts), 2)
[['1. Some section', '\\section{Some section}'], ['Theorem 1.', '\\begin{theorem}\nThis is a theorem.\n\\end{theorem}\\begin{proof}\nThis is a proof\n\\end{proof}']]
file = _test_directory() / 'latex_examples' / 'divide_latex_example_text_preceded_by_undivided_environment' / 'main.tex'
sample_latex_text = text_from_file(file)
preamble, document = divide_preamble(sample_latex_text)
parts = divide_latex_text(document)
print(parts)
test_eq(len(parts), 2)
[['1. CONTENTS', '\\section{CONTENTS}'], ['1', "\\begin{enumerate}\n \\item Introduction 2\n\n \\item Preliminaries $\\quad 7$\n\n\\end{enumerate}\n2.1. Categorical preliminaries $\\quad 7$\n\n2.2. On the motivic Spanier-Whitehead category and Milnor-Witt K-theory 8\n\n2.3. $\\mathbb{A}^{1}$-derived category and $\\mathbb{A}^{1}$-homology 9\n\n\\begin{enumerate}\n \\setcounter{enumi}{3}\n \\item $\\mathbb{A}^{1}$-Spanier-Whitehead category of cellular smooth schemes 11\n\\end{enumerate}\n3.1. Cellular schemes 12\n\n3.2. Cellular Spanier-Whitehead category 13\n\n\\begin{enumerate}\n \\setcounter{enumi}{4}\n \\item The cellular homology of Morel-Sawant on cellular Thom spaces 14\n\n \\item Spanier-Whitehead cellular complex 18\n\n\\end{enumerate}\n5.1. Definitions and basic properties 18\n\n5.2. Endomorphisms, traces, and characteristic polynomials 19\n\n5.3. Cellular Grothendieck-Lefschetz Trace Formula 21\n\n\\begin{enumerate}\n \\setcounter{enumi}{6}\n \\item Rationality of the zeta function 22\n\n \\item $\\mathbb{A}^{1}$-logarithmic zeta functions and real points 24\n\n \\item Computing $\\mathbb{A}^{1}$-logarithmic zeta functions and examples 27\n\n\\end{enumerate}\n8.1. $\\mathbb{A}^{1}$-zeta function of Frobenius endomorphisms using Theorem $6.2 \\quad 28$\n\n8.2. $\\mathbb{A}^{1}$-logarithmic zeta functions via Hoyois's trace formula 31\n\n8.3. The logarithmic zeta function of non-cellular schemes 34\n\n\\begin{enumerate}\n \\setcounter{enumi}{9}\n \\item Motivic measures 34\n\\end{enumerate}\n9.1. The $\\mathbb{A}^{1}$-trace as a motivic measure 35\n\n9.2. The enriched zeta function as a motivic measure 35\n\nReferences 36\n\nDate: October 3, 2022."]]
file = _test_directory() / 'latex_examples' / 'divide_latex_example_2' / 'main.tex'
sample_latex_text = text_from_file(file)
preamble, document = divide_preamble(sample_latex_text)
parts = divide_latex_text(document)
print(parts)
[['1', '\\maketitle'], ['abstract', '\\begin{abstract}\nThis is an abstract\n\\end{abstract}']]
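The `divide_latex_text` function divides LaTeX text into parts, labeling each part according to the document's numbering scheme. The following examples use documents with different numbering conventions: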
file = _test_directory() / 'latex_examples' / 'numbering_example_1_consecutive_numbering_scheme' / 'main.tex'
text = text_from_file(file)
sample_output = divide_latex_text(text)
print(sample_output)
assert sample_output[0][0] == '1'
assert sample_output[1][0] == '1. Introduction'
assert sample_output[2][0] == 'Theorem 1.'
assert sample_output[3][0] == 'Corollary 2.'
assert sample_output[4][0] == 'Remark'
[['1', 'For this document, the `theorem` counter is not reset whenever a new section begins.\n\nA similar numbering scheme can be accomplished by importing \\verb|amsmath| and invoking the code \\verb|\\numberwithin{theorem}{part}| in the preamble.'], ['1. Introduction', '\\section{Introduction}'], ['Theorem 1.', '\\begin{theorem}\nThis is Theorem 1.\n\\end{theorem}'], ['Corollary 2.', '\\begin{corollary}\nThis is Corollary 2.\n\\end{corollary}'], ['Remark', '\\begin{remark*}\nThis is a remark. It is unnumbered and it does not affect the numberings of other environments.\n\\end{remark*}'], ['Definition 3.', '\\begin{definition}\nThis is Definition 3.\n\\end{definition}'], ['2. Another Section', '\\section{Another Section}'], ['Theorem 4.', '\\begin{theorem}\nThis is Theorem 4.\n\\end{theorem}'], ['2', 'And we might get a corollary!'], ['Corollary 5.', '\\begin{corollary}\nThis is Corollary 5.\n\\end{corollary}'], ['Definition 6.', '\\begin{definition}\nThis is Definition 6.\n\\end{definition}']]
file = _test_directory() / 'latex_examples' / 'numbering_example_2_numbering_scheme_reset_at_each_section' / 'main.tex'
text = text_from_file(file)
print(divide_latex_text(text))
[['1', 'This document resets its `theorem` counter whenever a new section begins.'], ['1. Introduction', '\\section{Introduction}'], ['Theorem 1.1.', '\\begin{theorem}\nThis is Theorem 1.1.\n\\end{theorem}'], ['Corollary 1.2.', '\\begin{corollary}\nThis is Corollary 1.2.\n\\end{corollary}'], ['Remark', '\\begin{remark*}\nThis is a remark. It is unnumbered and it does not affect the numberings of other environments.\n\\end{remark*}'], ['Definition 1.3.', '\\begin{definition}\nThis is Definition 1.3.\n\\end{definition}'], ['2. Another Section', '\\section{Another Section}'], ['Theorem 2.1.', '\\begin{theorem}\nThis is Theorem 2.1.\n\\end{theorem}'], ['Corollary 2.2.', '\\begin{corollary}\nThis is Corollary 2.2.\n\\end{corollary}'], ['Definition 2.3.', '\\begin{definition}\nThis is Definition 2.3.\n\\end{definition}']]
file = _test_directory() / 'latex_examples' / 'numbering_example_3_theorem_like_environments_share_counter_with_equation_and_reset_at_each_section' / 'main.tex'
text = text_from_file(file)
print(divide_latex_text(text))
[['1. Introduction', '\\section{Introduction}'], ['Theorem 1.1.', '\\begin{theorem}\nThis is Theorem 1.1. This is because the \\verb|\\numberwithin{equation}{section}| makes the section number included in the equation counter and because the \\\\\n\\verb|\\newtheorem{theorem}[equation]{Theorem}| command makes the environment \\verb|theorem| be counted by the equation counter.\n\\end{theorem}'], ['1', 'The following makes an equation labeled 1.2; \n\\begin{equation}\n5 + 7 = 12\n\\end{equation}'], ['Theorem', '\\begin{theorem*}\nThis Theorem is unnumbered\n\\end{theorem*}'], ['Corollary 1.2.', '\\begin{corollary}\nThis is Corollary 1.3.\n\\end{corollary}'], ['2. Another section', '\\section{Another section}'], ['Theorem 2.1.', '\\begin{theorem}\nThis is theorem 2.1\n\\end{theorem}'], ['2', 'The following is labeled 2.2:\n\\begin{equation}\n3+5 = 8.\n\\end{equation}']]
file = _test_directory() / 'latex_examples' / 'numbering_example_4_unnumbered_section' / 'main.tex'
text = text_from_file(file)
print(divide_latex_text(text))
[['1. This is section 1', '\\section{This is section 1}'], ['Theorem 1.1.', '\\begin{theorem}\nThis is Theorem 1.1.\n\\end{theorem}'], ['1.1. This is a subsection 1.1', '\\subsection{This is a subsection 1.1}'], ['1', 'The following makes an equation labeled 1; \n\\begin{equation}\n5 + 7 = 12\n\\end{equation}'], ['Theorem', '\\begin{theorem*}\nThis Theorem is unnumbered\n\\end{theorem*}'], ['1.2. This is subsection 1.2', '\\subsection{This is subsection 1.2}'], ['Corollary 1.2.', '\\begin{corollary}\nThis is Corollary 1.2.\n\\end{corollary}'], ['Unnumbered section', '\\section*{Unnumbered section}'], ['1.1. This is subsection 1.3', '\\subsection{This is subsection 1.3}'], ['2', '\\subsubsection{This is subsubsection 1.3.1}'], ['Theorem 1.1.', '\\begin{theorem}\nThis is Theorem 1.3.\n\\end{theorem}'], ['2. Another section', '\\section{Another section}'], ['2.1. This is subsection 2.1', '\\subsection{This is subsection 2.1}'], ['Theorem 2.1.', '\\begin{theorem}\nThis is Theorem 2.1\n\\end{theorem}'], ['3', 'The following is labeled 2; note that the equation counter was not reset by section:\n\\begin{equation}\n3+5 = 8.\n\\end{equation}']]
file = _test_directory() / 'latex_examples' / 'numbering_example_5_subsections_and_theorem_like_environments_share_counter' / 'main.tex'
text = text_from_file(file)
sample_output = divide_latex_text(text)
print(sample_output)
test_eq(sample_output[4][0], '1. Remark.')
test_eq(sample_output[5][0], 'Remark')
[['1. This is section 1', '\\section{This is section 1}'], ['1.1. Theorem.', '\\begin{thm}\nThis is 1.1. Theorem. Note that the \\verb|\\swapnumbers| command is invoked in the preamble.\n\\end{thm}'], ['1.2. This is 1.2. subsection.', '\\subsection{This is 1.2. subsection.}'], ['1', 'Note that the equation counter is numbered within the subsection counter and that the theorem-like environments are numbered with the equation counter.\n\n\\subsubsection{This is 1.2.1. Subsubsection}'], ['1. Remark.', '\\begin{remark}\nThis is an 1. Remark. Note that \\verb|\\remark| has a counter separate from those of many of the other theorem-like environments.\n\\end{remark}'], ['Remark', '\\begin{rem*}\nThis is an unnumbered Remark.\n\\end{rem*}'], ['1.3. Proposition.', '\\begin{prop}\nThis is 1.3. Proposition.\n\\end{prop}'], ['Unnumbered section', '\\section*{Unnumbered section}'], ['1.1. Theorem.', '\\begin{thm}\nThis is 1.4. Theorem.\n\\end{thm}'], ['2. This is Section 2', '\\section{This is Section 2}'], ['2.1. Theorem.', '\\begin{thm}\nThis is 2.1. Theorem\n\\end{thm}']]
Note that part titles are stripped and are single-lined:
# TODO: fill in the following example
# part = parts[...]
# assert part[0].strip() == part[0]
# TODO: example with a multilined section title forced to single-lined:
# e.g. `\section{Exceptional maximal subgroups of
# \texorpdfstring{\(\GSp_4(\ff_\ell)\)}{GSp4Fell}}`
# TODO: Find a list of environment names commonly used.
# TODO: examples with different numbering convention and different numbered environments
# TODO: make numbering_convention work correctly.
# Here are some latex files with different conventions:
# - All subsections in a section share numbering,
# - achter_pries_imht https://arxiv.org/abs/math/0608038: e.g. Lemmas 2.1, 2.2, 2.3 are in subsection 2.2 and Lemma 2.4 and Remark 2.5 are in subsection 2.4.
# - pauli_wickelgren https://arxiv.org/abs/2010.09374: e.g. Example 3.5, 3.11 are in subsubsection 3.3.2, Exercise 4.1, Remark 4.2, are in subsection 4.1, Theorem 4.3 is in subsection 4.2, Theorem 4.4 is in subsection 4.3
# - Different environment types have different counts and the counts do not show the section number.
# - vankataramana_imbrd https://arxiv.org/abs/1205.6543:
# - e.g. section 1 has Theorem 1, Remark 1, Remark 2, Remark 3, subsection 1.1.3 has Remark 4, Subsection 2.2 has Definition 1
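Making a part title single-lined, as noted above, amounts to stripping it and collapsing its internal whitespace. The following is a minimal sketch of that normalization; `normalize_part_title` is a hypothetical helper, not part of trouver, and trouver may perform this step differently.

```python
def normalize_part_title(title: str) -> str:
    """Hypothetical helper (not trouver's API): strip a part title and
    collapse every run of whitespace, including newlines, into one space."""
    return ' '.join(title.split())


# A multi-line \section title like the one in the TODO comment above
raw_title = ('Exceptional maximal subgroups of\n'
             '    \\texorpdfstring{\\(\\GSp_4(\\ff_\\ell)\\)}{GSp4Fell}  ')
print(normalize_part_title(raw_title))
```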
Identify sections and subsections to make folders for a reference.
section_and_subsection_titles_from_latex_parts
section_and_subsection_titles_from_latex_parts (parts:list[tuple[str,str]])
Return a list of lists of titles for the sections and subsections in parts
Unnumbered sections get their own list. Unnumbered subsections are also included in the lists. All titles are stripped of leading and trailing whitespace.
| | Type | Details |
|---|---|---|
| parts | list | An output of `divide_latex_text` |
| Returns | list | Each list corresponds to a section. The first entry of the list is the title of the section and the other entries are the titles of the subsections. |
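The grouping described above can be sketched as a single pass over `parts`. This is not trouver's implementation: it assumes that section and subsection parts can be recognized because their content begins with `\section` or `\subsection` (as in the examples in this section), and `UNTITLED` is a hypothetical stand-in for the `UNTITLED_SECTION_TITLE` constant, whose actual value is not shown here.

```python
UNTITLED = '<untitled>'  # hypothetical stand-in for trouver's UNTITLED_SECTION_TITLE


def sketch_section_titles(parts: list[list[str]]) -> list[list[str]]:
    """Group stripped section and subsection titles, one list per section."""
    grouped: list[list[str]] = []
    for title, content in parts:
        head = content.lstrip()
        if head.startswith(r'\section'):  # also matches \section*{...}
            grouped.append([title.strip()])
        elif head.startswith(r'\subsection'):  # \subsubsection does not match
            if not grouped:
                grouped.append([UNTITLED])
            grouped[-1].append(title.strip())
        elif not grouped:  # content that precedes every explicit section
            grouped.append([UNTITLED])
    return grouped
```

Checking the prefix of the content rather than the numbered title means unnumbered `\section*` parts still start a new group, while `\subsubsection` parts are ignored.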
In the following example, the environments are numbered Theorem 1, Corollary 2, Definition 3, etc. Also note that there is some content before the very first (explicitly defined) section, so there is a section titled by the `UNTITLED_SECTION_TITLE` constant.
parts = [
    ['1', 'For this document, the `theorem` counter is not reset whenever a new section begins.\n\nA similar numbering scheme can be accomplished by importing \\verb|amsmath| and invoking the code \\verb|\\numberwithin{theorem}{part}| in the preamble.'],
    ['1. Introduction', '\\section{Introduction}'],
    ['Theorem 1.', '\\begin{theorem}\nThis is Theorem 1.\n\\end{theorem}'],
    ['Corollary 2.', '\\begin{corollary}\nThis is Corollary 2.\n\\end{corollary}'],
    ['Remark', '\\begin{remark*}\nThis is a remark. It is unnumbered and it does not affect the numberings of other environments.\n\\end{remark*}'],
    ['Definition 3.', '\\begin{definition}\nThis is Definition 3.\n\\end{definition}'],
    ['2. Another Section', '\\section{Another Section}'],
    ['Theorem 4.', '\\begin{theorem}\nThis is Theorem 4.\n\\end{theorem}'],
    ['2', 'And we might get a corollary!'],
    ['Corollary 5.', '\\begin{corollary}\nThis is Corollary 5.\n\\end{corollary}'],
    ['Definition 6.', '\\begin{definition}\nThis is Definition 6.\n\\end{definition}']]
sample_output = section_and_subsection_titles_from_latex_parts(parts)
test_eq(sample_output, [[UNTITLED_SECTION_TITLE], ['1. Introduction'], ['2. Another Section']])
In contrast, the following example has environments numbered by sections:
parts = [
    ['1', 'This document resets its `theorem` counter whenever a new section begins.'],
    ['1. Introduction', '\\section{Introduction}'],
    ['Theorem 1.1.', '\\begin{theorem}\nThis is Theorem 1.1.\n\\end{theorem}'],
    ['Corollary 1.2.', '\\begin{corollary}\nThis is Corollary 1.2.\n\\end{corollary}'],
    ['Remark', '\\begin{remark*}\nThis is a remark. It is unnumbered and it does not affect the numberings of other environments.\n\\end{remark*}'],
    ['Definition 1.3.', '\\begin{definition}\nThis is Definition 1.3.\n\\end{definition}'],
    ['2. Another Section', '\\section{Another Section}'],
    ['Theorem 2.1.', '\\begin{theorem}\nThis is Theorem 2.1.\n\\end{theorem}'],
    ['Corollary 2.2.', '\\begin{corollary}\nThis is Corollary 2.2.\n\\end{corollary}'],
    ['Definition 2.3.', '\\begin{definition}\nThis is Definition 2.3.\n\\end{definition}']]
sample_output = section_and_subsection_titles_from_latex_parts(parts)
test_eq(sample_output, [[UNTITLED_SECTION_TITLE], ['1. Introduction'], ['2. Another Section']])
The example below is derived from a LaTeX document in which significant content is present before any section. See the `nbs/_tests/latex_examples/latex_example_with_content_before_sections` folder. Also see https://arxiv.org/abs/1111.3607 for an example of a paper with significant content prior to any explicitly defined sections.
parts = [
    ['abstract', "\\begin{abstract}\nI'm an abstract\n\\end{abstract}"],
    ['1', '\\maketitle\n\nI want to talk about things but notice that this part does not belong to a section!'],
    ['Theorem 1.', "\\begin{theorem}\\label{th:some_theorem}\nI'm a theorem.\n\\end{theorem}"],
    ['2', 'Blah blah blah'],
    ['Theorem 2.', '\\begin{theorem}\\label{th:some_other_theorem}\nImpart me with mathematical knowledge!\n\\end{theorem}'],
    ['3', 'Maybe a corollary'],
    ['Corollary 3.', '\\begin{corollary}\\label{cor:a_corollary}\nI immediately follow from the above theorem.\n\\end{corollary}'],
    ['4', 'More stuff!'],
    ['Corollary 4.', '\\begin{corollary}\\label{cor:another_corollary}\nMore delicious mathematical knowledge.\n\\end{corollary}'],
    ['5', 'Maybe you could describe how we demonstrate this corollary.'],
    ['1. Proof of Theorem~\\ref{th:main}', '\\section{Proof of Theorem~\\ref{th:main}}'],
    ['6', 'Now this is finally in a section.'],
    ['Lemma 5.', '\\begin{lemma}\nSome lemma\n\\end{lemma}\\begin{proof}\nMaximum effort!\n\\end{proof}'],
    ['7', 'Blah blah blah.'],
    ['1.1. This is a subsection', '\\subsection{This is a subsection}'],
    ['8', "I'm about one thing."],
    ['1.2. This is another subsection', '\\subsection{This is another subsection}'],
    ['9', "I'm about another thing."]]
sample_output = section_and_subsection_titles_from_latex_parts(parts)
test_eq(sample_output, [[UNTITLED_SECTION_TITLE], ['1. Proof of Theorem~\\ref{th:main}', '1.1. This is a subsection', '1.2. This is another subsection']])
The example below is derived from a LaTeX document that uses `\numberwithin{equation}{subsection}` and in which the theorem-like environments are numbered with the `equation` counter. In particular, theorem-like environments and subsections are counted together.
Also note that the example below starts with an explicitly defined section, so there is no section titled by the `UNTITLED_SECTION_TITLE` constant.
parts = [
    ['1. This is section 1', '\\section{This is section 1}'],
    ['1.1. Theorem.', '\\begin{thm}\nThis is 1.1. Theorem. Note that the \\verb|\\swapnumbers| command is invoked in the preamble.\n\\end{thm}'],
    ['1.2. This is 1.2. subsection.', '\\subsection{This is 1.2. subsection.}'],
    ['1', 'Note that the equation counter is numbered within the subsection counter and that the theorem-like environments are numbered with the equation counter.\n\n\\subsubsection{This is 1.2.1. Subsubsection}'],
    ['1. Remark.', '\\begin{remark}\nThis is an 1. Remark. Note that \\verb|\\remark| has a counter separate from those of many of the other theorem-like environments.\n\\end{remark}'],
    ['Remark', '\\begin{rem*}\nThis is an unnumbered Remark.\n\\end{rem*}'],
    ['1.3. Proposition.', '\\begin{prop}\nThis is 1.3. Proposition.\n\\end{prop}'],
    ['Unnumbered section', '\\section*{Unnumbered section}'],
    ['1.1. Theorem.', '\\begin{thm}\nThis is 1.4. Theorem.\n\\end{thm}'],
    ['2. This is Section 2', '\\section{This is Section 2}'],
    ['2.1. Theorem.', '\\begin{thm}\nThis is 2.1. Theorem\n\\end{thm}']]
sample_output = section_and_subsection_titles_from_latex_parts(parts)
test_eq(sample_output, [['1. This is section 1', '1.2. This is 1.2. subsection.'], ['Unnumbered section'], ['2. This is Section 2']])
Section and subsection titles are stripped of leading and trailing whitespace:
# The example below makes sure that titles are stripped
parts = [
    [' 1. Section with an unnumbered subsection ', '\\section{Section with an unnumbered subsection}'],
    ['1', 'This is a section with an unnumbered subsection'],
    ['1.1. ', '\\subsection{}']
]
sample_output = section_and_subsection_titles_from_latex_parts(parts)
test_eq(sample_output, [['1. Section with an unnumbered subsection', '1.1.']])
file = _test_directory() / 'latex_examples' / 'latex_example_with_plenty_of_sections_and_subsections' / 'main.tex'
text = text_from_file(file)
parts = divide_latex_text(text)
print(parts)
sample_output = section_and_subsection_titles_from_latex_parts(parts)
test_eq(sample_output,
        [['1. This is section 1', '1.1. This is section 1.1', '1.2. This is section 1.2'],
         ['2. This is section 2'],
         ['3. This is section 3', '3.1. This is section 3.1', '3.2. This is section 3.2', '3.3. This is section 3.3', '3.4. This is section 3.4']])
[['1. This is section 1', '\\section{This is section 1}'], ['1.1. This is section 1.1', '\\subsection{This is section 1.1}'], ['1.2. This is section 1.2', '\\subsection{This is section 1.2}'], ['2. This is section 2', '\\section{This is section 2}'], ['3. This is section 3', '\\section{This is section 3}'], ['3.1. This is section 3.1', '\\subsection{This is section 3.1}'], ['3.2. This is section 3.2', '\\subsection{This is section 3.2}'], ['3.3. This is section 3.3', '\\subsection{This is section 3.3}'], ['3.4. This is section 3.4', '\\subsection{This is section 3.4}']]
Formatting modifications
Identify macros and commands to replace
Authors usually define a lot of custom commands and macros in their LaTeX files. Such customizations vary from author to author and most customized commands are not recognized by Obsidian.
See `nbs/_tests/latex_examples/commands_example/main.tex` for some examples of custom commands.
custom_commands
custom_commands (preamble:str)
Return a list of tuples describing the commands (and math operators) defined in `preamble`; each tuple gives the command's name, its number of parameters, its default argument, and its display text.
Assumes that each `\newcommand` has at most one default parameter (newcommands with multiple default parameters are not valid in LaTeX).
Ignores all commented-out newcommands.
| | Type | Details |
|---|---|---|
| preamble | str | The preamble of a LaTeX document. |
| Returns | list | Each tuple consists of 1. the name of the custom command, 2. the number of parameters, 3. the default argument if specified or `None` otherwise, and 4. the display text of the command. |
# Basic
text_1 = r'\newcommand{\con}{\mathcal{C}}'
test_eq(custom_commands(text_1), [('con', 0, None, r'\mathcal{C}')])
# With a parameter
text_2 = r'\newcommand{\field}[1]{\mathbb{#1}}'
test_eq(custom_commands(text_2), [('field', 1, None, r'\mathbb{#1}')])
# With multiple parameters, the first of which has a default value of `2`
text_3 = r'\newcommand{\plusbinomial}[3][2]{(#2 + #3)^#1}'
test_eq(custom_commands(text_3), [('plusbinomial', 3, '2', r'(#2 + #3)^#1')])
# The display text has backslashes `\` and curly braces `{}`
text_4 = r'\newcommand{\beq}{\begin{displaymath}}'
test_eq(custom_commands(text_4), [('beq', 0, None, '\\begin{displaymath}')])
# Basic with spaces in the newcommand declaration
text_6 = r'\newcommand {\con} {\mathcal{C}}'
test_eq(custom_commands(text_6), [('con', 0, None, r'\mathcal{C}')])
# With a parameter and spaces in the newcommand declaration
text_7 = r'\newcommand {\field} [1] {\mathbb{#1}}'
test_eq(custom_commands(text_7), [('field', 1, None, r'\mathbb{#1}')])
# With multiple parameters, a default value, and spaces in the newcommand declaration
text_8 = r'\newcommand {\plusbinomial} [3] [2] {(#2 + #3)^#1}'
test_eq(custom_commands(text_8), [('plusbinomial', 3, '2', r'(#2 + #3)^#1')])
# With a comment `%`; commented-out command declarations should not be detected.
text_9 = r'% \newcommand{\con}{\mathcal{C}}'
test_eq(custom_commands(text_9), [])
# Spanning multiple lines
text_10 = r'''\newcommand{\mat}[4]{\left[\begin{array}{cc}#1 & #2 \\
 #3 & #4\end{array}\right]}'''
test_eq(
    custom_commands(text_10),
    [('mat', 4, None,
      '\\left[\\begin{array}{cc}#1 & #2 \\\\\n #3 & #4\\end{array}\\right]')])
# Math operator
text_11 = r'\DeclareMathOperator{\Hom}{Hom}'
test_eq(custom_commands(text_11), [('Hom', 0, None, 'Hom')])
text_12 = r'\DeclareMathOperator{\tConf}{\widetilde{Conf}}'
test_eq(custom_commands(text_12), [('tConf', 0, None, r'\widetilde{Conf}')])
# TODO:
# `\def` commands
# \def is a bit complicated because arguments can either be provided with []
# or can be provided with {}.
# text_13 = r'\def\A{{\cO_{K}}}'
# newcommand and renewcommand don't require {} for the
# command name, cf. https://arxiv.org/abs/1703.05365
text_14 = r'\newcommand\A{{\mathbb A}}'
test_eq(custom_commands(text_14), [('A', 0, None, r'{\mathbb A}')])
# TODO: use a regexp pattern like this one to extract balanced curly braces
# \\mat\{((?>[^{}]+|\{(?1)\})*)\}\{((?>[^{}]+|\{(?2)\})*)\}
regex_pattern_detecting_command
regex_pattern_detecting_command (command_tuple:tuple[str,int,typing.Optional[str],str])
Return a `regex.Pattern` object (not a `re.Pattern` object) that detects invocations of the command with the specified number of parameters, optional argument, and display text.
Assumes that the curly braces used to write the invocations of the commands are balanced and properly nested. Assumes that there are no two commands of the same name.
| | Type | Details |
|---|---|---|
| command_tuple | tuple | Consists of 1. the name of the custom command, 2. the number of parameters, 3. the default argument if specified or `None` otherwise, and 4. the display text of the command. |
| Returns | Pattern | |
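For comparison, here is a sketch that builds a similar pattern with the standard-library `re` module. It is not trouver's implementation: `re` has no recursion, so this sketch only supports braces nested up to a fixed depth (`max_depth`, an assumption), and it allows whitespace between arguments, which trouver may not. `sketch_command_pattern` and `braced_arg` are hypothetical names. As in the tests below, group 1 captures the content of the first argument without its outer braces.

```python
import re
from typing import Optional


def braced_arg(max_depth: int) -> str:
    """Regex capturing the inside of a {...} group nested at most `max_depth` levels deep."""
    inner = r'[^{}]*'
    for _ in range(max_depth - 1):
        inner = r'(?:[^{}]|\{' + inner + r'\})*'
    return r'\{(' + inner + r')\}'


def sketch_command_pattern(name: str, num_args: int, default: Optional[str],
                           max_depth: int = 5) -> re.Pattern:
    # `(?![A-Za-z])` stops `\del` from matching the start of `\delta`.
    pieces = [r'\\' + re.escape(name) + r'(?![A-Za-z])']
    required = num_args
    if default is not None:
        pieces.append(r'(?:\[([^\]]*)\])?')  # optional first argument in [...]
        required -= 1
    pieces.extend([r'\s*' + braced_arg(max_depth)] * required)
    return re.compile(''.join(pieces))
```

Each alternative inside a group starts with a distinct kind of character (`{` versus anything else), so the nested pattern does not backtrack catastrophically.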
# Basic
pattern = regex_pattern_detecting_command(('Sur', 0, None, r'\mathrm{Sur}'))
text = r'The number of elements of $\Sur(\operatorname{Cl} \mathcal{O}_L, A)$ is ...'
match = pattern.search(text)
start, end = match.span()
test_eq(text[start:end], r'\Sur')
# One parameter
pattern = regex_pattern_detecting_command(('field', 1, None, r'\mathbb{#1}'))
text = r'\field{Q}'
# print(pattern.pattern)
match = pattern.search(text)
start, end = match.span()
test_eq(text[start:end], text)
# Multiple parameters
pattern = regex_pattern_detecting_command(('mat', 4, None, r'\left[\begin{array}{cc}#1 & #2 \\ #3 & #4\end{array}\right]'))
text = r'\mat{{123}}{asdfasdf{}{}}{{{}}}{{asdf}{asdf}{}}' # This is a balanced str.
match = pattern.search(text)
start, end = match.span()
test_eq(text[start:end], text)
test_eq(match.group(1), r'{123}')
# Multiple parameters, the first of which is optional
pattern = regex_pattern_detecting_command(('plusbinomial', 3, '2', r'(#2 + #3)^#1'))
# When the optional argument is omitted (so the default value is used)
text = r'\plusbinomial{x}{y}'
match = pattern.search(text)
start, end = match.span()
test_eq(text[start:end], text)
# When the optional argument is supplied
text = r'\plusbinomial[4]{x}{y}'
match = pattern.search(text)
start, end = match.span()
test_eq(text[start:end], text)
# One parameter, which is optional
pattern = regex_pattern_detecting_command(('greet', 1, 'world', r'Hello #1!'))
# When the optional argument is omitted (so the default value is used)
text = r'\greet'
match = pattern.search(text)
start, end = match.span()
test_eq(text[start:end], text)
# When the optional argument is supplied
text = r'\greet[govna]'
match = pattern.search(text)
start, end = match.span()
test_eq(text[start:end], text)
# In the following example, `\del` is a command defined as `\delta`.
# Invocations of `\delta` should not be detected as invocations of `\del`.
command_tuple = (r'del', 0, None, r'\delta')
pattern = regex_pattern_detecting_command(command_tuple)
text = r'\del should be detected.'
match = pattern.search(text)
start, end = match.span()
test_eq(text[start:end], r'\del')
text = r'\delta should not be detected.'
match = pattern.search(text)
assert match is None
# test_eq(replace_command_in_text(text, command_tuple), r'\delta should be replaced. \delta should not.')
replace_command_in_text
replace_command_in_text (text:str, command_tuple:tuple[str,int,typing.Optional[str],str])
Replace all invocations of the specified command in `text` with the command's display text, substituting the arguments of each invocation into the display text.
Assumes that `\1`, `\2`, `\3`, etc. are not part of the display text.
| | Type | Details |
|---|---|---|
| text | str | |
| command_tuple | tuple | Consists of 1. the name of the custom command, 2. the number of parameters, 3. the default argument if specified or `None` otherwise, and 4. the display text of the command. |
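Substituting arguments into a display text such as `\mathbb{#1}` is easiest with a callable replacement: Python's `re.sub` inserts a callable's return value literally, whereas a template string like `r'\mathbb{\1}'` would make `re` interpret `\m` as an escape and raise `re.error`. The sketch below illustrates this; `fill_display_text` and the simplified `field_pattern` (no nested braces) are hypothetical and not taken from trouver.

```python
import re


def fill_display_text(display_text: str, args: list[str]) -> str:
    """Substitute args[0], args[1], ... for #1, #2, ... in a command's display text."""
    # A callable replacement is inserted literally, so backslashes in args are safe.
    return re.sub(r'#([1-9])', lambda m: args[int(m.group(1)) - 1], display_text)


# Hypothetical one-parameter pattern; [^{}]* keeps it simple (no nested braces).
field_pattern = re.compile(r'\\field\{([^{}]*)\}')
text = r'$\field{Q}$ and $\field{C}$'
replaced = field_pattern.sub(
    lambda m: fill_display_text(r'\mathbb{#1}', [m.group(1)]), text)
print(replaced)  # $\mathbb{Q}$ and $\mathbb{C}$
```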
# Basic
command_tuple = ('Sur', 0, None, r'\mathrm{Sur}')
# pattern = regex_pattern_detecting_command(command_tuple)
text = r'The number of elements of $\Sur(\operatorname{Cl} \mathcal{O}_L, A)$ is ... Perhaps $\Sur$ is nonempty.'
test_eq(replace_command_in_text(text, command_tuple),
        r'The number of elements of $\mathrm{Sur}(\operatorname{Cl} \mathcal{O}_L, A)$ is ... Perhaps $\mathrm{Sur}$ is nonempty.')
# One parameter
command_tuple = ('field', 1, None, r'\mathbb{#1}')
# pattern = regex_pattern_detecting_command(command_tuple)
text = r'$\field{Q}$ is the field of rational numbers. $\field{C}$ is the field of complex numbers'
test_eq(replace_command_in_text(text, command_tuple),
        r'$\mathbb{Q}$ is the field of rational numbers. $\mathbb{C}$ is the field of complex numbers')
# Multiple parameters
command_tuple = ('mat', 4, None, r'\left[\begin{array}{cc}#1 & #2 \\ #3 & #4\end{array}\right]')
# pattern = regex_pattern_detecting_command(command_tuple)
text = r'\mat{{123}}{asdfasdf{}{}}{{{}}}{{asdf}{asdf}{}}' # This is a balanced str.
test_eq(replace_command_in_text(text, command_tuple),
        r'\left[\begin{array}{cc}{123} & asdfasdf{}{} \\ {{}} & {asdf}{asdf}{}\end{array}\right]')
# Multiple parameters, the first of which is optional
command_tuple = ('plusbinomial', 3, '2', r'(#2 + #3)^#1')
# pattern = regex_pattern_detecting_command(command_tuple)
# When the optional argument is omitted (so the default value is used)
text = r'\plusbinomial{x}{y}'
test_eq(replace_command_in_text(text, command_tuple), r'(x + y)^2')
# When the optional argument is supplied
text = r'\plusbinomial[4]{x}{y}'
test_eq(replace_command_in_text(text, command_tuple), r'(x + y)^4')
# One parameter, which is optional
command_tuple = ('greet', 1, 'world', r'Hello #1!')
# pattern = regex_pattern_detecting_command(command_tuple)
# When the optional argument is omitted (so the default value is used)
text = r'\greet'
test_eq(replace_command_in_text(text, command_tuple), r'Hello world!')
# When the optional argument is supplied
text = r'\greet[govna]'
test_eq(replace_command_in_text(text, command_tuple), r'Hello govna!')
# In the following example, `\del` is a command defined as `\delta`.
# Invocations of `\delta` should NOT be replaced with `\deltata`.
command_tuple = (r'del', 0, None, r'\delta')
# pattern = regex_pattern_detecting_command(command_tuple)
text = r'\del should be replaced. \delta should not.'
test_eq(replace_command_in_text(text, command_tuple), r'\delta should be replaced. \delta should not.')
replace_commands_in_text
replace_commands_in_text (text:str, command_tuples:tuple[str,int,typing.Optional[str],str])
Replace all invocations of the specified commands in `text` with the commands' display texts, substituting the arguments of each invocation into the display text.
Assumes that `\1`, `\2`, `\3`, etc. are not part of the display texts.
| | Type | Details |
|---|---|---|
| text | str | The text in which to replace the commands. This should not include the preamble of a LaTeX document. |
| command_tuples | tuple | An output of `custom_commands`. Each tuple consists of 1. the name of the custom command, 2. the number of parameters, 3. the default argument if specified or `None` otherwise, and 4. the display text of the command. |
| Returns | str | |
text = r'''Here is a matrix over $\field{Q}$: $\mat{1/2}{2}{-1}{5/7}$.
Note that it is not over $\field{F}_7$ and not over $\field{F}_2$.'''
command_tuples = [
    ('field', 1, None, r'\mathbb{#1}'),
    ('mat', 4, None, r'\left[\begin{array}{cc}#1 & #2 \\ #3 & #4\end{array}\right]')]
sample_output = replace_commands_in_text(text, command_tuples)
test_eq(sample_output, r'''Here is a matrix over $\mathbb{Q}$: $\left[\begin{array}{cc}1/2 & 2 \\ -1 & 5/7\end{array}\right]$.
Note that it is not over $\mathbb{F}_7$ and not over $\mathbb{F}_2$.''')
replace_commands_in_latex_document
replace_commands_in_latex_document (docment:str)
Return the latex document (without the preamble) with invocations of custom commands/operators replaced with their display text.
Assumes that all custom commands and operators are defined in the preamble.
Assumes that, if a command with the same name is defined multiple times, only the last definition is used.
This even replaces invocations in commented-out text.
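The "last definition wins" assumption can be enforced on an output of `custom_commands` before any replacement takes place. This is a minimal sketch, assuming the command tuples are listed in the order in which they appear in the preamble; `keep_last_definitions` and the sample definitions are hypothetical.

```python
from typing import Optional

CommandTuple = tuple[str, int, Optional[str], str]


def keep_last_definitions(command_tuples: list[CommandTuple]) -> list[CommandTuple]:
    """Keep only the last definition of each command name (hypothetical helper)."""
    latest: dict[str, CommandTuple] = {}
    for command in command_tuples:
        latest[command[0]] = command  # a later definition overwrites an earlier one
    return list(latest.values())


# Hypothetical definitions in which \S is defined twice.
definitions = [('S', 0, None, r'\mathcal{S}'),
               ('field', 1, None, r'\mathbb{#1}'),
               ('S', 0, None, r'\mathbb{S}')]
print(keep_last_definitions(definitions))
```

Because Python dicts preserve insertion order, each command keeps the position of its first definition while taking the value of its last one.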
file = _test_directory() / 'latex_examples' / 'commands_recursive_example' / 'main.tex'
document = text_from_file(file)
commands_replaced = replace_commands_in_latex_document(document)
assert commands_replaced.startswith(r'\begin{document}')
assert commands_replaced.endswith(r'\end{document}')
assert r'\S' not in commands_replaced
assert r'\mathbb{S}1' in commands_replaced # Note that $\S$ is defined twice in the preamble; only the latter definition is used.
assert r'\field{Q}$' not in commands_replaced
assert r'\mathbb{Q}$' in commands_replaced
assert r'\commentedout' not in commands_replaced
assert r'This is actually a command that is commented out, but it is also replaced!' in commands_replaced
print(commands_replaced)
\begin{document}
$\mathbb{S}1$
%$\mathbf{Q}$
%$\mathbf{Q}$
%This is actually a command that is commented out, but it is also replaced!
$\mathbb{Q}$
\end{document}
Replace commonly used syntax
Obsidian does not render all LaTeX syntax. For example, `\( \)` and `\[ \]` are not recognized as math mode delimiters, and `\begin{equation} \end{equation}` and `\begin{align} \end{align}` (as well as their unnumbered versions with `*`) must be surrounded by `$$`.
adjust_common_syntax_to_markdown
adjust_common_syntax_to_markdown (text)
Adjust some common syntax, such as math mode delimiters and equation/align environments, for Markdown.
Assumes that the tokens for math mode delimiters (e.g. `\( \)` and `\[ \]`) are not used otherwise.
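The adjustments described above can be sketched with plain string replacement plus two regular-expression substitutions. This is an illustration, not trouver's implementation; `sketch_adjust_syntax` is a hypothetical name, and the set of wrapped environments (`equation`, `align`, `eqnarray` and their starred versions) is taken from the example below. The sketch also shows why the stated assumption matters: a LaTeX line break with spacing such as `\\[2pt]` contains `\[` and would be corrupted.

```python
import re

# Display environments to wrap in $$ ... $$ (an assumption based on the example below).
DISPLAY_ENVS = r'(?:equation|align|eqnarray)\*?'


def sketch_adjust_syntax(text: str) -> str:
    # Inline and display math delimiters become $ and $$.
    text = text.replace(r'\(', '$').replace(r'\)', '$')
    text = text.replace(r'\[', '$$').replace(r'\]', '$$')
    # Surround each display environment with $$.
    text = re.sub(r'\\begin\{(' + DISPLAY_ENVS + r')\}', r'$$\\begin{\1}', text)
    text = re.sub(r'\\end\{(' + DISPLAY_ENVS + r')\}', r'\\end{\1}$$', text)
    return text


sample = r'''I want to talk about \(\mathbb{Z}[i]\).
\[ (a+bi) (c+di) = (ac-bd) + (ad+bc)i.\]
\begin{align*}
5+6 = 11
\end{align*}'''
print(sketch_adjust_syntax(sample))
```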
text = r'''
I want to talk about \(\mathbb{Z}[i]\). It is the ring whose elements are of the form $a+bi$ where $a,b \in \mathhbb{Z}$.
It has a multiplication structure:
\[ (a+bi) (c+di) = (ac-bd) + (ad+bc)i.\]
Here is an equation:
\begin{equation}
5+7 = 12
\end{equation}
Here is another:
\begin{equation*}
5+6 = 11
\end{equation*}
Here is an align:
\begin{align}
5+7 = 12
\end{align}
Here is another:
\begin{align*}
5+6 = 11
\end{align*}
\begin{eqnarray}
asdf
\end{eqnarray}
'''
sample_output = adjust_common_syntax_to_markdown(text)
print(sample_output)
assert r'\(' not in sample_output
assert r'\)' not in sample_output
assert r'\[' not in sample_output
assert r'\]' not in sample_output
assert r'$$\begin{align}' in sample_output
assert r'\end{align}$$' in sample_output
assert r'$$\begin{equation}' in sample_output
assert r'\end{equation}$$' in sample_output
assert r'$$\begin{align*}' in sample_output
assert r'\end{align*}$$' in sample_output
assert r'$$\begin{equation*}' in sample_output
assert r'\end{equation*}$$' in sample_output
assert r'$$\begin{eqnarray}' in sample_output
assert r'\end{eqnarray}$$' in sample_output
I want to talk about $\mathbb{Z}[i]$. It is the ring whose elements are of the form $a+bi$ where $a,b \in \mathhbb{Z}$.
It has a multiplication structure:
$$ (a+bi) (c+di) = (ac-bd) + (ad+bc)i.$$
Here is an equation:
$$\begin{equation}
5+7 = 12
\end{equation}$$
Here is another:
$$\begin{equation*}
5+6 = 11
\end{equation*}$$
Here is an align:
$$\begin{align}
5+7 = 12
\end{align}$$
Here is another:
$$\begin{align*}
5+6 = 11
\end{align*}$$
$$\begin{eqnarray}
asdf
\end{eqnarray}$$
Setup an Obsidian vault reference
setup_reference_from_latex_parts
setup_reference_from_latex_parts (parts:list[tuple[str,str]], custom_commands:list[tuple[str,int,typing.Optional[str],str]], vault:os.PathLike, location:os.PathLike, reference_name:os.PathLike, authors:Union[str,list[str]], author_folder:os.PathLike='_mathematicians', references_folder:os.PathLike='_references', templates_folder:os.PathLike='_templates', template_file_name:str='_template_common', notation_index_template_file_name:str='_template_notation_index', glossary_template_file_name:str='_template_glossary', setup_temp_folder:bool=True, make_second_template_file_in_reference_directory:bool=True, copy_obsidian_configs:Optional[os.PathLike]='.obsidian', overwrite:Optional[str]=None, confirm_overwrite:bool=True, verbose:bool=False, replace_custom_commands:bool=True, adjust_common_latex_syntax_to_markdown:bool=True)
Set up a reference folder in `vault` using an output of `divide_latex_text`, create notes from `parts`, and link the notes in index files in the reference folder.
Assumes that
- `parts` is derived from a LaTeX document in which
    - all of the text belongs to sections, and
    - all of the sections/subsections are uniquely named.
- The template file has a section `# Topic`.
- The last line of the template file is a footnote indicating where the note comes from.
- There is at most one reference folder in the vault whose name is given by `reference_name`.
`parts` itself is not modified, even if `replace_custom_commands` and/or `adjust_common_latex_syntax_to_markdown` are set to `True`.
cf. `setup_folder_for_new_reference` for how the reference folder is set up.
The names of the subfolders of the reference folder are the section titles, except for sections with common titles such as Introduction, Notations, Conventions, Preliminaries, and Notations and Conventions. This ensures that the index files for such sections in different reference folders do not have the same name.
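One way to keep such names distinct across references is to prefix common titles with the reference name. The following is a minimal sketch of that idea. The prefixing rule and the helper subfolder_name are assumptions for illustration, not this module's confirmed behaviour, and the module's own convert_title_to_folder_name handles more characters than the whitespace-only conversion here.

```python
import re

# Titles that the text above lists as too common to use on their own.
COMMON_TITLES = {
    "introduction",
    "notations",
    "conventions",
    "preliminaries",
    "notations and conventions",
}


def subfolder_name(reference_name: str, section_title: str) -> str:
    """Hypothetical naming rule: prefix common titles with the reference name."""
    title = section_title.strip()
    if title.lower() in COMMON_TITLES:
        title = f"{reference_name} {title}"
    # Lowercase and join words with underscores (a simplification).
    return re.sub(r"\s+", "_", title.lower())
```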
Text/parts that precede explicitly given sections are included in the first section’s folder and are linked in the first section’s index file.
| | Type | Default | Details |
|---|---|---|---|
| parts | list | | Output of divide_latex_text |
| custom_commands | list | | Output of custom_commands applied to the preamble of the LaTeX document. |
| vault | PathLike | | An Obsidian.md vault. |
| location | PathLike | | The path to make the new reference folder. Relative to vault. |
| reference_name | PathLike | | The name of the new reference. |
| authors | typing.Union[str, list[str]] | | Each str is the family name of each author. |
| author_folder | PathLike | _mathematicians | The directory where the author files are stored. Relative to vault. |
| references_folder | PathLike | _references | The directory where the reference files are stored. Relative to vault. |
| templates_folder | PathLike | _templates | The directory where the template files are stored. Relative to vault. |
| template_file_name | str | _template_common | The template file on which to base the template file of the new reference. |
| notation_index_template_file_name | str | _template_notation_index | The template file on which to base the notation index file of the new reference. |
| glossary_template_file_name | str | _template_glossary | The template file on which to base the glossary file of the new reference. |
| setup_temp_folder | bool | True | If True, creates a _temp folder with an index file. This folder houses notes auto-created from LaTeX text files before they are moved to their correct directories. |
| make_second_template_file_in_reference_directory | bool | True | If True, creates a copy of the template note within the directory for the reference. |
| copy_obsidian_configs | typing.Optional[os.PathLike] | .obsidian | The folder, relative to vault, from which to copy Obsidian configs. If None, then no Obsidian configs are copied to the reference folder. |
| overwrite | typing.Optional[str] | None | Specifies if and how to overwrite the reference folder if it already exists. If 'w', deletes the contents of the existing reference folder, as well as the template and reference files, before creating the new reference folder. If 'a', overwrites the contents of the reference folder but does not remove existing files/folders. If None, does not modify the existing reference folder and raises a FileExistsError. |
| confirm_overwrite | bool | True | Specifies whether to confirm the deletion of the reference folder if it already exists and overwrite is 'w'. |
| verbose | bool | False | |
| replace_custom_commands | bool | True | If True, replaces the custom commands in the text of parts when making the notes. |
| adjust_common_latex_syntax_to_markdown | bool | True | If True, applies [adjust_common_syntax_to_markdown](https://hyunjongkimmath.github.io/trouver/latex.convert.html#adjust_common_syntax_to_markdown) to the text in parts when making the notes. |
| Returns | None | | |
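The three overwrite modes in the table can be illustrated on a plain directory. The following is a minimal sketch under stated assumptions: prepare_reference_folder is a hypothetical helper, it omits the confirmation prompt (confirm_overwrite) and the deletion of the template and reference files, and raising ValueError for an unknown mode is an assumption of this sketch rather than documented behaviour.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def prepare_reference_folder(folder: Path, overwrite: Optional[str] = None) -> None:
    """Hypothetical sketch of the documented overwrite modes for an existing folder."""
    if folder.exists():
        if overwrite is None:
            # None: leave the existing folder untouched and refuse to continue.
            raise FileExistsError(f"{folder} already exists")
        if overwrite == "w":
            # 'w': remove the existing contents before recreating the folder.
            shutil.rmtree(folder)
        elif overwrite != "a":
            # Assumption of this sketch: reject any other mode explicitly.
            raise ValueError(f"unsupported overwrite mode: {overwrite!r}")
        # 'a': keep existing files/folders; new files may replace same-named ones.
    folder.mkdir(parents=True, exist_ok=True)


def demo_overwrite_modes() -> tuple[bool, bool, bool]:
    """Exercise each mode on a throwaway folder and report what happened."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp) / "test_ref"
        prepare_reference_folder(folder)  # folder does not exist yet: created
        (folder / "old.md").write_text("old")
        try:
            prepare_reference_folder(folder)  # None: must refuse
            refused = False
        except FileExistsError:
            refused = True
        prepare_reference_folder(folder, "a")  # 'a': old.md survives
        kept = (folder / "old.md").exists()
        prepare_reference_folder(folder, "w")  # 'w': old.md is removed
        wiped = not (folder / "old.md").exists()
    return refused, kept, wiped
```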
with tempfile.TemporaryDirectory(prefix='temp_dir', dir=os.getcwd()) as temp_dir:
    # Copy the test vault so that the original test files are left untouched.
    temp_vault = Path(temp_dir) / 'test_vault_5'
    shutil.copytree(_test_directory() / 'test_vault_5', temp_vault)

    sample_latex_file = _test_directory() / 'latex_examples' / 'latex_example_with_untitled_subsections_setup_to_a_vault' / 'main.tex'
    sample_latex_text = text_from_file(sample_latex_file)
    preamble, _ = divide_preamble(sample_latex_text)
    parts = divide_latex_text(sample_latex_text)
    cust_comms = custom_commands(preamble)
    setup_reference_from_latex_parts(
        parts, cust_comms, temp_vault, Path(''),
        'test_ref',
        ['Kim', 'Kim', 'Kim'])
    # os.startfile(temp_vault)
    # input()
# TODO: give an example for a LaTeX document with a multiline section
# TODO: give an example for a LaTeX document with a section that must be sanitized first, e.g.
# in banwait_et_al_cnpgrg2c, there is a section of the string
# `\section{Exceptional maximal subgroups of
# \texorpdfstring{\(\GSp_4(\ff_\ell)\)}{GSp4Fell}}`
The following example demonstrates setting up a reference folder from a LaTeX document with significant content before any explicitly specified sections. In particular, the reference folder contains a subfolder dedicated to the content that comes before the explicitly specified sections.
with tempfile.TemporaryDirectory(prefix='temp_dir', dir=os.getcwd()) as temp_dir:
    # Copy the test vault so that the original test files are left untouched.
    temp_vault = Path(temp_dir) / 'test_vault_5'
    shutil.copytree(_test_directory() / 'test_vault_5', temp_vault)

    sample_latex_file = _test_directory() / 'latex_examples' / 'latex_example_with_content_before_sections' / 'main.tex'
    sample_latex_text = text_from_file(sample_latex_file)
    preamble, _ = divide_preamble(sample_latex_text)
    parts = divide_latex_text(sample_latex_text)
    cust_comms = custom_commands(preamble)
    setup_reference_from_latex_parts(
        parts, cust_comms, temp_vault, Path(''),
        'test_ref',
        ['Kim', 'Kim', 'Kim'])

    # Collect every directory in the new reference folder, relative to that folder.
    reference_folder = temp_vault / 'test_ref'
    subdirectories = list(reference_folder.glob('**'))
    relative_subdirectories = [
        os.path.relpath(subdirectory, reference_folder)
        for subdirectory in subdirectories]
    print("The following are the subdirectories of `reference_folder` (relative to `temp_vault`):")
    print(relative_subdirectories)
    # The content before the first section gets its own "untitled section" subfolder.
    assert convert_title_to_folder_name(f'test_ref {UNTITLED_SECTION_TITLE}') in relative_subdirectories
    # os.startfile(temp_vault)
    # input()
The following are the subdirectories of `reference_folder` (relative to `temp_vault`):
['.', '.obsidian', '.obsidian\\plugins', '.obsidian\\plugins\\fast-link-edit', '1_proof_of_theorem~refthmain', '1_proof_of_theorem~refthmain\\11_this_is_a_subsection', '1_proof_of_theorem~refthmain\\12_this_is_another_subsection', 'test_ref_untitled_section', '_temp']
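The output above uses Windows path separators (for example '.obsidian\\plugins') because os.path.relpath returns paths in the host operating system's format. The following is a minimal sketch of a platform-independent variant built on pathlib. relative_subdirectories is a hypothetical helper name, not a trouver function; it uses rglob('*') plus an is_dir() filter so that only directories are listed on every Python version.

```python
import tempfile
from pathlib import Path


def relative_subdirectories(folder: Path) -> list[str]:
    """List folder and all directories beneath it, relative to folder, with '/' separators."""
    # rglob('*') yields everything below folder (not folder itself), so add folder explicitly.
    directories = [folder] + [path for path in folder.rglob("*") if path.is_dir()]
    return sorted(directory.relative_to(folder).as_posix() for directory in directories)


def demo_relative_subdirectories() -> list[str]:
    """Build a small throwaway folder tree and list its directories."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp) / "test_ref"
        (folder / ".obsidian" / "plugins").mkdir(parents=True)
        (folder / "_temp").mkdir()
        (folder / "_temp" / "_index.md").write_text("")  # files are not listed
        return relative_subdirectories(folder)
```

With this variant, an assertion such as the one in the example above compares against names like '.obsidian/plugins' regardless of the operating system.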