latex.convert

Convert LaTeX files into Obsidian.md notes

This module contains functions and methods to automatically make Obsidian notes from LaTeX files of mathematical papers, most notably those on arXiv.

See the Potential Problems section below for some common errors that arise from this module and how to circumvent them.

import glob
import shutil
import tempfile


from fastcore.test import ExceptionExpected, test_eq
from pathvalidate import validate_filename
from pylatexenc.latexwalker import LatexEnvironmentNode
from trouver.helper import _test_directory, text_from_file

Potential problems

The following are some problems that frequently arise when using this module:

UnicodeDecodeErrors arise when reading LaTeX files

By default, the text_from_file method in trouver.helper reads files and attempts to decode them as UTF-8. If a LaTeX file contains characters that cannot be decoded as UTF-8, then a UnicodeDecodeError may be raised. In this case, one can identify these characters using the trouver.helper.non_utf8_chars_in_file method and modify the LaTeX file manually. It may be useful to use a text editor to jump to the positions of these characters and to change the encoding of the LaTeX file to UTF-8; for example, the author of trouver has opened some ANSI-encoded LaTeX documents in Notepad++ and converted their encoding to UTF-8.
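
As a rough illustration of locating undecodable bytes, the following standard-library sketch reports the offsets at which UTF-8 decoding fails. It is not the implementation of trouver.helper.non_utf8_chars_in_file; the function name find_non_utf8_positions is hypothetical.

```python
def find_non_utf8_positions(data: bytes) -> list[tuple[int, bytes]]:
    """Return (offset, offending_byte) pairs for bytes that break UTF-8 decoding.

    Hypothetical helper for illustration only; not part of trouver.
    """
    bad = []
    start = 0
    while start < len(data):
        try:
            # Try to decode everything from the current position onwards.
            data[start:].decode('utf-8')
            break
        except UnicodeDecodeError as error:
            # error.start is relative to the slice, so shift it by `start`.
            offset = start + error.start
            bad.append((offset, data[offset:offset + 1]))
            # Skip the offending byte and keep scanning.
            start = offset + 1
    return bad


# b'\xe9' is how "é" is stored in a Latin-1/ANSI-encoded file; it is not valid UTF-8 here.
sample = b'caf\xe9 ok'
positions = find_non_utf8_positions(sample)
```

The offsets are byte positions, which most text editors can jump to. The rescan after each failure makes this quadratic in the worst case, which is acceptable for a one-off check of a single file.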

NoDocumentNodeErrors arise even though the LaTeX file has a document environment (i.e. \begin{document}...\end{document})

The find_document_node method in this module is sometimes unable to detect the document environment of a LaTeX file. This error is known to arise when macros (including commands) are defined whose definitions contain text such as \begin{...} or \end{...}. For example, the file example_with_a_command_with_begin.tex used in the Get the Document Node section below has \begin{displaymath} in its preamble and triggers this error.

# TODO in the above explanation, include an example.

LaTeX comments



remove_comments

 remove_comments (text:str)
text = r"""% Commands with parameters
\newcommand{\field}[1]{\mathbb{#1}}
\newcommand{\mat}[4]{\left[\begin{array}{cc}#1 & #2 \\
                                         #3 & #4\end{array}\right]}
\newcommand{\dual}[1]{#1^{\vee}}
\newcommand{\compl}[1]{\hat{#1}}
"""
assert '%' not in remove_comments(text)
print(remove_comments(text))

text = r"""Hi. I'm not commented. %But I am!"""
test_eq(remove_comments(text), "Hi. I'm not commented. ")

\newcommand{\field}[1]{\mathbb{#1}}
\newcommand{\mat}[4]{\left[\begin{array}{cc}#1 & #2 \\
                                         #3 & #4\end{array}\right]}
\newcommand{\dual}[1]{#1^{\vee}}
\newcommand{\compl}[1]{\hat{#1}}
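
For readers who want to see the idea in isolation, here is a minimal regex-based sketch of comment removal. It is an assumption about one reasonable approach, not the implementation of remove_comments, and the name strip_latex_comments is hypothetical.

```python
import re


def strip_latex_comments(text: str) -> str:
    """Delete everything from an unescaped % to the end of its line.

    Hypothetical sketch; not the implementation of `remove_comments`.
    """
    # (?<!\\) skips a % that is preceded by a backslash, i.e. a literal \%.
    # `.` does not match newlines, so the line break itself is preserved.
    return re.sub(r'(?<!\\)%.*', '', text)


cleaned = strip_latex_comments(r"50\% of cases % this part is a comment")
```

This sketch treats a % after a double backslash (a line break followed by a comment) as escaped, and it does not special-case verbatim environments, so it should be read as an approximation.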

Divide LaTeX file into parts

To make Obsidian notes from a LaTeX file, I use sections/subsections and environments as places to make new notes.

Things to think about:

  • Sections/subsections
  • Environments, including theorems, corollaries, propositions, lemmas, definitions, notations
  • Citations
  • Macros defined in the preamble?

LatexMacroNodes include: sections/subsections, citations, references, and labels, e.g.

> \section{Introduction}
\cite{ellenberg2nilpotent}
\subsection{The section conjecture}
\'e
\ref{fundamental-exact-sequence}
\cite{stix2010period}
\ref{fundamental-exact-sequence}
\cite{stix2012rational}
\cite[Appendix C]{stix2010period}
\subsection{The tropical section conjecture}
\label{subsec:tropical-section-conjecture}

Divide the preamble from the rest of the document

Some macros and commands defined in the preamble seem to prevent the pylatexenc methods from properly identifying the document environment/node in a LaTeX document. To circumvent this, we define a function that divides the preamble from the rest of the document.



divide_preamble

 divide_preamble (text:str, document_environment_name:str='document')

Divide the preamble from the rest of a LaTeX document.

Type Default Details
text str LaTeX document
document_environment_name str document
Returns tuple
latex_file_path = _test_directory() / 'latex_examples' / 'example_with_a_command_with_begin.tex'
text = text_from_file(latex_file_path)

preamble, document = divide_preamble(text)
assert r'\begin{displaymath}' in preamble
assert r'Hyun Jong Kim' in preamble

assert r'Hyun Jong Kim' not in document
assert document.startswith(r'\begin{document}')
assert document.endswith('\\end{document}')
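
The idea of splitting at the document environment can be sketched with plain string searching. This is an illustrative assumption about how such a split might work, not the implementation of divide_preamble; the name split_at_document is hypothetical.

```python
def split_at_document(text: str, environment_name: str = 'document') -> tuple[str, str]:
    """Split `text` into (preamble, document environment).

    Hypothetical sketch; not the implementation of `divide_preamble`.
    """
    begin = f'\\begin{{{environment_name}}}'
    end = f'\\end{{{environment_name}}}'
    start = text.find(begin)    # first \begin{document}
    stop = text.rfind(end)      # last \end{document}
    if start == -1 or stop == -1:
        raise ValueError(f'No {environment_name} environment found.')
    return text[:start], text[start:stop + len(end)]


sample = (
    '\\documentclass{article}\n'
    '\\newcommand{\\bd}{\\begin{displaymath}}\n'  # a macro whose definition contains \begin{...}
    '\\begin{document}\nHello\n\\end{document}\n'
)
preamble, document = split_at_document(sample)
```

Because it only searches for literal strings, this sketch would be fooled by a commented-out \begin{document}; removing comments first avoids that.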

Get the Document Node



NoDocumentNodeError

 NoDocumentNodeError (text)

Exception raised when a LatexEnvironmentNode corresponding to the document environment is expected in a LaTeX string, but no such node exists.

Attributes

  • text - str
    • The text in which the document environment is not found.



find_document_node

 find_document_node (text:str, document_environment_name:str='document')

Find the LatexNode object for the main document in text.

Raises

  • NoDocumentNodeError
    • If the document environment node is not detected.

Type Default Details
text str LaTeX str
document_environment_name str document The name of the document environment.
Returns LatexEnvironmentNode

The main content of virtually all LaTeX math articles belongs to a document environment, which pylatexenc can often detect. The find_document_node function returns this LatexEnvironmentNode object:

latex_file_path = _test_directory() / 'latex_examples' / 'latex_example_1' / 'main.tex'
text = text_from_file(latex_file_path)
document_node = find_document_node(text)

If the LaTeX file has no document environment, then a NoDocumentNodeError is raised:

# This latex document has its `document` environment commented out.
latex_file_path = _test_directory() / 'latex_examples' / 'latex_example_2' / 'main.tex'
text = text_from_file(latex_file_path)
with ExceptionExpected(NoDocumentNodeError):
    document_node = find_document_node(text)

At the time of this writing, a NoDocumentNodeError may be raised even if the LaTeX file has a proper document environment:

latex_file_path = _test_directory() / 'latex_examples' / 'example_with_a_command_with_begin.tex'
text = text_from_file(latex_file_path)

# Perhaps in the future, pylatexenc will be able to find the document node for this file.
# When that time comes, delete this example.
with ExceptionExpected(NoDocumentNodeError):
    find_document_node(text)

The divide_preamble function can be used to circumvent this problem:

preamble, document = divide_preamble(text)
document_node = find_document_node(document)
test_eq(document_node.environmentname, 'document')
assert document_node.isNodeType(LatexEnvironmentNode)
# hide
# Find no document node error causes

# latex_file_path = r'_tests\latex_full\litt_cfag\main.tex'
# text = text_from_file(latex_file_path)
# document_node = find_document_node(text)

Detect environment names used in a file



environment_names_used

 environment_names_used (text:str)

Return the set of all environment names used in the main document of the latex code.

Type Details
text str LaTeX document
Returns set The set of all environment names used in the main document.

Writers often use different environment names. For example, writers often use theorem, thm, or theo for theorem environments, or lemma or lem for lemma environments. The environment_names_used function returns the environment names actually used in the tex file.

In the example below, note that only the environments that are actually used are returned. For instance, the preamble of the document defines the theorem-like environments problem and lemma (among others), but these are not actually used in the document itself.

latex_file_path = _test_directory() / 'latex_examples' / 'has_fully_written_out_environment_names.tex'
sample_text_1 = text_from_file(latex_file_path)
sample_output_1 = environment_names_used(sample_text_1)
test_eq({'corollary', 'proof', 'maincorollary', 'abstract', 'proposition'}, sample_output_1)

The document in the example below uses shorter names for theorem environments:

latex_file_path = _test_directory() / 'latex_examples' / 'has_shorter_environment_names.tex'
sample_text_2 = text_from_file(latex_file_path)
sample_output_2 = environment_names_used(sample_text_2)
test_eq({'conj', 'notation', 'corollary', 'defn'}, sample_output_2)
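
A simple way to approximate this behaviour without a LaTeX parser is to collect the arguments of \begin{...} in the document body. The sketch below is an assumption for illustration, not how environment_names_used is implemented; the name begin_names is hypothetical.

```python
import re


def begin_names(document_body: str) -> set[str]:
    """Collect environment names that appear in \\begin{...} in the body.

    Hypothetical sketch; not the implementation of `environment_names_used`.
    """
    names = set(re.findall(r'\\begin\{([^{}]+)\}', document_body))
    # The enclosing `document` environment itself is not of interest.
    names.discard('document')
    return names


body = r"""\begin{document}
\begin{thm} A statement. \end{thm}
\begin{proof} An argument. \end{proof}
\end{document}"""
used = begin_names(body)
```

Only the document body should be passed in; a preamble containing macros that expand to \begin{...} would otherwise add names that are never used as environments.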

Identify the numbering convention of a LaTeX document

LaTeX documents have various numbering conventions. Here are some examples of papers on the arXiv and notes on their numbering schemes. Note that the source code of these articles is publicly available on the arXiv.

  • Ellenberg, Venkatesh, and Westerland, Homological stability for Hurwitz spaces and the Cohen-Lenstra conjecture over function fields,
    • The subsections and theorem-like environments of each section share a numbering scheme, e.g. section 1 has subsection 1.1 The Cohen-Lenstra heuristics, 1.2 Theorem, 1.3 Hurwitz spaces. This is accomplished by defining theorem-like environments using the subsection counter, e.g.

      \theoremstyle{plain}
      \newtheorem{thm}[subsection]{Theorem}
      \newtheorem{prop}[subsection]{Proposition}
      \newtheorem{cor}[subsection]{Corollary}
      \newtheorem{remark}{Remark}
      \newtheorem{conj}[subsection]{Conjecture}
      \newtheorem*{conj*}{Conjecture}

      defines the thm, prop, cor, and conj environments to be numbered using the subsection counter, the remark environment to be numbered using its own counter, and the conj* environment to be an unnumbered environment with a different name than the conj environment.

    • The \swapnumbers command is included in the preamble to change the way that theorems are numbered in the document, e.g. the article has 1.2 Theorem as opposed to Theorem 1.2.

    • The equations are numbered along the subsections - this is accomplished by the lines

      \numberwithin{equation}{subsection}
      \renewcommand{\theequation}{\thesubsection.\arabic{equation}}

      in the preamble.

  • Hoyois, A quadratic refinement of the Grothendieck-Lefschetz-Verdier Trace Formula
    • The theorem-like environments are numbered Theorem 1.1, Theorem 1.3, Corollary 1.4, Theorem 1.5, etc.
      • The theorem-like environments that are numbered are assigned the equation counter. In particular, the equation environments share their numberings with the theorem-like environments. For example, section 1 has Equation (1.2) between Theorem 1.1 and Theorem 1.3.
      • This equation counter is reset at the beginning of each section and the section number is included in the numbering via
      \numberwithin{equation}{section}
# TODO: consider different arxiv articles to see how they are numbered


numbered_newtheorems_counters_in_preamble

 numbered_newtheorems_counters_in_preamble (document:str)

Return the dict specifying the numbered \newtheorem command invocations

Assumes that

  • invocations of the \newtheorem command are exclusively in the preamble of the LaTeX document.
  • theorem-like environments are defined using the \newtheorem command.
  • no environments of the same name are defined twice.

This function does not take into account numberwithins being used.

This function uses two separate regex patterns, one to detect the invocations of \newtheorem in which the optional parameter is the second parameter and one to detect those in which the optional parameter is the third parameter.

Type Details
document str The LaTeX document
Returns dict The keys are the command names of the environments. The value of a key is a tuple (<counter>, <reset_by_counter>), where <counter> is the counter that the environment belongs to, which can be custom defined or predefined in LaTeX, and <reset_by_counter> is a counter whose incrementation resets the counter of the environment, if available.

The numbered_newtheorems_counters_in_preamble function parses the preamble of a LaTeX document for invocations of the \newtheorem command and returns which counter each theorem-like environment belongs to.

text = text_from_file(_test_directory() / 'latex_examples' / 'newtheorem_example.tex') 
print(text)

counters = numbered_newtheorems_counters_in_preamble(text)
test_eq(counters,
    {'theorem': ('theorem', None), 'lemma': ('theorem', None), 'definition': ('theorem', None), 'corollary': ('corollary', None), 'remark': ('remark', 'theorem')})
\documentclass{article}
\usepackage{amsthm}

\newtheorem{theorem}{Theorem}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{definition}[theorem]{Definition} % Note that `theorem`, `lemma`, and `definition` all have `theorem` as their counter.
\newtheorem{corollary}{Corollary} % Note that `corollary` has its own counter.
\newtheorem{remark}{Remark}[theorem] % `remark` has `theorem` as its counter
\newtheorem*{conjecture*}{Conjecture} % `conjecture*` has no counter

\begin{document}

\section{Introduction}

\begin{theorem}
This is Theorem 1.
\end{theorem}

\begin{lemma}
This is Lemma 2.
\end{lemma}

\begin{definition}
This is Definition 3.
\end{definition}

\end{document}
text = r"""
\theoremstyle{plain}
\newtheorem{thm}[subsection]{Theorem}
\newtheorem{prop}[subsection]{Proposition}
\newtheorem{cor}[subsection]{Corollary}
\newtheorem{remark}{Remark}
\newtheorem{conj}[subsection]{Conjecture}
\newtheorem*{conj*}{Conjecture}
\begin{document}
\end{document}
"""
counters = numbered_newtheorems_counters_in_preamble(text)
test_eq(
    counters,
    {'thm': ('subsection', None), 'prop': ('subsection', None), 'cor': ('subsection', None), 'remark': ('remark', None), 'conj': ('subsection', None)})

numbered_newtheorems_counters_in_preamble ignores commented out text:

text = r"""
\theoremstyle{plain}
\newtheorem{thm}[subsection]{Theorem}
\newtheorem{prop}[subsection]{Proposition}
\newtheorem{cor}[subsection]{Corollary}

\newtheorem{conj}[subsection]{Conjecture}
\newtheorem*{conj*}{Conjecture} %\newtheorem{fakeenv}{This won't be picked up!}
\begin{document}
\end{document}
"""
counters = numbered_newtheorems_counters_in_preamble(text)
test_eq(
    counters,
    {'thm': ('subsection', None), 'prop': ('subsection', None), 'cor': ('subsection', None), 'conj': ('subsection', None)})

numbered_newtheorems_counters_in_preamble does not account for \numberwithin command invocations:

text = text_from_file(_test_directory() / 'latex_examples' / 'numbering_example_3_theorem_like_environments_share_counter_with_equation_and_reset_at_each_section' / 'main.tex')
print(text)
# So `numbered_newtheorems_counters_in_preamble` only considers the theorem-like
# environments as being counted by 'equation'.
# Note that the command  `\numberwithin{equation}{section}` resets the equation counter
# every time the `section` counter is incremented.
test_eq(numbered_newtheorems_counters_in_preamble(text), 
        {'theorem': ('equation', None), 'proposition': ('equation', None), 'lemma': ('equation', None), 'corollary': ('equation', None), 'definition': ('equation', None), 'example': ('equation', None), 'remark': ('equation', None)})
\documentclass{amsart}
\usepackage[utf8]{inputenc}
\usepackage{amsmath, amsfonts, amssymb, amsthm, amsopn}

\numberwithin{equation}{section}

\theoremstyle{plain}
\newtheorem*{theorem*}{Theorem}
\newtheorem*{theoremA}{Theorem A}
\newtheorem*{theoremB}{Theorem B}
\newtheorem{theorem}[equation]{Theorem}
\newtheorem{proposition}[equation]{Proposition}
\newtheorem{lemma}[equation]{Lemma}
\newtheorem{corollary}[equation]{Corollary}

\theoremstyle{definition}
\newtheorem{definition}[equation]{Definition}
\newtheorem{example}[equation]{Example}
\newtheorem*{acknowledgements}{Acknowledgements}
\newtheorem*{conventions}{Conventions}

\theoremstyle{remark}
\newtheorem{remark}[equation]{Remark}

\begin{document}

\section{Introduction}

\begin{theorem}
This is Theorem 1.1. This is because the \verb|\numberwithin{equation}{section}| makes the section number included in the equation counter and because the \\
\verb|\newtheorem{theorem}[equation]{Theorem}| command makes the environment \verb|theorem| be counted by the equation counter.
\end{theorem}

The following makes an equation labeled 1.2; 
\begin{equation}
5 + 7 = 12
\end{equation}

\begin{theorem*}
This Theorem is unnumbered
\end{theorem*}

\begin{corollary}
This is Corollary 1.3.
\end{corollary}

\section{Another section}
\begin{theorem}
This is theorem 2.1
\end{theorem}

The following is labeled 2.2:
\begin{equation}
3+5 = 8.
\end{equation}

\end{document}

The \newtheorem command can also specify that the counter of the newly defined theorem-like environment is reset whenever another counter is incremented; for example, \newtheorem{theorem}{Theorem}[section] defines a new environment named theorem (with display text Theorem) whose counter is reset whenever the section counter is incremented.

# TODO: reimplement the numbered_newtheorems_counters_in_preamble function to
# account for this example.
text = text_from_file(_test_directory() / 'latex_examples' / 'numbering_example_7_newtheorem_command_restarts_counter_by_section' / 'main.tex') 
print(text)
# Here, `theorem` and `proposition` each have their own counter, which is reset
# every time the `section` counter is incremented; `corollary` has its own counter,
# which is reset every time the `theorem` counter is incremented; and `lemma`
# shares the `theorem` counter.

test_eq(numbered_newtheorems_counters_in_preamble(text), 
        {'lemma': ('theorem', None), 'theorem': ('theorem', 'section'), 'corollary': ('corollary', 'theorem'), 'proposition': ('proposition', 'section')})

% Based on an example from https://www.overleaf.com/learn/latex/Theorems_and_proofs#Numbered_theorems.2C_definitions.2C_corollaries_and_lemmas

\documentclass[12 pt]{amsart}

\newtheorem{theorem}{Theorem}[section]
\newtheorem{corollary}{Corollary}[theorem]
\newtheorem{lemma}[theorem]{Lemma}
% Note that the below invocation of \newtheorem is invalid:
% \newtheorem{proposition}[theorem]{Proposition}[section]
\newtheorem{proposition}{Proposition}[section]

\begin{document}
\section{Introduction}
Theorems can easily be defined:

\begin{theorem}
Let \(f\) be a function whose derivative exists in every point, then \(f\) is 
a continuous function.
\end{theorem}

\begin{theorem}[Pythagorean theorem]
\label{pythagorean}
This is a theorem about right triangles and can be summarised in the next 
equation 
\[ x^2 + y^2 = z^2 \]
\end{theorem}

And a consequence of theorem \ref{pythagorean} is the statement in the next 
corollary.

\begin{corollary}
There's no right rectangle whose sides measure 3cm, 4cm, and 6cm.
\end{corollary}

You can reference theorems such as \ref{pythagorean} when a label is assigned.

\begin{lemma}
Given two line segments whose lengths are \(a\) and \(b\) respectively there is a 
real number \(r\) such that \(b=ra\).
\end{lemma}

\end{document}



numberwithins_in_preamble

 numberwithins_in_preamble (document:str)

Return the dict describing numberwithin commands invoked in the preamble of document.

Type Details
document str The LaTeX document
Returns dict The keys are the first arguments of numberwithin invocations and the values are the second arguments of numberwithin invocations.
text = text_from_file(_test_directory() / 'latex_examples' / 'numbering_example_3_theorem_like_environments_share_counter_with_equation_and_reset_at_each_section' / 'main.tex')
print(text)
test_eq(numberwithins_in_preamble(text), {'equation': 'section'})
\documentclass{amsart}
\usepackage[utf8]{inputenc}
\usepackage{amsmath, amsfonts, amssymb, amsthm, amsopn}

\numberwithin{equation}{section}

\theoremstyle{plain}
\newtheorem*{theorem*}{Theorem}
\newtheorem*{theoremA}{Theorem A}
\newtheorem*{theoremB}{Theorem B}
\newtheorem{theorem}[equation]{Theorem}
\newtheorem{proposition}[equation]{Proposition}
\newtheorem{lemma}[equation]{Lemma}
\newtheorem{corollary}[equation]{Corollary}

\theoremstyle{definition}
\newtheorem{definition}[equation]{Definition}
\newtheorem{example}[equation]{Example}
\newtheorem*{acknowledgements}{Acknowledgements}
\newtheorem*{conventions}{Conventions}

\theoremstyle{remark}
\newtheorem{remark}[equation]{Remark}

\begin{document}

\section{Introduction}

\begin{theorem}
This is Theorem 1.1. This is because the \verb|\numberwithin{equation}{section}| makes the section number included in the equation counter and because the \\
\verb|\newtheorem{theorem}[equation]{Theorem}| command makes the environment \verb|theorem| be counted by the equation counter.
\end{theorem}

The following makes an equation labeled 1.2; 
\begin{equation}
5 + 7 = 12
\end{equation}

\begin{theorem*}
This Theorem is unnumbered
\end{theorem*}

\begin{corollary}
This is Corollary 1.3.
\end{corollary}

\section{Another section}
\begin{theorem}
This is theorem 2.1
\end{theorem}

The following is labeled 2.2:
\begin{equation}
3+5 = 8.
\end{equation}

\end{document}
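
Since \numberwithin always takes two braced arguments, extracting the pairs is a small regex exercise. The following sketch is an assumption about one way to do it and is not the implementation of numberwithins_in_preamble; the name find_numberwithins is hypothetical.

```python
import re


def find_numberwithins(preamble: str) -> dict[str, str]:
    """Map each counter to the counter it is numbered within.

    Hypothetical sketch; not the implementation of `numberwithins_in_preamble`.
    """
    text = re.sub(r'(?<!\\)%.*', '', preamble)  # ignore commented-out text
    pairs = re.findall(r'\\numberwithin\{([^{}]+)\}\{([^{}]+)\}', text)
    return dict(pairs)


sample = r"""
\usepackage{amsmath}
\numberwithin{equation}{section}
% \numberwithin{figure}{section}
"""
result = find_numberwithins(sample)
```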

Getting the display names of environments

For example, \newtheorem{theorem}{Theorem} defines a theorem-like environment called theorem whose display name is Theorem.



display_names_of_environments

 display_names_of_environments (document:str)

Return the dict specifying the display names for each theorem-like environment.

This function uses two separate regex patterns, one to detect the invocations of \newtheorem in which the optional parameter is the second parameter and one to detect those in which the optional parameter is the third parameter.

Assumes that

  • invocations of the \newtheorem command are exclusively in the preamble of the LaTeX document.
  • theorem-like environments are defined using the \newtheorem command.
  • no environments of the same name are defined twice.

Type Details
document str The LaTeX document
Returns dict
text = text_from_file(_test_directory() / 'latex_examples' / 'newtheorem_example.tex') 
print(text)
display_names = display_names_of_environments(text)
test_eq(display_names, {'theorem': 'Theorem',
 'lemma': 'Lemma',
 'definition': 'Definition',
 'corollary': 'Corollary',
 'conjecture*': 'Conjecture',
 'remark': 'Remark'})
\documentclass{article}
\usepackage{amsthm}

\newtheorem{theorem}{Theorem}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{definition}[theorem]{Definition} % Note that `theorem`, `lemma`, and `definition` all have `theorem` as their counter.
\newtheorem{corollary}{Corollary} % Note that `corollary` has its own counter.
\newtheorem{remark}{Remark}[theorem] % `remark` has `theorem` as its counter
\newtheorem*{conjecture*}{Conjecture} % `conjecture*` has no counter

\begin{document}

\section{Introduction}

\begin{theorem}
This is Theorem 1.
\end{theorem}

\begin{lemma}
This is Lemma 2.
\end{lemma}

\begin{definition}
This is Definition 3.
\end{definition}

\end{document}
text = text_from_file(_test_directory() / 'latex_examples' / 'numbering_example_7_newtheorem_command_restarts_counter_by_section' / 'main.tex') 
print(text)
display_names = display_names_of_environments(text)
test_eq(display_names,
{'theorem': 'Theorem',
 'corollary': 'Corollary',
 'lemma': 'Lemma',
 'proposition': 'Proposition',})

% Based on an example from https://www.overleaf.com/learn/latex/Theorems_and_proofs#Numbered_theorems.2C_definitions.2C_corollaries_and_lemmas

\documentclass[12 pt]{amsart}

\newtheorem{theorem}{Theorem}[section]
\newtheorem{corollary}{Corollary}[theorem]
\newtheorem{lemma}[theorem]{Lemma}
% Note that the below invocation of \newtheorem is invalid:
% \newtheorem{proposition}[theorem]{Proposition}[section]
\newtheorem{proposition}{Proposition}[section]

\begin{document}
\section{Introduction}
Theorems can easily be defined:

\begin{theorem}
Let \(f\) be a function whose derivative exists in every point, then \(f\) is 
a continuous function.
\end{theorem}

\begin{theorem}[Pythagorean theorem]
\label{pythagorean}
This is a theorem about right triangles and can be summarised in the next 
equation 
\[ x^2 + y^2 = z^2 \]
\end{theorem}

And a consequence of theorem \ref{pythagorean} is the statement in the next 
corollary.

\begin{corollary}
There's no right rectangle whose sides measure 3cm, 4cm, and 6cm.
\end{corollary}

You can reference theorems such as \ref{pythagorean} when a label is assigned.

\begin{lemma}
Given two line segments whose lengths are \(a\) and \(b\) respectively there is a 
real number \(r\) such that \(b=ra\).
\end{lemma}

\end{document}
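
In every form of \newtheorem, the display name is the second braced argument, so a single pattern with an optional bracketed argument in between is enough for a sketch. Note that this deliberately differs from the two-pattern approach described in the docstring; the pattern and the name find_display_names are assumptions for illustration.

```python
import re

# \newtheorem, optionally starred, then {name}, an optional [shared_counter], then {Display}.
NEWTHEOREM = re.compile(r'\\newtheorem\*?\{([^{}]+)\}(?:\[[^\[\]]+\])?\{([^{}]*)\}')


def find_display_names(preamble: str) -> dict[str, str]:
    """Map each theorem-like environment name to its display name.

    Hypothetical sketch; not the implementation of `display_names_of_environments`.
    """
    text = re.sub(r'(?<!\\)%.*', '', preamble)  # ignore commented-out text
    return dict(NEWTHEOREM.findall(text))


sample = r"""
\newtheorem{theorem}{Theorem}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{remark}{Remark}[theorem]
\newtheorem*{conjecture*}{Conjecture}
"""
names = find_display_names(sample)
```

As with the counter sketch, display names containing nested braces are not handled.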

Divide latex text into parts

sample_counters = _setup_counters(
    {'thm': ('subsection', None), 'prop': ('subsection', None), 'cor': ('subsection', None), 'remark': ('remark', None), 'conj': ('subsection', None)})
assert 'remark' in sample_counters
test_eq(sample_counters['remark'], 0)
assert 'thm' not in sample_counters  # 'thm' is an environment name, but not a counter.


get_node_from_simple_text

 get_node_from_simple_text (text:str)

Return the (first) LatexNode object from a str.

text = r"""\begin{thm}This is a theorem. \end{thm}"""
node = get_node_from_simple_text(text)
assert isinstance(node, LatexEnvironmentNode)
test_eq(node.environmentname, 'thm')


text = r"""\begin{thm}This is a theorem. \end{thm} \begin{proof} This is a proof. It is not captured by the `get_node_from_simple_text` function \end{proof}"""
node = get_node_from_simple_text(text)
assert isinstance(node, LatexEnvironmentNode)
test_eq(node.environmentname, 'thm')
text = r"""\begin{thm}This is a theorem. \end{thm}"""
node = get_node_from_simple_text(text)
# Test a theorem being counted by its own counter.
numbertheorem_counters = {'thm': ('thm', None)}
numberwithins = {}
counters = {'thm': 1}
sample_numbering = _node_numbering(
    node, numbertheorem_counters, numberwithins, counters)
test_eq(sample_numbering, '1')
# Test a theorem being counted by the equation counter.
numbertheorem_counters = {'thm': ('equation', None)}
numberwithins = {}
counters = {'equation': 2}
sample_numbering = _node_numbering(
    node, numbertheorem_counters, numberwithins, counters)
test_eq(sample_numbering, '2')

text = r"""\begin{corollary}This is a corollary. \end{corollary}"""
node = get_node_from_simple_text(text)
# Test a theorem-like environment being counted by the counter of
# another theorem-like environment
numbertheorem_counters = {'corollary': ('theorem', None), 'theorem': ('theorem', None)}
numberwithins = {}
counters = {'theorem': 0}
sample_numbering = _node_numbering(
    node, numbertheorem_counters, numberwithins, counters)
test_eq(sample_numbering, '0')

# Test a theorem-like environment whose counter is numbered within
# the section counter.
# First, see what happens when a theorem is encountered:
text = r"""\begin{theorem}This is a theorem. \end{theorem}"""
node = get_node_from_simple_text(text)
numbertheorem_counters = {'theorem': ('theorem', None)}
numberwithins = {'theorem': 'section'}
counters = {'section': 1, 'theorem': 0}
sample_numbering = _node_numbering(
    node, numbertheorem_counters, numberwithins, counters)
test_eq(sample_numbering, '1.0')

# Next, see what happens when a new section is invoked:
text = r"""\section{New section! The theorem counter should be reset}"""
node = get_node_from_simple_text(text)
sample_numbering = _node_numbering(
    node, numbertheorem_counters, numberwithins, counters)
test_eq(sample_numbering, '1')

# Test a theorem-like environment sharing a counter with equation
# and in turn equation is numbered within section.
text = r"""\begin{theorem}This is a theorem. \end{theorem}"""
node = get_node_from_simple_text(text)
numbertheorem_counters = {'theorem': ('equation', None)}
numberwithins = {'equation': 'section'}
counters = {'section': 1, 'equation': 0}
sample_numbering = _node_numbering(
    node, numbertheorem_counters, numberwithins, counters)
test_eq(sample_numbering, '1.0')
# Next, see what happens when a new section is invoked:
text = r"""\section{New section! The theorem counter should be reset}"""
node = get_node_from_simple_text(text)
sample_numbering = _node_numbering(
    node, numbertheorem_counters, numberwithins, counters)
test_eq(sample_numbering, '1')
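
The tests above show numberings such as '1.0' being assembled from a counter and the counter it is numbered within. The sketch below illustrates only that assembly step under stated assumptions; it is not the private _node_numbering function, and the names format_number and step_counter are hypothetical.

```python
def format_number(counter: str, counters: dict[str, int],
                  numberwithins: dict[str, str]) -> str:
    """Build a dotted number such as '2.1' by following the numberwithin chain.

    Hypothetical sketch; assumes the chain has no cycles.
    """
    parts = [str(counters[counter])]
    parent = numberwithins.get(counter)
    while parent is not None:
        parts.append(str(counters[parent]))
        parent = numberwithins.get(parent)
    return '.'.join(reversed(parts))


def step_counter(counter: str, counters: dict[str, int],
                 numberwithins: dict[str, str]) -> None:
    """Increment `counter` and reset every counter numbered within it."""
    counters[counter] += 1
    for child, parent in numberwithins.items():
        if parent == counter:
            counters[child] = 0


counters = {'section': 1, 'equation': 0}
numberwithins = {'equation': 'section'}
step_counter('equation', counters, numberwithins)   # first theorem of section 1
first = format_number('equation', counters, numberwithins)
step_counter('section', counters, numberwithins)    # new section resets `equation`
step_counter('equation', counters, numberwithins)
second = format_number('equation', counters, numberwithins)
```

Here first is '1.1' and second is '2.1', matching the numbering of the theorems in the \numberwithin{equation}{section} example earlier on this page.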


swap_numbers_invoked

 swap_numbers_invoked (preamble:str)

Returns True if \swapnumbers is in the preamble.

Assume that a mention of \swapnumbers is an actual invocation.

Type Details
preamble str
Returns bool
assert swap_numbers_invoked(r'\swapnumbers')
assert not swap_numbers_invoked(r'''
\documentclass{article}
\usepackage{amsthm}


\newtheorem{theorem}{Theorem} % \swapnumbers
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{definition}[theorem]{Definition}
\newtheorem*{remark*}{Remark}''')
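
The second assertion above passes because the only mention of \swapnumbers sits in a comment. A minimal sketch of that behaviour, assuming comments are stripped before a plain substring check (the name uses_swapnumbers is hypothetical and this is not necessarily how swap_numbers_invoked works):

```python
import re


def uses_swapnumbers(preamble: str) -> bool:
    """Return True if \\swapnumbers appears outside of comments.

    Hypothetical sketch; not the implementation of `swap_numbers_invoked`.
    """
    text = re.sub(r'(?<!\\)%.*', '', preamble)  # drop commented-out text
    return r'\swapnumbers' in text
```

Note the raw strings: in a regular string literal, `'\s'` is an invalid escape sequence that recent Python versions warn about.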


divide_latex_text

 divide_latex_text (document:str,
                    environments_to_not_divide_along:list[str]=['equation'
                    , 'equation*', 'proof', 'align', 'align*',
                    'enumerate', 'itemize', 'label', 'eqnarray', 'quote',
                    'tabular', 'table'])

Divide LaTeX text to convert into Obsidian.md notes.

Assumes that the counters in the LaTeX document are either the predefined ones or specified by the \newtheorem command.

This function does not divide out \subsubsection’s.

Proof environments are assigned to the same parts as their preceding theorem-like environments, if available.

TODO: Implement counters specified by \newcounter, cf.  https://www.overleaf.com/learn/latex/Counters#LaTeX_commands_for_working_with_counters.

Type Default Details
document str
environments_to_not_divide_along list ['equation', 'equation*', 'proof', 'align', 'align*', 'enumerate', 'itemize', 'label', 'eqnarray', 'quote', 'tabular', 'table'] A list of the names of the environments along which to not make a new note, unless the environment starts a section (or the entire document).
Returns list Each entry is of the form (note_title, text), where note_title often encapsulates the note type (i.e. section/subsection/display text of a theorem-like environment) along with the numbering and text is the text of the part. Sometimes note_title is just a number, which means that text is not of a \section or \subsection command and not of a theorem-like environment.
# TODO: explain examples
file = _test_directory() / 'latex_examples' / 'numbering_example_6' / 'main.tex'
sample_latex_text = text_from_file(file)
preamble, document = divide_preamble(sample_latex_text)
parts = divide_latex_text(sample_latex_text)
file = _test_directory() / 'latex_examples' / 'divide_latex_example_proof_preceded_by_theorem' / 'main.tex'
sample_latex_text = text_from_file(file)
parts = divide_latex_text(sample_latex_text)
print(parts)
test_eq(len(parts), 2)
[['1. Some section', '\\section{Some section}'], ['Theorem 1.', '\\begin{theorem}\nThis is a theorem.\n\\end{theorem}\\begin{proof}\nThis is a proof\n\\end{proof}']]
# sample_latex_file = Path(r'C:\Users\hyunj\Documents\Math\latex_image_data\latex_full\ellenberg_venkatesh_westerland_hshsclhff\main.tex')
# sample_latex_text = text_from_file(sample_latex_file)
# preamble, document = divide_preamble(sample_latex_text)
# parts = divide_latex_text(sample_latex_text)
file = _test_directory() / 'latex_examples' / 'divide_latex_example_text_preceded_by_undivided_environment' / 'main.tex'
sample_latex_text = text_from_file(file)
preamble, document = divide_preamble(sample_latex_text)
parts = divide_latex_text(document)
print(parts)
test_eq(len(parts), 2)
[['1. CONTENTS', '\\section{CONTENTS}'], ['1', "\\begin{enumerate}\n  \\item Introduction 2\n\n  \\item Preliminaries $\\quad 7$\n\n\\end{enumerate}\n2.1. Categorical preliminaries $\\quad 7$\n\n2.2. On the motivic Spanier-Whitehead category and Milnor-Witt K-theory 8\n\n2.3. $\\mathbb{A}^{1}$-derived category and $\\mathbb{A}^{1}$-homology 9\n\n\\begin{enumerate}\n  \\setcounter{enumi}{3}\n  \\item $\\mathbb{A}^{1}$-Spanier-Whitehead category of cellular smooth schemes 11\n\\end{enumerate}\n3.1. Cellular schemes 12\n\n3.2. Cellular Spanier-Whitehead category 13\n\n\\begin{enumerate}\n  \\setcounter{enumi}{4}\n  \\item The cellular homology of Morel-Sawant on cellular Thom spaces 14\n\n  \\item Spanier-Whitehead cellular complex 18\n\n\\end{enumerate}\n5.1. Definitions and basic properties 18\n\n5.2. Endomorphisms, traces, and characteristic polynomials 19\n\n5.3. Cellular Grothendieck-Lefschetz Trace Formula 21\n\n\\begin{enumerate}\n  \\setcounter{enumi}{6}\n  \\item Rationality of the zeta function 22\n\n  \\item $\\mathbb{A}^{1}$-logarithmic zeta functions and real points 24\n\n  \\item Computing $\\mathbb{A}^{1}$-logarithmic zeta functions and examples 27\n\n\\end{enumerate}\n8.1. $\\mathbb{A}^{1}$-zeta function of Frobenius endomorphisms using Theorem $6.2 \\quad 28$\n\n8.2. $\\mathbb{A}^{1}$-logarithmic zeta functions via Hoyois's trace formula 31\n\n8.3. The logarithmic zeta function of non-cellular schemes 34\n\n\\begin{enumerate}\n  \\setcounter{enumi}{9}\n  \\item Motivic measures 34\n\\end{enumerate}\n9.1. The $\\mathbb{A}^{1}$-trace as a motivic measure 35\n\n9.2. The enriched zeta function as a motivic measure 35\n\nReferences 36\n\nDate: October 3, 2022."]]
file = _test_directory() / 'latex_examples' / 'divide_latex_example_2' / 'main.tex'
sample_latex_text = text_from_file(file)
preamble, document = divide_preamble(sample_latex_text)
parts = divide_latex_text(document)
print(parts)
[['1', '\\maketitle'], ['abstract', '\\begin{abstract}\nThis is an abstract\n\\end{abstract}']]

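The divide_latex_text function divides LaTeX text into parts. Each part is a pair consisting of a title (such as a section title or a numbered environment name) and the corresponding LaTeX text.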

file = _test_directory() / 'latex_examples' / 'numbering_example_1_consecutive_numbering_scheme' / 'main.tex'
text = text_from_file(file)
sample_output = divide_latex_text(text)
print(sample_output)
assert sample_output[0][0] == '1'
assert sample_output[1][0] == '1. Introduction'
assert sample_output[2][0] == 'Theorem 1.'
assert sample_output[3][0] == 'Corollary 2.'
assert sample_output[4][0] == 'Remark'
[['1', 'For this document, the `theorem` counter is not reset whenever a new section begins.\n\nA similar numbering scheme can be accomplished by importing \\verb|amsmath| and invoking the code \\verb|\\numberwithin{theorem}{part}| in the preamble.'], ['1. Introduction', '\\section{Introduction}'], ['Theorem 1.', '\\begin{theorem}\nThis is Theorem 1.\n\\end{theorem}'], ['Corollary 2.', '\\begin{corollary}\nThis is Corollary 2.\n\\end{corollary}'], ['Remark', '\\begin{remark*}\nThis is a remark. It is unnumbered and it does not affect the numberings of other environments.\n\\end{remark*}'], ['Definition 3.', '\\begin{definition}\nThis is Definition 3.\n\\end{definition}'], ['2. Another Section', '\\section{Another Section}'], ['Theorem 4.', '\\begin{theorem}\nThis is Theorem 4.\n\\end{theorem}'], ['2', 'And we might get a corollary!'], ['Corollary 5.', '\\begin{corollary}\nThis is Corollary 5.\n\\end{corollary}'], ['Definition 6.', '\\begin{definition}\nThis is Definition 6.\n\\end{definition}']]
file = _test_directory() / 'latex_examples' / 'numbering_example_2_numbering_scheme_reset_at_each_section' / 'main.tex'
text = text_from_file(file)
print(divide_latex_text(text))
[['1', 'This document resets its `theorem` counter whenever a new section begins.'], ['1. Introduction', '\\section{Introduction}'], ['Theorem 1.1.', '\\begin{theorem}\nThis is Theorem 1.1.\n\\end{theorem}'], ['Corollary 1.2.', '\\begin{corollary}\nThis is Corollary 1.2.\n\\end{corollary}'], ['Remark', '\\begin{remark*}\nThis is a remark. It is unnumbered and it does not affect the numberings of other environments.\n\\end{remark*}'], ['Definition 1.3.', '\\begin{definition}\nThis is Definition 1.3.\n\\end{definition}'], ['2. Another Section', '\\section{Another Section}'], ['Theorem 2.1.', '\\begin{theorem}\nThis is Theorem 2.1.\n\\end{theorem}'], ['Corollary 2.2.', '\\begin{corollary}\nThis is Corollary 2.2.\n\\end{corollary}'], ['Definition 2.3.', '\\begin{definition}\nThis is Definition 2.3.\n\\end{definition}']]
file = _test_directory() / 'latex_examples' / 'numbering_example_3_theorem_like_environments_share_counter_with_equation_and_reset_at_each_section' / 'main.tex'
text = text_from_file(file)
print(divide_latex_text(text))
[['1. Introduction', '\\section{Introduction}'], ['Theorem 1.1.', '\\begin{theorem}\nThis is Theorem 1.1. This is because the \\verb|\\numberwithin{equation}{section}| makes the section number included in the equation counter and because the \\\\\n\\verb|\\newtheorem{theorem}[equation]{Theorem}| command makes the environment \\verb|theorem| be counted by the equation counter.\n\\end{theorem}'], ['1', 'The following makes an equation labeled 1.2; \n\\begin{equation}\n5 + 7 = 12\n\\end{equation}'], ['Theorem', '\\begin{theorem*}\nThis Theorem is unnumbered\n\\end{theorem*}'], ['Corollary 1.2.', '\\begin{corollary}\nThis is Corollary 1.3.\n\\end{corollary}'], ['2. Another section', '\\section{Another section}'], ['Theorem 2.1.', '\\begin{theorem}\nThis is theorem 2.1\n\\end{theorem}'], ['2', 'The following is labeled 2.2:\n\\begin{equation}\n3+5 = 8.\n\\end{equation}']]
file = _test_directory() / 'latex_examples' / 'numbering_example_4_unnumbered_section' / 'main.tex'
text = text_from_file(file)
print(divide_latex_text(text))
[['1. This is section 1', '\\section{This is section 1}'], ['Theorem 1.1.', '\\begin{theorem}\nThis is Theorem 1.1.\n\\end{theorem}'], ['1.1. This is a subsection 1.1', '\\subsection{This is a subsection 1.1}'], ['1', 'The following makes an equation labeled 1; \n\\begin{equation}\n5 + 7 = 12\n\\end{equation}'], ['Theorem', '\\begin{theorem*}\nThis Theorem is unnumbered\n\\end{theorem*}'], ['1.2. This is subsection 1.2', '\\subsection{This is subsection 1.2}'], ['Corollary 1.2.', '\\begin{corollary}\nThis is Corollary 1.2.\n\\end{corollary}'], ['Unnumbered section', '\\section*{Unnumbered section}'], ['1.1. This is subsection 1.3', '\\subsection{This is subsection 1.3}'], ['2', '\\subsubsection{This is subsubsection 1.3.1}'], ['Theorem 1.1.', '\\begin{theorem}\nThis is Theorem 1.3.\n\\end{theorem}'], ['2. Another section', '\\section{Another section}'], ['2.1. This is subsection 2.1', '\\subsection{This is subsection 2.1}'], ['Theorem 2.1.', '\\begin{theorem}\nThis is Theorem 2.1\n\\end{theorem}'], ['3', 'The following is labeled 2; note that the equation counter was not reset by section:\n\\begin{equation}\n3+5 = 8.\n\\end{equation}']]
file = _test_directory() / 'latex_examples' / 'numbering_example_5_subsections_and_theorem_like_environments_share_counter' / 'main.tex'
text = text_from_file(file)
sample_output = divide_latex_text(text)
print(divide_latex_text(text))
test_eq(sample_output[4][0], '1. Remark.')
test_eq(sample_output[5][0], 'Remark')
[['1. This is section 1', '\\section{This is section 1}'], ['1.1. Theorem.', '\\begin{thm}\nThis is 1.1. Theorem. Note that the \\verb|\\swapnumbers| command is invoked in the preamble.\n\\end{thm}'], ['1.2. This is 1.2. subsection.', '\\subsection{This is 1.2. subsection.}'], ['1', 'Note that the equation counter is numbered within the subsection counter and that the theorem-like environments are numbered with the equation counter.\n\n\\subsubsection{This is 1.2.1. Subsubsection}'], ['1. Remark.', '\\begin{remark}\nThis is an 1. Remark. Note that \\verb|\\remark| has a counter separate from those of many of the other theorem-like environments.\n\\end{remark}'], ['Remark', '\\begin{rem*}\nThis is an unnumbered Remark.\n\\end{rem*}'], ['1.3. Proposition.', '\\begin{prop}\nThis is 1.3. Proposition.\n\\end{prop}'], ['Unnumbered section', '\\section*{Unnumbered section}'], ['1.1. Theorem.', '\\begin{thm}\nThis is 1.4. Theorem.\n\\end{thm}'], ['2. This is Section 2', '\\section{This is Section 2}'], ['2.1. Theorem.', '\\begin{thm}\nThis is 2.1. Theorem\n\\end{thm}']]

Note that part titles are stripped and are single-lined:
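As a minimal sketch of what "stripped and single-lined" means, the hypothetical helper below collapses every run of whitespace (including line breaks) in a title into a single space. The name `single_line_title` is an assumption made for illustration; it is not necessarily how trouver normalizes titles internally.

```python
def single_line_title(title: str) -> str:
    """Collapse all runs of whitespace, including newlines, into single spaces.

    Hypothetical helper for illustration; not part of trouver.
    """
    # str.split() with no arguments splits on any whitespace and drops
    # leading/trailing whitespace, so joining with ' ' strips and flattens.
    return ' '.join(title.split())


multiline = '  Exceptional maximal subgroups of \n\\texorpdfstring{X}{Y}  '
flattened = single_line_title(multiline)
print(flattened)
```

After this normalization, `flattened.strip() == flattened` holds and the title contains no newline characters.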

# TODO: fill in the following example
# part = parts[...]
# assert part[0].strip() == part[0]
# TODO: example with a multilined section title forced to single-lined:
# e.g. `\section{Exceptional maximal subgroups of 
# \texorpdfstring{\(\GSp_4(\ff_\ell)\)}{GSp4Fell}}`
# TODO: Find a list of environment names commonly used.
# TODO: examples with different numbering convention and different numbered environments
# TODO: make numbering_convention work correctly.
# Here are some latex files with different conventions:
# - All subsections in a section share numbering, 
#   - achter_pries_imht https://arxiv.org/abs/math/0608038: e.g. Lemmas 2.1, 2.2, 2.3 are in subsection 2.2 and Lemma 2.4 and Remark 2.5 are in subsection 2.4.
#   - pauli_wickelgren https://arxiv.org/abs/2010.09374: e.g. Example 3.5, 3.11 are in subsubsection 3.3.2, Exercise 4.1, Remark 4.2, are in subsection 4.1, Theorem 4.3 is in subsection 4.2, Theorem 4.4 is in subsection 4.3
# - Different environment types have different counts and the counts do not show the section number.
#   - vankataramana_imbrd https://arxiv.org/abs/1205.6543: 
#       - e.g. section 1 has Theorem 1, Remark 1, Remark 2, Remark 3, subsection 1.1.3 has Remark 4, Subsection 2.2 has Definition 1
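The two simplest numbering conventions seen in the examples above (a single running counter versus a counter that resets at each section) can be simulated with a short sketch. The function `number_theorems` and its `events` input format are assumptions made for illustration only; they do not reflect how divide_latex_text computes its numbering.

```python
def number_theorems(events: list[str], reset_at_section: bool) -> list[str]:
    """Label a sequence of events under one of two numbering conventions.

    `events` is a list whose entries are either 'section' or the display
    name of a numbered environment such as 'Theorem'. Hypothetical helper.
    """
    section, counter, labels = 0, 0, []
    for event in events:
        if event == 'section':
            section += 1
            if reset_at_section:
                counter = 0  # the theorem counter restarts in each section
            labels.append(f'{section}.')
        else:
            counter += 1
            number = f'{section}.{counter}' if reset_at_section else f'{counter}'
            labels.append(f'{event} {number}.')
    return labels


events = ['section', 'Theorem', 'Corollary', 'Definition', 'section', 'Theorem']
print(number_theorems(events, reset_at_section=False))
print(number_theorems(events, reset_at_section=True))
```

Conventions in which environments share a counter with equations or subsections need more bookkeeping than this sketch provides.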

Identify sections and subsections to make folders for a reference.


source

section_and_subsection_titles_from_latex_parts

 section_and_subsection_titles_from_latex_parts
                                                 (parts:list[tuple[str,str
                                                 ]])

Return a list of lists of titles for the sections and subsections in parts

Unnumbered sections get their own list. Unnumbered subsections are also included in lists. All the titles are stripped of leading and trailing whitespace.

Type Details
parts list An output of divide_latex_text
Returns list Each list corresponds to a section. The first entry of the list is the title of the section and the other entries are the titles of the subsections.
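The grouping described above can be approximated by the following sketch, which decides whether a part is a section or a subsection by looking at the start of its LaTeX text. It is a simplified stand-in rather than trouver's implementation: the function name `group_section_titles` and the constant `UNTITLED_PLACEHOLDER` are hypothetical, and the actual value of UNTITLED_SECTION_TITLE may differ.

```python
import re

# Hypothetical placeholder; trouver's UNTITLED_SECTION_TITLE may have another value.
UNTITLED_PLACEHOLDER = 'Untitled'


def group_section_titles(parts):
    """Group section and subsection titles from (title, text) pairs."""
    groups = []
    for title, text in parts:
        stripped = text.lstrip()
        if re.match(r'\\section\*?\s*\{', stripped):
            # Every section, numbered or not, starts a new list.
            groups.append([title.strip()])
        elif re.match(r'\\subsection\*?\s*\{', stripped):
            if not groups:
                groups.append([UNTITLED_PLACEHOLDER])
            groups[-1].append(title.strip())
        elif not groups:
            # Content that precedes the first explicit section.
            groups.append([UNTITLED_PLACEHOLDER])
    return groups


example_parts = [
    ['1', 'Some text before any section.'],
    ['   1. Introduction   ', '\\section{Introduction}'],
    ['Theorem 1.', '\\begin{theorem}\nThis is Theorem 1.\n\\end{theorem}'],
    ['1.1. Background', '\\subsection{Background}'],
    ['Unnumbered section', '\\section*{Unnumbered section}'],
]
print(group_section_titles(example_parts))
```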

In the following example, the environments are numbered Theorem 1, Corollary 2, Definition 3, etc. Also note that there is some content before the very first (explicitly defined) section, so there is a section given by the UNTITLED_SECTION_TITLE constant.

parts = [
    ['1', 'For this document, the `theorem` counter is not reset whenever a new section begins.\n\nA similar numbering scheme can be accomplished by importing \\verb|amsmath| and invoking the code \\verb|\\numberwithin{theorem}{part}| in the preamble.'],
    ['1. Introduction', '\\section{Introduction}'],
    ['Theorem 1.', '\\begin{theorem}\nThis is Theorem 1.\n\\end{theorem}'],
    ['Corollary 2.', '\\begin{corollary}\nThis is Corollary 2.\n\\end{corollary}'],
    ['Remark', '\\begin{remark*}\nThis is a remark. It is unnumbered and it does not affect the numberings of other environments.\n\\end{remark*}'],
    ['Definition 3.', '\\begin{definition}\nThis is Definition 3.\n\\end{definition}'],
    ['2. Another Section', '\\section{Another Section}'],
    ['Theorem 4.', '\\begin{theorem}\nThis is Theorem 4.\n\\end{theorem}'],
    ['2', 'And we might get a corollary!'],
    ['Corollary 5.', '\\begin{corollary}\nThis is Corollary 5.\n\\end{corollary}'],
    ['Definition 6.', '\\begin{definition}\nThis is Definition 6.\n\\end{definition}']]
sample_output = section_and_subsection_titles_from_latex_parts(parts)
test_eq(sample_output, [[UNTITLED_SECTION_TITLE], ['1. Introduction'], ['2. Another Section']])

In contrast, the following example has environments numbered by sections:

parts = [
    ['1', 'This document resets its `theorem` counter whenever a new section begins.'], 
    ['1. Introduction', '\\section{Introduction}'],
    ['Theorem 1.1.', '\\begin{theorem}\nThis is Theorem 1.1.\n\\end{theorem}'],
    ['Corollary 1.2.', '\\begin{corollary}\nThis is Corollary 1.2.\n\\end{corollary}'],
    ['Remark', '\\begin{remark*}\nThis is a remark. It is unnumbered and it does not affect the numberings of other environments.\n\\end{remark*}'],
    ['Definition 1.3.', '\\begin{definition}\nThis is Definition 1.3.\n\\end{definition}'],
    ['2. Another Section', '\\section{Another Section}'],
    ['Theorem 2.1.', '\\begin{theorem}\nThis is Theorem 2.1.\n\\end{theorem}'],
    ['Corollary 2.2.', '\\begin{corollary}\nThis is Corollary 2.2.\n\\end{corollary}'],
    ['Definition 2.3.', '\\begin{definition}\nThis is Definition 2.3.\n\\end{definition}']]
sample_output = section_and_subsection_titles_from_latex_parts(parts)
test_eq(sample_output, [[UNTITLED_SECTION_TITLE], ['1. Introduction'], ['2. Another Section']])

The example below is derived from a LaTeX document in which significant content is present before any section. See the nbs/_tests/latex_examples/latex_example_with_content_before_sections folder. Also see https://arxiv.org/abs/1111.3607 for an example of a paper with significant content prior to any explicitly defined sections.

parts = [
    ['abstract', "\\begin{abstract}\nI'm an abstract\n\\end{abstract}"],
    ['1', '\\maketitle\n\nI want to talk about things but notice that this part does not belong to a section!'],
    ['Theorem 1.', "\\begin{theorem}\\label{th:some_theorem}\nI'm a theorem.\n\\end{theorem}"],
    ['2', 'Blah blah blah'],
    ['Theorem 2.', '\\begin{theorem}\\label{th:some_other_theorem}\nImpart me with mathematical knowledge!\n\\end{theorem}'],
    ['3', 'Maybe a corollary'],
    ['Corollary 3.', '\\begin{corollary}\\label{cor:a_corollary}\nI immediately follow from the above theorem.\n\\end{corollary}'],
    ['4', 'More stuff!'],
    ['Corollary 4.', '\\begin{corollary}\\label{cor:another_corollary}\nMore delicious mathematical knowledge.\n\\end{corollary}'],
    ['5', 'Maybe you could describe how we demonstrate this corollary.'],
    ['1. Proof of Theorem~\\ref{th:main}', '\\section{Proof of Theorem~\\ref{th:main}}'],
    ['6', 'Now this is finally in a section.'],
    ['Lemma 5.', '\\begin{lemma}\nSome lemma\n\\end{lemma}\\begin{proof}\nMaximum effort!\n\\end{proof}'],
    ['7', 'Blah blah blah.'],
    ['1.1. This is a subsection', '\\subsection{This is a subsection}'],
    ['8', "I'm about one thing."],
    ['1.2. This is another subsection', '\\subsection{This is another subsection}'],
    ['9', "I'm about another thing."]] 
sample_output = section_and_subsection_titles_from_latex_parts(parts)
test_eq(sample_output, [[UNTITLED_SECTION_TITLE], ['1. Proof of Theorem~\\ref{th:main}', '1.1. This is a subsection', '1.2. This is another subsection']])

The below example is derived from a LaTeX document with a \numberwithin{equation}{subsection} in which the theorem-like environments are numbered with the equation counter. In particular, theorem-like environments and subsections are counted together.

Also, note that the below example starts with an explicitly defined section, so there is no section given by the UNTITLED_SECTION_TITLE constant.

parts = [
    ['1. This is section 1', '\\section{This is section 1}'],
    ['1.1. Theorem.', '\\begin{thm}\nThis is 1.1. Theorem. Note that the \\verb|\\swapnumbers| command is invoked in the preamble.\n\\end{thm}'],
    ['1.2. This is 1.2. subsection.', '\\subsection{This is 1.2. subsection.}'],
    ['1', 'Note that the equation counter is numbered within the subsection counter and that the theorem-like environments are numbered with the equation counter.\n\n\\subsubsection{This is 1.2.1. Subsubsection}'],
    ['1. Remark.', '\\begin{remark}\nThis is an 1. Remark. Note that \\verb|\\remark| has a counter separate from those of many of the other theorem-like environments.\n\\end{remark}'],
    ['Remark', '\\begin{rem*}\nThis is an unnumbered Remark.\n\\end{rem*}'],
    ['1.3. Proposition.', '\\begin{prop}\nThis is 1.3. Proposition.\n\\end{prop}'],
    ['Unnumbered section', '\\section*{Unnumbered section}'],
    ['1.1. Theorem.', '\\begin{thm}\nThis is 1.4. Theorem.\n\\end{thm}'],
    ['2. This is Section 2', '\\section{This is Section 2}'],
    ['2.1. Theorem.', '\\begin{thm}\nThis is 2.1. Theorem\n\\end{thm}']]
sample_output = section_and_subsection_titles_from_latex_parts(parts)
test_eq(sample_output, [['1. This is section 1', '1.2. This is 1.2. subsection.'], ['Unnumbered section'], ['2. This is Section 2']])

The titles of the sections are stripped of their leading and trailing whitespace (if any).

# The below example makes sure that titles are stripped
parts = [
    ['   1. Section with an unnumbered subsection   ', '\\section{Section with an unnumbered subsection}'],
    ['1', 'This is a section with an unnumbered subsection'],
    ['1.1. ', '\\subsection{}']
]
sample_output = section_and_subsection_titles_from_latex_parts(parts)
test_eq(sample_output, [['1. Section with an unnumbered subsection', '1.1.']])
file = _test_directory() / 'latex_examples' / 'latex_example_with_plenty_of_sections_and_subsections' / 'main.tex'
text = text_from_file(file)
parts = divide_latex_text(text) 
print(parts)
sample_output = section_and_subsection_titles_from_latex_parts(parts)
test_eq(sample_output,
        [['1. This is section 1', '1.1. This is section 1.1', '1.2. This is section 1.2'],
         ['2. This is section 2'],
         ['3. This is section 3', '3.1. This is section 3.1', '3.2. This is section 3.2', '3.3. This is section 3.3', '3.4. This is section 3.4']])
[['1. This is section 1', '\\section{This is section 1}'], ['1.1. This is section 1.1', '\\subsection{This is section 1.1}'], ['1.2. This is section 1.2', '\\subsection{This is section 1.2}'], ['2. This is section 2', '\\section{This is section 2}'], ['3. This is section 3', '\\section{This is section 3}'], ['3.1. This is section 3.1', '\\subsection{This is section 3.1}'], ['3.2. This is section 3.2', '\\subsection{This is section 3.2}'], ['3.3. This is section 3.3', '\\subsection{This is section 3.3}'], ['3.4. This is section 3.4', '\\subsection{This is section 3.4}']]

Formatting modifications

Identify macros and commands to replace

Authors usually define a lot of custom commands and macros in their LaTeX files. Such customizations vary from author to author and most customized commands are not recognized by Obsidian.

See nbs/_tests/latex_examples/commands_example/main.tex for some examples of custom commands.


source

custom_commands

 custom_commands (preamble:str)

Return a list of tuples describing the commands (and math operators) defined in preamble: the name, the number of parameters, the default argument (if any), and the display text of each command.

Assumes that each newcommand has at most one default parameter (newcommands with multiple default parameters are not valid in LaTeX).

Ignores all commented-out newcommands.

Type Details
preamble str The preamble of a LaTeX document.
Returns list Each tuple consists of 1. the name of the custom command 2. the number of parameters 3. The default argument if specified or None otherwise, and 4. the display text of the command.
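To illustrate the kind of parsing involved, here is a minimal sketch that extracts `\newcommand` and `\renewcommand` definitions using the standard `re` module plus a small brace counter. The names `simple_newcommands` and `read_braced` are hypothetical and this is not trouver's implementation; in particular, it does not handle `\DeclareMathOperator`, `\def`, or escaped braces such as `\{` inside a definition.

```python
import re

# Matches the head of a definition up to the opening brace of its body.
NEWCOMMAND_HEAD = re.compile(
    r'\\(?:re)?newcommand\s*\{?\s*\\(?P<name>[A-Za-z]+)\s*\}?'
    r'\s*(?:\[(?P<nargs>\d)\])?\s*(?:\[(?P<default>[^\]]*)\])?\s*\{')


def read_braced(text, open_index):
    """Return the content of the balanced {...} group opening at open_index."""
    depth = 0
    for i in range(open_index, len(text)):
        if text[i] == '{':
            depth += 1
        elif text[i] == '}':
            depth -= 1
            if depth == 0:
                return text[open_index + 1:i]
    raise ValueError('unbalanced braces')


def simple_newcommands(preamble):
    """Return (name, number of parameters, default, display text) tuples."""
    # Drop comments: everything from an unescaped % to the end of the line.
    uncommented = re.sub(r'(?<!\\)%.*', '', preamble)
    commands = []
    for m in NEWCOMMAND_HEAD.finditer(uncommented):
        body = read_braced(uncommented, m.end() - 1)
        nargs = int(m.group('nargs')) if m.group('nargs') else 0
        commands.append((m.group('name'), nargs, m.group('default'), body))
    return commands


print(simple_newcommands(r'\newcommand{\plusbinomial}[3][2]{(#2 + #3)^#1}'))
```

Counting braces by hand keeps the sketch within the standard library; a recursive pattern in the third-party `regex` package is another way to match balanced braces.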
# Basic
text_1 = r'\newcommand{\con}{\mathcal{C}}'
test_eq(custom_commands(text_1), [('con', 0, None, r'\mathcal{C}')])

# With a parameter
text_2 = r'\newcommand{\field}[1]{\mathbb{#1}}'
test_eq(custom_commands(text_2), [('field', 1, None, r'\mathbb{#1}')]) 

# With multiple parameters, the first of which has a default value of `2`
text_3 = r'\newcommand{\plusbinomial}[3][2]{(#2 + #3)^#1}'
test_eq(custom_commands(text_3), [('plusbinomial', 3, '2', r'(#2 + #3)^#1')])

# The display text has backslashes `\` and curly braces `{}`
text_4 = r'\newcommand{\beq}{\begin{displaymath}}'
test_eq(custom_commands(text_4), [('beq', 0, None, '\\begin{displaymath}')])


# Basic with spaces in the newcommand declaration
text_6 = r'\newcommand {\con}  {\mathcal{C}}'
test_eq(custom_commands(text_6), [('con', 0, None, r'\mathcal{C}')])

# With a parameter and spaces in the newcommand declaration
text_7 = r'\newcommand   {\field}   [1] {\mathbb{#1}}'
test_eq(custom_commands(text_7), [('field', 1, None, r'\mathbb{#1}')])

# With multiple parameters, a default value, and spaces in the newcommand declaration
text_8 = r'\newcommand {\plusbinomial} [3] [2] {(#2 + #3)^#1}'
test_eq(custom_commands(text_8), [('plusbinomial', 3, '2', r'(#2 + #3)^#1')]) 

# With a comment `%`; commented-out command declarations should not be detected.
text_9 = r'% \newcommand{\con}{\mathcal{C}}'
test_eq(custom_commands(text_9), [])


# Spanning multiple lines
text_10 = r'''\newcommand{\mat}[4]{\left[\begin{array}{cc}#1 & #2 \\
                                         #3 & #4\end{array}\right]}'''
test_eq(
    custom_commands(text_10),
    [('mat', 4, None,
             '\\left[\\begin{array}{cc}#1 & #2 \\\\\n                                         #3 & #4\\end{array}\\right]')])

# Math operator
text_11 = r'\DeclareMathOperator{\Hom}{Hom}'
test_eq(custom_commands(text_11), [('Hom', 0, None, 'Hom')])

text_12 = r'\DeclareMathOperator{\tConf}{\widetilde{Conf}}'
test_eq(custom_commands(text_12), [('tConf', 0, None, r'\widetilde{Conf}')])

# TODO: 
# `\def` commands
# \def is a bit complicated because arguments can either be provided with []
# or can be provided with {}.
# text_13 = r'\def\A{{\cO_{K}}}'

# newcommand and renewcommand don't require {} for the
# command name, cf. https://arxiv.org/abs/1703.05365
text_14 = r'\newcommand\A{{\mathbb A}}'
test_eq(custom_commands(text_14), [('A', 0, None, r'{\mathbb A}')])
# TODO: use a regexp pattern like this one to extract balanced curly braces
# \\mat\{((?>[^{}]+|\{(?1)\})*)\}\{((?>[^{}]+|\{(?2)\})*)\}

source

regex_pattern_detecting_command

 regex_pattern_detecting_command
                                  (command_tuple:tuple[str,int,typing.Opti
                                  onal[str],str])

Return a regex.Pattern object (not a re.Pattern object) detecting the command with the specified number of parameters, optional argument, and display text.

Assumes that the curly braces used to write the invocations of the commands are balanced and properly nested. Assumes that there are no two commands of the same name.

Type Details
command_tuple tuple Consists of 1. the name of the custom command 2. the number of parameters 3. The default argument if specified or None otherwise, and 4. the display text of the command.
Returns Pattern
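One detail worth illustrating is how a pattern for `\del` can avoid matching `\delta`. A minimal sketch, assuming a negative lookahead is an acceptable way to do this (the function name `zero_parameter_pattern` is hypothetical, and the sketch uses the standard `re` module rather than `regex`):

```python
import re


def zero_parameter_pattern(name: str) -> re.Pattern:
    """Build a pattern for a command that takes no parameters.

    The negative lookahead (?![A-Za-z]) rejects matches that are merely a
    prefix of a longer command name, e.g. `\\del` inside `\\delta`.
    """
    return re.compile(r'\\' + re.escape(name) + r'(?![A-Za-z])')


pattern = zero_parameter_pattern('del')
print(pattern.search(r'\del should be detected.'))
print(pattern.search(r'\delta should not be detected.'))
```

Commands with parameters additionally need to match balanced brace groups, which this sketch does not attempt.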
# Basic
pattern = regex_pattern_detecting_command(('Sur', 0, None, r'\mathrm{Sur}'))
text = r'The number of element of $\Sur(\operatorname{Cl} \mathcal{O}_L, A)$ is ...'
match = pattern.search(text)
start, end = match.span()
test_eq(text[start:end], r'\Sur')

# One parameter
pattern = regex_pattern_detecting_command(('field', 1, None, r'\mathbb{#1}'))
text = r'\field{Q}'
# print(pattern.pattern)
match = pattern.search(text)
start, end = match.span()
test_eq(text[start:end], text)

# Multiple parameters
pattern = regex_pattern_detecting_command(('mat', 4, None, r'\left[\begin{array}{cc}#1 & #2 \\ #3 & #4\end{array}\right]'))
text = r'\mat{{123}}{asdfasdf{}{}}{{{}}}{{asdf}{asdf}{}}' # This is a balanced str.
match = pattern.search(text)
start, end = match.span()
test_eq(text[start:end], text)
test_eq(match.group(1), r'{123}')

# Multiple parameters, one of which is an optional parameter
pattern = regex_pattern_detecting_command(('plusbinomial', 3, '2', r'(#2 + #3)^#1'))
# When the optional argument is omitted (so the default value applies)
text = r'\plusbinomial{x}{y}'
match = pattern.search(text)
start, end = match.span()
test_eq(text[start:end], text)

# When the optional argument is supplied explicitly
text = r'\plusbinomial[4]{x}{y}'
match = pattern.search(text)
start, end = match.span()
test_eq(text[start:end], text)

# One parameter that is optional.
pattern = regex_pattern_detecting_command(('greet', 1, 'world', r'Hello #1!'))
# When the optional argument is omitted (so the default value applies)
text = r'\greet'
match = pattern.search(text)
start, end = match.span()
test_eq(text[start:end], text)

# When the optional argument is supplied explicitly
text = r'\greet[govna]'
match = pattern.search(text)
start, end = match.span()
test_eq(text[start:end], text)

# In the following example, `\del` is a command defined as `\delta`.
# Invocations of `\delta` should NOT be detected as invocations of `\del`.
command_tuple = (r'del', 0, None, r'\delta')
pattern = regex_pattern_detecting_command(command_tuple)
text = r'\del should be detected.'
match = pattern.search(text)
start, end = match.span()
test_eq(text[start:end], r'\del')
text = r'\delta should not be detected.'
match = pattern.search(text)
assert match is None
# test_eq(replace_command_in_text(text, command_tuple), r'\delta should be replaced. \delta should not.')

source

replace_command_in_text

 replace_command_in_text (text:str,
                          command_tuple:tuple[str,int,typing.Optional[str]
                          ,str])

Replace all invocations of the specified command in text with the command's display text, substituting the arguments of each invocation into the display text.

Assumes that '\1', '\2', '\3', etc. are not part of the display text.

Type Details
text str
command_tuple tuple Consists of 1. the name of the custom command 2. the number of parameters 3. The default argument if specified or None otherwise, and 4. the display text of the command.
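The substitution step can be sketched as follows for the special case of a one-parameter command whose argument contains no nested braces. Both function names are hypothetical and the approach is an assumption made for illustration, not trouver's implementation. Passing a function as the replacement to `re.sub` means the returned text is inserted literally, so backslashes in the display text need no escaping.

```python
import re


def substitute_arguments(display_text, arguments):
    """Replace #1, #2, ... in display_text with the corresponding arguments."""
    def fill(match):
        return arguments[int(match.group(1)) - 1]
    return re.sub(r'#(\d)', fill, display_text)


def replace_one_parameter_command(text, name, display_text):
    """Replace \\name{arg} invocations whose argument has no nested braces."""
    pattern = re.compile(
        r'\\' + re.escape(name) + r'(?![A-Za-z])\s*\{([^{}]*)\}')
    return pattern.sub(
        lambda m: substitute_arguments(display_text, [m.group(1)]), text)


text = r'$\field{Q}$ is the field of rational numbers.'
print(replace_one_parameter_command(text, 'field', r'\mathbb{#1}'))
```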
# Basic
command_tuple = ('Sur', 0, None, r'\mathrm{Sur}')
# pattern = regex_pattern_detecting_command(command_tuple)
text = r'The number of element of $\Sur(\operatorname{Cl} \mathcal{O}_L, A)$ is ... Perhaps $\Sur$ is nonempty.'
test_eq(replace_command_in_text(text, command_tuple), r'The number of element of $\mathrm{Sur}(\operatorname{Cl} \mathcal{O}_L, A)$ is ... Perhaps $\mathrm{Sur}$ is nonempty.')


# One parameter
command_tuple = ('field', 1, None, r'\mathbb{#1}')
# pattern = regex_pattern_detecting_command(command_tuple)
text = r'$\field{Q}$ is the field of rational numbers. $\field{C}$ is the field of complex numbers'
test_eq(replace_command_in_text(text, command_tuple), r'$\mathbb{Q}$ is the field of rational numbers. $\mathbb{C}$ is the field of complex numbers')

# Multiple parameters
command_tuple = ('mat', 4, None, r'\left[\begin{array}{cc}#1 & #2 \\ #3 & #4\end{array}\right]')
# pattern = regex_pattern_detecting_command(command_tuple)
text = r'\mat{{123}}{asdfasdf{}{}}{{{}}}{{asdf}{asdf}{}}' # This is a balanced str.
test_eq(replace_command_in_text(text, command_tuple), r'\left[\begin{array}{cc}{123} & asdfasdf{}{} \\ {{}} & {asdf}{asdf}{}\end{array}\right]')

# Multiple parameters, one of which is an optional parameter
command_tuple = ('plusbinomial', 3, '2', r'(#2 + #3)^#1')
# pattern = regex_pattern_detecting_command(command_tuple)
# When the optional argument is omitted (so the default value applies)
text = r'\plusbinomial{x}{y}'
test_eq(replace_command_in_text(text, command_tuple), r'(x + y)^2')

# When the optional argument is supplied explicitly
text = r'\plusbinomial[4]{x}{y}'
test_eq(replace_command_in_text(text, command_tuple), r'(x + y)^4')


# One parameter that is optional.
command_tuple = ('greet', 1, 'world', r'Hello #1!')
# pattern = regex_pattern_detecting_command(command_tuple)
# When the optional argument is omitted (so the default value applies)
text = r'\greet'
test_eq(replace_command_in_text(text, command_tuple), r'Hello world!')

# When the optional argument is supplied explicitly
text = r'\greet[govna]'
test_eq(replace_command_in_text(text, command_tuple), r'Hello govna!')

# In the following example, `\del` is a command defined as `\delta`.
# Invocations of `\delta` should NOT be turned into `\deltata`.
command_tuple = (r'del', 0, None, r'\delta')
# pattern = regex_pattern_detecting_command(command_tuple)
text = r'\del should be replaced. \delta should not.'
test_eq(replace_command_in_text(text, command_tuple), r'\delta should be replaced. \delta should not.')

source

replace_commands_in_text

 replace_commands_in_text (text:str,
                           command_tuples:tuple[str,int,typing.Optional[st
                           r],str])

Replace all invocations of the specified commands in text with the commands' display text, substituting the arguments of each invocation into the display text.

Assumes that '\1', '\2', '\3', etc. are not part of the display text.

Type Details
text str The text in which to replace the commands. This should not include the preamble of a latex document.
command_tuples tuple An output of custom_commands. Each tuple consists of 1. the name of the custom command 2. the number of parameters 3. The default argument if specified or None otherwise, and 4. the display text of the command.
Returns str
text = r'''Here is a matrix over $\field{Q}$: $\mat{1/2}{2}{-1}{5/7}$.
           Note that it is not over $\field{F}_7$ and not over $\field{F}_2$.'''

command_tuples = [
    ('field', 1, None, r'\mathbb{#1}'),
    ('mat', 4, None, r'\left[\begin{array}{cc}#1 & #2 \\ #3 & #4\end{array}\right]')]

sample_output = replace_commands_in_text(text, command_tuples)

test_eq(sample_output, 
        r'''Here is a matrix over $\mathbb{Q}$: $\left[\begin{array}{cc}1/2 & 2 \\ -1 & 5/7\end{array}\right]$.
           Note that it is not over $\mathbb{F}_7$ and not over $\mathbb{F}_2$.''')

source

replace_commands_in_latex_document

 replace_commands_in_latex_document (docment:str)

Return the latex document (without the preamble) with invocations of custom commands/operators replaced with their display text.

Assumes that all custom commands and operators are defined in the preamble.

Assumes that, if commands with the same name are defined multiple times, only the finally defined command is used.

Replaces these invocations even in commented-out text.
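The assumption that only the last definition of a repeated command is used can be sketched with a dictionary keyed by command name. The function `keep_last_definitions` and the sample definitions below are hypothetical and serve only to illustrate the idea.

```python
def keep_last_definitions(command_tuples):
    """Keep, for each command name, only the tuple that was defined last."""
    latest = {}
    for command in command_tuples:
        # A later definition overwrites an earlier one with the same name.
        latest[command[0]] = command
    return list(latest.values())


# Hypothetical definitions in which `S` is defined twice.
definitions = [
    ('S', 0, None, r'\mathbb{S}'),
    ('field', 1, None, r'\mathbb{#1}'),
    ('S', 0, None, r'\mathbb{S}1'),
]
print(keep_last_definitions(definitions))
```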

file = _test_directory() / 'latex_examples' / 'commands_recursive_example' / 'main.tex'
document = text_from_file(file)
commands_replaced = replace_commands_in_latex_document(document)
assert commands_replaced.startswith(r'\begin{document}')
assert commands_replaced.endswith(r'\end{document}')
assert r'\S' not in commands_replaced
assert r'\mathbb{S}1' in commands_replaced  # Note that $\S$ is defined twice in the preamble; only the latter definition is used.
assert r'\field{Q}$' not in commands_replaced
assert r'\mathbb{Q}$' in commands_replaced
assert r'\commentedout' not in commands_replaced
assert r'This is actually a command that is commented out, but it is also replaced!' in commands_replaced
print(commands_replaced)
\begin{document}

$\mathbb{S}1$
%$\mathbf{Q}$
%$\mathbf{Q}$
%This is actually a command that is commented out, but it is also replaced!
$\mathbb{Q}$

\end{document}

Replace commonly used syntax

Obsidian does not compile all LaTeX syntax. For example:

  • \( \) and \[ \] are not recognized as math mode delimiters.
  • \begin{equation} \end{equation} and \begin{align} \end{align} (as well as their unnumbered versions with *) require surrounding $$.

source

adjust_common_syntax_to_markdown

 adjust_common_syntax_to_markdown (text)

Adjust some common syntax, such as math mode delimiters and equation/align environments, for Markdown.

Assumes that the tokens for math mode delimiters (e.g. \( \) and \[ \]) are not used otherwise.
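A minimal sketch of such an adjustment, using plain regular-expression substitutions, is given below. The function name `adjust_syntax_sketch` and the tuple of environment names are assumptions made for illustration; this is not trouver's implementation. Like the function documented above, the sketch assumes that the delimiter tokens are not used for anything else, so a line break with an optional argument such as `\\[2pt]` would be mangled.

```python
import re

# Environments to wrap in $$ ... $$ (an illustrative, non-exhaustive choice).
ENVIRONMENTS = ('equation', 'align', 'eqnarray')


def adjust_syntax_sketch(text: str) -> str:
    """Convert \\( \\) and \\[ \\] to $ and $$, and wrap display environments."""
    text = re.sub(r'\\\(|\\\)', '$', text)    # inline math delimiters
    text = re.sub(r'\\\[|\\\]', '$$', text)   # display math delimiters
    names = '|'.join(ENVIRONMENTS)
    # Put $$ before each \begin{...} and after each \end{...}, starred or not.
    text = re.sub(r'(\\begin\{(?:%s)\*?\})' % names, r'$$\1', text)
    text = re.sub(r'(\\end\{(?:%s)\*?\})' % names, r'\1$$', text)
    return text


sample = r'\(\mathbb{Z}[i]\) and \[ x \]'
print(adjust_syntax_sketch(sample))
```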

text = r'''
I want to talk about \(\mathbb{Z}[i]\). It is the ring whose elements are of the form $a+bi$ where $a,b \in \mathhbb{Z}$.
It has a multiplication structure:
\[ (a+bi) (c+di) = (ac-bd) + (ad+bc)i.\]

Here is an equation:
\begin{equation}
5+7 = 12
\end{equation}

Here is another:
\begin{equation*}
5+6 = 11
\end{equation*}

Here is an align:
\begin{align}
5+7 = 12
\end{align}

Here is another:
\begin{align*}
5+6 = 11
\end{align*}

\begin{eqnarray}
asdf
\end{eqnarray}
'''
sample_output = adjust_common_syntax_to_markdown(text)
print(sample_output)
assert r'\(' not in sample_output
assert r'\)' not in sample_output
assert r'\[' not in sample_output
assert r'\]' not in sample_output
assert r'$$\begin{align}' in sample_output
assert r'\end{align}$$' in sample_output
assert r'$$\begin{equation}' in sample_output
assert r'\end{equation}$$' in sample_output
assert r'$$\begin{align*}' in sample_output
assert r'\end{align*}$$' in sample_output
assert r'$$\begin{equation*}' in sample_output
assert r'\end{equation*}$$' in sample_output
assert r'$$\begin{eqnarray}' in sample_output
assert r'\end{eqnarray}$$' in sample_output

I want to talk about $\mathbb{Z}[i]$. It is the ring whose elements are of the form $a+bi$ where $a,b \in \mathhbb{Z}$.
It has a multiplication structure:
$$ (a+bi) (c+di) = (ac-bd) + (ad+bc)i.$$

Here is an equation:
$$\begin{equation}
5+7 = 12
\end{equation}$$

Here is another:
$$\begin{equation*}
5+6 = 11
\end{equation*}$$

Here is an align:
$$\begin{align}
5+7 = 12
\end{align}$$

Here is another:
$$\begin{align*}
5+6 = 11
\end{align*}$$

$$\begin{eqnarray}
asdf
\end{eqnarray}$$

Set up an Obsidian vault reference


source

setup_reference_from_latex_parts

 setup_reference_from_latex_parts (parts:list[tuple[str,str]],
                                   custom_commands:list[tuple[str,int,typing.Optional[str],str]],
                                   vault:os.PathLike,
                                   location:os.PathLike,
                                   reference_name:os.PathLike,
                                   authors:Union[str,list[str]],
                                   author_folder:os.PathLike='_mathematicians',
                                   references_folder:os.PathLike='_references',
                                   templates_folder:os.PathLike='_templates',
                                   template_file_name:str='_template_common',
                                   notation_index_template_file_name:str='_template_notation_index',
                                   glossary_template_file_name:str='_template_glossary',
                                   setup_temp_folder:bool=True,
                                   make_second_template_file_in_reference_directory:bool=True,
                                   copy_obsidian_configs:Optional[os.PathLike]='.obsidian',
                                   overwrite:Optional[str]=None,
                                   confirm_overwrite:bool=True,
                                   verbose:bool=False,
                                   replace_custom_commands:bool=True,
                                   adjust_common_latex_syntax_to_markdown:bool=True)

Set up a reference folder in vault using an output of divide_latex_text, create notes from parts, and link notes in index files in the reference folder.

Assumes that

  • parts is derived from a LaTeX document in which
    • all of the text belongs to sections.
    • all of the sections/subsections are uniquely named
  • The template file has a section # Topic
  • The last line of the template file is a footnote indicating where the note comes from.
  • There is at most one reference folder in the vault whose name is given by reference_name.
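
Since the function assumes uniquely named sections and subsections, it can help to check `parts` before calling it. The following is a minimal sketch of such a check. The helper `duplicate_titles` is hypothetical (it is not part of trouver), and the sketch assumes that the first entry of each tuple in `parts` is that part's title; in practice one would likely restrict the check to section and subsection entries.

```python
from collections import Counter


def duplicate_titles(parts: list[tuple[str, str]]) -> list[str]:
    """Hypothetical helper: return the titles that occur more than once.

    Assumes the first entry of each tuple is the part's title.
    """
    counts = Counter(title for title, _ in parts)
    return [title for title, count in counts.items() if count > 1]


# Made-up data for illustration; not actual output of divide_latex_text.
example_parts = [
    ('1. Introduction', 'Some text.'),
    ('2. Main results', 'More text.'),
    ('2. Main results', 'A repeated title.'),
]
print(duplicate_titles(example_parts))
```

An empty list from such a check would indicate that no title is repeated.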

parts itself is not modified, even if replace_custom_commands and/or adjust_common_latex_syntax_to_markdown are set to True.

See setup_folder_for_new_reference for how the reference folder is set up.

The subfolders of the reference folder are named after the section titles, except for sections with common titles such as Introduction, Notations, Conventions, Preliminaries, and Notations and Conventions. This exception ensures that index files for sections in different reference folders do not share a name.

Text/parts that precede explicitly given sections are included in the first section’s folder and are linked in the first section’s index file.

| | Type | Default | Details |
|---|---|---|---|
| parts | list | | Output of `divide_latex_text`. |
| custom_commands | list | | Output of `custom_commands` applied to the preamble of the LaTeX document. |
| vault | PathLike | | An Obsidian.md vault. |
| location | PathLike | | The path to make the new reference folder. Relative to `vault`. |
| reference_name | PathLike | | The name of the new reference. |
| authors | typing.Union[str, list[str]] | | Each str is the family name of each author. |
| author_folder | PathLike | _mathematicians | The directory where the author files are stored. Relative to `vault`. |
| references_folder | PathLike | _references | The directory where the references files are stored. Relative to `vault`. |
| templates_folder | PathLike | _templates | The directory where the template files are stored. Relative to `vault`. |
| template_file_name | str | _template_common | The template file from which to base the template file of the new reference. |
| notation_index_template_file_name | str | _template_notation_index | The template file from which to base the notation index file of the new reference. |
| glossary_template_file_name | str | _template_glossary | The template file from which to base the glossary file of the new reference. |
| setup_temp_folder | bool | True | If `True`, creates a `_temp` folder with an index file. This folder serves to house notes auto-created from LaTeX text files before moving them to their correct directories. Defaults to `True`. |
| make_second_template_file_in_reference_directory | bool | True | If `True`, creates a copy of the template note within the directory for the reference. |
| copy_obsidian_configs | typing.Optional[os.PathLike] | .obsidian | The folder relative to `vault` from which to copy Obsidian configs. If `None`, then no Obsidian configs are copied to the reference folder. Defaults to `.obsidian`. |
| overwrite | typing.Optional[str] | None | Specifies if and how to overwrite the reference folder if it already exists. If `'w'`, then deletes the contents of the existing reference folder, as well as the template and reference file, before setting up the new reference folder. If `'a'`, then overwrites the contents of the reference folder, but does not remove existing files/folders. If `None`, then does not modify the existing reference folder and raises a `FileExistsError`. |
| confirm_overwrite | bool | True | Specifies whether or not to confirm the deletion of the reference folder if it already exists and if `overwrite` is `'w'`. Defaults to `True`. |
| verbose | bool | False | |
| replace_custom_commands | bool | True | If `True`, replace the custom commands in the text of `parts` when making the notes. |
| adjust_common_latex_syntax_to_markdown | bool | True | If `True`, apply [adjust_common_syntax_to_markdown](https://hyunjongkimmath.github.io/trouver/latex.convert.html#adjust_common_syntax_to_markdown) to the text in `parts` when making the notes. |
| Returns | None | | |
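
The three documented `overwrite` modes can be pictured with a small standalone sketch. This is not how trouver implements them: `prepare_folder` is a hypothetical helper that uses only the standard library, and it leaves out the confirmation prompt controlled by `confirm_overwrite` as well as the handling of the template and reference files.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def prepare_folder(folder: Path, overwrite: Optional[str] = None) -> None:
    """Hypothetical helper mirroring the three documented `overwrite` modes."""
    if overwrite not in (None, 'w', 'a'):
        raise ValueError(f'Unexpected overwrite mode: {overwrite!r}')
    if folder.exists():
        if overwrite is None:
            raise FileExistsError(f'{folder} already exists.')
        if overwrite == 'w':
            shutil.rmtree(folder)  # Discard everything in the existing folder.
        # For 'a', existing files and folders are left in place.
    folder.mkdir(parents=True, exist_ok=True)


with tempfile.TemporaryDirectory() as temp_dir:
    ref = Path(temp_dir) / 'test_ref'
    prepare_folder(ref)                     # Fresh folder: created.
    (ref / 'old_note.md').write_text('old')
    prepare_folder(ref, overwrite='a')      # Existing files survive.
    kept_after_append = (ref / 'old_note.md').exists()
    prepare_folder(ref, overwrite='w')      # Folder is emptied.
    kept_after_write = (ref / 'old_note.md').exists()
    try:
        prepare_folder(ref)                 # No mode given: refuse.
        raised = False
    except FileExistsError:
        raised = True

print(kept_after_append, kept_after_write, raised)
```

Leaving `overwrite` as `None` is the conservative choice: an existing reference folder is never touched, and the caller has to opt in to either destructive mode explicitly.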
with (tempfile.TemporaryDirectory(prefix='temp_dir', dir=os.getcwd()) as temp_dir):
    temp_vault = Path(temp_dir) / 'test_vault_5'
    shutil.copytree(_test_directory() / 'test_vault_5', temp_vault)

    sample_latex_file = _test_directory() / 'latex_examples' / 'latex_example_with_untitled_subsections_setup_to_a_vault' / 'main.tex' 
    sample_latex_text = text_from_file(sample_latex_file)
    preamble, _ = divide_preamble(sample_latex_text)
    parts = divide_latex_text(sample_latex_text)
    cust_comms = custom_commands(preamble)
    
    setup_reference_from_latex_parts(
        parts, cust_comms, temp_vault, Path(''),
        'test_ref',
        ['Kim', 'Kim', 'Kim'])
    # os.startfile(temp_vault)
    # input()
# TODO: give an example for a LaTeX document with a multiline section
# TODO: give an example for a LaTeX document with a section that must be sanitized first, e.g.
# in banwait_et_al_cnpgrg2c, there is a section of the string
# `\section{Exceptional maximal subgroups of 
# \texorpdfstring{\(\GSp_4(\ff_\ell)\)}{GSp4Fell}}`

The following example demonstrates setting up a reference folder from a LaTeX document with significant content before any explicitly specified sections. In particular, the reference folder contains a subfolder dedicated to the content that comes before the explicitly specified sections.

with (tempfile.TemporaryDirectory(prefix='temp_dir', dir=os.getcwd()) as temp_dir):
    temp_vault = Path(temp_dir) / 'test_vault_5'
    shutil.copytree(_test_directory() / 'test_vault_5', temp_vault)

    sample_latex_file = _test_directory() / 'latex_examples' / 'latex_example_with_content_before_sections' / 'main.tex' 
    sample_latex_text = text_from_file(sample_latex_file)
    preamble, _ = divide_preamble(sample_latex_text)
    parts = divide_latex_text(sample_latex_text)
    cust_comms = custom_commands(preamble)
    
    setup_reference_from_latex_parts(
        parts, cust_comms, temp_vault, Path(''),
        'test_ref',
        ['Kim', 'Kim', 'Kim'])
    reference_folder = temp_vault / 'test_ref'

    subdirectories = list(reference_folder.glob('**'))
    relative_subdirectories = [
        os.path.relpath(subdirectory, reference_folder)
        for subdirectory in subdirectories]
    print("The following are the subdirectories of `reference_folder` (relative to `reference_folder`):")
    print(relative_subdirectories)
    assert convert_title_to_folder_name(f'test_ref {UNTITLED_SECTION_TITLE}') in relative_subdirectories

    # os.startfile(temp_vault)
    # input()
The following are the subdirectories of `reference_folder` (relative to `reference_folder`):
['.', '.obsidian', '.obsidian\\plugins', '.obsidian\\plugins\\fast-link-edit', '1_proof_of_theorem~refthmain', '1_proof_of_theorem~refthmain\\11_this_is_a_subsection', '1_proof_of_theorem~refthmain\\12_this_is_another_subsection', 'test_ref_untitled_section', '_temp']
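
The backslashes in the output above come from running the example on Windows. The following standalone sketch, which is independent of trouver, lists subdirectories with forward slashes on every platform. The helper `relative_subdirectories` and the folder names are made up for illustration.

```python
import tempfile
from pathlib import Path


def relative_subdirectories(folder: Path) -> list[str]:
    """Return sorted POSIX-style paths of all subdirectories of `folder`, relative to it."""
    return sorted(
        path.relative_to(folder).as_posix()
        for path in folder.rglob('*') if path.is_dir())


with tempfile.TemporaryDirectory() as temp_dir:
    reference_folder = Path(temp_dir) / 'test_ref'
    (reference_folder / '1_section' / '11_subsection').mkdir(parents=True)
    (reference_folder / '_temp').mkdir()
    (reference_folder / 'note.md').write_text('not a directory')
    listing = relative_subdirectories(reference_folder)

print(listing)
```

Unlike `glob('**')`, this variant does not include the folder itself (`'.'`) in the result, and sorting makes the listing stable across file systems.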