latex.convert

Convert LaTeX files into Obsidian.md notes (or vice versa)

This module contains functions and methods to automatically make Obsidian notes from LaTeX files of mathematical papers, most notably those on arXiv.

See the Potential Problems section below for some common errors that arise from this module and how to circumvent them.

import glob
import os
from pathlib import Path
import shutil
import tempfile


from fastcore.test import ExceptionExpected, test_eq
from pathvalidate import validate_filename

from trouver.helper.tests import _test_directory# , non_utf8_chars_in_file
from trouver.markdown.obsidian.personal.reference import (
    delete_reference_folder
)

Potential problems

The following are some frequent problems that arise when using this module:

UnicodeDecodeErrors arise when reading LaTeX files

By default, the text_from_file method in trouver.helper reads files and attempts to decode them as UTF-8. If a LaTeX file contains characters that cannot be decoded as UTF-8, then a UnicodeDecodeError may be raised. In this case, one can identify these characters using the trouver.helper.non_utf8_chars_in_file method and modify the LaTeX file manually. It may help to use a text editor to jump to the positions of these characters and to change the encoding of the LaTeX file to UTF-8; for example, the author of trouver has opened some ANSI-encoded LaTeX documents in Notepad++ and converted their encoding to UTF-8.
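The manual workflow described above can also be approximated with the standard library alone. The following is a minimal sketch, not part of trouver: `find_non_utf8_positions` and `reencode_to_utf8` are hypothetical helper names, and the default source encoding `cp1252` (what Windows tools usually mean by "ANSI") is an assumption that may not match a given file.

```python
from pathlib import Path


def find_non_utf8_positions(path):
    """Return (byte_offset, offending_bytes) pairs for bytes that are not valid UTF-8."""
    data = Path(path).read_bytes()
    problems = []
    start = 0
    while start < len(data):
        try:
            data[start:].decode('utf-8')
            break  # The rest of the file decodes cleanly.
        except UnicodeDecodeError as error:
            # error.start and error.end are offsets within the slice.
            bad_start = start + error.start
            bad_end = start + error.end
            problems.append((bad_start, data[bad_start:bad_end]))
            start = bad_end  # Resume scanning after the offending bytes.
    return problems


def reencode_to_utf8(path, source_encoding='cp1252'):
    """Rewrite the file at `path` as UTF-8, assuming it is in `source_encoding`."""
    text = Path(path).read_bytes().decode(source_encoding)
    # Write bytes directly so that line endings are left untouched.
    Path(path).write_bytes(text.encode('utf-8'))
```

It is safer to run `find_non_utf8_positions` first and inspect the reported offsets before re-encoding, since decoding with the wrong source encoding silently produces wrong characters.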

`NoDocumentNodeErrors` arise even though the LaTeX file has a document environment (i.e. `\begin{document}...\end{document}`)

The find_document_node method in this module is sometimes unable to detect the document environment of a LaTeX file. This error is known to arise when

- there are macros (including commands) defined that represent or expand to text including \begin{...}...\end{...}.

# TODO in the above explanation, include an example.
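One possible way to locate such macros before converting a document is to scan the preamble for definitions whose body mentions `\begin` or `\end`. The sketch below is only an illustration and is not part of trouver: `macros_hiding_environments` is a hypothetical helper, it only inspects single-line `\newcommand`, `\renewcommand`, and `\def` definitions, and the sample preamble is invented.

```python
import re

# Matches the start of a macro definition and captures the macro name.
_MACRO_DEF = re.compile(
    r'\\(?:re)?newcommand\*?\s*\{?\s*(\\[A-Za-z@]+)\s*\}?'
    r'|\\def\s*(\\[A-Za-z@]+)')


def macros_hiding_environments(preamble):
    """Return the names of macros whose one-line definition contains \\begin or \\end."""
    suspicious = []
    for line in preamble.splitlines():
        line = re.sub(r'(?<!\\)%.*', '', line)  # Drop LaTeX comments.
        match = _MACRO_DEF.search(line)
        if not match:
            continue
        name = match.group(1) or match.group(2)
        definition = line[match.end():]
        if re.search(r'\\(?:begin|end)\b', definition):
            suspicious.append(name)
    return suspicious


# Invented sample preamble: three of the four macros wrap environment delimiters.
sample_preamble = r'''
\newcommand{\beq}{\begin{equation}}
\newcommand{\eeq}{\end{equation}}
\newcommand{\R}{\mathbb{R}}
\def\bal{\begin{align*}}
'''
print(macros_hiding_environments(sample_preamble))
```

Macros reported this way are candidates to expand by hand in the LaTeX file before running the functions in this module.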

Set up an Obsidian vault reference

setup_reference_from_latex_parts

 setup_reference_from_latex_parts (parts:list[tuple[str,str]],
                                   custom_commands:list[tuple[str,int,typing.Optional[str],str]],
                                   vault:os.PathLike,
                                   location:os.PathLike,
                                   reference_name:os.PathLike,
                                   authors:Union[str,list[str]],
                                   author_folder:os.PathLike='_mathematicians',
                                   create_reference_file_in_references_folder:bool=True,
                                   references_folder:os.PathLike='_references',
                                   create_template_file_in_templates_folder:bool=True,
                                   templates_folder:os.PathLike='_templates',
                                   template_file_name:str='_template_common',
                                   notation_index_template_file_name:str='_template_notation_index',
                                   glossary_template_file_name:str='_template_glossary',
                                   setup_temp_folder:bool=True,
                                   make_second_template_file_in_reference_directory:bool=True,
                                   copy_obsidian_configs:Optional[os.PathLike]='.obsidian',
                                   overwrite:Optional[str]=None,
                                   confirm_overwrite:bool=True,
                                   verbose:bool=False,
                                   replace_custom_commands:bool=True,
                                   adjust_common_latex_syntax_to_markdown:bool=True,
                                   repeat_replacing_custom_commands:int=1)

Set up a reference folder in vault using an output of divide_latex_text, create notes from parts, and link notes in index files in the reference folder.

Assumes that

- parts is derived from a LaTeX document in which
    - all of the text belongs to sections.
    - all of the sections/subsections are uniquely named.
- The template file has a section # Topic.
- The last line of the template file is a footnote indicating where the note comes from.
- There is at most one reference folder in the vault whose name is given by reference_name.

parts itself is not modified, even if replace_custom_commands and/or adjust_common_latex_syntax_to_markdown are set to True.

cf. setup_folder_for_new_reference for how the reference folder is set up.

The names for the subfolders of the reference folder are the section titles, except for sections with common titles such as Introduction, Notations, Conventions, Preliminaries, and Notations and Conventions. This ensures that the index file names for sections in different reference folders do not have the same name.

Text/parts that precede explicitly given sections are included in the first section's folder and are linked in the first section's index file.

	Type	Default	Details
parts	list		Output of `divide_latex_text`
custom_commands	list		Output of `custom_commands` applied to the preamble of the LaTeX document.
vault	PathLike		An Obsidian.md vault.
location	PathLike		The path to make the new reference folder. Relative to `vault`.
reference_name	PathLike		The name of the new reference.
authors	Union		Each str is the family name of each author.
author_folder	PathLike	_mathematicians	The directory where the author files are stored in. Relative to `vault`.
create_reference_file_in_references_folder	bool	True	If `True`, then the reference file creation is attempted within `references_folder`. Otherwise, the reference file creation is attempted at the base of the newly setup folder for the reference.
references_folder	PathLike	_references	The directory where the references files are stored in. Relative to `vault`.
create_template_file_in_templates_folder	bool	True	If `True`, then the template file creation is attempted within `templates_folder`. Otherwise, the template file creation is attempted at the base of the newly setup folder for the reference.
templates_folder	PathLike	_templates	The directory where the template files are stored in. Relative to `vault`.
template_file_name	str	_template_common	The template file from which to base the template file of the new reference.
notation_index_template_file_name	str	_template_notation_index	The template file from which to base the notation index file of the new reference.
glossary_template_file_name	str	_template_glossary	The template file from which to base the glossary file of the new reference.
setup_temp_folder	bool	True	If `True`, creates a `_temp` folder with an index file. This folder serves to house notes auto-created from LaTeX text files before moving them to their correct directories. Defaults to `True`.
make_second_template_file_in_reference_directory	bool	True	If `True`, creates a copy of the template note within the directory for the reference.
copy_obsidian_configs	Optional	.obsidian	The folder relative to `vault` from which to copy obsidian configs. If `None`, then no obsidian configs are copied to the reference folder. Defaults to `.obsidian`.
overwrite	Optional	None	Specifies if and how to overwrite the reference folder if it already exists. - If `'w'`, then deletes the contents of the existing reference folder, as well as the template and reference file, before creating the new reference folder. - If `'a'`, then overwrites the contents of the reference folder, but does not remove existing files/folders. - If `None`, then does not modify the existing reference folder and raises a `FileExistsError`.
confirm_overwrite	bool	True	Specifies whether or not to confirm the deletion of the reference folder if it already exists and if `overwrite` is `'w'`. Defaults to `True`.
verbose	bool	False	
replace_custom_commands	bool	True	If `True`, replace the custom commands in the text of `parts` when making the notes.
adjust_common_latex_syntax_to_markdown	bool	True	If `True`, apply [`adjust_common_syntax_to_markdown`](https://hyunjongkimmath.github.io/trouver/latex.formatting.html#adjust_common_syntax_to_markdown) to the text in `parts` when making the notes.
repeat_replacing_custom_commands	int	1	The number of times to repeat replacing the custom commands throughout the text; note that some custom commands could be "nested", i.e. the custom commands are defined in terms of other custom commands. Defaults to `1`, in which case custom commands are replaced throughout the entire document once. If set to any negative number (e.g. `-1`), then this function attempts to replace custom commands until no commands to replace are found.
Returns	None
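To illustrate why `repeat_replacing_custom_commands` matters for nested custom commands, here is a simplified sketch of repeated replacement. It is not trouver's implementation: `replace_once` and `replace_repeatedly` are hypothetical names, only zero-argument macros are handled, and the `max_passes` cap is an added safeguard against self-referential definitions.

```python
import re


def replace_once(text, commands):
    """Replace each zero-argument macro occurrence in `text` a single time."""
    if not commands:
        return text
    names = sorted(commands, key=len, reverse=True)
    # (?![A-Za-z]) keeps a macro such as \R from matching inside \Real.
    pattern = re.compile(
        r'\\(' + '|'.join(re.escape(name) for name in names) + r')(?![A-Za-z])')
    return pattern.sub(lambda match: commands[match.group(1)], text)


def replace_repeatedly(text, commands, repeat=1, max_passes=100):
    """Run `repeat` passes; a negative `repeat` means "until nothing changes"."""
    passes = 0
    while (repeat < 0 or passes < repeat) and passes < max_passes:
        new_text = replace_once(text, commands)
        if new_text == text:
            break  # No macro was replaced in this pass.
        text = new_text
        passes += 1
    return text


# \Rn is defined in terms of \RR, so one pass leaves \RR behind.
commands = {'RR': r'\mathbb{R}', 'Rn': r'\RR^n'}
print(replace_repeatedly(r'$x \in \Rn$', commands, repeat=1))
print(replace_repeatedly(r'$x \in \Rn$', commands, repeat=-1))
```

With `repeat=1` the result still contains `\RR`, whereas a negative value keeps going until the text is fully expanded.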

with (tempfile.TemporaryDirectory(prefix='temp_dir', dir=os.getcwd()) as temp_dir):
    temp_vault = Path(temp_dir) / 'test_vault_5'
    shutil.copytree(_test_directory() / 'test_vault_5', temp_vault)

    dir = _test_directory() / 'latex_examples' / 'latex_example_with_untitled_subsections_setup_to_a_vault'
    sample_latex_file = dir / 'main.tex' 
    sample_latex_text = text_from_file(sample_latex_file)
    preamble, _ = divide_preamble(sample_latex_text)
    parts = divide_latex_text(sample_latex_text, dir)
    cust_comms = custom_commands(preamble)
    
    setup_reference_from_latex_parts(
        parts, cust_comms, temp_vault, Path(''),
        'test_ref',
        ['Kim', 'Kim', 'Kim'])
    # os.startfile(temp_vault)
    # input()

# TODO: give an example for a LaTeX document with a multiline section
# TODO: give an example for a LaTeX document with a section that must be sanitized first, e.g.
# in banwait_et_al_cnpgrg2c, there is a section of the string
# `\section{Exceptional maximal subgroups of 
# \texorpdfstring{\(\GSp_4(\ff_\ell)\)}{GSp4Fell}}`

The following example demonstrates setting up a reference folder from a LaTeX document with significant content before any explicitly specified sections. In particular, the reference folder contains a subfolder dedicated to the content that comes before the explicitly specified sections.

with (tempfile.TemporaryDirectory(prefix='temp_dir', dir=os.getcwd()) as temp_dir):
    temp_vault = Path(temp_dir) / 'test_vault_5'
    shutil.copytree(_test_directory() / 'test_vault_5', temp_vault)

    dir = _test_directory() / 'latex_examples' / 'latex_example_with_content_before_sections'
    sample_latex_file = dir / 'main.tex' 
    sample_latex_text = text_from_file(sample_latex_file)
    preamble, _ = divide_preamble(sample_latex_text)
    parts = divide_latex_text(sample_latex_text, dir)
    cust_comms = custom_commands(preamble)
    
    setup_reference_from_latex_parts(
        parts, cust_comms, temp_vault, Path(''),
        'test_ref',
        ['Kim', 'Kim', 'Kim'])
    reference_folder = temp_vault / 'test_ref'

    subdirectories = list(reference_folder.glob('**'))
    relative_subdirectories = [
        os.path.relpath(subdirectory, reference_folder)
        for subdirectory in subdirectories]
    print("The following are the subdirectories of `reference_folder` (relative to `reference_folder`):")
    print(relative_subdirectories)
    assert convert_title_to_folder_name(f'test_ref {UNTITLED_SECTION_TITLE}') in relative_subdirectories

    # os.startfile(temp_vault)
    # input()

The following are the subdirectories of `reference_folder` (relative to `reference_folder`):
['.', '.obsidian', '1_proof_of_theorem~refthmain', 'test_ref_untitled_section', '_temp', '.obsidian\\plugins', '.obsidian\\plugins\\fast-link-edit', '.obsidian\\plugins\\obsidian-vimrc-support', '1_proof_of_theorem~refthmain\\11_this_is_a_subsection', '1_proof_of_theorem~refthmain\\12_this_is_another_subsection']

Compile `Obsidian.md` vault notes into LaTeX code

As a side note, remove_dollar_signs_around_equationlike_envs is one of the functions used to revert Markdown-formatted code into code better suited for LaTeX: it strips the `$$` delimiters surrounding equation-like environments.

output = remove_dollar_signs_around_equationlike_envs(
r'''$$\begin{align*}asdf\end{align*}$$''')
print(output)
assert '$' not in output

\begin{align*}asdf\end{align*}
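For intuition, the behavior shown above could be approximated with a single regular expression. This is only a sketch under stated assumptions and not trouver's code: `strip_dollars_around_envs` is a hypothetical name, and the list of environments treated as "equation-like" is a guess.

```python
import re

# Assumed list of equation-like environments; the real function may use another.
EQUATION_LIKE_ENVS = (
    'align', 'align*', 'equation', 'equation*',
    'gather', 'gather*', 'multline', 'multline*')


def strip_dollars_around_envs(text, envs=EQUATION_LIKE_ENVS):
    """Remove `$$ ... $$` when it directly wraps one equation-like environment."""
    env_pattern = '|'.join(re.escape(env) for env in envs)
    pattern = re.compile(
        r'\$\$\s*(\\begin\{(' + env_pattern + r')\}.*?\\end\{\2\})\s*\$\$',
        re.DOTALL)
    # A function replacement inserts the captured text without escape processing.
    return pattern.sub(lambda match: match.group(1), text)


print(strip_dollars_around_envs(r'$$\begin{align*}asdf\end{align*}$$'))
```

Environments that need math mode, such as `array`, are deliberately absent from the assumed list, so `$$\begin{array}...\end{array}$$` would be left unchanged.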

convert_notes_to_latex_code

 convert_notes_to_latex_code (notes:list[trouver.markdown.obsidian.vault.VaultNote],
                              vault:os.PathLike,
                              preamble:str)

Compile the contents of the VaultNote objects into code for a LaTeX file.

body = r"""
Let us explain why this point of view is useful for proving homological stability for Hurwitz spaces.  In most situations where homological stability is understood, one has a sequence of (usually connected) spaces $X_n$ and stabilization maps $f_n: X_n \to X_{n+1}$; the goal is to show that each $f_n$ induces homology isomorphisms in a range of dimensions.  Let $X= \sqcup_n X_n$, and consider the homology

<span notation="" style="border-width:1px;border-style:solid;padding:3px">$$M_p = H_p(X) =\oplus_n H_p(X_n)$$</span>

Give $M_p$ the structure of a $k[x]$-module by making the indeterminate $x$ act via the stabilization map.  $M_p$ admits a grading by the number $n$, and $x$ acts as a degree 1 operator.  Homological stability is rephrased as the statement that $x$ is an isomorphism in sufficiently high degree.  Equivalently, we need the quotient and $x$-torsion

$$\begin{array}{ccc} Tor_0^{k[x]}(k, M_p) = M_p/xM_p & {\rm and} & Tor_1^{k[x]}(k, M_p) = M_p[x] \end{array}$$

to be concentrated in low degrees.
"""
print(_replace_html_with_latex_command_markings(body))


Let us explain why this point of view is useful for proving homological stability for Hurwitz spaces.  In most situations where homological stability is understood, one has a sequence of (usually connected) spaces $X_n$ and stabilization maps $f_n: X_n \to X_{n+1}$; the goal is to show that each $f_n$ induces homology isomorphisms in a range of dimensions.  Let $X= \sqcup_n X_n$, and consider the homology

$$\mathcolorbox{lightgray}{M_p = H_p(X) =\oplus_n H_p(X_n)}$$

Give $M_p$ the structure of a $k[x]$-module by making the indeterminate $x$ act via the stabilization map.  $M_p$ admits a grading by the number $n$, and $x$ acts as a degree 1 operator.  Homological stability is rephrased as the statement that $x$ is an isomorphism in sufficiently high degree.  Equivalently, we need the quotient and $x$-torsion

$$\begin{array}{ccc} Tor_0^{k[x]}(k, M_p) = M_p/xM_p & {\rm and} & Tor_1^{k[x]}(k, M_p) = M_p[x] \end{array}$$

to be concentrated in low degrees.