import glob
import os
import shutil
import tempfile
from pathlib import Path

from fastcore.test import ExceptionExpected, test_eq
from pathvalidate import validate_filename
from trouver.helper.tests import _test_directory  # , non_utf8_chars_in_file
from trouver.markdown.obsidian.personal.reference import (
    delete_reference_folder)
latex.convert
This module contains functions and methods to automatically make Obsidian notes from LaTeX files of mathematical papers, most notably those on arXiv.
See the Potential Problems section below for some common errors that arise from this module and how to circumvent them.
Potential problems
The following are some problems that frequently arise when using this module:
UnicodeDecodeErrors arise when reading LaTeX files
By default, the text_from_file method in trouver.helper reads files and attempts to decode them as UTF-8. If a LaTeX file has characters that cannot be decoded as UTF-8, then a UnicodeDecodeError may be raised. In this case, one can identify these characters using the trouver.helper.non_utf8_chars_in_file method and modify the LaTeX file manually. It may be useful to use a text editor to jump to the positions of these characters and to change the encoding of the LaTeX file to UTF-8; for example, the author of trouver has opened some ANSI-encoded LaTeX documents in Notepad++ and converted their encoding to UTF-8.
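The workflow above can be sketched with the standard library alone. This is a minimal illustration, not the implementation of trouver.helper.non_utf8_chars_in_file: the helper name find_undecodable_positions is hypothetical, and the sample assumes that the "ANSI" encoding in question is Windows cp1252.

```python
def find_undecodable_positions(raw: bytes) -> list[int]:
    """Return the byte offsets at which `raw` fails to decode as UTF-8."""
    positions = []
    start = 0
    while start < len(raw):
        try:
            raw[start:].decode('utf-8')
            break  # The rest of the data decodes cleanly.
        except UnicodeDecodeError as error:
            # `error.start` is relative to the slice, so shift it by `start`.
            positions.append(start + error.start)
            start += error.start + 1
    return positions


# Hypothetical sample: text saved as cp1252 rather than UTF-8.
raw = 'café naïve'.encode('cp1252')
positions = find_undecodable_positions(raw)
print(positions)

# Assumption: the source encoding is known to be cp1252.
# Decode with that codec, then re-encode as UTF-8.
repaired = raw.decode('cp1252').encode('utf-8')
assert find_undecodable_positions(repaired) == []
```

The reported offsets are byte positions, which is what most text editors need in order to jump to the offending characters.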
NoDocumentNodeErrors arise even though the LaTeX file has a document environment (i.e. \begin{document}...\end{document})
The find_document_node method in this module is sometimes unable to detect the document environment of a LaTeX file. This error is known to arise when
- there are macros (which include commands) defined that represent or expand to text containing \begin{...}...\end{...}.
# TODO in the above explanation, include an example.
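One way to look for such macros before converting a file is a line-based scan of the preamble. The sketch below is a heuristic under stated assumptions: macros_wrapping_environments and the sample text are hypothetical, definitions are assumed to fit on one line, and whether a flagged macro actually triggers a NoDocumentNodeError in trouver is not verified here.

```python
import re

# A line that defines a macro via \newcommand, \renewcommand, or \def ...
DEFINITION = re.compile(r'\\(?:re)?newcommand|\\def\b')
# ... and that also contains an environment delimiter.
ENVIRONMENT_DELIMITER = re.compile(r'\\(?:begin|end)\s*\{')


def macros_wrapping_environments(latex_text: str) -> list[str]:
    """Return lines that define a macro whose body mentions \\begin or \\end."""
    return [
        line.strip() for line in latex_text.splitlines()
        if DEFINITION.search(line) and ENVIRONMENT_DELIMITER.search(line)]


# Hypothetical LaTeX source with two environment-wrapping macros.
sample = r'''
\newcommand{\beq}{\begin{equation}}
\newcommand{\eeq}{\end{equation}}
\newcommand{\R}{\mathbb{R}}
\begin{document}
\beq x \in \R \eeq
\end{document}
'''
suspects = macros_wrapping_environments(sample)
for line in suspects:
    print(line)
```

Replacing the flagged macros with their expansions by hand is one possible workaround, again as an assumption rather than a documented fix.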
Set up an Obsidian vault reference
setup_reference_from_latex_parts
setup_reference_from_latex_parts (parts:list[tuple[str,str]],
    custom_commands:list[tuple[str,int,typing.Optional[str],str]],
    vault:os.PathLike,
    location:os.PathLike,
    reference_name:os.PathLike,
    authors:Union[str,list[str]],
    author_folder:os.PathLike='_mathematicians',
    create_reference_file_in_references_folder:bool=True,
    references_folder:os.PathLike='_references',
    create_template_file_in_templates_folder:bool=True,
    templates_folder:os.PathLike='_templates',
    template_file_name:str='_template_common',
    notation_index_template_file_name:str='_template_notation_index',
    glossary_template_file_name:str='_template_glossary',
    setup_temp_folder:bool=True,
    make_second_template_file_in_reference_directory:bool=True,
    copy_obsidian_configs:Optional[os.PathLike]='.obsidian',
    overwrite:Optional[str]=None,
    confirm_overwrite:bool=True,
    verbose:bool=False,
    replace_custom_commands:bool=True,
    adjust_common_latex_syntax_to_markdown:bool=True,
    repeat_replacing_custom_commands:int=1)
*Set up a reference folder in vault using an output of divide_latex_text, create notes from parts, and link notes in index files in the reference folder.
Assumes that
- parts is derived from a LaTeX document in which
  - all of the text belongs to sections.
  - all of the sections/subsections are uniquely named.
- The template file has a section # Topic.
- The last line of the template file is a footnote indicating where the note comes from.
- There is at most one reference folder in the vault whose name is given by reference_name.
parts itself is not modified, even if replace_custom_commands and/or adjust_common_latex_syntax_to_markdown are set to True.
cf. setup_folder_for_new_reference for how the reference folder is set up.
The names for the subfolders of the reference folder are the section titles, except for sections with common titles such as Introduction, Notations, Conventions, Preliminaries, and Notations and Conventions. This ensures that the index file names for sections in different reference folders do not have the same name.
Text/parts that precede explicitly given sections are included in the first section’s folder and are linked in the first section’s index file.*
| | Type | Default | Details |
|---|---|---|---|
| parts | list | | Output of divide_latex_text. |
| custom_commands | list | | Output of custom_commands applied to the preamble of the LaTeX document. |
| vault | PathLike | | An Obsidian.md vault. |
| location | PathLike | | The path to make the new reference folder. Relative to vault. |
| reference_name | PathLike | | The name of the new reference. |
| authors | Union | | Each str is the family name of each author. |
| author_folder | PathLike | _mathematicians | The directory where the author files are stored. Relative to vault. |
| create_reference_file_in_references_folder | bool | True | If True, then the reference file creation is attempted within references_folder. Otherwise, the reference file creation is attempted at the base of the newly set up folder for the reference. |
| references_folder | PathLike | _references | The directory where the reference files are stored. Relative to vault. |
| create_template_file_in_templates_folder | bool | True | If True, then the template file creation is attempted within templates_folder. Otherwise, the template file creation is attempted at the base of the newly set up folder for the reference. |
| templates_folder | PathLike | _templates | The directory where the template files are stored. Relative to vault. |
| template_file_name | str | _template_common | The template file on which to base the template file of the new reference. |
| notation_index_template_file_name | str | _template_notation_index | The template file on which to base the notation index file of the new reference. |
| glossary_template_file_name | str | _template_glossary | The template file on which to base the glossary file of the new reference. |
| setup_temp_folder | bool | True | If True, creates a _temp folder with an index file. This folder serves to house notes auto-created from LaTeX text files before moving them to their correct directories. Defaults to True. |
| make_second_template_file_in_reference_directory | bool | True | If True, creates a copy of the template note within the directory for the reference. |
| copy_obsidian_configs | Optional | .obsidian | The folder relative to vault from which to copy obsidian configs. If None, then no obsidian configs are copied to the reference folder. Defaults to .obsidian. |
| overwrite | Optional | None | Specifies if and how to overwrite the reference folder if it already exists. - If ‘w’, then deletes the contents of the existing reference folder, as well as the template and reference file, before creating the new reference folder. - If ‘a’, then overwrites the contents of the reference folder, but does not remove existing files/folders. - If None, then does not modify the existing reference folder and raises a FileExistsError. |
| confirm_overwrite | bool | True | Specifies whether or not to confirm the deletion of the reference folder if it already exists and if overwrite is ‘w’. Defaults to True. |
| verbose | bool | False | |
| replace_custom_commands | bool | True | If True, replace the custom commands in the text of parts when making the notes. |
| adjust_common_latex_syntax_to_markdown | bool | True | If True, apply [adjust_common_syntax_to_markdown](https://hyunjongkimmath.github.io/trouver/latex.formatting.html#adjust_common_syntax_to_markdown) to the text in parts when making the notes. |
| repeat_replacing_custom_commands | int | 1 | The number of times to repeat replacing the custom commands throughout the text; note that some custom commands could be “nested”, i.e. the custom commands are defined in terms of other custom commands. Defaults to 1, in which case custom commands are replaced throughout the entire document once. If set to any negative number (e.g. `-1`), then this function attempts to replace custom commands until no commands to replace are found. |
| Returns | None | | |
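The effect of repeat_replacing_custom_commands on nested custom commands can be illustrated with a minimal sketch. It is not trouver's implementation: it handles only zero-argument commands, and replace_commands_once, replace_commands, and the sample commands are hypothetical names chosen for illustration.

```python
import re


def replace_commands_once(text: str, commands: dict[str, str]) -> str:
    """Replace each zero-argument command `\\name` by its definition, one pass."""
    for name, definition in commands.items():
        # The lookahead keeps `\Q` from matching the start of `\Qbar`.
        pattern = re.compile(r'\\' + re.escape(name) + r'(?![A-Za-z])')
        # A function replacement avoids backslash processing in `definition`.
        text = pattern.sub(lambda _: definition, text)
    return text


def replace_commands(text: str, commands: dict[str, str], repeat: int = 1) -> str:
    """Run `repeat` passes; a negative `repeat` runs until nothing changes."""
    passes = 0
    while repeat < 0 or passes < repeat:
        new_text = replace_commands_once(text, commands)
        if new_text == text:
            break
        text = new_text
        passes += 1
    return text


# `\Qbar` is defined in terms of `\Q`, so it is a "nested" custom command.
commands = {'Q': r'\mathbb{Q}', 'Qbar': r'\overline{\Q}'}
once = replace_commands(r'$\Qbar$', commands, repeat=1)
until_stable = replace_commands(r'$\Qbar$', commands, repeat=-1)
print(once)
print(until_stable)
```

In this sketch a single pass leaves an inner `\Q` behind, while a negative repeat count keeps going until the text stops changing. A command defined in terms of itself would never stabilize, so a real implementation would need a guard against that case.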
with (tempfile.TemporaryDirectory(prefix='temp_dir', dir=os.getcwd()) as temp_dir):
    # Copy the test vault into a temporary directory.
    temp_vault = Path(temp_dir) / 'test_vault_5'
    shutil.copytree(_test_directory() / 'test_vault_5', temp_vault)

    dir = _test_directory() / 'latex_examples' / 'latex_example_with_untitled_subsections_setup_to_a_vault'
    sample_latex_file = dir / 'main.tex'
    sample_latex_text = text_from_file(sample_latex_file)
    preamble, _ = divide_preamble(sample_latex_text)
    parts = divide_latex_text(sample_latex_text, dir)
    cust_comms = custom_commands(preamble)

    setup_reference_from_latex_parts(
        parts, cust_comms, temp_vault, Path(''),
        'test_ref',
        ['Kim', 'Kim', 'Kim'])
    # os.startfile(temp_vault)
    # input()
# TODO: give an example for a LaTeX document with a multiline section
# TODO: give an example for a LaTeX document with a section that must be sanitized first, e.g.
# in banwait_et_al_cnpgrg2c, there is a section of the string
# `\section{Exceptional maximal subgroups of
# \texorpdfstring{\(\GSp_4(\ff_\ell)\)}{GSp4Fell}}`
The following example demonstrates setting up a reference folder from a LaTeX document with significant content before any explicitly specified sections. In particular, the reference folder contains a subfolder dedicated to the content that comes before the explicitly specified sections.
with (tempfile.TemporaryDirectory(prefix='temp_dir', dir=os.getcwd()) as temp_dir):
    # Copy the test vault into a temporary directory.
    temp_vault = Path(temp_dir) / 'test_vault_5'
    shutil.copytree(_test_directory() / 'test_vault_5', temp_vault)

    dir = _test_directory() / 'latex_examples' / 'latex_example_with_content_before_sections'
    sample_latex_file = dir / 'main.tex'
    sample_latex_text = text_from_file(sample_latex_file)
    preamble, _ = divide_preamble(sample_latex_text)
    parts = divide_latex_text(sample_latex_text, dir)
    cust_comms = custom_commands(preamble)

    setup_reference_from_latex_parts(
        parts, cust_comms, temp_vault, Path(''),
        'test_ref',
        ['Kim', 'Kim', 'Kim'])

    # List every subdirectory created inside the new reference folder.
    reference_folder = temp_vault / 'test_ref'
    subdirectories = list(reference_folder.glob('**'))
    relative_subdirectories = [
        os.path.relpath(subdirectory, reference_folder)
        for subdirectory in subdirectories]
    print("The following are the subdirectories of `reference_folder` (relative to `temp_vault`):")
    print(relative_subdirectories)
    assert convert_title_to_folder_name(f'test_ref {UNTITLED_SECTION_TITLE}') in relative_subdirectories
    # os.startfile(temp_vault)
    # input()
The following are the subdirectories of `reference_folder` (relative to `temp_vault`):
['.', '.obsidian', '1_proof_of_theorem~refthmain', 'test_ref_untitled_section', '_temp', '.obsidian\\plugins', '.obsidian\\plugins\\fast-link-edit', '.obsidian\\plugins\\obsidian-vimrc-support', '1_proof_of_theorem~refthmain\\11_this_is_a_subsection', '1_proof_of_theorem~refthmain\\12_this_is_another_subsection']
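The backslashes in the listing above are Windows path separators. If a platform-independent listing is preferred, a sketch along the following lines could be used; the helper relative_subdirectories and the folder names are hypothetical and only mimic the layout of a reference folder.

```python
import tempfile
from pathlib import Path


def relative_subdirectories(folder: Path) -> list[str]:
    """List all subdirectories of `folder`, relative to it, using '/' separators."""
    return sorted(
        path.relative_to(folder).as_posix()
        for path in folder.rglob('*') if path.is_dir())


with tempfile.TemporaryDirectory() as temp_dir:
    root = Path(temp_dir)
    # Hypothetical layout: one section with a subsection, plus a `_temp` folder.
    (root / '1_first_section' / '11_subsection').mkdir(parents=True)
    (root / '_temp').mkdir()
    (root / '_index.md').write_text('', encoding='utf-8')  # Files are skipped.
    listing = relative_subdirectories(root)
    print(listing)
```

Unlike the listing above, this sketch does not include the folder itself ('.') in the result.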
Compile Obsidian.md vault notes into LaTeX code
As a side note, remove_dollar_signs_around_equationlike_envs is one of the functions used to revert markdown-formatted code into code better suited for LaTeX.
output = remove_dollar_signs_around_equationlike_envs(
    r'''$$\begin{align*}asdf\end{align*}$$''')
print(output)
assert '$' not in output
\begin{align*}asdf\end{align*}
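A regular-expression approach to this kind of cleanup might look like the following sketch. It is an assumption about how such a function could work, not the source of remove_dollar_signs_around_equationlike_envs; the name strip_dollars_around_envs and the list of environments are hypothetical.

```python
import re

# Assumed set of "equation-like" environments; the real list may differ.
EQUATION_LIKE_ENVS = (
    'align', 'align*', 'equation', 'equation*', 'gather', 'gather*')


def strip_dollars_around_envs(text: str, envs=EQUATION_LIKE_ENVS) -> str:
    """Remove `$$ ... $$` when it directly wraps one of the given environments."""
    names = '|'.join(re.escape(env) for env in envs)
    pattern = re.compile(
        r'\$\$\s*(\\begin\{(' + names + r')\}.*?\\end\{\2\})\s*\$\$',
        flags=re.DOTALL)
    # Keep only the environment itself (group 1), dropping the dollar signs.
    return pattern.sub(lambda match: match.group(1), text)


stripped = strip_dollars_around_envs(r'$$\begin{align*}asdf\end{align*}$$')
untouched = strip_dollars_around_envs(r'$$\begin{array}{c}x\end{array}$$')
print(stripped)
print(untouched)
```

Environments such as array are left alone in this sketch because they are only valid inside math mode and still need the surrounding dollar signs.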
convert_notes_to_latex_code
convert_notes_to_latex_code (notes:list[trouver.markdown.obsidian.vault.VaultNote], vault:os.PathLike, preamble:str)
Compile the contents of the VaultNote objects into code for a LaTeX file.
body = r"""
Let us explain why this point of view is useful for proving homological stability for Hurwitz spaces. In most situations where homological stability is understood, one has a sequence of (usually connected) spaces $X_n$ and stabilization maps $f_n: X_n \to X_{n+1}$; the goal is to show that each $f_n$ induces homology isomorphisms in a range of dimensions. Let $X= \sqcup_n X_n$, and consider the homology
<span notation="" style="border-width:1px;border-style:solid;padding:3px">$$M_p = H_p(X) =\oplus_n H_p(X_n)$$</span>
Give $M_p$ the structure of a $k[x]$-module by making the indeterminate $x$ act via the stabilization map. $M_p$ admits a grading by the number $n$, and $x$ acts as a degree 1 operator. Homological stability is rephrased as the statement that $x$ is an isomorphism in sufficiently high degree. Equivalently, we need the quotient and $x$-torsion
$$\begin{array}{ccc} Tor_0^{k[x]}(k, M_p) = M_p/xM_p & {\rm and} & Tor_1^{k[x]}(k, M_p) = M_p[x] \end{array}$$
to be concentrated in low degrees.
"""
print(_replace_html_with_latex_command_markings(body))
Let us explain why this point of view is useful for proving homological stability for Hurwitz spaces. In most situations where homological stability is understood, one has a sequence of (usually connected) spaces $X_n$ and stabilization maps $f_n: X_n \to X_{n+1}$; the goal is to show that each $f_n$ induces homology isomorphisms in a range of dimensions. Let $X= \sqcup_n X_n$, and consider the homology
$$\mathcolorbox{lightgray}{M_p = H_p(X) =\oplus_n H_p(X_n)}$$
Give $M_p$ the structure of a $k[x]$-module by making the indeterminate $x$ act via the stabilization map. $M_p$ admits a grading by the number $n$, and $x$ acts as a degree 1 operator. Homological stability is rephrased as the statement that $x$ is an isomorphism in sufficiently high degree. Equivalently, we need the quotient and $x$-torsion
$$\begin{array}{ccc} Tor_0^{k[x]}(k, M_p) = M_p/xM_p & {\rm and} & Tor_1^{k[x]}(k, M_p) = M_p[x] \end{array}$$
to be concentrated in low degrees.
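The transformation shown above, in which a span-wrapped display equation becomes a \mathcolorbox command, can be approximated by the sketch below. It is a hypothetical reimplementation, not the source of _replace_html_with_latex_command_markings; the name mark_spans_with_colorbox is invented, and \mathcolorbox is assumed to be defined in the LaTeX preamble since it is not a standard command.

```python
import re

# A <span ...> tag wrapping a single `$$ ... $$` display equation.
SPAN_AROUND_DISPLAY_MATH = re.compile(
    r'<span\b[^>]*>\s*\$\$(.*?)\$\$\s*</span>', flags=re.DOTALL)


def mark_spans_with_colorbox(text: str, color: str = 'lightgray') -> str:
    """Replace span-wrapped display math with a `\\mathcolorbox` marking."""
    return SPAN_AROUND_DISPLAY_MATH.sub(
        lambda match: (
            '$$\\mathcolorbox{' + color + '}{' + match.group(1) + '}$$'),
        text)


html = (
    '<span notation="" style="border-width:1px;border-style:solid;padding:3px">'
    r'$$M_p = H_p(X) =\oplus_n H_p(X_n)$$</span>')
marked = mark_spans_with_colorbox(html)
print(marked)
```

Display equations that are not wrapped in a span, such as the array equation in the example above, pass through this sketch unchanged.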