latex.formatting

Adjust the formatting of text from LaTeX files so that it is more usable in Markdown files for Obsidian.md, or vice versa.

See helper.latex.macros_and_commands, which contains some functions originally written for this module.

Numbers



is_number

 is_number (x:Union[float,int,complex,str])

*Return True if the input x represents a number.

This function differs from Python's built-in str.isnumeric method, which returns True only when every character of a string is a numeric character, so it rejects strings such as "3.45".*

assert is_number("3.45")
assert is_number(1 + 5j)
assert is_number(5)
assert is_number(0.0)
assert not is_number("3.43.55")
assert not is_number("hie")
assert not is_number("[^1]")
assert not is_number(None)
from fastcore.test import test_eq

from trouver.helper.tests import _test_directory# , non_utf8_chars_in_file
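The behavior exercised by the assertions above can be approximated as follows. This is a minimal sketch, assuming that any string accepted by Python's `float` or `complex` parsers counts as a number; `is_number_sketch` is a hypothetical name and is not trouver's implementation.

```python
from typing import Any


def is_number_sketch(x: Any) -> bool:
    """Hypothetical stand-in for is_number; not trouver's actual code."""
    # Numeric Python objects are accepted directly.
    if isinstance(x, (int, float, complex)):
        return True
    if isinstance(x, str):
        # Try the real-number parser first, then complex literals.
        for parse in (float, complex):
            try:
                parse(x)
                return True
            except ValueError:
                continue
    # Anything else (None, lists, unparsable strings) is not a number.
    return False
```

Under this assumption, strings such as "nan" and "inf" also count as numbers because `float` parses them; a stricter check would need to exclude them explicitly.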

Formatting modifications (from LaTeX to Obsidian.md style Markdown)

Identify macros and commands to replace

Authors usually define a lot of custom commands and macros in their LaTeX files. Such customizations vary from author to author and most customized commands are not recognized by Obsidian.

See nbs/_tests/latex_examples/commands_example/main.tex for some examples of custom commands.



replace_command_in_text

 replace_command_in_text (text:str,
                          command_tuple:tuple[str,int,typing.Optional[str]
                          ,str])

*Replaces all invocations of the specified command in text with the display text with the arguments used in the display text.

Assumes that ‘’, ‘’, ‘’, etc. are not part of the display text.*

Type Details
text str
command_tuple tuple Consists of (1) the name of the custom command, (2) the number of parameters, (3) the default argument if specified or None otherwise, and (4) the display text of the command.
# Basic
command_tuple = ('Sur', 0, None, r'\mathrm{Sur}')
# pattern = regex_pattern_detecting_command(command_tuple)
text = r'The number of element of $\Sur(\operatorname{Cl} \mathcal{O}_L, A)$ is ... Perhaps $\Sur$ is nonempty.'
test_eq(replace_command_in_text(text, command_tuple), r'The number of element of $\mathrm{Sur}(\operatorname{Cl} \mathcal{O}_L, A)$ is ... Perhaps $\mathrm{Sur}$ is nonempty.')


# One parameter
command_tuple = ('field', 1, None, r'\mathbb{#1}')
# pattern = regex_pattern_detecting_command(command_tuple)
text = r'$\field{Q}$ is the field of rational numbers. $\field{C}$ is the field of complex numbers'
test_eq(replace_command_in_text(text, command_tuple), r'$\mathbb{Q}$ is the field of rational numbers. $\mathbb{C}$ is the field of complex numbers')

# Multiple parameters
command_tuple = ('mat', 4, None, r'\left[\begin{array}{cc}#1 & #2 \\ #3 & #4\end{array}\right]')
# pattern = regex_pattern_detecting_command(command_tuple)
text = r'\mat{{123}}{asdfasdf{}{}}{{{}}}{{asdf}{asdf}{}}' # This is a balanced str.
test_eq(replace_command_in_text(text, command_tuple), r'\left[\begin{array}{cc}{123} & asdfasdf{}{} \\ {{}} & {asdf}{asdf}{}\end{array}\right]')

# Multiple parameters, one of which is optional
command_tuple = ('plusbinomial', 3, '2', r'(#2 + #3)^#1')
# pattern = regex_pattern_detecting_command(command_tuple)
# When the optional argument is omitted, the default value '2' is used
text = r'\plusbinomial{x}{y}'
test_eq(replace_command_in_text(text, command_tuple), r'(x + y)^2')

# When the optional argument is supplied
text = r'\plusbinomial[4]{x}{y}'
test_eq(replace_command_in_text(text, command_tuple), r'(x + y)^4')


# One parameter that is optional.
command_tuple = ('greet', 1, 'world', r'Hello #1!')
# pattern = regex_pattern_detecting_command(command_tuple)
# When the optional argument is omitted, the default value 'world' is used
text = r'\greet'
test_eq(replace_command_in_text(text, command_tuple), r'Hello world!')

# When the optional argument is supplied
text = r'\greet[govna]'
test_eq(replace_command_in_text(text, command_tuple), r'Hello govna!')

# In the following example, `\del` is a command defined as `\delta`.
# Any invocation of `\delta` should NOT be replaced with `\deltata`.
command_tuple = (r'del', 0, None, r'\delta')
# pattern = regex_pattern_detecting_command(command_tuple)
text = r'\del should be replaced. \delta should not.'
test_eq(replace_command_in_text(text, command_tuple), r'\delta should be replaced. \delta should not.')
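One way to keep `\del` from matching inside `\delta` is a negative lookahead that rejects a match whenever another letter follows the command name. The sketch below handles only commands without parameters; `replace_zero_parameter_command` is a hypothetical helper, and the regular expression is an assumption rather than the pattern trouver actually builds.

```python
import re


def replace_zero_parameter_command(text: str, name: str, display: str) -> str:
    """Replace a parameterless command unless more letters follow its name."""
    # (?![A-Za-z]) rejects matches such as the `\del` at the start of `\delta`.
    pattern = re.compile(r'\\' + re.escape(name) + r'(?![A-Za-z])')
    # A function replacement keeps the backslashes in `display` literal.
    return pattern.sub(lambda match: display, text)


result = replace_zero_parameter_command(
    r'\del should be replaced. \delta should not.', 'del', r'\delta')
```

Passing the replacement as a function, rather than as a string, avoids `re.sub` interpreting sequences such as `\d` in the display text as escapes.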


replace_commands_in_text

 replace_commands_in_text (text:str,
                           command_tuples:list[tuple[str,int,typing.Option
                           al[str],str]], repeat:int=1)

*Replaces all invocations of the specified commands in text with the display text with the arguments used in the display text.

Assumes that ‘’, ‘’, ‘’, etc. are not part of the display text.

If repeat is set to -1, then this function attempts to replace custom commands until no commands to replace are found. However, this might cause infinite loops for some documents.*

Type Default Details
text str The text in which to replace the commands. This should not include the preamble of a latex document.
command_tuples list An output of custom_commands. Each tuple consists of (1) the name of the custom command, (2) the number of parameters, (3) the default argument if specified or None otherwise, and (4) the display text of the command.
repeat int 1 The number of times to repeat replacing the commands throughout the text; note that some custom commands could be “nested”, i.e. defined in terms of other custom commands. Defaults to 1, in which case custom commands are replaced throughout the entire document once. If set to -1, then this function attempts to replace custom commands until no commands to replace are found.
Returns str

The replace_commands_in_text function replaces custom commands in the main part (i.e. not the preamble) of a LaTeX document.

text = r'''Here is a matrix over $\field{Q}$: $\mat{1/2}{2}{-1}{5/7}$.
           Note that it is not over $\field{F}_7$ and not over $\field{F}_2$.'''

command_tuples = [
    ('field', 1, None, r'\mathbb{#1}'),
    ('mat', 4, None, r'\left[\begin{array}{cc}#1 & #2 \\ #3 & #4\end{array}\right]')]

sample_output = replace_commands_in_text(text, command_tuples)

test_eq(sample_output, 
        r'''Here is a matrix over $\mathbb{Q}$: $\left[\begin{array}{cc}1/2 & 2 \\ -1 & 5/7\end{array}\right]$.
           Note that it is not over $\mathbb{F}_7$ and not over $\mathbb{F}_2$.''')
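The argument substitution in the example above depends on reading balanced `{...}` groups after each command name. The following is a minimal sketch of that idea for commands whose arguments are all mandatory; `read_braced_arguments` and `expand_command` are hypothetical helpers, and the sketch ignores optional arguments and escaped braces such as `\{`.

```python
import re


def read_braced_arguments(text: str, start: int, count: int) -> tuple[list[str], int]:
    """Read `count` balanced {...} groups that begin at index `start`."""
    arguments = []
    position = start
    for _ in range(count):
        if position >= len(text) or text[position] != '{':
            raise ValueError('expected a braced argument')
        depth = 0
        for end in range(position, len(text)):
            if text[end] == '{':
                depth += 1
            elif text[end] == '}':
                depth -= 1
                if depth == 0:
                    break  # `end` is the brace closing this argument
        else:
            raise ValueError('unbalanced braces')
        arguments.append(text[position + 1:end])
        position = end + 1
    return arguments, position


def expand_command(text: str, name: str, parameter_count: int, display: str) -> str:
    """Expand every invocation of a command that takes only mandatory arguments."""
    pattern = re.compile(r'\\' + re.escape(name) + r'(?![A-Za-z])')
    pieces = []
    cursor = 0
    while (match := pattern.search(text, cursor)) is not None:
        arguments, end = read_braced_arguments(text, match.end(), parameter_count)
        # Substitute #1, #2, ... in a single pass over the display text.
        expansion = re.sub(
            r'#([1-9])', lambda m: arguments[int(m.group(1)) - 1], display)
        pieces.append(text[cursor:match.start()])
        pieces.append(expansion)
        cursor = end
    pieces.append(text[cursor:])
    return ''.join(pieces)


matrix = expand_command(
    r'$\mat{1/2}{2}{-1}{5/7}$', 'mat', 4,
    r'\left[\begin{array}{cc}#1 & #2 \\ #3 & #4\end{array}\right]')
```

Substituting all placeholders in one pass means that an argument which itself contains text like `#1` is not substituted a second time.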

Note that some writers define custom commands in terms of other custom commands. By default, the replace_commands_in_text function replaces custom commands just once. In the following example, a custom command is defined using other custom commands, and the function replaces the “outer” custom command as well as the commands in its definition:

# TODO: continue this example
text = r'''$\Gm$'''

command_tuples = [
    ('Gm', 0, None, r'\bG_\mult'),
    ('bG', 0, None, r'\mathbb{G}'),
    ('mult', 0, None, r'\times'),
]

sample_output = replace_commands_in_text(text, command_tuples)
test_eq(sample_output, 
        r'''$\mathbb{G}_\times$''')
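The role of repeat can be illustrated with a small sketch. It assumes that commands are applied one after another in list order, which is consistent with the output above but is not confirmed here; `replace_once`, `replace_repeatedly`, and the `max_passes` cap are hypothetical and only cover commands without parameters.

```python
import re


def replace_once(text: str, commands: list[tuple[str, str]]) -> str:
    """Apply each parameterless (name, display) pair once, in list order."""
    for name, display in commands:
        pattern = re.compile(r'\\' + re.escape(name) + r'(?![A-Za-z])')
        text = pattern.sub(lambda match, display=display: display, text)
    return text


def replace_repeatedly(text: str, commands: list[tuple[str, str]],
                       repeat: int = 1, max_passes: int = 20) -> str:
    """Repeat the pass `repeat` times, or until nothing changes if repeat is -1."""
    # The cap guards against self-referential commands that never stabilize.
    passes = max_passes if repeat == -1 else repeat
    for _ in range(passes):
        replaced = replace_once(text, commands)
        if replaced == text:
            break
        text = replaced
    return text


# The inner commands are listed before the outer one here.
nested = [('mult', r'\times'), ('bG', r'\mathbb{G}'), ('Gm', r'\bG_\mult')]
one_pass = replace_repeatedly(r'$\Gm$', nested, repeat=1)
until_stable = replace_repeatedly(r'$\Gm$', nested, repeat=-1)
```

With this ordering a single pass only expands `\Gm`, so the nested commands need a second pass; capping the number of passes is one way to avoid the infinite loops mentioned above.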


replace_commands_in_latex_document

 replace_commands_in_latex_document (document:str, repeat:int=1)

*Return the latex document (with the preamble) with invocations of custom commands/operators replaced with their display text.

Assumes that all custom commands and operators are defined in the preamble.

Assumes that, if commands with the same name are defined multiple times, only the finally defined command is used.

This function does not replace \input’s and \include’s with the code of the corresponding files. See replace_input_and_include, which accomplishes this.

Invocations in commented-out text are replaced as well.*

Type Default Details
document str
repeat int 1 The number of times to repeat replacing the commands throughout the text; note that some custom commands could be “nested”, i.e. defined in terms of other custom commands. Defaults to 1, in which case custom commands are replaced throughout the entire document once. If set to -1, then this function attempts to replace custom commands until no commands to replace are found. See also replace_commands_in_text.
Returns str
file = _test_directory() / 'latex_examples' / 'commands_recursive_example' / 'main.tex'
document = text_from_file(file)
commands_replaced = replace_commands_in_latex_document(document)
assert commands_replaced.startswith(r'\begin{document}')
assert commands_replaced.endswith(r'\end{document}')
assert r'\S' not in commands_replaced
assert r'\mathbb{S}1' in commands_replaced  # Note that $\S$ is defined twice in the preamble; only the latter definition is used.
assert r'\field{Q}$' not in commands_replaced
assert r'\mathbb{Q}$' in commands_replaced
assert r'\commentedout' not in commands_replaced
assert r'This is actually a command that is commented out, but it is also replaced!' in commands_replaced
print(commands_replaced)
\begin{document}

$\mathbb{S}1$
%$\mathbf{Q}$
%$\mathbf{Q}$
%This is actually a command that is commented out, but it is also replaced!
$\mathbb{Q}$

\end{document}
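Two of the assumptions stated above, that definitions live in the preamble and that the last definition of a name wins, can be sketched as follows. `split_preamble` and `last_definitions` are hypothetical helpers; the regular expression only recognizes parameterless `\newcommand` and `\renewcommand` definitions whose bodies nest braces at most one level deep, and it does not skip commented-out definitions.

```python
import re

# Matches e.g. \newcommand{\S}{\mathbb{S}}; deeper brace nesting is not handled.
DEFINITION = re.compile(
    r'\\(?:re)?newcommand\{\\([A-Za-z]+)\}\{((?:[^{}]|\{[^{}]*\})*)\}')


def split_preamble(document: str) -> tuple[str, str]:
    """Split a document into the preamble and the part from \\begin{document} on."""
    index = document.find(r'\begin{document}')
    if index == -1:
        return '', document
    return document[:index], document[index:]


def last_definitions(preamble: str) -> dict[str, str]:
    """Map each command name to its display text; later definitions overwrite."""
    definitions = {}
    for name, body in DEFINITION.findall(preamble):
        definitions[name] = body
    return definitions


document = r'''\documentclass{article}
\newcommand{\S}{\mathbb{S}}
\renewcommand{\S}{\mathbb{S}1}
\begin{document}
$\S$
\end{document}'''

preamble, body = split_preamble(document)
definitions = last_definitions(preamble)
```

Storing the definitions in a dictionary keyed by command name is what makes the last definition take precedence in this sketch.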

Replace commonly used syntax

Obsidian does not compile all LaTeX syntax. For example:

  • \( \) and \[ \] are not recognized as math mode delimiters.
  • \begin{equation} \end{equation} and \begin{align} \end{align} (as well as their unnumbered versions with *) require surrounding $$.

The adjust_common_syntax_to_markdown function gives options to modify formatting from LaTeX text to fit markdown better.



adjust_common_syntax_to_markdown

 adjust_common_syntax_to_markdown (text:str,
                                   options:list[str]=['math_mode_delimiter
                                   s', 'begin_end_equationlike_env', 'repl
                                   ace_backtick_and_apostrophe_quotes',
                                   'remove_comments',
                                   'inline_mathmode_to_own_paragraph',
                                   'merge_multiline_paragraph',
                                   'remove_xspace', 'remove_ensuremath'])

*Adjust some common syntax, such as math mode delimiters and equation/align environments, for Markdown.

Assumes that the tokens for math mode delimiters (e.g. \( \) and \[ \]) are not used otherwise.

The following lists admissible parameters in the options parameter and the effects that including them have:

  • "math_mode_delimiters"
    • Replace \( \) as math mode delimiters with $ $.
    • Replace \[ \] as math mode delimiters with $$ $$.
  • "begin_end_equationlike_env"
    • Replace \begin{...} \end{...} with $$\begin{...} \end{...}$$ and \begin{...*} \end{...*} with $$\begin{...*} \end{...*}$$ for the following environments:
      • align
      • displaymath
      • equation
      • eqnarray
  • "replace_backtick_and_apostrophe_quotes"
    • replace `` '' as quotation delimiters with " ".
  • "remove_comments"
    • remove LaTeX comments.
  • "inline_mathmode_to_own_paragraph"
    • Make it so that each display math mode string (of the form $$...$$) has at least one newline before and after it. Also delete one blank space character ' ' from the text immediately before and after the math mode string if they exist.
  • "merge_multiline_paragraph"
    • Some writers will type paragraphs in multiple lines, likely because their LaTeX editor of choice does not wrap text within a single line. Including this option merges “normal” paragraphs into a single line.
      • Inline-mathmode text is not affected by this option.
  • "remove_xspace"
    • Some writers include \xspace in their code (such as in their custom commands). Obsidian does not render these.
  • "remove_ensuremath"
    • Some writers include \ensuremath in their code (such as in their custom commands). Obsidian does not render these.*
Type Default Details
text str The LaTeX code to adjust to Markdown.
options list ['math_mode_delimiters', 'begin_end_equationlike_env', 'replace_backtick_and_apostrophe_quotes', 'remove_comments', 'inline_mathmode_to_own_paragraph', 'merge_multiline_paragraph', 'remove_xspace', 'remove_ensuremath'] Each str specifies what formatting should be done.
Returns str
text = r'''
are cellular and the DG category one gets by replacing the
topological spaces ${Mor}_{{\mathcal K}_{{top}}}(X_i,X_j)$ by their
cellular chain complexes equals ${\mathcal K}$.
'''
print(_merge_multilines(text))

are cellular and the DG category one gets by replacing the topological spaces ${Mor}_{{\mathcal K}_{{top}}}(X_i,X_j)$ by their cellular chain complexes equals ${\mathcal K}$.
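The merging shown above can be approximated by treating blank lines as paragraph separators and joining the lines within each paragraph. This is a simplified sketch, not the code of `_merge_multilines`; `merge_multiline_paragraphs` is a hypothetical name, and the sketch does not protect math mode text or preserve leading and trailing newlines.

```python
import re


def merge_multiline_paragraphs(text: str) -> str:
    """Join the lines of each paragraph with single spaces."""
    # One or more blank lines separate paragraphs.
    paragraphs = re.split(r'\n\s*\n', text)
    merged = [
        ' '.join(line.strip() for line in paragraph.splitlines() if line.strip())
        for paragraph in paragraphs
    ]
    return '\n\n'.join(merged)


merged_text = merge_multiline_paragraphs(
    'first line\nsecond line\n\nnew paragraph\ncontinues')
```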

Here is a basic example:

text = r'''I want to talk about \(\mathbb{Z}[i]\). It is the ring whose elements are of the form $a+bi$ where $a,b \in \mathhbb{Z}$.
It has a multiplication structure:
\[ (a+bi) (c+di) = (ac-bd) + (ad+bc)i.\]

Here is an equation:
\begin{equation}
5+7 = 12
\end{equation}

Here is another:
\begin{equation*}
5+6 = 11
\end{equation*}

Here is an align:
\begin{align}
5+7 = 12
\end{align}

Here is another:
\begin{align*}
5+6 = 11
\end{align*}

\begin{eqnarray}
asdf
\end{eqnarray}

\begin{displaymath}
asdf
\end{displaymath}

asdf ``hello''

``a multiline
quotation'' % Sneak comment!

'''
sample_output = adjust_common_syntax_to_markdown(text)
print(sample_output)
assert r'\(' not in sample_output
assert r'\)' not in sample_output
assert r'\[' not in sample_output
assert r'\]' not in sample_output
assert r'$$\begin{align}' in sample_output
assert r'\end{align}$$' in sample_output
assert r'$$\begin{equation}' in sample_output
assert r'\end{equation}$$' in sample_output
assert r'$$\begin{align*}' in sample_output
assert r'\end{align*}$$' in sample_output
assert r'$$\begin{equation*}' in sample_output
assert r'\end{equation*}$$' in sample_output
assert r'$$\begin{eqnarray}' in sample_output
assert r'\end{eqnarray}$$' in sample_output
assert r'$$\begin{displaymath}' in sample_output
assert r'\end{displaymath}$$' in sample_output
assert r"``hello''" not in sample_output
assert r'"hello"' in sample_output
assert '"a multiline quotation"' in sample_output
assert ' "a multiline quotation"' not in sample_output
assert '``' not in sample_output
assert r'%' not in sample_output
assert r'Sneak comment' not in sample_output
I want to talk about $\mathbb{Z}[i]$. It is the ring whose elements are of the form $a+bi$ where $a,b \in \mathhbb{Z}$. It has a multiplication structure:

$$ (a+bi) (c+di) = (ac-bd) + (ad+bc)i.$$

Here is an equation:

$$\begin{equation}
5+7 = 12
\end{equation}$$

Here is another:

$$\begin{equation*}
5+6 = 11
\end{equation*}$$

Here is an align:

$$\begin{align}
5+7 = 12
\end{align}$$

Here is another:

$$\begin{align*}
5+6 = 11
\end{align*}$$

$$\begin{eqnarray}
asdf
\end{eqnarray}$$

$$\begin{displaymath}
asdf
\end{displaymath}$$

asdf "hello"

"a multiline quotation" 
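Three of the options exercised above (math mode delimiters, equation-like environments, and quotation marks) can be sketched with plain string and regular expression replacements. `to_markdown_math` is a hypothetical name and is not trouver's implementation; like the original, it assumes that the delimiter tokens are not used for anything else, so a line break such as `\\[2pt]` or a double prime typed as `''` would be altered incorrectly.

```python
import re

ENVIRONMENTS = ('align', 'displaymath', 'equation', 'eqnarray')


def to_markdown_math(text: str) -> str:
    """Apply three of the Markdown adjustments described above."""
    # \( ... \) becomes $ ... $ and \[ ... \] becomes $$ ... $$.
    text = text.replace(r'\(', '$').replace(r'\)', '$')
    text = text.replace(r'\[', '$$').replace(r'\]', '$$')
    # Surround the listed environments, starred or not, with $$.
    names = '|'.join(ENVIRONMENTS)
    begin = re.compile(r'(\\begin\{(?:' + names + r')\*?\})')
    end = re.compile(r'(\\end\{(?:' + names + r')\*?\})')
    text = begin.sub(lambda match: '$$' + match.group(1), text)
    text = end.sub(lambda match: match.group(1) + '$$', text)
    # LaTeX-style quotation marks become straight double quotes.
    return text.replace('``', '"').replace("''", '"')
```

The sketch leaves out comment removal, paragraph merging, and the placement of display math in its own paragraph.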

We test the removal of \xspace and \ensuremath:

text = r'\ensuremath{\operatorname{GL}_n}\xspace'
test_eq(adjust_common_syntax_to_markdown(text), '\\operatorname{GL}_n')
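Removing `\ensuremath` means dropping the command and its outer braces while keeping the argument, which requires matching braces. The sketch below is one way this might be done; `remove_xspace_and_ensuremath` is a hypothetical helper, not trouver's code, and it does not account for escaped braces.

```python
import re


def remove_xspace_and_ensuremath(text: str) -> str:
    """Delete \\xspace and unwrap the argument of each \\ensuremath{...}."""
    text = re.sub(r'\\xspace(?![A-Za-z])', '', text)
    marker = r'\ensuremath{'
    while True:
        start = text.find(marker)
        if start == -1:
            return text
        # Scan forward to the brace that closes the \ensuremath argument.
        depth = 1
        position = start + len(marker)
        while position < len(text) and depth > 0:
            if text[position] == '{':
                depth += 1
            elif text[position] == '}':
                depth -= 1
            position += 1
        if depth != 0:
            return text  # Unbalanced braces: leave the rest untouched.
        inner = text[start + len(marker):position - 1]
        text = text[:start] + inner + text[position:]


cleaned = remove_xspace_and_ensuremath(r'\ensuremath{\operatorname{GL}_n}\xspace')
```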

Formatting irregularities

Since writers use their own formatting for LaTeX documents, some formatting practices are not accounted for, or not fully accounted for, by the functions in this module. The following function is meant to parse a LaTeX document and warn the user of formatting choices that trouver does not account for; as written, it is a stub whose TODO comments list the planned checks.

def warn_for_formatting_irregularities(
        document: str
        ):
    # TODO: Go through assumptions made by various functions
    # TODO: custom command for section
    # TODO: custom command is invoked, but does not match explicit formatting, e.g. {\tilde M}
    # TODO: warn about \documentclass; the most common are `amsart` and `article`, but some writers will use other classes, such as `standalone, svmult, icmart, compositio`
    # TODO: warn about \newcounter, cf https://www.overleaf.com/learn/latex/Counters#LaTeX_commands_for_working_with_counters.
    # TODO: warn about newcommand and DeclareMathOperator being declared as custom commands of their own. cf. arinkin_gaitsgory_sscsglc
    # TODO: warn about section/subsection/subsubsection being wrapped around some custom commands.
    return
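Two of the TODO items above, the document class check and the `\newcounter` check, might be sketched as follows. `formatting_warnings` is a hypothetical function that returns messages rather than emitting warnings, and the set of common classes is taken from the TODO comment; none of this is implemented in trouver as shown.

```python
import re

# Classes named as the most common in the TODO comment above.
COMMON_DOCUMENT_CLASSES = {'amsart', 'article'}


def formatting_warnings(document: str) -> list[str]:
    """Collect messages about formatting choices that may need attention."""
    messages = []
    # An optional [options] group may precede the class name.
    match = re.search(r'\\documentclass(?:\[[^\]]*\])?\{([^{}]*)\}', document)
    if match and match.group(1).strip() not in COMMON_DOCUMENT_CLASSES:
        messages.append(f'Uncommon document class: {match.group(1).strip()}')
    if re.search(r'\\newcounter(?![A-Za-z])', document):
        messages.append('The document uses \\newcounter.')
    return messages
```

Returning a list keeps the sketch easy to test; the messages could equally be passed to `warnings.warn`.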

Replace \input and \include

Writers often split their writing into multiple files and then incorporate these parts using \input or \include. The following function accounts for these commands by putting all of the LaTeX code in one place.



replace_input_and_include

 replace_input_and_include (document:str, dir:os.PathLike,
                            commands:list[tuple[str,int,typing.Optional[st
                            r],str]], repeat_replacing_commands:int=-1)

Sequentially replace invocations of \input or \include with the contents of the corresponding files, updating and applying custom commands as needed.

Type Default Details
document str
dir PathLike The directory containing the .tex files which are to be included.
commands list
repeat_replacing_commands int -1 This is passed as the repeat argument into the invocation of replace_commands_in_text.
Returns str

The following is a basic example:

latex_folder = _test_directory() / 'latex_examples' / 'latex_example_with_inputs_and_includes'
main_file = latex_folder / 'main.tex'
with open(main_file, 'r', encoding='utf-8') as file:
    document = file.read()
preamble, body = divide_preamble(document)
commands = custom_commands(preamble)
output = replace_input_and_include(body, latex_folder, commands)
print(output)
\begin{document}

\section*{\underline{Introduction}}
This is the main document. We use {\bf custom commands} here.

% This file defines new commands


Now we can use {\it input command}.


\section*{\underline{Chapter 1}}

This is content from chapter 1. We define {\large new commands} here too.

This function is related to the main document


Let's use {\large include command} from chapter1.


\subsection{A Subsection}

This content is from a subsection file. We use {\small\textsc{local commands}} here.

We also use {\bf main document commands} and {\it input file commands}.


Finally, we use {\small\textsc{another input}}.

\end{document}
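The file inlining itself can be sketched as a recursive substitution. This simplified version does not update or apply custom commands the way replace_input_and_include does; `inline_inputs` and the `max_depth` guard are hypothetical, and the demonstration files are created in a temporary directory.

```python
import re
import tempfile
from pathlib import Path

# Matches \input{name} and \include{name}, but not \includegraphics{...}.
INPUT_PATTERN = re.compile(r'\\(?:input|include)\{([^{}]+)\}')


def inline_inputs(document: str, directory: Path, max_depth: int = 10) -> str:
    """Recursively replace \\input and \\include with the named file's text."""
    def load(match: re.Match) -> str:
        if max_depth <= 0:
            return match.group(0)  # Stop expanding; guards against cycles.
        path = directory / match.group(1).strip()
        if path.suffix != '.tex':
            path = path.with_name(path.name + '.tex')
        if not path.exists():
            return match.group(0)  # Leave unknown files untouched.
        content = path.read_text(encoding='utf-8')
        return inline_inputs(content, directory, max_depth - 1)
    return INPUT_PATTERN.sub(load, document)


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / 'chapter1.tex').write_text(
        r'Chapter 1 text. \input{sub}', encoding='utf-8')
    (folder / 'sub.tex').write_text('Subsection text.', encoding='utf-8')
    merged = inline_inputs(r'Intro. \include{chapter1} End.', folder)
```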

Formatting modifications (from Obsidian.md style Markdown to LaTeX)

As noted above, displaymath/equation-like environments (e.g. \begin{equation} \end{equation}) must be surrounded by $$ when used in Markdown.

The remove_dollar_signs_around_equationlike_envs function is one function used to revert such markdown-formatted code into code better suited for LaTeX.



remove_dollar_signs_around_equationlike_envs

 remove_dollar_signs_around_equationlike_envs (text:str)

*Remove dollar signs preceding and following displaymath/equation-like environments.

Such dollar signs are in place to make the code renderable as Markdown code in Obsidian.md.

This function is essentially the reverse of _replace_equationlike_envs.*

output = remove_dollar_signs_around_equationlike_envs(
r'''$$\begin{align*}asdf\end{align*}$$''')
print(output)
assert '$' not in output
\begin{align*}asdf\end{align*}
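A regular expression with named groups is one way this reversal might be written. The sketch assumes the same four environments listed for "begin_end_equationlike_env"; `strip_dollar_signs` is a hypothetical name and is not trouver's implementation.

```python
import re

# The backreferences make \end{...} match the environment opened by \begin{...}.
EQUATIONLIKE = re.compile(
    r'\$\$\s*(?P<body>\\begin\{(?P<env>align|displaymath|equation|eqnarray)'
    r'(?P<star>\*?)\}.*?\\end\{(?P=env)(?P=star)\})\s*\$\$',
    re.DOTALL)


def strip_dollar_signs(text: str) -> str:
    """Remove the $$ pair surrounding each equation-like environment."""
    return EQUATIONLIKE.sub(lambda match: match.group('body'), text)


stripped = strip_dollar_signs(r'$$\begin{align*}asdf\end{align*}$$')
```

Display math that does not wrap one of these environments, such as `$$x^2$$`, is left unchanged by this sketch.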