assert is_number("3.45")
assert is_number(1 + 5j)
assert is_number(5)
assert is_number(0.0)
assert not is_number("3.43.55")
assert not is_number("hie")
assert not is_number("[^1]")
assert not is_number(None)
latex.formatting
Obsidian.md
(or vice versa)
See helper.latex.macros_and_commands, which contains some functions originally written for this module.
Numbers
is_number
is_number (x:Union[float,int,complex,str])
*Return True if the input x represents a number.
This function is different from Python’s built-in str.isnumeric method, which returns True only when all characters of a string are numeric characters.*
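One way to get the behavior shown in the assertions above is to accept any numeric object and to try parsing strings with `complex`. This is only a sketch: `is_number_sketch` is a hypothetical name, not trouver's implementation, and it inherits the quirks of `complex`, which also accepts strings such as `"nan"`.

```python
import numbers


def is_number_sketch(x) -> bool:
    """Hypothetical stand-in for is_number; not trouver's actual code."""
    if isinstance(x, numbers.Number):
        # Covers int, float, and complex objects.
        return True
    if isinstance(x, str):
        try:
            complex(x)  # Parses ints, floats, and complex literals alike.
        except ValueError:
            return False
        return True
    # None and any other non-numeric object.
    return False
```

Unlike `str.isnumeric`, this accepts strings with a decimal point: `"3.45".isnumeric()` is false because `.` is not a numeric character.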
from fastcore.test import test_eq
from trouver.helper.tests import _test_directory# , non_utf8_chars_in_file
Formatting modifications (from LaTeX to Obsidian.md style Markdown)
Identify macros and commands to replace
Authors usually define a lot of custom commands and macros in their LaTeX files. Such customizations vary from author to author and most customized commands are not recognized by Obsidian.
See nbs/_tests/latex_examples/commands_example/main.tex for some examples of custom commands.
replace_command_in_text
replace_command_in_text (text:str, command_tuple:tuple[str,int,typing.Optional[str],str])
*Replaces all invocations of the specified command in text with the display text, with the supplied arguments substituted into the display text.
Assumes that ‘’, ‘’, ‘’, etc. are not part of the display text.*
| | Type | Details |
|---|---|---|
| text | str | |
| command_tuple | tuple | Consists of 1. the name of the custom command, 2. the number of parameters, 3. the default argument if specified or None otherwise, and 4. the display text of the command. |
# Basic
command_tuple = ('Sur', 0, None, r'\mathrm{Sur}')
# pattern = regex_pattern_detecting_command(command_tuple)
text = r'The number of element of $\Sur(\operatorname{Cl} \mathcal{O}_L, A)$ is ... Perhaps $\Sur$ is nonempty.'
test_eq(replace_command_in_text(text, command_tuple),
        r'The number of element of $\mathrm{Sur}(\operatorname{Cl} \mathcal{O}_L, A)$ is ... Perhaps $\mathrm{Sur}$ is nonempty.')
# One parameter
command_tuple = ('field', 1, None, r'\mathbb{#1}')
# pattern = regex_pattern_detecting_command(command_tuple)
text = r'$\field{Q}$ is the field of rational numbers. $\field{C}$ is the field of complex numbers'
test_eq(replace_command_in_text(text, command_tuple),
        r'$\mathbb{Q}$ is the field of rational numbers. $\mathbb{C}$ is the field of complex numbers')
# Multiple parameters
command_tuple = ('mat', 4, None, r'\left[\begin{array}{cc}#1 & #2 \\ #3 & #4\end{array}\right]')
# pattern = regex_pattern_detecting_command(command_tuple)
text = r'\mat{{123}}{asdfasdf{}{}}{{{}}}{{asdf}{asdf}{}}'  # This is a balanced str.
test_eq(replace_command_in_text(text, command_tuple),
        r'\left[\begin{array}{cc}{123} & asdfasdf{}{} \\ {{}} & {asdf}{asdf}{}\end{array}\right]')
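The multi-parameter example works only if each argument is read as a balanced `{...}` group, since arguments may themselves contain braces. The following sketch shows one way to do that; `read_braced_argument` and `expand_invocation` are hypothetical helpers written for illustration, not functions from trouver, and they ignore escaped braces such as `\{`.

```python
import re


def read_braced_argument(text: str, start: int) -> tuple[str, int]:
    """Return the contents of the balanced {...} group opening at
    text[start], together with the index just past its closing brace."""
    if start >= len(text) or text[start] != '{':
        raise ValueError('expected an opening brace')
    depth = 0
    for i in range(start, len(text)):
        if text[i] == '{':
            depth += 1
        elif text[i] == '}':
            depth -= 1
            if depth == 0:
                return text[start + 1:i], i + 1
    raise ValueError('unbalanced braces')


def expand_invocation(invocation: str, name: str, num_params: int, display: str) -> str:
    """Expand one invocation of the form \\name{arg1}...{argN}."""
    position = len(name) + 1  # Skip the backslash and the command name.
    args = []
    for _ in range(num_params):
        arg, position = read_braced_argument(invocation, position)
        args.append(arg)
    # Substitute #1, #2, ... in a single pass so that text inside an
    # argument is never mistaken for a placeholder.
    return re.sub(r'#([1-9])', lambda m: args[int(m.group(1)) - 1], display)
```

Using a function as the replacement in `re.sub` keeps backslashes in the arguments literal, which matters because LaTeX arguments are full of them.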
# Multiple parameters, one of which is an optional parameter
command_tuple = ('plusbinomial', 3, '2', r'(#2 + #3)^#1')
# pattern = regex_pattern_detecting_command(command_tuple)
# When the optional argument is omitted, the default value '2' is used
text = r'\plusbinomial{x}{y}'
test_eq(replace_command_in_text(text, command_tuple),
        r'(x + y)^2')
# When the optional argument is supplied
text = r'\plusbinomial[4]{x}{y}'
test_eq(replace_command_in_text(text, command_tuple),
        r'(x + y)^4')
# One parameter that is optional.
command_tuple = ('greet', 1, 'world', r'Hello #1!')
# pattern = regex_pattern_detecting_command(command_tuple)
# When the optional argument is omitted, the default value 'world' is used
text = r'\greet'
test_eq(replace_command_in_text(text, command_tuple),
        r'Hello world!')
# When the optional argument is supplied
text = r'\greet[govna]'
test_eq(replace_command_in_text(text, command_tuple),
        r'Hello govna!')
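The optional-parameter behavior above can be sketched with a regular expression whose bracketed group is itself optional. This is an illustration under simplifying assumptions, not trouver's code: `expand_with_optional` is a hypothetical name, it handles only commands whose single parameter is the optional one, and it assumes the bracketed argument contains no `]`.

```python
import re


def expand_with_optional(text: str, name: str, display: str, default: str) -> str:
    """Expand \\name or \\name[arg], where #1 is an optional parameter."""
    pattern = re.compile(
        r'\\' + re.escape(name)
        + r'(?![a-zA-Z])'        # Do not match a longer command name.
        + r'(?:\[([^\]]*)\])?'   # The optional [argument], if present.
    )

    def substitute(match: re.Match) -> str:
        supplied = match.group(1)
        first = default if supplied is None else supplied
        return display.replace('#1', first)

    return pattern.sub(substitute, text)
```

When the bracketed group does not participate in the match, `match.group(1)` is `None`, which is how the sketch tells an omitted argument apart from an empty one (`\greet[]`).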
# In the following example, `\del` is a command defined as `\delta`.
# Any invocation of `\delta` should NOT be replaced with `\deltata`.
command_tuple = (r'del', 0, None, r'\delta')
# pattern = regex_pattern_detecting_command(command_tuple)
text = r'\del should be replaced. \delta should not.'
test_eq(replace_command_in_text(text, command_tuple),
        r'\delta should be replaced. \delta should not.')
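The `\del` versus `\delta` case comes down to where a command name ends. A minimal sketch of one way to enforce that boundary is a negative lookahead for letters; `replace_zero_arg_command` is a hypothetical helper, and the actual pattern built by trouver may differ.

```python
import re


def replace_zero_arg_command(text: str, name: str, display: str) -> str:
    """Replace \\name with display, leaving longer command names alone."""
    # (?![a-zA-Z]) rejects a match that is followed by another letter,
    # so \del does not match the first four characters of \delta.
    pattern = re.compile(r'\\' + re.escape(name) + r'(?![a-zA-Z])')
    # A function replacement keeps the backslashes in `display` literal.
    return pattern.sub(lambda match: display, text)
```

A plain `str.replace` would turn `\delta` into `\deltata` here, which is exactly the failure the example guards against.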
replace_commands_in_text
replace_commands_in_text (text:str, command_tuples:list[tuple[str,int,typing.Optional[str],str]], repeat:int=1)
*Replaces all invocations of the specified commands in text with the display text, with the supplied arguments substituted into the display text.
Assumes that ‘’, ‘’, ‘’, etc. are not part of the display text.
If repeat is set to -1, then this function attempts to replace custom commands until no commands to replace are found. However, this might cause infinite loops for some documents.*
| | Type | Default | Details |
|---|---|---|---|
| text | str | | The text in which to replace the commands. This should not include the preamble of a LaTeX document. |
| command_tuples | list | | An output of custom_commands. Each tuple consists of 1. the name of the custom command, 2. the number of parameters, 3. the default argument if specified or None otherwise, and 4. the display text of the command. |
| repeat | int | 1 | The number of times to repeat replacing the commands throughout the text; note that some custom commands could be “nested”, i.e. defined in terms of other custom commands. Defaults to 1, in which case custom commands are replaced throughout the entire document once. If set to -1, then this function attempts to replace custom commands until no commands to replace are found. |
| Returns | str | | |
The replace_commands_in_text function replaces custom commands in the main part (the body) of a LaTeX document.
text = r'''Here is a matrix over $\field{Q}$: $\mat{1/2}{2}{-1}{5/7}$.
Note that it is not over $\field{F}_7$ and not over $\field{F}_2$.'''
command_tuples = [
    ('field', 1, None, r'\mathbb{#1}'),
    ('mat', 4, None, r'\left[\begin{array}{cc}#1 & #2 \\ #3 & #4\end{array}\right]')]
sample_output = replace_commands_in_text(text, command_tuples)
test_eq(sample_output, r'''Here is a matrix over $\mathbb{Q}$: $\left[\begin{array}{cc}1/2 & 2 \\ -1 & 5/7\end{array}\right]$.
Note that it is not over $\mathbb{F}_7$ and not over $\mathbb{F}_2$.''')
Note that some writers define custom commands using other custom commands. By default, the replace_commands_in_text function replaces custom commands just once. In the following example, there is a custom command that is defined using other custom commands, and the function replaces the “outer” custom command:
# TODO: continue this example
text = r'''$\Gm$'''
command_tuples = [
    ('Gm', 0, None, r'\bG_\mult'),
    ('bG', 0, None, r'\mathbb{G}'),
    ('mult', 0, None, r'\times'),
]
sample_output = replace_commands_in_text(text, command_tuples)
test_eq(sample_output, r'''$\mathbb{G}_\times$''')
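The idea behind `repeat=-1` can be sketched as repeating a replacement pass until the text stops changing. The sketch below is not trouver's implementation: `replace_once` and `replace_until_stable` are hypothetical names, only zero-parameter commands are handled, and the `max_passes` cap is an assumption added here to avoid the infinite loop that a self-referential command would cause.

```python
import re


def replace_once(text: str, commands: dict[str, str]) -> str:
    """Apply each zero-parameter command once, in dictionary order."""
    for name, display in commands.items():
        pattern = re.compile(r'\\' + re.escape(name) + r'(?![a-zA-Z])')
        text = pattern.sub(lambda match, d=display: d, text)
    return text


def replace_until_stable(text: str, commands: dict[str, str], max_passes: int = 10) -> str:
    """Repeat replace_once until nothing changes or the cap is reached."""
    for _ in range(max_passes):
        replaced = replace_once(text, commands)
        if replaced == text:
            break
        text = replaced
    return text


# The inner commands are listed before the outer one, so a single pass
# expands only \Gm and leaves \bG and \mult for a later pass.
commands = {'bG': r'\mathbb{G}', 'mult': r'\times', 'Gm': r'\bG_\mult'}
single_pass = replace_once(r'$\Gm$', commands)
stable = replace_until_stable(r'$\Gm$', commands)
```

Whether one pass suffices depends on the order in which the commands are applied, which is why a repeat count is useful for nested definitions.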
replace_commands_in_latex_document
replace_commands_in_latex_document (document:str, repeat:int=1)
*Return the latex document (with the preamble) with invocations of custom commands/operators replaced with their display text.
Assumes that all custom commands and operators are defined in the preamble.
Assumes that, if commands with the same name are defined multiple times, only the finally defined command is used.
This function does not replace \input’s and \include’s with the code of the corresponding files. See replace_input_and_include, which accomplishes this.
Invocations in commented-out text are replaced as well.*
| | Type | Default | Details |
|---|---|---|---|
| document | str | | |
| repeat | int | 1 | The number of times to repeat replacing the commands throughout the text; note that some custom commands could be “nested”, i.e. defined in terms of other custom commands. Defaults to 1, in which case custom commands are replaced throughout the entire document once. If set to -1, then this function attempts to replace custom commands until no commands to replace are found. See also replace_commands_in_text. |
| Returns | str | | |
file = _test_directory() / 'latex_examples' / 'commands_recursive_example' / 'main.tex'
document = text_from_file(file)
commands_replaced = replace_commands_in_latex_document(document)
assert commands_replaced.startswith(r'\begin{document}')
assert commands_replaced.endswith(r'\end{document}')
assert r'\S' not in commands_replaced
assert r'\mathbb{S}1' in commands_replaced  # Note that $\S$ is defined twice in the preamble; only the latter definition is used.
assert r'\field{Q}$' not in commands_replaced
assert r'\mathbb{Q}$' in commands_replaced
assert r'\commentedout' not in commands_replaced
assert r'This is actually a command that is commented out, but it is also replaced!' in commands_replaced
print(commands_replaced)
\begin{document}
$\mathbb{S}1$
%$\mathbf{Q}$
%$\mathbf{Q}$
%This is actually a command that is commented out, but it is also replaced!
$\mathbb{Q}$
\end{document}
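The rule that only the last definition of a command counts falls out naturally if definitions are collected into a dictionary keyed by command name. The following is a simplified sketch, not trouver's `custom_commands`: `collect_commands` and `read_braced` are hypothetical helpers, only `\newcommand` and `\renewcommand` without default arguments are recognized, and commented-out definitions are not filtered.

```python
import re

# Matches \newcommand{\name} or \renewcommand{\name}, optionally followed
# by the number of parameters in square brackets.
DEFINITION = re.compile(r'\\(?:re)?newcommand\{\\([a-zA-Z]+)\}(?:\[(\d)\])?')


def read_braced(text: str, start: int) -> str:
    """Return the contents of the balanced {...} group opening at text[start]."""
    if start >= len(text) or text[start] != '{':
        raise ValueError('expected an opening brace')
    depth = 0
    for i in range(start, len(text)):
        if text[i] == '{':
            depth += 1
        elif text[i] == '}':
            depth -= 1
            if depth == 0:
                return text[start + 1:i]
    raise ValueError('unbalanced braces')


def collect_commands(preamble: str) -> dict[str, tuple[int, str]]:
    """Map each command name to (number of parameters, display text)."""
    commands = {}
    for match in DEFINITION.finditer(preamble):
        display = read_braced(preamble, match.end())
        # Assigning by name lets a later definition overwrite an earlier one.
        commands[match.group(1)] = (int(match.group(2) or 0), display)
    return commands


preamble = r'''
\newcommand{\S}{\mathbf{S}}
\newcommand{\field}[1]{\mathbb{#1}}
\renewcommand{\S}{\mathbb{S}}
'''
commands = collect_commands(preamble)
```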
Replace commonly used syntax
Obsidian does not compile all LaTeX syntax. For example:
- \( \) and \[ \] are not recognized as math mode delimiters.
- \begin{equation} \end{equation} and \begin{align} \end{align} (as well as their unnumbered versions with *) require surrounding $$.
The adjust_common_syntax_to_markdown function offers options for adjusting the formatting of LaTeX text to better suit Markdown.
adjust_common_syntax_to_markdown
adjust_common_syntax_to_markdown (text:str, options:list[str]=['math_mode_delimiters', 'begin_end_equationlike_env', 'replace_backtick_and_apostrophe_quotes', 'remove_comments', 'inline_mathmode_to_own_paragraph', 'merge_multiline_paragraph', 'remove_xspace', 'remove_ensuremath'])
*Adjust some common syntax, such as math mode delimiters and equation/align environments, for Markdown.
Assumes that the tokens for math mode delimiters (e.g. \( \) and \[ \]) are not used otherwise.
The following lists the admissible values in the options parameter and the effect of including each:
- "math_mode_delimiters"
  - Replace \( \) as math mode delimiters with $ $.
  - Replace \[ \] as math mode delimiters with $$ $$.
- "begin_end_equationlike_env"
  - Replace \begin{...} \end{...} with $$\begin{...} \end{...}$$ and \begin{...*} \end{...*} with $$\begin{...*} \end{...*}$$ for the following environments:
    - align
    - displaymath
    - equation
    - eqnarray
- "replace_backtick_and_apostrophe_quotes"
  - Replace `` '' as quotation delimiters with " ".
- "remove_comments"
  - Remove LaTeX comments.
- "inline_mathmode_to_own_paragraph"
  - Make it so that each inline-math mode string (of the form $$...$$) has at least one newline before and after it. Also delete one blank space character ' ' from the text immediately before and after the inline-math mode string if they exist.
- "merge_multiline_paragraph"
  - Some writers will type paragraphs in multiple lines, likely because their LaTeX editor of choice does not wrap text within a single line. Including this option merges “normal” paragraphs into a single line.
  - Inline-mathmode text is not affected by this option.
- "remove_xspace"
  - Some writers include \xspace in their code (such as in their custom commands). Obsidian does not render these.
- "remove_ensuremath"
  - Some writers include \ensuremath in their code (such as in their custom commands). Obsidian does not render these.*
| | Type | Default | Details |
|---|---|---|---|
| text | str | | The LaTeX code to adjust to Markdown. |
| options | list | [‘math_mode_delimiters’, ‘begin_end_equationlike_env’, ‘replace_backtick_and_apostrophe_quotes’, ‘remove_comments’, ‘inline_mathmode_to_own_paragraph’, ‘merge_multiline_paragraph’, ‘remove_xspace’, ‘remove_ensuremath’] | Each str specifies what formatting should be done. |
| Returns | str | | |
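The first two options amount to a few regular-expression substitutions. Here is a minimal sketch of what they might look like; `convert_math_delimiters` and `wrap_equationlike_envs` are hypothetical names rather than trouver's internals, and, like the function documented above, the sketch assumes that `\(`, `\)`, `\[` and `\]` appear only as math mode delimiters.

```python
import re

EQUATIONLIKE_ENVS = ('align', 'displaymath', 'equation', 'eqnarray')


def convert_math_delimiters(text: str) -> str:
    """Turn \\( \\) into $ $ and \\[ \\] into $$ $$."""
    text = re.sub(r'\\\(|\\\)', '$', text)
    text = re.sub(r'\\\[|\\\]', '$$', text)
    return text


def wrap_equationlike_envs(text: str) -> str:
    """Surround equation-like environments (starred or not) with $$."""
    names = '|'.join(EQUATIONLIKE_ENVS)
    begin = re.compile(r'\\begin\{(?:' + names + r')\*?\}')
    end = re.compile(r'\\end\{(?:' + names + r')\*?\}')
    text = begin.sub(lambda match: '$$' + match.group(0), text)
    text = end.sub(lambda match: match.group(0) + '$$', text)
    return text
```

Because the pattern requires a closing brace right after the environment name, an environment such as `aligned` is left untouched.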
text = r'''
are cellular and the DG category one gets by replacing the
topological spaces ${Mor}_{{\mathcal K}_{{top}}}(X_i,X_j)$ by their
cellular chain complexes equals ${\mathcal K}$.
'''
print(_merge_multilines(text))
are cellular and the DG category one gets by replacing the topological spaces ${Mor}_{{\mathcal K}_{{top}}}(X_i,X_j)$ by their cellular chain complexes equals ${\mathcal K}$.
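Merging a hard-wrapped paragraph can be sketched as splitting on blank lines and joining the lines of each paragraph with spaces. This is an illustration only: `merge_paragraph_lines` is a hypothetical name, not the private `_merge_multilines` helper used above, and unlike the real option it makes no attempt to leave math blocks alone.

```python
import re


def merge_paragraph_lines(text: str) -> str:
    """Join the lines of each blank-line-separated paragraph with spaces."""
    paragraphs = re.split(r'\n\s*\n', text.strip())
    merged = [
        ' '.join(line.strip() for line in paragraph.splitlines())
        for paragraph in paragraphs
    ]
    return '\n\n'.join(merged)
```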
Here is a basic example:
text = r'''I want to talk about \(\mathbb{Z}[i]\). It is the ring whose elements are of the form $a+bi$ where $a,b \in \mathhbb{Z}$.
It has a multiplication structure:
\[ (a+bi) (c+di) = (ac-bd) + (ad+bc)i.\]
Here is an equation:
\begin{equation}
5+7 = 12
\end{equation}
Here is another:
\begin{equation*}
5+6 = 11
\end{equation*}
Here is an align:
\begin{align}
5+7 = 12
\end{align}
Here is another:
\begin{align*}
5+6 = 11
\end{align*}
\begin{eqnarray}
asdf
\end{eqnarray}
\begin{displaymath}
asdf
\end{displaymath}
asdf ``hello''
``a multiline
quotation'' % Sneak comment!
'''
sample_output = adjust_common_syntax_to_markdown(text)
print(sample_output)
assert r'\(' not in sample_output
assert r'\)' not in sample_output
assert r'\[' not in sample_output
assert r'\]' not in sample_output
assert r'$$\begin{align}' in sample_output
assert r'\end{align}$$' in sample_output
assert r'$$\begin{equation}' in sample_output
assert r'\end{equation}$$' in sample_output
assert r'$$\begin{align*}' in sample_output
assert r'\end{align*}$$' in sample_output
assert r'$$\begin{equation*}' in sample_output
assert r'\end{equation*}$$' in sample_output
assert r'$$\begin{eqnarray}' in sample_output
assert r'\end{eqnarray}$$' in sample_output
assert r'$$\begin{displaymath}' in sample_output
assert r'\end{displaymath}$$' in sample_output
assert r"``hello''" not in sample_output
assert r'"hello"' in sample_output
assert '"a multiline quotation"' in sample_output
assert ' "a multiline quotation"' not in sample_output
assert '``' not in sample_output
assert r'%' not in sample_output
assert r'Sneak comment' not in sample_output
I want to talk about $\mathbb{Z}[i]$. It is the ring whose elements are of the form $a+bi$ where $a,b \in \mathhbb{Z}$. It has a multiplication structure:
$$ (a+bi) (c+di) = (ac-bd) + (ad+bc)i.$$
Here is an equation:
$$\begin{equation}
5+7 = 12
\end{equation}$$
Here is another:
$$\begin{equation*}
5+6 = 11
\end{equation*}$$
Here is an align:
$$\begin{align}
5+7 = 12
\end{align}$$
Here is another:
$$\begin{align*}
5+6 = 11
\end{align*}$$
$$\begin{eqnarray}
asdf
\end{eqnarray}$$
$$\begin{displaymath}
asdf
\end{displaymath}$$
asdf "hello"
"a multiline quotation"
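The quote and comment handling checked by the assertions above can be approximated with two substitutions. This sketch uses hypothetical names (`replace_latex_quotes`, `strip_latex_comments`) and is deliberately naive: the comment pattern treats any `%` preceded by a backslash as escaped, so it would misread a `\\%` sequence, and it does not merge the lines of a multi-line quotation.

```python
import re


def replace_latex_quotes(text: str) -> str:
    """Turn ``quoted'' into "quoted", even across line breaks."""
    return re.sub(r"``(.*?)''", r'"\1"', text, flags=re.DOTALL)


def strip_latex_comments(text: str) -> str:
    """Remove everything from an unescaped % to the end of its line."""
    return re.sub(r'(?<!\\)%.*', '', text)
```

The non-greedy `.*?` keeps two separate quotations on the same line from being merged into one.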
We test the removal of \xspace and \ensuremath:
text = r'\ensuremath{\operatorname{GL}_n}\xspace'
test_eq(adjust_common_syntax_to_markdown(text), '\\operatorname{GL}_n')
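Removing `\ensuremath` means keeping its argument, so the closing brace has to be found by counting brace depth rather than by a simple pattern. A possible sketch follows; `remove_xspace` and `remove_ensuremath` are hypothetical standalone helpers and not necessarily how trouver implements these options.

```python
import re


def remove_xspace(text: str) -> str:
    """Delete \\xspace, but not longer command names that start with it."""
    return re.sub(r'\\xspace(?![a-zA-Z])', '', text)


def remove_ensuremath(text: str) -> str:
    """Replace each \\ensuremath{...} with the contents of its braces."""
    marker = r'\ensuremath{'
    while (start := text.find(marker)) != -1:
        open_brace = start + len(marker) - 1
        depth = 0
        for i in range(open_brace, len(text)):
            if text[i] == '{':
                depth += 1
            elif text[i] == '}':
                depth -= 1
                if depth == 0:
                    # Keep the argument, drop the command and its braces.
                    text = text[:start] + text[open_brace + 1:i] + text[i + 1:]
                    break
        else:
            break  # Unbalanced braces: stop rather than loop forever.
    return text
```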
Formatting irregularities
Since writers use their own formatting conventions for LaTeX documents, there will be formatting practices that the functions in this module do not account for, or account for only partially. The following function is intended to parse the LaTeX document and warn the user of formatting choices that trouver does not account for.
def warn_for_formatting_irregularities(
        document: str
        ):
    # TODO: Go through assumptions made by various functions
    # TODO: custom command for section
    # TODO: custom command is invoked, but does not match explicit formatting, e.g. {\tilde M}
    # TODO: warn about \documentclass; the most common are `amsart` and `article`, but some writers will use other classes, such as `standalone, svmult, icmart, compositio`
    # TODO: warn about \newcounter, cf https://www.overleaf.com/learn/latex/Counters#LaTeX_commands_for_working_with_counters.
    # TODO: warn about newcommand and DeclareOperator being declared as custom commands of their own. cf. arinkin_gaitsgory_sscsglc
    # TODO: warn about section/subsection/subsubsection being wrapped around some custom commands.
    return
Replace \input and \include
Writers often split their writing into multiple files and then incorporate these files using \input or \include. The following functions account for this and put all of the LaTeX code together in one place.
replace_input_and_include
replace_input_and_include (document:str, dir:os.PathLike, commands:list[tuple[str,int,typing.Optional[str],str]], repeat_replacing_commands:int=-1)
Sequentially replace invocations of \input or \include with the contents of the corresponding files, updating and applying custom commands as needed.
| | Type | Default | Details |
|---|---|---|---|
| document | str | | |
| dir | PathLike | | The directory containing the .tex files which are to be included. |
| commands | list | | |
| repeat_replacing_commands | int | -1 | This is passed as the repeat argument into the invocation of replace_commands_in_text. |
| Returns | str | | |
The following is a basic example:
latex_folder = _test_directory() / 'latex_examples' / 'latex_example_with_inputs_and_includes'
main_file = latex_folder / 'main.tex'
with open(main_file, 'r', encoding='utf-8') as file:
    document = file.read()
preamble, body = divide_preamble(document)
commands = custom_commands(preamble)
output = replace_input_and_include(body, latex_folder, commands)
print(output)
\begin{document}
\section*{\underline{Introduction}}
This is the main document. We use {\bf custom commands} here.
% This file defines new commands
Now we can use {\it input command}.
\section*{\underline{Chapter 1}}
This is content from chapter 1. We define {\large new commands} here too.
This function is related to the main document
Let's use {\large include command} from chapter1.
\subsection{A Subsection}
This content is from a subsection file. We use {\small\textsc{local commands}} here.
We also use {\bf main document commands} and {\it input file commands}.
Finally, we use {\small\textsc{another input}}.
\end{document}
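The file-inlining part of this process can be sketched as a recursive substitution. The sketch below is a simplification and not trouver's code: `inline_inputs` is a hypothetical name, it does not collect or apply the custom commands defined in the included files, and the `max_depth` parameter is an assumption added here to guard against files that include each other.

```python
import re
import tempfile
from pathlib import Path

# Requiring the brace right after the name keeps \includegraphics{...} out.
INPUT_PATTERN = re.compile(r'\\(?:input|include)\{([^}]+)\}')


def inline_inputs(document: str, directory: Path, max_depth: int = 10) -> str:
    """Replace \\input{...} and \\include{...} with the files' contents."""
    def substitute(match: re.Match) -> str:
        path = directory / match.group(1).strip()
        if path.suffix != '.tex':
            path = path.with_name(path.name + '.tex')
        if max_depth <= 0 or not path.is_file():
            return match.group(0)  # Leave the invocation untouched.
        contents = path.read_text(encoding='utf-8')
        return inline_inputs(contents, directory, max_depth - 1)

    return INPUT_PATTERN.sub(substitute, document)


# A small demonstration using temporary files.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / 'chapter1.tex').write_text(r'Chapter one. \input{sub}', encoding='utf-8')
    (folder / 'sub.tex').write_text('Sub.', encoding='utf-8')
    result = inline_inputs(r'Intro. \include{chapter1} End. \input{missing}', folder)
```

Invocations whose file cannot be found are left in place so that the gap stays visible in the output.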
Formatting modifications (from Obsidian.md style Markdown to LaTeX)
As noted above, displaymath/equation-like environments (e.g. \begin{equation} \end{equation}) used in Markdown require surrounding $$.
The remove_dollar_signs_around_equationlike_envs function is one function used to revert such Markdown-formatted code into code better suited for LaTeX.
remove_dollar_signs_around_equationlike_envs
remove_dollar_signs_around_equationlike_envs (text:str)
*Remove dollar signs preceding and following displaymath/equation-like environments.
Such dollar signs are in place to make the code renderable as Markdown code in Obsidian.md.
This function is essentially the reverse of _replace_equationlike_envs.*
output = remove_dollar_signs_around_equationlike_envs(
    r'''$$\begin{align*}asdf\end{align*}$$''')
print(output)
assert '$' not in output
\begin{align*}asdf\end{align*}
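The reverse direction can be sketched with two substitutions that drop `$$` only when it sits directly next to an equation-like environment. `strip_dollars_around_envs` is a hypothetical name used for illustration; the list of environments is taken from the options described earlier, and the real function may treat whitespace differently.

```python
import re

ENV = r'(?:align|displaymath|equation|eqnarray)\*?'


def strip_dollars_around_envs(text: str) -> str:
    """Remove $$ before \\begin{env} and after \\end{env}."""
    text = re.sub(r'\$\$\s*(\\begin\{' + ENV + r'\})', r'\1', text)
    text = re.sub(r'(\\end\{' + ENV + r'\})\s*\$\$', r'\1', text)
    return text
```

Ordinary display math such as `$$x^2$$` is left alone, since it still needs its delimiters in LaTeX.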