import operator
from unittest import mock
from pathvalidate import validate_filename
from string import ascii_uppercase
from fastcore.test import *
from nbdev import show_doc
helper
Tests
_test_directory
_test_directory ()
Returns the nbs/_tests directory of the trouver repository.
Assumes that the current working directory is either the root of the repository or the nbs folder, that the nbs/_tests folder exists in the repository, and that the root of the repository does not have a folder named _tests.
Test files for trouver are contained in the nbs/_tests directory within the repository. The hidden _test_directory() method returns this directory.
assert os.path.exists(_test_directory())
os.listdir(_test_directory())
['empty_model_vault',
'latex_examples',
'ml_examples',
'test_vault_1',
'test_vault_2',
'test_vault_3',
'test_vault_4',
'test_vault_5',
'test_vault_6',
'test_vault_7',
'test_vault_8',
'test_vault_9']
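The lookup rule stated above can be sketched as follows. This is a hypothetical stand-in (find_test_directory_sketch is not part of trouver) that relies only on the stated assumption: a _tests folder directly under the current working directory means the working directory is nbs; otherwise it is treated as the repository root.

```python
import tempfile
from pathlib import Path
from typing import Optional, Union


def find_test_directory_sketch(cwd: Optional[Union[str, Path]] = None) -> Path:
    """Hypothetical stand-in for trouver's hidden _test_directory().

    Assumes cwd is either the repository root or its nbs folder, and that
    only nbs (not the root) contains a folder named _tests.
    """
    cwd = Path.cwd() if cwd is None else Path(cwd)
    if (cwd / '_tests').is_dir():
        # Only the nbs folder has a _tests subfolder, so cwd must be nbs.
        return cwd / '_tests'
    # Otherwise cwd is taken to be the repository root.
    return cwd / 'nbs' / '_tests'


# Usage on a throwaway directory tree that mimics the repository layout.
with tempfile.TemporaryDirectory() as root:
    (Path(root) / 'nbs' / '_tests').mkdir(parents=True)
    expected = Path(root) / 'nbs' / '_tests'
    from_root = find_test_directory_sketch(root)
    from_nbs = find_test_directory_sketch(Path(root) / 'nbs')
```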
regex
find_regex_in_text
find_regex_in_text (text:str, pattern:Union[str,Pattern[str]])
Return ranges in text where pattern occurs.
Parameter | Type | Details
---|---|---
text | str | Text in which to find the regex pattern
pattern | Union[str, Pattern[str]] | The regex pattern
Returns | list[tuple[int]] | Each tuple is of the form (a,b) where text[a:b] is the regex match.
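A minimal sketch of this behavior, assuming it amounts to collecting Match.span() from re.finditer (find_regex_in_text_sketch is a hypothetical name, not trouver's implementation). re.finditer accepts both a pattern string and a compiled pattern, which fits the Union[str, Pattern[str]] annotation:

```python
import re
from typing import Union


def find_regex_in_text_sketch(text: str, pattern: Union[str, re.Pattern]) -> list[tuple[int, int]]:
    """Return (start, end) pairs such that text[start:end] is a match of pattern."""
    # re.finditer yields non-overlapping matches from left to right;
    # Match.span() is the (start, end) pair of each match.
    return [match.span() for match in re.finditer(pattern, text)]


footnote_spans = find_regex_in_text_sketch('[^1]: asdf', r'\[\^\d\]')
sentence = '9000 is a big number. But you know what is bigger? 9001.'
number_spans = find_regex_in_text_sketch(sentence, re.compile(r'\d+'))
```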
The following example finds the occurrence of the Markdown footnote:
regex_pattern = r'\[\^\d\]'
text = '[^1]: asdf'
output = find_regex_in_text(text, regex_pattern)
test_eq(output, [(0,4)])
start, end = output[0]
test_eq(text[start:end], '[^1]')
If there are multiple matches for the regex pattern, then they are all included in the outputted list.
regex_pattern = r'\d+'  # Searches for one or more consecutive digits
text = '9000 is a big number. But you know what is bigger? 9001.'
output = find_regex_in_text(text, regex_pattern)
test_eq(len(output), 2)
start, end = output[0]
test_eq(text[start:end], '9000')
start, end = output[1]
test_eq(text[start:end], '9001')
The following example detects YAML frontmatter text as used in Obsidian. This regex pattern is also used in markdown.markdown.file.find_front_matter_meta_in_markdown_text.
The regex pattern used is able to detect the frontmatter even when it is empty.
sample_regex = r'---\n([\S\s]*?)?(?(1)\n|)---'
sample_str = '---\n---'
sample_output = find_regex_in_text(sample_str, sample_regex)
assert sample_output == [(0,7)]
sample_str = '---\naliases: [this_is_an_aliases_for_the_Obsidian_note]\n---'
sample_output = find_regex_in_text(sample_str, sample_regex)
assert sample_output == [(0, len(sample_str))]  # The entire sample_str is detected.
Contrast the regex pattern above with the pattern ---\n[\S\s]*?\n---, which does not detect empty YAML frontmatter text.
sample_regex = r'---\n[\S\s]*?\n---'  # Raw string avoids invalid escape warnings for \S and \s.
sample_str = '---\n---'
sample_output = find_regex_in_text(sample_str, sample_regex)
assert not sample_output
replace_string_by_indices
replace_string_by_indices (string:str, replace_ranges:Sequence[Union[Sequence[int],int]], replace_with:Union[Sequence[str],str])
Replace parts of string at the specified locations.
Use this with find_regex_in_text.
Parameters
- string - str
- replace_ranges - Sequence[Sequence[int] | int]
    - Either a list of lists/tuples of one or two int’s. A list/tuple [a,b] or (a,b) means that string[a:b] is to be replaced. [a] or (a,) means that string[a:] is to be replaced. The ranges should not overlap and should be arranged in increasing order.
- replace_with - Sequence[str] | str
    - The str’s which will replace the parts represented by replace_ranges. replace_ranges and replace_with must be both lists or both not lists. If they are lists, they must be of the same length.
Returns
- str
Parameter | Type | Details
---|---|---
string | str | String in which to make replacements
replace_ranges | Sequence[Union[Sequence[int], int]] | A list of lists/tuples of int’s or a single list/tuple of int’s. Each
replace_with | Union[Sequence[str], str] | The str(s) which will replace the substrings at replace_ranges in string. replace_with must be a str exactly when replace_ranges is a Sequence of a single Sequence of int.
Returns | str | The str obtained by replacing the substrings at replace_ranges in string by the strs specified by replace_with.
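The parameter conventions above can be illustrated with a hypothetical sketch (replace_string_by_indices_sketch is not trouver's code). It assumes, as the docstring requires, that the ranges are sorted and non-overlapping, and it does not validate that:

```python
from typing import Sequence, Union


def replace_string_by_indices_sketch(
        string: str,
        replace_ranges: Sequence[Union[Sequence[int], int]],
        replace_with: Union[Sequence[str], str]) -> str:
    """Hypothetical re-creation; assumes sorted, non-overlapping ranges."""
    if isinstance(replace_with, str):
        # A single replacement: treat replace_ranges as one range.
        replace_ranges, replace_with = [replace_ranges], [replace_with]
    if len(replace_ranges) != len(replace_with):
        raise ValueError('replace_ranges and replace_with are different lengths')
    pieces, cursor = [], 0
    for rng, new_text in zip(replace_ranges, replace_with):
        if isinstance(rng, int):
            start, end = rng, len(string)
        elif len(rng) == 1:
            # [a] or (a,) means string[a:] is replaced.
            start, end = rng[0], len(string)
        else:
            start, end = rng
        pieces.append(string[cursor:start])  # untouched text before the range
        pieces.append(new_text)
        cursor = end
    pieces.append(string[cursor:])  # untouched text after the last range
    return ''.join(pieces)
```

Building the result from slices in one left-to-right pass avoids the index shifting that would happen if each replacement were applied to the string in turn.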
The following are basic examples:
test_eq(replace_string_by_indices('hello world', replace_ranges=(0,5), replace_with='hi'), 'hi world')
test_eq(replace_string_by_indices('hello somebody', replace_ranges=[(0,1), (6,10)], replace_with=['', '']), 'ello body')
If replace_ranges and replace_with are of different lengths, then a ValueError is raised:
with ExceptionExpected(ex=ValueError, regex="are different"):
    replace_string_by_indices('hello world', replace_ranges = [(0,5), (6,10)], replace_with = [''])
Definitions and notations
I surround definitions and notations with double asterisks **. The double_asterisk_indices method finds strings surrounded by double asterisks, the notation_asterisk_indices method finds notation str, and the definition_asterisk_indices method finds definition str.
double_asterisk_indices
double_asterisk_indices (text:str)
Return the indices in text of substrings surrounded by double asterisks.
Assumes that no LaTeX math mode string has double asterisks.
Parameter | Type | Details
---|---|---
text | str | The str in which to find the indices of double-asterisk-surrounded text.
Returns | list[tuple[int]] | Each tuple is of the form (start,end), where text[start:end] is a part of text surrounded by double asterisks, including the double asterisks.
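Under the stated assumption that no LaTeX math mode string contains **, a non-greedy regular expression is one plausible way to find these substrings. This is a hypothetical sketch (double_asterisk_indices_sketch is not trouver's implementation):

```python
import re

# Non-greedy: each match runs from a "**" to the nearest following "**".
# Single asterisks (e.g. $\pi_{*}$) may appear inside; a "**" inside LaTeX
# would end the match early, which is why the assumption above matters.
DOUBLE_ASTERISK_PATTERN = re.compile(r'\*\*(.+?)\*\*', re.DOTALL)


def double_asterisk_indices_sketch(text: str) -> list[tuple[int, int]]:
    return [match.span() for match in DOUBLE_ASTERISK_PATTERN.finditer(text)]


sample = r'A **scheme** is a... the **structure sheaf** of a scheme $X$ is the sheaf **$\mathcal{O}_X$**.'
spans = double_asterisk_indices_sketch(sample)
```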
definition_asterisk_indices
definition_asterisk_indices (text:str)
Return the indices of definition text surrounded by double asterisks.
A double-asterisk-surrounded text is almost always a definition when it is not purely LaTeX math mode text.
Assumes that no LaTeX math mode string has double asterisks and that no LaTeX math mode string has the dollar sign character within it.
Parameter | Type | Details
---|---|---
text | str | The str in which to find the indices of the definitions surrounded by double asterisks.
Returns | list[tuple[int]] | Each tuple is of the form (start,end), where text[start:end] is a substring in text surrounded by double asterisks, including the double asterisks.
notation_asterisk_indices
notation_asterisk_indices (text:str)
Return the indices of notation text surrounded by double asterisks.
A double-asterisk-surrounded text is almost always a notation when it is purely LaTeX math mode text.
Assumes that no LaTeX math mode string has the dollar sign character within it.
Parameter | Type | Details
---|---|---
text | str | The str in which to find the indices of notations surrounded by double asterisks.
Returns | list[tuple[int]] | Each tuple is of the form (start,end), where text[start:end] is a substring in text consisting of LaTeX math mode text surrounded by double asterisks, including the double asterisks.
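The split between definitions and notations described above can be approximated by checking whether the text between the asterisks is exactly one $...$ or $$...$$ string. The heuristic is_pure_math_mode and the function split_definitions_and_notations are hypothetical, meant only to illustrate the rule, and assume that no LaTeX string contains a literal dollar sign:

```python
import re

DOUBLE_ASTERISK_PATTERN = re.compile(r'\*\*(.+?)\*\*', re.DOTALL)


def is_pure_math_mode(inner: str) -> bool:
    """Hypothetical heuristic: inner is exactly one $...$ or $$...$$ string."""
    if len(inner) > 4 and inner.startswith('$$') and inner.endswith('$$'):
        return inner.count('$') == 4
    return (len(inner) > 2 and inner.startswith('$') and inner.endswith('$')
            and inner.count('$') == 2)


def split_definitions_and_notations(text: str) -> tuple[list[tuple[int, int]], list[tuple[int, int]]]:
    definitions, notations = [], []
    for match in DOUBLE_ASTERISK_PATTERN.finditer(text):
        # match.group(1) is the text between the two pairs of asterisks.
        target = notations if is_pure_math_mode(match.group(1)) else definitions
        target.append(match.span())
    return definitions, notations


scheme_text = r'A **scheme** is a... the **structure sheaf** of a scheme $X$ is the sheaf **$\mathcal{O}_X$**.'
defs, notats = split_definitions_and_notations(scheme_text)
# Starts and ends with $ but contains four dollar signs, so it is a definition.
g_defs, g_notats = split_definitions_and_notations(r'A **$G$-group over a ring $A$** is')
```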
Examples
In the following example, scheme and structure sheaf are definitions, whereas $\mathcal{O}_X$ is a notation:
text = r'A **scheme** is a... the **structure sheaf** of a scheme $X$ is the sheaf **$\mathcal{O}_X$**.'
listy = double_asterisk_indices(text)
start, end = listy[0]
test_eq(text[start:end], '**scheme**')
start, end = listy[1]
test_eq(text[start:end], '**structure sheaf**')
start, end = listy[2]
test_eq(text[start:end], r'**$\mathcal{O}_X$**')
listy = notation_asterisk_indices(text)
start, end = listy[0]
test_eq(text[start:end], r'**$\mathcal{O}_X$**')
test_eq(len(listy), 1)
listy = definition_asterisk_indices(text)
print(listy)
test_eq(len(listy), 2)
[(2, 12), (25, 44)]
The following example has a definition which starts and ends with dollar sign $ characters:
text = r'A **$G$-group over a ring $A$** is'
listy = notation_asterisk_indices(text)
test_eq(len(listy), 0)
listy = definition_asterisk_indices(text)
test(len(listy), 0, operator.ge)
The following example tests notation_asterisk_indices for LaTeX str with single asterisks in them:
# Tests LaTeX str with asterisks in them:
text = (r'''The **direct image of a sheaf $F^{\prime}$ on $X_{E}^{\prime}$** '''
        r'''is defined to be **$\pi_{*} F^{\prime}=\pi_{p} F^{\prime}$** and the '''
        r'''**inverse image of a sheaf $F$ on $X_{E}$** is defined to be **$\pi^{*} F=a\left(\pi^{p} F\right)$**.''')
listy = notation_asterisk_indices(text)
test_eq(len(listy), 2)
start, end = listy[0]
test_eq(text[start:end], r'**$\pi_{*} F^{\prime}=\pi_{p} F^{\prime}$**')
There are pure LaTeX strings which should be considered definitions, but for the purposes of the code here will be considered notations. For example, Hausdorff spaces in topology are also called \(T_2\) spaces:
text = (r"A topological space $X$ is called **$T_2$** if for all distinct $x,y \in X$, "
        r"there exist open neighborhoods $V$ and $W$ around $x$ and $y$ respectively "
        r"such that $V \cap W = \emptyset$.")
listy = notation_asterisk_indices(text)
test_eq(text[listy[0][0]:listy[0][1]], r'**$T_2$**')
Unfortunately, the current implementation of the above methods does not work correctly if a LaTeX string contains double asterisks ** within it.
# TODO: If this is fixed, delete this.
text = r'The double dual of $M$ is denoted by **$M^{**}$**.'
listy = definition_asterisk_indices(text)
print(f"The `definition_asterisk_indices` function detects the substring {text[listy[0][0]:listy[0][1]]} as a definition. This is incorrect!")
listy = notation_asterisk_indices(text)
print(f"The `notation_asterisk_indices` function detects the substring {text[listy[0][0]:listy[0][1]]} as a notation. I guess this is correct.")
The `definition_asterisk_indices` function detects the substring **$M^{** as a definition. This is incorrect!
The `notation_asterisk_indices` function detects the substring **$M^{**}$** as a notation. I guess this is correct.
A workaround is to replace asterisks * with the LaTeX \ast command:
text = r'The double dual of $M$ is denoted by **$M^{\ast\ast}$**.'
listy = definition_asterisk_indices(text)
test_eq(len(listy), 0)
listy = notation_asterisk_indices(text)
test_eq(len(listy), 1)
test_eq(text[listy[0][0]:listy[0][1]], r'**$M^{\ast\ast}$**')
defs_and_notats_separations
defs_and_notats_separations (text:str)
Finds the indices in the text where double asterisks occur and categorizes whether each index is for a definition or a notation.
Parameters
- text - str
Returns
- list[tuple[int, int, bool]]
    - Each tuple is of the form (start, end, is_notation), where text[start:end] is the double-asterisk surrounded string, including the double asterisks.
In the following example, the first two double-asterisk-surrounded-strings are definitions, and the third is a notation:
text = r'A **scheme** is a... the **structure sheaf** of a scheme $X$ is the sheaf **$\mathcal{O}_X$**.'
listy = defs_and_notats_separations(text)
assert not listy[0][2]
assert not listy[1][2]
assert listy[2][2]
LaTeX string
latex_indices
latex_indices (text:str)
Returns the indices in the text containing LaTeX str.
This may not work correctly if the text has a LaTeX formatting issue or if any LaTeX string has a dollar sign \$.
Parameters
- text - str
Returns
- list[tuple[int, int]]
    - Each tuple is of the form (start, end) where text[start:end] is a LaTeX string, including any leading and trailing dollar signs ($ or $$).
Here are some basic uses:
text = r'$$5 \neq 7$$ is a LaTeX equation.'
listy = latex_indices(text)
assert len(listy) == 1
start, end = listy[0]
test_eq(text[start:end], r'$$5 \neq 7$$')
text = r'$\mathcal{O}_X$ denotes the structure sheaf.'
listy = latex_indices(text)
assert len(listy) == 1
start, end = listy[0]
test_eq(text[start:end], r'$\mathcal{O}_X$')
text = r'$$\n5 \neq 7\n$$'
listy = latex_indices(text)
assert len(listy) == 1
If there is a dollar sign symbol \$ outside of a LaTeX string, then the latex_indices function works as expected; the dollar signs are not considered to be part of any LaTeX string:
text = r'\$6.2.4 helo blah $15+6+21$'  # Avoid detecting \$ as latex start/end
listy = latex_indices(text)
start, end = listy[0]
test_eq(text[start:end], r'$15+6+21$')
However, the current implementation of the latex_indices function does not correctly detect LaTeX strings with dollar sign symbols \$ in them:
# TODO: delete this example if the issue is fixed.
text = r'\$6.2.4 helo blah $\$37$ are needed for stuff.'
listy = latex_indices(text)
start, end = listy[0]
assert len(listy) == 1
print(text[listy[0][0]:listy[0][1]])  # Because of the issue, this prints `$\$`, which is only the start of `$\$37$`.
# assert text[start:end] == r'$\$37$'
$\$
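For comparison, here is a hypothetical regex-based sketch (latex_indices_sketch is not trouver's code) that assumes LaTeX strings are delimited by unescaped $ or $$ and do not nest. Its negative lookbehind skips escaped dollar signs, so on the escaped-dollar example above it returns $\$37$ rather than $\$:

```python
import re

# $$...$$ is tried before $...$; (?<!\\) rejects a dollar sign preceded by a
# backslash, so \$ never opens or closes a LaTeX string.
LATEX_PATTERN = re.compile(
    r'(?<!\\)\$\$.+?(?<!\\)\$\$|(?<!\\)\$.+?(?<!\\)\$', re.DOTALL)


def latex_indices_sketch(text: str) -> list[tuple[int, int]]:
    return [match.span() for match in LATEX_PATTERN.finditer(text)]
```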
Numbers
is_number
is_number (x:Union[float,int,complex,str])
Return True if the input x represents a number.
This function is different from Python’s built-in str.isnumeric method, which returns True when all characters of a string are numeric characters (so, for example, "3.45".isnumeric() is False).
assert is_number("3.45")
assert is_number(1 + 5j)
assert is_number(5)
assert is_number(0.0)
assert not is_number("3.43.55")
assert not is_number("hie")
assert not is_number("[^1]")
assert not is_number(None)
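One way to get this behavior, sketched under the assumption that "a number" means anything Python's built-in complex() accepts (is_number_sketch is hypothetical, not trouver's implementation):

```python
from typing import Any


def is_number_sketch(x: Any) -> bool:
    """Hypothetical check: x represents a number if complex() accepts it."""
    try:
        # complex() accepts ints, floats, complex numbers, and numeric strings
        # such as "3.45" or "1+5j"; it raises for anything else.
        complex(x)
    except (TypeError, ValueError):
        return False
    return True
```

This sketch also accepts booleans and strings such as "nan" and "inf"; whether is_number does is not stated here.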
Accented characters
Files and folders
File existence
file_existence_test
file_existence_test (path:os.PathLike, relative_to:Optional[os.PathLike]=None)
Deprecated. Use existing_path instead.
Returns a path relative to a specified path as an absolute path that exists.
Raises
- FileNotFoundError
    - If relative_to is not None but does not exist, or if path does not exist.
Notes
- This function may add the string '\\?\' in front, which identifies very long paths.
Parameter | Type | Default | Details
---|---|---|---
path | PathLike | | A file or directory path. Either absolute or relative to relative_to.
relative_to | Optional[PathLike] | None | Path to the directory that path is relative to. If None, then path is an absolute path.
Returns | Path | | The path formed by relative_to adjoined with path.
existing_path
existing_path (path:os.PathLike, relative_to:Optional[os.PathLike]=None)
Returns a path relative to a specified path as an absolute path that exists.
Raises
- FileNotFoundError
    - If relative_to is not None but does not exist, or if path does not exist.
- ValueError
    - If relative_to is not None and yet not an absolute path, or if relative_to is None and yet path is not an absolute path.
Notes
- This function may add the string '\\?\' in front, which identifies very long paths.
Parameter | Type | Default | Details
---|---|---|---
path | PathLike | | A file or directory path. Either absolute or relative to relative_to.
relative_to | Optional[PathLike] | None | Path to the directory that path is relative to. If None, then path is an absolute path.
Returns | Path | | The path formed by relative_to adjoined with path.
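The documented checks can be sketched as follows. This hypothetical existing_path_sketch is not trouver's implementation; the order in which it performs the checks is an assumption, and it omits the Windows '\\?\' long-path prefix:

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def existing_path_sketch(path: os.PathLike, relative_to: Optional[os.PathLike] = None) -> Path:
    """Hypothetical re-creation of the checks documented for existing_path."""
    if relative_to is not None:
        if not os.path.isabs(relative_to):
            raise ValueError(f'relative_to must be an absolute path, got {relative_to!r}')
        if not os.path.exists(relative_to):
            raise FileNotFoundError(f'relative_to does not exist: {relative_to!r}')
        full_path = Path(relative_to) / path
    else:
        if not os.path.isabs(path):
            raise ValueError(f'path must be an absolute path when relative_to is None, got {path!r}')
        full_path = Path(path)
    if not os.path.exists(full_path):
        raise FileNotFoundError(f'path does not exist: {full_path}')
    return full_path


# Usage on a real temporary directory instead of mocks.
with tempfile.TemporaryDirectory() as root:
    (Path(root) / 'notes.md').write_text('hi', encoding='utf8')
    found = existing_path_sketch('notes.md', root)
    found_is_file = found.is_file()
```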
In the following example, the existing_path method returns an existing absolute path \(p_2 \backslash p_1\) which is equivalent to a specified path \(p_1\) relative to an existing absolute path \(p_2\). Note that all paths and os methods are mocked:
with (mock.patch('os.path.exists') as mock_path_exists,
      mock.patch('os.path.isabs') as mock_is_abs):
    mock_path_exists.return_value = True
    mock_is_abs.return_value = True
    path_1 = existing_path('mock_existing_relative_path', 'mock_existing_absolute_path')
    test_eq(Path('mock_existing_absolute_path') / 'mock_existing_relative_path', path_1)
If the desired path is very long in Windows, then the \\?\ prefix may be prepended to the absolute path so that Python can actually find the path, cf. https://stackoverflow.com/questions/36219317/pathname-too-long-to-open:
# TODO provide an example
with (mock.patch('os.path.exists') as mock_path_exists,
      mock.patch('os.path.isabs') as mock_is_abs):
    print('hi')
hi
If the parameter relative_to, which is supposed to be an absolute path, is not None and not absolute, then a ValueError is raised:
with (ExceptionExpected(ex=ValueError, regex='absolute path'),
      mock.patch('os.path.exists') as mock_path_exists,
      mock.patch('os.path.isabs') as mock_is_abs):
    mock_is_abs.return_value = False
    path = 'mock_relative_path_that_is_not_None'
    relative_to = 'mock_non_absolute_path'
    existing_path('mock_relative_to_that_is_not_None', relative_to)
with (ExceptionExpected(ex=ValueError, regex='absolute path'),
      mock.patch('os.path.exists') as mock_path_exists,
      mock.patch('os.path.isabs') as mock_is_abs):
    mock_is_abs.return_value = False
    # It does not matter what `path` is: as long as `relative_to` is not `None` and not absolute, the ValueError is raised.
    path = None
    relative_to = 'mock_non_absolute_path'
    existing_path('mock_relative_to_that_is_not_None', relative_to)
If the parameter relative_to is None and the parameter path is not absolute, then a ValueError is raised:
with (ExceptionExpected(ex=ValueError, regex='absolute path'),
      mock.patch('os.path.isabs') as mock_is_abs):
    mock_is_abs.return_value = False
    relative_to = None
    path = 'mock_non_absolute_path'
    existing_path(path, relative_to)
If relative_to does not exist or if path does not exist, then a FileNotFoundError is raised:
# In this example, both `relative_to` and `path` are specified, and `relative_to`
# is a non-existent path.
with (ExceptionExpected(ex=FileNotFoundError),
      mock.patch('os.path.exists') as mock_path_exists,
      mock.patch('os.path.isabs') as mock_is_abs):
    relative_to = 'mock_non_existent_absolute_path'
    path = 'mock_some_relative_path'
    def relative_to_does_not_exist(path_to_check):
        return path_to_check != relative_to
    def relative_to_is_absolute_path(path_to_check):
        return path_to_check == relative_to
    mock_path_exists.side_effect = relative_to_does_not_exist
    mock_is_abs.side_effect = relative_to_is_absolute_path
    existing_path(path, relative_to)
# In this example, both `relative_to` and `path` are specified, and `path`
# is a non-existent path, whereas `relative_to` exists.
with (ExceptionExpected(ex=FileNotFoundError),
      mock.patch('os.path.exists') as mock_path_exists,
      mock.patch('os.path.isabs') as mock_is_abs):
    relative_to = 'mock_existent_absolute_path'
    path = 'mock_non_existent_relative_path'
    def only_relative_to_exists(path_to_check):
        # Only `relative_to` exists; all other paths of interest do not exist.
        return path_to_check == relative_to
    def relative_to_is_absolute_path(path_to_check):
        return path_to_check == relative_to
    mock_path_exists.side_effect = only_relative_to_exists
    mock_is_abs.side_effect = relative_to_is_absolute_path
    existing_path(path, relative_to)
Paths without extensions
path_name_no_ext
path_name_no_ext (path:os.PathLike)
Return the name of a file or directory from its path without the extension.
The file or directory does not have to exist.
Parameter | Type | Details
---|---|---
path | PathLike | The path of the file or directory. This may be absolute or relative to any directory.
Returns | str | The name of the file or directory without the extension.
Basic usage:
path = Path('hypothetical_directory')
test_eq(path_name_no_ext(path / 'hypothetical_subdirectory'), 'hypothetical_subdirectory')
test_eq(path_name_no_ext(path / 'hypothetical_subdirectory' / 'hypothetical_file.md'), 'hypothetical_file')
The path does not have to exist.
test_eq(path_name_no_ext(path / 'this_folder_does_not_exist'), 'this_folder_does_not_exist')
On paths to files with “multiple extensions”, the function returns the file name without the last extension only.
test_eq(path_name_no_ext('archived_file_somewhere.7z.zip.tar'), 'archived_file_somewhere.7z.zip')
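The behavior above matches pathlib's PurePath.stem, which removes only the last suffix and never touches the file system. A minimal sketch, assuming that equivalence (path_name_no_ext_sketch is hypothetical):

```python
import os
from pathlib import PurePath


def path_name_no_ext_sketch(path: os.PathLike) -> str:
    # PurePath never touches the file system, so the path need not exist.
    # .stem drops only the last suffix: 'a.7z.zip.tar' -> 'a.7z.zip'.
    return PurePath(path).stem
```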
path_no_ext
path_no_ext (path:os.PathLike)
Returns the path of a file or directory without the extension.
The file or directory does not have to exist.
Parameter | Type | Details
---|---|---
path | PathLike | The path of the file or directory. This may be absolute or relative to any directory.
Returns | str | The path of the file or directory without the extension. If path is a path to a directory, then the output should be essentially the same as path.
Basic usage - the path does not have to exist:
assert path_no_ext('C:\\hi') == 'C:\\hi'
assert path_no_ext('greetings\\file.txt') == 'greetings\\file'
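Similarly, os.path.splitext gives this behavior for paths like the ones above. A minimal sketch, assuming that equivalence (path_no_ext_sketch is hypothetical). Under this rule, a directory whose name contains a dot, such as data.d, would lose its "extension":

```python
import os


def path_no_ext_sketch(path: os.PathLike) -> str:
    # os.path.splitext splits at the last dot of the final path component;
    # a path without an extension is returned unchanged.
    root, _extension = os.path.splitext(os.fspath(path))
    return root
```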
Read text from file
text_from_file
text_from_file (path:os.PathLike, encoding:str='utf8')
Return the entire text from a file.
Assumes that the file is encoded in the specified encoding.
Parameter | Type | Default | Details
---|---|---|---
path | PathLike | | The absolute path of the file.
encoding | str | utf8 | The encoding of the file to be read. Defaults to 'utf8'.
Returns | str | | The entire text from a file.
The text_from_file method is a quick way to extract the text from a file.
# TODO: examples/tests
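A minimal sketch of reading a whole file with a given encoding, assuming plain open() in text mode (text_from_file_sketch is hypothetical, not trouver's code), followed by a round trip through a temporary file:

```python
import os
import tempfile


def text_from_file_sketch(path: os.PathLike, encoding: str = 'utf8') -> str:
    # Text mode decodes with the given encoding; a UnicodeDecodeError is
    # raised if the bytes are not valid in that encoding.
    with open(path, 'r', encoding=encoding) as file:
        return file.read()


with tempfile.TemporaryDirectory() as folder:
    sample_path = os.path.join(folder, 'note.md')
    with open(sample_path, 'w', encoding='utf8') as file:
        file.write('Schéma: $\\mathcal{O}_X$\n')
    round_trip = text_from_file_sketch(sample_path)
```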
files_of_format_sorted
files_of_format_sorted (directory:os.PathLike, extension:str='txt')
Return a list of path str of files in the directory (but not subdirectories) sorted via natsort.
Parameter | Type | Default | Details
---|---|---|---
directory | PathLike | | The directory in which to find the files.
extension | str | txt | Extension of the files to find. Defaults to 'txt'.
Returns | list[str] | |
In the following example, we mock a folder with numbered files. files_of_format_sorted returns them in the “natural” order.
with (mock.patch('glob.glob') as mock_glob):
    mock_directory = Path('some_directory')
    # `glob_results` is not sorted in "natural" order.
    glob_results = [mock_directory / f'{i}.txt' for i in range(10, 0, -1)]
    mock_glob.return_value = glob_results
    # mock to make sure that natsorted was called.
    with mock.patch(__name__ + '.natsorted') as mock_natsorted:
        mock_files = files_of_format_sorted(mock_directory)
        mock_natsorted.assert_called_with(mock_glob.return_value)
    # Now print out that the files are sorted in "natural" order.
    mock_files = files_of_format_sorted(mock_directory)
    print(mock_files)
    test_shuffled(glob_results, mock_files)
[Path('some_directory/1.txt'), Path('some_directory/2.txt'), Path('some_directory/3.txt'), Path('some_directory/4.txt'), Path('some_directory/5.txt'), Path('some_directory/6.txt'), Path('some_directory/7.txt'), Path('some_directory/8.txt'), Path('some_directory/9.txt'), Path('some_directory/10.txt')]
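A sketch of the same idea without the natsort dependency: instead of natsort.natsorted, which the mock above confirms the real function calls, it uses a hand-rolled natural-sort key that compares digit runs as integers. files_of_format_sorted_sketch and natural_key are hypothetical names:

```python
import glob
import os
import re
import tempfile
from pathlib import Path


def natural_key(name: str) -> list:
    # re.split with a capturing group puts the digit runs at odd indices,
    # e.g. '10.txt' -> ['', '10', '.txt']; those are compared as integers.
    parts = re.split(r'(\d+)', name)
    return [int(part) if i % 2 else part.lower() for i, part in enumerate(parts)]


def files_of_format_sorted_sketch(directory: os.PathLike, extension: str = 'txt') -> list[Path]:
    # The pattern only matches files directly in `directory`, not in subdirectories.
    pattern = os.path.join(glob.escape(os.fspath(directory)), f'*.{extension}')
    return sorted((Path(match) for match in glob.glob(pattern)),
                  key=lambda p: natural_key(p.name))


with tempfile.TemporaryDirectory() as folder:
    for i in range(10, 0, -1):
        Path(folder, f'{i}.txt').write_text('', encoding='utf8')
    Path(folder, 'notes.md').write_text('', encoding='utf8')  # other extension, ignored
    sorted_names = [p.name for p in files_of_format_sorted_sketch(folder)]
```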
Date and time
current_time_formatted_to_minutes
current_time_formatted_to_minutes ()
Return the current time to minutes.
Returns
- str
- In UTC time, to minutes.
The current_time_formatted_to_minutes function is a convenience function that returns a string formatting current UTC time:
time_str = current_time_formatted_to_minutes()
print(time_str)
2023-05-20T16:52
Use the datetime.datetime.fromisoformat function to convert the formatted string back to a datetime.datetime object:
dt = datetime.datetime.fromisoformat(time_str)
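The format in the output above is consistent with strftime('%Y-%m-%dT%H:%M') applied to the current UTC time. A minimal sketch under that assumption (current_time_formatted_to_minutes_sketch is hypothetical), including the fromisoformat round trip:

```python
import datetime


def current_time_formatted_to_minutes_sketch() -> str:
    # Current UTC time truncated to minutes, in the same shape as '2023-05-20T16:52'.
    return datetime.datetime.now(datetime.timezone.utc).strftime('%Y-%m-%dT%H:%M')


time_str = current_time_formatted_to_minutes_sketch()
# fromisoformat parses 'YYYY-MM-DDTHH:MM'; the result is a naive datetime
# (no tzinfo) because the string carries no UTC offset.
dt = datetime.datetime.fromisoformat(time_str)
```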
Topological sort
# TODO: reformat the parameter specifications and add examples.
It is sometimes convenient to sort items topologically.
natsort_comparison
natsort_comparison (str1:str, str2:str)
Parameters
- str1 - str
- str2 - str
default_str_comparison
default_str_comparison (str1:str, str2:str)
Parameters
- str1 - str
- str2 - str
containing_string_priority
containing_string_priority (str1:str, str2:str)
Returns 1, 0, -1 depending on whether one string contains the other.
TODO make the string containment criterion looser, e.g. finite Galois etale cover “contains” finite etale cover.
Parameters
- str1 - str
- str2 - str
graph_for_topological_sort
graph_for_topological_sort (items_to_sort:collections.abc.Iterable[str], key_order:Callable[[str,str],int])
Parameters
- items_to_sort - Iterable[str]
- key_order - Callable[[str, str], int]
    - Comparing str1 against str2 results in a positive number if str1 is “greater” than str2 (i.e. str1 is of a later priority).
Returns
- dict[str, set[str]]
    - A dict whose keys are the elements k of items_to_sort and whose values are sets of elements k2 of items_to_sort such that key_order(k, k2) > 0.
dict_with_keys_topologically_sorted
dict_with_keys_topologically_sorted (dict_to_sort:dict[str], key_order:Callable[[str,str],int], reverse:bool=False)
Returns an OrderedDict whose keys are sorted topologically by the specified key order.
Parameters
- dict_to_sort - dict[str]
    - The dict whose keys need to be ordered.
- key_order - Callable[[str, str], int]
    - The comparison function on the keys of dict_to_sort. Defaults to the key function for the comparison [containing_string_priority](https://hyunjongkimmath.github.io/trouver/helper.html#containing_string_priority).
- reverse - bool
    - Defaults to False.
Returns
- OrderedDict[str]
# TODO: test
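The functions above can be illustrated with Python's standard graphlib.TopologicalSorter (Python 3.9+), a swapped-in technique rather than a description of trouver's code. containment_order is a hypothetical key order: the direction used by containing_string_priority is not stated here, so this sketch simply puts a string after any string it strictly contains. graph_for_sort and dict_sorted_topologically only mirror the documented shapes of graph_for_topological_sort and dict_with_keys_topologically_sorted:

```python
from collections import OrderedDict
from graphlib import TopologicalSorter
from typing import Callable, Iterable


def containment_order(str1: str, str2: str) -> int:
    """Hypothetical key order: a string that strictly contains another comes later."""
    if str1 != str2 and str2 in str1:
        return 1
    if str1 != str2 and str1 in str2:
        return -1
    return 0


def graph_for_sort(items: Iterable[str], key_order: Callable[[str, str], int]) -> dict[str, set[str]]:
    # Each key maps to the items k2 with key_order(k, k2) > 0, i.e. the items
    # that must come before it.
    items = list(items)
    return {k: {k2 for k2 in items if key_order(k, k2) > 0} for k in items}


def dict_sorted_topologically(dict_to_sort: dict, key_order: Callable[[str, str], int],
                              reverse: bool = False) -> OrderedDict:
    graph = graph_for_sort(dict_to_sort, key_order)
    # TopologicalSorter treats each value set as the node's predecessors, so
    # static_order() yields "lesser" keys first; it raises CycleError on cycles.
    order = list(TopologicalSorter(graph).static_order())
    if reverse:
        order.reverse()
    return OrderedDict((k, dict_to_sort[k]) for k in order)


notes = {'finite etale cover': 3, 'cover': 1, 'etale cover': 2}
sorted_notes = dict_sorted_topologically(notes, containment_order)
```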
Alphabet
With Greek letters typed in LaTeX, I sometimes need to interpret them like their English equivalents, e.g. \alpha is like A, etc.
alphabet_or_latex_command_to_alphabet_group
alphabet_or_latex_command_to_alphabet_group (character)
alphabet_or_latex_command_to_alphabet
alphabet_or_latex_command_to_alphabet (character)
Returns the alphabet that the character “corresponds to”.
alphabet_to_alphabet_group
alphabet_to_alphabet_group (character)
Returns the alphabet group.
In my vaults, I often alphabetize things and also group the alphabet as follows:
- A-E
- F-J
- K-O
- P-T
- U-Z
Parameters
- character - str
Returns
- str or None
    - Returns None if character is not a letter of the alphabet.
dicty = {}
for c in ascii_uppercase:
    dicty[c] = alphabet_to_alphabet_group(c)
print(dicty)
{'A': 'A-E', 'B': 'A-E', 'C': 'A-E', 'D': 'A-E', 'E': 'A-E', 'F': 'F-J', 'G': 'F-J', 'H': 'F-J', 'I': 'F-J', 'J': 'F-J', 'K': 'K-O', 'L': 'K-O', 'M': 'K-O', 'N': 'K-O', 'O': 'K-O', 'P': 'P-T', 'Q': 'P-T', 'R': 'P-T', 'S': 'P-T', 'T': 'P-T', 'U': 'U-Z', 'V': 'U-Z', 'W': 'U-Z', 'X': 'U-Z', 'Y': 'U-Z', 'Z': 'U-Z'}
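The grouping shown in the output can be reproduced with integer division on the letter's position in the alphabet. This is a hypothetical sketch (alphabet_to_alphabet_group_sketch is not trouver's code); accepting lowercase letters is an assumption:

```python
from string import ascii_letters, ascii_uppercase
from typing import Optional

ALPHABET_GROUPS = ['A-E', 'F-J', 'K-O', 'P-T', 'U-Z']


def alphabet_to_alphabet_group_sketch(character: str) -> Optional[str]:
    # Accepting lowercase letters is an assumption of this sketch.
    if len(character) != 1 or character not in ascii_letters:
        return None
    index = ascii_uppercase.index(character.upper())
    # Five letters per group, except that U-Z also absorbs the 26th letter Z.
    return ALPHABET_GROUPS[min(index // 5, len(ALPHABET_GROUPS) - 1)]
```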
Getting a path-valid string from a string containing latex
latex_to_path_accepted_string
latex_to_path_accepted_string (latex:str)
Convert a latex string to a path accepted string
The latex_to_path_accepted_string function “cleans” a latex str into a path-valid string for the purposes of making files.
sample_1 = r'\mathcal{O}_X'
output_1 = latex_to_path_accepted_string(sample_1)
print(output_1)
assert 'O' in output_1 and 'X' in output_1
validate_filename(output_1)
sample_2 = r'\operatorname{Gal}(L/K)'
output_2 = latex_to_path_accepted_string(sample_2)
print(output_2)
assert 'Gal' in output_2 and 'L' in output_2 and 'K' in output_2
validate_filename(output_2)
# Example found in https://arxiv.org/abs/1607.04471
sample_3 = r'\begin{equation} \label{escape rate} G_{F_t}(z,w) = \lim_{n\to\infty} \frac{1}{d^n} \log \| F_t^n(z,w) \|, \end{equation}'
output_3 = latex_to_path_accepted_string(sample_3)
print(output_3)
validate_filename(output_3)
O_X
Gal_L_K
label_escape_rate_G_F_t_z_w_lim_n_to_infty_frac_1_d_n_log_F_t_n_z_w
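A rough sketch of such a cleaning step that happens to reproduce the three outputs above. The set of formatting commands whose names are dropped (FORMATTING_COMMANDS) and the rule of dropping \begin{...} and \end{...} are assumptions; trouver's function may use different rules. latex_to_path_string_sketch is a hypothetical name:

```python
import re

# Hypothetical choice of commands whose name is dropped but whose argument is kept.
FORMATTING_COMMANDS = re.compile(
    r'\\(?:mathcal|mathbb|mathfrak|mathrm|mathbf|mathscr|operatorname)(?![A-Za-z])')
# \begin{...} and \end{...} are dropped together with the environment name.
ENVIRONMENT_MARKERS = re.compile(r'\\(?:begin|end)\{[^}]*\}')


def latex_to_path_string_sketch(latex: str) -> str:
    text = ENVIRONMENT_MARKERS.sub('', latex)
    text = FORMATTING_COMMANDS.sub('', text)
    # Every run of characters other than ASCII letters and digits becomes one '_'.
    text = re.sub(r'[^A-Za-z0-9]+', '_', text)
    return text.strip('_')
```

Restricting the output to ASCII letters, digits, and underscores keeps it valid as a file name on common file systems, at the cost of merging distinct LaTeX strings that differ only in punctuation.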