helper

Helper functions
import operator
from unittest import mock

from pathvalidate import validate_filename

from string import ascii_uppercase
from fastcore.test import *
from nbdev import show_doc

Tests


source

_test_directory

 _test_directory ()

Returns the nbs/_tests directory of the trouver repository.

Assumes that the current working directory is either the root of the repository or the nbs folder, that the nbs/_tests folder exists in the repository, and that the root of the repository does not have a folder named _tests.

Test files for trouver are contained in the nbs/_tests directory within the repository. The private _test_directory() function returns this directory.

assert os.path.exists(_test_directory())
os.listdir(_test_directory())
['empty_model_vault',
 'latex_examples',
 'ml_examples',
 'test_vault_1',
 'test_vault_2',
 'test_vault_3',
 'test_vault_4',
 'test_vault_5',
 'test_vault_6',
 'test_vault_7',
 'test_vault_8',
 'test_vault_9']

regex


source

find_regex_in_text

 find_regex_in_text (text:str, pattern:Union[str,Pattern[str]])

Return ranges in text where pattern occurs.

Type Details
text str Text in which to find the regex pattern
pattern str | Pattern[str] The regex pattern
Returns list[tuple[int]] Each tuple is of the form (a,b) where text[a:b] is the regex match.

The following example finds the occurrence of the Markdown footnote:

regex_pattern = r'\[\^\d\]'
text = '[^1]: asdf'

output = find_regex_in_text(text, regex_pattern)
test_eq(output, [(0,4)])

start, end = output[0]
test_eq(text[start:end], '[^1]')

If there are multiple matches for the regex pattern, then they are all included in the outputted list.

regex_pattern = r'\d+'  # Searches for one or more consecutive digits
text = '9000 is a big number. But you know what is bigger? 9001.'

output = find_regex_in_text(text, regex_pattern)
test_eq(len(output), 2)

start, end = output[0]
test_eq(text[start:end], '9000')

start, end = output[1]
test_eq(text[start:end], '9001')

The following example detects YAML frontmatter text as used in Obsidian. This regex pattern is also used in markdown.markdown.file.find_front_matter_meta_in_markdown_text.

The regex pattern used is able to detect the frontmatter even when it is empty.

sample_regex = r'---\n([\S\s]*?)?(?(1)\n|)---'
sample_str = '---\n---'
sample_output = find_regex_in_text(sample_str, sample_regex)
assert sample_output == [(0,7)]

sample_str = '---\naliases: [this_is_an_aliases_for_the_Obsidian_note]\n---'
sample_output = find_regex_in_text(sample_str, sample_regex)
assert sample_output == [(0, len(sample_str))]  # The entire sample_str is detected.

Contrast the regex pattern above with the pattern ---\n[\S\s]*?\n---, which does not detect empty YAML frontmatter text.

sample_regex = r'---\n[\S\s]*?\n---'  # Raw string avoids the invalid escape sequence `\S`
sample_str = '---\n---'
sample_output = find_regex_in_text(sample_str, sample_regex)
assert not sample_output
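
To see the idea behind the examples above in isolation, here is a minimal sketch built on the standard library's `re.finditer`. The name `regex_spans` is hypothetical, and this is not necessarily how trouver implements `find_regex_in_text`; it only reproduces the `(start, end)` output shape described in this section.

```python
import re
from typing import Pattern, Union


def regex_spans(text: str, pattern: Union[str, Pattern[str]]) -> list[tuple[int, int]]:
    """Hypothetical stand-in: return (start, end) for every match of pattern in text."""
    # re.finditer accepts both a pattern string and a compiled pattern.
    return [match.span() for match in re.finditer(pattern, text)]


footnote_spans = regex_spans('[^1]: asdf', r'\[\^\d\]')
number_text = '9000 is a big number. But you know what is bigger? 9001.'
number_spans = regex_spans(number_text, re.compile(r'\d+'))
print(footnote_spans)
print([number_text[a:b] for a, b in number_spans])
```

Each returned pair can be used directly as a slice, which is what makes the output convenient to pass on to index-based replacement.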

source

replace_string_by_indices

 replace_string_by_indices (string:str,
                            replace_ranges:Sequence[Union[Sequence[int],in
                            t]], replace_with:Union[Sequence[str],str])

Replace parts of string at the specified locations.

Use this with find_regex_in_text.

Parameters

  • string - str
  • replace_ranges - Sequence[Sequence[int] | int]
    • Either a list of lists/tuples of one or two ints, or a single such list/tuple. A list/tuple [a,b] or (a,b) means that string[a:b] is to be replaced. [a] or (a,) means that string[a:] is to be replaced. The ranges should not overlap and should be arranged in ascending order.
  • replace_with - Sequence[str] | str
    • The strs which will replace the parts represented by replace_ranges. replace_ranges and replace_with must either both be lists or both not be lists. If they are lists, they must be of the same length.

Returns

  • str
Type Details
string str String in which to make replacements
replace_ranges Sequence[Union[Sequence[int], int]] A list of lists/tuples of int’s or a single list/tuple of int’s. Each
replace_with Sequence[str] | str The str(s) which will replace the substrings at replace_ranges in string. replace_with must be a str exactly when replace_ranges is a Sequence of a single Sequence of int.
Returns str The str obtained by replacing the substrings at replace_range in string by the strs specified by replace_with.

The following are basic examples:

test_eq(replace_string_by_indices('hello world', replace_ranges=(0,5), replace_with='hi'), 'hi world')
test_eq(replace_string_by_indices('hello somebody', replace_ranges=[(0,1), (6,10)], replace_with=['', '']), 'ello body')

If replace_ranges and replace_with are of different lengths, then a ValueError is raised:

with ExceptionExpected(ex=ValueError, regex="are different"):
    replace_string_by_indices('hello world', replace_ranges = [(0,5), (6,10)], replace_with = [''])
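
The replacement logic can be sketched as follows. This is a simplified illustration, not trouver's implementation: the name `replace_by_ranges` is hypothetical, and the sketch only handles the case where both arguments are lists of equal length with non-overlapping, ascending ranges.

```python
import re


def replace_by_ranges(string: str, ranges: list[tuple[int, int]], replacements: list[str]) -> str:
    """Hypothetical stand-in: replace string[a:b] with the matching replacement."""
    if len(ranges) != len(replacements):
        raise ValueError('ranges and replacements are different in length')
    # Work from right to left so that earlier indices stay valid after each edit.
    for (start, end), new in reversed(list(zip(ranges, replacements))):
        string = string[:start] + new + string[end:]
    return string


# Pair the sketch with regex spans: strip Markdown footnote markers from a sentence.
text = 'See[^1] and also[^2].'
spans = [match.span() for match in re.finditer(r'\[\^\d\]', text)]
cleaned = replace_by_ranges(text, spans, [''] * len(spans))
print(cleaned)
```

Processing the ranges from the end of the string backwards is a design choice: a replacement that changes the length of the string never shifts the indices of ranges that are still waiting to be processed.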

Definitions and notations

I surround definitions and notations by double asterisks **. The double_asterisk_indices method finds strings surrounded by double asterisks, the notation_asterisk_indices method finds notation str, and the definition_asterisk_indices method finds definition str.


source

double_asterisk_indices

 double_asterisk_indices (text:str)

Return the indices in str of text surrounded by double asterisks.

Assumes that no LaTeX math mode string has double asterisks.

Type Details
text str the str in which to find the indices of double asterisk surrounded text.
Returns list[tuple[int]] Each tuple is of the form (start,end), where text[start:end] is a part in text with double asterisks, including the double asterisks.

source

definition_asterisk_indices

 definition_asterisk_indices (text:str)

Return the indices of definition text surrounded by double asterisks.

A double-asterisk-surrounded text is almost always a definition when it is not purely LaTeX math mode text.

Assumes that no LaTeX math mode string has double asterisks and that no LaTeX math mode string has the dollar sign character within it.

Type Details
text str The str in which to find the indices of the definitions surrounded by double asterisks.
Returns list[tuple[int]] Each tuple is of the form (start,end), where text[start:end] is a substring in text surrounded by double asterisks, including the double asterisks.

source

notation_asterisk_indices

 notation_asterisk_indices (text:str)

Return the indices of notation text surrounded by double asterisks.

A double-asterisk-surrounded text is almost always a notation when it is purely LaTeX math mode text.

Assumes that no LaTeX math mode string has the dollar sign character within it.

Type Details
text str the str in which to find the indices of notations surrounded by double asterisks.
Returns list[tuple[int]] Each tuple is of the form (start,end), where text[start:end] is a part in text with LaTeX math mode text with double asterisks, including the double asterisks.

Examples

In the following example, scheme and structure sheaf are definitions, whereas $\mathcal{O}_X$ is a notation:

text = r'A **scheme** is a... the **structure sheaf** of a scheme $X$ is the sheaf **$\mathcal{O}_X$**.'
listy = double_asterisk_indices(text)

start, end = listy[0]
test_eq(text[start:end], '**scheme**')

start, end = listy[1]
test_eq(text[start:end], '**structure sheaf**')

start, end = listy[2]
test_eq(text[start:end], r'**$\mathcal{O}_X$**')

listy = notation_asterisk_indices(text)
start, end = listy[0]
test_eq(text[start:end], r'**$\mathcal{O}_X$**')
test_eq(len(listy), 1)

listy = definition_asterisk_indices(text)
print(listy)
test_eq(len(listy), 2)
[(2, 12), (25, 44)]

The following example has a definition which starts and ends with dollar sign $ characters:

text = r'A **$G$-group over a ring $A$** is'
listy = notation_asterisk_indices(text)
test_eq(len(listy), 0)

listy = definition_asterisk_indices(text)
test(len(listy), 0, operator.ge)

The following example tests notation_asterisk_indices for LaTeX str with single asterisks in them:

# Tests LaTeX str with asterisks in them:
text = (r'''The **direct image of a sheaf $F^{\prime}$ on $X_{E}^{\prime}$** '''
        r'''is defined to be **$\pi_{*} F^{\prime}=\pi_{p} F^{\prime}$** and the '''
        r'''**inverse image of a sheaf $F$ on $X_{E}$** is defined to be **$\pi^{*} F=a\left(\pi^{p} F\right)$**.''')
listy = notation_asterisk_indices(text)
test_eq(len(listy), 2)
start, end = listy[0]
test_eq(text[start:end], r'**$\pi_{*} F^{\prime}=\pi_{p} F^{\prime}$**')

There are pure LaTeX strings which should be considered definitions, but for the purposes of the code here will be considered notations. For example, Hausdorff spaces in topology are also called \(T_2\) spaces:

text = (r"A topological space $X$ is called **$T_2$** if for all $x,y \in X$, "
        r"there exist open neighborhoods $V$ and $W$ around $x$ and $y$ respectively "
        r"such that $V \cap W = \emptyset$.")
listy = notation_asterisk_indices(text)
test_eq(text[listy[0][0]:listy[0][1]], r'**$T_2$**')

Unfortunately, the current implementations of the above functions do not work correctly if a LaTeX string contains double asterisks **.

# TODO: If this is fixed, delete this.
text = r'The double dual of $M$ is denoted by **$M^{**}$**.'
listy = definition_asterisk_indices(text)
print(f"The `definition_asterisk_indices` function detects the substring {text[listy[0][0]:listy[0][1]]} as a definition. This is incorrect!")

listy = notation_asterisk_indices(text)
print(f"The `notation_asterisk_indices` function detects the substring {text[listy[0][0]:listy[0][1]]} as a notation. I guess this is correct.")
The `definition_asterisk_indices` function detects the substring **$M^{** as a definition. This is incorrect!
The `notation_asterisk_indices` function detects the substring **$M^{**}$** as a notation. I guess this is correct.

A workaround is to replace asterisks * with the LaTeX \ast command:

text = r'The double dual of $M$ is denoted by **$M^{\ast\ast}$**.'
listy = definition_asterisk_indices(text)
test_eq(len(listy), 0)

listy = notation_asterisk_indices(text)
test_eq(len(listy), 1)
test_eq(text[listy[0][0]:listy[0][1]], r'**$M^{\ast\ast}$**')

source

defs_and_notats_separations

 defs_and_notats_separations (text:str)

Finds the indices in the text where double asterisks occur and categorizes whether each index is for a definition or a notation.

Parameters

  • text - str

Returns

  • list[tuple[int, int, bool]]
    • Each tuple is of the form (start, end, is_notation), where text[start:end] is the double-asterisk surrounded string, including the double asterisks.

In the following example, the first two double-asterisk-surrounded-strings are definitions, and the third is a notation:

text = r'A **scheme** is a... the **structure sheaf** of a scheme $X$ is the sheaf **$\mathcal{O}_X$**.'
listy = defs_and_notats_separations(text)
assert not listy[0][2]
assert not listy[1][2]
assert listy[2][2]
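
The definition/notation heuristic described in this section can be sketched with two regular expressions. This is an illustrative approximation under stated assumptions, not trouver's code: the name `classify_double_asterisk_spans` is hypothetical, and a span is treated as a notation only when its content is a single `$...$` block with no inner dollar sign.

```python
import re


def classify_double_asterisk_spans(text: str) -> list[tuple[int, int, bool]]:
    """Hypothetical heuristic: (start, end, is_notation) for each **...** span."""
    results = []
    for match in re.finditer(r'\*\*(.+?)\*\*', text):
        inner = match.group(1)
        # Treat the span as a notation only if it is one single $...$ block.
        is_notation = re.fullmatch(r'\$[^$]+\$', inner) is not None
        results.append((match.start(), match.end(), is_notation))
    return results


scheme_text = (r'A **scheme** is a... the **structure sheaf** of a scheme $X$ '
               r'is the sheaf **$\mathcal{O}_X$**.')
scheme_spans = classify_double_asterisk_spans(scheme_text)
group_spans = classify_double_asterisk_spans(r'A **$G$-group over a ring $A$** is')
print(scheme_spans)
print(group_spans)
```

Like the functions documented above, this sketch misreads LaTeX that itself contains `**`, because the lazy match stops at the first closing pair of asterisks.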

LaTeX string


source

latex_indices

 latex_indices (text:str)

Returns the indices in the text containing LaTeX str.

This may not work correctly if the text has a LaTeX formatting issue or if any LaTeX string has a dollar sign \$.

Parameters

  • text - str

Returns

  • list[tuple[int]]
    • Each tuple is of the form (start, end) where text[start:end] is a LaTeX string, including any leading and trailing dollar signs ($ or $$).

Here are some basic uses:

text = r'$$5 \neq 7$$ is a LaTeX equation.'
listy = latex_indices(text)
assert len(listy) == 1
start, end = listy[0]
test_eq(text[start:end], r'$$5 \neq 7$$')

text = r'$\mathcal{O}_X$ denotes the structure sheaf.'
listy = latex_indices(text)
assert len(listy) == 1
start, end = listy[0]
test_eq(text[start:end], r'$\mathcal{O}_X$')

text = r'$$\n5 \neq 7\n$$'
listy = latex_indices(text)
assert len(listy) == 1

If there is a dollar sign symbol \$ outside of a LaTeX string, then the latex_indices function works as expected; the dollar signs are not considered to be part of any LaTeX string:

text = r'\$6.2.4 helo blah $15+6+21$'  # Avoid detecting \$ as latex start/end
listy = latex_indices(text)
start, end = listy[0]
test_eq(text[start:end], r'$15+6+21$')

However, the current implementation of the latex_indices function does not correctly detect LaTeX strings with dollar sign symbols \$ in them:

# TODO: delete this example if the issue is fixed.
text = r'\$6.2.4 helo blah $\$37$ are needed for stuff.' 
listy = latex_indices(text)
start, end = listy[0]
assert len(listy) == 1
print(text[listy[0][0]:listy[0][1]])  # This should print `$\$`, which is at the start of `$\$37$`.
# assert text[start:end] == r'$\$37$'
$\$
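
One possible way to handle escaped dollar signs, sketched here purely as an illustration, is to require that neither the opening nor the closing delimiter is preceded by a backslash. The pattern and the name `latex_spans` are assumptions and not trouver's implementation; the sketch also ignores edge cases such as an escaped backslash directly before a dollar sign.

```python
import re

# Display math ($$...$$) is tried before inline math ($...$).
# (?<!\\) rejects a dollar sign that is immediately preceded by a backslash.
LATEX_SPAN = re.compile(
    r'(?<!\\)\$\$.+?(?<!\\)\$\$|(?<!\\)\$.+?(?<!\\)\$',
    flags=re.DOTALL,
)


def latex_spans(text: str) -> list[tuple[int, int]]:
    """Hypothetical stand-in: (start, end) of each unescaped $...$ or $$...$$ block."""
    return [match.span() for match in LATEX_SPAN.finditer(text)]


price_text = r'\$6.2.4 helo blah $\$37$ are needed for stuff.'
price_spans = latex_spans(price_text)
display_text = r'$$5 \neq 7$$ is a LaTeX equation.'
display_spans = latex_spans(display_text)
print([price_text[a:b] for a, b in price_spans])
print([display_text[a:b] for a, b in display_spans])
```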

Numbers


source

is_number

 is_number (x:Union[float,int,complex,str])

Return True if the input x represents a number.

This function is different from Python’s built-in str.isnumeric method, which returns True only when every character of a string is a numeric character; for example, "3.45".isnumeric() is False because of the decimal point.

assert is_number("3.45")
assert is_number(1 + 5j)
assert is_number(5)
assert is_number(0.0)
assert not is_number("3.43.55")
assert not is_number("hie")
assert not is_number("[^1]")
assert not is_number(None)
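
The contrast with `str.isnumeric` can be made concrete with a small sketch. The helper `looks_like_number` is hypothetical and only approximates the behavior shown in the assertions above by attempting a conversion with `complex`; it is not claimed to match trouver's `is_number` on every input (for instance, `complex` also accepts strings such as 'inf').

```python
def looks_like_number(x) -> bool:
    """Hypothetical stand-in: True if x can be interpreted as a (complex) number."""
    try:
        complex(x)
    except (TypeError, ValueError):
        # TypeError covers inputs like None; ValueError covers malformed strings.
        return False
    return True


checks = {
    '3.45': (looks_like_number('3.45'), '3.45'.isnumeric()),
    '345': (looks_like_number('345'), '345'.isnumeric()),
    '3.43.55': (looks_like_number('3.43.55'), '3.43.55'.isnumeric()),
}
print(checks)
```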

Accented characters

Files and folders

File existence


source

file_existence_test

 file_existence_test (path:os.PathLike,
                      relative_to:Optional[os.PathLike]=None)

Deprecated. Use existing_path instead.

Returns a path relative to a specified path as an absolute path that exists.

Raises

  • FileNotFoundError
    • If relative_to is not None but does not exist, or if file does not exist.

Notes - This function may add the string '\\?\' in front, which identifies very long paths.

Type Default Details
path PathLike A file or directory path. Either absolute or relative to relative_to.
relative_to Optional[PathLike] None Path to the directory that file is relative to. If None, then path is an absolute path.
Returns Path The path formed by relative_to adjoined with path. Defaults to None

source

existing_path

 existing_path (path:os.PathLike, relative_to:Optional[os.PathLike]=None)

Returns a path relative to a specified path as an absolute path that exists.

Raises

  • FileNotFoundError
    • If relative_to is not None but does not exist, or if file does not exist.
  • ValueError
    • If relative_to is not None and yet not an absolute path, or if relative_to is None and yet path is not an absolute path.

Notes - This function may add the string '\\?\' in front, which identifies very long paths.

Type Default Details
path PathLike A file or directory path. Either absolute or relative to relative_to.
relative_to Optional[PathLike] None Path to the directory that file is relative to. If None, then path is an absolute path.
Returns Path The path formed by relative_to adjoined with path. Defaults to None

In the following example, the existing_path method returns an existing absolute path \(p_2 \backslash p_1\) which is equivalent to a specified path \(p_1\) relative to an existing absolute path \(p_2\). Note that all paths and os methods are mocked:

with (mock.patch('os.path.exists') as mock_path_exists,
      mock.patch('os.path.isabs') as mock_is_abs):
    mock_path_exists.return_value = True
    mock_is_abs.return_value = True
    path_1 = existing_path('mock_existing_relative_path', 'mock_existing_absolute_path')
    test_eq(Path('mock_existing_absolute_path') / 'mock_existing_relative_path', path_1)

If the desired path is very long in Windows, then the prefix \\?\ may be prepended to the absolute path so that Python can actually find the path, cf. https://stackoverflow.com/questions/36219317/pathname-too-long-to-open:

# TODO provide an example
with (mock.patch('os.path.exists') as mock_path_exists,
      mock.patch('os.path.isabs') as mock_is_abs):
  print('hi')
hi
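
As a rough illustration of the prefixing idea, the sketch below adds the Windows extended-length prefix to a long absolute path string. The helper name `with_extended_prefix` and the length threshold of 260 characters are assumptions made for this example; how and when existing_path actually adds the prefix is not shown here.

```python
WINDOWS_MAX_PATH = 260      # Assumed threshold for a "very long" path.
EXTENDED_PREFIX = '\\\\?\\'  # The four characters \\?\


def with_extended_prefix(absolute_path: str) -> str:
    """Hypothetical helper: prepend \\\\?\\ to a long absolute Windows path string."""
    already_prefixed = absolute_path.startswith(EXTENDED_PREFIX)
    if already_prefixed or len(absolute_path) < WINDOWS_MAX_PATH:
        return absolute_path
    return EXTENDED_PREFIX + absolute_path


short_path = 'C:\\notes\\file.md'
long_path = 'C:\\' + 'a' * 300
print(with_extended_prefix(short_path))
print(with_extended_prefix(long_path)[:10])
```

The function is a pure string operation, so it can be exercised on any operating system even though the prefix is only meaningful on Windows.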

If the parameter relative_to, which is supposed to be an absolute path, is not None and not absolute, then a ValueError is raised:

with (ExceptionExpected(ex=ValueError, regex='absolute path'),
      mock.patch('os.path.exists') as mock_path_exists,
      mock.patch('os.path.isabs') as mock_is_abs):
    mock_is_abs.return_value = False
    path = 'mock_relative_path_that_is_not_None'
    relative_to = 'mock_non_absolute_path'
    existing_path(path, relative_to)


with (ExceptionExpected(ex=ValueError, regex='absolute path'),
      mock.patch('os.path.exists') as mock_path_exists,
      mock.patch('os.path.isabs') as mock_is_abs):
    mock_is_abs.return_value = False
    # It does not matter what `path` is - as long as `relative_to` is not `None` and not absolute, the ValueError is raised.
    path = None  
    relative_to = 'mock_non_absolute_path'
    existing_path('mock_relative_to_that_is_not_None', relative_to)

If the parameter relative_to is None and the parameter path is not absolute, then a ValueError is raised:

with (ExceptionExpected(ex=ValueError, regex='absolute path'),
      mock.patch('os.path.isabs') as mock_is_abs):
    mock_is_abs.return_value = False
    relative_to = None
    path = 'mock_non_absolute_path'
    existing_path(path, relative_to)

If relative_to does not exist or if path does not exist, then a FileNotFoundError is raised:

# In this example, both `relative_to` and `path` are specified, and `relative_to`
# is a non-existent path.
with (ExceptionExpected(ex=FileNotFoundError),
      mock.patch('os.path.exists') as mock_path_exists,
      mock.patch('os.path.isabs') as mock_is_abs):
    relative_to = 'mock_non_existent_absolute_path'
    path = 'mock_some_relative_path'
    def relative_to_does_not_exist(path_to_check):
      return False if path_to_check is relative_to else True
    def relative_to_is_absolute_path(path_to_check):
      return True if path_to_check is relative_to else False

    mock_path_exists.side_effect = relative_to_does_not_exist
    mock_is_abs.side_effect = relative_to_is_absolute_path
    existing_path(path, relative_to)


# In this example, both `relative_to` and `path` are specified, and `path`
# is a non-existent path, whereas `relative_to` exists.
with (ExceptionExpected(ex=FileNotFoundError),
      mock.patch('os.path.exists') as mock_path_exists,
      mock.patch('os.path.isabs') as mock_is_abs):
    relative_to = 'mock_existent_absolute_path'
    path = 'mock_non_existent_relative_path'
    def only_relative_to_exists(path_to_check):
      # only `relative_to` exists; all other paths of interest do not exist.
      return path_to_check is relative_to
    def relative_to_is_absolute_path(path_to_check):
      return True if path_to_check is relative_to else False

    mock_path_exists.side_effect = only_relative_to_exists
    mock_is_abs.side_effect = relative_to_is_absolute_path
    existing_path(path, relative_to)

Paths without extensions


source

path_name_no_ext

 path_name_no_ext (path:os.PathLike)

Return the name of a file or directory from its path without the extension.

The file or directory does not have to exist.

Type Details
path PathLike The path of the file or directory. This may be absolute or relative to any directory.
Returns str The name of the file or directory without the extension.

Basic usage:

path = Path('hypothetical_directory')
test_eq(path_name_no_ext(path / 'hypothetical_subdirectory'),  'hypothetical_subdirectory')
test_eq(path_name_no_ext(path / 'hypothetical_subdirectory' / 'hypothetical_file.md'),  'hypothetical_file')

The path does not have to exist.

test_eq(path_name_no_ext(path / 'this_folder_does_not_exist'), 'this_folder_does_not_exist')

On paths to files with “multiple extensions”, the function returns the file name without the last extension only.

test_eq(path_name_no_ext('archived_file_somewhere.7z.zip.tar'),  'archived_file_somewhere.7z.zip')
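
For comparison, the standard library's `pathlib` exposes the same "drop only the last extension" behavior through the `stem` attribute. The snippet below uses only `pathlib` and makes no claim about how path_name_no_ext is implemented.

```python
from pathlib import PurePath

# `stem` is the final path component with only its last suffix removed.
file_stem = PurePath('hypothetical_directory/hypothetical_file.md').stem
folder_stem = PurePath('hypothetical_directory/this_folder_does_not_exist').stem
archive_stem = PurePath('archived_file_somewhere.7z.zip.tar').stem
print(file_stem, folder_stem, archive_stem)
```

`PurePath` never touches the file system, so the paths do not have to exist.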

source

path_no_ext

 path_no_ext (path:os.PathLike)

Returns the path of a file or directory without the extension.

The file or directory does not have to exist.

Type Details
path PathLike The path of the file or directory. This may be absolute or relative to any directory.
Returns str The path of the file or directory without the extension. If path is a path to a directory, then the output should be essentially the same as path.

Basic usage - the path does not have to exist:

assert path_no_ext('C:\\hi') == 'C:\\hi'
assert path_no_ext('greetings\\file.txt') == 'greetings\\file'
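
A comparable result can be obtained from the standard library with `splitext`. The sketch uses `ntpath`, the Windows flavor of `os.path`, so that the backslash-separated examples behave the same on every operating system; whether path_no_ext relies on it is not stated here.

```python
import ntpath

# splitext returns (root, extension); the root is the path without its last extension.
root_of_directory, ext_of_directory = ntpath.splitext('C:\\hi')
root_of_file, ext_of_file = ntpath.splitext('greetings\\file.txt')
print(root_of_directory, repr(ext_of_directory))
print(root_of_file, repr(ext_of_file))
```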

Read text from file


source

text_from_file

 text_from_file (path:os.PathLike, encoding:str='utf8')

Return the entire text from a file.

Assumes that the file can be decoded with the specified encoding.

Type Default Details
path PathLike The absolute path of the file.
encoding str utf8 The encoding of the file to be read. Defaults to 'utf8'.
Returns str The entire text from a file

The text_from_file method is a quick method to extract the text from a file.

# TODO: examples/tests
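
Until examples are added, the following self-contained sketch shows the kind of round trip such a function supports. The helper `read_text_sketch` is a hypothetical stand-in written for this illustration; it mirrors the documented signature but is not trouver's code.

```python
import tempfile
from pathlib import Path


def read_text_sketch(path, encoding: str = 'utf8') -> str:
    """Hypothetical stand-in: return the whole contents of a text file."""
    with open(path, 'r', encoding=encoding) as file:
        return file.read()


with tempfile.TemporaryDirectory() as temp_dir:
    note = Path(temp_dir) / 'note.md'
    # Write non-ASCII text so that the encoding argument actually matters.
    note.write_text('# Schéma\nUn **schéma** est...', encoding='utf8')
    contents = read_text_sketch(note)
print(contents)
```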

source

files_of_format_sorted

 files_of_format_sorted (directory:os.PathLike, extension:str='txt')

Return a list of path str of files in the directory (but not subdirectories) sorted via natsort.

Type Default Details
directory PathLike The directory in which to find the files
extension str txt Extension of the files to find. Defaults to ‘txt’.
Returns list[str]

In the following example, we mock a folder with numbered files. files_of_format_sorted returns them in the “natural” order.

with (mock.patch('glob.glob') as mock_glob):
    mock_directory = Path('some_directory')
    # `glob_results` is not sorted in "natural" order.
    glob_results = [mock_directory / f'{i}.txt' for i in range (10,0, -1)]
    mock_glob.return_value = glob_results

    # mock to make sure that natsorted was called.
    with mock.patch(__name__ + '.natsorted') as mock_natsorted:
      mock_files = files_of_format_sorted(mock_directory)
      mock_natsorted.assert_called_with(mock_glob.return_value)

    # Now print out that the files are sorted in "natural" order.
    mock_files = files_of_format_sorted(mock_directory)
    print(mock_files)
    test_shuffled(glob_results, mock_files)
[Path('some_directory/1.txt'), Path('some_directory/2.txt'), Path('some_directory/3.txt'), Path('some_directory/4.txt'), Path('some_directory/5.txt'), Path('some_directory/6.txt'), Path('some_directory/7.txt'), Path('some_directory/8.txt'), Path('some_directory/9.txt'), Path('some_directory/10.txt')]
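
To illustrate what "natural" order means without the third-party natsort package, here is a rough standard-library approximation. The `natural_key` function is hypothetical and much simpler than natsort; it only compares runs of digits as integers.

```python
import re


def natural_key(name: str) -> list:
    """Hypothetical sort key: digit runs compare as integers, other text as lowercase."""
    parts = re.split(r'(\d+)', name)
    # With a capturing group, the odd-indexed parts are exactly the digit runs.
    return [int(part) if index % 2 else part.lower() for index, part in enumerate(parts)]


names = [f'{i}.txt' for i in range(10, 0, -1)]
plain_order = sorted(names)
natural_order = sorted(names, key=natural_key)
print(plain_order)
print(natural_order)
```

Plain string sorting places '10.txt' before '2.txt' because it compares character by character, whereas the natural key compares 2 and 10 as numbers.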

Date and time


source

current_time_formatted_to_minutes

 current_time_formatted_to_minutes ()

Return the current UTC time, formatted to the minute.

Returns

  • str
    • In UTC time, to minutes.

The current_time_formatted_to_minutes function is a convenience function that returns the current UTC time formatted as a string:

time_str = current_time_formatted_to_minutes()
print(time_str)
2023-05-20T16:52

Use the datetime.datetime.fromisoformat function to convert the formatted string back to a datetime.datetime object:

dt = datetime.datetime.fromisoformat(time_str)
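
The format and the round trip can be sketched with the standard library alone. The helper `utc_now_to_minutes` is a hypothetical stand-in that produces the same shape of string as the output shown above; it is an assumption about the format, not a copy of trouver's function.

```python
from datetime import datetime, timezone


def utc_now_to_minutes() -> str:
    """Hypothetical stand-in: current UTC time as 'YYYY-MM-DDTHH:MM'."""
    return datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M')


stamp = utc_now_to_minutes()
# Parsing gives a naive datetime (no time zone attached) with seconds set to zero.
parsed = datetime.fromisoformat(stamp)
example = datetime.fromisoformat('2023-05-20T16:52')
print(stamp, parsed, example)
```

Because the string carries no UTC offset, the parsed object does not remember that it was UTC; callers who need that information have to attach the time zone themselves.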

Topological sort

# TODO: reformat the parameter specifications and add examples.

It is sometimes convenient to sort items topologically.


source

natsort_comparison

 natsort_comparison (str1:str, str2:str)

Parameters

  • str1 - str
  • str2 - str


source

default_str_comparison

 default_str_comparison (str1:str, str2:str)

Parameters

  • str1 - str
  • str2 - str


source

containing_string_priority

 containing_string_priority (str1:str, str2:str)

Returns 1, 0, -1 depending on whether one string contains the other.

TODO make the string containment criterion looser, e.g. finite Galois etale cover “contains” finite etale cover.

Parameters

  • str1 - str
  • str2 - str


source

graph_for_topological_sort

 graph_for_topological_sort (items_to_sort:collections.abc.Iterable[str],
                             key_order:Callable[[str,str],int])

Parameters

  • items_to_sort - Iterable[str]
  • key_order - Callable[[str, str], int]
    • Comparing str1 against str2 results in a positive number if str1 is “greater” than str2 (i.e. str1 is of a later priority).

Returns

  • dict[str, set[str]]
    • A dict whose keys are the elements k of items_to_sort and whose values are sets of elements k2 of items_to_sort such that key_order(k, k2) > 0.


source

dict_with_keys_topologically_sorted

 dict_with_keys_topologically_sorted (dict_to_sort:dict[str],
                                      key_order:Callable[[str,str],int],
                                      reverse:bool=False)

Returns an OrderedDict whose keys are sorted topologically by the specified key order.

Parameters

  • dict_to_sort - dict[str]
    • The dict whose keys need to be ordered.
  • key_order
    • The comparison function on the keys of dict_to_sort. Defaults to the key function for the comparison [containing_string_priority](https://hyunjongkimmath.github.io/trouver/helper.html#containing_string_priority).
  • reverse - bool
    • Defaults to False

Returns

  • OrderedDict[str]

# TODO: test
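
The graph-then-sort idea described in this section can be sketched with the standard library's `graphlib` (Python 3.9+). Everything named here is hypothetical: `containment_order` assumes that a string which contains another is "greater", and `build_graph` follows the documented shape of the returned dict. Neither is claimed to match trouver's functions.

```python
from graphlib import TopologicalSorter
from typing import Callable, Iterable


def containment_order(str1: str, str2: str) -> int:
    """Hypothetical comparison: 1 if str1 strictly contains str2, -1 if the reverse, else 0."""
    if str1 == str2:
        return 0
    if str2 in str1:
        return 1
    if str1 in str2:
        return -1
    return 0


def build_graph(items: Iterable[str], key_order: Callable[[str, str], int]) -> dict[str, set[str]]:
    """Map each item k to the set of items k2 with key_order(k, k2) > 0."""
    items = list(items)
    return {k: {k2 for k2 in items if key_order(k, k2) > 0} for k in items}


items = ['finite etale cover', 'cover', 'etale cover']
graph = build_graph(items, containment_order)
# TopologicalSorter treats each value set as the predecessors of its key.
ordered = list(TopologicalSorter(graph).static_order())
print(ordered)
```

With this convention, shorter contained strings come out before the longer strings that contain them; reversing the resulting list gives the opposite priority.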

Alphabet

With Greek letters typed in LaTeX, I sometimes need to interpret them like their English equivalents, e.g. \alpha is like A, etc.


source

alphabet_or_latex_command_to_alphabet_group

 alphabet_or_latex_command_to_alphabet_group (character)

source

alphabet_or_latex_command_to_alphabet

 alphabet_or_latex_command_to_alphabet (character)

Returns the alphabet that the character “corresponds to”.


source

alphabet_to_alphabet_group

 alphabet_to_alphabet_group (character)

Returns the alphabet group

In my vaults, I often alphabetize things and also group the alphabet as follows:

  • A-E
  • F-J
  • K-O
  • P-T
  • U-Z

Parameters

  • character - str

Returns

  • str or None
    • Returns None if character is not a letter of the alphabet.

dicty = {}
for c in ascii_uppercase:
    dicty[c] = alphabet_to_alphabet_group(c)
print(dicty)
{'A': 'A-E', 'B': 'A-E', 'C': 'A-E', 'D': 'A-E', 'E': 'A-E', 'F': 'F-J', 'G': 'F-J', 'H': 'F-J', 'I': 'F-J', 'J': 'F-J', 'K': 'K-O', 'L': 'K-O', 'M': 'K-O', 'N': 'K-O', 'O': 'K-O', 'P': 'P-T', 'Q': 'P-T', 'R': 'P-T', 'S': 'P-T', 'T': 'P-T', 'U': 'U-Z', 'V': 'U-Z', 'W': 'U-Z', 'X': 'U-Z', 'Y': 'U-Z', 'Z': 'U-Z'}
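
The grouping shown in the output above amounts to integer division of a letter's position by five, with the last group absorbing Z. The sketch below is a hypothetical reconstruction for English letters only; accepting lowercase input is an assumption, and LaTeX commands for Greek letters are not handled.

```python
from string import ascii_uppercase

GROUPS = ['A-E', 'F-J', 'K-O', 'P-T', 'U-Z']


def letter_group(character: str):
    """Hypothetical stand-in: group label for a single English letter, else None."""
    upper = character.upper()
    if len(upper) != 1 or upper not in ascii_uppercase:
        return None
    # Five letters per group; the last group absorbs the 26th letter, Z.
    return GROUPS[min(ascii_uppercase.index(upper) // 5, len(GROUPS) - 1)]


group_by_letter = {c: letter_group(c) for c in ascii_uppercase}
print(group_by_letter)
```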

Getting a path-valid string from a string containing latex


source

latex_to_path_accepted_string

 latex_to_path_accepted_string (latex:str)

Convert a latex string to a path accepted string

The latex_to_path_accepted_string function “cleans” a latex str into a path-valid string for the purposes of making files.

sample_1 = r'\mathcal{O}_X'
output_1 = latex_to_path_accepted_string(sample_1)
print(output_1)
assert 'O' in output_1 and 'X' in output_1
validate_filename(output_1)

sample_2 = r'\operatorname{Gal}(L/K)'
output_2 = latex_to_path_accepted_string(sample_2)
print(output_2)
assert 'Gal' in output_2 and 'L' in output_2 and 'K' in output_2
validate_filename(output_2)

# Example found in https://arxiv.org/abs/1607.04471
sample_3 = r'\begin{equation} \label{escape rate}   G_{F_t}(z,w) = \lim_{n\to\infty} \frac{1}{d^n} \log \| F_t^n(z,w) \|, \end{equation}'
output_3 = latex_to_path_accepted_string(sample_3)
print(output_3)
validate_filename(output_3)
O_X
Gal_L_K
label_escape_rate_G_F_t_z_w_lim_n_to_infty_frac_1_d_n_log_F_t_n_z_w
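
A much simplified version of this kind of cleaning can be sketched as follows. The function name, the list of dropped commands, and the underscore rule are all assumptions made for illustration; the sketch reproduces the first two outputs above but is not expected to match latex_to_path_accepted_string on more complicated input such as the third sample.

```python
import re

# Assumption: commands that only affect formatting and can be dropped entirely.
FORMATTING_COMMANDS = ('mathcal', 'mathbb', 'mathrm', 'operatorname')


def latex_to_filename_sketch(latex: str) -> str:
    """Hypothetical stand-in: reduce a LaTeX string to letters, digits and underscores."""
    for command in FORMATTING_COMMANDS:
        # The lookahead prevents removing a longer command that merely starts the same way.
        latex = re.sub(r'\\' + command + r'(?![A-Za-z])', '', latex)
    # Collapse every run of non-alphanumeric characters into one underscore.
    return re.sub(r'[^A-Za-z0-9]+', '_', latex).strip('_')


sketch_1 = latex_to_filename_sketch(r'\mathcal{O}_X')
sketch_2 = latex_to_filename_sketch(r'\operatorname{Gal}(L/K)')
print(sketch_1)
print(sketch_2)
```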