helper.definition_and_notation

Functions that deal with definitions and notations
import operator
from fastcore.test import *

Definitions and notations

I surround definitions and notations with double asterisks **. The double_asterisk_indices function finds all strings surrounded by double asterisks, the notation_asterisk_indices function finds those that are notations, and the definition_asterisk_indices function finds those that are definitions.



double_asterisk_indices

 double_asterisk_indices (text:str)

Return the indices in text of substrings surrounded by double asterisks.

Assumes that no LaTeX math mode string contains double asterisks.

Type Details
text str The str in which to find the indices of text surrounded by double asterisks.
Returns list A list of tuples of the form (start, end), where text[start:end] is a substring of text surrounded by double asterisks, including the double asterisks.
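The implementation of double_asterisk_indices is not reproduced on this page. As a minimal sketch, assuming a regular-expression approach (not necessarily what the module does), a non-greedy pattern can pair each opening ** with the next **. The name double_asterisk_indices_sketch is hypothetical, and the sketch shares the stated assumption that no LaTeX math mode string contains double asterisks.

```python
import re

# Hypothetical sketch, not the module's code: pair each opening "**" with the
# next "**". Assumes no LaTeX math mode string contains "**".
DOUBLE_ASTERISK_PATTERN = re.compile(r'\*\*(.+?)\*\*', re.DOTALL)


def double_asterisk_indices_sketch(text: str) -> list[tuple[int, int]]:
    """Return (start, end) pairs such that text[start:end] is a
    double-asterisk-surrounded substring, including the asterisks."""
    return [match.span() for match in DOUBLE_ASTERISK_PATTERN.finditer(text)]


text = r'A **scheme** is a... the **structure sheaf** of a scheme $X$ is the sheaf **$\mathcal{O}_X$**.'
print(double_asterisk_indices_sketch(text))
# [(2, 12), (25, 44), (74, 93)]
```

Each span includes the asterisks themselves, so slicing text with a returned pair yields strings such as **scheme**.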


definition_asterisk_indices

 definition_asterisk_indices (text:str)

Return the indices of definition text surrounded by double asterisks.

A double-asterisk-surrounded text is almost always a definition when it is not purely LaTeX math mode text.

Assumes that no LaTeX math mode string contains double asterisks and that no LaTeX math mode string contains a dollar sign character inside its delimiters.

Type Details
text str The str in which to find the indices of definitions surrounded by double asterisks.
Returns list A list of tuples of the form (start, end), where text[start:end] is a definition in text surrounded by double asterisks, including the double asterisks.
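The rule above can be sketched by keeping only the double-asterisk-surrounded strings whose contents are not a single LaTeX math span. This sketch assumes that "purely LaTeX math mode text" means exactly one $...$ or $$...$$ span with no other dollar signs inside it; that is a reading of the docstring, not the module's actual code. The names is_pure_math and definition_asterisk_indices_sketch are hypothetical.

```python
import re

# Pair each opening "**" with the next "**" (assumes no "**" inside math mode).
DOUBLE_ASTERISK_PATTERN = re.compile(r'\*\*(.+?)\*\*', re.DOTALL)
# Assumed reading of "purely LaTeX math mode": a single $...$ or $$...$$ span
# with no other dollar signs inside it.
PURE_MATH_PATTERN = re.compile(r'\$\$[^$]+\$\$|\$[^$]+\$')


def is_pure_math(inner: str) -> bool:
    """Hypothetical helper: True if inner is exactly one LaTeX math span."""
    return PURE_MATH_PATTERN.fullmatch(inner.strip()) is not None


def definition_asterisk_indices_sketch(text: str) -> list[tuple[int, int]]:
    """Spans of double-asterisk-surrounded strings that are not pure math."""
    return [m.span() for m in DOUBLE_ASTERISK_PATTERN.finditer(text)
            if not is_pure_math(m.group(1))]


text = r'A **scheme** is a... the **structure sheaf** of a scheme $X$ is the sheaf **$\mathcal{O}_X$**.'
print(definition_asterisk_indices_sketch(text))  # [(2, 12), (25, 44)]
```

Because fullmatch must consume the whole inner string, a definition such as $G$-group over a ring $A$ starts and ends with a dollar sign yet is still classified as a definition.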


notation_asterisk_indices

 notation_asterisk_indices (text:str)

Return the indices of notation text surrounded by double asterisks.

A double-asterisk-surrounded text is almost always a notation when it is purely LaTeX math mode text.

Assumes that no LaTeX math mode string contains a dollar sign character inside its delimiters.

Type Details
text str The str in which to find the indices of notations surrounded by double asterisks.
Returns list A list of tuples of the form (start, end), where text[start:end] is a substring of text consisting of purely LaTeX math mode text surrounded by double asterisks, including the double asterisks.
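Under the same assumed reading of "purely LaTeX math mode text" (one $...$ or $$...$$ span with no other dollar signs), the notation filter is the complement of the definition filter. A minimal sketch with the hypothetical name notation_asterisk_indices_sketch follows; a single asterisk such as the one in \pi_{*} does not end a match, because the pattern only stops at a pair of asterisks.

```python
import re

# Hypothetical sketch, not the module's code.
DOUBLE_ASTERISK_PATTERN = re.compile(r'\*\*(.+?)\*\*', re.DOTALL)
# Assumed: pure math is a single $...$ or $$...$$ span with no inner "$".
PURE_MATH_PATTERN = re.compile(r'\$\$[^$]+\$\$|\$[^$]+\$')


def notation_asterisk_indices_sketch(text: str) -> list[tuple[int, int]]:
    """Spans of double-asterisk-surrounded strings that are one math span."""
    return [m.span() for m in DOUBLE_ASTERISK_PATTERN.finditer(text)
            if PURE_MATH_PATTERN.fullmatch(m.group(1).strip())]


text = (r'The **direct image of a sheaf $F^{\prime}$ on $X_{E}^{\prime}$** '
        r'is defined to be **$\pi_{*} F^{\prime}=\pi_{p} F^{\prime}$**.')
for start, end in notation_asterisk_indices_sketch(text):
    print(text[start:end])  # **$\pi_{*} F^{\prime}=\pi_{p} F^{\prime}$**
```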

Examples

In the following example, scheme and structure sheaf are definitions, whereas $\mathcal{O}_X$ is a notation:

text = r'A **scheme** is a... the **structure sheaf** of a scheme $X$ is the sheaf **$\mathcal{O}_X$**.'
listy = double_asterisk_indices(text)

start, end = listy[0]
test_eq(text[start:end], '**scheme**')

start, end = listy[1]
test_eq(text[start:end], '**structure sheaf**')

start, end = listy[2]
test_eq(text[start:end], r'**$\mathcal{O}_X$**')

listy = notation_asterisk_indices(text)
start, end = listy[0]
test_eq(text[start:end], r'**$\mathcal{O}_X$**')
test_eq(len(listy), 1)

listy = definition_asterisk_indices(text)
print(listy)
test_eq(len(listy), 2)
[(2, 12), (25, 44)]

The following example has a definition that starts and ends with a dollar sign $ but is not purely LaTeX math mode text, so it is not detected as a notation:

text = r'A **$G$-group over a ring $A$** is'
listy = notation_asterisk_indices(text)
test_eq(len(listy), 0)

listy = definition_asterisk_indices(text)
test(len(listy), 0, operator.ge)

The following example tests notation_asterisk_indices on LaTeX strings that contain single asterisks:

# Tests LaTeX str with asterisks in them:
text = (r'''The **direct image of a sheaf $F^{\prime}$ on $X_{E}^{\prime}$** '''
        r'''is defined to be **$\pi_{*} F^{\prime}=\pi_{p} F^{\prime}$** and the '''
        r'''**inverse image of a sheaf $F$ on $X_{E}$** is defined to be **$\pi^{*} F=a\left(\pi^{p} F\right)$**.''')
listy = notation_asterisk_indices(text)
test_eq(len(listy), 2)
start, end = listy[0]
test_eq(text[start:end], r'**$\pi_{*} F^{\prime}=\pi_{p} F^{\prime}$**')

Some purely LaTeX strings should arguably be considered definitions, but the functions here treat them as notations. For example, Hausdorff spaces in topology are also called $T_2$ spaces:

text = (r"A topological space $X$ is called **$T_2$** if for all distinct $x,y \in X$, "
        r"there exist open neighborhoods $V$ and $W$ around $x$ and $y$ respectively "
        r"such that $V \cap W = \emptyset$.")
listy = notation_asterisk_indices(text)
test_eq(text[listy[0][0]:listy[0][1]], r'**$T_2$**')

Unfortunately, the current implementations of the above functions do not work correctly when a LaTeX string contains double asterisks **.

# TODO: If this is fixed, delete this.
text = r'The double dual of $M$ is denoted by **$M^{**}$**.'
listy = definition_asterisk_indices(text)
print(f"The `definition_asterisk_indices` function detects the substring {text[listy[0][0]:listy[0][1]]} as a definition. This is incorrect!")

listy = notation_asterisk_indices(text)
print(f"The `notation_asterisk_indices` function detects the substring {text[listy[0][0]:listy[0][1]]} as a notation. I guess this is correct.")
The `definition_asterisk_indices` function detects the substring **$M^{** as a definition. This is incorrect!
The `notation_asterisk_indices` function detects the substring **$M^{**}$** as a notation. I guess this is correct.
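One plausible cause of the incorrect definition match above is pairing each ** with the nearest following **. The sketch below uses a hypothetical naive non-greedy regular expression, not necessarily the module's code, and it reproduces the same truncated match **$M^{**.

```python
import re

# Hypothetical naive pattern: pair each "**" with the nearest following "**".
NAIVE_PATTERN = re.compile(r'\*\*(.+?)\*\*', re.DOTALL)

text = r'The double dual of $M$ is denoted by **$M^{**}$**.'
matches = [text[m.start():m.end()] for m in NAIVE_PATTERN.finditer(text)]
print(matches)  # ['**$M^{**']
```

The closing }$** left over after the first match contains only one pair of asterisks, so no second match is found.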

A workaround is to replace asterisks * with the LaTeX \ast command:

text = r'The double dual of $M$ is denoted by **$M^{\ast\ast}$**.'
listy = definition_asterisk_indices(text)
test_eq(len(listy), 0)

listy = notation_asterisk_indices(text)
test_eq(len(listy), 1)
test_eq(text[listy[0][0]:listy[0][1]], r'**$M^{\ast\ast}$**')
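The replacement can be automated before calling the functions above. The following is a minimal sketch of a hypothetical helper, replace_asterisks_in_math, which is not part of this module; it assumes math spans are $...$ or $$...$$ with no other dollar signs inside, and it leaves the ** markers outside math untouched.

```python
import re

# Hypothetical helper (not part of this module). Assumed math spans:
# display $$...$$ or inline $...$, with no other "$" inside.
MATH_SPAN_PATTERN = re.compile(r'\$\$[^$]*\$\$|\$[^$]*\$')


def _asterisks_to_ast(math: str) -> str:
    # A trailing space keeps "\ast" from merging with a following letter ("\astb").
    math = re.sub(r'\*(?=[A-Za-z])', lambda m: r'\ast ', math)
    return math.replace('*', r'\ast')


def replace_asterisks_in_math(text: str) -> str:
    """Replace every * inside LaTeX math spans with \\ast; leave other text alone."""
    return MATH_SPAN_PATTERN.sub(lambda m: _asterisks_to_ast(m.group(0)), text)


print(replace_asterisks_in_math(r'The double dual of $M$ is denoted by **$M^{**}$**.'))
# The double dual of $M$ is denoted by **$M^{\ast\ast}$**.
```

Passing a function to re.sub means its return value is inserted literally, so the backslashes in \ast need no extra escaping.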


defs_and_notats_separations

 defs_and_notats_separations (text:str)

Find the indices in the text where double asterisks occur and classify each double-asterisk-surrounded string as a definition or a notation.

Parameters

  • text - str

Returns

  • list[tuple[int, int, bool]]
    • Each tuple is of the form (start, end, is_notation), where text[start:end] is the double-asterisk-surrounded string, including the double asterisks, and is_notation is True exactly when that string is a notation.

In the following example, the first two double-asterisk-surrounded-strings are definitions, and the third is a notation:

text = r'A **scheme** is a... the **structure sheaf** of a scheme $X$ is the sheaf **$\mathcal{O}_X$**.'
listy = defs_and_notats_separations(text)
assert not listy[0][2]
assert not listy[1][2]
assert listy[2][2]
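The classification that defs_and_notats_separations performs can be sketched by tagging each double-asterisk match with whether its contents are a single LaTeX math span. This is a minimal sketch under the same assumptions as the functions above (no ** and no extra $ inside math mode); defs_and_notats_separations_sketch is a hypothetical name, not the module's implementation.

```python
import re

# Hypothetical sketch, not the module's code.
DOUBLE_ASTERISK_PATTERN = re.compile(r'\*\*(.+?)\*\*', re.DOTALL)
# Assumed: a notation is a single $...$ or $$...$$ span with no inner "$".
PURE_MATH_PATTERN = re.compile(r'\$\$[^$]+\$\$|\$[^$]+\$')


def defs_and_notats_separations_sketch(text: str) -> list[tuple[int, int, bool]]:
    """Return (start, end, is_notation) for each double-asterisk-surrounded string."""
    return [(m.start(), m.end(),
             PURE_MATH_PATTERN.fullmatch(m.group(1).strip()) is not None)
            for m in DOUBLE_ASTERISK_PATTERN.finditer(text)]


text = r'A **scheme** is a... the **structure sheaf** of a scheme $X$ is the sheaf **$\mathcal{O}_X$**.'
print(defs_and_notats_separations_sketch(text))
# [(2, 12, False), (25, 44, False), (74, 93, True)]
```

Returning a single list in text order, rather than two separate lists, keeps each definition and notation in the position where it appears.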