import operator
from fastcore.test import *
helper.definition_and_notation
Definitions and notations
I surround definitions and notations with double asterisks **. The double_asterisk_indices function finds strings surrounded by double asterisks, the notation_asterisk_indices function finds notation strings, and the definition_asterisk_indices function finds definition strings.
double_asterisk_indices
double_asterisk_indices (text:str)
*Return the indices in str of text surrounded by double asterisks.
Assumes that no LaTeX math mode string has double asterisks.
See Also
| | Type | Details |
|---|---|---|
| text | str | The str in which to find the indices of double-asterisk-surrounded text. |
| Returns | list | Each tuple is of the form (start, end), where text[start:end] is a part in text with double asterisks, including the double asterisks. |
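To make the (start, end) convention concrete, here is a minimal sketch of how such spans could be found with a regular expression. The name sketch_double_asterisk_indices is hypothetical and this is not the library's implementation; it only assumes, as the docstring does, that no LaTeX math mode string contains double asterisks.

```python
import re

def sketch_double_asterisk_indices(text: str) -> list[tuple[int, int]]:
    """Hypothetical stand-in for illustration, not the library's code."""
    # Match `**`, then the shortest non-empty run of characters, then `**`.
    pattern = re.compile(r'\*\*.+?\*\*', re.DOTALL)
    # Each span is (start, end), so text[start:end] includes the asterisks.
    return [match.span() for match in pattern.finditer(text)]

text = r'A **scheme** is a... the **structure sheaf** of a scheme $X$.'
spans = sketch_double_asterisk_indices(text)
print(spans)
print([text[start:end] for start, end in spans])
```

Because the match is non-greedy, a math mode string that itself contains double asterisks would be cut at the first inner pair, which is why the assumption above matters.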
definition_asterisk_indices
definition_asterisk_indices (text:str)
*Return the indices of definition text surrounded by double asterisks.
Double-asterisk-surrounded text is almost always a definition when it is not purely LaTeX math mode text.
Assumes that no LaTeX math mode string has double asterisks and that no LaTeX math mode string has the dollar sign character within it.*
| | Type | Details |
|---|---|---|
| text | str | The str in which to find the indices of the definitions surrounded by double asterisks. |
| Returns | list | Each tuple is of the form (start, end), where text[start:end] is a substring in text surrounded by double asterisks, including the double asterisks. |
notation_asterisk_indices
notation_asterisk_indices (text:str)
*Return the indices of notation text surrounded by double asterisks.
Double-asterisk-surrounded text is almost always a notation when it is purely LaTeX math mode text.
Assumes that no LaTeX math mode string has the dollar sign character within it.*
| | Type | Details |
|---|---|---|
| text | str | The str in which to find the indices of notations surrounded by double asterisks. |
| Returns | list | Each tuple is of the form (start, end), where text[start:end] is a part in text with LaTeX math mode text with double asterisks, including the double asterisks. |
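The "purely LaTeX math mode" criterion can be sketched as a small predicate. The helper name sketch_is_notation is hypothetical, and the check relies on the stated assumption that no math mode string contains a dollar sign; the library's actual test may differ.

```python
def sketch_is_notation(marked: str) -> bool:
    """Hypothetical helper: `marked` is a substring including its `**` delimiters."""
    inner = marked[2:-2].strip()
    # Purely inline math: starts and ends with `$`, with no other `$` between.
    return (
        len(inner) >= 2
        and inner.startswith('$')
        and inner.endswith('$')
        and '$' not in inner[1:-1]
    )

print(sketch_is_notation(r'**$\mathcal{O}_X$**'))
print(sketch_is_notation(r'**scheme**'))
print(sketch_is_notation(r'**$G$-group over a ring $A$**'))
```

The last call shows why checking only the first and last characters is not enough: the string starts and ends with a dollar sign but contains ordinary text between two separate math mode strings.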
Examples
In the following example, scheme and structure sheaf are definitions, whereas $\mathcal{O}_X$ is a notation:
text = r'A **scheme** is a... the **structure sheaf** of a scheme $X$ is the sheaf **$\mathcal{O}_X$**.'
listy = double_asterisk_indices(text)
start, end = listy[0]
test_eq(text[start:end], '**scheme**')
start, end = listy[1]
test_eq(text[start:end], '**structure sheaf**')
start, end = listy[2]
# Raw string, so that `\m` is not parsed as an (invalid) escape sequence.
test_eq(text[start:end], r'**$\mathcal{O}_X$**')

listy = notation_asterisk_indices(text)
start, end = listy[0]
test_eq(text[start:end], r'**$\mathcal{O}_X$**')
test_eq(len(listy), 1)

listy = definition_asterisk_indices(text)
print(listy)
test_eq(len(listy), 2)
[(2, 12), (25, 44)]
The following example has a definition which starts and ends with dollar sign $ characters:
text = r'A **$G$-group over a ring $A$** is'
listy = notation_asterisk_indices(text)
test_eq(len(listy), 0)

listy = definition_asterisk_indices(text)
# Note: `operator.ge` against 0 only checks len(listy) >= 0, which holds for any list.
test(len(listy), 0, operator.ge)
The following example tests notation_asterisk_indices for LaTeX str with single asterisks in them:
# Tests LaTeX str with asterisks in them:
text = (r'''The **direct image of a sheaf $F^{\prime}$ on $X_{E}^{\prime}$** '''
        r'''is defined to be **$\pi_{*} F^{\prime}=\pi_{p} F^{\prime}$** and the '''
        r'''**inverse image of a sheaf $F$ on $X_{E}$** is defined to be **$\pi^{*} F=a\left(\pi^{p} F\right)$**.''')
listy = notation_asterisk_indices(text)
test_eq(len(listy), 2)
start, end = listy[0]
test_eq(text[start:end], r'**$\pi_{*} F^{\prime}=\pi_{p} F^{\prime}$**')
Some pure LaTeX strings should really be considered definitions, but the code here treats them as notations. For example, Hausdorff spaces in topology are also called $T_2$ spaces:
text = (r"A topological space $X$ is called **$T_2$** if for all distinct $x,y \in X$, "
        r"there exist open neighborhoods $V$ and $W$ around $x$ and $y$ respectively "
        r"such that $V \cap W = \emptyset$.")
listy = notation_asterisk_indices(text)
test_eq(text[listy[0][0]:listy[0][1]], r'**$T_2$**')
Unfortunately, the current implementation of the above functions does not work correctly if a LaTeX string contains double asterisks **.
# TODO: If this is fixed, delete this.
text = r'The double dual of $M$ is denoted by **$M^{**}$**.'
listy = definition_asterisk_indices(text)
print(f"The `definition_asterisk_indices` function detects the substring {text[listy[0][0]:listy[0][1]]} as a definition. This is incorrect!")
listy = notation_asterisk_indices(text)
print(f"The `notation_asterisk_indices` function detects the substring {text[listy[0][0]:listy[0][1]]} as a notation. I guess this is correct.")
The `definition_asterisk_indices` function detects the substring **$M^{** as a definition. This is incorrect!
The `notation_asterisk_indices` function detects the substring **$M^{**}$** as a notation. I guess this is correct.
A workaround is to replace asterisks * with the LaTeX \ast command:
text = r'The double dual of $M$ is denoted by **$M^{\ast\ast}$**.'
listy = definition_asterisk_indices(text)
test_eq(len(listy), 0)

listy = notation_asterisk_indices(text)
test_eq(len(listy), 1)
test_eq(text[listy[0][0]:listy[0][1]], r'**$M^{\ast\ast}$**')
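The \ast workaround could be applied automatically before the indices are computed. The following is only a rough sketch: sketch_asterisks_to_ast is a hypothetical helper, not part of the library, and it assumes that all math is inline $...$ math (no $$...$$ blocks and no escaped dollar signs).

```python
import re

def sketch_asterisks_to_ast(text: str) -> str:
    r"""Hypothetical helper: replace `*` inside inline `$...$` math with `\ast`."""
    def fix_math(match: re.Match) -> str:
        math = match.group(0)
        # Add a space when a letter follows, so that `*F` becomes `\ast F`
        # rather than the undefined command `\astF`.
        math = re.sub(r'\*(?=[A-Za-z])', r'\\ast ', math)
        # Replace every remaining asterisk inside the math mode string.
        return math.replace('*', r'\ast')
    # Only text between a pair of dollar signs is rewritten; the `**`
    # delimiters outside math mode are left alone.
    return re.sub(r'\$[^$]*\$', fix_math, text)

fixed = sketch_asterisks_to_ast(r'The double dual of $M$ is denoted by **$M^{**}$**.')
print(fixed)
```

Since the dollar signs are paired from left to right, an unbalanced $ anywhere in the text would shift the pairing, so this sketch is only suitable for text that already satisfies the assumptions above.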
defs_and_notats_separations
defs_and_notats_separations (text:str)
*Find the indices in the text where double asterisks occur and categorize each double-asterisk-surrounded string as a definition or a notation.
Parameters
- text - str
Returns
- list[tuple[int, int, bool]]
  - Each tuple is of the form (start, end, is_notation), where text[start:end] is the double-asterisk-surrounded string, including the double asterisks.*
In the following example, the first two double-asterisk-surrounded-strings are definitions, and the third is a notation:
text = r'A **scheme** is a... the **structure sheaf** of a scheme $X$ is the sheaf **$\mathcal{O}_X$**.'
listy = defs_and_notats_separations(text)
assert not listy[0][2]
assert not listy[1][2]
assert listy[2][2]
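For readers who want to see how (start, end, is_notation) triples can be produced and consumed, here is a self-contained sketch. The function sketch_defs_and_notats is a hypothetical stand-in written for illustration; it is not the library's implementation and shares the limitation with math mode strings containing double asterisks.

```python
import re

def sketch_defs_and_notats(text: str) -> list[tuple[int, int, bool]]:
    """Hypothetical stand-in returning (start, end, is_notation) triples."""
    triples = []
    for match in re.finditer(r'\*\*(.+?)\*\*', text, re.DOTALL):
        inner = match.group(1)
        # A notation is a single inline math string: `$...$` with no inner `$`.
        is_notation = (
            len(inner) >= 2
            and inner.startswith('$')
            and inner.endswith('$')
            and '$' not in inner[1:-1]
        )
        triples.append((match.start(), match.end(), is_notation))
    return triples

text = r'A **scheme** is a... the **structure sheaf** of a scheme $X$ is the sheaf **$\mathcal{O}_X$**.'
triples = sketch_defs_and_notats(text)
# Group the marked substrings by kind, stripping the `**` delimiters.
definitions = [text[s + 2:e - 2] for s, e, is_notat in triples if not is_notat]
notations = [text[s + 2:e - 2] for s, e, is_notat in triples if is_notat]
print(definitions)
print(notations)
```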