markdown.obsidian.personal.machine_learning.notation_identification

Functions for finding notations introduced in mathematical text.
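import os
from pathlib import Path
import shutil
import tempfile
from unittest import mock

from fastai.text.learner import TextLearner
from fastcore.test import *
from pathvalidate import validate_filename
from torch import Tensor

from trouver.helper import _test_directory
from trouver.markdown.obsidian.vault import VaultNote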

Get notation data

Given information notes with notations marked with double asterisks **, we extract the data from these double asterisks and organize it for machine learning.

Ultimately, we would like to have an ML model that can find the locations where notations are newly introduced in a note. The approach here is to train a categorization model that takes as input a text with a single double asterisk pair surrounding a LaTeX math mode string and outputs whether the LaTeX math mode string contains a notation. We then apply the categorization model to the LaTeX math mode strings one by one to find all of those containing notations.
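To make the model input concrete, here is a minimal sketch of how one candidate text per math mode string could be generated. The names MATH_PATTERN and candidate_inputs are hypothetical and not part of trouver, and the regular expression assumes inline math delimited by single dollar signs only.

```python
import re

# Assumption: only inline math delimited by single dollar signs is handled;
# display math ($$...$$) and escaped dollar signs are out of scope here.
MATH_PATTERN = re.compile(r'\$[^$]+\$')


def candidate_inputs(text: str) -> list[str]:
    """Return one copy of `text` per math mode string, each with a single
    double asterisk pair around that string."""
    candidates = []
    for match in MATH_PATTERN.finditer(text):
        start, end = match.span()
        candidates.append(f'{text[:start]}**{text[start:end]}**{text[end:]}')
    return candidates


text = r'Let $n \geq 1$. The ring is denoted by $\mathbb{Z}/n\mathbb{Z}$.'
for candidate in candidate_inputs(text):
    print(candidate)
```

Each printed line is one input that a classifier would label as introducing a notation or not.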


add_one_double_asts_to_line

 add_one_double_asts_to_line (line:str, start:int, end:int)

Return line with a single pair of double asterisks ** surrounding the substring line[start:end].

Used in _definition_data_from_line

| | Type | Details |
|---|---|---|
| line | str | The text to which to add the double asterisks ** |
| start | int | The first double asterisks are added in between line[start-1] and line[start]. |
| end | int | The second double asterisks are added in between line[end-1] and line[end]. |
| Returns | str | The str obtained from line by surrounding the substring line[start:end] with double asterisks. |

test_eq(add_one_double_asts_to_line("I will add just one double ast pair.", 2,6), 'I **will** add just one double ast pair.')

notation_data_from_text

 notation_data_from_text (with_double_asts:str)

Extracts data on the locations of notations in a text with double asterisks.

Used in notation_data_from_note

Returns

  • tuple[str, list[tuple[int, int, bool]]]
    • The str is no_double_asts, which is the same as with_double_asts except with the double asterisks removed.

    • Each tuple in the list represents a data point for a LaTeX math-mode string in no_double_asts and consists of

      1. The indices start, end, where the data point considers whether or not the LaTeX math-mode substring no_double_asts[start:end] is surrounded by double asterisks (and hence is supposed to introduce a notation).

      2. A bool that is True if the data point represents a str with double asterisks surrounding a notation and False otherwise.

| | Type | Details |
|---|---|---|
| with_double_asts | str | May or may not have double asterisks to signify definitions and notations |
| Returns | tuple | |

sample_output = notation_data_from_text(
    r'**here is a double ast text**. It is not a LaTeX math mode string,'
    r'so it will not be included as a data point.'
    r'On the other hand, **$\operatorname{Gal}(L/K)$** and $\mathbb{Z}/2\mathbb{Z}$'
    r'are both included LaTeX math mode strings and are included as data points.'
    r'The bool for the former is `True`, whereas the bool for the latter is `False`.')

assert '**' not in sample_output[0]
start, end, is_notation = sample_output[1][0]
test_eq(sample_output[0][start:end], r'$\operatorname{Gal}(L/K)$')
start, end, is_notation = sample_output[1][1]
test_eq(sample_output[0][start:end], r'$\mathbb{Z}/2\mathbb{Z}$')
print(sample_output)
('here is a double ast text. It is not a LaTeX math mode string,so it will not be included as a data point.On the other hand, $\\operatorname{Gal}(L/K)$ and $\\mathbb{Z}/2\\mathbb{Z}$are both included LaTeX math mode strings and are included as data points.The bool for the former is `True`, whereas the bool for the latter is `False`.', [(124, 149, True), (154, 178, False)])
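
The index bookkeeping behind this kind of output can be sketched as follows. This is an illustrative reimplementation under simplifying assumptions (inline math with single dollar signs only), not the library's actual code. The names TOKEN and notation_data_sketch are hypothetical.

```python
import re

# Assumption: inline math only, delimited by single dollar signs.
# Alternatives are tried left to right: marked math, bare **, plain math.
TOKEN = re.compile(
    r'\*\*(?P<marked>\$[^$]+\$)\*\*|(?P<asts>\*\*)|(?P<plain>\$[^$]+\$)')


def notation_data_sketch(with_double_asts: str):
    """Strip double asterisks and record (start, end, is_marked) for each
    math mode string, with indices relative to the stripped text."""
    pieces, data = [], []
    position = 0  # current index into with_double_asts
    length = 0    # length of the asterisk-free text built so far
    for m in TOKEN.finditer(with_double_asts):
        before = with_double_asts[position:m.start()]
        pieces.append(before)
        length += len(before)
        math = m.group('marked') or m.group('plain')
        if math is not None:
            is_marked = m.group('marked') is not None
            data.append((length, length + len(math), is_marked))
            pieces.append(math)
            length += len(math)
        # A bare ** contributes nothing to the stripped text.
        position = m.end()
    pieces.append(with_double_asts[position:])
    return ''.join(pieces), data


stripped, points = notation_data_sketch(r'**bold** text, **$G$** and $H$.')
print(stripped)
print([(stripped[s:e], flag) for s, e, flag in points])
```

Note that in this sketch a math mode string that merely sits inside a longer bolded phrase is recorded as unmarked, which agrees with the sample output for notation_data_from_note in this page, where the $n$ inside the bolded phrase is labeled False.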

notation_data_from_note

 notation_data_from_note (note:trouver.markdown.obsidian.vault.VaultNote,
                          vault:os.PathLike)

Obtain notation data from a note.

Note that the returned tuples might not be in any particular order.

Returns

  • list[tuple[str, str, bool]]
    • Each tuple consists of
      1. The name of note,
      2. The processed str of note in which only a single LaTeX text is surrounded by double asterisks. Note that the processed str merges display math mode text into single lines, cf. process_standard_information_note.
      3. A bool that is True if the LaTeX text contains notation.

We first set up an example:

test_vault = _test_directory() / 'test_vault_6'
vn = VaultNote(test_vault, name='reference_with_tag_labels_Definition 2')
print(vn.text())
---
cssclass: clean-embeds
aliases: []
tags: [_meta/literature_note, _meta/definition, _meta/notation]
---
# Ring of integers modulo $n$[^1]

Let $n \geq 1$ be an integer. The **ring of integers modulo $n$**, denoted by **$\mathbb{Z}/n\mathbb{Z}$**, is, informally, the ring whose elements are represented by the integers with the understanding that $0$ and $n$ are equal.

More precisely, $\mathbb{Z}/n\mathbb{Z}$ has the elements $0,1,\ldots,n-1$.

...


# See Also
- [[reference_with_tag_labels_Exercise 1|reference_with_tag_labels_Z_nZ_is_a_ring]]
# Meta
## References

## Citations and Footnotes
[^1]: Kim, Definition 2
sample_output = notation_data_from_note(vn, test_vault)
total_count_for_is_notation = 0
for name, with_one_double_asts, is_notation in sample_output:
    test_eq(name, vn.name)
    test_eq(with_one_double_asts.count('**'), 2)
    if is_notation:
        total_count_for_is_notation += 1
test_eq(total_count_for_is_notation, 1)
sample_output
[('reference_with_tag_labels_Definition 2',
  'Let $n \\geq 1$ be an integer. The ring of integers modulo $n$, denoted by **$\\mathbb{Z}/n\\mathbb{Z}$**, is, informally, the ring whose elements are represented by the integers with the understanding that $0$ and $n$ are equal.\n\nMore precisely, $\\mathbb{Z}/n\\mathbb{Z}$ has the elements $0,1,\\ldots,n-1$.\n\n...\n',
  True),
 ('reference_with_tag_labels_Definition 2',
  'Let **$n \\geq 1$** be an integer. The ring of integers modulo $n$, denoted by $\\mathbb{Z}/n\\mathbb{Z}$, is, informally, the ring whose elements are represented by the integers with the understanding that $0$ and $n$ are equal.\n\nMore precisely, $\\mathbb{Z}/n\\mathbb{Z}$ has the elements $0,1,\\ldots,n-1$.\n\n...\n',
  False),
 ('reference_with_tag_labels_Definition 2',
  'Let $n \\geq 1$ be an integer. The ring of integers modulo **$n$**, denoted by $\\mathbb{Z}/n\\mathbb{Z}$, is, informally, the ring whose elements are represented by the integers with the understanding that $0$ and $n$ are equal.\n\nMore precisely, $\\mathbb{Z}/n\\mathbb{Z}$ has the elements $0,1,\\ldots,n-1$.\n\n...\n',
  False),
 ('reference_with_tag_labels_Definition 2',
  'Let $n \\geq 1$ be an integer. The ring of integers modulo $n$, denoted by $\\mathbb{Z}/n\\mathbb{Z}$, is, informally, the ring whose elements are represented by the integers with the understanding that **$0$** and $n$ are equal.\n\nMore precisely, $\\mathbb{Z}/n\\mathbb{Z}$ has the elements $0,1,\\ldots,n-1$.\n\n...\n',
  False),
 ('reference_with_tag_labels_Definition 2',
  'Let $n \\geq 1$ be an integer. The ring of integers modulo $n$, denoted by $\\mathbb{Z}/n\\mathbb{Z}$, is, informally, the ring whose elements are represented by the integers with the understanding that $0$ and **$n$** are equal.\n\nMore precisely, $\\mathbb{Z}/n\\mathbb{Z}$ has the elements $0,1,\\ldots,n-1$.\n\n...\n',
  False),
 ('reference_with_tag_labels_Definition 2',
  'Let $n \\geq 1$ be an integer. The ring of integers modulo $n$, denoted by $\\mathbb{Z}/n\\mathbb{Z}$, is, informally, the ring whose elements are represented by the integers with the understanding that $0$ and $n$ are equal.\n\nMore precisely, **$\\mathbb{Z}/n\\mathbb{Z}$** has the elements $0,1,\\ldots,n-1$.\n\n...\n',
  False),
 ('reference_with_tag_labels_Definition 2',
  'Let $n \\geq 1$ be an integer. The ring of integers modulo $n$, denoted by $\\mathbb{Z}/n\\mathbb{Z}$, is, informally, the ring whose elements are represented by the integers with the understanding that $0$ and $n$ are equal.\n\nMore precisely, $\\mathbb{Z}/n\\mathbb{Z}$ has the elements **$0,1,\\ldots,n-1$**.\n\n...\n',
  False)]

Make database of notation data


append_notation_data_to_database

 append_notation_data_to_database (vault:os.PathLike, file:os.PathLike,
                                   notes:list[trouver.markdown.obsidian.va
                                   ult.VaultNote], backup:bool=True)

| | Type | Default | Details |
|---|---|---|---|
| vault | PathLike | | The vault from which the data is drawn |
| file | PathLike | | The path to a CSV file |
| notes | list | | The notes to add to the database |
| backup | bool | True | If True, makes a copy of file in the same directory and with the same name, except with an added extension of .bak. |
| Returns | None | | |

# TODO: example
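
Until an example is available, the following is a rough sketch of the append-with-backup pattern described in the table above, written with the standard library only. It does not call append_notation_data_to_database. The function name append_rows_with_backup and the column names in FIELDNAMES are hypothetical, since the actual CSV layout is not stated here.

```python
import csv
import shutil
import tempfile
from pathlib import Path

# Hypothetical column names; the actual CSV layout is an assumption.
FIELDNAMES = ['note_name', 'text', 'is_notation']


def append_rows_with_backup(file, rows, backup=True):
    """Append rows to a CSV file, optionally copying it to <name>.bak first."""
    file = Path(file)
    if backup and file.exists():
        # data.csv -> data.csv.bak in the same directory
        shutil.copy2(file, file.with_name(file.name + '.bak'))
    write_header = not file.exists() or file.stat().st_size == 0
    with file.open('a', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(FIELDNAMES)
        writer.writerows(rows)


with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / 'notation_data.csv'
    append_rows_with_backup(
        csv_path, [('note_1', r'Let **$G$** be a group.', True)])
    append_rows_with_backup(
        csv_path, [('note_1', r'Let $G$ act on **$X$**.', False)])
    with csv_path.open(newline='', encoding='utf-8') as f:
        rows = list(csv.reader(f))
    backup_exists = csv_path.with_name('notation_data.csv.bak').exists()
    print(rows)
```

The backup is only made when the file already exists, so the first call creates the CSV with a header row and the second call produces the .bak copy before appending.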

Use ML categorization model to find and mark notations in notes


automatically_mark_notations

 automatically_mark_notations
                               (vn:trouver.markdown.obsidian.vault.VaultNo
                               te, learn:fastai.text.learner.TextLearner,
                               create_notation_notes:bool=False,
                               reference_name:str='')

Predict and mark where notations occur in a note, and optionally create notation notes and add them to the See Also section of the note.

Assumes that no double asterisks are already in the contents of vn.

This function removes links, headings, footnotes, etc. from the original note and merges multi-line display math mode LaTeX text into single lines. Use with caution.

| | Type | Default | Details |
|---|---|---|---|
| vn | VaultNote | | The information note in which to mark notations. |
| learn | TextLearner | | The ML model which predicts where notations occur. This is a classifier which takes as input a str with a single double asterisk pair surrounding LaTeX text. The model outputs whether or not the double asterisk pair surrounds a LaTeX text with notation. |
| create_notation_notes | bool | False | If True, creates the notation notes for the predicted notations and links them to the ‘See Also’ sections of the information notes. |
| reference_name | str | | The name of the reference that vn belongs to; this is only relevant when create_notation_notes=True so that the created notation notes have file names starting with the reference name. |
| Returns | None | | |

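
The prediction-and-marking step can be illustrated on a plain string as follows. This is a simplified sketch, not the function's actual implementation: FakeLearner is a hypothetical stand-in for a trained TextLearner that returns a (label, index, probabilities) triple with plain lists in place of tensors, and mark_predicted_notations is a hypothetical helper that ignores notes, front matter, and notation note creation.

```python
import re

# Assumption: inline math only, delimited by single dollar signs.
MATH_PATTERN = re.compile(r'\$[^$]+\$')


class FakeLearner:
    """Hypothetical stand-in for a trained classifier."""

    def predict(self, marked_text: str):
        # Toy rule: anything starting with \operatorname is a notation.
        is_notation = r'**$\operatorname' in marked_text
        probabilities = [0.0, 1.0] if is_notation else [1.0, 0.0]
        return str(is_notation), [int(is_notation)], probabilities


def mark_predicted_notations(text: str, learn) -> str:
    """Classify each math mode string on its own, then mark all positives."""
    predicted = []
    for match in MATH_PATTERN.finditer(text):
        start, end = match.span()
        marked = f'{text[:start]}**{text[start:end]}**{text[end:]}'
        label, _, _ = learn.predict(marked)
        if label == 'True':
            predicted.append((start, end))
    # Insert from right to left so that earlier indices stay valid.
    for start, end in reversed(predicted):
        text = f'{text[:start]}**{text[start:end]}**{text[end:]}'
    return text


text = (r'Let $L/K$ be a Galois field extension. Its Galois group '
        r'$\operatorname{Gal}(L/K)$ is defined as the group of '
        r'automorphisms of $L$ fixing $K$ pointwise.')
result = mark_predicted_notations(text, FakeLearner())
print(result)
```

Collecting the positive spans first and inserting the asterisks afterwards keeps every model input limited to a single double asterisk pair, which matches the input format the classifier is described as expecting.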
# TODO: Test
with tempfile.TemporaryDirectory(prefix='tmp_dir_', dir=os.getcwd()) as tmp_dir:
    tmp_dir = Path(tmp_dir)
    temp_vault = tmp_dir / 'test_vault_6'
    shutil.copytree('_tests/test_vault_6', temp_vault)

    note = VaultNote(temp_vault, name='number_theory_reference_1_Definition 15')

    with mock.patch('__main__.TextLearner') as mock_textlearner_class:
        mock_textlearner = mock_textlearner_class.return_value
        mock_textlearner.predict.side_effect = [
            ('False', Tensor([0]), Tensor([1, 0])),
            ('True', Tensor([0]), Tensor([0, 1])),
            ('False', Tensor([0]), Tensor([1, 0])),
            ('False', Tensor([0]), Tensor([1, 0])),
            ]
        automatically_mark_notations(note, mock_textlearner)
        print('The following is the note after the double asterisks are added, '
              'assuming that the ML model predictions are as above:')
        print(note.text())
        assert r'**$\operatorname{Gal}(L/K)$**' in note.text()
The following is the note after the double asterisks are added, assuming that the ML model predictions are as above:
---
cssclass: clean-embeds
aliases: []
tags: [_meta/literature_note, _meta/definition, _meta/notation]
---
# Topic[^1]
%%This is an example file to which  `automatcally_mark_notations` will be applied.%%

Let $L/K$ be a Galois field extension. Its Galois group **$\operatorname{Gal}(L/K)$** is defined as the group of automorphisms of $L$ fixing $K$ pointwise.

# See Also

# Meta
## References

## Citations and Footnotes
[^1]: Kim, 
# TODO: test 'w' after implementing `overwrite`.
# TODO: test 'a' after implementing `overwrite`.
# TODO: test `None` after implementing `overwrite`.