markdown.obsidian.personal.machine_learning.notation_summarization

Functions for summarizing notations

In markdown.obsidian.personal.machine_learning.notation_identification, we defined functions to gather ML data to identify notations and functions to use an ML model to predict where notations are defined. In how_to.train_ml_model.fastai, we showed how to train this ML model from the data. markdown.obsidian.personal.notation also has some functions to create and interact with notation notes.

It would be further convenient to train an ML model to summarize details about notations. This module defines functions to gather data to train such a model.

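import os
import shutil
import tempfile
from pathlib import Path
from unittest import mock

from fastcore.test import *
import pandas as pd
# from torch import Tensor

from trouver.helper.tests import _test_directory
from trouver.markdown.obsidian.vault import VaultNote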

Gather ML data from notation notes

source

get_latex_in_original_from_parsed_notation_note_data

 get_latex_in_original_from_parsed_notation_note_data (metadata:dict[str],
                                                       notation_str:str)

*Return the string that should be considered the latex_in_original, given part of the output of parse_notation_note.

latex_in_original is intended to either be 1. a substring of the main note in which the notation is introduced, if available, or 2. the notation itself.

If the notation note has a latex_in_original field in its YAML frontmatter metadata, then the (first) str there is the latex_in_original.*

# The metadata is empty, so the `notation_str` should be considered the `latex_in_original`.
test_eq(
    get_latex_in_original_from_parsed_notation_note_data({}, r'\mathbb{A}^n_k'),
    r'\mathbb{A}^n_k')

test_metadata_1 = {
    'detect_regex': [],
    'latex_in_original': [
        r"c_x: \\mathbb{P}_k^n/(\\mathbb{P}_k^{n-1}) \\to \\mathbb{P}_k^n/(\\mathbb{P}_k^{n} - \\{x\\}) \\cong \\mathbb{P}_k^n/(\\mathbb{P}_k^{n-1})"
        ]}
test_notation_str_1 = r'c_x'
test_eq(
    get_latex_in_original_from_parsed_notation_note_data(test_metadata_1, test_notation_str_1),
    test_metadata_1['latex_in_original'][0])
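
The rule exercised by the two tests above can be restated as a short sketch. This is an illustration only, not trouver's implementation: the name `latex_in_original_sketch` is hypothetical, and treating a missing or empty `latex_in_original` list as "not available" is an assumption.

```python
def latex_in_original_sketch(metadata: dict, notation_str: str) -> str:
    """Hypothetical restatement of the documented rule; not the library's code."""
    entries = (metadata or {}).get('latex_in_original')
    # Assumption: when the field is present, it is a list of strings
    # and the first entry is the one to use.
    if entries:
        return entries[0]
    # No usable field in the metadata: fall back to the notation itself.
    return notation_str


no_metadata = latex_in_original_sketch({}, r'\mathbb{A}^n_k')
with_field = latex_in_original_sketch(
    {'detect_regex': [], 'latex_in_original': [r'c_x: X \to Y']}, 'c_x')
```
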

source

notation_summarization_data_from_note

 notation_summarization_data_from_note
                                        (notation_note:trouver.markdown.ob
                                        sidian.vault.VaultNote,
                                        vault:os.PathLike, check_for_actua
                                        l_summarization:bool=True)

*Obtain notation summarization data from the notation note.

None is returned instead of a dict if

- the notation is determined to have not been (fully) summarized,
  - Currently, this is implemented as follows: a notation note is not fully summarized if it does not have any content beyond <notation> denotes or if the notation note has #_meta/TODO marked in its content (except in the trailing bulleted list of notations used in the notation note).
- the main note of the notation note does not exist or is essentially empty,
  - Essentially empty means that the process_standard_information_note function applied to a MarkdownFile object of the main note yields a MarkdownFile object whose __str__ method returns a string of only blank characters.
- the notation note is marked with an _auto/notation_summary tag in its YAML frontmatter meta.
  - If a notation note is marked with an _auto/notation_summary tag, then that means that its summary has been auto-generated by an ML model using the append_summary_to_notation_note function.

The notion of whether the notation “has been summarized” is only approximated by this implementation, but the approximation should be sufficient for gathering data.

This function is mainly used in gather_notation_note_summaries.

TODO: there are some notation notes drawing information from multiple notes; separate text in notation notes corresponding to each note.

Returns

Union[dict[str, str], None]
- If check_for_actual_summarization is False or the notation note is determined to have been summarized (i.e. contains text beyond <notation> denotes and does not have the #_meta/TODO tag) then the output is a dict whose key-value pairs are
  - "Notation note name" - The name of the notation note
  - "Notation" - The notation of the note
  - "Latex in original" - The entry of the latex_in_original field of the note if available, cf. make_a_notation_note
  - "Summary" - The summary of the notation.
  - "Main note name" - The name of the main note of the notation note
  - "Processed main note contents" - The processed contents of the main note
- Otherwise, the output is None.*

	Type	Default	Details
notation_note	VaultNote
vault	PathLike
check_for_actual_summarization	bool	True	If `True`, then return `None` if `notation_note` is determined to not have a summary or if the summary is autogenerated.
Returns	Optional		The keys to the dict are “Notation note name”, “Notation”, “Latex in original”, “Summary”, “Main note name”, and “Processed main note contents”. However, `None` is returned instead of a `dict` if the notation note is determined to have not been summarized, if the main note of the notation note does not exist, or if the notation note is marked with the `_auto/notation_summary` tag.
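
The "has been summarized" heuristic described above can be sketched with a regular expression. This is a hedged illustration, not the library's code: `looks_summarized` is a hypothetical name, the input is assumed to be the note text with the YAML frontmatter already stripped, and the exception for a trailing bulleted list of notations is deliberately ignored.

```python
import re

# Matches '$...$ denotes' or '$...$ [[some_link|denotes]]' at the start of
# the note body and captures whatever text follows.
_DENOTES_PATTERN = re.compile(
    r'\s*\$[^$]+\$\s*(?:\[\[[^\]|]*\|denotes\]\]|denotes)(.*)',
    flags=re.DOTALL)


def looks_summarized(note_body: str) -> bool:
    """Hypothetical heuristic: True if there is text beyond '<notation> denotes'
    and the body carries no #_meta/TODO tag."""
    if '#_meta/TODO' in note_body:
        return False
    match = _DENOTES_PATTERN.match(note_body)
    if match is None:
        return False
    # Anything other than whitespace after 'denotes' counts as a summary.
    return bool(match.group(1).strip())


unsummarized = r'$\mathcal{B}(\mathbb{R})$ [[note_with_some_excessive_notation_notes|denotes]]'
summarized = r'$\operatorname{Spec} A$ [[spectrum_of_a_ring|denotes]] the spectrum of the ring $A$.'
```

Note that passing this check is only one of the conditions listed above; the main note must also exist and be non-empty, and the `_auto/notation_summary` tag must be absent.
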

vault = _test_directory() / 'test_vault_7'

notation_note = VaultNote(vault, name='some_reference_name_notation_Spec_A') 
sample_output = notation_summarization_data_from_note(notation_note, vault)

del sample_output['Processed main note contents']

test_eq(sample_output,
    {'Notation note name': 'some_reference_name_notation_Spec_A',
     'Notation': '$\\operatorname{Spec} A$',
     'Latex in original': '\\operatorname{Spec} A',
     'Summary': 'the spectrum of the ring $A$.',
     'Main note name': 'spectrum_of_a_ring',
     # 'Processed main note contents': 'Let $A$ be a ring.\n\nThe Spectrum $\\operatorname{Spec} A$ is the set of prime ideals of $A$. It is equipped with a topology, called the Zariski topology in which the followings sets, called the distinguished open subsets of $\\operatorname{Spec} A$, give a base for the topology:\n\n$$D(f) = \\{\\mathfrak{p} \\in \\operatorname{Spec} A: f \\not\\in \\mathfrak{p} \\}.$$\n'}
    })


If the “main” note of the notation note cannot be identified, then notation_summarization_data_from_note returns None:

vault = _test_directory() / 'test_vault_7'
notation_note = VaultNote(vault, name='some_reference_name_notation_O_X_this_file_has_no_links') 
print("The following note has no main note link:\n")
print(notation_note.text())
test_eq(notation_summarization_data_from_note(notation_note, vault), None)


notation_note = VaultNote(vault, name='notation_note_with_main_note_link_but_main_note_does_not_exist') 
print("\n\nThe following note has a main note link, but the main note does not actually exist:\n")
print(notation_note.text())
test_eq(notation_summarization_data_from_note(notation_note, vault), None)

The following note has no main note link:

---
detect_regex: []
latex_in_original: ["\\mathscr{O}_X"]
---
$\mathscr{O}_X$ denotes the structure sheaf of the scheme $X$.


The following note has a main note link, but the main note does not actually exist:

$k[[t]]$ [[nonexistent_note|denotes]] the ring of formal power series where $k$ is a field.

# TODO: example with embedded links

# TODO: example of a notation note without any metadata

If the notation note has not been summarized, then notation_summarization_data_from_note returns None, even if the main note of the notation note exists:

# TODO: example where the main note of the notation note exists, but has basically no content;
# The function should return `None` in this case.
vault = _test_directory() / 'test_vault_7'
notation_note = VaultNote(vault, name='some_reference_name_notation_B_R') 

print(
    "This is what a notation note that has not been summarized basically looks like."
    " Note that it has YAML frontmatter metadata and starts a sentence to describe"
    " what a certain notation is supposed to denote, but ultimately does not contain"
    " an actual summary:\n")
print(notation_note.text())

test_eq(notation_summarization_data_from_note(notation_note, vault), None)

This is what a notation note that has not been summarized basically looks like. Note that it has YAML frontmatter metadata and starts a sentence to describe what a certain notation is supposed to denote, but ultimately does not contain an actual summary:

---
detect_regex: []
latex_in_original: ["\\mathcal{B}(\\mathbb{R})"]
---
$\mathcal{B}(\mathbb{R})$ [[note_with_some_excessive_notation_notes|denotes]]

# TODO: example where notation note has _auto/notation_summary tag; the notation summarization data should not be added.

source

gather_notation_note_summaries

 gather_notation_note_summaries (vault:os.PathLike,
                                 notes:list[trouver.markdown.obsidian.vaul
                                 t.VaultNote])

*Return a pandas.DataFrame encapsulating the data of notation note summaries.

cf. notation_summarization_data_from_note, which is the function from which the notation note summaries are drawn.

This function is mainly used in append_to_notation_note_summarization_database.*

	Type	Details
vault	PathLike
notes	list
Returns	DataFrame	Has columns `Time added`, `Time modified`, `Notation note name`, `Notation`, `Latex in original`, `Summary`, `Main note name`, and `Processed main note contents`.
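
As a rough sketch of the gathering step, the per-note dicts can be filtered for `None` and stamped with the two time columns. The function name `gather_rows_sketch` is hypothetical, the sample dict is abbreviated, and plain lists of dicts stand in for the `pandas.DataFrame` that the real function returns; the timestamp format is inferred from the sample CSV shown later on this page.

```python
from datetime import datetime, timezone


def gather_rows_sketch(per_note_data: list) -> list:
    """Hypothetical: drop `None` entries and add the two time columns."""
    # UTC time, truncated to minutes, e.g. '2023-01-17T01:47'.
    now = datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M')
    return [
        {'Time added': now, 'Time modified': now, **data}
        for data in per_note_data
        if data is not None  # notes that yielded no data are skipped
    ]


rows = gather_rows_sketch([
    {'Notation note name': 'some_reference_name_notation_Spec_A',
     'Notation': '$\\operatorname{Spec} A$',
     'Summary': 'the spectrum of the ring $A$.'},
    None,  # e.g. a notation note that has not been summarized
])
```

A list of dicts like `rows` can be passed to `pandas.DataFrame(rows)` to obtain a table with one column per key.
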

# TODO: example

# TODO: example verifying that outputs of `None` from `notation_summarization_data_from_note`
# are weeded out.

source

append_to_notation_note_summarization_database

 append_to_notation_note_summarization_database (vault:os.PathLike,
                                                 file:os.PathLike, notes:l
                                                 ist[trouver.markdown.obsi
                                                 dian.vault.VaultNote],
                                                 backup:bool=True)

*Either create a csv file containing data for notation note summarization or append to an existing csv file.

The columns of the database file are as follows:

- "Time added" - The time when the row was added.
- "Time modified" - The time when the labels of the row were last modified.
- "Notation note name" - The name of the note from which the data for the row was derived.
- "Notation" - The notation which is being summarized
- "Latex in original" - The entry of the latex_in_original field of the note if available, cf. make_a_notation_note
- "Summary" - The summary of the notation.
- "Main note name" - The name of the main note of the notation note
- "Processed main note contents" - The processed contents of the main note

All timestamps are in UTC time and specify time to minutes (i.e. no seconds/microseconds).

TODO: implement updating rows and rewrite the next paragraph to accurately reflect the implementation. I would like the ‘Notation’, ‘Latex in original’, ‘Summary’, ‘processed main note contents’ to be the “pivot_cols”

If a “new” note has the same processed content as a pre-existing note and anything is different about the “new” note, then update the row of the existing note. In particular, the following are updated:

- Time modified (set to current time)
- Notation (overwritten)
- Latex in original (overwritten)
- Summary (overwritten)
- Main note name (overwritten)
- Processed main note contents (overwritten)

This method assumes that, if the CSV file exists, the processed contents in its rows are all distinct.*

	Type	Default	Details
vault	PathLike		The vault from which the data is drawn
file	PathLike		The path to a CSV file
notes	list		the notation notes to consider adding to the database.
backup	bool	True	If `True`, makes a copy of `file` in the same directory and with the same name, except with an added extension of `.bak`.
Returns	None
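
The create-or-append behavior with a `.bak` backup can be sketched with the standard library alone. This is an assumption-laden illustration rather than the library's code: `append_rows_sketch` is a hypothetical name, the rows are plain dicts, and the row-updating logic described in the TODO above is not attempted.

```python
import csv
import shutil
import tempfile
from pathlib import Path

COLUMNS = ['Time added', 'Time modified', 'Notation note name', 'Notation',
           'Latex in original', 'Summary', 'Main note name',
           'Processed main note contents']


def append_rows_sketch(file, rows, backup=True):
    """Hypothetical: create `file` with a header, or append `rows` to it."""
    file = Path(file)
    exists = file.exists()
    if exists and backup:
        # 'db.csv' is copied to 'db.csv.bak' in the same directory.
        shutil.copy2(file, file.with_name(file.name + '.bak'))
    with open(file, 'a', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        if not exists:
            writer.writeheader()  # only a brand-new file needs the header
        writer.writerows(rows)


with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / 'db.csv'
    row = dict.fromkeys(COLUMNS, 'x')
    append_rows_sketch(db, [row])  # first call creates the file
    append_rows_sketch(db, [row])  # second call backs up, then appends
    with open(db, newline='', encoding='utf-8') as f:
        n_rows = len(list(csv.DictReader(f)))
    backup_exists = (Path(tmp) / 'db.csv.bak').exists()
```
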

# TODO basic example

# TODO: example of appending instead of making an entirely new CSV file

# TODO: example of notation notes existing, but none of them are to be added to the database by virtue of either the notation summary
# not being written or by virtue of their main notes not existing.

Format Data

Given the contents of the CSV file generated in append_to_notation_note_summarization_database, we create a single column representing the data upon which the ML model should train.

Since a single piece of text can introduce multiple notations, we should indicate which notation we want summarized. The entry in the Latex in original column of the CSV file can indicate this notation; as explained in make_a_notation_note, this entry is not necessarily the notation itself, but rather the substring of the text in which the notation is introduced. For example, in the text

Let $R$ be a ring and let $M$ be an $R$-module. We define the dual of $M$ as

$$M^\vee := \operatorname{Hom}_R(M,R).$$

the notation M^\vee is introduced inside the display math mode string, so that string (rather than M^\vee alone) is the natural Latex in original entry.

source

single_input_for_notation_summarization

 single_input_for_notation_summarization (main_note_content:str,
                                          latex_in_original:str, latex_in_
                                          original_comes_first:bool=True)

*Format an input for a transformers.pipelines.text2text_generation.SummarizationPipeline object to summarize a notation introduced in a mathematical text.

The input consists of main_note_content as well as a part that is formatted as f"latex_in_original: {latex_in_original}".

Note that this function is used to format data used to train/validate the summarization model within the SummarizationPipeline.*

	Type	Default	Details
main_note_content	str		The mathematical text that introduces the notation and from which to summarize a notation.
latex_in_original	str		A substring in main_note_content which is a latex string in which the notation is introduced.
latex_in_original_comes_first	bool	True	If `True`, the `latex_in_original` piece appears before the `main_note_content`
Returns	str

The single_input_for_notation_summarization function formats an input for the notation summarization model to train on or to predict. Such an input consists of a mathematical excerpt and the latex_in_original field, which is a latex string containing a notation supposedly introduced in the mathematical excerpt.

# Raw strings keep the LaTeX backslashes from being read as escape sequences.
test_content = r"This is a test content introducing notation: $\mathbb{Z}/n\mathbb{Z}$ denotes this. $\mathbb{F}_q$ denotes that"
latex_in_original = r"\mathbb{Z}/n\mathbb{Z}"

# In this example, `latex_in_original_comes_first` is set to `False`.
sample_output = single_input_for_notation_summarization(
    test_content,
    latex_in_original,
    latex_in_original_comes_first=False)
assert '\n\nlatex_in_original: ' in sample_output
print(sample_output, '\n')

This is a test content introducing notation: $\mathbb{Z}/n\mathbb{Z}$ denotes this. $\mathbb{F}_q$ denotes that

latex_in_original: \mathbb{Z}/n\mathbb{Z}

Originally, this function was implemented in a way that places the latex_in_original at the end of the string (tantamount to letting latex_in_original_comes_first=False in the current implementation). Letting latex_in_original_comes_first=True places the latex_in_original field before the mathematical excerpt.

This change in implementation is an experimental attempt to improve the model; the author hypothesizes that placing the latex_in_original field before the mathematical excerpt will improve the model by allowing it to know what notation it is supposed to summarize before considering the mathematical excerpt.

# In this example, `latex_in_original_comes_first` is set to `True`.
sample_output = single_input_for_notation_summarization(
    test_content,
    latex_in_original,
    latex_in_original_comes_first=True)
assert sample_output.startswith('latex_in_original: ')
print(sample_output)

latex_in_original: \mathbb{Z}/n\mathbb{Z}

This is a test content introducing notation: $\mathbb{Z}/n\mathbb{Z}$ denotes this. $\mathbb{F}_q$ denotes that
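
The two printed outputs above are consistent with the following minimal sketch of the formatting. The name `single_input_sketch` is hypothetical, and the blank-line separator is inferred from the outputs shown here rather than taken from the library's source.

```python
def single_input_sketch(
        main_note_content: str,
        latex_in_original: str,
        latex_in_original_comes_first: bool = True) -> str:
    """Hypothetical re-creation of the input format shown above."""
    field = f"latex_in_original: {latex_in_original}"
    if latex_in_original_comes_first:
        # The model sees which notation to summarize before the excerpt.
        return f"{field}\n\n{main_note_content}"
    # Older layout: the excerpt first, the field at the end.
    return f"{main_note_content}\n\n{field}"


excerpt = r"$\mathbb{F}_q$ denotes the finite field with $q$ elements."
field_first = single_input_sketch(excerpt, r"\mathbb{F}_q")
field_last = single_input_sketch(
    excerpt, r"\mathbb{F}_q", latex_in_original_comes_first=False)
```
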

If latex_in_original is None, TODO: finish explanation

# TODO: finish example

source

append_column_for_single_text

 append_column_for_single_text (df:pandas.core.frame.DataFrame,
                                latex_in_original_comes_first:bool=True)

Append a column "Single text" to the notation note summarization DataFrame to represent the input into an ML model as a single text

	Type	Default	Details
df	DataFrame		Assumed to be structured just as a dataframe of a CSV file created/modified by `append_to_notation_note_summarization_database`
latex_in_original_comes_first	bool	True	This is a parameter to pass to calls to the `single_input_for_notation_summarization` function. If `True`, the `latex_in_original` piece appears before the `main_note_content`
Returns	None

csv_file = _test_directory() / 'ml_examples' / 'notation_summarizations.csv'
df = pd.read_csv(csv_file)
df.head()

	Time added	Time modified	Notation note name	Notation	Latex in original	Summary	Main note name	Processed main note contents
0	2023-01-17T01:47	2023-01-17T01:47	reference_name_notation_Gal_L_K	$\operatorname{Gal}(L/K)$	$\operatorname{ann} M$	the Galois group of the Galois field extension...	reference_name_Galois_group	If $L/K$ is a Galois field extension, then its...
1	2023-01-17T01:47	2023-01-17T01:47	reference_name_notation_I_J	$\mathbb{Z}/n\mathbb{Z}$	$\mathbb{Z}/n\mathbb{Z}$	The ring of integers modulo $n$. It is defined...	reference_name_Definition 1	The ring $\mathbb{Z}/n\mathbb{Z}$, called the ...
2	2023-01-17T01:47	2023-01-17T01:47	reference_name_notation_M_vee_dual_of_a_module	$M^\vee$	$M^\vee$	the dual of the $R$-module $M$. It is defined ...	reference_name_Definition 17	Let $M$ be an $R$-module. We define the dual m...
3	2023-01-17T01:47	2023-01-17T01:47	reference_name_notation_F_q	$\mathbb{F}_q$	$\mathbb{F}_q$	the finite field of $q$ elements, where $q$ is...	reference_name_Theorem 1	Theorem. Let $q$ be a prime power. There is a ...
4	2023-01-17T01:47	2023-01-17T01:47	reference_name_notation_X_N	$X(N)$	$X(N)$	the quotient of $\mathbb{H}^*$ by $\Gamma(N)$,...	reference_name_Definition 200	Now we discuss modular curves, whose points pa...

append_column_for_single_text(df)
df

	Time added	Time modified	Notation note name	Notation	Latex in original	Summary	Main note name	Processed main note contents	Single text
0	2023-01-17T01:47	2023-01-17T01:47	reference_name_notation_Gal_L_K	$\operatorname{Gal}(L/K)$	$\operatorname{ann} M$	the Galois group of the Galois field extension...	reference_name_Galois_group	If $L/K$ is a Galois field extension, then its...	latex_in_original: $\operatorname{ann} M$\n\nI...
1	2023-01-17T01:47	2023-01-17T01:47	reference_name_notation_I_J	$\mathbb{Z}/n\mathbb{Z}$	$\mathbb{Z}/n\mathbb{Z}$	The ring of integers modulo $n$. It is defined...	reference_name_Definition 1	The ring $\mathbb{Z}/n\mathbb{Z}$, called the ...	latex_in_original: $\mathbb{Z}/n\mathbb{Z}$\n\...
2	2023-01-17T01:47	2023-01-17T01:47	reference_name_notation_M_vee_dual_of_a_module	$M^\vee$	$M^\vee$	the dual of the $R$-module $M$. It is defined ...	reference_name_Definition 17	Let $M$ be an $R$-module. We define the dual m...	latex_in_original: $M^\vee$\n\nLet $M$ be an $...
3	2023-01-17T01:47	2023-01-17T01:47	reference_name_notation_F_q	$\mathbb{F}_q$	$\mathbb{F}_q$	the finite field of $q$ elements, where $q$ is...	reference_name_Theorem 1	Theorem. Let $q$ be a prime power. There is a ...	latex_in_original: $\mathbb{F}_q$\n\nTheorem. ...
4	2023-01-17T01:47	2023-01-17T01:47	reference_name_notation_X_N	$X(N)$	$X(N)$	the quotient of $\mathbb{H}^*$ by $\Gamma(N)$,...	reference_name_Definition 200	Now we discuss modular curves, whose points pa...	latex_in_original: $X(N)$\n\nNow we discuss mo...

Use ML model to fill in notation notes

After training the model (cf. how_to.train_ml_model.huggingface), we use it to summarize notations.

Using a HuggingFace pipeline to summarize a notation

source

summarize_notation

 summarize_notation
                     (main_content:Union[str,trouver.markdown.markdown.fil
                     e.MarkdownFile], latex_in_original:str, summarizer:tr
                     ansformers.pipelines.text2text_generation.Summarizati
                     onPipeline, fix_formatting:bool=True,
                     latex_in_original_comes_first:bool=True,
                     correct_syntax_error:bool=True)

*Summarize a notation introduced in a mathematical text using a huggingface pipeline.

Assumes that main_content is a mathematical text introducing a notation and that latex_in_original is a substring of main_content in which a notation is introduced.

By setting correct_syntax_error to True, this function considers each math mode text in the generated summary and, if that text is not syntactically correct, attempts to replace it with a closely resembling math mode text from main_content. More specifically, each math mode text within main_content is considered, along with the substrings of those math mode texts. The replacement is the syntactically correct substring (determined via math_mode_string_is_syntactically_valid) that most closely resembles (determined via Levenshtein distance) the math mode text originally in the generated summary.*

	Type	Default	Details
main_content	Union
latex_in_original	str
summarizer	SummarizationPipeline
fix_formatting	bool	True	If `True`, run `fix_autogen_formatting` on `summarizer`’s summary before returning it.
latex_in_original_comes_first	bool	True	This is a parameter to pass to calls to the `single_input_for_notation_summarization` function. If `True`, the `latex_in_original` piece appears before the `main_note_content`
correct_syntax_error	bool	True	If `True`, attempt to correct latex syntax error
Returns	str
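
The syntax-correction idea described above (pick the valid substring of the source math that is closest in Levenshtein distance) can be sketched as follows. Everything here is illustrative: `braces_balanced` is a crude stand-in for `math_mode_string_is_syntactically_valid`, whose actual checks are not shown on this page, and `closest_valid_substring` is a hypothetical name.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def braces_balanced(s: str) -> bool:
    """Assumption: 'valid' only means that curly braces are balanced."""
    depth = 0
    for ch in s:
        if ch == '{':
            depth += 1
        elif ch == '}':
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def closest_valid_substring(bad: str, math_strings: list) -> str:
    """Return the valid substring of any source math string nearest to `bad`."""
    best, best_dist = bad, None
    for s in math_strings:
        for i in range(len(s)):
            for j in range(i + 1, len(s) + 1):
                candidate = s[i:j]
                if not braces_balanced(candidate):
                    continue
                dist = levenshtein(bad, candidate)
                if best_dist is None or dist < best_dist:
                    best, best_dist = candidate, dist
    return best


# A generated summary dropped a closing brace; repair it from the source math.
source_math = [r'\mathbb{Z}/n\mathbb{Z}', r'\mathbb{F}_q']
repaired = closest_valid_substring(r'\mathbb{Z/n\mathbb{Z}', source_math)
```

The brute-force search over all substrings is quadratic in the length of each math string, which is tolerable for short formulas but is a design shortcut of this sketch.
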

# TODO: example

Adding the summary to the notation note

source

append_summary_to_notation_note

 append_summary_to_notation_note
                                  (notation_note:trouver.markdown.obsidian
                                  .vault.VaultNote, vault:os.PathLike, sum
                                  marizer:transformers.pipelines.text2text
                                  _generation.SummarizationPipeline, main_
                                  note:Optional[trouver.markdown.obsidian.
                                  vault.VaultNote]=None, overwrite_previou
                                  s_autogenerated_summary:bool=False,
                                  latex_in_original_comes_first:bool=True)

*Summarize a notation introduced in a mathematical text using a huggingface pipeline and append said summarization to notation_note.

If notation_note does not have a YAML frontmatter meta or does not have a latex_in_original field in its YAML frontmatter meta, then the actual notation is used as the latex_in_original.

The following describes the circumstances under which the summarization does not happen:

- If main_note is None and no main note of notation_note can be determined via the main_of_notation function.
- If overwrite_previous_autogenerated_summary is False and notation_note has the _auto/notation_summary tag in its YAML frontmatter meta (if available).
- If notation_note does not have a YAML frontmatter meta or the _auto/notation_summary tag is not present in the YAML frontmatter meta and notation_note has nontrivial content (i.e. its content is not merely of the form $<notation>$ [[<link_to_main_note>|denotes]]).

If an auto-generated summary is appended, then this function adds an _auto/notation_summary tag to the notation note’s YAML frontmatter, if not already present. This tag is intended to be used as follows:

- The presence of this tag tells the reader that the summary has been autogenerated by this function.
- The notation_summarization_data_from_note function (and by extension, the gather_notation_note_summaries and append_to_notation_note_summarization_database functions) avoids gathering notation summarization data from notation notes marked with this tag.
- The owner of the Obsidian.md vault can manually make modifications to the notation note if necessary and remove this tag to indicate that the summary is appropriate to be added to the note summarization database.*

	Type	Default	Details
notation_note	VaultNote
vault	PathLike
summarizer	SummarizationPipeline		Contains an ML model which summarizes the notation, see `summarize_notation` function.
main_note	Optional	None	The main note from which the notation comes. If this is `None`, then the `main_note` is obtained via the `main_of_notation` function.
overwrite_previous_autogenerated_summary	bool	False	If `True`, overwrite previously autogenerated summaries
latex_in_original_comes_first	bool	True	This is a parameter to pass to calls to the `single_input_for_notation_summarization` function. If `True`, the `latex_in_original` piece appears before the `main_note_content`. While the default value of `True` is recommended, passing `False` to this parameter may be necessary to use the older version of the summarization model in the repo `notation_summarizations_model`.
Returns	None
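
The conditions listed in the docstring can be condensed into a small decision function. This is a sketch of one reading of those rules, not the library's code: `should_write_summary` and its boolean parameters are hypothetical, and a note without YAML frontmatter is treated the same as a note whose frontmatter lacks the `_auto/notation_summary` tag.

```python
def should_write_summary(
        main_note_found: bool,
        has_auto_tag: bool,
        has_nontrivial_content: bool,
        overwrite_previous_autogenerated_summary: bool = False) -> bool:
    """Hypothetical decision rule distilled from the docstring above."""
    if not main_note_found:
        # Without a main note there is nothing to summarize from.
        return False
    if has_auto_tag:
        # An earlier autogenerated summary is replaced only on request.
        return overwrite_previous_autogenerated_summary
    # No tag: existing content is assumed to be hand-written and is kept.
    return not has_nontrivial_content


decisions = {
    'no main note': should_write_summary(False, False, False),
    'tagged, no overwrite': should_write_summary(True, True, True, False),
    'tagged, overwrite': should_write_summary(True, True, True, True),
    'untagged with content': should_write_summary(True, False, True),
    'untagged and empty': should_write_summary(True, False, False),
}
```
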

The next several examples demonstrate when a summary is written and when it is not.

When the main note of the notation note cannot be identified, don’t write a summary

with (tempfile.TemporaryDirectory(prefix='temp_dir', dir=os.getcwd()) as temp_dir,
      mock.patch('__main__._get_summary') as mock_get_summary,
      mock.patch('__main__._write_summary_to_notation_note') as mock_write_summary_to_notation_note):
    temp_vault = Path(temp_dir) / 'test_vault_7'
    shutil.copytree(_test_directory() / 'test_vault_7', temp_vault)
    notation_note_without_a_main_note = VaultNote(temp_vault, name='some_reference_name_notation_O_X_this_file_has_no_links')

    print("The following is the text of the notation note:\n")
    print(notation_note_without_a_main_note.text())

    append_summary_to_notation_note(notation_note_without_a_main_note, temp_vault, summarizer)

    mock_get_summary.assert_not_called()
    mock_write_summary_to_notation_note.assert_not_called()

The following is the text of the notation note:

---
detect_regex: []
latex_in_original: ["\\mathscr{O}_X"]
---
$\mathscr{O}_X$ denotes the structure sheaf of the scheme $X$.

When notation_note has the _auto/notation_summary tag in its YAML frontmatter meta, then

- don’t write a summary if overwrite_previous_autogenerated_summary is False.
- write a summary otherwise.

# Note that `overwrite_previous_autogenerated_summary` here is `False`.
with (tempfile.TemporaryDirectory(prefix='temp_dir', dir=os.getcwd()) as temp_dir,
      mock.patch('__main__._get_summary') as mock_get_summary,
      mock.patch('__main__._write_summary_to_notation_note') as mock_write_summary_to_notation_note):
    temp_vault = Path(temp_dir) / 'test_vault_7'
    shutil.copytree(_test_directory() / 'test_vault_7', temp_vault)
    notation_note_with_auto_notation_summary_tag = VaultNote(temp_vault, name='notation_note_with_auto_notation_summary_tag')

    print("The following is the text of the notation note:\n")
    print(notation_note_with_auto_notation_summary_tag.text())

    append_summary_to_notation_note(notation_note_with_auto_notation_summary_tag, temp_vault, summarizer, overwrite_previous_autogenerated_summary=False)

    mock_get_summary.assert_not_called()
    mock_write_summary_to_notation_note.assert_not_called()

The following is the text of the notation note:

---
detect_regex: []
latex_in_original: ["\\operatorname{MT}(A)"]
tags: [_auto/notation_summary]
---
$\operatorname{MT}(A)$ [[main_note_of_notation_note_with_auto_notation_summary_tag|denotes]] the Mumford-Tate group of the abelian variety $A$.

# Note that `overwrite_previous_autogenerated_summary` here is `True`.
with (tempfile.TemporaryDirectory(prefix='temp_dir', dir=os.getcwd()) as temp_dir,
      mock.patch('__main__._get_summary') as mock_get_summary,
      # mock.patch('__main__._write_summary_to_notation_note') as mock_write_summary_to_notation_note
      ):
    temp_vault = Path(temp_dir) / 'test_vault_7'
    shutil.copytree(_test_directory() / 'test_vault_7', temp_vault)
    notation_note_with_auto_notation_summary_tag = VaultNote(temp_vault, name='notation_note_with_auto_notation_summary_tag')

    print("The following is the original text of the notation note:\n")
    print(notation_note_with_auto_notation_summary_tag.text())

    mock_get_summary.return_value = "(this is a mock summary) the Mumford Tate group of the abelian variety $A$ with CM. It is defined as the image of the reflex norm map of $A$."
    append_summary_to_notation_note(notation_note_with_auto_notation_summary_tag, temp_vault, summarizer, overwrite_previous_autogenerated_summary=True)

    mock_get_summary.assert_called_once()

    print("\n\nIn this example, the previously autogenerated summary is overwritten with a new autogenerated summary:\n\n")
    print(notation_note_with_auto_notation_summary_tag.text())
    assert mock_get_summary.return_value in notation_note_with_auto_notation_summary_tag.text()

The following is the original text of the notation note:

---
detect_regex: []
latex_in_original: ["\\operatorname{MT}(A)"]
tags: [_auto/notation_summary]
---
$\operatorname{MT}(A)$ [[main_note_of_notation_note_with_auto_notation_summary_tag|denotes]] the Mumford-Tate group of the abelian variety $A$.


In this example, the previously autogenerated summary is overwritten with a new autogenerated summary:


---
detect_regex: []
latex_in_original: ["\\operatorname{MT}(A)"]
tags: [_auto/notation_summary]
---
$\operatorname{MT}(A)$ [[main_note_of_notation_note_with_auto_notation_summary_tag|denotes]] (this is a mock summary) the Mumford Tate group of the abelian variety $A$ with CM. It is defined as the image of the reflex norm map of $A$.

Don’t write a summary when

- notation_note does not have a YAML frontmatter meta or
- the _auto/notation_summary tag is not present in the YAML frontmatter meta and notation_note has nontrivial content.

# In this example, the notation_note does not have a YAML frontmatter meta,
# so no summary is written.
with (tempfile.TemporaryDirectory(prefix='temp_dir', dir=os.getcwd()) as temp_dir,
      mock.patch('__main__._get_summary') as mock_get_summary,
      mock.patch('__main__._write_summary_to_notation_note') as mock_write_summary_to_notation_note):
    temp_vault = Path(temp_dir) / 'test_vault_7'
    shutil.copytree(_test_directory() / 'test_vault_7', temp_vault)
    notation_note_without_yaml_frontmatter = VaultNote(temp_vault, name='some_reference_name_notation_k_t_formal_power_series_ring')

    print("The following is the text of the notation note:\n")
    print(notation_note_without_yaml_frontmatter.text())

    append_summary_to_notation_note(notation_note_without_yaml_frontmatter, temp_vault, summarizer, overwrite_previous_autogenerated_summary=False)

    mock_get_summary.assert_not_called()
    mock_write_summary_to_notation_note.assert_not_called()

The following is the text of the notation note:

$k[[t]]$ [[some_note|denotes]] the ring of formal power series where $k$ is a field.

# In this example, there is "nontrivial content" in the notation note
# and the `_auto/notation_summary` tag is not present in the notation
# note, so the summary is assumed to be not autogenerated and hence
# not replaceable. As such, no summary is autogenerated.

with (tempfile.TemporaryDirectory(prefix='temp_dir', dir=os.getcwd()) as temp_dir,
      mock.patch('__main__._get_summary') as mock_get_summary,
      mock.patch('__main__._write_summary_to_notation_note') as mock_write_summary_to_notation_note):
    temp_vault = Path(temp_dir) / 'test_vault_7'
    shutil.copytree(_test_directory() / 'test_vault_7', temp_vault)
    notation_note_without_auto_summary_tag = VaultNote(temp_vault, name='some_reference_name_notation_Pic_C')

    print("The following is the text of the notation note:\n")
    print(notation_note_without_auto_summary_tag.text())

    append_summary_to_notation_note(notation_note_without_auto_summary_tag, temp_vault, summarizer, overwrite_previous_autogenerated_summary=False)

    mock_get_summary.assert_not_called()
    mock_write_summary_to_notation_note.assert_not_called()

The following is the text of the notation note:

---
detect_regex: []
latex_in_original: ["\\operatorname{Pic} C"]
---
$\operatorname{Pic} C$ [[divisor_class_group_of_a_curve|denotes]] the spectrum of the ring $A$.
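
Both examples above verify that nothing was generated or written by patching the private helpers and asserting that the mocks were never called. A minimal self-contained sketch of that pattern with `unittest.mock` follows. `Summarizer`, `get_summary`, and `maybe_summarize` are hypothetical stand-ins for illustration, not names from this module.

```python
from unittest import mock


class Summarizer:
    """Hypothetical stand-in for an expensive summarization step."""

    def get_summary(self, text: str) -> str:
        raise RuntimeError('the real model should not run in a test')


def maybe_summarize(summarizer: Summarizer, text: str, skip: bool):
    """Call the summarizer unless `skip` is set."""
    if skip:
        return None
    return summarizer.get_summary(text)


stub = Summarizer()

# Replace the method with a mock for the duration of the `with` block.
with mock.patch.object(Summarizer, 'get_summary') as mock_get_summary:
    # Skipped case: the mocked method must never be reached.
    skipped = maybe_summarize(stub, 'some note text', skip=True)
    mock_get_summary.assert_not_called()

    # Non-skipped case: the mock supplies a canned summary.
    mock_get_summary.return_value = '(this is a mock summary)'
    written = maybe_summarize(stub, 'some note text', skip=False)
    mock_get_summary.assert_called_once()
```

The examples in this page patch `'__main__._get_summary'` by name instead, since the helpers live in the notebook's own namespace; `mock.patch.object` is used in the sketch only to keep it independent of where the code is run.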

When the main note is identifiable, the _auto/notation_summary tag is not present, and notation_note has no content, write a summary regardless of the value of overwrite_previous_autogenerated_summary.

# In this example, the main note of the notation note is identifiable,
# the `_auto/notation_summary` tag is not present, and the notation note has
# no content, so it is appropriate to autogenerate a summary and write it to
# the notation note.

with (tempfile.TemporaryDirectory(prefix='temp_dir', dir=os.getcwd()) as temp_dir,
      mock.patch('__main__._get_summary') as mock_get_summary,
      # mock.patch('__main__._write_summary_to_notation_note') as mock_write_summary_to_notation_note
      ):
    temp_vault = Path(temp_dir) / 'test_vault_7'
    shutil.copytree(_test_directory() / 'test_vault_7', temp_vault)
    notation_note_with_auto_notation_summary_tag = VaultNote(temp_vault, name='foag_notation_M_n_bullet')

    print("The following is the original text of the notation note:\n")
    print(notation_note_with_auto_notation_summary_tag.text())

    mock_get_summary.return_value = r"(this is a mock summary) the twist of the quasicoherent sheaf $M_\bullet$ on a scheme by the integer $n$."
    append_summary_to_notation_note(notation_note_with_auto_notation_summary_tag, temp_vault, summarizer)

    mock_get_summary.assert_called_once()

    print("\n\nIn this example, the original note is deemed to not have a summary, auto-generated or not, written and hence a summary is generated and written. Note that an `_auto/notation_summary` tag is added :\n\n")
    print(notation_note_with_auto_notation_summary_tag.text())
    assert mock_get_summary.return_value in notation_note_with_auto_notation_summary_tag.text()

The following is the original text of the notation note:

---
detect_regex: []
latex_in_original: ["$M(n)_\\bullet$"]
---
$M(n)_\bullet$ [[twist_of_a_graded_module|denotes]] 


In this example, the original note is deemed to not have a summary, auto-generated or not, written and hence a summary is generated and written. Note that an `_auto/notation_summary` tag is added :


---
detect_regex: []
latex_in_original: ["$M(n)_\\bullet$"]
tags: [_auto/notation_summary]
---
$M(n)_\bullet$ [[twist_of_a_graded_module|denotes]] (this is a mock summary) the twist of the quasicoherent sheaf $M_\bullet$ on a scheme by the integer $n$.
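
The only frontmatter change visible in the output above is the new `tags: [_auto/notation_summary]` line. How the module performs this update is not shown here; the following is a minimal sketch of one way to do it with plain string handling, assuming the frontmatter has no existing `tags` field. `add_auto_summary_tag` is a hypothetical helper.

```python
AUTO_TAG = '_auto/notation_summary'


def add_auto_summary_tag(text: str, tag: str = AUTO_TAG) -> str:
    """Insert `tags: [<tag>]` as the last line of the YAML frontmatter.

    Assumptions: `text` starts with a `---` line, the frontmatter is
    closed by another `---` line, and no `tags` field exists yet.
    """
    lines = text.split('\n')
    if lines[0].strip() != '---':
        raise ValueError('note has no YAML frontmatter')
    # Find the closing delimiter of the frontmatter block.
    for i in range(1, len(lines)):
        if lines[i].strip() == '---':
            closing = i
            break
    else:
        raise ValueError('YAML frontmatter is not closed')
    if any(line.startswith('tags:') for line in lines[1:closing]):
        raise ValueError('frontmatter already has a tags field')
    return '\n'.join(lines[:closing] + [f'tags: [{tag}]'] + lines[closing:])
```

A real implementation would more likely parse and re-serialize the YAML so that an existing `tags` list can be extended rather than rejected.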

# This example is almost identical to the previous one, except `overwrite_previous_autogenerated_summary` is set to `True`
with (tempfile.TemporaryDirectory(prefix='temp_dir', dir=os.getcwd()) as temp_dir,
      mock.patch('__main__._get_summary') as mock_get_summary,
      # mock.patch('__main__._write_summary_to_notation_note') as mock_write_summary_to_notation_note
      ):
    temp_vault = Path(temp_dir) / 'test_vault_7'
    shutil.copytree(_test_directory() / 'test_vault_7', temp_vault)
    notation_note_with_auto_notation_summary_tag = VaultNote(temp_vault, name='foag_notation_M_n_bullet')

    print("The following is the original text of the notation note:\n")
    print(notation_note_with_auto_notation_summary_tag.text())

    mock_get_summary.return_value = r"(this is a mock summary) the twist of the quasicoherent sheaf $M_\bullet$ on a scheme by the integer $n$."
    append_summary_to_notation_note(notation_note_with_auto_notation_summary_tag, temp_vault, summarizer, overwrite_previous_autogenerated_summary=True)

    mock_get_summary.assert_called_once()

    print("\n\nIn this example, the original note is deemed to not have a summary, auto-generated or not, written and hence a summary is generated and written. Note that an `_auto/notation_summary` tag is added :\n\n")
    print(notation_note_with_auto_notation_summary_tag.text())
    assert mock_get_summary.return_value in notation_note_with_auto_notation_summary_tag.text()

The following is the original text of the notation note:

---
detect_regex: []
latex_in_original: ["$M(n)_\\bullet$"]
---
$M(n)_\bullet$ [[twist_of_a_graded_module|denotes]] 


In this example, the original note is deemed to not have a summary, auto-generated or not, written and hence a summary is generated and written. Note that an `_auto/notation_summary` tag is added :


---
detect_regex: []
latex_in_original: ["$M(n)_\\bullet$"]
tags: [_auto/notation_summary]
---
$M(n)_\bullet$ [[twist_of_a_graded_module|denotes]] (this is a mock summary) the twist of the quasicoherent sheaf $M_\bullet$ on a scheme by the integer $n$.

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline

model = AutoModelForSeq2SeqLM.from_pretrained('hyunjongkimmath/notation_summarizations_model')
tokenizer = AutoTokenizer.from_pretrained('hyunjongkimmath/notation_summarizations_model')
summarizer = pipeline('summarization', model=model, tokenizer=tokenizer)

with (tempfile.TemporaryDirectory(prefix='temp_dir', dir=os.getcwd()) as temp_dir):
    temp_vault = Path(temp_dir) / 'test_vault_7'
    shutil.copytree(_test_directory() / 'test_vault_7', temp_vault)

    notation_note_to_be_summarized = VaultNote(temp_vault, name='foag_notation_O_text_Proj__S__n')
    append_summary_to_notation_note(notation_note_to_be_summarized, temp_vault, summarizer)

    print(notation_note_to_be_summarized.text())

Your max_length is set to 200, but your input_length is only 126. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=63)

---
detect_regex: []
latex_in_original: ["\\mathscr{O}_{\\text {Proj } S_{*}}(n)"]
tags: [_auto/notation_summary]
---
$\mathscr{O}_{\text {Proj } S_{*}}(n)$ [[foag_15.2.1|denotes]] $\mathscr{O}_{\text {Proj} S_{*}}(n)$ where $S_{*}$ is a graded ring generated in degree $1$.
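
The warning in the output above comes from the `transformers` summarization pipeline: the `max_length` of 200 exceeds the 126-token input. Following the warning's own suggestion, one option is to cap `max_length` at half the input length. The helper below is a hypothetical sketch of that arithmetic only; it does not call the pipeline, and whether `append_summary_to_notation_note` forwards such a setting is not shown here.

```python
def suggested_max_length(input_length: int, default_max_length: int = 200) -> int:
    """Cap `max_length` at half the input length, as the warning suggests.

    `input_length` is the number of input tokens. The result is at least 1
    and never exceeds `default_max_length`.
    """
    return max(1, min(default_max_length, input_length // 2))


# The values reported in the warning above.
print(suggested_max_length(126))  # 63
```

The resulting value could then be passed as `max_length` when calling the pipeline directly, in the form the warning itself gives: `summarizer('...', max_length=63)`.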

Naming notation notes

Another convenient functionality is naming notation notes automatically; see markdown.obsidian.personal.machine_learning.definition_and_notation_naming.ipynb.