import os
import shutil
import tempfile
from pathlib import Path
from unittest import mock

import pandas as pd
from fastcore.test import *
# from torch import Tensor
from trouver.helper import _test_directory
markdown.obsidian.personal.machine_learning.notation_summarization
In markdown.obsidian.personal.machine_learning.notation_identification, we defined functions to gather ML data to identify notations and functions to use an ML model to predict where notations are defined. In how_to.train_ml_model.fastai, we showed how to train this ML model from the data. markdown.obsidian.personal.notation also has some functions to create and interact with notation notes.
It would be further convenient to train an ML model to summarize details about notations. This module defines functions to gather data to train such a model.
Gather ML data from notation notes
get_latex_in_original_from_parsed_notation_note_data
get_latex_in_original_from_parsed_notation_note_data (metadata:dict[str], notation_str:str)
Return the string that should be considered the latex_in_original, given part of the output of parse_notation_note.
latex_in_original is intended to be either 1. a substring of the main note in which the notation is introduced, if available, or 2. the notation itself.
If the notation note has a latex_in_original field in its YAML frontmatter metadata, then the (first) str there is the latex_in_original. Otherwise, notation_str is considered the latex_in_original.
# The metadata is empty, so the `notation_str` should be considered the `latex_in_original`.
test_eq(
    get_latex_in_original_from_parsed_notation_note_data({}, r'\mathbb{A}^n_k'),
    r'\mathbb{A}^n_k')

test_metadata_1 = {
    'detect_regex': [],
    'latex_in_original': [
        r"c_x: \\mathbb{P}_k^n/(\\mathbb{P}_k^{n-1}) \\to \\mathbb{P}_k^n/(\\mathbb{P}_k^{n} - \\{x\\}) \\cong \\mathbb{P}_k^n/(\\mathbb{P}_k^{n-1})"
    ]}
test_notation_str_1 = r'c_x'
test_eq(
    get_latex_in_original_from_parsed_notation_note_data(test_metadata_1, test_notation_str_1),
    test_metadata_1['latex_in_original'][0])
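The fallback rule described above can be sketched in a few lines. This is only an illustration under the assumption that the metadata is a dict (or None) whose latex_in_original entry, when present, is a list of strings; the name latex_in_original_sketch is hypothetical and this is not the library's implementation.

```python
def latex_in_original_sketch(metadata, notation_str):
    """Hypothetical stand-in for the rule described in the docstring above."""
    # Assumption: `metadata` is either None or a dict parsed from YAML frontmatter.
    entries = metadata.get('latex_in_original') if metadata else None
    if entries:
        # Use the first entry of the `latex_in_original` field when it exists.
        return entries[0]
    # Otherwise fall back to the notation itself.
    return notation_str


no_metadata_result = latex_in_original_sketch({}, r'\mathbb{A}^n_k')
with_metadata_result = latex_in_original_sketch(
    {'detect_regex': [], 'latex_in_original': [r'c_x: X \to Y']}, r'c_x')
```

Here no_metadata_result is the notation string itself, while with_metadata_result is the first entry of the frontmatter field.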
notation_summarization_data_from_note
notation_summarization_data_from_note (notation_note:trouver.markdown.obsidian.vault.VaultNote, vault:os.PathLike)
Obtain notation summarization data from the notation note.
None is returned instead of a dict if
- the notation is determined to have not been (fully) summarized,
  - Currently, this is implemented as follows: a notation note is not fully summarized if it does not have any content beyond <notation> denotes or if the notation note has #_meta/TODO marked in its content (except in the trailing bulleted list of notations used in the notation note).
- the main note of the notation note does not exist or is essentially empty,
  - Essentially empty means that the process_standard_information_note function applied to the MarkdownFile object of the main note yields a MarkdownFile object whose __str__ method returns a string of only blank characters.
- the notation note is marked with an _auto/notation_summary tag in its YAML frontmatter meta.
  - If a notation note is marked with an _auto/notation_summary tag, then that means that its summary has been auto-generated by an ML model using the append_summary_to_notation_note function.
The notion of whether the notation “has been summarized” is not exactly implemented, but should be sufficient for gathering data.
This function is mainly used in gather_notation_note_summaries.
TODO: there are some notation notes drawing information from multiple notes; separate text in notation notes corresponding to each note.
Returns
Union[dict[str, str], None]
- If the notation note is determined to have been summarized (i.e. contains text beyond <notation> denotes and does not have the #_meta/TODO tag), then the output is a dict whose key-value pairs are
  - "Notation note name" - The name of the notation note
  - "Notation" - The notation of the note
  - "Latex in original" - The entry of the latex_in_original field of the note if available, cf. [make_a_notation_note](https://hyunjongkimmath.github.io/trouver/markdown.obsidian.personal.notation.html#make_a_notation_note)
  - "Summary" - The summary of the notation.
  - "Main note name" - The name of the main note of the notation note
  - "Processed main note contents" - The processed contents of the main note
- Otherwise, the output is None.
| | Type | Details |
|---|---|---|
| notation_note | VaultNote | |
| vault | PathLike | |
| Returns | typing.Optional[dict[str, str]] | The keys to the dict are “Notation note name”, “Notation”, “Latex in original”, “Summary”, “Main note name”, and “Processed main note contents”. However, None is returned instead of a dict if the notation note is determined to have not been summarized, if the main note of the notation note does not exist, or if the notation note is marked with the _auto/notation_summary tag. |
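To make the "has been summarized" heuristic above concrete, here is a rough string-based sketch. It assumes the note body has already been separated from its YAML frontmatter and that the trailing list of notations consists of lines starting with "- ". The helper name looks_summarized is hypothetical; the actual function works on parsed notation notes rather than raw strings.

```python
import re


def looks_summarized(body: str) -> bool:
    """Hypothetical check mirroring the heuristic described above."""
    lines = body.rstrip().splitlines()
    # Ignore the trailing bulleted list (and blank lines around it), since a
    # #_meta/TODO marker there does not count against the summary.
    while lines and (not lines[-1].strip() or lines[-1].lstrip().startswith('- ')):
        lines.pop()
    text = '\n'.join(lines)
    if '#_meta/TODO' in text:
        return False
    # Find "denotes", optionally closing a [[link|denotes]] wikilink.
    match = re.search(r'denotes(\]\])?', text)
    if match is None:
        return False
    # Summarized only if something follows "<notation> denotes".
    return bool(text[match.end():].strip())


unsummarized = looks_summarized(r'$\mathcal{B}(\mathbb{R})$ [[some_note|denotes]]')
summarized = looks_summarized(
    r'$\mathscr{O}_X$ denotes the structure sheaf of the scheme $X$.')
```

With these inputs, unsummarized is False and summarized is True. A note that passes this check could still be excluded for the other reasons listed above (missing main note or the _auto/notation_summary tag).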
vault = _test_directory() / 'test_vault_7'
notation_note = VaultNote(vault, name='some_reference_name_notation_Spec_A')
sample_output = notation_summarization_data_from_note(notation_note, vault)
del sample_output['Processed main note contents']
test_eq(sample_output,
{'Notation note name': 'some_reference_name_notation_Spec_A',
'Notation': '$\\operatorname{Spec} A$',
'Latex in original': '\\operatorname{Spec} A',
'Summary': 'the spectrum of the ring $A$.',
'Main note name': 'spectrum_of_a_ring',
# 'Processed main note contents': 'Let $A$ be a ring.\n\nThe Spectrum $\\operatorname{Spec} A$ is the set of prime ideals of $A$. It is equipped with a topology, called the Zariski topology in which the followings sets, called the distinguished open subsets of $\\operatorname{Spec} A$, give a base for the topology:\n\n$$D(f) = \\{\\mathfrak{p} \\in \\operatorname{Spec} A: f \\not\\in \\mathfrak{p} \\}.$$\n'}
})
If the “main” note of the notation note cannot be identified, then notation_summarization_data_from_note returns None:
vault = _test_directory() / 'test_vault_7'
notation_note = VaultNote(vault, name='some_reference_name_notation_O_X_this_file_has_no_links')
print("The following note has no main note link:\n")
print(notation_note.text())
test_eq(notation_summarization_data_from_note(notation_note, vault), None)

notation_note = VaultNote(vault, name='some_reference_name_notation_k_t_formal_power_series_ring')
print("\n\nThe following note has a main note link, but the main note does not actually exist:\n")
print(notation_note.text())
test_eq(notation_summarization_data_from_note(notation_note, vault), None)
The following note has no main note link:
---
detect_regex: []
latex_in_original: ["\\mathscr{O}_X"]
---
$\mathscr{O}_X$ denotes the structure sheaf of the scheme $X$.
The following note has a main note link, but the main note does not actually exist:
$k[[t]]$ [[some_note|denotes]] the ring of formal power series where $k$ is a field.
# TODO: example with embedded links
# TODO: example of a notation note without any metadata
If the notation note has not been summarized, then notation_summarization_data_from_note returns None, even if the main note of the notation note exists:
# TODO: example where the main note of the notation note exists, but has basically no content;
# The function should return `None` in this case.
vault = _test_directory() / 'test_vault_7'
notation_note = VaultNote(vault, name='some_reference_name_notation_B_R')

print(
    "This is what a notation note that has not been summarized basically looks like."
    " Note that it has YAML frontmatter metadata and starts a sentence to describe"
    " what a certain notation is supposed to denote, but ultimately does not contain"
    " an actual summary:\n")
print(notation_note.text())
test_eq(notation_summarization_data_from_note(notation_note, vault), None)
This is what a notation note that has not been summarized basically looks like. Note that it has YAML frontmatter metadata and starts a sentence to describe what a certain notation is supposed to denote, but ultimately does not contain an actual summary:
---
detect_regex: []
latex_in_original: ["\\mathcal{B}(\\mathbb{R})"]
---
$\mathcal{B}(\mathbb{R})$ [[note_with_some_excessive_notation_notes|denotes]]
# TODO: example where notation note has _auto/notation_summary tag; the notation summarization data should not be added.
gather_notation_note_summaries
gather_notation_note_summaries (vault:os.PathLike, notes:list[trouver.markdown.obsidian.vault.VaultNote])
Return a pandas.DataFrame encapsulating the data of notation note summaries.
cf. notation_summarization_data_from_note, which is the function from which the notation note summaries are drawn.
This function is mainly used in append_to_notation_note_summarization_database.
| | Type | Details |
|---|---|---|
| vault | PathLike | |
| notes | list | |
| Returns | DataFrame | Has columns “Time added”, “Time modified”, “Notation note name”, “Notation”, “Latex in original”, “Summary”, “Main note name”, and “Processed main note contents”. |
# TODO: example
# TODO: example verifying that outputs of `None` from `notation_summarization_data_from_note`
# are weeded out.
append_to_notation_note_summarization_database
append_to_notation_note_summarization_database (vault:os.PathLike, file:os.PathLike, notes:list[trouver.markdown.obsidian.vault.VaultNote], backup:bool=True)
Either create a csv file containing data for notation note summarization or append to an existing csv file.
The columns of the database file are as follows:
- "Time added" - The time when the row was added.
- "Time modified" - The time when the labels of the row were last modified.
- "Notation note name" - The name of the note from which the data for the row was derived.
- "Notation" - The notation which is being summarized
- "Latex in original" - The entry of the latex_in_original field of the note if available, cf. make_a_notation_note
- "Summary" - The summary of the notation.
- "Main note name" - The name of the main note of the notation note
- "Processed main note contents" - The processed contents of the main note
All timestamps are in UTC time and specify time to minutes (i.e. no seconds/microseconds).
TODO: implement updating rows and rewrite the next paragraph to accurately reflect the implementation. I would like the ‘Notation’, ‘Latex in original’, ‘Summary’, ‘processed main note contents’ to be the “pivot_cols”
If a “new” note has the same processed content as a pre-existing note and anything is different about the “new” note, then update the row of the existing note. In particular, the following are updated:
- Time modified (set to current time)
- Notation (overwritten)
- Latex in original (overwritten)
- Summary (overwritten)
- Main note name (overwritten)
- Processed main note contents (overwritten)
This method assumes that the processed contents in the CSV file are all distinct if the CSV file exists.
| | Type | Default | Details |
|---|---|---|---|
| vault | PathLike | | The vault from which the data is drawn |
| file | PathLike | | The path to a CSV file |
| notes | list | | The notation notes to consider adding to the database. |
| backup | bool | True | If True, makes a copy of file in the same directory and with the same name, except with an added extension of .bak. |
| Returns | None | | |
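The create-or-append behavior, the .bak backup, and the minute-resolution UTC timestamps described above can be illustrated with the standard library alone. This is a minimal sketch under stated assumptions: the name append_rows_sketch is hypothetical, rows are plain dicts rather than data gathered from VaultNote objects, and the row-updating behavior mentioned in the TODO is not attempted. The real function works with pandas DataFrames.

```python
import csv
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path

COLUMNS = [
    'Time added', 'Time modified', 'Notation note name', 'Notation',
    'Latex in original', 'Summary', 'Main note name',
    'Processed main note contents']


def append_rows_sketch(file, rows, backup=True):
    """Hypothetical append-only illustration of the database file layout."""
    file = Path(file)
    exists = file.exists()
    if exists and backup:
        # Same directory and name, with an added `.bak` extension.
        shutil.copy(file, file.with_name(file.name + '.bak'))
    # UTC time, truncated to minutes, e.g. 2023-01-17T01:47
    now = datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M')
    with open(file, 'a', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        if not exists:
            writer.writeheader()
        for row in rows:
            writer.writerow({'Time added': now, 'Time modified': now, **row})


example_row = {
    'Notation note name': 'some_reference_name_notation_Spec_A',
    'Notation': r'$\operatorname{Spec} A$',
    'Latex in original': r'\operatorname{Spec} A',
    'Summary': 'the spectrum of the ring $A$.',
    'Main note name': 'spectrum_of_a_ring',
    'Processed main note contents': 'Let $A$ be a ring.'}

with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / 'summaries.csv'
    append_rows_sketch(db, [example_row])   # creates the file with a header
    append_rows_sketch(db, [example_row])   # backs up, then appends
    with open(db, newline='', encoding='utf-8') as f:
        saved = list(csv.DictReader(f))
    backup_exists = (Path(tmp) / 'summaries.csv.bak').exists()
```

The second call finds an existing file, so it copies it to summaries.csv.bak before appending. Note that this sketch happily writes duplicate rows, which the actual function is meant to avoid.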
# TODO basic example
# TODO: example of appending instead of making an entirely new CSV file
# TODO: example of notation notes existing, but none of them are to be added to the database by virtue of either the notation summary
# not being written or by virtue of their main notes not existing.
Format Data
Given the contents of the CSV file generated in append_to_notation_note_summarization_database, we create a single column representing the data upon which the ML model should train.
Since a single piece of text can introduce multiple notations, we should indicate which notation we want summarized. The entry in the Latex in original column of the CSV file can indicate this notation; as explained in make_a_notation_note, this entry is not necessarily the notation itself, but rather the substring of the text in which the notation is introduced. For example, in the text
Let $R$ be a ring and let $M$ be a module. We define the dual of $M$ as
$$M^\vee := \operatorname{Hom}_R(M,R).$$
the notation M^\vee is introduced, and it appears inside the display math mode string.
single_input_for_notation_summarization
single_input_for_notation_summarization (main_note_content:str, latex_in_original:str, latex_in_original_comes_first:bool=True)
Format an input for a transformers.pipelines.text2text_generation.SummarizationPipeline object to summarize a notation introduced in a mathematical text.
The input consists of main_note_content as well as a part that is formatted as f"latex_in_original: {latex_in_original}".
Note that this function is used to format data used to train/validate the summarization model within the SummarizationPipeline.
| | Type | Default | Details |
|---|---|---|---|
| main_note_content | str | | The mathematical text that introduces the notation and from which to summarize a notation. |
| latex_in_original | str | | A substring in main_note_content which is a latex string in which the notation is introduced. |
| latex_in_original_comes_first | bool | True | If True, the latex_in_original piece appears before the main_note_content |
| Returns | str | | |
The single_input_for_notation_summarization function formats an input for the notation summarization model to train on or to predict. Such an input consists of a mathematical excerpt and the latex_in_original field, which is a latex string containing a notation supposedly introduced in the mathematical excerpt.
test_content = r"This is a test content introducing notation: $\mathbb{Z}/n\mathbb{Z}$ denotes this. $\mathbb{F}_q$ denotes that"
latex_in_original = r"\mathbb{Z}/n\mathbb{Z}"

# In this example, `latex_in_original_comes_first` is set to `False`.
sample_output = single_input_for_notation_summarization(
    test_content,
    latex_in_original,
    latex_in_original_comes_first=False)
assert '\n\nlatex_in_original: ' in sample_output
print(sample_output, '\n')
This is a test content introducing notation: $\mathbb{Z}/n\mathbb{Z}$ denotes this. $\mathbb{F}_q$ denotes that
latex_in_original: \mathbb{Z}/n\mathbb{Z}
Originally, this function was implemented in a way that places the latex_in_original at the end of the string (tantamount to letting latex_in_original_comes_first=False in the current implementation). Letting latex_in_original_comes_first=True places the latex_in_original field before the mathematical excerpt.
This change in implementation is an experimental attempt to improve the model; the author hypothesizes that placing the latex_in_original field before the mathematical excerpt will improve the model by allowing it to know what notation it is supposed to summarize before considering the mathematical excerpt.
# In this example, `latex_in_original_comes_first` is set to `True`.
sample_output = single_input_for_notation_summarization(
    test_content,
    latex_in_original,
    latex_in_original_comes_first=True)
assert sample_output.startswith('latex_in_original: ')
print(sample_output)
latex_in_original: \mathbb{Z}/n\mathbb{Z}
This is a test content introducing notation: $\mathbb{Z}/n\mathbb{Z}$ denotes this. $\mathbb{F}_q$ denotes that
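The two layouts shown in the examples above can be reproduced with a short formatting helper. This is a sketch rather than the library's code: the name single_input_sketch is hypothetical, and the blank-line separator used when the latex_in_original part comes first is an assumption (the separator is only confirmed by the assertion for the latex_in_original_comes_first=False case).

```python
def single_input_sketch(
        main_note_content: str,
        latex_in_original: str,
        latex_in_original_comes_first: bool = True) -> str:
    """Hypothetical re-creation of the input layout shown in the examples."""
    marker = f"latex_in_original: {latex_in_original}"
    if latex_in_original_comes_first:
        # Assumption: a blank line separates the marker from the excerpt.
        return f"{marker}\n\n{main_note_content}"
    return f"{main_note_content}\n\n{marker}"


excerpt = r"$\mathbb{Z}/n\mathbb{Z}$ denotes this. $\mathbb{F}_q$ denotes that"
marker_first = single_input_sketch(excerpt, r"\mathbb{Z}/n\mathbb{Z}")
marker_last = single_input_sketch(
    excerpt, r"\mathbb{Z}/n\mathbb{Z}", latex_in_original_comes_first=False)
```

Whichever ordering is chosen has to match the ordering the summarization model was trained with, which is why the flag is threaded through the functions that call this formatter.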
If latex_in_original is None, TODO: finish explanation
# TODO: finish example
append_column_for_single_text
append_column_for_single_text (df:pandas.core.frame.DataFrame, latex_in_original_comes_first:bool=True)
Append a column "Single text" to the notation note summarization DataFrame to represent the input into an ML model as a single text.
| | Type | Default | Details |
|---|---|---|---|
| df | DataFrame | | Assumed to be structured just as a dataframe of a CSV file created/modified by append_to_notation_note_summarization_database |
| latex_in_original_comes_first | bool | True | This is a parameter to pass to calls to the [single_input_for_notation_summarization](https://hyunjongkimmath.github.io/trouver/markdown.obsidian.personal.machine_learning.notation_summarization.html#single_input_for_notation_summarization) function. If True, the latex_in_original piece appears before the main_note_content |
| Returns | None | | |
csv_file = _test_directory() / 'ml_examples' / 'notation_summarizations.csv'
df = pd.read_csv(csv_file)
df.head()
| | Time added | Time modified | Notation note name | Notation | Latex in original | Summary | Main note name | Processed main note contents |
---|---|---|---|---|---|---|---|---|
0 | 2023-01-17T01:47 | 2023-01-17T01:47 | reference_name_notation_Gal_L_K | $\operatorname{Gal}(L/K)$ | $\operatorname{ann} M$ | the Galois group of the Galois field extension... | reference_name_Galois_group | If $L/K$ is a Galois field extension, then its... |
1 | 2023-01-17T01:47 | 2023-01-17T01:47 | reference_name_notation_I_J | $\mathbb{Z}/n\mathbb{Z}$ | $\mathbb{Z}/n\mathbb{Z}$ | The ring of integers modulo $n$. It is defined... | reference_name_Definition 1 | The ring $\mathbb{Z}/n\mathbb{Z}$, called the ... |
2 | 2023-01-17T01:47 | 2023-01-17T01:47 | reference_name_notation_M_vee_dual_of_a_module | $M^\vee$ | $M^\vee$ | the dual of the $R$-module $M$. It is defined ... | reference_name_Definition 17 | Let $M$ be an $R$-module. We define the dual m... |
3 | 2023-01-17T01:47 | 2023-01-17T01:47 | reference_name_notation_F_q | $\mathbb{F}_q$ | $\mathbb{F}_q$ | the finite field of $q$ elements, where $q$ is... | reference_name_Theorem 1 | Theorem. Let $q$ be a prime power. There is a ... |
4 | 2023-01-17T01:47 | 2023-01-17T01:47 | reference_name_notation_X_N | $X(N)$ | $X(N)$ | the quotient of $\mathbb{H}^*$ by $\Gamma(N)$,... | reference_name_Definition 200 | Now we discuss modular curves, whose points pa... |
append_column_for_single_text(df)
df
| | Time added | Time modified | Notation note name | Notation | Latex in original | Summary | Main note name | Processed main note contents | Single text |
---|---|---|---|---|---|---|---|---|---|
0 | 2023-01-17T01:47 | 2023-01-17T01:47 | reference_name_notation_Gal_L_K | $\operatorname{Gal}(L/K)$ | $\operatorname{ann} M$ | the Galois group of the Galois field extension... | reference_name_Galois_group | If $L/K$ is a Galois field extension, then its... | If $L/K$ is a Galois field extension, then its... |
1 | 2023-01-17T01:47 | 2023-01-17T01:47 | reference_name_notation_I_J | $\mathbb{Z}/n\mathbb{Z}$ | $\mathbb{Z}/n\mathbb{Z}$ | The ring of integers modulo $n$. It is defined... | reference_name_Definition 1 | The ring $\mathbb{Z}/n\mathbb{Z}$, called the ... | The ring $\mathbb{Z}/n\mathbb{Z}$, called the ... |
2 | 2023-01-17T01:47 | 2023-01-17T01:47 | reference_name_notation_M_vee_dual_of_a_module | $M^\vee$ | $M^\vee$ | the dual of the $R$-module $M$. It is defined ... | reference_name_Definition 17 | Let $M$ be an $R$-module. We define the dual m... | Let $M$ be an $R$-module. We define the dual m... |
3 | 2023-01-17T01:47 | 2023-01-17T01:47 | reference_name_notation_F_q | $\mathbb{F}_q$ | $\mathbb{F}_q$ | the finite field of $q$ elements, where $q$ is... | reference_name_Theorem 1 | Theorem. Let $q$ be a prime power. There is a ... | Theorem. Let $q$ be a prime power. There is a ... |
4 | 2023-01-17T01:47 | 2023-01-17T01:47 | reference_name_notation_X_N | $X(N)$ | $X(N)$ | the quotient of $\mathbb{H}^*$ by $\Gamma(N)$,... | reference_name_Definition 200 | Now we discuss modular curves, whose points pa... | Now we discuss modular curves, whose points pa... |
Use ML model to fill in notation notes
After training the model (cf. how_to.train_ml_model.huggingface), we use it to summarize notations.
fix_summary_formatting
fix_summary_formatting (summary:str)
Fix some latex formatting issues in a summarized text.
Currently, the model is inclined to decode and format its summarizations in such a way that creates formatting issues either for LaTeX or Obsidian.md. For example, the model would output a str containing
- \ <command_name> instead of \<command_name>
- { followed by a space when { is preferable
- $ <latex_string> $ when $<latex_string>$ is needed for Obsidian.md.
The fix_summary_formatting function attempts to get around some of these issues.
text = r'\ to'
sample_output = fix_summary_formatting(text)
assert r'\to' in sample_output

text = r'$d\ in\ mathbb{ Z}_{\ geq 0} $'
sample_output = fix_summary_formatting(text)
assert r'\in' in sample_output
assert r'\mathbb{Z}' in sample_output
assert r'\geq 0' in sample_output

text = r'There are some extra spaces in this math mode string: $ 5 + 7 = 12 $.'
sample_output = fix_summary_formatting(text)
print(sample_output)
assert r'$5' in sample_output
assert r'12$' in sample_output
There are some extra spaces in this math mode string: $5 + 7 = 12$.
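The three kinds of fixes listed above can be approximated with regular expressions. The following is a sketch only: the name fix_summary_formatting_sketch and the specific patterns are assumptions chosen to satisfy the examples above, not the rules the library applies. One known limitation is that it would also collapse a legitimate LaTeX control space (a backslash followed by a space) when letters follow it.

```python
import re


def fix_summary_formatting_sketch(summary: str) -> str:
    """Hypothetical regex-based approximation of the fixes described above."""
    # "\ to" -> "\to": drop spaces between a backslash and a command name.
    summary = re.sub(r'\\ +([A-Za-z]+)', r'\\\1', summary)
    # "{ Z}" -> "{Z}": drop spaces right after an opening brace.
    summary = re.sub(r'\{ +', '{', summary)
    # "$ x $" -> "$x$": trim spaces just inside single-dollar math delimiters,
    # leaving "$$ ... $$" display math untouched.
    summary = re.sub(
        r'(?<!\$)\$(?!\$)\s*([^$]+?)\s*\$(?!\$)', r'$\1$', summary)
    return summary


fixed_command = fix_summary_formatting_sketch(r'\ to')
fixed_math = fix_summary_formatting_sketch(r'$d\ in\ mathbb{ Z}_{\ geq 0} $')
fixed_spaces = fix_summary_formatting_sketch(
    r'There are some extra spaces in this math mode string: $ 5 + 7 = 12 $.')
```

The order of the substitutions matters slightly: commands and braces are repaired first so that the final pass only has to trim whitespace at the math delimiters.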
Using a HuggingFace pipeline to summarize a notation
summarize_notation
summarize_notation (main_content:str, latex_in_original:str, summarizer:transformers.pipelines.text2text_generation.SummarizationPipeline, fix_formatting:bool=True, latex_in_original_comes_first:bool=True)
Summarize a notation introduced in a mathematical text using a huggingface pipeline.
Assumes that main_content is a mathematical text introducing a notation and that latex_in_original is a substring of main_content in which a notation is introduced.
| | Type | Default | Details |
|---|---|---|---|
| main_content | str | | |
| latex_in_original | str | | |
| summarizer | SummarizationPipeline | | |
| fix_formatting | bool | True | If True, run fix_summary_formatting on summarizer’s summary before returning it. |
| latex_in_original_comes_first | bool | True | This is a parameter to pass to calls to the single_input_for_notation_summarization function. If True, the latex_in_original piece appears before the main_note_content |
| Returns | str | | |
# TODO: example
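As a rough illustration of the flow, the sketch below formats the input, passes it to a summarizer, and reads the summary from the result. It relies on the fact that a Hugging Face summarization pipeline returns a list of dicts with a "summary_text" key. To keep the sketch runnable without downloading a model, a stand-in callable fake_summarizer is used; that stand-in, the name summarize_notation_sketch, and the blank-line separator are all assumptions, and the formatting-fix step is omitted.

```python
def summarize_notation_sketch(
        main_content, latex_in_original, summarizer,
        latex_in_original_comes_first=True):
    """Hypothetical outline: format the input, call the pipeline, unpack."""
    marker = f"latex_in_original: {latex_in_original}"
    if latex_in_original_comes_first:
        parts = [marker, main_content]
    else:
        parts = [main_content, marker]
    outputs = summarizer("\n\n".join(parts))
    # Summarization pipelines return a list like [{"summary_text": "..."}].
    return outputs[0]["summary_text"]


def fake_summarizer(text):
    # Stand-in for a real pipeline: echoes the first line of its input.
    return [{"summary_text": text.splitlines()[0]}]


first_line_marker = summarize_notation_sketch(
    'Let $A$ be a ring.', r'\operatorname{Spec} A', fake_summarizer)
first_line_content = summarize_notation_sketch(
    'Let $A$ be a ring.', r'\operatorname{Spec} A', fake_summarizer,
    latex_in_original_comes_first=False)
```

With a real pipeline in place of fake_summarizer, the returned string would be the model's summary of the notation.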
Adding the summary to the notation note
append_summary_to_notation_note
append_summary_to_notation_note (notation_note:trouver.markdown.obsidian.vault.VaultNote, vault:os.PathLike, summarizer:transformers.pipelines.text2text_generation.SummarizationPipeline, main_note:Optional[trouver.markdown.obsidian.vault.VaultNote]=None, overwrite_previous_autogenerated_summary:bool=False, latex_in_original_comes_first:bool=True)
Summarize a notation introduced in a mathematical text using a huggingface pipeline and append said summarization to notation_note.
If notation_note does not have a YAML frontmatter meta or does not have a latex_in_original field in its YAML frontmatter meta, then the actual notation is used as the latex_in_original.
The following describes the circumstances under which the summarization does not happen:
- If main_note is None and no main note of notation_note can be determined via the main_of_notation function.
- If overwrite_previous_autogenerated_summary is False and notation_note has the _auto/notation_summary tag in its YAML frontmatter meta (if available).
- If notation_note does not have a YAML frontmatter meta or the _auto/notation_summary tag is not present in the YAML frontmatter meta and notation_note has nontrivial content (i.e. its content is not merely of the form $<notation>$ [[<link_to_main_note>|denotes]]).
If an auto-generated summary is appended, then this function adds an _auto/notation_summary tag to the notation note’s YAML frontmatter, if not already present. This tag is intended to be used as follows:
- The presence of this tag tells the reader that the summary has been autogenerated by this function.
- The notation_summarization_data_from_note function (and by extension, the gather_notation_note_summaries and append_to_notation_note_summarization_database functions) avoids gathering notation summarization data from notation notes marked with this tag.
- The owner of the Obsidian.md vault can manually make modifications to the notation note if necessary and remove this tag to indicate that the summary is appropriate to be added to the note summarization database.
| | Type | Default | Details |
|---|---|---|---|
| notation_note | VaultNote | | |
| vault | PathLike | | |
| summarizer | SummarizationPipeline | | Contains an ML model which summarizes the notation, see summarize_notation function. |
| main_note | typing.Optional[trouver.markdown.obsidian.vault.VaultNote] | None | The main note from which the notation comes. If this is None, then the main_note is obtained via the main_of_notation function. |
| overwrite_previous_autogenerated_summary | bool | False | If True, overwrite previously autogenerated summaries |
| latex_in_original_comes_first | bool | True | This is a parameter to pass to calls to the single_input_for_notation_summarization function. If True, the latex_in_original piece appears before the main_note_content. While the default value of True is recommended, passing False to this parameter may be necessary to use the older version of the summarization model in the repo notation_summarizations_model. |
| Returns | None | | |
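The three circumstances under which no summary is written can be condensed into one small predicate. The sketch below is a reading of the documented rules, not the function's actual control flow; the name should_write_summary and its boolean parameters are hypothetical simplifications of what the function derives from the notation note.

```python
AUTO_TAG = '_auto/notation_summary'


def should_write_summary(
        main_note_found: bool,
        tags: list,
        has_nontrivial_content: bool,
        overwrite_previous_autogenerated_summary: bool = False) -> bool:
    """Hypothetical predicate for the documented skip conditions."""
    # 1. No main note can be determined: never summarize.
    if not main_note_found:
        return False
    # 2. Previously autogenerated summary: only redo it when overwriting.
    if AUTO_TAG in tags:
        return overwrite_previous_autogenerated_summary
    # 3. No auto tag: existing (presumably hand-written) content is kept.
    return not has_nontrivial_content


skip_no_main = should_write_summary(False, [], False)
skip_auto_tag = should_write_summary(True, [AUTO_TAG], True)
skip_manual = should_write_summary(True, [], True, True)
write_empty = should_write_summary(True, [], False)
write_overwrite = should_write_summary(True, [AUTO_TAG], True, True)
```

Under this reading, a note without the tag and with existing content is never overwritten, regardless of the overwrite flag, which protects hand-written summaries.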
# Test out when a summary is written and when it is not.
# 1. When the main note of the notation note cannot be identified, don't write a
# summary
with (mock.patch('__main__.MarkdownFile.from_vault_note') as mock_mf_from_vault_note,
      mock.patch('__main__.parse_notation_note') as mock_parse_notation_note,
      mock.patch('__main__._get_summary') as mock_get_summary,
      mock.patch('__main__._write_summary_to_notation_note') as mock_write_summary_to_notation_note):
    mock_notation_note = None
    mock_metadata = None
    mock_main_note_name = None
    mock_notation_note_content_mf = None
    mock_parse_notation_note.return_value = mock_metadata, r'$\mathcal{O}_K$', mock_main_note_name, mock_notation_note_content_mf, None

    mock_vault = None
    append_summary_to_notation_note(mock_notation_note, mock_vault, summarizer)
    mock_get_summary.assert_not_called()
    mock_write_summary_to_notation_note.assert_not_called()
# 2. When `overwrite_previous_autogenerated_summary` is `False` and
# `notation_note` has the `_auto/notation_summary` tag in its YAML frontmatter meta
# then don't write a summary
with (mock.patch('__main__.VaultNote') as mock_vaultnote,
      mock.patch('__main__.MarkdownFile.from_vault_note') as mock_mf_from_vault_note,
      mock.patch('__main__.parse_notation_note') as mock_parse_notation_note,
      mock.patch('__main__._get_summary') as mock_get_summary,
      mock.patch('__main__._write_summary_to_notation_note') as mock_write_summary_to_notation_note):
    mock_notation_note = None
    mock_metadata = {'tags': ['_auto/notation_summary']}
    mock_main_note_name = 'main_note'
    mock_notation_note_content_mf = None
    mock_parse_notation_note.return_value = mock_metadata, r'$\mathcal{O}_K$', mock_main_note_name, mock_notation_note_content_mf, None

    mock_vault = None
    append_summary_to_notation_note(
        mock_notation_note, mock_vault, summarizer, overwrite_previous_autogenerated_summary=False)
    mock_get_summary.assert_not_called()
    mock_write_summary_to_notation_note.assert_not_called()
# 3. When `notation_note` does not have a YAML frontmatter meta or the
# `_auto/notation_summary` tag is not present in the YAML frontmatter meta
# and `notation_note` has nontrivial content, then don't write a summary.
with (mock.patch('__main__.VaultNote') as mock_vaultnote,
      mock.patch('__main__.MarkdownFile.from_vault_note') as mock_mf_from_vault_note,
      mock.patch('__main__.parse_notation_note') as mock_parse_notation_note,
      mock.patch('__main__._get_summary') as mock_get_summary,
      mock.patch('__main__._write_summary_to_notation_note') as mock_write_summary_to_notation_note):
    mock_notation_note = None
    mock_metadata = {'tags': []}
    mock_main_note_name = 'main_note'
    mock_notation_note_content_mf = MarkdownFile.from_string('The ring of integers of a number field $K$')
    mock_parse_notation_note.return_value = mock_metadata, r'$\mathcal{O}_K$', mock_main_note_name, mock_notation_note_content_mf, None

    # Note that `overwrite_previous_autogenerated_summary` is set to `True`
    mock_vault = None
    append_summary_to_notation_note(
        mock_notation_note, mock_vault, summarizer, overwrite_previous_autogenerated_summary=True)
    mock_get_summary.assert_not_called()
    mock_write_summary_to_notation_note.assert_not_called()

    # Now set `overwrite_previous_autogenerated_summary` to `False`
    mock_vault = None
    append_summary_to_notation_note(
        mock_notation_note, mock_vault, summarizer, overwrite_previous_autogenerated_summary=False)
    mock_get_summary.assert_not_called()
    mock_write_summary_to_notation_note.assert_not_called()
# 4. When the main note is identifiable, the `_auto/notation_summary` tag is not present, and
# `notation_note` has no content, write a summary.
with (mock.patch('__main__.VaultNote') as mock_vaultnote,
      mock.patch('__main__.MarkdownFile.from_vault_note') as mock_mf_from_vault_note,
      mock.patch('__main__.parse_notation_note') as mock_parse_notation_note,
      mock.patch('__main__._get_summary') as mock_get_summary,
      mock.patch('__main__._write_summary_to_notation_note') as mock_write_summary_to_notation_note):
    mock_notation_note = None
    mock_metadata = {'tags': []}
    mock_main_note_name = 'main_note'
    mock_notation_note_content_mf = MarkdownFile.from_string('')
    mock_parse_notation_note.return_value = mock_metadata, r'$\mathcal{O}_K$', mock_main_note_name, mock_notation_note_content_mf, None

    # Note that `overwrite_previous_autogenerated_summary` is set to `True`
    mock_vault = None
    append_summary_to_notation_note(
        mock_notation_note, mock_vault, summarizer, overwrite_previous_autogenerated_summary=True)
    mock_get_summary.assert_called_once()
    mock_write_summary_to_notation_note.assert_called_once()
# 5. When the main note is identifiable, the `_auto/notation_summary` tag is not present, and
# `notation_note` has no content, write a summary regardless of the value of
# `overwrite_previous_autogenerated_summary`.
with (mock.patch('__main__.VaultNote') as mock_vaultnote,
      mock.patch('__main__.MarkdownFile.from_vault_note') as mock_mf_from_vault_note,
      mock.patch('__main__.parse_notation_note') as mock_parse_notation_note,
      mock.patch('__main__._get_summary') as mock_get_summary,
      mock.patch('__main__._write_summary_to_notation_note') as mock_write_summary_to_notation_note):
    mock_notation_note = None
    mock_metadata = {'tags': []}
    mock_main_note_name = 'main_note'
    mock_notation_note_content_mf = MarkdownFile.from_string('')
    mock_parse_notation_note.return_value = mock_metadata, r'$\mathcal{O}_K$', mock_main_note_name, mock_notation_note_content_mf, None

    # Note that `overwrite_previous_autogenerated_summary` is set to `True`
    mock_vault = None
    append_summary_to_notation_note(
        mock_notation_note, mock_vault, summarizer, overwrite_previous_autogenerated_summary=True)
    mock_get_summary.assert_called_once()
    mock_write_summary_to_notation_note.assert_called_once()

    # Note that `overwrite_previous_autogenerated_summary` is set to `False`
    mock_get_summary.reset_mock()
    mock_write_summary_to_notation_note.reset_mock()
    append_summary_to_notation_note(
        mock_notation_note, mock_vault, summarizer, overwrite_previous_autogenerated_summary=False)
    mock_get_summary.assert_called_once()
    mock_write_summary_to_notation_note.assert_called_once()
# 6. When the main note is identifiable, the `_auto/notation_summary` tag is present, and
# `overwrite_previous_autogenerated_summary` is `True`, write a summary.
with (mock.patch('__main__.VaultNote') as mock_vaultnote,
      mock.patch('__main__.MarkdownFile.from_vault_note') as mock_mf_from_vault_note,
      mock.patch('__main__.parse_notation_note') as mock_parse_notation_note,
      mock.patch('__main__._get_summary') as mock_get_summary,
      mock.patch('__main__._write_summary_to_notation_note') as mock_write_summary_to_notation_note):
    mock_notation_note = None
    mock_metadata = {'tags': ['_auto/notation_summary']}
    mock_main_note_name = 'main_note'
    mock_notation_note_content_mf = MarkdownFile.from_string('the ring of integers of the number field $K$')
    mock_parse_notation_note.return_value = mock_metadata, r'$\mathcal{O}_K$', mock_main_note_name, mock_notation_note_content_mf, None

    # Note that `overwrite_previous_autogenerated_summary` is set to `True`
    mock_vault = None
    append_summary_to_notation_note(
        mock_notation_note, mock_vault, summarizer, overwrite_previous_autogenerated_summary=True)
    mock_get_summary.assert_called_once()
    mock_write_summary_to_notation_note.assert_called_once()
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline
model = AutoModelForSeq2SeqLM.from_pretrained('hyunjongkimmath/notation_summarizations_model')
tokenizer = AutoTokenizer.from_pretrained('hyunjongkimmath/notation_summarizations_model')
summarizer = pipeline('summarization', model=model, tokenizer=tokenizer)

with (tempfile.TemporaryDirectory(prefix='temp_dir', dir=os.getcwd()) as temp_dir):
    temp_vault = Path(temp_dir) / 'test_vault_7'
    shutil.copytree(_test_directory() / 'test_vault_7', temp_vault)

    notation_note_to_be_summarized = VaultNote(temp_vault, name='foag_notation_O_text_Proj__S__n')
    append_summary_to_notation_note(notation_note_to_be_summarized, temp_vault, summarizer)
    print(notation_note_to_be_summarized.text())
Your max_length is set to 200, but you input_length is only 138. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=69)
---
detect_regex: []
latex_in_original: ["\\mathscr{O}_{\\text {Proj } S_{*}}(n)"]
tags: [_auto/notation_summary]
---
$\mathscr{O}_{\text {Proj } S_{*}}(n)$ [[foag_15.2.1|denotes]] the graded ring $generated in degree 1, where$S_\bullet$is a graded$ring. It is defined as $\widehat{S(n)\$.