# TODO: examples
markdown.obsidian.personal.machine_learning.definition_identification
Functions for finding definitions
Gather ML data from information notes
definitions_in_text
definitions_in_text (text:str)
Return a list of the definitions (each a `str`) found in `text`.
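The documentation does not say how a definition is marked in the text. Assuming, as a hypothesis, that definitions are the spans wrapped in `**...**` (Markdown bold), a minimal regex-based sketch of this behavior could look as follows. `definitions_in_text_sketch` is a hypothetical name, not trouver's implementation.

```python
import re

# Assumption: a definition is any span wrapped in double asterisks,
# e.g. **topological space**. trouver's actual convention may differ.
BOLD_PATTERN = re.compile(r"\*\*(.+?)\*\*")


def definitions_in_text_sketch(text: str) -> list[str]:
    """Return the bold-marked spans of `text`, in order of appearance."""
    # The non-greedy `.+?` stops at the first closing `**`, so two bold
    # spans on one line are returned separately.
    return BOLD_PATTERN.findall(text)
```

For example, `definitions_in_text_sketch("A **group** is a set with an operation.")` returns `['group']`.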
definition_identification_data_from_note
definition_identification_data_from_note (note:trouver.markdown.obsidian.vault.VaultNote, vault:os.PathLike)
Obtain definition identification data from the information note.
| | Type | Details |
|---|---|---|
| note | VaultNote | |
| vault | PathLike | |
| Returns | typing.Optional[dict[str, str]] | The keys of the dict are “Note name”, “Raw text”, and “Definitions”. However, `None` is returned if `note` does not exist. |
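To illustrate the shape of the returned dict, the following sketch reads a Markdown file directly instead of a `VaultNote`. Several parts are assumptions: the function name `data_from_note_sketch` is hypothetical, definitions are taken to be the `**bold**` spans, "Raw text" is simply the file contents, and the definitions list is serialized with `str(...)` only because the documented return type is `dict[str, str]`.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: definitions are the spans wrapped in **double asterisks**.
BOLD_PATTERN = re.compile(r"\*\*(.+?)\*\*")


def data_from_note_sketch(note_path: Path) -> Optional[dict[str, str]]:
    """Hypothetical stand-in for `definition_identification_data_from_note`.

    Takes a path to a Markdown file rather than a `VaultNote` and a vault.
    """
    if not note_path.exists():
        return None  # mirrors the documented behavior for missing notes
    raw_text = note_path.read_text(encoding="utf-8")
    return {
        "Note name": note_path.stem,  # file name without the `.md` suffix
        "Raw text": raw_text,
        # str(...) keeps every value a str, matching dict[str, str]
        "Definitions": str(BOLD_PATTERN.findall(raw_text)),
    }
```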
# TODO: examples
gather_definition_identification_data
gather_definition_identification_data (vault:os.PathLike, notes:list[trouver.markdown.obsidian.vault.VaultNote])
Return a `pandas.DataFrame` encapsulating the definition identification data of `notes`.
cf. `definition_identification_data_from_note`, which is the function that draws the definition identification data from each note.
This function is mainly used in `append_to_definition_identification_database`.
| | Type | Details |
|---|---|---|
| vault | PathLike | |
| notes | list | |
| Returns | DataFrame | |
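A sketch of how per-note records could be collected into a `pandas.DataFrame`, skipping notes for which the per-note function returns `None`. `gather_sketch` is hypothetical, and the per-note function is passed in as a parameter (a stand-in for `definition_identification_data_from_note`) so that the example runs without an Obsidian vault.

```python
from typing import Callable, Iterable, Optional

import pandas as pd

# Column names taken from the documented keys of the per-note dict.
COLUMNS = ["Note name", "Raw text", "Definitions"]


def gather_sketch(
    notes: Iterable[str],
    data_from_note: Callable[[str], Optional[dict[str, str]]],
) -> pd.DataFrame:
    """Collect one row per existing note into a DataFrame."""
    rows = []
    for note in notes:
        row = data_from_note(note)
        if row is not None:  # missing notes are skipped
            rows.append(row)
    # Passing `columns` keeps the schema fixed even when `rows` is empty.
    return pd.DataFrame(rows, columns=COLUMNS)
```

Fixing the column list means that an input with no existing notes still yields an empty `DataFrame` with the expected columns, so a caller can simply test `.empty` on the result.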
# TODO: examples
import os
from os import PathLike
from pathlib import Path

import pandas as pd

from trouver.markdown.obsidian.vault import VaultNote


def append_to_definition_identification_database(
        vault: PathLike, # The vault from which the data is drawn
        file: PathLike, # The path to a CSV file
        notes: list[VaultNote], # The notation notes to consider adding to the database
        backup: bool = True # If `True`, makes a copy of `file` in the same directory and with the same name, except with an added extension of `.bak`.
        ) -> None:
    """
    Either create a CSV file containing data for definition
    identification or append to an existing CSV file.

    The columns of the database file are as follows:

    - `Time added` - The time when the row was added.
    - `Time modified` - The time when the labels of the row were last modified.
    - `Notation note name` - The name of the note from which the data for the row
      was derived.
    - `Notation` - The notation which is being summarized.
    - `Latex in original` - The entry of the `latex_in_original` field of the
      note if available, cf. `make_a_notation_note`.
    - `Summary` - The summary of the notation.
    - `Main note name` - The name of the main note of the notation note.
    - `Processed main note contents` - The processed contents of the main note.

    All timestamps are in UTC and are precise to the minute
    (i.e. no seconds or microseconds).

    TODO: implement updating rows and rewrite the next paragraph to
    accurately reflect the implementation. I would like `Notation`,
    `Latex in original`, `Summary`, and `Processed main note contents`
    to be the "pivot_cols".

    If a "new" note has the same processed content as a pre-existing
    note and anything about the "new" note is different, then the row of
    the existing note is updated. In particular, the following are updated:

    - `Time modified` (set to the current time)
    - `Notation` (overwritten)
    - `Latex in original` (overwritten)
    - `Summary` (overwritten)
    - `Main note name` (overwritten)
    - `Processed main note contents` (overwritten)

    This function assumes that, if the CSV file exists, the processed
    contents of its rows are all distinct.
    """
    if not notes:
        return
    file = Path(file)
    # The existing database, if there is one
    df = pd.read_csv(file) if os.path.exists(file) else None
    # The rows drawn from `notes`
    new_df = gather_definition_identification_data(vault, notes)
    if new_df.empty:
        return
    cols = [
        ...
    ]
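The docstring above describes an intended policy: timestamps in UTC truncated to the minute, a `.bak` copy when `backup=True`, and updating (rather than duplicating) a row whose key column matches an incoming row. The following is a minimal stdlib sketch of that policy under stated assumptions; it uses `csv` instead of pandas, and `append_or_update_csv`, `utc_now_to_minute` and the `key` parameter are hypothetical names, not trouver's API. Note that the docstring marks row updating as not yet implemented.

```python
import csv
import shutil
from datetime import datetime, timezone
from pathlib import Path


def utc_now_to_minute() -> str:
    """Current UTC time as ISO 8601, with seconds and microseconds dropped."""
    return datetime.now(timezone.utc).replace(second=0, microsecond=0).isoformat()


def append_or_update_csv(
    file: Path,
    new_rows: list[dict[str, str]],
    key: str,  # hypothetical: column used to match rows, assumed distinct
    backup: bool = True,
) -> None:
    """Append new rows to `file`, or update existing rows with the same key.

    Assumes every row (old and new) has the same data columns.
    """
    if not new_rows:
        return
    data_cols = list(new_rows[0].keys())
    fieldnames = ["Time added", "Time modified"] + data_cols
    existing: dict[str, dict[str, str]] = {}
    if file.exists():
        if backup:
            # e.g. data.csv -> data.csv.bak in the same directory
            shutil.copy(file, file.with_name(file.name + ".bak"))
        with file.open(newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                existing[row[key]] = row
    stamp = utc_now_to_minute()
    for new in new_rows:
        old = existing.get(new[key])
        if old is None:
            # Unseen key: append with both timestamps set to now.
            existing[new[key]] = {"Time added": stamp, "Time modified": stamp, **new}
        elif any(old.get(c) != new[c] for c in data_cols):
            # Known key with changed data: overwrite and bump `Time modified`.
            old.update(new)
            old["Time modified"] = stamp
    with file.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(existing.values())
```

Rewriting the whole file keeps row order stable (existing rows first, then new ones) at the cost of rereading it on every call, which is usually acceptable for a hand-labelled dataset.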