markdown.markdown.file

Functions for parsing markdown files
from collections import OrderedDict
import os
from os import PathLike
from pathlib import Path
import shutil
import tempfile

import yaml
from fastcore.test import *
from nbdev.showdoc import show_doc
from trouver.markdown.obsidian.vault import VaultNote

Frontmatter meta


source

find_front_matter_meta_in_markdown_text

 find_front_matter_meta_in_markdown_text (text:str)

Return ranges in the markdown text string where front matter meta occurs.

text is assumed to start with the front matter meta. The front matter meta is of the following YAML format.

---
key: entry
---
Type Details
text str
Returns Union[tuple[int], None] A tuple (a, b) such that text[a:b] is the front matter meta, or None if text does not start with front matter meta. There is at most one such range because the front matter must come at the start of the Markdown document.

The frontmatter meta in an Obsidian Markdown note is surrounded by --- and must be at the top. We can identify the frontmatter meta in some text:

sample_text = r'''---
cssclass: clean-embeds
aliases: []
tags: [_meta/TODO/change_title, _meta/definition, _meta/literature_note]
---
# Topic[^1]

This is some text. Lalalalala

# See Also

# Meta
## References
![[_reference_foag]]

## Citations and Footnotes
[^1]: Some citation'''

start, end = find_front_matter_meta_in_markdown_text(sample_text)
just_frontmatter = sample_text[start:end]
print(just_frontmatter)
assert just_frontmatter.startswith('---')
assert just_frontmatter.endswith('---')
assert 'cssclass:' in just_frontmatter
assert 'aliases:' in just_frontmatter
assert 'tags:' in just_frontmatter
---
cssclass: clean-embeds
aliases: []
tags: [_meta/TODO/change_title, _meta/definition, _meta/literature_note]
---

If there is no frontmatter meta, then find_front_matter_meta_in_markdown_text returns None.

sample_text = r'''
There is no frontmatter meta here.
'''

assert find_front_matter_meta_in_markdown_text(sample_text) is None


sample_text = r'''
---
aliases: []
--
Notice that the front matter meta is incorrectly formatted!
'''

assert find_front_matter_meta_in_markdown_text(sample_text) is None
# hide 
# Some additional tests
sample_text_1 = ''
assert find_front_matter_meta_in_markdown_text(sample_text_1) is None
sample_text_2 = "hello I know a song that gets on everybody\'s nose"
assert find_front_matter_meta_in_markdown_text(sample_text_2) is None
sample_text_3 = '---\n---'
start, end = find_front_matter_meta_in_markdown_text(sample_text_3)
assert sample_text_3[start:end] == sample_text_3
sample_text_4 = '---\n---\n---'
start, end = find_front_matter_meta_in_markdown_text(sample_text_4)
assert sample_text_4[start:end] == '---\n---'
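
The behavior shown in the examples above can be approximated with a short line-based scan. The following is a minimal sketch, not the library's implementation: the name front_matter_span is hypothetical, and the sketch assumes that a delimiter line consists of exactly --- (optionally followed by trailing whitespace).

```python
def front_matter_span(text):
    """Hypothetical sketch: return (start, end) of the front matter, or None.

    Assumes the front matter must begin on the very first line and that a
    delimiter line is exactly '---', possibly with trailing whitespace.
    """
    lines = text.split('\n')
    if lines[0].rstrip() != '---':
        return None
    # Character offset at which the line currently being inspected starts.
    offset = len(lines[0]) + 1
    for line in lines[1:]:
        if line.rstrip() == '---':
            # End just after the three dashes of the closing delimiter.
            return (0, offset + 3)
        offset += len(line) + 1
    # An opening delimiter without a closing one is not front matter.
    return None


sample = '---\nkey: entry\n---\n# Topic'
start, end = front_matter_span(sample)
print(sample[start:end])
```

Because the scan stops at the first closing delimiter, a text such as '---\n---\n---' yields only the first two lines, which agrees with the additional tests above.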

source

dict_to_metadata_lines

 dict_to_metadata_lines (data:dict[str,typing.Union[str,list[str]]],
                         enquote_entries_in_fields=list[str])

Convert a dict to a list of str of yaml frontmatter metadata that Obsidian recognizes.

This function is used in MarkdownFile.replace_metadata.

Type Default Details
data dict[str, Union[str, list[str]]] The keys are str of the labels/names of the metadata. The values are the metadata, which are usually str or list.
enquote_entries_in_fields GenericAlias list A list of str of fields in the YAML metadata whose entries need to be enquoted. If there is a string that is not a key of data, then that string is essentially ignored (in particular, no errors are raised).
Returns list[str] Each str entry is the line for the yaml frontmatter metadata of an Obsidian Markdown note.

dict_to_metadata_lines takes a dictionary and converts it to a list of lines usable as Obsidian Markdown frontmatter meta.

sample_dict = OrderedDict([
            ('cssclass', 'clean-embeds'),
            ('aliases', []),
            ('tags', ['_meta/literature_note', '_meta/research', '_meta/self_written'])])
sample_output = dict_to_metadata_lines(sample_dict, [])
test_eq(sample_output,
    ['cssclass: clean-embeds', 'aliases: []', 'tags: [_meta/literature_note, _meta/research, _meta/self_written]'])

# sample_lines = dict_to_metadata_lines(sample_dict)
# sample_output = 
# print('\n'.join(sample_lines))
# assert sample_lines == sample_output

We can specify fields whose entries are to be enquoted. This might be necessary if a string needs to be escaped (because it contains a character such as the backslash r'\', the vertical bar r'|', a square bracket r'[', etc.). For example, if a field is expected to hold LaTeX strings, then it is good practice to have its entries enquoted:

sample_dict = OrderedDict([
    ('latex_in_original', [r'\\mathscr{O}_{\\text {Proj } S_{*}}(n)'])
])
sample_output = dict_to_metadata_lines(sample_dict, enquote_entries_in_fields=['latex_in_original'])
test_eq(sample_output, ['latex_in_original: ["\\\\\\\\mathscr{O}_{\\\\\\\\text {Proj } S_{*}}(n)"]'])

# An example of an entry that is not a string
sample_dict = OrderedDict([
    ('latex_in_original', [1])
])
sample_output = dict_to_metadata_lines(sample_dict, enquote_entries_in_fields=['latex_in_original'])
test_eq(sample_output, ['latex_in_original: ["1"]'])
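
To make the conversion rule concrete, here is a minimal sketch of how such lines could be produced. The names metadata_lines and _render are hypothetical, and the sketch is an assumption about the rule rather than the library's code: list values are written in flow style, and entries of enquoted fields are converted to str, have their backslashes doubled, and are wrapped in double quotes. Escaping embedded double quotes is an extra assumption not demonstrated above.

```python
def _render(entry, quote):
    """Render one entry, optionally as a double-quoted, escaped string."""
    text = str(entry)
    if not quote:
        return text
    # Double the backslashes first, then escape embedded double quotes.
    escaped = text.replace('\\', '\\\\').replace('"', '\\"')
    return f'"{escaped}"'


def metadata_lines(data, enquote_fields=()):
    """Hypothetical sketch: turn a dict into YAML-like front matter lines."""
    lines = []
    for key, value in data.items():
        quote = key in enquote_fields
        if isinstance(value, list):
            rendered = '[' + ', '.join(_render(v, quote) for v in value) + ']'
        else:
            rendered = _render(value, quote)
        lines.append(f'{key}: {rendered}')
    return lines


print(metadata_lines({'aliases': [], 'tags': ['_meta/research']}))
print(metadata_lines({'latex': [r'\alpha']}, enquote_fields=['latex']))
```

Field names in enquote_fields that are not keys of the dict are simply never looked up, which mirrors the "essentially ignored" behavior described in the parameter table.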

source

parse_metadata_string

 parse_metadata_string (metadata_str:str, raise_error:bool=True,
                        raise_warning:bool=True)

Attempt to parse the string for YAML frontmatter metadata of an Obsidian Markdown note.

Raises

  • ValueError
    • If raise_error is True and if any yaml.YAMLError exceptions are raised when reading (i.e. parsing or scanning) the YAML metadata. In doing so, metadata_str is printed. Moreover, the appropriate yaml.YAMLError (e.g. a yaml.parser.ParserError, yaml.scanner.ScannerError, or yaml.reader.ReaderError) is also raised.
  • Warning
    • If raise_error is False and raise_warning is True and if any yaml.YAMLError exceptions are raised when reading.
Type Default Details
metadata_str str The string for YAML frontmatter metadata of an Obsidian Markdown note
raise_error bool True If True, then raise an Error.
raise_warning bool True If raise_error is False and raise_warning is True, then raise a warning message.
Returns Union[dict[str], None] The keys are str of the labels/names of the metadata. The values are the metadata, which are usually str or list. If the YAML metadata string cannot be parsed, then this return value is None.

The parse_metadata_string function attempts to parse the string for YAML frontmatter metadata of an Obsidian Markdown note. If the argument can be parsed via the yaml.safe_load function, then the output of the yaml.safe_load invocation is returned:

good_metadata_str = r"""
field: [hi, bye]
fieldy: ooh
"""

metadata_output = parse_metadata_string(good_metadata_str)
assert type(metadata_output) == dict
test_eq(metadata_output, {'field': ['hi', 'bye'], 'fieldy': 'ooh'})

If the YAML frontmatter metadata string cannot be parsed by the yaml.safe_load function, then an error or a warning can optionally be raised. If raise_error is False, then parse_metadata_string returns None.

bad_metadata_str = "some_metadata_field: [\badly_formatted_string]"  # ReaderError
with (ExceptionExpected(ValueError)):
    metadata_output = parse_metadata_string(bad_metadata_str, raise_error=True)
# bad_metadata_str = "field: field2:" # ScannerError

test_warns(lambda: parse_metadata_string(bad_metadata_str, raise_error=False, raise_warning=True))

metadata_output = parse_metadata_string(bad_metadata_str, raise_error=False, raise_warning=False)
test_eq(metadata_output, None)
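
The interplay of raise_error and raise_warning can be sketched independently of YAML. The following is an illustration only: parse_or_none is a hypothetical name, and the standard-library json.loads stands in for yaml.safe_load so that the sketch runs without extra dependencies. The control flow is an assumption based on the documented behavior above.

```python
import json
import warnings


def parse_or_none(text, loader=json.loads, raise_error=True, raise_warning=True):
    """Hypothetical sketch of the error / warning / None control flow.

    `loader` is any callable that raises a ValueError subclass on bad input;
    json.loads is used here only as a stand-in for a YAML loader.
    """
    try:
        return loader(text)
    except ValueError as error:  # json.JSONDecodeError subclasses ValueError
        if raise_error:
            raise ValueError(f'Could not parse metadata: {text!r}') from error
        if raise_warning:
            warnings.warn(f'Could not parse metadata: {text!r}')
        return None


print(parse_or_none('{"field": ["hi", "bye"]}'))
print(parse_or_none('{', raise_error=False, raise_warning=False))
```

Raising takes precedence over warning, so raise_warning only matters when raise_error is False.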
yaml_string = 'latex_from_original: ["[t]", "hi"]'
data = yaml.safe_load(yaml_string)
yaml.dump(data, default_flow_style=True)
"{latex_from_original: ['[t]', hi]}\n"
yaml.dump(r'\begin{align*}  \end{align*}')
'\\begin{align*}  \\end{align*}\n...\n'
yaml.dump('[t]')
"'[t]'\n"

MarkdownFile class


source

MarkdownLineEnum

 MarkdownLineEnum (value, names=None, module=None, qualname=None,
                   type=None, start=1)

An enumeration.

The following are the members of the MarkdownLineEnum class:

for line_type in MarkdownLineEnum:
    print(line_type.name)
DEFAULT
HEADING
CODE_BLOCK
META
ORDERED_LIST
UNORDERED_LIST
BLOCKQUOTE
HORIZONTAL_RULE
COMMENT
BLANK_LINE
UNKNOWN
FOOTNOTE_DESCRIPTION
DISPLAY_LATEX_SINGLE
DISPLAY_LATEX_START
DISPLAY_LATEX_END
DISPLAY_LATEX

source

MarkdownFile

 MarkdownFile (parts:list[dict[str,Union[MarkdownLineEnum,str]]])

Parses and represents the contents of an Obsidian styled Markdown file.

The Markdown file must be formatted in certain ways. In general, text components of different types (see MarkdownLineEnum) must be on different lines - no text components of different types may occupy the same line. In particular,

  • Comments (surrounded by %%) must not be on the same line as non-comments.
  • Display math mode LaTeX (surrounded by $$) must not be on the same line as text that is not display math mode LaTeX.

Attributes

  • text - str
  • parts - list[dict[str, Union[MarkdownLineEnum, str]]]
    • Represents the lines of the markdown file. Each dict has two keys, 'type' and 'line', which respectively hold a MarkdownLineEnum and a str as values. By default, each value of 'line' does not include a newline character \n at the end.
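
To illustrate the shape of parts, the sketch below classifies each line of a text into one of a few line types and stores it in a dict with the keys 'type' and 'line'. It is a simplified stand-in, not the class's parser: LineType and split_into_parts are hypothetical names, only four of the line types are modeled, and treating the --- delimiter lines themselves as META is an assumption.

```python
from enum import Enum, auto
import re


class LineType(Enum):
    """Hypothetical, reduced stand-in for MarkdownLineEnum."""
    DEFAULT = auto()
    HEADING = auto()
    META = auto()
    BLANK_LINE = auto()


# One to six sharps, then whitespace, then some heading text.
HEADING_PATTERN = re.compile(r'#{1,6}\s+\S')


def split_into_parts(text):
    """Return a list of {'type': LineType, 'line': str}, one per line."""
    lines = text.split('\n')
    parts = []
    # Front matter is only recognized when it starts on the first line.
    in_meta = lines[0].strip() == '---'
    for index, line in enumerate(lines):
        if in_meta:
            kind = LineType.META
            if index > 0 and line.strip() == '---':
                in_meta = False  # the closing delimiter ends the front matter
        elif not line.strip():
            kind = LineType.BLANK_LINE
        elif HEADING_PATTERN.match(line):
            kind = LineType.HEADING
        else:
            kind = LineType.DEFAULT
        parts.append({'type': kind, 'line': line})
    return parts


for part in split_into_parts('---\ntags: []\n---\n# Topic\n\nSome text'):
    print(part['type'].name, repr(part['line']))
```

Since no 'line' value carries a trailing newline, joining the 'line' values with '\n' reproduces the original text. A front matter block that is never closed would cause every line to be labeled META in this sketch.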

Example text and example vault used in the rest of the page

The examples demonstrated for the MarkdownFile class here will be based upon the following text:

template_text = """---
cssclass: clean-embeds
aliases: []
tags: [_meta/literature_note]
---
# Topic[^1]

# See Also

# Meta
## References

## Citations and Footnotes
[^1]: Citation"""


text_1 = """
# Section 1
some text 

asdfasdf

## Subsection a
Didididi
Dododododo
# Section 2
"""

text_2 = """---
tags: [_meta/definition, _meta/concept, _auto/_meta/notation, this_tag_will_be_removed]
---
# Topic
This is some note with some stuff.
"""

text_3 = """
# Section 1
Some stuff

#_meta/question Will `remove_in_line_tags` remove this line? Yes it will!

# Hello

 #tag Will `remove_in_line_tags` remove this line? Yes it will!
"""

text_4 = """
# Some thing

I have a [[this is a note#this is an anchor in the note|link]]

## Another topic
This is a link without a specified display text: [[some_kind_of_note]].

This is a link to an anchor without a specified display text: [[another_note#another anchor]].
"""
    
text_5 = """# A header
![[This note is embedded]].

The link above should will not be replaced by `replace_links_with_display_text`,
unless `remove_embedded_note_links` is set to `True`."""
    
text_6 = """
# Header

I want to link to some embedded note[^1]

[^1]: ![[link_to_embedded_note_1]]

You can also let the footnote mention be alphanumeric[^1][^note]

[^note]: ![[link_to_embedded_note_2]]
"""
    
text_7 = r"""---
cssclass: clean-embeds
aliases: []
tags: [_meta/concept, _meta/literature_note]
---
# Grothendieck-Witt ring elements of a finite field are given by rank and discriminant up to squares[^1]

$\operatorname{GW}(\mathbb{F}_q)\cong \mathbb{Z}\times \mathbb{F}_q^{\times}/(\mathbb{F}_q^{\times})^2$[^2] where the isomorphism is given by the rank and discriminant.


[^2]: ![[notation_GW_k_Grothendieck_Witt_ring_of_a_field]]


# See Also
- [[grothendieck_witt_ring_of_a_polynomial_ring_over_a_field_is_isomorphic_to_that_of_the_field]]

# Meta
## References
![[_reference_pauli_wickelgren_aa1]]


## Citations and Footnotes
[^1]: Pauli, Wickelgren, Example 3.7, Page 4"""


text_8 = r"""---
cssclass: clean-embeds
---
# Topic[^1]
Here is a LaTeX Equation:

$$ 5 \neq 7$$
Hey
Okay, now here is another one:
$$\begin{align*}
\sum_{k=1}^n k = \frac{n(n+1)}{2}
\end{align*}$$

The comment is not visible.
This is the end of the comment %%

This is the end of this note. This is visible.
"""

text_9 = r"""
This LaTeX Equation has `**` surrounding it:
**$$\mathcal{O}_X$$**

This LaTeX Equation has an id:
$$5 \neq 7$$ ^221b51

This LaTeX Equation also has an id:
$$5 \neq 7
$$ ^221b51
This is the end
"""

text_10 = r"""This is a single line display math mode LaTeX equation:

$$\mathcal{O}_X$$

This is a single multi-line display math mode LaTeX equation:

$$
5 + 2 = 7
$$

These are multiple consecutive display math mode LaTeX equations:

$$1+1 = 2
$$
$$5 + 7 = 14$$
$$

8 + 4 = 12

$$
"""

text_11 = r"""$$asdf$$
$$asdf$$
$$asdf$$
After text."""

text_12 = r"""

"""

We also use multiple example vaults.

The following vault will be used mainly for some basic file interactions of the MarkdownFile class.

.
└── algebraic_geometry
    ├── a1_homotopy_theory
    │   ├── pauli_wickelgren_aa1
    │   │   ├── 3_the_grothendieck_witt_ring_of_k
    │   │   │   └── pauli_wickelgren_aa1_example 3.7.md
    │   │   └── _index_pauli_wickelgren_aa1.md
    │   └── _index_a1_homotopy_theory.md
    └── _index_algebraic_geometry.md

pauli_wickelgren_aa1_example 3.7.md will contain the contents of text_7.

def make_example_vault(temp_dir: PathLike):
    temp_dir = Path(temp_dir)
    os.mkdir(temp_dir / 'algebraic_geometry')
    os.mkdir(temp_dir / 'algebraic_geometry' / 'a1_homotopy_theory')
    os.mkdir(temp_dir / 'algebraic_geometry' / 'a1_homotopy_theory' / 'pauli_wickelgren_aa1')
    os.mkdir(temp_dir / 'algebraic_geometry' / 'a1_homotopy_theory' / 'pauli_wickelgren_aa1' / '3_the_grothendieck_witt_ring_of_k')

    (temp_dir / 'algebraic_geometry' / '_index_algebraic_geometry.md').touch()
    (temp_dir / 'algebraic_geometry' / 'a1_homotopy_theory' / '_index_a1_homotopy_theory.md').touch()
    (temp_dir / 'algebraic_geometry' / 'a1_homotopy_theory' / 'pauli_wickelgren_aa1' / '_index_pauli_wickelgren_aa1.md').touch()
    (temp_dir / 'algebraic_geometry' / 'a1_homotopy_theory' / 'pauli_wickelgren_aa1' / '3_the_grothendieck_witt_ring_of_k' / 'pauli_wickelgren_aa1_example 3.7.md').touch()

    with open((temp_dir / 'algebraic_geometry' / 'a1_homotopy_theory' / 'pauli_wickelgren_aa1' / '3_the_grothendieck_witt_ring_of_k' / 'pauli_wickelgren_aa1_example 3.7.md'), 'w') as writer:
        writer.write(
            text_7
        )

# with tempfile.TemporaryDirectory(prefix='temp_dir', dir=os.getcwd()) as temp_dir:
#     make_example_vault(temp_dir)
#     os.startfile(os.getcwd())
#     input()

We use the following example vault to demonstrate examples of embedded notes and other functionalities:

.
├── note_which_becomes_entirely_embedded_1.md
├── note_with_embedded_links_1.md
└── note_with_paragraphs_that_are_embedded_1.md
def make_example_vault_2(temp_dir: PathLike):
    temp_dir = Path(temp_dir)

    file_1 = temp_dir / 'note_which_becomes_entirely_embedded_1.md'
    file_2 = temp_dir / 'note_with_embedded_links_1.md'
    file_3 = temp_dir / 'note_with_paragraphs_that_are_embedded_1.md'

    file_1.touch()
    file_2.touch()
    file_3.touch()

    with open(file_1, 'w') as writer:
        writer.write(
            r"""Hello, this is a note which becomes entirely embedded.

The comment is not visible.
This is the end of the comment %%"""
        )
        
    with open(file_2, 'w') as writer:
        writer.write(
            r"""This is a note.

There are some embedded text here:

![[note_which_becomes_entirely_embedded_1]]


![[note_with_paragraphs_that_are_embedded_1#^65809f]]

![[note_with_paragraphs_that_are_embedded_1#^221b51]]

![[note_with_paragraphs_that_are_embedded_1#Section]]"""
        )

    
    with open(file_3, 'w') as writer:
        writer.write(
            r"""This paragraph becomes embedded.

# Thank you for watching 
cheese 
bandit
$$asdf$$
asdf 
^65809f



This paragraph has not id.

$$5 \neq 7
$$ 
^221b51

# This section has an id ^123456

# This section has no id 
^fff123

$$\mathcal{O}_X$$  ^latexthing

# Section
Some kind of section?

Lalalala
## Subsection
argonaut
# Section
Maybe?
""")

Constructing a MarkdownFile object


source

MarkdownFile.from_vault_note

 MarkdownFile.from_vault_note
                               (vn:trouver.markdown.obsidian.vault.VaultNo
                               te)

Return a MarkdownFile object from a VaultNote object.

Raises

  • FileNotFoundError
    • If vn represents a note file which does not exist.


source

MarkdownFile.from_file

 MarkdownFile.from_file (file_path:os.PathLike)

Return a MarkdownFile object from a specified file.

Raises

  • FileNotFoundError
    • If file_path points to a file which does not exist.


source

MarkdownFile.from_list

 MarkdownFile.from_list (list_of_lines:list[str])

Return a MarkdownFile object from a list of lines.

This may not work correctly if the markdown text is not sufficiently well-formatted. These formattings include:

  • comments must start the line with '%%'.
  • comments must end with '%%' followed by whitespaces and nothing else.
  • indents should be done with tabs?


source

MarkdownFile.from_string

 MarkdownFile.from_string (text:str)

Return a MarkdownFile object from a str.

The most convenient way to construct a MarkdownFile object is by the MarkdownFile.from_vault_note factory method.

with tempfile.TemporaryDirectory(prefix='temp_dir_', dir=os.getcwd()) as temp_dir:
    make_example_vault(temp_dir)

    vault_note = VaultNote(temp_dir, name='pauli_wickelgren_aa1_example 3.7')
    mf = MarkdownFile.from_vault_note(vault_note)
    test_eq(str(mf), text_7)

    vault_note = VaultNote(temp_dir, rel_path='does_not_exist.md')
    assert not vault_note.exists()
    with ExceptionExpected(ex=FileNotFoundError):
        mf = MarkdownFile.from_vault_note(vault_note)

We can similarly construct a MarkdownFile by the MarkdownFile.from_file factory method without having to use a VaultNote object.

with tempfile.TemporaryDirectory(prefix='temp_dir_', dir=os.getcwd()) as temp_dir:
    make_example_vault(temp_dir)

    temp_dir = Path(temp_dir)
    mf = MarkdownFile.from_file(temp_dir / 'algebraic_geometry' / 'a1_homotopy_theory' / 'pauli_wickelgren_aa1' / '3_the_grothendieck_witt_ring_of_k' / 'pauli_wickelgren_aa1_example 3.7.md')

    assert not os.path.exists(temp_dir / 'does_not_exist.md')
    with ExceptionExpected(ex=FileNotFoundError):
        mf = MarkdownFile.from_file(temp_dir / 'does_not_exist.md')

If the list of lines of the Markdown file is available, then the MarkdownFile.from_list factory method can be used.

Similarly, if the entire string of the Markdown file is available, then the MarkdownFile.from_string factory method can be used.

list_of_lines = template_text.splitlines()
template_mf_1 = MarkdownFile.from_list(list_of_lines)
template_mf_2 = MarkdownFile.from_string(template_text)

print(str(template_mf_1))
test_eq(str(template_mf_1), str(template_mf_2))
---
cssclass: clean-embeds
aliases: []
tags: [_meta/literature_note]
---
# Topic[^1]

# See Also

# Meta
## References

## Citations and Footnotes
[^1]: Citation

Getting headings of a MarkdownFile object

In a Markdown file, one can set headings. In fact, you can consider the text here as text rendered with Markdown! More generally, you can type Markdown in Jupyter notebooks.

For example, typing the following text

There are multiple methods in the MarkdownFile class which retrieve the headings of a Markdown file and their locations.

template_mf = MarkdownFile.from_string(template_text)

source

MarkdownFile.get_headings

 MarkdownFile.get_headings (levels:Union[int,Iterator[int],NoneType]=None,
                            include_start:bool=True)

Return a list of heading titles in the markdown file.

Type Default Details
levels Union[int, Iterator[int], None] None The levels of the headings to search for. Each int is between 1 and 6 inclusive, as each heading can be of levels 1 to 6. Defaults to None, in which case all heading-levels are searched.
include_start bool True If True and if this object contains text that is not under a heading (i.e. the text does not start with a heading), then include the empty str '' in the returned list.
Returns list[str] Each str is the heading, including leading sharps '#'.

The get_headings function returns only a list of headings.

print(template_mf.get_headings())
assert template_mf.get_headings() == [
    '# Topic[^1]', '# See Also', '# Meta', '## References', '## Citations and Footnotes', '']

assert template_mf.get_headings((3,4,6), include_start=True) == ['']
['# Topic[^1]', '# See Also', '# Meta', '## References', '## Citations and Footnotes', '']

The empty heading '' is returned when include_start=True and there is text belonging to no heading - this can only happen at the start of the Markdown file before any headings are specified. Any YAML frontmatter meta is considered as “text belonging to no heading”.

Setting include_start=False excludes the empty heading altogether.

assert template_mf.get_headings(include_start=False) == [
    '# Topic[^1]', '# See Also', '# Meta', '## References', '## Citations and Footnotes']

The parameter levels specifies which level headings to return. The argument passed to levels does not affect whether or not the empty heading '' is included.

assert template_mf.get_headings(levels=1, include_start=True) == [
    '# Topic[^1]', '# See Also', '# Meta', '']
assert template_mf.get_headings(levels=1, include_start=False) == [
    '# Topic[^1]', '# See Also', '# Meta']
# Since list(range(2,6)) == [2, 3, 4, 5], the below returns all headings of levels 2, 3, 4, 5.
assert template_mf.get_headings(levels=range(2,6), include_start=True) == [
    '## References', '## Citations and Footnotes', '']
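
The effect of levels can be sketched with a regular expression that counts leading sharps. This is an illustration under assumptions, not the method's implementation: heading_titles is a hypothetical name, the empty heading controlled by include_start is not modeled, and lines inside code blocks are not excluded.

```python
import re

# Group 1 captures the sharps, so its length is the heading level.
HEADING_PATTERN = re.compile(r'(#{1,6})\s+\S')


def heading_titles(text, levels=None):
    """Hypothetical sketch: list the headings whose level is in `levels`."""
    if isinstance(levels, int):
        levels = [levels]
    wanted = set(levels) if levels is not None else set(range(1, 7))
    found = []
    for line in text.split('\n'):
        match = HEADING_PATTERN.match(line)
        if match and len(match.group(1)) in wanted:
            found.append(line.strip())
    return found


sample = '# Topic[^1]\n\n# Meta\n## References\n\n## Citations and Footnotes'
print(heading_titles(sample, levels=1))
print(heading_titles(sample, levels=range(2, 6)))
```

Requiring whitespace after the sharps keeps Obsidian tags such as #_meta/question from being mistaken for headings.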

source

MarkdownFile.get_headings_by_line_number

 MarkdownFile.get_headings_by_line_number
                                           (levels:Union[Iterator[int],int
                                           ,NoneType]=None,
                                           include_start:bool=True)

Return a dict of heading titles in the markdown file.

Type Default Details
levels Union[Iterator[int], int, None] None The levels of the headings to search for. Each int is between 1 and 6 inclusive, as each heading can be of levels 1 to 6. If None then all heading-levels are searched.
include_start bool True If True and if this object contains text that is not under a heading (i.e. the text does not start with a heading), then include -1 as a key with the empty str as value.
Returns dict[int, str] The keys are line numbers and each value is the heading string, including the leading sharps '#', but without any leading or trailing whitespace characters.

The get_headings_by_line_number function returns a dict whose keys are the line numbers of the headings and whose corresponding values are the full heading str.

As with get_headings, setting include_start=True includes the empty heading; the corresponding line number is always -1.

line_numbers_and_headings = template_mf.get_headings_by_line_number()
print(line_numbers_and_headings)
assert line_numbers_and_headings == {
5: '# Topic[^1]',
 7: '# See Also',
 9: '# Meta',
 10: '## References',
 12: '## Citations and Footnotes',
 -1: ''}

assert template_mf.get_headings_by_line_number(include_start=False) == {
5: '# Topic[^1]',
 7: '# See Also',
 9: '# Meta',
 10: '## References',
 12: '## Citations and Footnotes', }
{5: '# Topic[^1]', 7: '# See Also', 9: '# Meta', 10: '## References', 12: '## Citations and Footnotes', -1: ''}
# TODO: add more examples

source

MarkdownFile.get_headings_and_text

 MarkdownFile.get_headings_and_text
                                     (levels:Union[Iterator[int],int,NoneT
                                     ype]=None, include_start:bool=True)

Return a list of headings and the text under each heading.

The text under each heading does not include the text of subheadings.

Type Default Details
levels Union[Iterator[int], int, None] None The levels of the headings to search for. Each int is between 1 and 6 inclusive, as each heading can be of levels 1 to 6. If None, then all heading-levels are searched.
include_start bool True If True and if this object contains text that is not under a heading (i.e. the text does not start with a heading), then include the empty str as a key, with that text as its value.
Returns dict[str, str] Each key is the entire str of the heading, including the leading sharps '#', but not including leading or trailing whitespace characters. Each value is the str under that heading until the next heading, including any trailing newline characters \n. If include_start is True, then one of the keys is the empty str and the corresponding value is the start of the text that is not under any heading.

The get_headings_and_text function returns a dict whose keys are full headers and whose values are text under the headers.

headings_and_text = template_mf.get_headings_and_text()
print(headings_and_text)
assert headings_and_text == { '': '---\ncssclass: clean-embeds\naliases: []\ntags: [_meta/literature_note]\n---',
 '# Topic[^1]': '',
 '# See Also': '',
 '# Meta': '',
 '## References': '',
 '## Citations and Footnotes': '[^1]: Citation' }

print(template_mf.get_headings_and_text(None, False))
assert template_mf.get_headings_and_text(None, False) == {
 '# Topic[^1]': '',
 '# See Also': '',
 '# Meta': '',
 '## References': '',
 '## Citations and Footnotes': '[^1]: Citation' }
{'': '---\ncssclass: clean-embeds\naliases: []\ntags: [_meta/literature_note]\n---', '# Topic[^1]': '', '# See Also': '', '# Meta': '', '## References': '', '## Citations and Footnotes': '[^1]: Citation'}
{'# Topic[^1]': '', '# See Also': '', '# Meta': '', '## References': '', '## Citations and Footnotes': '[^1]: Citation'}
# TODO: add more examples

source

MarkdownFile.get_headings_tree

 MarkdownFile.get_headings_tree ()

Return a dict representing the tree of headings in the markdown file.

Returns

  • dict[Union[str, int], Union[str, dict]]
    • The keys are 1. line numbers or 2. the str 'title'. The values are dict or str (the blank str if root node) respectively. The dicts in themselves recursively represent trees and the str are headings, including the leading sharps. In particular, the root level dict also has the blank string '' associated to the key 'title'.
headings_tree = template_mf.get_headings_tree()
print(headings_tree)
assert headings_tree == {
    'title': '',
    5: {'title': '# Topic[^1]'},
    7: {'title': '# See Also'},
    9: {'title': '# Meta',
        10: {'title': '## References'},
        12: {'title': '## Citations and Footnotes'}
        }
}
{'title': '', 5: {'title': '# Topic[^1]'}, 7: {'title': '# See Also'}, 9: {'title': '# Meta', 10: {'title': '## References'}, 12: {'title': '## Citations and Footnotes'}}}
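
A nested dict of this shape can be built with a stack of currently open headings. The following is a minimal sketch under assumptions rather than the method's implementation: headings_tree is a hypothetical free function, line numbers are 0-indexed positions after splitting on '\n', and lines inside code blocks are not excluded.

```python
import re

# Group 1 captures the sharps, so its length is the heading level.
HEADING_PATTERN = re.compile(r'(#{1,6})\s+\S')


def headings_tree(text):
    """Hypothetical sketch: nest headings by level, keyed by line number."""
    root = {'title': ''}
    # Stack of (level, node); the root acts as a level-0 heading.
    stack = [(0, root)]
    for number, line in enumerate(text.split('\n')):
        match = HEADING_PATTERN.match(line)
        if not match:
            continue
        level = len(match.group(1))
        # Close every open heading that is not a proper ancestor.
        while stack[-1][0] >= level:
            stack.pop()
        node = {'title': line.strip()}
        stack[-1][1][number] = node
        stack.append((level, node))
    return root


sample = '# Meta\n## References\n\n## Citations and Footnotes\n# Appendix'
print(headings_tree(sample))
```

The root is never popped because every heading has level at least 1, so top-level headings always attach to the root dict.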

source

MarkdownFile.get_line_number_of_heading

 MarkdownFile.get_line_number_of_heading (title:Optional[str]=None,
                                          from_line:int=0, levels:Union[It
                                          erator[int],int,NoneType]=None)

Return the line number of the heading with the specified title after the specified line number.

Type Default Details
title Union[str, None] None Title of the heading. Does not include the leading sharps ('#'). If None, then return the line number of any heading after the specified line number.
from_line int 0 The line number to start searching for the heading with title from.
levels Union[Iterator[int], int, None] None The levels of the heading to search for. Each int is between 1 and 6 inclusive, as each heading can be of levels 1 to 6. If None, then all heading-levels are searched.
Returns int An index in self.parts. If no index/line number of the matching heading exists, then return -1.

Note that the argument to title does not include the starting hashtags #.

line_number = template_mf.get_line_number_of_heading(title='See Also')
assert line_number == 7

If the heading of the specified title does not exist, then -1 is returned.

assert template_mf.get_line_number_of_heading(title='Nonexistent title') == -1

We can search for headers of specified titles from specified lines onward:

assert template_mf.get_line_number_of_heading(title='Topic[^1]', from_line=3) == 5
assert template_mf.get_line_number_of_heading(title='Topic[^1]', from_line=6) == -1

We can also specify the levels that the header must be:

assert template_mf.get_line_number_of_heading(title='Topic[^1]', levels=(1,2,6)) == 5
assert template_mf.get_line_number_of_heading(title='Topic[^1]', levels=(3, 5)) == -1

source

MarkdownFile.get_line_numbers_under_heading

 MarkdownFile.get_line_numbers_under_heading (title:Optional[str]=None,
                                              from_line:int=0, levels:Unio
                                              n[Iterator[int],int,NoneType
                                              ]=None, include_subheadings:
                                              bool=True)

Return the line numbers belonging to the heading.

Type Default Details
title Union[str, None] None Title of the heading. Does not include the leading sharps ('#'). If None, then return the line number of any heading after the specified line number.
from_line int 0 The line number to start searching for the heading with title from.
levels Union[Iterator[int], int, None] None The levels of the heading to search for. Each int is between 1 and 6 inclusive, as each heading can be of levels 1 to 6. If None, then all heading-levels are searched.
include_subheadings bool True If True, then include the subheadings.
Returns Union[tuple[int], int] (start, end) where self.parts[start:end] represents the parts under the heading, including the start of the heading. If the heading of the specified title does not exist, then returns -1.
print(template_text)
template_mf = MarkdownFile.from_string(template_text)
assert template_mf.get_line_numbers_under_heading(title='Topic[^1]') == (5,7)
assert template_mf.get_line_numbers_under_heading(title='See Also') == (7,9)
assert template_mf.get_line_numbers_under_heading(title='Meta') == (9,14)
assert template_mf.get_line_numbers_under_heading(title='References') == (10,12)
assert template_mf.get_line_numbers_under_heading(title='Citations and Footnotes') == (12,14)
---
cssclass: clean-embeds
aliases: []
tags: [_meta/literature_note]
---
# Topic[^1]

# See Also

# Meta
## References

## Citations and Footnotes
[^1]: Citation

If include_subheadings=False, then only the line numbers of the section itself, without any subsections, are returned.

assert template_mf.get_line_numbers_under_heading(title='Topic[^1]', include_subheadings=False) == (5,7)
assert template_mf.get_line_numbers_under_heading(title='See Also', include_subheadings=False) == (7,9)
assert template_mf.get_line_numbers_under_heading(title='Meta', include_subheadings=False) == (9,10)
assert template_mf.get_line_numbers_under_heading(title='References', include_subheadings=False) == (10,12)
assert template_mf.get_line_numbers_under_heading(title='Citations and Footnotes', include_subheadings=False) == (12,14)
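
The two modes differ only in which later heading ends the section. The sketch below illustrates that rule under assumptions: section_range is a hypothetical free function operating on raw text, the from_line and levels parameters are not modeled, and the first heading whose title matches is used.

```python
import re

# Group 1 captures the sharps (the level), group 2 the title without them.
HEADING_PATTERN = re.compile(r'(#{1,6})\s+(.*\S)')


def section_range(text, title, include_subheadings=True):
    """Hypothetical sketch: return (start, end) of a section, or -1."""
    lines = text.split('\n')
    start, level = -1, 0
    for number, line in enumerate(lines):
        match = HEADING_PATTERN.match(line)
        if not match:
            continue
        if start == -1:
            if match.group(2) == title:
                start, level = number, len(match.group(1))
        elif not include_subheadings or len(match.group(1)) <= level:
            # Any heading ends the section when subheadings are excluded;
            # otherwise only a heading of the same or a higher rank does.
            return (start, number)
    return (start, len(lines)) if start != -1 else -1


sample = '# Meta\n## References\n\n## Citations and Footnotes\n[^1]: Citation'
print(section_range(sample, 'Meta'))
print(section_range(sample, 'Meta', include_subheadings=False))
```

As in the examples above, the returned pair is half-open: the line at index end is not part of the section.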

Adding/removing lines in a MarkdownFile object

# TODO examples of insert_line, remove_line, pop_line, add_line_to_end, add_blank_line_to_end, add_line_in_section

source

MarkdownFile.insert_line

 MarkdownFile.insert_line (index:int,
                           line_dict:dict[str,typing.Union[__main__.Markdo
                           wnLineEnum,str]])

Add a line at the specified index/line number to self.parts.

Type Details
index int The index at which to add line_dict into self.parts.
line_dict dict[str, Union[MarkdownLineEnum, str]] See self.parts.
Returns None

source

MarkdownFile.remove_line

 MarkdownFile.remove_line (index:int=-1)

Remove a line from self.parts.

Type Default Details
index int -1 The index of the line to remove from self.parts.
Returns None

source

MarkdownFile.remove_lines

 MarkdownFile.remove_lines (start:int, end:int)

Remove lines from self.parts.

Type Details
start int The index of the first line to remove from self.parts.
end int The end index to remove; the line of index end is not removed.
Returns None

source

MarkdownFile.pop_line

 MarkdownFile.pop_line (index:int=-1)

Remove a line from self.parts and get its value.

Type Default Details
index int -1 The index of the line to pop from self.parts.
Returns dict[str, Union[MarkdownLineEnum, str]] The popped line

source

MarkdownFile.add_line_to_end

 MarkdownFile.add_line_to_end
                               (line_dict:dict[str,typing.Union[__main__.MarkdownLineEnum,str]])

Add a line to the end of self.parts.

Type Details
line_dict dict[str, Union[MarkdownLineEnum, str]] See self.parts.
Returns None

source

MarkdownFile.add_blank_line_to_end

 MarkdownFile.add_blank_line_to_end ()

Add a blank line to the end of self.parts.


source

MarkdownFile.add_line_in_section

 MarkdownFile.add_line_in_section (title:str,
                                   line_dict:dict[str,typing.Union[__main__.MarkdownLineEnum,str]],
                                   start:bool=True)

Add a line in section specified by its title.

Type Default Details
title str Title of the heading (without the leading sharps '#')
line_dict dict[str, Union[MarkdownLineEnum, str]] The line to add
start bool True If True, add to the start of the section. If False, add to the end of the section.
Returns None
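
As a rough mental model, and assuming (the docstrings suggest this, but it is not confirmed here) that these methods behave like the corresponding Python list operations on self.parts, the following standalone sketch mimics them with a plain list of dicts. The `'type'` values are placeholder strings rather than real MarkdownLineEnum members.

```python
# Stand-in for self.parts: one dict per line. The 'type' strings are
# placeholders for MarkdownLineEnum members.
parts = [
    {'line': '# Section 1', 'type': 'HEADING'},
    {'line': 'some text', 'type': 'DEFAULT'},
    {'line': '# Section 2', 'type': 'HEADING'},
]

# insert_line(index, line_dict) is modelled by list.insert
parts.insert(1, {'line': 'inserted', 'type': 'DEFAULT'})

# add_line_to_end(line_dict) is modelled by list.append
parts.append({'line': 'last', 'type': 'DEFAULT'})

# pop_line(index) is modelled by list.pop, which returns the removed dict
popped = parts.pop(-1)

# remove_lines(start, end) is modelled by a slice deletion;
# the line at index `end` is kept
del parts[1:3]

remaining = [p['line'] for p in parts]
```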

Removing or clearing sections in a MarkdownFile object


source

MarkdownFile.remove_section

 MarkdownFile.remove_section (title:str)

Remove the section with the specified title, including subsections, if the section exists.

Type Details
title str The title of the section to remove (without the leading '#' characters)
Returns None

The remove_section method removes all lines belonging to a section, including subsections.

# TODO remove_section, clear_section, clear_all_sections
template_mf = MarkdownFile.from_string(template_text)
template_mf.remove_section('Topic[^1]')
assert len(template_mf.parts) == 12
template_mf.remove_section('Meta')  # This removes subsections too!
assert str(template_mf) == """---
cssclass: clean-embeds
aliases: []
tags: [_meta/literature_note]
---
# See Also
"""

Attempting to remove a nonexistent section does nothing.

mf_2 = MarkdownFile.from_string(template_text)
mf_2.remove_section('Non existing section')
assert str(mf_2) == template_text
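
The following standalone sketch illustrates the removal rule on a plain list of lines. It is an approximation under the assumption that a section ends at the next heading of the same or a higher level; `remove_section` here is a hypothetical free function, not the MarkdownFile method.

```python
import re

HEADING = re.compile(r'^(#+)\s+(.*)$')

def remove_section(lines, title):
    """Return a copy of `lines` without the section `title` and its subsections.

    Hypothetical helper; returns an unchanged copy if the title is not found.
    """
    for start, line in enumerate(lines):
        m = HEADING.match(line)
        if m and m.group(2).strip() == title:
            level = len(m.group(1))
            break
    else:
        return list(lines)
    end = len(lines)
    for i in range(start + 1, len(lines)):
        m = HEADING.match(lines[i])
        # The section ends at the next heading of the same or a higher level.
        if m and len(m.group(1)) <= level:
            end = i
            break
    return lines[:start] + lines[end:]

template_lines = """---
cssclass: clean-embeds
aliases: []
tags: [_meta/literature_note]
---
# Topic[^1]

# See Also

# Meta
## References

## Citations and Footnotes
[^1]: Citation""".split('\n')

after_first = remove_section(template_lines, 'Topic[^1]')  # 14 lines become 12
after_second = remove_section(after_first, 'Meta')         # drops both '##' subsections too
result = '\n'.join(after_second)
unchanged = remove_section(template_lines, 'Non existing section')
```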

source

MarkdownFile.clear_section

 MarkdownFile.clear_section (title:str, leave_blank_line:bool=True,
                             clear_subsections:Optional[str]=None)

Clear the section with the specified title, if it exists.

By default, does not clear subsections; see the clear_subsections parameter.

Type Default Details
title str Title of the section (Without the leading sharps '#')
leave_blank_line bool True If True, leaves a blank line at the end of the section.
clear_subsections Optional[str] None 'clear', 'delete', or None. If 'clear', then just clears the contents of subsections, but does not affect the headers. If 'delete', then clears the contents of the subsections and deletes the headers. If None, then does not affect either.
Returns None
mf = MarkdownFile.from_string(text_1)
mf.clear_section('Section 1', leave_blank_line=True)
headings_and_text = mf.get_headings_and_text()
assert headings_and_text['# Section 1'] == ''
assert mf.get_line_number_of_heading('Subsection a') == 3
assert mf.parts[4]['line'] == 'Didididi'
print(mf)

# Section 1

## Subsection a
Didididi
Dododododo
# Section 2

Setting leave_blank_line=False leaves no blank line between the section and the next:

mf = MarkdownFile.from_string(text_1)
mf.clear_section('Section 1', leave_blank_line=False)
headings_and_text = mf.get_headings_and_text()
assert headings_and_text['# Section 1'] == ''
assert mf.get_line_number_of_heading('Subsection a') == 2
assert mf.parts[3]['line'] == 'Didididi'
print(mf)

# Section 1
## Subsection a
Didididi
Dododododo
# Section 2

source

MarkdownFile.clear_all_sections

 MarkdownFile.clear_all_sections (leave_blank_lines:bool=True)

Clear all sections.

Does not clear frontmatter metadata. Leaves all headers intact.

Type Default Details
leave_blank_lines bool True If True, leaves a blank line in each section
Returns None
mf = MarkdownFile.from_string(text_1)
mf.clear_all_sections(leave_blank_lines=True)
assert len(mf.parts) == 3
print(mf)
# Section 1
## Subsection a
# Section 2

Metadata in a MarkdownFile object

Here are some things that we can do with a MarkdownFile object with frontmatter YAML metadata:

template_mf = MarkdownFile.from_string(template_text)
print (template_mf.metadata(), '\n')
assert template_mf.metadata() == {'cssclass': 'clean-embeds', 'aliases': [], 'tags': ['_meta/literature_note']}
assert template_mf.has_metadata()
assert template_mf.metadata_lines() == (0, 4)


new_metadata = {'aliases': ['an_awesome_note', 'no_more_cssclass', 'no_more_tags']}
template_mf.replace_metadata(new_metadata)
print('The following is the MarkdownFile with new frontmatter YAML metadata:\n')
print(template_mf, '\n')
assert str(template_mf) == """---
aliases: [an_awesome_note, no_more_cssclass, no_more_tags]
---
# Topic[^1]

# See Also

# Meta
## References

## Citations and Footnotes
[^1]: Citation"""


template_mf.remove_metadata()
print('The following is the MarkdownFile with frontmatter YAML metadata removed:\n')
print(template_mf)
assert str(template_mf) == """# Topic[^1]

# See Also

# Meta
## References

## Citations and Footnotes
[^1]: Citation"""
{'cssclass': 'clean-embeds', 'aliases': [], 'tags': ['_meta/literature_note']} 

The following is the MarkdownFile with new frontmatter YAML metadata:

---
aliases: [an_awesome_note, no_more_cssclass, no_more_tags]
---
# Topic[^1]

# See Also

# Meta
## References

## Citations and Footnotes
[^1]: Citation 

The following is the MarkdownFile with frontmatter YAML metadata removed:

# Topic[^1]

# See Also

# Meta
## References

## Citations and Footnotes
[^1]: Citation

Note that if the MarkdownFile does not have any YAML frontmatter metadata, then the metadata method returns None:

mf_1 = MarkdownFile.from_string(text_1)
assert mf_1.metadata() is None

If the MarkdownFile’s YAML frontmatter metadata has formatting issues, then metadata raises a ValueError. The error message also reports the underlying yaml.YAMLError from the PyYAML library, e.g. yaml.parser.ParserError, yaml.scanner.ScannerError, or yaml.reader.ReaderError.

text_with_bad_yaml = "---\nsome_metadata_field: [\badly_formatted_string]\n---\nThe rest of the note contents..."
mf = MarkdownFile.from_string(text_with_bad_yaml)
with ExceptionExpected(ValueError): # By virtue of giving a ReaderError
    mf.metadata()


text_with_bad_yaml = "---\nfield: field2: \n---\nThe rest of the note contents..."
mf = MarkdownFile.from_string(text_with_bad_yaml)
with ExceptionExpected(ValueError): # By virtue of giving a ScannerError
    mf.metadata()


text_with_bad_yaml = "---\nfield: John\n- field2: Mary \n---\nThe rest of the note contents..."
mf = MarkdownFile.from_string(text_with_bad_yaml)
with ExceptionExpected(ValueError): # By virtue of giving a ParserError
    mf.metadata()

source

MarkdownFile.has_metadata

 MarkdownFile.has_metadata ()

Return True if this MarkdownFile object has frontmatter YAML metadata.

If the MarkdownFile object has any frontmatter YAML metadata, then it is expected to be at the very start; in particular, it must not be preceded by any whitespace characters.


source

MarkdownFile.metadata_lines

 MarkdownFile.metadata_lines ()

Return the indices in self.parts which are metadata.

Assumes that self.parts is nonempty.

If the MarkdownFile object has any frontmatter YAML metadata, then it is expected to be at the very start; in particular, it must not be preceded by any whitespace characters.

Returns

  • tuple
    • The tuple consists of 2 ints, a and b, where self.parts[a:b+1] represent the metadata lines, including the '---' before and after.
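
The inclusive convention for (a, b) can be illustrated with a standalone sketch. `metadata_line_range` is a hypothetical helper working on a plain list of lines, and returning None when there is no frontmatter is an assumption of this sketch rather than documented behavior of metadata_lines.

```python
def metadata_line_range(lines):
    """Return (a, b) with lines[a:b+1] the frontmatter, or None if there is none.

    Hypothetical helper: the opening '---' must be the very first line and
    must not be preceded by whitespace.
    """
    if not lines or lines[0].rstrip() != '---':
        return None
    for i in range(1, len(lines)):
        if lines[i].rstrip() == '---':
            return (0, i)
    return None  # opening '---' without a closing one

with_frontmatter = [
    '---',
    'cssclass: clean-embeds',
    'aliases: []',
    'tags: [_meta/literature_note]',
    '---',
    '# Topic[^1]',
]
without_frontmatter = ['# Section 1', 'some text']

a, b = metadata_line_range(with_frontmatter)
frontmatter = with_frontmatter[a:b + 1]  # includes both '---' lines
```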

source

MarkdownFile.replace_metadata

 MarkdownFile.replace_metadata (new_metadata:dict[str],
                                enquote_entries_in_fields:list[str]=[])

Replace the frontmatter metadata of this MarkdownFile object.

Optionally also enquotes string entries in fields specified by enquote_entries_in_fields.

Warning - This method is only tested when the values of new_metadata are either str or list[str].

Type Default Details
new_metadata dict[str] The dictionary representing the new metadata. The keys are the names of fields. The values are the field values, usually expected to be a single string or a list of strings
enquote_entries_in_fields list[str] [] A list of str of fields in the YAML metadata whose entries need to be enquoted. If there is a string that is not a key of new_metadata, then that string is essentially ignored (in particular, no errors are raised).
Returns None
# TODO: basic example

If the metadata contains strings that need to be escaped (because they have backslashes), then the enquote_entries_in_fields parameter of the MarkdownFile.replace_metadata method can be specified to enquote and escape such strings, cf. dict_to_metadata_lines.

In particular, the MarkdownFile.replace_metadata method with the MarkdownFile object’s own .metadata() passed in the following example should ideally not modify the string of the MarkdownFile object. This feature needs to be tested with more examples, however.

mf = MarkdownFile.from_string(
    r'''---
latex_in_original: ["\\mathscr{O}_{\\text {Proj } S_{*}}(n)"]
---

''')

original_metadata = mf.metadata()
original_str = str(mf).strip()

dict_to_metadata_lines(mf.metadata(), enquote_entries_in_fields=['latex_in_original'])
mf.replace_metadata(mf.metadata(), enquote_entries_in_fields=['latex_in_original'])

test_eq(mf.metadata(), original_metadata)
test_eq(str(mf).strip(), original_str)

Multiple methods in the MarkdownFile class, including MarkdownFile.add_tags, MarkdownFile.remove_tags, and MarkdownFile.replace_auto_tags_with_regular_tags, depend on the MarkdownFile.replace_metadata method. Arguments for the enquote_entries_in_metadata_fields parameter must be specified appropriately when using these methods.


source

MarkdownFile.remove_metadata

 MarkdownFile.remove_metadata ()

Remove the frontmatter metadata of this MarkdownFile object.


source

MarkdownFile.add_metadata_section

 MarkdownFile.add_metadata_section (check_exists:bool=True)

Add a frontmatter YAML metadata at the very beginning.

Type Default Details
check_exists bool True If True, check whether there is already a metadata section at the beginning and do not add a metadata section if one exists.
Returns None

If the MarkdownFile has no frontmatter YAML metadata, then we can use the add_metadata_section method to add blank frontmatter YAML metadata:

mf = MarkdownFile.from_string(text_1)
assert not mf.has_metadata()
mf.add_metadata_section()
print(mf)
---

---


# Section 1
some text 

asdfasdf

## Subsection a
Didididi
Dododododo
# Section 2

If the MarkdownFile object already has frontmatter YAML metadata, then the add_metadata_section method does nothing.

template_mf = MarkdownFile.from_string(template_text)
template_mf.add_metadata_section()
assert str(template_mf) == template_text

Tags in MarkdownFile objects

YAML metadata tags

In Obsidian, one can add tags to notes both within text and in the frontmatter YAML metadata.


source

MarkdownFile.has_tag

 MarkdownFile.has_tag (tag:str)

Return True if the Markdown file has the specified tag in its YAML frontmatter metadata.

More specifically, return True if the MarkdownFile object

  1. has YAML frontmatter metadata,
  2. the metadata has a 'tags' section, and
  3. the 'tags' section is a list with the specified tag.

Note that tag should not start with the hashtag # character.

Type Details
tag str The tag. Does not start with the hashtag '#'.
Returns bool
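
The three conditions can be sketched on a plain metadata dict of the kind returned by metadata(). The free function `has_tag` below is a hypothetical stand-in for the method, written only to make the checks explicit.

```python
def has_tag(metadata, tag):
    """Mirror the three documented conditions on a plain metadata dict."""
    if metadata is None:            # 1. no YAML frontmatter metadata
        return False
    tags = metadata.get('tags')     # 2. the metadata needs a 'tags' section
    if not isinstance(tags, list):
        return False
    return tag in tags              # 3. the list must contain the tag

metadata = {
    'cssclass': 'clean-embeds',
    'aliases': [],
    'tags': ['_meta/literature_note'],
}

found = has_tag(metadata, '_meta/literature_note')
with_hash = has_tag(metadata, '#_meta/literature_note')  # a leading '#' does not match
```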

source

MarkdownFile.add_tags

 MarkdownFile.add_tags (tags:list[str], skip_repeats:bool=True,
                        skip_repeated_auto:bool=True,
                        enquote_entries_in_metadata_fields:list[str]=[])

Add tags to the frontmatter metadata.

The order of the tags may be changed.

Ultimately the replace_metadata method is used to modify the YAML metadata. Use the enquote_entries_in_metadata_fields parameter to ensure that the replace_metadata invocation preserves enquoted metadata values.

Type Default Details
tags list[str] The str representing the tags. May or may not start with '#', e.g. '#_meta/definition' or '_meta/definition'.
skip_repeats bool True If True, then this MarkdownFile will just have unique tags; merges pre-existing repeated tags if necessary. Also, the order of the tags may be changed.
skip_repeated_auto bool True If True, then only add tags starting with ’_auto/’ if the corresponding non-auto tag does not exist, e.g. ’_auto/_meta/definition’ is not added if the note already has ’_meta/definition’.
enquote_entries_in_metadata_fields list[str] [] A list of str of fields in the YAML metadata whose entries need to be enquoted. If there is a string that is not a key of new_metadata, then that string is essentially ignored (in particular, no errors are raised).
Returns None

source

MarkdownFile.remove_tags

 MarkdownFile.remove_tags (tags:list[str],
                           enquote_entries_in_metadata_fields:list[str]=[]
                           )

Remove the specified tags from the frontmatter metadata, if the frontmatter metadata and the specified tags exist.

If the MarkdownFile object does not have a frontmatter or if the frontmatter does not include a tags line, then the MarkdownFile object is not modified.

Assumes that this MarkdownFile object has a frontmatter and that the frontmatter includes a tags line.

Any repeated tags are either merged into one (if the tag is not in tags) or are removed (if the tag is in tags).

Ultimately the replace_metadata method is used to modify the YAML metadata. Use the enquote_entries_in_metadata_fields parameter to ensure that the replace_metadata invocation preserves enquoted metadata values.

Type Default Details
tags list[str] The str representing the tags. May or may not start with '#', e.g. '#_meta/definition' or '_meta/definition'.
enquote_entries_in_metadata_fields list[str] [] A list of str of fields in the YAML metadata whose entries need to be enquoted. If there is a string that is not a key of new_metadata, then that string is essentially ignored (in particular, no errors are raised).
Returns None
mf = MarkdownFile.from_string(text_2)
assert mf.has_tag('_meta/definition')

mf.add_tags(['new_tag'])  #This may change the order of the tags.
assert mf.has_tag('new_tag')

assert mf.has_tag('this_tag_will_be_removed')
mf.remove_tags(['this_tag_will_be_removed'])
assert not mf.has_tag('this_tag_will_be_removed')

assert not mf.has_tag('no_tag')
mf.remove_tags(['no_tag']) # Does nothing
assert not mf.has_tag('no_tag')

print(mf)
---
tags: [_meta/concept, _auto/_meta/notation, new_tag, _meta/definition]
---
# Topic
This is some note with some stuff.

If the MarkdownFile object does not have frontmatter YAML metadata or if its frontmatter YAML metadata does not have a tags section, then no changes are made to the MarkdownFile object:

# Example of a MarkdownFile object without frontmatter YAML metadata:
mf = MarkdownFile.from_string(text_1)

mf_str_before = str(mf)

assert mf.metadata() is None
mf.remove_tags(['_meta/definition'])
assert mf.metadata() is None

test_eq(str(mf), mf_str_before)

# Example of a MarkdownFile object with frontmatter YAML metadata but without a tags section.
mf = MarkdownFile.from_string(text_8)

mf_str_before = str(mf)

assert mf.metadata() is not None
assert 'tags' not in mf.metadata()
mf.remove_tags(['_meta/definition'])

test_eq(str(mf), mf_str_before)

The add_tags method has a skip_repeats parameter.

mf.add_tags(['new_tag'], skip_repeats=True)  # Only one `new_tag` will be present after this.
print(mf)
mf.add_tags(['new_tag'], skip_repeats=False)  # After this, mf will have 2 `new_tag`'s
print(mf)
mf.add_tags(['new_tag'], skip_repeats=True)  # Only one `new_tag` will be present after this, even though there were multiple `new_tag`'s before this.
print(mf)
---
cssclass: clean-embeds
tags: [new_tag]
---
# Topic[^1]
Here is a LaTeX Equation:

$$ 5 \neq 7$$
Hey
Okay, now here is another one:
$$\begin{align*}
\sum_{k=1}^n k = \frac{n(n+1)}{2}
\end{align*}$$

%%This is a comment. 
The comment is not visible.
This is the end of the comment %%

This is the end of this note. This is visible.
---
cssclass: clean-embeds
tags: [new_tag, new_tag]
---
# Topic[^1]
Here is a LaTeX Equation:

$$ 5 \neq 7$$
Hey
Okay, now here is another one:
$$\begin{align*}
\sum_{k=1}^n k = \frac{n(n+1)}{2}
\end{align*}$$

%%This is a comment. 
The comment is not visible.
This is the end of the comment %%

This is the end of this note. This is visible.
---
cssclass: clean-embeds
tags: [new_tag]
---
# Topic[^1]
Here is a LaTeX Equation:

$$ 5 \neq 7$$
Hey
Okay, now here is another one:
$$\begin{align*}
\sum_{k=1}^n k = \frac{n(n+1)}{2}
\end{align*}$$

%%This is a comment. 
The comment is not visible.
This is the end of the comment %%

This is the end of this note. This is visible.
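
The effect of skip_repeats and skip_repeated_auto can be sketched on plain lists of tags. The function below is a hypothetical stand-in, not the method's actual code; in particular it preserves the order of tags, whereas the real method may reorder them.

```python
def add_tags(existing, new, skip_repeats=True, skip_repeated_auto=True):
    """Hypothetical helper returning the merged tag list."""
    merged = list(existing)
    for tag in new:
        tag = tag.lstrip('#')  # tags may be given with or without '#'
        if (skip_repeated_auto and tag.startswith('_auto/')
                and tag[len('_auto/'):] in merged):
            continue  # the regular counterpart already exists
        merged.append(tag)
    if skip_repeats:
        # Drop duplicates, keeping the first occurrence of each tag.
        merged = list(dict.fromkeys(merged))
    return merged

step_1 = add_tags([], ['new_tag'], skip_repeats=True)
step_2 = add_tags(step_1, ['new_tag'], skip_repeats=False)
step_3 = add_tags(step_2, ['new_tag'], skip_repeats=True)
auto_skipped = add_tags(['_meta/definition'], ['_auto/_meta/definition'])
```

The three steps mirror the three calls above: one `new_tag`, then two, then one again after the pre-existing duplicates are merged.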

source

MarkdownFile.replace_auto_tags_with_regular_tags

 MarkdownFile.replace_auto_tags_with_regular_tags (exclude:list[str]=None,
                                                   enquote_entries_in_metadata_fields:list[str]=[])

Replace tags in the frontmatter metadata starting with _auto/ with tags without the _auto/.

Ultimately the replace_metadata method is used to modify the YAML metadata. Use the enquote_entries_in_metadata_fields parameter to ensure that the replace_metadata invocation preserves enquoted metadata values.

Type Default Details
exclude list[str] None The tags whose _auto/ tags should not be converted. The str should not start with '#' and should not start with '_auto/'.
enquote_entries_in_metadata_fields list[str] [] A list of str of fields in the YAML metadata whose entries need to be enquoted. If there is a string that is not a key of new_metadata, then that string is essentially ignored (in particular, no errors are raised).
Returns None

One can use / characters to create “subtags”. trouver recognizes _auto tags, which are tags that trouver adds to mark notes that it has processed in certain ways.

For example, trouver.markdown.obsidian.personal.machine_learning.tag_categorization can label the “types” of the content of information notes. If this labelling process determines an information note to be a definition note, then it adds a _auto/_meta/definition tag, as opposed to a _meta/definition tag. This way, one can recognize that the tag was added automatically via a machine learning model and may be prone to error.

The replace_auto_tags_with_regular_tags method replaces _auto tags with regular tags. This can be useful, for example, once tags for a multitude of notes have been verified.

mf = MarkdownFile.from_string(text_2)
assert mf.has_tag('_auto/_meta/notation')
mf.replace_auto_tags_with_regular_tags()
assert mf.has_tag('_meta/notation')
assert not mf.has_tag("_auto/_meta/notation")
print(mf)
---
tags: [_meta/definition, _meta/concept, _meta/notation, this_tag_will_be_removed]
---
# Topic
This is some note with some stuff.
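
The conversion, including the exclude parameter, can be sketched on a plain list of tags. `replace_auto_tags` is a hypothetical helper, and the input list is an assumed ordering of the tags of text_2 chosen to match the output above.

```python
def replace_auto_tags(tags, exclude=None):
    """Hypothetical helper: strip the '_auto/' prefix unless the tag is excluded."""
    exclude = set(exclude or [])
    prefix = '_auto/'
    result = []
    for tag in tags:
        if tag.startswith(prefix) and tag[len(prefix):] not in exclude:
            tag = tag[len(prefix):]
        if tag not in result:  # avoid duplicating an existing regular tag
            result.append(tag)
    return result

tags = ['_meta/definition', '_meta/concept', '_auto/_meta/notation',
        'this_tag_will_be_removed']

converted = replace_auto_tags(tags)
kept_auto = replace_auto_tags(tags, exclude=['_meta/notation'])
```

Note that entries of `exclude` are written without the `_auto/` prefix, as the parameter description requires.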

In-line tags

The difference between headers and in-line tags is that the former has a space between the hashtags and the text and the latter does not. For example, # Header is a header and #tag is a tag.
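
This distinction can be sketched with two regular expressions. The patterns below are a rough approximation written for illustration; Obsidian's actual rules for valid tag names are stricter, and `classify` is a hypothetical helper.

```python
import re

HEADER = re.compile(r'^#+\s+\S')       # hashes, whitespace, then text
IN_LINE_TAG = re.compile(r'^#[^\s#]')  # a hash immediately followed by text

def classify(line):
    """Classify a line as 'header', 'tag', or 'other' (approximation only)."""
    if HEADER.match(line):
        return 'header'
    if IN_LINE_TAG.match(line):
        return 'tag'
    return 'other'

kinds = [classify(line) for line in ['# Header', '#tag', '## Subsection a', 'Some stuff', '']]
```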


source

MarkdownFile.remove_in_line_tags

 MarkdownFile.remove_in_line_tags ()

Remove lines starting with in line tags.

mf = MarkdownFile.from_string(text_3)
mf.remove_in_line_tags()
print(str(mf))
print(mf.parts)
assert len(mf.parts) == 7
assert str(mf) == """
# Section 1
Some stuff


# Hello
"""

# Section 1
Some stuff


# Hello

[{'line': '', 'type': <MarkdownLineEnum.BLANK_LINE: 9>}, {'line': '# Section 1', 'type': <MarkdownLineEnum.HEADING: 1>}, {'line': 'Some stuff', 'type': <MarkdownLineEnum.DEFAULT: 0>}, {'line': '', 'type': <MarkdownLineEnum.BLANK_LINE: 9>}, {'line': '', 'type': <MarkdownLineEnum.BLANK_LINE: 9>}, {'line': '# Hello', 'type': <MarkdownLineEnum.HEADING: 1>}, {'line': '', 'type': <MarkdownLineEnum.BLANK_LINE: 9>}]
template_mf = MarkdownFile.from_string(template_text)
template_mf.remove_in_line_tags()
assert str(template_mf) == template_text

Extract raw content from a MarkdownFile object

One can add a multitude of meta-data to Obsidian Markdown notes - frontmatter metadata, headers/footers, links, embedded links, tags, etc.

We can extract raw content from notes by removing a combination of these meta-data.

See also remove_in_line_tags.


source

MarkdownFile.remove_headers

 MarkdownFile.remove_headers ()

Remove all headers.

We can remove all of the headers and leave the rest of the text intact:

mf = MarkdownFile.from_string(template_text)
mf.remove_headers()
assert str(mf) == """---
cssclass: clean-embeds
aliases: []
tags: [_meta/literature_note]
---



[^1]: Citation"""

mf = MarkdownFile.from_string(text_1)
mf.remove_headers()
assert str(mf) == """
some text 

asdfasdf

Didididi
Dododododo"""

source

MarkdownFile.remove_double_blank_lines

 MarkdownFile.remove_double_blank_lines ()

Remove blank lines so that there are no consecutive blank lines

When removing some of the “metadata”, the content of the note can be left with a lot of consecutive blank lines. To ensure that machine-learning models will not develop some kind of blank line bias, we can remove such consecutive blank lines.

mf = MarkdownFile.from_string(template_text)
mf.remove_headers()
mf.remove_double_blank_lines()
assert str(mf) == """---
cssclass: clean-embeds
aliases: []
tags: [_meta/literature_note]
---

[^1]: Citation"""
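
The collapsing step can be sketched on a plain list of lines. The function below is a hypothetical stand-in for the method and assumes that a blank line is one containing only whitespace.

```python
def remove_double_blank_lines(lines):
    """Hypothetical helper: keep at most one blank line in a row."""
    result = []
    for line in lines:
        if line.strip() == '' and result and result[-1].strip() == '':
            continue  # skip a blank line that directly follows another one
        result.append(line)
    return result

# The template note after its headers have been removed.
lines = [
    '---',
    'cssclass: clean-embeds',
    'aliases: []',
    'tags: [_meta/literature_note]',
    '---',
    '',
    '',
    '',
    '[^1]: Citation',
]

collapsed = '\n'.join(remove_double_blank_lines(lines))
```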

source

MarkdownFile.merge_display_math_mode

 MarkdownFile.merge_display_math_mode ()

Merge chunks of display_math_mode latex lines into single lines

mf = MarkdownFile.from_string(text_10)
mf.merge_display_math_mode()
print(mf)
assert len(mf.parts) == 13
This is a single line display math mode LaTeX equation:

$$\mathcal{O}_X$$

This is a single multi-line display math mode LaTeX equation:

$$ 5 + 2 = 7 $$

These are multiple consecutive display math mode LaTeX equations:

$$1+1 = 2 $$
$$5 + 7 = 14$$
$$  8 + 4 = 12  $$
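
A simplified version of this merging can be sketched on a plain list of lines. The helper below is hypothetical: it assumes that a block opens on a line that starts with `$$` and contains no closing `$$`, that it closes on the next line containing `$$`, and that the pieces are joined with a single space. The actual method may join lines differently.

```python
def merge_display_math(lines):
    """Hypothetical helper: join each multi-line $$...$$ block into one line."""
    result, buffer = [], None
    for line in lines:
        if buffer is None:
            stripped = line.strip()
            # A block opens when a line starts with '$$' but does not close it.
            if stripped.startswith('$$') and stripped.count('$$') == 1:
                buffer = [line]
            else:
                result.append(line)
        else:
            buffer.append(line)
            if '$$' in line:  # closing delimiter reached
                result.append(' '.join(part.strip() for part in buffer))
                buffer = None
    if buffer is not None:  # unterminated block: keep the lines as they were
        result.extend(buffer)
    return result

lines = [
    'Text:',
    '',
    '$$',
    '5 + 2 = 7',
    '$$',
    '',
    '$$1+1 = 2 $$',
    r'$$5 \neq 7',
    '$$',
]

merged = merge_display_math(lines)
```

Equations that already sit on a single line, such as `$$1+1 = 2 $$`, are left untouched, so consecutive equations remain separate lines.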

source

MarkdownFile.merge_display_math_mode_into_preceding_text

 MarkdownFile.merge_display_math_mode_into_preceding_text (separator:str='\n')

Merge chunks of display math mode latex lines into single lines and merge those single lines into preceding text lines.

Type Default Details
separator str \n The str with which to join the latex lines into the text lines. Note that the display math mode latex lines are not joined with this str.
Returns None
mf = MarkdownFile.from_string(text_10)
mf.merge_display_math_mode_into_preceding_text(separator=' ')
print(mf)
assert len(mf.parts) == 5
This is a single line display math mode LaTeX equation:  $$\mathcal{O}_X$$

This is a single multi-line display math mode LaTeX equation:  $$ 5 + 2 = 7 $$

These are multiple consecutive display math mode LaTeX equations:  $$1+1 = 2 $$ $$5 + 7 = 14$$ $$  8 + 4 = 12  $$

We can set separator to its default value \n.

mf = MarkdownFile.from_string(text_10)
mf.merge_display_math_mode_into_preceding_text(separator='\n')
print(mf)
assert len(mf.parts) == 5  # Some of the parts have 'line' as multi-line str i.e. as str with `\n` characters.
assert '\n' in mf.parts[0]['line']
assert '\n' not in mf.parts[1]['line']
This is a single line display math mode LaTeX equation:

$$\mathcal{O}_X$$

This is a single multi-line display math mode LaTeX equation:

$$ 5 + 2 = 7 $$

These are multiple consecutive display math mode LaTeX equations:

$$1+1 = 2 $$
$$5 + 7 = 14$$
$$  8 + 4 = 12  $$

If the text starts with display math mode LaTeX, then that text is combined into one.

mf = MarkdownFile.from_string(text_11)
mf.merge_display_math_mode_into_preceding_text(separator=' ')
print(mf)
assert len(mf.parts) == 2
$$asdf$$ $$asdf$$ $$asdf$$
After text.

Writing a MarkdownFile object to a file

We can write the contents of a MarkdownFile object to the file represented by a VaultNote object.


source

MarkdownFile.parts_of_id

 MarkdownFile.parts_of_id (par_id:str)

Return the indices of the lines within the Markdown file belonging to the specified text id.

This id can be used as an anchor for a link in Obsidian. For example, [[note#^65809f]] is a link to a note named note to the text with id 65809f. Such a text is marked with a trailing ^65809f.

Type Details
par_id str Must begin with '^'.
Returns Union[tuple[int], None] (start,end) where self.parts[start:end] consists of the lines of the specified id. If the specified id does not exist for the note, then None is returned.

Links in Obsidian can be anchored at “paragraphs” of text. As Wikilinks, such links have the format [[<note_name>#^<id_of_paragraph>]]. Note that the id begins with a caret ^.

We can get the parts of the Markdown file to which the id refers:

with tempfile.TemporaryDirectory(prefix='temp_dir_', dir=os.getcwd()) as temp_dir:
    make_example_vault_2(temp_dir)
    vault = Path(temp_dir)
    vn = VaultNote(vault, name = 'note_with_paragraphs_that_are_embedded_1')
    mf = MarkdownFile.from_vault_note(vn)

    start, end = mf.parts_of_id('^65809f')
    assert start == 3 and end == 8
    print(mf.text_of_lines(start, end), '\n')
    # lines = [mf.parts[i]['line'] for i in range(start, end)]
    # print('\n'.join(lines))

    start, end = mf.parts_of_id('^221b51')
    assert start == 13 and end == 16
    print(mf.text_of_lines(start, end), '\n')

    start, end = mf.parts_of_id('^123456')
    assert start == 17 and end == 18
    print(mf.text_of_lines(start, end), '\n')

    start, end = mf.parts_of_id('^fff123')
    assert start == 20 and end == 21
    print(mf.text_of_lines(start, end), '\n')

    start, end = mf.parts_of_id('^latexthing')
    assert start == 22 and end == 23
    print(mf.text_of_lines(start, end), '\n')
cheese 
bandit
$$asdf$$
asdf 
^65809f 

$$5 \neq 7
$$ 
^221b51 

# This section has an id ^123456 

^fff123 

$$\mathcal{O}_X$$  ^latexthing 
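
The link format and the trailing id marker can be sketched with regular expressions. Both helpers are hypothetical, and the id pattern (letters, digits, and hyphens) is an assumption made for this sketch.

```python
import re

# [[<note_name>#^<id_of_paragraph>]]; the id character set is an assumption.
BLOCK_LINK = re.compile(r'\[\[([^\]#|]+)#\^([A-Za-z0-9-]+)\]\]')
# A marked line ends with ^<id>, either alone or after whitespace.
TRAILING_ID = re.compile(r'(?:^|\s)\^([A-Za-z0-9-]+)\s*$')

def parse_block_link(link):
    """Return (note_name, id) for a block link, or None if it is not one."""
    m = BLOCK_LINK.fullmatch(link)
    return (m.group(1), '^' + m.group(2)) if m else None

def trailing_id(line):
    """Return the id (with its caret) that a line ends with, or None."""
    m = TRAILING_ID.search(line)
    return '^' + m.group(1) if m else None

link_parts = parse_block_link('[[note#^65809f]]')
ids = [trailing_id(line) for line in [
    '^65809f',
    '# This section has an id ^123456',
    r'$$\mathcal{O}_X$$  ^latexthing ',
    '[^1]: Citation',
]]
```

A footnote marker such as `[^1]` is not mistaken for an id here, because its caret is preceded by a bracket rather than by whitespace or the start of the line.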

Hidden tests

# hide 
# TODO _look_at_start_of_file, _line_dict

Misc TODO


source

MarkdownFile.text_of_lines

 MarkdownFile.text_of_lines (start:int, end:int)

Return the text of self.parts[start:end].


source

MarkdownFile.write

 MarkdownFile.write (vn:trouver.markdown.obsidian.vault.VaultNote,
                     mode:str='w')

Write to the file specified by a VaultNote object.

If the file that the VaultNote object represents does not exist, then this method creates it.

Type Default Details
vn VaultNote Represents the file.
mode str w The specific mode to write the file with.
Returns None

source

MarkdownFile.copy

 MarkdownFile.copy (deep:bool)