markdown.markdown.file

Functions for parsing markdown files

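from collections import OrderedDict
import os
from pathlib import Path
import shutil
import tempfile

import yaml  # PyYAML, used by the `yaml.safe_load` and `yaml.dump` cells below

from fastcore.test import *
from nbdev.showdoc import show_doc
from trouver.markdown.obsidian.vault import VaultNote  # used by the `from_vault_note` examples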

Frontmatter meta

source

find_front_matter_meta_in_markdown_text

 find_front_matter_meta_in_markdown_text (text:str)

*Return ranges in the markdown text string where front matter meta occurs.*

*text is assumed to start with the front matter meta. The front matter meta is of the following YAML format:*

```markdown
---
key: entry
---
```

|    | **Type** | **Details** |
| -- | -------- | ----------- |
| text | str |  |
| **Returns** | **Optional** | **A tuple `(a, b)` where `text[a:b]` is the markdown front matter meta, or `None` if no front matter meta is found. There is at most one such range because the front matter must come at the start of the markdown document.** |


The frontmatter meta in an Obsidian Markdown note is surrounded by `---` and must be at the top. We can identify the frontmatter meta in some text:

::: {#cell-6 .cell}
``` {.python .cell-code}
sample_text = r'''---
cssclass: clean-embeds
aliases: []
tags: [_meta/TODO/change_title, _meta/definition, _meta/literature_note]
---
# Topic[^1]

This is some text. Lalalalala

# See Also

# Meta
## References
![[_reference_foag]]

## Citations and Footnotes
[^1]: Some citation'''

start, end = find_front_matter_meta_in_markdown_text(sample_text)
just_frontmatter = sample_text[start:end]
print(just_frontmatter)
assert just_frontmatter.startswith('---')
assert just_frontmatter.endswith('---')
assert 'cssclass:' in just_frontmatter
assert 'aliases:' in just_frontmatter
assert 'tags:' in just_frontmatter
```

---
cssclass: clean-embeds
aliases: []
tags: [_meta/TODO/change_title, _meta/definition, _meta/literature_note]
---

:::

If there is no frontmatter meta, then find_front_matter_meta_in_markdown_text returns None.

sample_text = r'''
There is no frontmatter meta here.
'''

assert find_front_matter_meta_in_markdown_text(sample_text) is None


sample_text = r'''
---
aliases: []
--
Notice that the front matter meta is incorrectly formatted!
'''

assert find_front_matter_meta_in_markdown_text(sample_text) is None

# hide 
# Some additional tests
sample_text_1 = ''
assert find_front_matter_meta_in_markdown_text(sample_text_1) is None
sample_text_2 = "hello I know a song that gets on everybody\'s nose"
assert find_front_matter_meta_in_markdown_text(sample_text_2) is None
sample_text_3 = '---\n---'
start, end = find_front_matter_meta_in_markdown_text(sample_text_3)
assert sample_text_3[start:end] == sample_text_3
sample_text_4 = '---\n---\n---'
start, end = find_front_matter_meta_in_markdown_text(sample_text_4)
assert sample_text_4[start:end] == '---\n---'
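
The behavior shown in the examples above can be approximated with a single regular expression. The following is only a minimal sketch, not the implementation of `find_front_matter_meta_in_markdown_text`; the names `_FRONT_MATTER_SKETCH` and `find_front_matter_sketch` are hypothetical, and the sketch assumes that the opening `---` must be the very first line and that the closing delimiter is the first later line consisting solely of `---`.

```python
import re
from typing import Optional

# Assumption: front matter is an opening `---` line at the very start of the
# text, any number of body lines, and then the first line that is exactly
# `---` (optionally followed by spaces or tabs).
_FRONT_MATTER_SKETCH = re.compile(
    r'\A---[ \t]*\n(?:.*\n)*?---(?=[ \t]*(?:\n|\Z))')


def find_front_matter_sketch(text: str) -> Optional[tuple[int, int]]:
    """Return `(start, end)` such that `text[start:end]` is the leading
    front matter meta, or `None` if the text does not start with one."""
    match = _FRONT_MATTER_SKETCH.match(text)
    return match.span() if match else None
```

Because the pattern is anchored with `\A`, text that begins with a blank line before the first `---` yields `None`, just as in the incorrectly formatted example above.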

source

dict_to_metadata_lines

 dict_to_metadata_lines (data:dict[str,typing.Union[str,list[str]]],
                         enquote_entries_in_fields=list[str])

*Convert a dict to a list of str of yaml frontmatter metadata that Obsidian recognizes.

This function is used in MarkdownFile.replace_metadata.*

|    | **Type** | **Default** | **Details** |
| -- | -------- | ----------- | ----------- |
| data | dict |  | The keys are str of the labels/names of the metadata. The values are the metadata, which are usually str or list. |
| enquote_entries_in_fields | GenericAlias | list | A list of str of fields in the YAML metadata whose entries need to be enquoted. If there is a string that is not a key of `data`, then that string is essentially ignored (in particular, no errors are raised). |
| **Returns** | **list** |  | **Each str entry is the line for the yaml frontmatter metadata of an Obsidian Markdown note.** |

dict_to_metadata_lines takes a dictionary and converts it to a list of lines usable as Obsidian Markdown frontmatter meta.

sample_dict = OrderedDict([
            ('cssclass', 'clean-embeds'),
            ('aliases', []),
            ('tags', ['_meta/literature_note', '_meta/research', '_meta/self_written'])])
sample_output = dict_to_metadata_lines(sample_dict, [])
test_eq(sample_output,
    ['cssclass: clean-embeds', 'aliases: []', 'tags: [_meta/literature_note, _meta/research, _meta/self_written]'])

# sample_lines = dict_to_metadata_lines(sample_dict)
# sample_output = 
# print('\n'.join(sample_lines))
# assert sample_lines == sample_output

We can specify fields whose entries are to be enquoted. This might be necessary if a string needs to be escaped (because it has a character such as the backslash `\`, the vertical bar `|`, a square bracket `[`, etc.). For example, if a field is expected to have LaTeX strings, then it is good practice to specify such strings to be enquoted:

sample_dict = OrderedDict([
    ('latex_in_original', [r'\\mathscr{O}_{\\text {Proj } S_{*}}(n)'])
])
sample_output = dict_to_metadata_lines(sample_dict, enquote_entries_in_fields=['latex_in_original'])
test_eq(sample_output, ['latex_in_original: ["\\\\\\\\mathscr{O}_{\\\\\\\\text {Proj } S_{*}}(n)"]'])

# An example of an entry that is not a string
sample_dict = OrderedDict([
    ('latex_in_original', [1])
])
sample_output = dict_to_metadata_lines(sample_dict, enquote_entries_in_fields=['latex_in_original'])
test_eq(sample_output, ['latex_in_original: ["1"]'])
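
To make the enquoting behavior concrete, here is a minimal sketch that reproduces the outputs of the examples above. It is not the implementation of `dict_to_metadata_lines`: the names `metadata_lines_sketch` and `_format_entry_sketch` are hypothetical, and escaping double quotes inside enquoted entries is an assumption that the examples above do not confirm.

```python
def _format_entry_sketch(entry, enquote: bool) -> str:
    """Format one metadata entry, optionally as a double-quoted YAML string."""
    text = str(entry)  # non-str entries such as `1` are converted first
    if not enquote:
        return text
    # Double each backslash so that YAML reads it back as a literal backslash.
    # Assumption: embedded double quotes are escaped as well.
    text = text.replace('\\', '\\\\').replace('"', '\\"')
    return f'"{text}"'


def metadata_lines_sketch(data: dict, enquote_fields=()) -> list[str]:
    """Return one `key: value` line per item of `data`."""
    lines = []
    for key, value in data.items():
        enquote = key in enquote_fields  # unknown field names are ignored
        if isinstance(value, list):
            inner = ', '.join(_format_entry_sketch(v, enquote) for v in value)
            lines.append(f'{key}: [{inner}]')
        else:
            lines.append(f'{key}: {_format_entry_sketch(value, enquote)}')
    return lines
```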

source

parse_metadata_string

 parse_metadata_string (metadata_str:str, raise_error:bool=True,
                        raise_warning:bool=True)

*Attempt to parse the string for YAML frontmatter metadata of an Obsidian Markdown note.

Raises

ValueError
- If raise_error is True and if any yaml.YAMLError exceptions are raised when reading (i.e. parsing or scanning) the YAML metadata. In doing so, metadata_str is printed. Moreover, the appropriate yaml.YAMLError (e.g. a yaml.parser.ParserError, yaml.scanner.ScannerError, or yaml.reader.ReaderError) is also raised.
Warning
- If raise_error is False and raise_warning is True and if any yaml.YAMLError exceptions are raised when reading.*

|    | **Type** | **Default** | **Details** |
| -- | -------- | ----------- | ----------- |
| metadata_str | str |  | The string for YAML frontmatter metadata of an Obsidian Markdown note |
| raise_error | bool | True | If `True`, then raise a `ValueError` when the string cannot be parsed. |
| raise_warning | bool | True | If `raise_error` is `False` and `raise_warning` is `True`, then raise a warning message. |
| **Returns** | **Optional** |  | **The keys are `str` of the labels/names of the metadata. The values are the metadata, which are usually `str` or `list`. If the YAML metadata string cannot be parsed, then this return value is `None`.** |

The parse_metadata_string function attempts to parse the string for YAML frontmatter metadata of an Obsidian Markdown note. If the argument can be parsed via the yaml.safe_load function, then the output of the yaml.safe_load invocation is returned:

good_metadata_str = r"""
field: [hi, bye]
fieldy: ooh
"""

metadata_output = parse_metadata_string(good_metadata_str)
assert type(metadata_output) == dict
test_eq(metadata_output, {'field': ['hi', 'bye'], 'fieldy': 'ooh'})

If the YAML frontmatter metadata string cannot be parsed by the yaml.safe_load function, then an error or a warning can optionally be raised. If raise_error is False, then parse_metadata_string returns None.

bad_metadata_str = "some_metadata_field: [\badly_formatted_string]"  # ReaderError
with (ExceptionExpected(ValueError)):
    metadata_output = parse_metadata_string(bad_metadata_str, raise_error=True)
# bad_metadata_str = "field: field2:" # ScannerError

test_warns(lambda: parse_metadata_string(bad_metadata_str, raise_error=False, raise_warning=True))

metadata_output = parse_metadata_string(bad_metadata_str, raise_error=False, raise_warning=False)
test_eq(metadata_output, None)

yaml_string = 'latex_from_original: ["[t]", "hi"]'
data = yaml.safe_load(yaml_string)
yaml.dump(data, default_flow_style=True)

"{latex_from_original: ['[t]', hi]}\n"

yaml.dump(r'\begin{align*}  \end{align*}')

'\\begin{align*}  \\end{align*}\n...\n'

yaml.dump('[t]')

"'[t]'\n"

Replace embedded links with text

source

replace_embedded_links_with_text

 replace_embedded_links_with_text (text:str, vault:os.PathLike)

*Return the text with all embedded links replaced with the text of the corresponding notes

Assumes that the notes of the links exist in the vault and have unique names in the vault. Note that embedded links are not always to notes (e.g. they can point to images), or even to existing notes. In such cases, the embedded link will be replaced with blank text.*

MarkdownFile class

source

MarkdownLineEnum

 MarkdownLineEnum (value, names=None, module=None, qualname=None,
                   type=None, start=1)

An enumeration.

The following are the members of the MarkdownLineEnum class:

for line_type in MarkdownLineEnum:
    print(line_type.name)

DEFAULT
HEADING
CODE_BLOCK
META
ORDERED_LIST
UNORDERED_LIST
BLOCKQUOTE
HORIZONTAL_RULE
COMMENT
BLANK_LINE
UNKNOWN
FOOTNOTE_DESCRIPTION
DISPLAY_LATEX_SINGLE
DISPLAY_LATEX_START
DISPLAY_LATEX_END
DISPLAY_LATEX

source

MarkdownFile

 MarkdownFile
               (parts:list[dict[str,typing.Union[__main__.MarkdownLineEnum
               ,str]]])

*Parses and represents the contents of an Obsidian styled Markdown file.

The Markdown file must be formatted in certain ways. In general, text components of different types (see MarkdownLineEnum) must be on different lines - no text components of different types may occupy the same line. In particular,

- Comments (surrounded by %%) must not be on the same line as non-comments.
- Display math mode LaTeX (surrounded by $$) must not be on the same line as text that is not display math mode LaTeX.

Attributes

text - str
parts - list[dict[str, Union[MarkdownLineEnum, str]]]
- Represents the lines of the markdown file. Each dict has two keys, 'type' and 'line', which respectively hold a MarkdownLineEnum and a str as values. While a value of 'line' may have new line characters '\n', it does not need to — more specifically, the __str__ method (and the text_of_lines method) adds new line characters '\n' in between lines.*
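
The shape of `parts` can be illustrated without the class itself. The sketch below is an assumption-laden stand-in: `LineKindSketch` is a hypothetical three-member substitute for MarkdownLineEnum, and `to_parts_sketch` uses a much cruder classification than the real parser (it knows nothing about front matter, comments, code blocks, or LaTeX).

```python
import re
from enum import Enum, auto


class LineKindSketch(Enum):
    """Hypothetical, reduced stand-in for MarkdownLineEnum."""
    DEFAULT = auto()
    HEADING = auto()
    BLANK_LINE = auto()


def to_parts_sketch(text: str) -> list[dict]:
    """Split text into dicts with the two keys 'type' and 'line'."""
    parts = []
    for line in text.splitlines():
        if not line.strip():
            kind = LineKindSketch.BLANK_LINE
        elif re.match(r'#{1,6}\s', line):  # sharps followed by whitespace
            kind = LineKindSketch.HEADING
        else:
            kind = LineKindSketch.DEFAULT
        parts.append({'type': kind, 'line': line})
    return parts


def parts_to_text_sketch(parts: list[dict]) -> str:
    """Join the lines; newlines are added between parts, not stored in them."""
    return '\n'.join(part['line'] for part in parts)
```

Note that an in-line tag such as `#tag` is not classified as a heading here, because no whitespace follows the sharp.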

Example text and example vault used in the rest of the page

The examples demonstrated for the MarkdownFile class here will be based upon the following text:

template_text = """---
cssclass: clean-embeds
aliases: []
tags: [_meta/literature_note]
---
# Topic[^1]

# See Also

# Meta
## References

## Citations and Footnotes
[^1]: Citation"""


text_1 = """
# Section 1
some text 

asdfasdf

## Subsection a
Didididi
Dododododo
# Section 2
"""

text_2 = """---
tags: [_meta/definition, _meta/concept, _auto/_meta/notation, this_tag_will_be_removed]
---
# Topic
This is some note with some stuff.
"""

text_3 = """
# Section 1
Some stuff

#_meta/question Will `remove_in_line_tags` remove this line? Yes it will!

# Hello

 #tag Will `remove_in_line_tags` remove this line? Yes it will!
"""

text_4 = """
# Some thing

I have a [[this is a note#this is an anchor in the note|link]]

## Another topic
This is a link without a specified display text: [[some_kind_of_note]].

This is a link to an anchor without a specified display text: [[another_note#another anchor]].
"""
    
text_5 = """# A header
![[This note is embedded]].

The link above should will not be replaced by `replace_links_with_display_text`,
unless `remove_embedded_note_links` is set to `True`."""
    
text_6 = """
# Header

I want to link to some embedded note[^1]

[^1]: ![[link_to_embedded_note_1]]

You can also let the footnote mention be alphanumeric[^1][^note]

[^note]: ![[link_to_embedded_note_2]]
"""
    
text_7 = r"""---
cssclass: clean-embeds
aliases: []
tags: [_meta/concept, _meta/literature_note]
---
# Grothendieck-Witt ring elements of a finite field are given by rank and discriminant up to squares[^1]

$\operatorname{GW}(\mathbb{F}_q)\cong \mathbb{Z}\times \mathbb{F}_q^{\times}/(\mathbb{F}_q^{\times})^2$[^2] where the isomorphism is given by the rank and discriminant.


[^2]: ![[notation_GW_k_Grothendieck_Witt_ring_of_a_field]]


# See Also
- [[grothendieck_witt_ring_of_a_polynomial_ring_over_a_field_is_isomorphic_to_that_of_the_field]]

# Meta
## References
![[_reference_pauli_wickelgren_aa1]]


## Citations and Footnotes
[^1]: Pauli, Wickelgren, Example 3.7, Page 4"""


text_8 = r"""---
cssclass: clean-embeds
---
# Topic[^1]
Here is a LaTeX Equation:

$$ 5 \neq 7$$
Hey
Okay, now here is another one:
$$\begin{align*}
\sum_{k=1}^n k = \frac{n(n+1)}{2}
\end{align*}$$

The comment is not visible.
This is the end of the comment %%

This is the end of this note. This is visible.
"""

text_9 = r"""
This LaTeX Equation has `**` surrounding it:
**$$\mathcal{O}_X$$**

This LaTeX Equation has an id:
$$5 \neq 7$$ ^221b51

This LaTeX Equation also has an id:
$$5 \neq 7
$$ ^221b51
This is the end
"""

text_10 = r"""This is a single line display math mode LaTeX equation:

$$\mathcal{O}_X$$

This is a single multi-line display math mode LaTeX equation:

$$
5 + 2 = 7
$$

These are multiple consecutive display math mode LaTeX equations:

$$1+1 = 2
$$
$$5 + 7 = 14$$
$$

8 + 4 = 12

$$
"""

text_11 = r"""$$asdf$$
$$asdf$$
$$asdf$$
After text."""

text_12 = r"""

"""

We also use multiple example vaults.

The following vault will be used mainly for some basic file interactions of the MarkdownFile class.

.
└── algebraic_geometry
    ├── a1_homotopy_theory
    │   ├── pauli_wickelgren_aa1
    │   │   ├── 3_the_grothendieck_witt_ring_of_k
    │   │   │   └── pauli_wickelgren_aa1_example 3.7.md
    │   │   └── _index_pauli_wickelgren_aa1.md
    │   └── _index_a1_homotopy_theory.md
    └── _index_algebraic_geometry.md

pauli_wickelgren_aa1_example 3.7.md will contain the contents of text_7.

def make_example_vault(temp_dir: os.PathLike):
    temp_dir = Path(temp_dir)
    os.mkdir(temp_dir / 'algebraic_geometry')
    os.mkdir(temp_dir / 'algebraic_geometry' / 'a1_homotopy_theory')
    os.mkdir(temp_dir / 'algebraic_geometry' / 'a1_homotopy_theory' / 'pauli_wickelgren_aa1')
    os.mkdir(temp_dir / 'algebraic_geometry' / 'a1_homotopy_theory' / 'pauli_wickelgren_aa1' / '3_the_grothendieck_witt_ring_of_k')

    (temp_dir / 'algebraic_geometry' / '_index_algebraic_geometry.md').touch()
    (temp_dir / 'algebraic_geometry' / 'a1_homotopy_theory' / '_index_a1_homotopy_theory.md').touch()
    (temp_dir / 'algebraic_geometry' / 'a1_homotopy_theory' / 'pauli_wickelgren_aa1' / '_index_pauli_wickelgren_aa1.md').touch()
    (temp_dir / 'algebraic_geometry' / 'a1_homotopy_theory' / 'pauli_wickelgren_aa1' / '3_the_grothendieck_witt_ring_of_k' / 'pauli_wickelgren_aa1_example 3.7.md').touch()

    with open((temp_dir / 'algebraic_geometry' / 'a1_homotopy_theory' / 'pauli_wickelgren_aa1' / '3_the_grothendieck_witt_ring_of_k' / 'pauli_wickelgren_aa1_example 3.7.md'), 'w') as writer:
        writer.write(
            text_7
        )

# with tempfile.TemporaryDirectory(prefix='temp_dir', dir=os.getcwd()) as temp_dir:
#     make_example_vault(temp_dir)
#     os.startfile(os.getcwd())
#     input()

We use the following example vault to demonstrate examples of embedded notes and other functionalities:

.
├── note_which_becomes_entirely_embedded_1.md
├── note_with_embedded_links_1.md
└── note_with_paragraphs_that_are_embedded_1.md

def make_example_vault_2(temp_dir: os.PathLike):
    temp_dir = Path(temp_dir)

    file_1 = temp_dir / 'note_which_becomes_entirely_embedded_1.md'
    file_2 = temp_dir / 'note_with_embedded_links_1.md'
    file_3 = temp_dir / 'note_with_paragraphs_that_are_embedded_1.md'

    file_1.touch()
    file_2.touch()
    file_3.touch()

    with open(file_1, 'w') as writer:
        writer.write(
            r"""Hello, this is a note which becomes entirely embedded.

The comment is not visible.
This is the end of the comment %%"""
        )
        
    with open(file_2, 'w') as writer:
        writer.write(
            r"""This is a note.

There are some embedded text here:

![[note_which_becomes_entirely_embedded_1]]


![[note_with_paragraphs_that_are_embedded_1#^65809f]]

![[note_with_paragraphs_that_are_embedded_1#^221b51]]

![[note_with_paragraphs_that_are_embedded_1#Section]]"""
        )

    
    with open(file_3, 'w') as writer:
        writer.write(
            r"""This paragraph becomes embedded.

# Thank you for watching 
cheese 
bandit
$$asdf$$
asdf 
^65809f



This paragraph has not id.

$$5 \neq 7
$$ 
^221b51

# This section has an id ^123456

# This section has no id 
^fff123

$$\mathcal{O}_X$$  ^latexthing

# Section
Some kind of section?

Lalalala
## Subsection
argonaut
# Section
Maybe?
""")

Constructing a `MarkdownFile` object

source

MarkdownFile.from_vault_note

 MarkdownFile.from_vault_note
                               (vn:trouver.markdown.obsidian.vault.VaultNo
                               te)

*Return a MarkdownFile object from a VaultNote object.

Raises

FileNotFoundError
- If vn represents a note file which does not exist.*

source

MarkdownFile.from_file

 MarkdownFile.from_file (file_path:os.PathLike)

*Return a MarkdownFile object from a specified file.

Raises

FileNotFoundError
- If file_path points to a file which does not exist.*

source

MarkdownFile.from_list

 MarkdownFile.from_list (list_of_lines:list[str])

*Return a MarkdownFile object from a list of lines.

This may not work correctly if the markdown text is not sufficiently well-formatted. The formatting requirements include:
- comments must start the line with '%%'.
- comments must end with '%%' followed by whitespaces and nothing else.
- indents should be done with tabs?*

source

MarkdownFile.from_string

 MarkdownFile.from_string (text:str)

Return a MarkdownFile object from a str.

The most convenient way to construct a MarkdownFile object is by the MarkdownFile.from_vault_note factory method.

with tempfile.TemporaryDirectory(prefix='temp_dir_', dir=os.getcwd()) as temp_dir:
    make_example_vault(temp_dir)

    vault_note = VaultNote(temp_dir, name='pauli_wickelgren_aa1_example 3.7')
    mf = MarkdownFile.from_vault_note(vault_note)
    test_eq(str(mf), text_7)

    vault_note = VaultNote(temp_dir, rel_path='does_not_exist.md')
    assert not vault_note.exists()
    with ExceptionExpected(ex=FileNotFoundError):
        mf = MarkdownFile.from_vault_note(vault_note)

We can similarly construct a MarkdownFile by the MarkdownFile.from_file factory method without having to use a VaultNote object.

with tempfile.TemporaryDirectory(prefix='temp_dir_', dir=os.getcwd()) as temp_dir:
    make_example_vault(temp_dir)

    temp_dir = Path(temp_dir)
    mf = MarkdownFile.from_file(temp_dir / 'algebraic_geometry' / 'a1_homotopy_theory' / 'pauli_wickelgren_aa1' / '3_the_grothendieck_witt_ring_of_k' / 'pauli_wickelgren_aa1_example 3.7.md')

    assert not os.path.exists(temp_dir / 'does_not_exist.md')
    with ExceptionExpected(ex=FileNotFoundError):
        mf = MarkdownFile.from_file(temp_dir / 'does_not_exist.md')

If the list of lines of the Markdown file is available, then the MarkdownFile.from_list factory method can be used.

Similarly, if the entire string of the Markdown file is available, then the MarkdownFile.from_string factory method can be used.

list_of_lines = template_text.splitlines()
template_mf_1 = MarkdownFile.from_list(list_of_lines)
template_mf_2 = MarkdownFile.from_string(template_text)

print(str(template_mf_1))
test_eq(str(template_mf_1), str(template_mf_2))

---
cssclass: clean-embeds
aliases: []
tags: [_meta/literature_note]
---
# Topic[^1]

# See Also

# Meta
## References

## Citations and Footnotes
[^1]: Citation

Getting headings of a `MarkdownFile` object

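In a Markdown file, one can set headings by starting a line with one to six `#` characters followed by a space; the number of `#` characters is the level of the heading. In fact, you can consider the text here as text rendered with Markdown! More generally, you can type Markdown in Jupyter notebooks.

For example, `# Section 1` is a heading of level 1 and `## Subsection a` is a heading of level 2.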

There are multiple methods in the MarkdownFile class which retrieve the headings of a Markdown file and their locations.

template_mf = MarkdownFile.from_string(template_text)

source

MarkdownFile.get_headings

 MarkdownFile.get_headings (levels:Union[int,Iterator[int],NoneType]=None,
                            include_start:bool=True)

Return a list of heading titles in the markdown file.

|    | **Type** | **Default** | **Details** |
| -- | -------- | ----------- | ----------- |
| levels | Union | None | The levels of the headings to search for. Each int is between 1 and 6 inclusive, as each heading can be of levels 1 to 6. Defaults to `None`, in which case all heading-levels are searched. |
| include_start | bool | True | If `True` and if this object contains text that is not under a heading (i.e. the text does not start with a heading), then include the empty str `''` in the returned list. |
| **Returns** | **list** |  | **Each str is the heading, including leading sharps `'#'`.** |

The get_headings function returns only a list of headings.

print(template_mf.get_headings())
assert template_mf.get_headings() == [
    '# Topic[^1]', '# See Also', '# Meta', '## References', '## Citations and Footnotes', '']

assert template_mf.get_headings((3,4,6), include_start=True) == ['']

['# Topic[^1]', '# See Also', '# Meta', '## References', '## Citations and Footnotes', '']

The empty heading '' is returned when include_start=True and there is text belonging to no heading - this can only happen at the start of the Markdown file before any headings are specified. Any YAML frontmatter meta is considered as “text belonging to no heading”.

Setting include_start=False excludes the empty heading altogether.

assert template_mf.get_headings(include_start=False) == [
    '# Topic[^1]', '# See Also', '# Meta', '## References', '## Citations and Footnotes']

The parameter levels specifies which level headings to return. The argument passed to levels does not affect whether or not the empty heading '' is included.

assert template_mf.get_headings(levels=1, include_start=True) == [
    '# Topic[^1]', '# See Also', '# Meta', '']
assert template_mf.get_headings(levels=1, include_start=False) == [
    '# Topic[^1]', '# See Also', '# Meta']
# Since list(range(2,6)) == [2, 3, 4, 5], the below returns all headings of levels 2, 3, 4, 5.
assert template_mf.get_headings(levels=range(2,6), include_start=True) == [
    '## References', '## Citations and Footnotes', '']

source

MarkdownFile.get_headings_by_line_number

 MarkdownFile.get_headings_by_line_number
                                           (levels:Union[Iterator[int],int
                                           ,NoneType]=None,
                                           include_start:bool=True)

Return a dict of heading titles in the markdown file.

|    | **Type** | **Default** | **Details** |
| -- | -------- | ----------- | ----------- |
| levels | Union | None | The levels of the headings to search for. Each int is between 1 and 6 inclusive, as each heading can be of levels 1 to 6. If `None` then all heading-levels are searched. |
| include_start | bool | True | If `True` and if this object contains text that is not under a heading (i.e. the text does not start with a heading), then include `-1` as a key with the empty str as value. |
| **Returns** | **dict** |  | **The keys are line numbers and each value is the heading str, including the leading sharps `'#'`, but without any leading or trailing whitespace characters.** |

The get_headings_by_line_number function returns a dict whose keys are the line numbers of headings and whose corresponding values are the full heading str.

As with get_headings, setting include_start=True includes the empty heading; the corresponding line number is always -1.

line_numbers_and_headings = template_mf.get_headings_by_line_number()
print(line_numbers_and_headings)
assert line_numbers_and_headings == {
5: '# Topic[^1]',
 7: '# See Also',
 9: '# Meta',
 10: '## References',
 12: '## Citations and Footnotes',
 -1: ''}

assert template_mf.get_headings_by_line_number(include_start=False) == {
5: '# Topic[^1]',
 7: '# See Also',
 9: '# Meta',
 10: '## References',
 12: '## Citations and Footnotes', }

{5: '# Topic[^1]', 7: '# See Also', 9: '# Meta', 10: '## References', 12: '## Citations and Footnotes', -1: ''}
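
The mapping above can be reproduced for simple notes with a short scan over the lines. This is a minimal sketch under stated assumptions, not the method's implementation: `headings_by_line_number_sketch` and `_HEADING_SKETCH` are hypothetical names, and the sketch treats every line that starts with one to six sharps followed by whitespace as a heading, so it would misread `#` lines inside code blocks or YAML comments.

```python
import re

# One to six sharps at the start of the line, whitespace, then some text.
_HEADING_SKETCH = re.compile(r'^(#{1,6})\s+\S')


def headings_by_line_number_sketch(lines, levels=None, include_start=True):
    """Map 0-based line numbers to heading strings; `-1` maps to `''`."""
    if isinstance(levels, int):
        levels = [levels]
    wanted = set(range(1, 7)) if levels is None else set(levels)
    headings = {}
    for number, line in enumerate(lines):
        match = _HEADING_SKETCH.match(line)
        if match and len(match.group(1)) in wanted:
            headings[number] = line.strip()
    # Text before the first heading belongs to the empty heading.
    if include_start and lines and not _HEADING_SKETCH.match(lines[0]):
        headings[-1] = ''
    return headings
```

Taking `list(...values())` of the result gives a list comparable to what `get_headings` returns, with the empty heading last.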

# TODO: add more examples

source

MarkdownFile.get_headings_and_text

 MarkdownFile.get_headings_and_text
                                     (levels:Union[Iterator[int],int,NoneT
                                     ype]=None, include_start:bool=True)

*Return a list of headings and the text under each heading.

The text under each heading does not include the text of subheadings.*

|    | **Type** | **Default** | **Details** |
| -- | -------- | ----------- | ----------- |
| levels | Union | None | The levels of the headings to search for. Each int is between 1 and 6 inclusive, as each heading can be of levels 1 to 6. If `None`, then all heading-levels are searched. |
| include_start | bool | True | If `True` and if this object contains text that is not under a heading (i.e. the text does not start with a heading), then include the empty str as a key with that text as value. |
| **Returns** | **dict** |  | **Each key is the entire str of the heading, including the leading sharps `'#'`, but not including leading or trailing whitespace characters. Each value is the str under that heading until the next heading, including any trailing newline characters `\n`. If `include_start` is `True`, then one of the keys is the empty str and the corresponding value is the start of the text that is not under any heading.** |

The get_headings_and_text function returns a dict whose keys are full headers and whose values are text under the headers.

headings_and_text = template_mf.get_headings_and_text()
print(headings_and_text)
assert headings_and_text == { '': '---\ncssclass: clean-embeds\naliases: []\ntags: [_meta/literature_note]\n---',
 '# Topic[^1]': '',
 '# See Also': '',
 '# Meta': '',
 '## References': '',
 '## Citations and Footnotes': '[^1]: Citation' }

print(template_mf.get_headings_and_text(None, False))
assert template_mf.get_headings_and_text(None, False) == {
 '# Topic[^1]': '',
 '# See Also': '',
 '# Meta': '',
 '## References': '',
 '## Citations and Footnotes': '[^1]: Citation' }

{'': '---\ncssclass: clean-embeds\naliases: []\ntags: [_meta/literature_note]\n---', '# Topic[^1]': '', '# See Also': '', '# Meta': '', '## References': '', '## Citations and Footnotes': '[^1]: Citation'}
{'# Topic[^1]': '', '# See Also': '', '# Meta': '', '## References': '', '## Citations and Footnotes': '[^1]: Citation'}
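
For intuition, the grouping of text under headings can be sketched as a single pass that starts a new bucket at every heading. This is only an illustration with hypothetical names (`headings_and_text_sketch`, `_HEADING_SKETCH`); it ignores the `levels` parameter, and a repeated heading overwrites the earlier bucket, which may differ from what `get_headings_and_text` does.

```python
import re

_HEADING_SKETCH = re.compile(r'^(#{1,6})\s+\S')


def headings_and_text_sketch(lines, include_start=True):
    """Map each heading to the text between it and the next heading."""
    sections = {'': []}  # '' collects the text before the first heading
    current = ''
    for line in lines:
        if _HEADING_SKETCH.match(line):
            current = line.strip()
            sections[current] = []  # assumption: duplicate headings overwrite
        else:
            sections[current].append(line)
    if not include_start or not sections['']:
        del sections['']
    return {heading: '\n'.join(body) for heading, body in sections.items()}
```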

# TODO: add more examples

source

MarkdownFile.get_headings_tree

 MarkdownFile.get_headings_tree ()

*Return a dict representing the tree of headings in the markdown file.

Returns

dict[Union[str, int], Union[str, dict]]
- The keys are 1. line numbers or 2. the str 'title'. The values are dict or str (the blank str if root node) respectively. The dicts in themselves recursively represent trees and the str are headings, including the leading sharps. In particular, the root level dict also has the blank string '' associated to the key 'title'.*

headings_tree = template_mf.get_headings_tree()
print(headings_tree)
assert headings_tree == {
    'title': '',
    5: {'title': '# Topic[^1]'},
    7: {'title': '# See Also'},
    9: {'title': '# Meta',
        10: {'title': '## References'},
        12: {'title': '## Citations and Footnotes'}
        }
}

{'title': '', 5: {'title': '# Topic[^1]'}, 7: {'title': '# See Also'}, 9: {'title': '# Meta', 10: {'title': '## References'}, 12: {'title': '## Citations and Footnotes'}}}
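
A tree of this shape can be built with a stack of currently open headings. The following is a sketch of that general technique, not the code of `get_headings_tree`; the names `headings_tree_sketch` and `_HEADING_SKETCH` are hypothetical, and headings are detected with a simple regular expression that does not account for code blocks.

```python
import re

_HEADING_SKETCH = re.compile(r'^(#{1,6})\s+\S')


def headings_tree_sketch(lines):
    """Return nested dicts keyed by line number, each with a 'title' entry."""
    root = {'title': ''}
    # Stack of (level, node); the root has level 0 and is never popped.
    stack = [(0, root)]
    for number, line in enumerate(lines):
        match = _HEADING_SKETCH.match(line)
        if not match:
            continue
        level = len(match.group(1))
        # Close every open heading that is not a proper ancestor.
        while stack[-1][0] >= level:
            stack.pop()
        node = {'title': line.strip()}
        stack[-1][1][number] = node
        stack.append((level, node))
    return root
```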

source

MarkdownFile.get_line_number_of_heading

 MarkdownFile.get_line_number_of_heading (title:Optional[str]=None,
                                          from_line:int=0, levels:Union[It
                                          erator[int],int,NoneType]=None)

Return the line number of the heading with the specified title after the specified line number.

|    | **Type** | **Default** | **Details** |
| -- | -------- | ----------- | ----------- |
| title | Optional | None | Title of the heading. Does not include the leading sharps (`'#'`). If `None`, then return the line number of any heading after the specified line number. |
| from_line | int | 0 | The line number to start searching for the heading with `title` from. |
| levels | Union | None | The levels of the heading to search for. Each int is between 1 and 6 inclusive, as each heading can be of levels 1 to 6. If `None`, then all heading-levels are searched. |
| **Returns** | **int** |  | **An index in `self.parts`. If no index/line number of the matching heading exists, then return -1.** |

Note that the argument to title does not include the starting hashtags #.

line_number = template_mf.get_line_number_of_heading(title='See Also')
assert line_number == 7

If the heading of the specified title does not exist, then -1 is returned.

assert template_mf.get_line_number_of_heading(title='Nonexistent title') == -1

We can search for headers of specified titles from specified lines onward:

assert template_mf.get_line_number_of_heading(title='Topic[^1]', from_line=3) == 5
assert template_mf.get_line_number_of_heading(title='Topic[^1]', from_line=6) == -1

We can also specify the levels that the header must be:

assert template_mf.get_line_number_of_heading(title='Topic[^1]', levels=(1,2,6)) == 5
assert template_mf.get_line_number_of_heading(title='Topic[^1]', levels=(3, 5)) == -1

source

MarkdownFile.get_line_numbers_under_heading

 MarkdownFile.get_line_numbers_under_heading (title:Optional[str]=None,
                                              from_line:int=0, levels:Unio
                                              n[Iterator[int],int,NoneType
                                              ]=None, include_subheadings:
                                              bool=True)

Return the line numbers belonging to the heading.

|    | **Type** | **Default** | **Details** |
| -- | -------- | ----------- | ----------- |
| title | Optional | None | Title of the heading. Does not include the leading sharps (`'#'`). If `None`, then return the line number of any heading after the specified line number. |
| from_line | int | 0 | The line number to start searching for the heading with `title` from. |
| levels | Union | None | The levels of the heading to search for. Each int is between 1 and 6 inclusive, as each heading can be of levels 1 to 6. If `None`, then all heading-levels are searched. |
| include_subheadings | bool | True | If `True`, then include the subheadings. |
| **Returns** | **Union** |  | **`(start, end)` where `self.parts[start:end]` represents the parts under the heading, including the start of the heading. If the heading of the specified title does not exist, then returns -1.** |

print(template_text)
template_mf = MarkdownFile.from_string(template_text)
assert template_mf.get_line_numbers_under_heading(title='Topic[^1]') == (5,7)
assert template_mf.get_line_numbers_under_heading(title='See Also') == (7,9)
assert template_mf.get_line_numbers_under_heading(title='Meta') == (9,14)
assert template_mf.get_line_numbers_under_heading(title='References') == (10,12)
assert template_mf.get_line_numbers_under_heading(title='Citations and Footnotes') == (12,14)

---
cssclass: clean-embeds
aliases: []
tags: [_meta/literature_note]
---
# Topic[^1]

# See Also

# Meta
## References

## Citations and Footnotes
[^1]: Citation

If include_subheadings=False, then only the line numbers of the section itself, without any subsections, are returned.

assert template_mf.get_line_numbers_under_heading(title='Topic[^1]', include_subheadings=False) == (5,7)
assert template_mf.get_line_numbers_under_heading(title='See Also', include_subheadings=False) == (7,9)
assert template_mf.get_line_numbers_under_heading(title='Meta', include_subheadings=False) == (9,10)
assert template_mf.get_line_numbers_under_heading(title='References', include_subheadings=False) == (10,12)
assert template_mf.get_line_numbers_under_heading(title='Citations and Footnotes', include_subheadings=False) == (12,14)
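
The half-open ranges above follow from a simple rule: a section ends at the next heading of the same or a higher level, or at the very next heading when subheadings are excluded. Below is a minimal sketch of that rule with hypothetical names (`line_numbers_under_heading_sketch`, `_HEADING_SKETCH`); it omits the `from_line` and `levels` parameters and is not the method's actual code.

```python
import re

_HEADING_SKETCH = re.compile(r'^(#{1,6})\s+\S')


def line_numbers_under_heading_sketch(lines, title, include_subheadings=True):
    """Return `(start, end)` for the section titled `title`, or `-1`."""
    start = level = None
    for number, line in enumerate(lines):
        match = _HEADING_SKETCH.match(line)
        if not match:
            continue
        this_level = len(match.group(1))
        if start is None:
            # Compare against the heading text without the leading sharps.
            if line.strip().lstrip('#').strip() == title:
                start, level = number, this_level
        elif not include_subheadings or this_level <= level:
            return (start, number)
    # The section, if found, runs to the end of the file.
    return -1 if start is None else (start, len(lines))
```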

Adding/removing lines in a `MarkdownFile` object

# TODO examples of insert_line, remove_line, pop_line, add_line_to_end, add_blank_line_to_end, add_line_in_section

source

MarkdownFile.insert_line

 MarkdownFile.insert_line (index:int,
                           line_dict:dict[str,typing.Union[__main__.Markdo
                           wnLineEnum,str]])

Add a line at the specified index/line number to self.parts.

|    | **Type** | **Details** |
| -- | -------- | ----------- |
| index | int | The index at which to add `line_dict` into `self.parts`. |
| line_dict | dict | See `self.parts`. |
| **Returns** | **None** |  |

source

MarkdownFile.remove_line

 MarkdownFile.remove_line (index:int=-1)

Remove a line from self.parts.

	Type	Default	Details
index	int	-1	The index of the line to remove from `self.parts`.
Returns	None

source

MarkdownFile.remove_lines

 MarkdownFile.remove_lines (start:int, end:int)

Remove lines from self.parts.

	Type	Details
start	int	The index of the first line to remove from `self.parts`.
end	int	The end index to remove; the line of index `end` is not removed, but the line of index `end - 1` is.
Returns	None

source

MarkdownFile.pop_line

 MarkdownFile.pop_line (index:int=-1)

Remove a line from self.parts and get its value.

	Type	Default	Details
index	int	-1	The index of the line to pop from `self.parts`.
Returns	dict		The popped line

source

MarkdownFile.add_line_to_end

 MarkdownFile.add_line_to_end
                               (line_dict:dict[str,typing.Union[__main__.M
                               arkdownLineEnum,str]])

Add a line to the end of self.parts.

	Type	Details
line_dict	dict	See `self.parts`.
Returns	None

source

MarkdownFile.add_blank_line_to_end

 MarkdownFile.add_blank_line_to_end ()

Add a blank line to the end of self.parts.

source

MarkdownFile.add_line_in_section

 MarkdownFile.add_line_in_section (title:str,
                                   line_dict:dict[str,typing.Union[__main_
                                   _.MarkdownLineEnum,str]],
                                   start:bool=True)

Add a line in section specified by its title.

	Type	Default	Details
title	str		Title of the heading (without the leading sharps `'#'`)
line_dict	dict		The line to add
start	bool	True	If `True`, add to the start of the section. If `False`, add to the end of the section.
Returns	None
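
The index conventions of the line-editing methods above can be illustrated on a plain Python list. This is only a sketch under the assumption that `insert_line`, `pop_line`, and `remove_lines` behave like `list.insert`, `list.pop`, and slice deletion on `self.parts`; `LineType` is a hypothetical stand-in for `MarkdownLineEnum`, and no `MarkdownFile` object is involved.

```python
from enum import Enum

class LineType(Enum):
    """Hypothetical stand-in for MarkdownLineEnum (only three members)."""
    DEFAULT = 0
    HEADING = 1
    BLANK_LINE = 9

# A list shaped like `self.parts`: one dict per line.
parts = [
    {'line': '# Section 1', 'type': LineType.HEADING},
    {'line': 'some text', 'type': LineType.DEFAULT},
    {'line': '# Section 2', 'type': LineType.HEADING},
]

# Analogue of insert_line(1, line_dict): the new line ends up at index 1.
parts.insert(1, {'line': '', 'type': LineType.BLANK_LINE})

# Analogue of pop_line(): the default index -1 removes and returns the last line.
popped = parts.pop()

# Analogue of remove_lines(1, 3): indices 1 and 2 are removed, index 3 is not.
del parts[1:3]

remaining = [part['line'] for part in parts]
print(popped['line'])  # '# Section 2'
print(remaining)       # ['# Section 1']
```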

Removing or clearing sections in a `MarkdownFile` object

source

MarkdownFile.remove_section

 MarkdownFile.remove_section (title:str)

Remove the section with the specified title, including subsections, if the section exists.

	Type	Details
title	str	The title of the section to remove (without the starting `'#'`’s)
Returns	None

The remove_section method removes all lines belonging to a section, including subsections.

# TODO remove_section, clear_section, clear_all_sections
template_mf = MarkdownFile.from_string(template_text)
template_mf.remove_section('Topic[^1]')
assert len(template_mf.parts) == 12
template_mf.remove_section('Meta')  # This removes subsections too!
assert str(template_mf) == """---
cssclass: clean-embeds
aliases: []
tags: [_meta/literature_note]
---
# See Also
"""

Attempting to remove a non-existent section does nothing.

mf_2 = MarkdownFile.from_string(template_text)
mf_2.remove_section('Non existing section')
assert str(mf_2) == template_text

source

MarkdownFile.clear_section

 MarkdownFile.clear_section (title:str, leave_blank_line:bool=True,
                             clear_subsections:Optional[str]=None)

*Clear the section with the specified title, if it exists.

Does not clear subsections.*

	Type	Default	Details
title	str		Title of the section (Without the leading sharps `'#'`)
leave_blank_line	bool	True	If `True`, leaves a blank line at the end of the section.
clear_subsections	Optional	None	`'clear'`, `'delete'`, or `None`. If `'clear'`, then just clears the contents of subsections, but does not affect the headers. If `'delete'`, then clears the contents of the subsections and deletes the headers. If `None`, then does not affect either.
Returns	None

mf = MarkdownFile.from_string(text_1)
mf.clear_section('Section 1', leave_blank_line=True)
headings_and_text = mf.get_headings_and_text()
assert headings_and_text['# Section 1'] == ''
assert mf.get_line_number_of_heading('Subsection a') == 3
assert mf.parts[4]['line'] == 'Didididi'
print(mf)


# Section 1

## Subsection a
Didididi
Dododododo
# Section 2

Setting leave_blank_line=False leaves no blank line between the section and the next:

mf = MarkdownFile.from_string(text_1)
mf.clear_section('Section 1', leave_blank_line=False)
headings_and_text = mf.get_headings_and_text()
assert headings_and_text['# Section 1'] == ''
assert mf.get_line_number_of_heading('Subsection a') == 2
assert mf.parts[3]['line'] == 'Didididi'
print(mf)


# Section 1
## Subsection a
Didididi
Dododododo
# Section 2

source

MarkdownFile.clear_all_sections

 MarkdownFile.clear_all_sections (leave_blank_lines:bool=True)

*Clear all sections.

Does not clear frontmatter metadata. Leaves all headers intact.*

	Type	Default	Details
leave_blank_lines	bool	True	If `True`, leaves a blank line in each section.
Returns	None

mf = MarkdownFile.from_string(text_1)
mf.clear_all_sections(leave_blank_lines=True)
assert len(mf.parts) == 3
print(mf)

# Section 1
## Subsection a
# Section 2

Metadata in a `MarkdownFile` object

Here are some things that we can do with a MarkdownFile object with frontmatter YAML metadata:

template_mf = MarkdownFile.from_string(template_text)
print (template_mf.metadata(), '\n')
assert template_mf.metadata() == {'cssclass': 'clean-embeds', 'aliases': [], 'tags': ['_meta/literature_note']}
assert template_mf.has_metadata()
assert template_mf.metadata_lines() == (0, 4)


new_metadata = {'aliases': ['an_awesome_note', 'no_more_cssclass', 'no_more_tags']}
template_mf.replace_metadata(new_metadata)
print('The following is the MarkdownFile with new frontmatter YAML metadata:\n')
print(template_mf, '\n')
assert str(template_mf) == """---
aliases: [an_awesome_note, no_more_cssclass, no_more_tags]
---
# Topic[^1]

# See Also

# Meta
## References

## Citations and Footnotes
[^1]: Citation"""


template_mf.remove_metadata()
print('The following is the MarkdownFile with frontmatter YAML metadata removed:\n')
print(template_mf)
assert str(template_mf) == """# Topic[^1]

# See Also

# Meta
## References

## Citations and Footnotes
[^1]: Citation"""

{'cssclass': 'clean-embeds', 'aliases': [], 'tags': ['_meta/literature_note']} 

The following is the MarkdownFile with new frontmatter YAML metadata:

---
aliases: [an_awesome_note, no_more_cssclass, no_more_tags]
---
# Topic[^1]

# See Also

# Meta
## References

## Citations and Footnotes
[^1]: Citation 

The following is the MarkdownFile with frontmatter YAML metadata removed:

# Topic[^1]

# See Also

# Meta
## References

## Citations and Footnotes
[^1]: Citation

Note that if the MarkdownFile does not have any YAML frontmatter metadata, then the metadata method returns None:

mf_1 = MarkdownFile.from_string(text_1)
assert mf_1.metadata() is None

If the MarkdownFile’s YAML frontmatter metadata has formatting issues, then metadata raises a ValueError. The error message also reports the underlying yaml.YAMLError from the PyYAML library, e.g. yaml.parser.ParserError, yaml.scanner.ScannerError, or yaml.reader.ReaderError.

text_with_bad_yaml = "---\nsome_metadata_field: [\badly_formatted_string]\n---\nThe rest of the note contents..."
mf = MarkdownFile.from_string(text_with_bad_yaml)
with ExceptionExpected(ValueError): # By virtue of giving a ReaderError
    mf.metadata()


text_with_bad_yaml = "---\nfield: field2: \n---\nThe rest of the note contents..."
mf = MarkdownFile.from_string(text_with_bad_yaml)
with ExceptionExpected(ValueError): # By virtue of giving a ScannerError
    mf.metadata()


text_with_bad_yaml = "---\nfield: John\n- field2: Mary \n---\nThe rest of the note contents..."
mf = MarkdownFile.from_string(text_with_bad_yaml)
with ExceptionExpected(ValueError): # By virtue of giving a ParserError
    mf.metadata()

source

MarkdownFile.has_metadata

 MarkdownFile.has_metadata ()

*Return True if this MarkdownFile object has frontmatter YAML metadata.

If the MarkdownFile object has any frontmatter YAML metadata, then it is expected to be at the very start; in particular, it must not be preceded by any whitespace characters.*

source

MarkdownFile.metadata_lines

 MarkdownFile.metadata_lines ()

*Return the indices in self.parts which are metadata.

Assumes that self.parts is nonempty.

If the MarkdownFile object has any frontmatter YAML metadata, then it is expected to be at the very start; in particular, it must not be preceded by any whitespace characters.

Returns

tuple
- The tuple consists of 2 ints, a and b, where self.parts[a:b+1] represent the metadata lines, including the '---' before and after.*
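
The inclusive convention of the returned tuple can be sketched as follows. This is a standalone illustration on a plain list of strings, not the library's code; `metadata_line_range` is a hypothetical name, and returning `None` when no frontmatter is found is an assumption of the sketch.

```python
def metadata_line_range(lines):
    """Return (0, b) such that lines[0:b + 1] is the front matter, else None."""
    # The front matter must start on the very first line.
    if not lines or lines[0] != '---':
        return None
    for i in range(1, len(lines)):
        if lines[i] == '---':
            return (0, i)
    return None  # opening '---' without a closing one

lines = [
    '---', 'cssclass: clean-embeds', 'aliases: []',
    'tags: [_meta/literature_note]', '---', '# Topic[^1]',
]
a, b = metadata_line_range(lines)
print((a, b))            # (0, 4)
print(lines[a:b + 1])    # includes both '---' lines
```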

source

MarkdownFile.replace_metadata

 MarkdownFile.replace_metadata (new_metadata:dict[str],
                                enquote_entries_in_fields:list[str]=[])

*Replace the frontmatter metadata of this MarkdownFile object.

Optionally also enquotes string entries in fields specified by enquote_entries_in_fields.

Warning - This method is only tested when the values of new_metadata are either str or list[str].*

	Type	Default	Details
new_metadata	dict		The dictionary representing the new metadata. The keys are the names of fields. The values are the field values, usually expected to be a single string or a list of strings
enquote_entries_in_fields	list	[]	A list of str of fields in the YAML metadata whose entries need to be enquoted. If there is a string that is not a key of `new_metadata`, then that string is essentially ignored (in particular, no errors are raised).
Returns	None

# TODO: basic example

In the case that the metadata comprises strings which need to be escaped (because they have backslashes), the enquote_entries_in_fields parameter of the MarkdownFile.replace_metadata method can be specified to enquote and escape such strings, cf. dict_to_metadata_lines.

In particular, the MarkdownFile.replace_metadata method with the MarkdownFile object’s own .metadata() passed in the following example should ideally not modify the string of the MarkdownFile object. This feature needs to be tested with more examples, however.

mf = MarkdownFile.from_string(
    r'''---
latex_in_original: ["\\mathscr{O}_{\\text {Proj } S_{*}}(n)"]
---

''')

original_metadata = mf.metadata()
original_str = str(mf).strip()

dict_to_metadata_lines(mf.metadata(), enquote_entries_in_fields=['latex_in_original'])
mf.replace_metadata(mf.metadata(), enquote_entries_in_fields=['latex_in_original'])

test_eq(mf.metadata(), original_metadata)
test_eq(str(mf).strip(), original_str)
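
To make the idea of enquoting concrete, here is a minimal sketch of how a metadata dictionary could be rendered into frontmatter lines. It is not the implementation of `dict_to_metadata_lines`: `metadata_to_lines` is a hypothetical helper, it only handles values that are a `str` or a list of `str`, and it borrows `json.dumps` to produce a double-quoted, backslash-escaped string.

```python
import json

def metadata_to_lines(metadata, enquote_entries_in_fields=()):
    """Render a dict as 'key: value' lines, optionally enquoting some fields."""
    def render(value, enquote):
        # json.dumps wraps the string in double quotes and escapes backslashes.
        return json.dumps(value, ensure_ascii=False) if enquote else str(value)

    lines = []
    for key, value in metadata.items():
        enquote = key in enquote_entries_in_fields
        if isinstance(value, list):
            body = '[' + ', '.join(render(v, enquote) for v in value) + ']'
        else:
            body = render(value, enquote)
        lines.append(f'{key}: {body}')
    return lines

plain = metadata_to_lines({'aliases': ['an_awesome_note', 'no_more_cssclass']})
quoted = metadata_to_lines(
    {'latex_in_original': [r'\mathscr{O}_{\text {Proj } S_{*}}(n)']},
    enquote_entries_in_fields=['latex_in_original'],
)
print(plain[0])
print(quoted[0])
```

Field names in `enquote_entries_in_fields` that are not keys of the dictionary are simply never consulted, which mirrors the documented behavior that such names are ignored.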

Multiple methods in the MarkdownFile class, including MarkdownFile.add_tags, MarkdownFile.remove_tags, and MarkdownFile.replace_auto_tags_with_regular_tags, depend on the MarkdownFile.replace_metadata method. Arguments for the enquote_entries_in_metadata_fields parameter must be specified appropriately when using these methods.

source

MarkdownFile.remove_metadata

 MarkdownFile.remove_metadata ()

Remove the frontmatter metadata of this MarkdownFile object.

source

MarkdownFile.add_metadata_section

 MarkdownFile.add_metadata_section (check_exists:bool=True)

Add a frontmatter YAML metadata at the very beginning.

	Type	Default	Details
check_exists	bool	True	If `True`, check whether there is already a metadata section at the beginning and do not add a metadata section if one exists.
Returns	None

If the MarkdownFile has no frontmatter YAML metadata, then we can use the add_metadata_section method to add blank frontmatter YAML metadata:

mf = MarkdownFile.from_string(text_1)
assert not mf.has_metadata()
mf.add_metadata_section()
print(mf)

---

---


# Section 1
some text 

asdfasdf

## Subsection a
Didididi
Dododododo
# Section 2

If the MarkdownFile object already has frontmatter YAML metadata, then the add_metadata_section method does nothing.

template_mf = MarkdownFile.from_string(template_text)
template_mf.add_metadata_section()
assert str(template_mf) == template_text

Tags in `MarkdownFile` objects

YAML metadata tags

In Obsidian, one can add tags to notes both within text and in the frontmatter YAML metadata.

source

MarkdownFile.has_tag

 MarkdownFile.has_tag (tag:str)

*Return True if the Markdown file has the specified tag in its YAML frontmatter metadata.

More specifically, return True if the MarkdownFile object

has YAML frontmatter metadata,
the metadata has a 'tags' section, and
the 'tags' section is a list with the specified tag.

Note that tag should not start with the hashtag # character.*

	Type	Details
tag	str	The tag. Does not start with the hashtag `'#'`.
Returns	bool
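
The three conditions in the docstring can be written out on a plain metadata dictionary. This is a sketch only; `metadata_has_tag` is a hypothetical function that takes the parsed metadata (or `None` when there is no frontmatter) instead of a `MarkdownFile` object.

```python
def metadata_has_tag(metadata, tag):
    """Check the three documented conditions on a parsed metadata dict."""
    if metadata is None:                 # no YAML frontmatter metadata
        return False
    tags = metadata.get('tags')
    if not isinstance(tags, list):       # no 'tags' section, or not a list
        return False
    return tag in tags                   # the list contains the tag

metadata = {'tags': ['_meta/definition', '_auto/_meta/notation']}
print(metadata_has_tag(metadata, '_meta/definition'))   # True
print(metadata_has_tag(metadata, '#_meta/definition'))  # False: no leading '#'
print(metadata_has_tag(None, '_meta/definition'))       # False
```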

source

MarkdownFile.add_tags

 MarkdownFile.add_tags (tags:Union[str,list[str]], skip_repeats:bool=True,
                        skip_repeated_auto:bool=True,
                        enquote_entries_in_metadata_fields:list[str]=[])

*Add tags to the frontmatter metadata.

The order of the tags may be changed.

Ultimately the replace_metadata method is used to modify the YAML metadata. Use the enquote_entries_in_metadata_fields parameter to ensure that the replace_metadata invocation preserves enquoted metadata values.*

	Type	Default	Details
tags	Union		The str representing the tags. May or may not start with `'#'`, e.g. `'#_meta/definition'` or `'_meta/definition'`.
skip_repeats	bool	True	If `True`, then this MarkdownFile will just have unique tags; merges pre-existing repeated tags if necessary. Also, the order of the tags may be changed.
skip_repeated_auto	bool	True	If `True`, then only add tags starting with ’_auto/’ if the corresponding non-auto tag does not exist, e.g. ’_auto/_meta/definition’ is not added if the note already has ’_meta/definition’.
enquote_entries_in_metadata_fields	list	[]	A list of str of fields in the YAML metadata whose entries need to be enquoted. If there is a string that is not a key of `new_metadata`, then that string is essentially ignored (in particular, no errors are raised).
Returns	None

source

MarkdownFile.remove_tags

 MarkdownFile.remove_tags (tags:list[str],
                           enquote_entries_in_metadata_fields:list[str]=[]
                           )

*Remove specified tags from the frontmatter metadata, if the frontmatter metadata and the specified tags exist.

If the MarkdownFile object does not have a frontmatter or if the frontmatter does not include a tags line, then the MarkdownFile object is not modified.

Assumes that this MarkdownFile object has a frontmatter and that the frontmatter includes a tags line.

Any repeated tags are either merged into one (if the tag is not in tags) or are removed (if the tag is in tags).*

	Type	Default	Details
tags	list		The str representing the tags. May or may not start with `'#'`, e.g. `'#_meta/definition'` or `'_meta/definition'`.
enquote_entries_in_metadata_fields	list	[]	A list of str of fields in the YAML metadata whose entries need to be enquoted. If there is a string that is not a key of `new_metadata`, then that string is essentially ignored (in particular, no errors are raised).
Returns	None

mf = MarkdownFile.from_string(text_2)
assert mf.has_tag('_meta/definition')

mf.add_tags(['new_tag'])  #This may change the order of the tags.
assert mf.has_tag('new_tag')

assert mf.has_tag('this_tag_will_be_removed')
mf.remove_tags(['this_tag_will_be_removed'])
assert not mf.has_tag('this_tag_will_be_removed')

assert not mf.has_tag('no_tag')
mf.remove_tags(['no_tag']) # Does nothing
assert not mf.has_tag('no_tag')

print(mf)

---
tags: [_meta/concept, new_tag, _meta/definition, _auto/_meta/notation]
---
# Topic
This is some note with some stuff.

If the MarkdownFile object does not have frontmatter YAML metadata or if its frontmatter YAML metadata does not have a tags section, then no changes are made to the MarkdownFile object:

# Example of a MarkdownFile object without frontmatter YAML metadata:
mf = MarkdownFile.from_string(text_1)

mf_str_before = str(mf)

assert mf.metadata() is None
mf.remove_tags(['_meta/definition'])
assert mf.metadata() is None

test_eq(str(mf), mf_str_before)

# Example of a MarkdownFile object with frontmatter YAML metadata but without a tags section.
mf = MarkdownFile.from_string(text_8)

mf_str_before = str(mf)

assert mf.metadata() is not None
assert 'tags' not in mf.metadata()
mf.remove_tags(['_meta/definition'])

test_eq(str(mf), mf_str_before)

The add_tags method has a skip_repeats parameter.

mf.add_tags(['new_tag'], skip_repeats=True)  # Only one `new_tag` will be present after this.
print(mf)
mf.add_tags(['new_tag'], skip_repeats=False)  # After this, mf will have 2 `new_tag`'s
print(mf)
mf.add_tags(['new_tag'], skip_repeats=True)  # Only one `new_tag` will be present after this, even though there were multiple `new_tag`'s before this.
print(mf)

---
cssclass: clean-embeds
tags: [new_tag]
---
# Topic[^1]
Here is a LaTeX Equation:

$$ 5 \neq 7$$
Hey
Okay, now here is another one:
$$\begin{align*}
\sum_{k=1}^n k = \frac{n(n+1)}{2}
\end{align*}$$

%%This is a comment. 
The comment is not visible.
This is the end of the comment %%

This is the end of this note. This is visible.
---
cssclass: clean-embeds
tags: [new_tag, new_tag]
---
# Topic[^1]
Here is a LaTeX Equation:

$$ 5 \neq 7$$
Hey
Okay, now here is another one:
$$\begin{align*}
\sum_{k=1}^n k = \frac{n(n+1)}{2}
\end{align*}$$

%%This is a comment. 
The comment is not visible.
This is the end of the comment %%

This is the end of this note. This is visible.
---
cssclass: clean-embeds
tags: [new_tag]
---
# Topic[^1]
Here is a LaTeX Equation:

$$ 5 \neq 7$$
Hey
Okay, now here is another one:
$$\begin{align*}
\sum_{k=1}^n k = \frac{n(n+1)}{2}
\end{align*}$$

%%This is a comment. 
The comment is not visible.
This is the end of the comment %%

This is the end of this note. This is visible.
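
The effect of `skip_repeats` and `skip_repeated_auto` can be sketched on plain lists of tags. The function below is a hypothetical illustration, not the `add_tags` method itself: it returns a new list instead of modifying frontmatter, and it happens to preserve the order of the tags, which the real method does not promise.

```python
def merged_tags(existing, new, skip_repeats=True, skip_repeated_auto=True):
    """Return the tag list after adding `new` to `existing`."""
    new = [tag.lstrip('#') for tag in new]  # tags may be given with a leading '#'
    if skip_repeated_auto:
        # Do not add '_auto/X' when the regular tag 'X' is already present.
        new = [
            tag for tag in new
            if not (tag.startswith('_auto/') and tag[len('_auto/'):] in existing)
        ]
    combined = list(existing) + new
    if skip_repeats:
        # dict.fromkeys removes duplicates, including pre-existing ones.
        combined = list(dict.fromkeys(combined))
    return combined

once = merged_tags([], ['new_tag'])
twice = merged_tags(once, ['new_tag'], skip_repeats=False)
merged = merged_tags(twice, ['new_tag'], skip_repeats=True)
print(once, twice, merged)
```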

source

MarkdownFile.replace_auto_tags_with_regular_tags

 MarkdownFile.replace_auto_tags_with_regular_tags (exclude:list[str]=None,
                                                   enquote_entries_in_meta
                                                   data_fields:list[str]=[
                                                   ])

*Replace tags in the frontmatter metadata starting with _auto/ with tags without the _auto/.*

	Type	Default	Details
exclude	list	None	The tags whose `_auto/` tags should not be converted. The str should not start with `'#'` and should not start with `'_auto/'`.
enquote_entries_in_metadata_fields	list	[]	A list of str of fields in the YAML metadata whose entries need to be enquoted. If there is a string that is not a key of `new_metadata`, then that string is essentially ignored (in particular, no errors are raised).
Returns	None

One can use / characters to create “subtags”. trouver recognizes _auto tags, which are tags that trouver adds to mark notes that it has processed in certain ways.

For example, trouver.markdown.obsidian.personal.machine_learning.tag_categorization can label the “types” of the content of information notes. If this labelling process determines an information note to be a definition note, then it adds an _auto/_meta/definition tag, as opposed to a _meta/definition tag. This way, one can recognize that the tag was added automatically via a machine learning model and may be prone to error.

The replace_auto_tags_with_regular_tags method replaces _auto tags with regular tags. This can be useful, for example, once tags for a multitude of notes have been verified.

mf = MarkdownFile.from_string(text_2)
assert mf.has_tag('_auto/_meta/notation')
mf.replace_auto_tags_with_regular_tags()
assert mf.has_tag('_meta/notation')
assert not mf.has_tag("_auto/_meta/notation")
print(mf)

---
tags: [_meta/definition, _meta/concept, _meta/notation, this_tag_will_be_removed]
---
# Topic
This is some note with some stuff.
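
The prefix replacement, together with the `exclude` parameter, can be sketched on a plain list of tags. `promote_auto_tags` is a hypothetical name; dropping a promoted tag that duplicates an existing regular tag is an assumption of this sketch rather than documented behavior.

```python
AUTO_PREFIX = '_auto/'

def promote_auto_tags(tags, exclude=None):
    """Strip the '_auto/' prefix from tags, except for tags listed in `exclude`."""
    exclude = set(exclude or [])
    result = []
    for tag in tags:
        if tag.startswith(AUTO_PREFIX) and tag[len(AUTO_PREFIX):] not in exclude:
            tag = tag[len(AUTO_PREFIX):]
        if tag not in result:  # assumption: avoid duplicating a regular tag
            result.append(tag)
    return result

tags = ['_meta/definition', '_meta/concept', '_auto/_meta/notation']
print(promote_auto_tags(tags))
print(promote_auto_tags(tags, exclude=['_meta/notation']))
```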

In-line tags

The difference between headers and in-line tags is that the former has a space between the hashtags and the text and the latter does not. For example, # Header is a header and #tag is a tag.

source

MarkdownFile.remove_in_line_tags

 MarkdownFile.remove_in_line_tags ()

Remove lines starting with in line tags.

mf = MarkdownFile.from_string(text_3)
mf.remove_in_line_tags()
print(str(mf))
print(mf.parts)
assert len(mf.parts) == 7
assert str(mf) == """
# Section 1
Some stuff


# Hello
"""


# Section 1
Some stuff


# Hello

[{'line': '', 'type': <MarkdownLineEnum.BLANK_LINE: 9>}, {'line': '# Section 1', 'type': <MarkdownLineEnum.HEADING: 1>}, {'line': 'Some stuff', 'type': <MarkdownLineEnum.DEFAULT: 0>}, {'line': '', 'type': <MarkdownLineEnum.BLANK_LINE: 9>}, {'line': '', 'type': <MarkdownLineEnum.BLANK_LINE: 9>}, {'line': '# Hello', 'type': <MarkdownLineEnum.HEADING: 1>}, {'line': '', 'type': <MarkdownLineEnum.BLANK_LINE: 9>}]

template_mf = MarkdownFile.from_string(template_text)
template_mf.remove_in_line_tags()
assert str(template_mf) == template_text
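
The header/tag distinction comes down to whether whitespace follows the leading `#` characters. A minimal sketch with regular expressions is given below; the patterns and the `classify` helper are assumptions made for illustration and are not taken from trouver.

```python
import re

HEADER = re.compile(r'^#{1,6}\s+\S')   # hashes, then whitespace, then text
IN_LINE_TAG = re.compile(r'^#[^\s#]')  # a single hash immediately followed by text

def classify(line):
    """Classify a line as 'header', 'tag', or 'text'."""
    if HEADER.match(line):
        return 'header'
    if IN_LINE_TAG.match(line):
        return 'tag'
    return 'text'

lines = ['# Section 1', '#_meta/definition', 'Some stuff', '## Hello']
without_tags = [line for line in lines if classify(line) != 'tag']
print(without_tags)  # ['# Section 1', 'Some stuff', '## Hello']
```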

Extract raw content from a `MarkdownFile` object

One can add a multitude of meta-data to Obsidian Markdown notes - frontmatter metadata, headers/footers, links, embedded links, tags, etc.

We can extract raw content from notes by removing a combination of these meta-data.

MarkdownFile.replace_links_with_display_text

 MarkdownFile.replace_links_with_display_text
                                               (remove_embedded_note_links
                                               :bool=False)

Remove nonembedded links and replace them with their display text.

	Type	Default	Details
remove_embedded_note_links	bool	False	If `True`, remove links to embedded notes as well. If `False`, does not modify embedded notes.
Returns	None

mf = MarkdownFile.from_string(text_4)
mf.replace_links_with_display_text()
print(str(mf))
assert str(mf) == """
# Some thing

I have a link

## Another topic
This is a link without a specified display text: some_kind_of_note.

This is a link to an anchor without a specified display text: another_note > another anchor."""


# Some thing

I have a link

## Another topic
This is a link without a specified display text: some_kind_of_note.

This is a link to an anchor without a specified display text: another_note > another anchor.
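
For Wikilinks, the replacement shown above can be approximated with one regular expression. This is a sketch under stated assumptions: `strip_wikilinks` is a hypothetical function working on a plain string, it handles only `[[...]]` Wikilinks (not `[text](url)` Markdown links), and it leaves embedded links `![[...]]` untouched.

```python
import re

# A Wikilink not preceded by '!', with an optional '|display text' part.
WIKILINK = re.compile(r'(?<!!)\[\[([^\]|]+)(?:\|([^\]]+))?\]\]')

def strip_wikilinks(text):
    """Replace non-embedded Wikilinks with their display text."""
    def display(match):
        target, shown = match.group(1), match.group(2)
        # Without explicit display text, show 'note > anchor' for anchored links.
        return shown if shown else target.replace('#', ' > ')
    return WIKILINK.sub(display, text)

print(strip_wikilinks('I have [[some_note|a link]]'))
print(strip_wikilinks('See [[another_note#another anchor]].'))
print(strip_wikilinks('![[embedded_note]]'))
```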

If remove_embedded_note_links=True, then embedded links will be replaced with their “display text” as a link; they will not be replaced with the underlying embedded text.

mf = MarkdownFile.from_string(text_5)
mf.replace_links_with_display_text()
assert str(mf) == text_5

mf.replace_links_with_display_text(remove_embedded_note_links=True)
print(str(mf))
assert str(mf) == """# A header
This note is embedded.

The link above should will not be replaced by `replace_links_with_display_text`,
unless `remove_embedded_note_links` is set to `True`."""

# A header
This note is embedded.

The link above should will not be replaced by `replace_links_with_display_text`,
unless `remove_embedded_note_links` is set to `True`.

source

MarkdownFile.remove_footnotes_to_embedded_links

 MarkdownFile.remove_footnotes_to_embedded_links
                                                  (remove_footnote_mention
                                                  s:bool=True)

*Remove footnotes to embedded links.

These are footnotes whose only content are embedded links, e.g. [^1]: ![[embedded_note]]*

	Type	Default	Details
remove_footnote_mentions	bool	True	If `True`, removes the mentions to the footnote to the embedded links in the text.
Returns	None

I very often use footnotes with only embedded links. We can remove such footnotes.

mf = MarkdownFile.from_string(text_6)
mf.remove_footnotes_to_embedded_links()
assert str(mf) == """
# Header

I want to link to some embedded note


You can also let the footnote mention be alphanumeric
"""

Setting remove_footnote_mentions=False removes the content of the footnotes themselves, but leaves the mentions intact:

mf = MarkdownFile.from_string(text_6)
mf.remove_footnotes_to_embedded_links(remove_footnote_mentions=False)
assert str(mf) == """
# Header

I want to link to some embedded note[^1]


You can also let the footnote mention be alphanumeric[^1][^note]
"""

# hide
mf = MarkdownFile.from_string(text_7)
mf.remove_footnotes_to_embedded_links(remove_footnote_mentions=True)
assert '[^2]' not in str(mf)
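
The two behaviors of `remove_footnote_mentions` can be sketched on a list of lines. The helper below is hypothetical and simplified: it treats a footnote as "embedded-only" when the whole definition line has the form `[^id]: ![[note]]`, and it drops such lines outright, which may differ from how the method treats `self.parts`.

```python
import re

# A footnote definition whose only content is an embedded link.
EMBEDDED_FOOTNOTE = re.compile(r'^\[\^([^\]]+)\]:\s*!\[\[[^\]]+\]\]\s*$')

def drop_embedded_footnotes(lines, remove_footnote_mentions=True):
    """Drop embedded-only footnote definitions, and optionally their mentions."""
    ids = [m.group(1) for line in lines if (m := EMBEDDED_FOOTNOTE.match(line))]
    kept = [line for line in lines if not EMBEDDED_FOOTNOTE.match(line)]
    if remove_footnote_mentions:
        for footnote_id in ids:
            kept = [line.replace(f'[^{footnote_id}]', '') for line in kept]
    return kept

lines = [
    'Some text[^1] and more[^note]',
    '[^1]: ![[embedded_note]]',
    '[^note]: A regular citation',
]
print(drop_embedded_footnotes(lines))
print(drop_embedded_footnotes(lines, remove_footnote_mentions=False))
```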

source

MarkdownFile.remove_headers

 MarkdownFile.remove_headers ()

Remove all headers.

We can remove all of the headers and leave the rest of the text intact:

mf = MarkdownFile.from_string(template_text)
mf.remove_headers()
assert str(mf) == """---
cssclass: clean-embeds
aliases: []
tags: [_meta/literature_note]
---



[^1]: Citation"""

mf = MarkdownFile.from_string(text_1)
mf.remove_headers()
assert str(mf) == """
some text 

asdfasdf

Didididi
Dododododo"""

source

MarkdownFile.remove_double_blank_lines

 MarkdownFile.remove_double_blank_lines ()

Remove blank lines so that there are no consecutive blank lines

When removing some of the “metadata”, the content of the note can be left with a lot of consecutive blank lines. To ensure that machine-learning models will not develop some kind of blank line bias, we can remove such consecutive blank lines.

mf = MarkdownFile.from_string(template_text)
mf.remove_headers()
mf.remove_double_blank_lines()
assert str(mf) == """---
cssclass: clean-embeds
aliases: []
tags: [_meta/literature_note]
---

[^1]: Citation"""
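
Collapsing runs of blank lines is a single pass over the lines. The following sketch shows the idea on a list of strings; `collapse_blank_lines` is a hypothetical name, and treating whitespace-only lines as blank is an assumption.

```python
def collapse_blank_lines(lines):
    """Keep at most one blank line in any run of consecutive blank lines."""
    result = []
    for line in lines:
        is_blank = line.strip() == ''
        if is_blank and result and result[-1].strip() == '':
            continue  # previous kept line is already blank
        result.append(line)
    return result

lines = ['---', 'aliases: []', '---', '', '', '', '[^1]: Citation']
print(collapse_blank_lines(lines))
# ['---', 'aliases: []', '---', '', '[^1]: Citation']
```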

source

MarkdownFile.replace_embedded_links_with_text

 MarkdownFile.replace_embedded_links_with_text (vault:os.PathLike,
                                                recursive:bool=True, remov
                                                e_paragraph_id:bool=True)

*Remove embedded links and replace them with their underlying text, as found in notes in the vault.

Assumes that the embedded links do not loop infinitely.

For embedded links to notes that do not exist in the vault, the embedded links are replaced with blank str.

No new entries are added to self.parts even if the embedded links have multiple lines.*

	Type	Default	Details
vault	PathLike
recursive	bool	True	If `True`, then recursively replaces embedded links in the text of the embedded links.
remove_paragraph_id	bool	True	If `True`, then removes the paragraph id’s in the text of the embedded links. Leaves the paragraph id’s of the original text intact.
Returns	None

# TODO: test recursive
with tempfile.TemporaryDirectory(prefix='temp_dir_', dir=os.getcwd()) as temp_dir:
    make_example_vault_2(temp_dir)
    vault = Path(temp_dir)
    vn = VaultNote(vault, name='note_with_embedded_links_1')
    mf = MarkdownFile.from_vault_note(vn)
    mf.replace_embedded_links_with_text(vault)
    assert str(mf) == r"""This is a note.

There are some embedded text here:

Hello, this is a note which becomes entirely embedded.

The comment is not visible.
This is the end of the comment %%


cheese 
bandit
$$asdf$$
asdf 

$$5 \neq 7
$$ 

# Section
Some kind of section?

Lalalala
## Subsection
argonaut"""

Setting remove_paragraph_id=False keeps the paragraph id’s in the embedded text.

with tempfile.TemporaryDirectory(prefix='temp_dir_', dir=os.getcwd()) as temp_dir:
    make_example_vault_2(temp_dir)
    vault = Path(temp_dir)
    vn = VaultNote(vault, name='note_with_embedded_links_1')
    mf = MarkdownFile.from_vault_note(vn)
    mf.replace_embedded_links_with_text(vault, remove_paragraph_id=False)
    print(str(mf))
    assert str(mf) == r"""This is a note.

There are some embedded text here:

Hello, this is a note which becomes entirely embedded.

The comment is not visible.
This is the end of the comment %%


cheese 
bandit
$$asdf$$
asdf 
^65809f

$$5 \neq 7
$$ 
^221b51

# Section
Some kind of section?

Lalalala
## Subsection
argonaut"""

This is a note.

There are some embedded text here:

Hello, this is a note which becomes entirely embedded.
%%This is a comment. 
The comment is not visible.
This is the end of the comment %%


cheese 
bandit
$$asdf$$
asdf 
^65809f

$$5 \neq 7
$$ 
^221b51

# Section
Some kind of section?

Lalalala
## Subsection
argonaut
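
The role of the `recursive` parameter can be illustrated without a vault on disk. In the sketch below, the vault is replaced by a hypothetical dictionary from note names to note text, `expand_embedded_links` is an illustrative name, and only whole-note embeds of the form `![[note]]` are handled (no anchors or paragraph id's). As in the method, embedded links are assumed not to loop.

```python
import re

EMBEDDED_LINK = re.compile(r'!\[\[([^\]|#]+)\]\]')

def expand_embedded_links(text, notes, recursive=True):
    """Replace ![[note]] with the text of `note`; missing notes become ''."""
    def substitute(match):
        body = notes.get(match.group(1), '')
        # With recursive=True, embeds inside the embedded text are expanded too.
        return expand_embedded_links(body, notes) if recursive else body
    return EMBEDDED_LINK.sub(substitute, text)

notes = {'outer': 'Outer then ![[inner]]', 'inner': 'Inner'}
text = 'Start ![[outer]] ![[missing]]'
print(repr(expand_embedded_links(text, notes)))
print(repr(expand_embedded_links(text, notes, recursive=False)))
```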

source

MarkdownFile.remove_html_tags

 MarkdownFile.remove_html_tags ()

*Remove HTML tags that are typeset in single lines.

HTML tags that span multiple lines are ignored.*

# TODO: test
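
Pending a proper test, the documented behavior (tags within a single line are removed, tags spanning several lines are ignored) can be sketched with a per-line regular expression. The pattern and the `strip_html_tags` helper are assumptions for illustration and may not match what `remove_html_tags` does in every case.

```python
import re

# '<' followed by an optional '/', a letter, and anything up to the next '>'.
HTML_TAG = re.compile(r'</?[A-Za-z][^<>]*>')

def strip_html_tags(lines):
    """Remove HTML tags that open and close within the same line."""
    return [HTML_TAG.sub('', line) for line in lines]

lines = ['Hello <b>world</b><br/>', '$1 < 2$ and $3 > 2$', '<div', 'class="x">']
print(strip_html_tags(lines))
```

Because each line is processed separately, the tag split across the last two lines is left as it is, and the inequality signs in the math line are not mistaken for a tag.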

source

MarkdownFile.merge_display_math_mode

 MarkdownFile.merge_display_math_mode ()

Merge chunks of display_math_mode latex lines into single lines

mf = MarkdownFile.from_string(text_10)
mf.merge_display_math_mode()
print(mf)
assert len(mf.parts) == 13

This is a single line display math mode LaTeX equation:

$$\mathcal{O}_X$$

This is a single multi-line display math mode LaTeX equation:

$$ 5 + 2 = 7 $$

These are multiple consecutive display math mode LaTeX equations:

$$1+1 = 2 $$
$$5 + 7 = 14$$
$$  8 + 4 = 12  $$
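
The merging step can be sketched on a list of lines. This is a simplified illustration, not the method's implementation: `merge_display_math` is a hypothetical helper, it decides whether a line opens or closes a block by counting `$$` delimiters, and joining the merged lines with a single space is an assumption.

```python
def merge_display_math(lines):
    """Merge each multi-line $$...$$ block into a single line."""
    merged, chunk = [], []
    for line in lines:
        # An odd number of '$$' on a line opens or closes a display math block.
        toggles = line.count('$$') % 2 == 1
        if chunk:
            chunk.append(line.strip())
            if toggles:  # the block closes on this line
                merged.append(' '.join(chunk))
                chunk = []
        elif toggles:    # the block opens on this line
            chunk = [line.strip()]
        else:
            merged.append(line)
    merged.extend(chunk)  # unterminated block: keep the collected lines
    return merged

lines = ['An equation:', '$$', '5 + 2 = 7', '$$', '$$x$$', 'after']
print(merge_display_math(lines))
# ['An equation:', '$$ 5 + 2 = 7 $$', '$$x$$', 'after']
```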

source

MarkdownFile.merge_display_math_mode_into_preceding_text

 MarkdownFile.merge_display_math_mode_into_preceding_text
                                                           (separator:str=
                                                           '\n')

Merge chunks of display math mode latex lines into single lines and merge those single lines into preceding text lines.

	Type	Default	Details
separator	str	'\n'	The str with which to join the latex lines into the text lines. Note that the display math mode latex lines are not joined with this str.
Returns	None

mf = MarkdownFile.from_string(text_10)
mf.merge_display_math_mode_into_preceding_text(separator=' ')
print(mf)
assert len(mf.parts) == 5

This is a single line display math mode LaTeX equation:  $$\mathcal{O}_X$$

This is a single multi-line display math mode LaTeX equation:  $$ 5 + 2 = 7 $$

These are multiple consecutive display math mode LaTeX equations:  $$1+1 = 2 $$ $$5 + 7 = 14$$ $$  8 + 4 = 12  $$

We can set separator to its default value \n.

mf = MarkdownFile.from_string(text_10)
mf.merge_display_math_mode_into_preceding_text(separator='\n')
print(mf)
assert len(mf.parts) == 5  # Some of the parts have 'line' as multi-line str i.e. as str with `\n` characters.
assert '\n' in mf.parts[0]['line']
assert '\n' not in mf.parts[1]['line']

This is a single line display math mode LaTeX equation:

$$\mathcal{O}_X$$

This is a single multi-line display math mode LaTeX equation:

$$ 5 + 2 = 7 $$

These are multiple consecutive display math mode LaTeX equations:

$$1+1 = 2 $$
$$5 + 7 = 14$$
$$  8 + 4 = 12  $$

If the text starts with display math mode LaTeX, then those leading LaTeX lines are merged into a single line.

mf = MarkdownFile.from_string(text_11)
mf.merge_display_math_mode_into_preceding_text(separator=' ')
print(mf)
assert len(mf.parts) == 2

$$asdf$$ $$asdf$$ $$asdf$$
After text.

Writing a `MarkdownFile` object to a file

We can write the contents of a MarkdownFile object to the file represented by a VaultNote object.

source

MarkdownFile.parts_of_id

 MarkdownFile.parts_of_id (par_id:str)

*Return the indices of the lines within the Markdown file belonging to the specified text id.

This id can be used as an anchor for a link in Obsidian. For example, [[note#^65809f]] is a link to a note named note to the text with id 65809f. Such a text is marked with a trailing ^65809f.*

	Type	Details
par_id	str	Must begin with `'^'`.
Returns	Optional	`(start,end)` where `self.parts[start:end]` consists of the lines of the specified id. If the specified id does not exist for the note, then `None` is returned.

Links in Obsidian can be anchored at “paragraphs” of text. As Wikilinks, such links have the format [[<note_name>#^<id_of_paragraph>]]. Note that the id begins with a caret ^.

We can get the parts of the Markdown file to which the id refers:

with tempfile.TemporaryDirectory(prefix='temp_dir_', dir=os.getcwd()) as temp_dir:
    make_example_vault_2(temp_dir)
    vault = Path(temp_dir)
    vn = VaultNote(vault, name = 'note_with_paragraphs_that_are_embedded_1')
    mf = MarkdownFile.from_vault_note(vn)

    start, end = mf.parts_of_id('^65809f')
    assert start == 3 and end == 8
    print(mf.text_of_lines(start, end), '\n')
    # lines = [mf.parts[i]['line'] for i in range(start, end)]
    # print('\n'.join(lines))

    start, end = mf.parts_of_id('^221b51')
    assert start == 13 and end == 16
    print(mf.text_of_lines(start, end), '\n')

    start, end = mf.parts_of_id('^123456')
    assert start == 17 and end == 18
    print(mf.text_of_lines(start, end), '\n')

    start, end = mf.parts_of_id('^fff123')
    assert start == 20 and end == 21
    print(mf.text_of_lines(start, end), '\n')

    start, end = mf.parts_of_id('^latexthing')
    assert start == 22 and end == 23
    print(mf.text_of_lines(start, end), '\n')

cheese 
bandit
$$asdf$$
asdf 
^65809f 

$$5 \neq 7
$$ 
^221b51 

# This section has an id ^123456 

^fff123 

$$\mathcal{O}_X$$  ^latexthing
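
The lookup performed by `parts_of_id` can be approximated without the library. The following is a minimal sketch, not trouver's implementation: `find_block_id_range` is a hypothetical helper, and it assumes that a block ends on the line carrying the trailing id and begins right after the nearest preceding blank line.

```python
import re
from typing import Optional


def find_block_id_range(lines: list[str], par_id: str) -> Optional[tuple[int, int]]:
    """Hypothetical helper: return (start, end) so that lines[start:end]
    is the block marked with the trailing id `par_id`, or None if absent."""
    if not par_id.startswith('^'):
        raise ValueError("par_id must begin with '^'")
    # The id must be the last non-whitespace token on its line.
    pattern = re.compile(re.escape(par_id) + r'\s*$')
    for i, line in enumerate(lines):
        if pattern.search(line):
            start = i
            # Assumption: a blank line separates blocks, so walk back to it.
            while start > 0 and lines[start - 1].strip():
                start -= 1
            return start, i + 1
    return None


example_lines = [
    'cheese', 'bandit', '$$asdf$$', 'asdf', '^65809f',
    '',
    '# This section has an id ^123456',
]
paragraph_range = find_block_id_range(example_lines, '^65809f')
heading_range = find_block_id_range(example_lines, '^123456')
missing_range = find_block_id_range(example_lines, '^nope00')
```

As with `parts_of_id`, the sketch returns a half-open range and `None` for an id that does not occur. The line numbers differ from the example above because the sample list here is shorter than the note in the example vault.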

Hidden tests

# hide 
# TODO _look_at_start_of_file, _line_dict

Misc TODO

source

MarkdownFile.text_of_lines

 MarkdownFile.text_of_lines (start:int, end:int)

Return the text of `self.parts[start:end]`, adding new line characters `'\n'` in between.
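
A minimal sketch of this behavior on plain data follows. It assumes that each part is a dict with a `'line'` key, as the commented-out lines in the `parts_of_id` example suggest; the standalone function `text_of_parts` is hypothetical and only mirrors the description above.

```python
def text_of_parts(parts: list[dict], start: int, end: int) -> str:
    """Hypothetical stand-in: join the 'line' entries of parts[start:end]
    with newline characters in between."""
    return '\n'.join(part['line'] for part in parts[start:end])


sample_parts = [
    {'line': 'cheese'},
    {'line': 'bandit'},
    {'line': '$$asdf$$'},
    {'line': 'asdf'},
]
joined = text_of_parts(sample_parts, 1, 3)
```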

source

MarkdownFile.write

 MarkdownFile.write (vn:trouver.markdown.obsidian.vault.VaultNote,
                     mode:str='w', check_validity:bool=True)

*Write to the file specified by a VaultNote object.

If the file that the VaultNote object represents does not exist, then this method creates it.

Raises a `ValueError` if `check_validity` is `True` and `str(self)` cannot be parsed as a `MarkdownFile` object. If this error is raised, the most likely cause is that the frontmatter metadata cannot be parsed by pyyaml, so the metadata field elements need to be escaped and/or enquoted.
To ensure that metadata fields are enquoted and escaped, it is recommended either to use `.replace_metadata` and pass arguments to the `enquote_entries_in_fields` parameter, or to use the `dict_to_metadata_lines` function.*

The MarkdownFile.write method raises a ValueError if check_validity is True and if the output of .__str__() on the MarkdownFile object cannot be parsed as a MarkdownFile object itself.

with tempfile.TemporaryDirectory(prefix='temp_dir_', dir=os.getcwd()) as temp_dir:
    make_example_vault(temp_dir)

    # The following text is parseable by pyyaml.
    metadata_markdownfile_text = r'''---
latex_in_original: ["\\text{hi}"]
---
blahblah
'''
    mf = MarkdownFile.from_string(metadata_markdownfile_text)
    print("Note that `mf` is completely fine right after it is constructed:")
    print(str(mf))
    MarkdownFile.from_string(str(mf))  # This does not raise errors

    mf.add_tags('_new_tag')
    print('\nAfter adding a tag, however, the metadata changes. Note that the `latex_in_original` field is no longer enquoted and escaped:')
    print(mf.__str__())
    print('\n\nTo prevent further problems, the `.write` method raises a `ValueError`')

    vault_note = VaultNote(temp_dir, name='pauli_wickelgren_aa1_example 3.7')

    with ExceptionExpected(ex=ValueError):
        mf.write(vault_note)

Note that `mf` is completely fine right after it is constructed:
---
latex_in_original: ["\\text{hi}"]
---
blahblah

After adding a tag, however, the metadata changes. Note that the `latex_in_original` field is no longer enquoted and escaped:
---
latex_in_original: [\text{hi}]
tags: [_new_tag]
---
blahblah


To prevent further problems, the `.write` method raises a `ValueError`
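
The validate-before-write pattern described above can be sketched independently of the library. Everything named here is hypothetical: `write_checked` stands in for the idea behind `.write`, and `toy_parse` is a deliberately crude substitute for a real parser that only rejects a list entry starting with an unquoted backslash.

```python
import tempfile
from pathlib import Path
from typing import Callable


def write_checked(path: Path, text: str, parse: Callable[[str], object],
                  check_validity: bool = True) -> None:
    """Hypothetical helper: write `text` to `path`, optionally refusing
    to write when `parse(text)` fails."""
    if check_validity:
        try:
            parse(text)
        except Exception as err:
            raise ValueError(f'text cannot be parsed back: {err}') from err
    path.write_text(text, encoding='utf-8')


def toy_parse(text: str) -> str:
    """Stand-in parser (assumption): reject a list entry that starts
    with an unquoted backslash."""
    if '[\\' in text:
        raise SyntaxError('unquoted backslash in list entry')
    return text


good = '---\n' + r'latex_in_original: ["\\text{hi}"]' + '\n---\nblahblah\n'
bad = '---\n' + r'latex_in_original: [\text{hi}]' + '\n---\nblahblah\n'

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / 'note.md'

    write_checked(target, good, toy_parse)
    wrote_good = target.read_text(encoding='utf-8') == good

    try:
        write_checked(target, bad, toy_parse)
        raised = False
    except ValueError:
        raised = True

    # Skipping the check writes the text even though it cannot be parsed back.
    write_checked(target, bad, toy_parse, check_validity=False)
    wrote_bad_unchecked = target.read_text(encoding='utf-8') == bad
```

Checking before writing keeps an unparseable file from reaching the vault, at the cost of one extra parse per write.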

If `check_validity` is `False`, then the `.write` method will attempt to write to the file of the `VaultNote` object. Note, however, that the resulting file may not be fully parseable as a `MarkdownFile` object.

with tempfile.TemporaryDirectory(prefix='temp_dir_', dir=os.getcwd()) as temp_dir:
    make_example_vault(temp_dir)

    # The following text is parseable by pyyaml.
    metadata_markdownfile_text = r'''---
latex_in_original: ["\\text{hi}"]
---
blahblah
'''
    mf = MarkdownFile.from_string(metadata_markdownfile_text)
    print("Note that `mf` is completely fine right after it is constructed:")
    print(str(mf))
    MarkdownFile.from_string(str(mf))  # This does not raise errors

    mf.add_tags('_new_tag')
    print('\nAfter adding a tag, however, the metadata changes. Note that the `latex_in_original` field is no longer enquoted and escaped:')
    print(mf.__str__())
    print('\n\nWith `check_validity=False`, the `.write` method writes the file anyway')

    vault_note = VaultNote(temp_dir, name='pauli_wickelgren_aa1_example 3.7')

    mf.write(vault_note, check_validity=False)
    new_mf = MarkdownFile.from_vault_note(vault_note)
    with ExceptionExpected(ValueError): # The ValueError is from the .metadata method.
        new_mf.metadata()

Note that `mf` is completely fine right after it is constructed:
---
latex_in_original: ["\\text{hi}"]
---
blahblah

After adding a tag, however, the metadata changes. Note that the `latex_in_original` field is no longer enquoted and escaped:
---
latex_in_original: [\text{hi}]
tags: [_new_tag]
---
blahblah


With `check_validity=False`, the `.write` method writes the file anyway
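
The enquoting and escaping that keeps such a field parseable can be illustrated with the standard library alone. This is a sketch under stated assumptions, not the behavior of `dict_to_metadata_lines`: the helper `enquoted_field_line` is hypothetical, and it relies on `json.dumps` producing double-quoted strings whose backslash escapes are also accepted in YAML double-quoted scalars for plain ASCII input such as this.

```python
import json


def enquoted_field_line(key: str, entries: list[str]) -> str:
    """Hypothetical helper: render a one-line YAML list field whose
    entries are double-quoted with backslashes escaped."""
    quoted = ', '.join(json.dumps(entry) for entry in entries)
    return f'{key}: [{quoted}]'


field_line = enquoted_field_line('latex_in_original', [r'\text{hi}'])
print(field_line)
```

The rendered line matches the enquoted form that the example text started with, before `add_tags` rewrote the metadata.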

source

MarkdownFile.copy

 MarkdownFile.copy (deep:bool)
