import pandas as pd
from fastai.text.all import *
from trouver.helper import _test_directory
import warnings
how_to.train_ml_model.fastai
# A lot of warnings can be printed out
"ignore") warnings.filterwarnings(
General
We use the ULMFiT approach (cf. fast.ai documentation, fast.ai tutorial on text transfer learning) to train a text categorization model.
This approach first trains a language model on the corpus of text in the data, and then uses the encoder of this language model - the model without its final layer - to train the classification model.
# This line is only present for the sake of this example so that the code in this
# notebook can run even if the device does not have a GPU.
device = torch.device("cpu")
Train information note type model
The `append_to_information_note_type_database` function in `markdown.obsidian.personal.machine_learning.information_note_types` creates or appends to a CSV file containing data on note types.
# Replace this file with the appropriate file
csv_file = _test_directory() / 'ml_examples' / 'information_note_type_labels.csv'
folder = _test_directory() / 'ml_examples'

df = pd.read_csv(csv_file)
df
Time added | Time modified | Note name | Full note content | Processed note content | #_meta/concept | #_meta/exercise | #_meta/definition | #_meta/example | #_meta/narrative | ... | #_meta/proof | #_meta/remark | #_meta/TODO/split | #_meta/TODO/merge | #_meta/TODO/delete | #_meta/hint | #_meta/how_to | #_meta/conjecture | #_meta/convention | #_meta/context | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2023-01-13T00:37 | 2023-01-13T00:37 | reference_with_tag_labels_something_something | ---\ncssclass: clean-embeds\naliases: []\ntags: [_meta/literature_note, _meta/narrative]\n---\n# Topic[^1]\n\nIn this chapter, we describe some basics of ring theory. Rings are mathematical structures which generalize the structures of the familiar integers, rational numbers, real numbers, complex numberes, etc.\n\n\n\n# See Also\n\n# Meta\n## References\n\n## Citations and Footnotes\n[^1]: Kim, Page 1 | In this chapter, we describe some basics of ring theory. Rings are mathematical structures which generalize the structures of the familiar integers, rational numbers, real numbers, complex numberes, etc.\n | NOT #_meta/concept | NOT #_meta/exercise | NOT #_meta/definition | NOT #_meta/example | IS #_meta/narrative | ... | NOT #_meta/proof | NOT #_meta/remark | NOT #_meta/TODO/split | NOT #_meta/TODO/merge | NOT #_meta/TODO/delete | NOT #_meta/hint | NOT #_meta/how_to | NOT #_meta/conjecture | NOT #_meta/convention | NOT #_meta/context |
1 | 2023-01-13T00:37 | 2023-01-13T00:37 | reference_with_tag_labels_Definition 1 | ---\ncssclass: clean-embeds\naliases: []\ntags: [_meta/literature_note, _meta/definition_note]\n---\n# Ring[^1]\n\nA **ring** is a set with binary operators $+$ and $\cdot$ such that ...\n\n# See Also\n\n# Meta\n## References\n\n## Citations and Footnotes\n[^1]: Kim, Definition 1 | A ring is a set with binary operators $+$ and $\cdot$ such that ...\n | NOT #_meta/concept | NOT #_meta/exercise | NOT #_meta/definition | NOT #_meta/example | NOT #_meta/narrative | ... | NOT #_meta/proof | NOT #_meta/remark | NOT #_meta/TODO/split | NOT #_meta/TODO/merge | NOT #_meta/TODO/delete | NOT #_meta/hint | NOT #_meta/how_to | NOT #_meta/conjecture | NOT #_meta/convention | NOT #_meta/context |
2 | 2023-01-13T00:37 | 2023-01-13T00:37 | reference_with_tag_labels_Definition 2 | ---\ncssclass: clean-embeds\naliases: []\ntags: [_meta/literature_note, _meta/definition, _meta/notation]\n---\n# Ring of integers modulo $n$[^1]\n\nLet $n \geq 1$ be an integer. The **ring of integers modulo $n$**, denoted by **$\mathbb{Z}/n\mathbb{Z}$**, is, informally, the ring whose elements are represented by the integers with the understanding that $0$ and $n$ are equal.\n\nMore precisely, $\mathbb{Z}/n\mathbb{Z}$ has the elements $0,1,\ldots,n-1$.\n\n...\n\n\n# See Also\n- [[reference_with_tag_labels_Exercise 1|reference_with_tag_labels_Z_nZ_is_a_ring]]\n# Meta\n## References\n\n## ... | Let $n \geq 1$ be an integer. The ring of integers modulo $n$, denoted by $\mathbb{Z}/n\mathbb{Z}$, is, informally, the ring whose elements are represented by the integers with the understanding that $0$ and $n$ are equal.\n\nMore precisely, $\mathbb{Z}/n\mathbb{Z}$ has the elements $0,1,\ldots,n-1$.\n\n...\n | NOT #_meta/concept | NOT #_meta/exercise | IS #_meta/definition | NOT #_meta/example | NOT #_meta/narrative | ... | NOT #_meta/proof | NOT #_meta/remark | NOT #_meta/TODO/split | NOT #_meta/TODO/merge | NOT #_meta/TODO/delete | NOT #_meta/hint | NOT #_meta/how_to | NOT #_meta/conjecture | NOT #_meta/convention | NOT #_meta/context |
3 | 2023-01-13T00:37 | 2023-01-13T00:37 | reference_with_tag_labels_Exercise 1 | ---\ncssclass: clean-embeds\naliases: [reference_with_tag_labels_Z_nZ_is_a_ring]\ntags: [_meta/literature_note, _meta/exercise]\n---\n# $\mathbb{Z}/n\mathbb{Z}$ is a ring[^1]\n\nShow that $\mathbb{Z}/n\mathbb{Z}$ is a ring.\n\n# See Also\n\n# Meta\n## References\n\n## Citations and Footnotes\n[^1]: Exercise 1 | Show that $\mathbb{Z}/n\mathbb{Z}$ is a ring.\n | NOT #_meta/concept | IS #_meta/exercise | NOT #_meta/definition | NOT #_meta/example | NOT #_meta/narrative | ... | NOT #_meta/proof | NOT #_meta/remark | NOT #_meta/TODO/split | NOT #_meta/TODO/merge | NOT #_meta/TODO/delete | NOT #_meta/hint | NOT #_meta/how_to | NOT #_meta/conjecture | NOT #_meta/convention | NOT #_meta/context |
4 | 2023-01-13T00:37 | 2023-01-13T00:37 | reference_with_tag_labels_Theorem 1 | ---\ncssclass: clean-embeds\naliases: []\ntags: [_meta/literature_note, _meta/concept, _meta/proof]\n---\n# The polynomial ring of a UFD is a UFD[^1]\n\nTheorem 1. Let $R$ be a UFD. Then $R[x]$ is a UFD.\n\nProof. Let $f,g \in R[x]$ and suppose that $fg = 0$. Write $f = \sum_{i=0}^n a_i x^i$ and $g = \sum_{j=0}^m b_j x^j$ for some $a_i,b_j \in R$.\n\n...\n\n# See Also\n\n# Meta\n## References\n\n## Citations and Footnotes\n[^1]: Kim, Theorem 1 | Theorem 1. Let $R$ be a UFD. Then $R[x]$ is a UFD.\n\nProof. Let $f,g \in R[x]$ and suppose that $fg = 0$. Write $f = \sum_{i=0}^n a_i x^i$ and $g = \sum_{j=0}^m b_j x^j$ for some $a_i,b_j \in R$.\n\n...\n | IS #_meta/concept | NOT #_meta/exercise | NOT #_meta/definition | NOT #_meta/example | NOT #_meta/narrative | ... | IS #_meta/proof | NOT #_meta/remark | NOT #_meta/TODO/split | NOT #_meta/TODO/merge | NOT #_meta/TODO/delete | NOT #_meta/hint | NOT #_meta/how_to | NOT #_meta/conjecture | NOT #_meta/convention | NOT #_meta/context |
5 rows × 21 columns
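Before training, it can help to sanity-check the label layout. The following is a minimal sketch, assuming (as in the `label_cols` cell further below) that the first five columns are metadata and the tag columns are `df.columns[5:-1]`; each tag column holds strings of the form `IS <tag>` or `NOT <tag>`.

tag_cols = df.columns[5:-1]
# Count how many notes carry each tag.
tag_counts = {col: int(df[col].str.startswith('IS').sum()) for col in tag_cols}
print(tag_counts)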
Train language model
To train the language model, we need to create the TextDataLoaders
object. For our purposes, the column containing the input is the Processed note content
column.
dls_lm = TextDataLoaders.from_df(
    df, path=folder / 'information_note_type', text_col='Processed note content', bs=16,
    valid_pct=0.2, is_lm=True, seq_len=256) # label_col='#_meta/definition' for later
dls_lm.show_batch(max_n=3)
Due to IPython and Windows limitation, python multiprocessing isn't available now.
So `n_workers` has to be changed to 0 to avoid getting stuck
text | text_ | |
---|---|---|
0 | xxbos xxmaj xxunk xxunk . xxmaj let $ xxunk xxunk a xxup xxunk | xxmaj xxunk xxunk . xxmaj let $ xxunk xxunk a xxup xxunk . |
1 | . xxmaj xxunk $ xxunk is a xxup xxunk . xxunk xxmaj xxunk | xxmaj xxunk $ xxunk is a xxup xxunk . xxunk xxmaj xxunk . |
2 | . xxmaj let $ xxunk , xxunk \ in xxmaj xxunk and xxunk | xxmaj let $ xxunk , xxunk \ in xxmaj xxunk and xxunk that |
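The `n_workers` warning printed above comes from fastai on Windows, where Python multiprocessing is not available under IPython. To make the choice explicit rather than rely on the automatic fallback, the number of workers can be passed directly; a minimal sketch, assuming `num_workers` is forwarded to the dataloaders (standard fastai behavior):

# dls_lm = TextDataLoaders.from_df(
#     df, path=folder / 'information_note_type', text_col='Processed note content', bs=16,
#     valid_pct=0.2, is_lm=True, seq_len=256, num_workers=0)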
We now create the `TextLearner` object, which is the language model. Invoking `to_fp16` on the `Learner` lets the model train with 16-bit floats (as opposed to 32-bit floats), so memory usage is (in theory) halved. Equivalently, training with 16-bit floats allows one to roughly double the size of the model or double the batch size for training.
learn = language_model_learner(
    dls_lm, AWD_LSTM, drop_mult=0.3,
    metrics=[accuracy, Perplexity()]).to_fp16()
Training an ML model can take a long time, depending on the amount of data with which to train. As such, it can be useful to save your model and/or its weights in between training cycles. The below demonstrates two ways to load weights:

- The first way assumes that the weights to load are from a model trained with the same dataset as `learn`. `learn.load` just loads these weights to `learn`.
- The second way is for a model trained with a different dataset. A model trained with a different dataset is likely to have a different vocabulary, so the `pretrained_fnames` argument contains this different vocabulary to take into account when loading weights.

If such weights are not loaded to the model `learn`, then `learn` is initially a language model that was pretrained on Wikipedia, cf. fast.ai on the ULMFiT approach.
# Only use the below code if you want to continue training a model of the same dataset.
# TODO: test the below code and see where the weights have to be.
# learn.load('information_note_type_LM_weights')
# Only use the below code if you want to use weights from another pretrained model
# (whose data points are different from those used to build dls_lm).
# TODO: think about how to set up vocab.
# vocab = load_learner(folder / 'information_note_type' / 'models' / 'information_note_type_LM.pkl').dls.vocab
# pretrained_fnames = ['information_note_type_LM_weights', 'information_note_type_LM_vocab']
# learn = language_model_learner(
# dls_lm, AWD_LSTM, drop_mult=0.3,
# metrics=[accuracy, Perplexity()], pretrained_fnames=pretrained_fnames)
# learn = learn.to_fp16()
Now we train the model:
# Feel free to run the below code multiple times
# Get a good learning rate
# lrfind = learn.lr_find()
# The code below should be uncommented when actually training.
# Note that `learn` is frozen except for its last layer group
# Feel free to change up the number of cycles.
# learn.fit_one_cycle(3, lrfind.valley)
# Save the weights; note that this overwrites the existing weights file.
# learn.save('information_note_type_LM_weights')
Now we unfreeze the model and train it further:
learn.unfreeze()
# Feel free to run the below code multiple times
# Get a good learning rate
# lrfind = learn.lr_find()
# Now that `learn` is unfrozen, all of its layer groups are trained.
# Feel free to change up the number of cycles.
# learn.fit_one_cycle(3, lrfind.valley)
There are some components of `learn` that can be worthwhile to save:

- The encoder - as explained above, the encoder of the language model is used to then train the categorization model.
- `learn.dls.vocab` - as explained above, the vocabulary of the model needs to be preserved if the weights of `learn` are to be used in a new language model that uses a different dataset.
- `learn` itself - this can be done with `learn.export`.
# learn.save_encoder('information_note_type_LM_encoder')
# save_pickle(folder / 'information_note_type' / 'models' / 'information_note_type_LM_vocab.pkl', learn.dls.vocab)
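# Convert back to full precision before exporting.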
# learn = learn.to_fp32()
# learn.export('information_note_type_LM')
Test the language model
# TEXT = ''
# N_WORDS = 20
# N_SENTENCES = 5
# preds = [learn.predict(TEXT, N_WORDS, temperature=0.75) for _ in range(N_SENTENCES)]
# print('\n'.join(preds))
Train classification model
Now we use the encoder of the language model that we trained to train a classification model.
We first process the DataFrame to combine the labels into a single column:
# Replace this file with the appropriate file
csv_file = _test_directory() / 'ml_examples' / 'information_note_type_labels.csv'
folder = _test_directory() / 'ml_examples'

df = pd.read_csv(csv_file)
df
Time added | Time modified | Note name | Full note content | Processed note content | #_meta/concept | #_meta/exercise | #_meta/definition | #_meta/example | #_meta/narrative | ... | #_meta/proof | #_meta/remark | #_meta/TODO/split | #_meta/TODO/merge | #_meta/TODO/delete | #_meta/hint | #_meta/how_to | #_meta/conjecture | #_meta/convention | #_meta/context | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2023-01-13T00:37 | 2023-01-13T00:37 | reference_with_tag_labels_something_something | ---\ncssclass: clean-embeds\naliases: []\ntags: [_meta/literature_note, _meta/narrative]\n---\n# Topic[^1]\n\nIn this chapter, we describe some basics of ring theory. Rings are mathematical structures which generalize the structures of the familiar integers, rational numbers, real numbers, complex numberes, etc.\n\n\n\n# See Also\n\n# Meta\n## References\n\n## Citations and Footnotes\n[^1]: Kim, Page 1 | In this chapter, we describe some basics of ring theory. Rings are mathematical structures which generalize the structures of the familiar integers, rational numbers, real numbers, complex numberes, etc.\n | NOT #_meta/concept | NOT #_meta/exercise | NOT #_meta/definition | NOT #_meta/example | IS #_meta/narrative | ... | NOT #_meta/proof | NOT #_meta/remark | NOT #_meta/TODO/split | NOT #_meta/TODO/merge | NOT #_meta/TODO/delete | NOT #_meta/hint | NOT #_meta/how_to | NOT #_meta/conjecture | NOT #_meta/convention | NOT #_meta/context |
1 | 2023-01-13T00:37 | 2023-01-13T00:37 | reference_with_tag_labels_Definition 1 | ---\ncssclass: clean-embeds\naliases: []\ntags: [_meta/literature_note, _meta/definition_note]\n---\n# Ring[^1]\n\nA **ring** is a set with binary operators $+$ and $\cdot$ such that ...\n\n# See Also\n\n# Meta\n## References\n\n## Citations and Footnotes\n[^1]: Kim, Definition 1 | A ring is a set with binary operators $+$ and $\cdot$ such that ...\n | NOT #_meta/concept | NOT #_meta/exercise | NOT #_meta/definition | NOT #_meta/example | NOT #_meta/narrative | ... | NOT #_meta/proof | NOT #_meta/remark | NOT #_meta/TODO/split | NOT #_meta/TODO/merge | NOT #_meta/TODO/delete | NOT #_meta/hint | NOT #_meta/how_to | NOT #_meta/conjecture | NOT #_meta/convention | NOT #_meta/context |
2 | 2023-01-13T00:37 | 2023-01-13T00:37 | reference_with_tag_labels_Definition 2 | ---\ncssclass: clean-embeds\naliases: []\ntags: [_meta/literature_note, _meta/definition, _meta/notation]\n---\n# Ring of integers modulo $n$[^1]\n\nLet $n \geq 1$ be an integer. The **ring of integers modulo $n$**, denoted by **$\mathbb{Z}/n\mathbb{Z}$**, is, informally, the ring whose elements are represented by the integers with the understanding that $0$ and $n$ are equal.\n\nMore precisely, $\mathbb{Z}/n\mathbb{Z}$ has the elements $0,1,\ldots,n-1$.\n\n...\n\n\n# See Also\n- [[reference_with_tag_labels_Exercise 1|reference_with_tag_labels_Z_nZ_is_a_ring]]\n# Meta\n## References\n\n## ... | Let $n \geq 1$ be an integer. The ring of integers modulo $n$, denoted by $\mathbb{Z}/n\mathbb{Z}$, is, informally, the ring whose elements are represented by the integers with the understanding that $0$ and $n$ are equal.\n\nMore precisely, $\mathbb{Z}/n\mathbb{Z}$ has the elements $0,1,\ldots,n-1$.\n\n...\n | NOT #_meta/concept | NOT #_meta/exercise | IS #_meta/definition | NOT #_meta/example | NOT #_meta/narrative | ... | NOT #_meta/proof | NOT #_meta/remark | NOT #_meta/TODO/split | NOT #_meta/TODO/merge | NOT #_meta/TODO/delete | NOT #_meta/hint | NOT #_meta/how_to | NOT #_meta/conjecture | NOT #_meta/convention | NOT #_meta/context |
3 | 2023-01-13T00:37 | 2023-01-13T00:37 | reference_with_tag_labels_Exercise 1 | ---\ncssclass: clean-embeds\naliases: [reference_with_tag_labels_Z_nZ_is_a_ring]\ntags: [_meta/literature_note, _meta/exercise]\n---\n# $\mathbb{Z}/n\mathbb{Z}$ is a ring[^1]\n\nShow that $\mathbb{Z}/n\mathbb{Z}$ is a ring.\n\n# See Also\n\n# Meta\n## References\n\n## Citations and Footnotes\n[^1]: Exercise 1 | Show that $\mathbb{Z}/n\mathbb{Z}$ is a ring.\n | NOT #_meta/concept | IS #_meta/exercise | NOT #_meta/definition | NOT #_meta/example | NOT #_meta/narrative | ... | NOT #_meta/proof | NOT #_meta/remark | NOT #_meta/TODO/split | NOT #_meta/TODO/merge | NOT #_meta/TODO/delete | NOT #_meta/hint | NOT #_meta/how_to | NOT #_meta/conjecture | NOT #_meta/convention | NOT #_meta/context |
4 | 2023-01-13T00:37 | 2023-01-13T00:37 | reference_with_tag_labels_Theorem 1 | ---\ncssclass: clean-embeds\naliases: []\ntags: [_meta/literature_note, _meta/concept, _meta/proof]\n---\n# The polynomial ring of a UFD is a UFD[^1]\n\nTheorem 1. Let $R$ be a UFD. Then $R[x]$ is a UFD.\n\nProof. Let $f,g \in R[x]$ and suppose that $fg = 0$. Write $f = \sum_{i=0}^n a_i x^i$ and $g = \sum_{j=0}^m b_j x^j$ for some $a_i,b_j \in R$.\n\n...\n\n# See Also\n\n# Meta\n## References\n\n## Citations and Footnotes\n[^1]: Kim, Theorem 1 | Theorem 1. Let $R$ be a UFD. Then $R[x]$ is a UFD.\n\nProof. Let $f,g \in R[x]$ and suppose that $fg = 0$. Write $f = \sum_{i=0}^n a_i x^i$ and $g = \sum_{j=0}^m b_j x^j$ for some $a_i,b_j \in R$.\n\n...\n | IS #_meta/concept | NOT #_meta/exercise | NOT #_meta/definition | NOT #_meta/example | NOT #_meta/narrative | ... | IS #_meta/proof | NOT #_meta/remark | NOT #_meta/TODO/split | NOT #_meta/TODO/merge | NOT #_meta/TODO/delete | NOT #_meta/hint | NOT #_meta/how_to | NOT #_meta/conjecture | NOT #_meta/convention | NOT #_meta/context |
5 rows × 21 columns
# Make a list of the label columns.
# Change the indices here if the labels change.
label_cols = list(df.columns[5:-1])

# Create a column of text which lists all of the applicable categories.
# When no tag is applicable, let's label it with `'NO_TAG'`.
def row_to_single_label(row, label_cols):
    indices = np.array([i for i, tag_info in enumerate(row) if tag_info.startswith('IS')])
    if len(indices) == 0:
        return 'NO_TAG'
    return ';'.join([label_cols[i] for i in indices])

labels = df[label_cols].apply(lambda row: row_to_single_label(row, label_cols), axis=1)
df["Single label"] = labels
We now create the DataLoaders using the DataBlock API:
dls_blk = DataBlock(blocks = (TextBlock.from_df(text_cols = 'Processed note content', seq_len=256),
                              MultiCategoryBlock),
                    get_x = ColReader(cols = 'text'),
                    get_y = ColReader(cols = 'Single label', label_delim = ';'),
                    splitter = TrainTestSplitter(test_size = 0.2, random_state = None))
dls_clf = dls_blk.dataloaders(df, bs=32, seed=None)
Due to IPython and Windows limitation, python multiprocessing isn't available now.
So `n_workers` has to be changed to 0 to avoid getting stuck
dls_clf.show_batch(max_n=3)
learn_clf = text_classifier_learner(dls_clf, AWD_LSTM, drop_mult=0.5, metrics=accuracy_multi).to_fp16()
In the above line,

- `AWD_LSTM` is the model architecture for the learner.
- `drop_mult` is a multiplier applied to the dropout probabilities in the architecture, for regularization.
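`accuracy_multi` judges each label independently against a probability threshold (`thresh=0.5` by default). If a different threshold is desired, a `partial` can be passed instead; a minimal sketch, with the threshold value chosen arbitrarily for illustration:

# learn_clf = text_classifier_learner(
#     dls_clf, AWD_LSTM, drop_mult=0.5,
#     metrics=partial(accuracy_multi, thresh=0.3)).to_fp16()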
Before exporting the model, convert the learner `learn_clf` back to full precision via the `.to_fp32()` method.
# Uncomment the below code to actually load the encoder.
# learn_clf.load_encoder(folder / 'information_note_type' / 'models' / 'information_note_type_LM_encoder') # Load the encoder of the previously trained language model.
We train the model by unfreezing it one layer group at a time:
# lrfind = learn_clf.lr_find()
# learn_clf.fit_one_cycle(1, lrfind.valley)
# learn_clf.freeze_to(-2)
# lrfind = learn_clf.lr_find()
# lrfind
# learn_clf.fit_one_cycle(1, lrfind.valley)
# learn_clf.freeze_to(-3)
# lrfind = learn_clf.lr_find()
# lrfind
# learn_clf.fit_one_cycle(1, lrfind.valley)
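As noted above, once training is finished, convert `learn_clf` back to full precision before exporting it. A minimal sketch (the file name `information_note_type_classifier.pkl` is a hypothetical choice):

# learn_clf = learn_clf.to_fp32()
# learn_clf.export(folder / 'information_note_type' / 'models' / 'information_note_type_classifier.pkl')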
Try out results
# n = random.randint(0, len(df)-1)
# stringy = df['Processed note content'][n]
# print(stringy)
# print(learn_clf.predict(stringy))
# print(df['Single label'][n])
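`learn_clf.predict` returns a triple: the decoded labels, a boolean tensor marking which labels cleared the threshold, and the per-label probabilities. A minimal sketch of inspecting the probabilities, assuming the label vocabulary sits at `learn_clf.dls.vocab[1]` (as is standard for fastai text classifiers):

# decoded, mask, probs = learn_clf.predict(stringy)
# for label, p in zip(learn_clf.dls.vocab[1], probs):
#     print(f'{label}: {p:.3f}')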