how_to.train_ml_model.fastai

A demonstration of using fast.ai to train an ML model.

import pandas as pd
from fastai.text.all import *
from trouver.helper.tests import _test_directory

import warnings

# A lot of warnings can be printed out
warnings.filterwarnings("ignore")

General

We use the ULMFiT approach (cf. fast.ai documentation, fast.ai tutorial on text transfer learning) to train a text categorization model.

This approach first trains a language model on the corpus of text in the data and then uses the encoder of this language model (the model without its final layer) to train the classification model.

# This line is only present for the sake of this example so that the code in this notebook can run even
# if the device does not have a gpu.
device = torch.device("cpu")

Train information note type model

The append_to_information_note_type_database function in markdown.obsidian.personal.machine_learning.information_note_types creates/appends to a csv file containing data of note types.

# Replace this file with the appropriate file
csv_file = _test_directory() / 'ml_examples' / 'information_note_type_labels.csv'
folder = _test_directory() / 'ml_examples'

df = pd.read_csv(csv_file)
df

	Time added	Time modified	Note name	Full note content	Processed note content	#_meta/concept	#_meta/exercise	#_meta/definition	#_meta/example	#_meta/narrative	...	#_meta/proof	#_meta/remark	#_meta/TODO/split	#_meta/TODO/merge	#_meta/TODO/delete	#_meta/hint	#_meta/how_to	#_meta/conjecture	#_meta/convention	#_meta/context
0	2023-01-13T00:37	2023-01-13T00:37	reference_with_tag_labels_something_something	---\ncssclass: clean-embeds\naliases: []\ntags: [_meta/literature_note, _meta/narrative]\n---\n# Topic[^1]\n\nIn this chapter, we describe some basics of ring theory. Rings are mathematical structures which generalize the structures of the familiar integers, rational numbers, real numbers, complex numberes, etc.\n\n\n\n# See Also\n\n# Meta\n## References\n\n## Citations and Footnotes\n[^1]: Kim, Page 1	In this chapter, we describe some basics of ring theory. Rings are mathematical structures which generalize the structures of the familiar integers, rational numbers, real numbers, complex numberes, etc.\n	NOT #_meta/concept	NOT #_meta/exercise	NOT #_meta/definition	NOT #_meta/example	IS #_meta/narrative	...	NOT #_meta/proof	NOT #_meta/remark	NOT #_meta/TODO/split	NOT #_meta/TODO/merge	NOT #_meta/TODO/delete	NOT #_meta/hint	NOT #_meta/how_to	NOT #_meta/conjecture	NOT #_meta/convention	NOT #_meta/context
1	2023-01-13T00:37	2023-01-13T00:37	reference_with_tag_labels_Definition 1	---\ncssclass: clean-embeds\naliases: []\ntags: [_meta/literature_note, _meta/definition_note]\n---\n# Ring[^1]\n\nA ring is a set with binary operators $+$ and $\cdot$ such that ...\n\n# See Also\n\n# Meta\n## References\n\n## Citations and Footnotes\n[^1]: Kim, Definition 1	A ring is a set with binary operators $+$ and $\cdot$ such that ...\n	NOT #_meta/concept	NOT #_meta/exercise	NOT #_meta/definition	NOT #_meta/example	NOT #_meta/narrative	...	NOT #_meta/proof	NOT #_meta/remark	NOT #_meta/TODO/split	NOT #_meta/TODO/merge	NOT #_meta/TODO/delete	NOT #_meta/hint	NOT #_meta/how_to	NOT #_meta/conjecture	NOT #_meta/convention	NOT #_meta/context
2	2023-01-13T00:37	2023-01-13T00:37	reference_with_tag_labels_Definition 2	---\ncssclass: clean-embeds\naliases: []\ntags: [_meta/literature_note, _meta/definition, _meta/notation]\n---\n# Ring of integers modulo $n$[^1]\n\nLet $n \geq 1$ be an integer. The ring of integers modulo $n$, denoted by $\mathbb{Z}/n\mathbb{Z}$, is, informally, the ring whose elements are represented by the integers with the understanding that $0$ and $n$ are equal.\n\nMore precisely, $\mathbb{Z}/n\mathbb{Z}$ has the elements $0,1,\ldots,n-1$.\n\n...\n\n\n# See Also\n- [[reference_with_tag_labels_Exercise 1\|reference_with_tag_labels_Z_nZ_is_a_ring]]\n# Meta\n## References\n\n## ...	Let $n \geq 1$ be an integer. The ring of integers modulo $n$, denoted by $\mathbb{Z}/n\mathbb{Z}$, is, informally, the ring whose elements are represented by the integers with the understanding that $0$ and $n$ are equal.\n\nMore precisely, $\mathbb{Z}/n\mathbb{Z}$ has the elements $0,1,\ldots,n-1$.\n\n...\n	NOT #_meta/concept	NOT #_meta/exercise	IS #_meta/definition	NOT #_meta/example	NOT #_meta/narrative	...	NOT #_meta/proof	NOT #_meta/remark	NOT #_meta/TODO/split	NOT #_meta/TODO/merge	NOT #_meta/TODO/delete	NOT #_meta/hint	NOT #_meta/how_to	NOT #_meta/conjecture	NOT #_meta/convention	NOT #_meta/context
3	2023-01-13T00:37	2023-01-13T00:37	reference_with_tag_labels_Exercise 1	---\ncssclass: clean-embeds\naliases: [reference_with_tag_labels_Z_nZ_is_a_ring]\ntags: [_meta/literature_note, _meta/exercise]\n---\n# $\mathbb{Z}/n\mathbb{Z}$ is a ring[^1]\n\nShow that $\mathbb{Z}/n\mathbb{Z}$ is a ring.\n\n# See Also\n\n# Meta\n## References\n\n## Citations and Footnotes\n[^1]: Exercise 1	Show that $\mathbb{Z}/n\mathbb{Z}$ is a ring.\n	NOT #_meta/concept	IS #_meta/exercise	NOT #_meta/definition	NOT #_meta/example	NOT #_meta/narrative	...	NOT #_meta/proof	NOT #_meta/remark	NOT #_meta/TODO/split	NOT #_meta/TODO/merge	NOT #_meta/TODO/delete	NOT #_meta/hint	NOT #_meta/how_to	NOT #_meta/conjecture	NOT #_meta/convention	NOT #_meta/context
4	2023-01-13T00:37	2023-01-13T00:37	reference_with_tag_labels_Theorem 1	---\ncssclass: clean-embeds\naliases: []\ntags: [_meta/literature_note, _meta/concept, _meta/proof]\n---\n# The polynomial ring of a UFD is a UFD[^1]\n\nTheorem 1. Let $R$ be a UFD. Then $R[x]$ is a UFD.\n\nProof. Let $f,g \in R[x]$ and suppose that $fg = 0$. Write $f = \sum_{i=0}^n a_i x^i$ and $g = \sum_{j=0}^m b_j x^j$ for some $a_i,b_j \in R$.\n\n...\n\n# See Also\n\n# Meta\n## References\n\n## Citations and Footnotes\n[^1]: Kim, Theorem 1	Theorem 1. Let $R$ be a UFD. Then $R[x]$ is a UFD.\n\nProof. Let $f,g \in R[x]$ and suppose that $fg = 0$. Write $f = \sum_{i=0}^n a_i x^i$ and $g = \sum_{j=0}^m b_j x^j$ for some $a_i,b_j \in R$.\n\n...\n	IS #_meta/concept	NOT #_meta/exercise	NOT #_meta/definition	NOT #_meta/example	NOT #_meta/narrative	...	IS #_meta/proof	NOT #_meta/remark	NOT #_meta/TODO/split	NOT #_meta/TODO/merge	NOT #_meta/TODO/delete	NOT #_meta/hint	NOT #_meta/how_to	NOT #_meta/conjecture	NOT #_meta/convention	NOT #_meta/context

5 rows × 21 columns

Train language model

To train the language model, we need to create the TextDataLoaders object. For our purposes, the column containing the input is the Processed note content column.

dls_lm = TextDataLoaders.from_df(
    df, path=folder / 'information_note_type', text_col='Processed note content', bs=16,
    valid_pct=0.2, is_lm=True, seq_len=256)  #  label_col='#_meta/definition' for later
dls_lm.show_batch(max_n=3)

Due to IPython and Windows limitation, python multiprocessing isn't available now.
So `n_workers` has to be changed to 0 to avoid getting stuck

	text	text_
0	xxbos xxmaj xxunk xxunk . xxmaj let $ xxunk xxunk a xxup xxunk	xxmaj xxunk xxunk . xxmaj let $ xxunk xxunk a xxup xxunk .
1	. xxmaj xxunk $ xxunk is a xxup xxunk . xxunk xxmaj xxunk	xxmaj xxunk $ xxunk is a xxup xxunk . xxunk xxmaj xxunk .
2	. xxmaj let $ xxunk , xxunk \ in xxmaj xxunk and xxunk	xxmaj let $ xxunk , xxunk \ in xxmaj xxunk and xxunk that
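
In the batch shown above, the text_ column is the text column shifted by one token: at every position the language model is asked to predict the next token. The following is a minimal sketch of that offset in plain Python. The helper lm_pairs is hypothetical and is not how fast.ai builds its batches internally; it only illustrates the input/target relationship.

```python
def lm_pairs(tokens, seq_len):
    """Split a token stream into (input, target) pairs offset by one token."""
    pairs = []
    # The last token has no successor, so stop one short of the end.
    for start in range(0, len(tokens) - 1, seq_len):
        x = tokens[start:start + seq_len]
        y = tokens[start + 1:start + seq_len + 1]
        # Trim the input if the stream ends before a full target is available.
        pairs.append((x[:len(y)], y))
    return pairs


tokens = "xxbos xxmaj let $ r $ be a xxup ufd .".split()
for x, y in lm_pairs(tokens, seq_len=5):
    print(" ".join(x), "->", " ".join(y))
```

Each target sequence is the corresponding input sequence moved forward by one position, which is the same pattern visible between text and text_.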

We now create the TextLearner object, which is the language model.

Invoking to_fp16 on the Learner lets the model train with 16-bit floats (as opposed to 32-bit floats), so memory usage is (in theory) halved. Put differently, for the same memory budget, training with 16-bit floats allows one to roughly double either the size of the model or the batch size.
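
As a rough illustration of the memory argument, the sketch below compares the storage needed for the same number of values in 32-bit and 16-bit floats. The parameter count is a made-up figure chosen only for the example, and real savings are usually smaller than a factor of two because mixed-precision training typically keeps some values in 32-bit.

```python
import struct

BYTES_FP32 = struct.calcsize("f")  # size of a single-precision float
BYTES_FP16 = struct.calcsize("e")  # size of a half-precision float


def weight_memory_mib(n_params, bytes_per_param):
    """Memory needed to store n_params values, in MiB."""
    return n_params * bytes_per_param / 2**20


n_params = 24_000_000  # hypothetical parameter count, for illustration only
fp32 = weight_memory_mib(n_params, BYTES_FP32)
fp16 = weight_memory_mib(n_params, BYTES_FP16)
print(f"fp32: {fp32:.1f} MiB, fp16: {fp16:.1f} MiB, ratio: {fp32 / fp16:.1f}")
```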

learn = language_model_learner(
    dls_lm, AWD_LSTM, drop_mult=0.3,
    metrics=[accuracy, Perplexity()]).to_fp16()
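
The Perplexity metric passed above is the exponential of the average cross-entropy loss, so lower is better and a value of k roughly means the model is as uncertain as if it were choosing uniformly among k tokens. A minimal sketch in plain Python follows; the function perplexity is a stand-alone illustration, not the fast.ai class.

```python
import math


def perplexity(token_probs):
    """Perplexity from the probabilities a model assigned to the correct tokens."""
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)


# A model that always spreads its belief uniformly over 50 tokens
# has a perplexity of about 50.
print(perplexity([1 / 50] * 10))
```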

Training an ML model can take a long time, depending on the amount of data with which to train. As such, it can be useful to save your model and/or its weights in between training cycles. The below demonstrates two ways to load weights:

The first way assumes that the weights to load come from a model trained on the same dataset as learn; learn.load simply loads these weights into learn.
The second way is for weights from a model trained on a different dataset. Such a model is likely to have a different vocabulary, so the pretrained_fnames argument names both the weights file and the file holding that vocabulary, which is taken into account when the weights are loaded.
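
To see why the old vocabulary matters, recall that each row of an embedding matrix belongs to one token, so rows must be re-ordered (and new rows created) when the vocabulary changes. The sketch below illustrates the idea in plain Python. The function remap_embeddings is hypothetical, and the choice of the mean row for unseen tokens is an assumption made for the example; fast.ai performs its own remapping internally when pretrained_fnames is given.

```python
def remap_embeddings(old_vocab, old_rows, new_vocab):
    """Build embedding rows for new_vocab from rows trained on old_vocab."""
    dim = len(old_rows[0])
    # Fallback row for unseen tokens: the element-wise mean of all old rows.
    mean_row = [sum(row[d] for row in old_rows) / len(old_rows) for d in range(dim)]
    index = {tok: i for i, tok in enumerate(old_vocab)}
    return [list(old_rows[index[tok]]) if tok in index else list(mean_row)
            for tok in new_vocab]


old_vocab = ["xxunk", "ring", "field"]
old_rows = [[0.0, 0.0], [1.0, 3.0], [2.0, 6.0]]
new_vocab = ["xxunk", "field", "ideal"]  # "ideal" was never seen by the old model

new_rows = remap_embeddings(old_vocab, old_rows, new_vocab)
for tok, row in zip(new_vocab, new_rows):
    print(tok, row)
```

Tokens shared by both vocabularies keep their trained rows, while "ideal" starts from the fallback row and is learned during further training.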

If no such weights are loaded into learn, then learn starts out as a language model that was pretrained on Wikipedia (cf. fast.ai on the ULMFiT approach).

# Only use the below code if you want to continue training a model of the same dataset.
# TODO: test the below code and see where the weights have to be. 

# learn.load('information_note_type_LM_weights')


# Only use the below code if you want to use weights from another pretrained model (whose data points
# are different from those used to build dls_lm).

# TODO: think about how to set up vocab.
# vocab=  load_learner(folder / 'information_note_type' / 'models' / 'information_note_type_LM.pkl').dls.vocab

# pretrained_fnames = ['information_note_type_LM_weights', 'information_note_type_LM_vocab']
# learn = language_model_learner(
#     dls_lm, AWD_LSTM, drop_mult=0.3,
#     metrics=[accuracy, Perplexity()], pretrained_fnames=pretrained_fnames)
# learn = learn.to_fp16()

Now we train the model:

# Feel free to run the below code multiple times

# Get a good learning rate
# lrfind = learn.lr_find()

# The code below should be uncommented when actually training.
# Note that `learn` is frozen except for its last layer group.
# Feel free to change the number of cycles.

# learn.fit_one_cycle(3, lrfind.valley)

# Save the weights; note that this overwrites the existing weights file.
# learn.save('information_note_type_LM_weights')

Now we train the unfrozen model:

learn.unfreeze()

# Feel free to run the below code multiple times

# Get a good learning rate
# lrfind = learn.lr_find()

# Note that `learn` is now fully unfrozen, so all of its layers are trained.
# Feel free to change the number of cycles.
# learn.fit_one_cycle(3, lrfind.valley)

There are some components of learn that can be worthwhile to save:

An encoder - as explained above, the encoder of the language model is used to then train the categorization model.
learn.dls.vocab - as explained above, the vocabulary of the model needs to be preserved if the weights of learn are to be used in a new language model that uses a different dataset.
learn itself - This can be done with learn.export.

# learn.save_encoder('information_note_type_LM_encoder')
# save_pickle(folder / 'information_note_type' / 'models' / 'information_note_type_LM_vocab.pkl', learn.dls.vocab)
# learn = learn.to_fp32()  
# learn.export('information_note_type_LM')

Testing the language model

# TEXT = ''
# N_WORDS = 20
# N_SENTENCES = 5
# preds = [learn.predict(TEXT, N_WORDS, temperature=0.75) for _ in range(N_SENTENCES)]

# print('\n'.join(preds))

Train classification model

Now we use the encoder of the language model that we trained to train a classification model.

We first process the DataFrame to combine the labels into a single column:

# Replace this file with the appropriate file
csv_file = _test_directory() / 'ml_examples' / 'information_note_type_labels.csv'
folder = _test_directory() / 'ml_examples'

df = pd.read_csv(csv_file)
df

# Make a list of columns
# Change the indices here if the labels change.
label_cols = list(df.columns[5:-1])

# Create a column of text that lists all the categories of each note, joined by `;`.
# When no tag is applicable, label the note with `'NO_TAG'`.
def row_to_single_label(row, label_cols):
    # Positions of the labels whose cell reads 'IS <tag>' rather than 'NOT <tag>'
    indices = [i for i, tag_info in enumerate(row) if tag_info.startswith('IS')]
    if len(indices) == 0:
        return 'NO_TAG'
    return ';'.join([label_cols[i] for i in indices])

labels = df[label_cols].apply(lambda row: row_to_single_label(row, label_cols), axis = 1)
df["Single label"] = labels
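
To make the effect of this step concrete, here is a small stand-alone sketch that applies the same rule to two hand-written rows, without pandas. The helper row_to_label and the three-column layout are simplifications made for the example; the cell values follow the 'IS <tag>' / 'NOT <tag>' format of the csv file shown earlier.

```python
def row_to_label(tag_cells, label_names):
    """Join the names of all tags marked 'IS ...' with ';', or return 'NO_TAG'."""
    present = [name for name, cell in zip(label_names, tag_cells)
               if cell.startswith("IS")]
    return ";".join(present) if present else "NO_TAG"


label_names = ["#_meta/concept", "#_meta/exercise", "#_meta/proof"]
theorem_row = ["IS #_meta/concept", "NOT #_meta/exercise", "IS #_meta/proof"]
untagged_row = ["NOT #_meta/concept", "NOT #_meta/exercise", "NOT #_meta/proof"]

combined = row_to_label(theorem_row, label_names)
print(combined)             # one string holding both tags
print(combined.split(";"))  # splitting on ';' recovers the individual tags
print(row_to_label(untagged_row, label_names))
```

Splitting on ';' is what allows a single text column to carry several labels per note; note that 'NO_TAG' then acts as a category of its own.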

We now create the DataLoaders using the DataBlock API

dls_blk = DataBlock(blocks = (TextBlock.from_df(text_cols = 'Processed note content', seq_len=256),
                              MultiCategoryBlock),
                    get_x = ColReader(cols = 'text'),
                    get_y = ColReader(cols = 'Single label', label_delim = ';'),
                    splitter = TrainTestSplitter(test_size = 0.2, random_state = None))
dls_clf = dls_blk.dataloaders(df, bs=32, seed=None)

dls_clf.show_batch(max_n=3)

learn_clf = text_classifier_learner(dls_clf, AWD_LSTM, drop_mult=0.5, metrics=accuracy_multi).to_fp16()

In the above line,
- AWD_LSTM is the model architecture for the learner.
- drop_mult is a multiplier applied to the dropout probabilities of the model's dropout layers, which controls the amount of regularization.
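
The role of drop_mult can be sketched as a single scaling factor applied to every dropout probability in the model configuration. The configuration below is hypothetical and its values are chosen only for illustration; they are not claimed to be the defaults of AWD_LSTM.

```python
def scale_dropouts(config, drop_mult):
    """Multiply every dropout probability (keys ending in '_p') by drop_mult."""
    return {key: value * drop_mult if key.endswith("_p") else value
            for key, value in config.items()}


# Hypothetical configuration, for illustration only.
base_config = {"n_layers": 3, "input_p": 0.4, "hidden_p": 0.2, "embed_p": 0.1}
scaled = scale_dropouts(base_config, drop_mult=0.5)
print(scaled)
```

A larger drop_mult therefore means more dropout everywhere (stronger regularization), while non-dropout settings such as n_layers are left untouched.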

Before exporting the model, convert the learner learn_clf back to full-precision via the .to_fp32() method.

# Uncomment the below code to actually load the encoder.

# learn_clf.load_encoder(folder / 'information_note_type' / 'models' / 'information_note_type_LM_encoder')  # Load the encoder of the previously trained language model.

We train the model by gradually unfreezing it, one layer group at a time:

# lrfind = learn_clf.lr_find()
# learn_clf.fit_one_cycle(1, lrfind.valley)

# learn_clf.freeze_to(-2)
# lrfind = learn_clf.lr_find()
# lrfind
# learn_clf.fit_one_cycle(1, lrfind.valley)

# learn_clf.freeze_to(-3)
# lrfind = learn_clf.lr_find()
# lrfind
# learn_clf.fit_one_cycle(1, lrfind.valley)
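
The schedule above can be pictured as follows: the learner starts with only its last layer group trainable, freeze_to(-2) additionally trains the second-to-last group, and so on. The sketch below mimics that indexing in plain Python; the group names are hypothetical and the function trainable_groups is only an illustration of the negative-index convention, not fast.ai code.

```python
def trainable_groups(groups, n):
    """Return the groups left trainable after freezing everything before index n."""
    start = n if n >= 0 else len(groups) + n
    return groups[start:]


# Hypothetical layer groups, ordered from input to output.
groups = ["embedding", "rnn_1", "rnn_2", "rnn_3", "head"]
for n in (-1, -2, -3, 0):
    print(n, trainable_groups(groups, n))
```

Training the later groups first lets the newly added classification head settle before the pretrained encoder layers are adjusted.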

Try out results

# n = random.randint(0, len(df)-1)
# stringy = df['Processed note content'][n]
# print(stringy)
# print(learn_clf.predict(stringy))
# print(df['Single label'][n])