markdown.obisidian.personal.machine_learning.database_update

Functions relevant to gathering data for machine learning

trouver

import pandas as pd
from fastcore.test import *


max_ID

 max_ID (df:pandas.core.frame.DataFrame)

Return the largest value of ID in df

df = pd.DataFrame([{'ID': 1}, {'ID': 2}, {'ID': 4}])
test_eq(max_ID(df), 4)


append_to_database

 append_to_database (file:os.PathLike,
                     df_to_append:pandas.core.frame.DataFrame,
                     cols:list[str], pivot_column:str,
                     columns_to_update:list[str], backup:bool=True)

Either create a csv file containing data or append to an existing csv file.

The columns of the database file should contain:

  • Time added - The time when the row was added.
  • Time modified - The time when the labels of the row were derived.

All timestamps are in UTC and are truncated to the minute (i.e. no seconds/microseconds).
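
A minimal sketch of how such a timestamp could be produced with the standard library. The helper names `utc_now_to_minute` and `format_timestamp` and the `'%Y-%m-%d %H:%M'` string format are assumptions for illustration; the format that `append_to_database` actually writes is not stated here.

```python
from datetime import datetime, timezone


def utc_now_to_minute() -> datetime:
    """Return the current UTC time with seconds and microseconds zeroed.

    Hypothetical helper, not part of the documented module.
    """
    return datetime.now(timezone.utc).replace(second=0, microsecond=0)


def format_timestamp(ts: datetime) -> str:
    """Render a timestamp to minute precision (assumed string format)."""
    return ts.strftime('%Y-%m-%d %H:%M')


stamp = utc_now_to_minute()
print(format_timestamp(stamp))
```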

If df_to_append contains a row with the same entry in pivot_column as an existing row, then only a single row with that entry in pivot_column should remain, and the rest of the row should be updated appropriately.
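
The merge rule described above can be sketched in pandas as follows. This is an illustration under stated assumptions, not the function's actual implementation: `upsert_rows` is a hypothetical helper, entries in `pivot_column` are assumed to be unique within each DataFrame, and the handling of the Time added and Time modified columns is left out.

```python
import pandas as pd


def upsert_rows(
    existing: pd.DataFrame,
    new: pd.DataFrame,
    pivot_column: str,
    columns_to_update: list[str],
) -> pd.DataFrame:
    """Hypothetical helper: merge `new` into `existing`, keyed on `pivot_column`.

    Rows whose key already exists keep a single row, with only
    `columns_to_update` overwritten; rows with unseen keys are appended.
    """
    existing = existing.set_index(pivot_column)
    new = new.set_index(pivot_column)

    # Keys present in both frames: overwrite only the selected columns.
    shared = existing.index.intersection(new.index)
    existing.loc[shared, columns_to_update] = new.loc[shared, columns_to_update]

    # Keys present only in `new`: append those rows whole.
    fresh = new.loc[new.index.difference(existing.index)]
    return pd.concat([existing, fresh]).reset_index()


existing = pd.DataFrame({'ID': [1, 2], 'label': ['a', 'b'], 'note': ['x', 'y']})
new = pd.DataFrame({'ID': [2, 3], 'label': ['B', 'c'], 'note': ['ignored', 'z']})
result = upsert_rows(existing, new, pivot_column='ID', columns_to_update=['label'])
print(result)
```

In this example the row with ID 2 keeps its original `note` because only `label` is listed in `columns_to_update`, while the row with ID 3 is new and is appended in full.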

| Parameter | Type | Default | Details |
|---|---|---|---|
| file | PathLike | | The path to a csv file |
| df_to_append | DataFrame | | The DataFrame to append |
| cols | list | | All columns of the dataframe in the order that they should be presented. |
| pivot_column | str | | If there are multiple rows with the same entry in this column, then only a single row with that entry should remain and that row should be updated appropriately. |
| columns_to_update | list | | The columns that should be updated if there are multiple rows with the same entry in the column pivot_column. |
| backup | bool | True | If True, makes a copy of file in the same directory and with the same name, except with an added extension of .bak. |
| Returns | None | | |
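
The backup behavior described for the backup parameter might look like the following sketch. `backup_file` is a hypothetical helper, and the use of `shutil.copy2` is an assumption; the source only states that the copy sits in the same directory under the same name with `.bak` added.

```python
import shutil
import tempfile
from pathlib import Path


def backup_file(file) -> Path:
    """Hypothetical helper: copy `file` to `<name>.bak` in the same directory."""
    file = Path(file)
    # Append `.bak` to the full file name, e.g. database.csv -> database.csv.bak
    backup_path = file.with_name(file.name + '.bak')
    shutil.copy2(file, backup_path)
    return backup_path


# Demonstrate on a throwaway file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / 'database.csv'
    csv_path.write_text('ID,label\n1,a\n')
    bak = backup_file(csv_path)
    backup_name = bak.name
    backup_matches = bak.read_text() == csv_path.read_text()
```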
# TODO: test
# TODO: test that rows are properly appended.