from fastcore.test import *
markdown.obisidian.personal.machine_learning.database_update
Functions relevant to gathering data for machine learning
trouver
max_ID
max_ID (df:pandas.core.frame.DataFrame)
Return the largest value of ID in df.
df = pd.DataFrame([{'ID': 1}, {'ID': 2}, {'ID': 4}])
test_eq(max_ID(df), 4)
append_to_database
append_to_database (file:os.PathLike, df_to_append:pandas.core.frame.DataFrame, cols:list[str], pivot_column:str, columns_to_update:list[str], backup:bool=True)
Either create a csv file containing data or append to an existing csv file.
The columns of the database file should include:
- Time added - The time when the row was added.
- Time modified - The time when the labels of the row were derived.
All timestamps are in UTC time and specify time to minutes (i.e. no seconds/microseconds).
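As an illustration of the timestamp convention above, the following is a minimal sketch of producing a UTC timestamp truncated to the minute. The helper name `utc_now_to_minute` is hypothetical and is not part of this module; how `append_to_database` actually builds its timestamps is not shown here.

```python
from datetime import datetime, timezone

def utc_now_to_minute(now=None):
    """Hypothetical helper: return a UTC timestamp with seconds and microseconds dropped."""
    if now is None:
        now = datetime.now(timezone.utc)
    else:
        # Assumes `now` is timezone-aware; convert it to UTC.
        now = now.astimezone(timezone.utc)
    return now.replace(second=0, microsecond=0)

stamp = utc_now_to_minute(datetime(2024, 1, 2, 3, 4, 56, 789, tzinfo=timezone.utc))
print(stamp.isoformat())  # 2024-01-02T03:04:00+00:00
```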
If df_to_append contains a row with the same entry in pivot_column as an existing row, then only a single row with that entry in pivot_column should remain, and the rest of the row should be updated appropriately.
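The pivot-column rule described above can be sketched with plain pandas. This is only one possible reading of the behavior, not the function's actual implementation: the helper `upsert_rows` and the sample column names `text` and `label` are hypothetical, and the sketch assumes that only the columns in `columns_to_update` are overwritten for rows that already exist.

```python
import pandas as pd

def upsert_rows(existing, new, pivot_column, columns_to_update):
    """Hypothetical helper: merge `new` into `existing`, keeping one row per pivot value."""
    merged = existing.set_index(pivot_column)
    # If `new` itself repeats a pivot value, keep its last occurrence.
    incoming = new.drop_duplicates(pivot_column, keep="last").set_index(pivot_column)
    # Overwrite only `columns_to_update` for pivot values already present.
    # Note: DataFrame.update skips NaN values in the incoming data.
    merged.update(incoming[columns_to_update])
    # Append rows whose pivot value was not in the existing data.
    fresh = incoming[~incoming.index.isin(merged.index)]
    return pd.concat([merged, fresh]).reset_index()

existing = pd.DataFrame({"ID": [1, 2], "text": ["a", "b"], "label": ["x", "y"]})
new = pd.DataFrame({"ID": [2, 3], "text": ["B", "c"], "label": ["z", "w"]})
result = upsert_rows(existing, new, "ID", ["label"])
print(result)
```

In this sketch the row with ID 2 keeps its original `text` and only its `label` changes, while the row with ID 3 is appended as a new row.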
| | Type | Default | Details |
|---|---|---|---|
| file | PathLike | | The path to a csv file. |
| df_to_append | DataFrame | | The DataFrame to append. |
| cols | list | | All columns of the dataframe in the order that they should be presented. |
| pivot_column | str | | If there are multiple rows with the same entry in this column, then only a single row with that entry should remain and that row should be updated appropriately. |
| columns_to_update | list | | The columns that should be updated if there are multiple rows with the same entry in the column pivot_column. |
| backup | bool | True | If True, makes a copy of file in the same directory and with the same name, except with an added extension of .bak. |
| Returns | None | | |
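The backup behavior described for the `backup` parameter can be sketched with the standard library. The helpers `backup_path` and `make_backup` are hypothetical names used for illustration only, and the sketch assumes that ".bak" is appended to the full file name (for example, `database.csv` becomes `database.csv.bak`).

```python
import shutil
import tempfile
from pathlib import Path

def backup_path(file):
    """Hypothetical helper: path of the backup, in the same directory as `file`."""
    file = Path(file)
    return file.with_name(file.name + ".bak")

def make_backup(file):
    """Copy `file` to its .bak sibling if it exists; return the backup path or None."""
    file = Path(file)
    if not file.exists():
        return None  # Nothing to back up yet (e.g. the csv file is being created).
    return Path(shutil.copy2(file, backup_path(file)))

# Demonstrate on a throwaway file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    csv_file = Path(tmp) / "database.csv"
    csv_file.write_text("ID,label\n1,x\n")
    bak = make_backup(csv_file)
    backup_name = bak.name
    backup_text = bak.read_text()
    missing = make_backup(Path(tmp) / "missing.csv")
```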
# TODO: test
# TODO: test that rows are properly appended.