dsci_310_group_11_pkg.preprocess

Module Contents

Functions

preprocessor(df, tort)

DESCRIPTION: Splits the dataset in the specified 'df' (dataframe) into training and testing data.

dsci_310_group_11_pkg.preprocess.preprocessor(df, tort)

DESCRIPTION: Splits the dataset in the specified ‘df’ (dataframe) into training and testing data, and generates a ‘target’ variable for the ML model to classify, derived from the quality value of each example.

INPUTS: df - a dataframe object that contains the entire dataset to be split.

tort - a binary value (0 or 1) that specifies whether to return the training dataframe (0) or the testing dataframe (1).

ACTION: Splits the dataset in ‘df’ into training and testing data, then uses np.where to set the target column to 0 for examples with quality < 5 and to 1 for examples with quality > 5.
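The labelling step can be sketched as follows. This is a minimal illustration, not the package's own code: the column names `quality` and `target` come from the description above, and the sample values are made up. The description does not say what happens when quality is exactly 5; the single `np.where` call below places that case in class 0, which is an assumption.

```python
import numpy as np
import pandas as pd

# Made-up sample data; only the 'quality' column matters here.
df = pd.DataFrame({"quality": [3, 4, 5, 6, 7]})

# 1 where quality > 5, otherwise 0.
# Assumption: quality == 5 falls into class 0, because a single
# np.where call has to put the boundary value on one side.
df["target"] = np.where(df["quality"] > 5, 1, 0)

print(df["target"].tolist())  # [0, 0, 0, 1, 1]
```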

RETURNS: If tort is 0, returns the training data; if tort is 1, returns the testing data.
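A hedged sketch of how the tort switch might behave in practice. The function name `preprocessor_sketch`, the 75/25 split ratio, the fixed random seed, the input validation, and the use of `DataFrame.sample` for the split are all assumptions made for illustration; the documentation above does not state how the split is performed.

```python
import numpy as np
import pandas as pd


def preprocessor_sketch(df, tort, test_frac=0.25, seed=0):
    """Hypothetical stand-in for preprocessor(df, tort)."""
    # Assumption: reject anything other than the two documented values.
    if tort not in (0, 1):
        raise ValueError("tort must be 0 (train) or 1 (test)")

    labelled = df.copy()
    # Same labelling rule as described in ACTION (boundary case assumed 0).
    labelled["target"] = np.where(labelled["quality"] > 5, 1, 0)

    # Assumed split: sample rows for the test set, keep the rest for training.
    test = labelled.sample(frac=test_frac, random_state=seed)
    train = labelled.drop(test.index)

    # 0 selects the training data, 1 selects the testing data.
    return train if tort == 0 else test


# Made-up data with a unique default index.
wine = pd.DataFrame({"quality": [3, 4, 5, 6, 7, 8, 6, 4]})
train_df = preprocessor_sketch(wine, 0)
test_df = preprocessor_sketch(wine, 1)
```

Because the seed is fixed in this sketch, the two calls return complementary parts of the same split; a split without a fixed seed would not guarantee that.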

TODO: Modularize param_grid values