GitHub - thomaschoi143/ml-deployment-workshop

Creating ML Model

This section demonstrates how to source data from the web, clean it in a pandas DataFrame using string manipulation, and convert all attributes to numeric values for model training with scikit-learn (sklearn).
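As a rough illustration of the cleaning and conversion steps, here is a minimal sketch. The column names (`price`, `size`, `sold`), the sample values and the choice of `DecisionTreeClassifier` are all assumptions made for this example; they are not taken from the workshop's dataset or notebook.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical raw data, standing in for a CSV downloaded from Kaggle or UCI.
raw = pd.DataFrame({
    "price": ["$1,200", "$850", " $2,000 ", "$430"],
    "size": ["Large", "small", "LARGE", "Small"],
    "sold": ["yes", "no", "yes", "no"],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df in which every attribute is numeric."""
    out = pd.DataFrame(index=df.index)
    # Strip currency symbols, thousands separators and whitespace, then convert.
    out["price"] = pd.to_numeric(
        df["price"].str.replace(r"[$,\s]", "", regex=True)
    )
    # Normalise whitespace and case, then map each category to an integer.
    out["size"] = df["size"].str.strip().str.lower().map({"small": 0, "large": 1})
    out["sold"] = df["sold"].str.strip().str.lower().map({"no": 0, "yes": 1})
    return out


cleaned = clean(raw)

# With all attributes numeric, the frame can be passed to a scikit-learn model.
X = cleaned[["price", "size"]]
y = cleaned["sold"]
model = DecisionTreeClassifier(random_state=0).fit(X, y)
```

A real dataset will usually need different string rules for each column, so it helps to inspect `df.dtypes` and a few sample rows before deciding how to convert each attribute.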

Prerequisites

  1. Installation of Jupyter Notebook (install Anaconda: https://www.anaconda.com/products/navigator)
  2. Access to a website with datasets to source from, such as https://www.kaggle.com/datasets or https://archive.ics.uci.edu/

Instructions

Improvement (advanced)

Instead of simply removing data points with missing values, fill in each missing value with the most common or average value of that attribute. For non-numeric attributes this can be challenging, as filling in the most common value doesn’t always make sense. Research and manual input of such missing values may be required.
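One possible way to do this in pandas is sketched below. It assumes the mean for numeric columns and the most frequent value for everything else; the `impute` helper, the column names and the sample data are hypothetical and only serve to illustrate the idea.

```python
import pandas as pd

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({
    "age": [20.0, None, 40.0, 30.0],
    "colour": ["red", "blue", None, "red"],
})


def impute(frame: pd.DataFrame) -> pd.DataFrame:
    """Fill numeric columns with their mean and other columns with their mode."""
    out = frame.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            # Average of the observed (non-missing) values.
            out[col] = out[col].fillna(out[col].mean())
        else:
            # Most common observed value; skipped if the column is entirely empty.
            modes = out[col].mode(dropna=True)
            if not modes.empty:
                out[col] = out[col].fillna(modes.iloc[0])
    return out


filled = impute(df)
```

scikit-learn's `SimpleImputer` offers the same strategies (`"mean"`, `"median"`, `"most_frequent"`) and may be preferable when the imputation should be fitted on training data only and then reused on new data.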