A machine learning exercise.

ABOUT

The objective was to predict shop revenue. We had 5 hours to deliver the results.

Group of 2: B.M.Pardelhas, Lucie F

The code is available on GitHub.

DATA

The dataset was provided in class (640,840 rows × 9 columns).

MAIN STEPS

  • Dataset exploration
  • Data cleaning
  • Selecting the model
  • Training + testing the model
  • Improving predictions, feature engineering
  • Delivering the results

TECHNIQUES AND TOOLS

  • Data visualization: correlation matrix, heatmap, pairplots [Matplotlib, Seaborn]
  • PyCaret
  • Model: XGBoost (extreme gradient boosting)
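
For readers unfamiliar with the correlation matrix mentioned above, the following is a small sketch of how one can be computed with pandas. The column names and values are made up for illustration and do not come from the class dataset.

```python
import pandas as pd

# Toy data with hypothetical column names (not the class dataset).
toy = pd.DataFrame(
    {
        "revenue": [100.0, 200.0, 300.0, 400.0],
        "customers": [10.0, 20.0, 30.0, 40.0],   # rises with revenue
        "discount": [40.0, 30.0, 20.0, 10.0],    # falls as revenue rises
    }
)

# Pairwise Pearson correlation between all numeric columns.
corr = toy.corr()
print(corr.round(2))
```

The resulting matrix can then be passed to `seaborn.heatmap(corr, annot=True)` to draw it as a heatmap.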

image

RESULTS

image image

IMPROVEMENTS

To get better predictions, we should have trained the model only on days when the stores were open. Separating stores by size (large, medium, small) and flagging December and the summer months could also help improve the score.
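
These ideas could be sketched roughly as follows with pandas. Everything specific here is an assumption: the column names (`store`, `date`, `open`, `revenue`) are hypothetical, store size is approximated by each store's mean revenue split into three equal-sized groups, and summer is taken to be June through August.

```python
import pandas as pd


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Keep open days only and add month flags plus a store-size bucket."""
    # Train only on days when the store was open (hypothetical `open` column).
    out = df[df["open"] == 1].copy()

    # Flag December and summer months (June-August is an assumption).
    month = pd.to_datetime(out["date"]).dt.month
    out["is_december"] = (month == 12).astype(int)
    out["is_summer"] = month.isin([6, 7, 8]).astype(int)

    # Approximate store size by mean revenue per store, cut into terciles.
    per_store = out.groupby("store")["revenue"].mean()
    size = pd.qcut(per_store, q=3, labels=["small", "medium", "large"]).astype(str)
    out["store_size"] = out["store"].map(size)
    return out


# Tiny made-up sample to show the transformation.
sample = pd.DataFrame(
    {
        "store": [1, 1, 2, 3, 3],
        "date": ["2015-12-24", "2015-07-01", "2015-03-01", "2015-08-15", "2015-12-25"],
        "open": [1, 1, 1, 1, 0],
        "revenue": [10.0, 10.0, 100.0, 1000.0, 0.0],
    }
)
features = add_features(sample)
print(features)
```

A separate model could then be trained per `store_size` group, or the column could be encoded and passed to a single model as an extra feature.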