:paperclip: FreshPrep Order Rate Prediction Project

UBC MDS Project

FreshPrep: Order Prediction Rate

Note: As part of my NDA and to protect the privacy of FreshPrep, confidential information is being withheld, such as descriptive statistics of actual data.

Video: A (fun) overview of the FreshPrep project and the dashboard

Background

FreshPrep is an innovative, eco-friendly, Vancouver-based meal kit company. They provide customers with a variety of quick-to-prepare, healthy meal kit collections for any diet. Customers receive meal kits consisting of portioned and/or pre-prepared ingredients, along with an easy-to-follow recipe card for quick meal preparation.

FreshPrep customers can classify themselves as either:

Active – orders are automatically billed from week to week.
Paused – orders are not automatically billed (i.e. they are skipped) from week to week.

FreshPrep orders can be classified as:

Billed – orders are charged and then delivered to the customer.
Skipped – orders are not charged or delivered to the customer.

Fig 1: An example of a billed vs. a skipped order for a particular week

Proposal

As Master of Data Science (MDS) students, our capstone team’s project was to build a model that predicts the probability of future customer orders: specifically, which customers would bill orders and which would skip orders in the upcoming weeks. Our model produced a probability for each active and paused customer. A probability close to 1 indicated that a customer was very likely to order, while a probability close to 0 indicated that a customer was very unlikely to order.

This information allowed FreshPrep to forecast which specific customers would be ordering, as well as the total number of orders, in the upcoming weeks. It also helped FreshPrep focus marketing on uncertain customers who hover around a 50% probability of ordering.
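As a rough illustration of targeting uncertain customers, the sketch below selects customers whose predicted probability falls in a band around 0.5. The customer IDs, the probabilities, and the band width of 0.1 are all made up for illustration; they are not taken from the FreshPrep data or from our actual pipeline.

```python
def uncertain_customers(probabilities, center=0.5, band=0.1):
    """Return the sorted IDs whose probability lies within `band` of `center`.

    `probabilities` maps a (hypothetical) customer ID to a predicted
    probability of ordering. The band width is an assumed parameter.
    """
    return sorted(
        customer_id
        for customer_id, p in probabilities.items()
        if abs(p - center) <= band
    )


# Hypothetical predictions for five customers.
example = {"c1": 0.95, "c2": 0.52, "c3": 0.08, "c4": 0.44, "c5": 0.61}
print(uncertain_customers(example))
```

A wider band captures more customers for marketing at the cost of including some who have effectively already made up their minds.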

Our approach:

  1. Data Wrangling
  2. Exploratory Data Analysis (EDA)
  3. Feature Engineering
  4. Predictive Modeling
  5. Data Visualization/Product

Fig 2: Project workflow

Data Wrangling
A substantial amount of data was provided to us within a PostgreSQL relational database. After our initial examination, we spent a significant amount of time creating a clean and relevant dataset within R for further exploration.

Exploratory Data Analysis (EDA)
Understanding FreshPrep’s business and data was extremely valuable in moving forward with our project. From our cleaned dataset, we uncovered many valuable insights for FreshPrep, specifically:

Feature Engineering
Engineering meaningful features for our model was a challenging part of the project and, given the amount of data we had received, took a significant amount of time. Our goal was to identify and/or create the features that were the strongest predictors of customer order probabilities. The insights from our EDA were very helpful in engineering our final 14 predictive features. The features which were most predictive were:

Predictive Modeling
Our model answered 2 primary questions for FreshPrep: the simpler ‘how many orders will there be each week?’ and the harder ‘who will order each week?’ Our team took on the harder question, since answering it also answers the simpler one.

We chose Logistic Regression as our model for order predictions. It gave us more interpretable regression weights, as well as more trustworthy probabilities, than the other models we tried, such as Random Forest. This decision also proved instrumental in giving FreshPrep a comprehensible rationale for which factors drove the model’s predictions.
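To show why logistic regression weights are easy to interpret, here is a minimal sketch of how a fitted model turns features into a probability. The feature names, weights, and intercept are hypothetical and chosen only for illustration; they are not our actual features or fitted values.

```python
import math


def order_probability(features, weights, intercept):
    """Logistic regression prediction: sigmoid of a weighted sum of features."""
    z = intercept + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights, NOT the values from the FreshPrep models.
weights = {"orders_last_4_weeks": 0.8, "weeks_since_last_order": -0.3}
intercept = -1.0

# A hypothetical customer: 3 recent orders, last order 1 week ago.
customer = {"orders_last_4_weeks": 3, "weeks_since_last_order": 1}
p = order_probability(customer, weights, intercept)

# exp(weight) is the multiplicative change in the odds of ordering
# for a one-unit increase in that feature, holding the others fixed.
odds_ratios = {name: math.exp(w) for name, w in weights.items()}

print(round(p, 3))
print(odds_ratios)
```

A positive weight raises the odds of ordering and a negative weight lowers them, which is the kind of explanation that is hard to read off a Random Forest.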

We trained 2 separate models, one for active customers and one for paused customers. Both models gave a probability of ordering for each customer. For active customers, we summed these probabilities to give the number of predicted billed orders for a given week. For paused customers, however, we first converted each probability to the customer’s most likely behavior: any probability less than 0.5 (our threshold) was assigned a value of 0 (skipped order), while any probability greater than 0.5 was assigned a value of 1 (billed order). These values were then summed to give the number of expected billed orders for a week for paused customers. We added this step because most customers had a paused status, and adding up many small probabilities (for example, 1000 customers at 0.01 each contribute 10 expected orders) led to overestimation of the number of predicted orders.
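The two aggregation rules can be sketched as follows. The probabilities are invented for illustration, and treating a probability of exactly 0.5 as a skipped order is an assumption made here, since only the cases below and above the threshold are described.

```python
def expected_orders_active(probabilities):
    """Active customers: sum the predicted probabilities directly."""
    return sum(probabilities)


def expected_orders_paused(probabilities, threshold=0.5):
    """Paused customers: count only probabilities above the threshold.

    Assumption: a probability exactly equal to the threshold counts as skipped.
    """
    return sum(1 for p in probabilities if p > threshold)


# Hypothetical predictions, not real FreshPrep data.
active = [0.9, 0.8, 0.95, 0.6]
paused = [0.01] * 1000 + [0.7, 0.9]

print(expected_orders_active(active))   # roughly 3.25 expected orders
print(sum(paused))                      # naive sum: roughly 11.6 orders
print(expected_orders_paused(paused))   # thresholded count: 2 orders
```

In this made-up example, the 1000 near-zero paused customers add about 10 orders to the naive sum even though none of them is individually likely to order; thresholding removes that contribution.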

Our models could predict the total number of expected billed orders for up to three weeks out. This provided FreshPrep with the opportunity to plan their resources and marketing efforts more efficiently.

How did we do?
Our models performed with a Mean Absolute Percentage Error (MAPE) of 4.61% on the total number of billed orders from June 2018 to June 2019. What does this mean? In a week where our model predicts 100 orders, the prediction is off by roughly 5 orders on average. However, when we focused on orders only in 2019, giving the models more training data, they performed with a MAPE of 1.5%. This would mean a prediction of 100 billed orders for a week is off by roughly 1.5 orders on average.
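For reference, MAPE averages the absolute error of each weekly prediction as a percentage of the actual value. The sketch below uses the standard definition with invented weekly totals; it is not our evaluation script, and the numbers are not FreshPrep results.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.

    Raises ValueError on mismatched lengths, empty input, or a zero actual
    value, since the percentage error is undefined in those cases.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical weekly billed-order totals.
actual_orders = [100, 200, 150]
predicted_orders = [95, 210, 150]
print(round(mape(actual_orders, predicted_orders), 2))
```

Because MAPE is an average, individual weeks can miss by more or less than the headline figure.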

Data Visualization/Product
As part of our final deliverable, we developed an interactive Tableau dashboard for FreshPrep to visualize the predictions produced by our models. The dashboard also allowed FreshPrep to further ‘drill down’ on their data and the predictions for other business-driven insights. For example, they could visualize customers in specific demographics or with certain dietary restrictions (such as vegans).

The complete project (documentation, scripts, Tableau framework, etc.) was delivered to FreshPrep as a reproducible pipeline for easy installation and operation. This included a self-contained Docker image for easy portability and integration into FreshPrep’s existing infrastructure.

Fig 3 and 4: FreshPrep dashboard examples

Potential Enhancements

If we had had more time for our capstone project, there were a number of areas we could have explored further, such as:

Lessons Learned

Overall, I found the capstone project an enlightening experience, both in applying theoretical skills in practice and in seeing how much a team with a diverse set of skills enhances a data science project.

Acknowledgments

I’d like to thank my team members Hayley Boyce, Orphelia Ellogne and Rachel Riggs. Also, a special thank you to our faculty supervisor Michael Gelbart and our partners from FreshPrep.

Mani Kohli

Data Scientist
