    Many data-driven organizations aspire to use robust, accurate predictive models for their business problems. Unfortunately, these models rarely come to fruition and remain on the drawing board (or in Jupyter notebooks). This typically happens when organizations misjudge the challenges of deploying machine learning models into a production environment.

    There is some room for error when integrating models into production environments, but there is also a good chance that unaddressed issues will eventually lead to disaster. That’s exactly why we created this pre-deployment checklist. By following the steps in this guide, you can potentially avoid technical glitches and brand damage down the road.

    1. Reproducibility

    Creating reproducible pipelines and visible workflows for your predictive modeling enables other members of your team (and other engineering teams) to easily understand the code and data you have used to generate your results. Additionally, it enables others to adjust or replicate your methods and results by expanding on your approach with minimal time and effort.

    The aspects that you should include in your reproducible pipelines are:

    • Data Collection
    • Data Pre-Processing
    • Feature Selection
    • Model Validation

    Chances are that you or someone on your team will want to change the model in the future, whether that means retraining it or fixing bugs and scalability issues. Being able to reproduce this journey is critical for deploying your model into production.
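
    As a rough illustration, the sketch below wires a seeded data collection step and a pre-processing step into one function and records a fingerprint of the resulting data. The function names, the synthetic data and the seed value are all assumptions made for the example; feature selection and model validation would be added as further stages in the same way.

```python
import hashlib
import json
import random

SEED = 42  # assumed fixed seed; any constant works as long as it is recorded


def collect_data(seed):
    """Stand-in for data collection: generates synthetic (x, y) rows."""
    rng = random.Random(seed)
    return [(x, 2 * x + rng.gauss(0, 0.1)) for x in range(20)]


def preprocess(rows):
    """Stand-in for pre-processing: scales x into [0, 1]."""
    max_x = max(x for x, _ in rows)
    return [(x / max_x, y) for x, y in rows]


def fingerprint(obj):
    """Hash any JSON-serializable object so two runs can be compared."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def run_pipeline(seed=SEED):
    rows = preprocess(collect_data(seed))
    # Record what went into the run so a teammate can reproduce it exactly.
    return {"seed": seed, "n_rows": len(rows), "data_sha256": fingerprint(rows)}
```

    Two runs with the same seed should produce the same fingerprint, which gives a teammate a quick way to confirm they have reproduced your data.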

    2. Versioning

    Machine learning models are delicate and sensitive to change. Even tiny incremental changes can cause model performance to drop, leaving the model close to useless at making predictions. That’s where versioning comes in. It not only helps you retrace your steps if something happens to your model, but also helps you ensure reproducibility. We love reproducibility!

    Chances are you are already using a versioning tool like Git to handle experimentation, so it’s recommended to apply the same methodology to production models. Interesting companies and tools have started to appear in the field of data science versioning (mainly adapting Git to data science use cases such as dataset versioning), but honestly, any standard Git provider will do.
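
    Git takes care of the code. To show what "retracing your steps" can look like for the model artifact itself, here is a minimal sketch of a registry that identifies each model by a hash of its serialized bytes. `ModelRegistry` and its methods are hypothetical names invented for this example, and everything lives in memory; a real setup would persist the artifacts somewhere durable.

```python
import hashlib
import pickle


class ModelRegistry:
    """Hypothetical in-memory registry; a real one would persist to storage."""

    def __init__(self):
        self._artifacts = {}  # version id -> serialized model bytes
        self._history = []    # promoted version ids, oldest first

    def register(self, model):
        """Store a model and return a version id derived from its bytes."""
        blob = pickle.dumps(model)
        version = hashlib.sha256(blob).hexdigest()[:12]
        self._artifacts[version] = blob
        return version

    def promote(self, version):
        """Mark a registered version as the one serving production."""
        if version not in self._artifacts:
            raise KeyError(f"unknown version: {version}")
        self._history.append(version)

    def rollback(self):
        """Drop the current production version and return to the previous one."""
        if len(self._history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self._history.pop()
        return self._history[-1]

    def current(self):
        """Load the model that is currently promoted."""
        if not self._history:
            raise RuntimeError("no version has been promoted yet")
        return pickle.loads(self._artifacts[self._history[-1]])
```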

    3. Re-Training and Feature Re-Evaluation

    Imagine this – you’re a data scientist working for an online company. You deploy a propensity model that activates different promotions and discounts, and it is so powerful that users are now buying way more than before. They are basically changing their behavior and interaction with the product. This means that the current data now looks completely different (distribution, volume, etc.) and the patterns the model has learned are no longer relevant.

    Retraining the model with new data is extremely important to avoid performance degradation.

    For models that are core to the business, you’ll probably want to run an offline process that trains a new model at predefined intervals, typically measured in days or weeks. When the offline model beats the current production model, you can send an alert so that a human reviewer can inspect it and replace the model in production.
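
    The comparison step could look something like the sketch below. It assumes models are plain callables, that accuracy on a holdout set is the metric you care about, and that a `min_gain` margin (an assumption, not something prescribed above) decides when the new model "beats" the old one.

```python
def accuracy(model, samples):
    """Share of (features, label) pairs the model predicts correctly."""
    if not samples:
        raise ValueError("holdout set is empty")
    correct = sum(1 for features, label in samples if model(features) == label)
    return correct / len(samples)


def compare_models(production, candidate, holdout, min_gain=0.01):
    """Score both models on the same holdout and flag a possible replacement."""
    prod_acc = accuracy(production, holdout)
    cand_acc = accuracy(candidate, holdout)
    return {
        "production_accuracy": prod_acc,
        "candidate_accuracy": cand_acc,
        # True means: notify a human reviewer, do not swap automatically.
        "needs_review": cand_acc - prod_acc >= min_gain,
    }
```

    The function only raises a flag. The decision to replace the production model stays with the reviewer.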

    Many data science teams tend to ignore or miss the fact that it’s not only the model that needs to change dynamically. Feature engineering, extraction methods, and even the data sources you are using can become less effective with time. All of these require constant monitoring and upkeep.

    Retraining a model on a daily basis with a scheduled script is relatively easy. Exploring new feature extraction methods and new data sources at regular intervals is much more challenging. If you want to scale the data science team beyond 3-5 use cases, you can’t recreate the entire search for the best data and features from scratch every time.

    Automated data science platforms that are capable of searching for features and automating the feature engineering process can allow the team to cover more use cases, while maintaining the accuracy and the robustness of your predictive models that are already in production.

    4. Auto-Scaling

    It is important to make sure your model can scale to the required volume of predictions. That volume can be unpredictable (e.g. online users), and peaks in traffic may overload the servers when capacity is static. Containers and serverless infrastructure make it possible to build a flexible auto-scaling mechanism for models, which helps preserve availability while keeping wasted resources to a minimum.
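
    The core of an auto-scaling rule is simple arithmetic. The sketch below is a simplified illustration rather than the logic of any particular platform: it assumes you know roughly how many requests per second one replica can handle, and the function name and the replica limits are made up for the example.

```python
import math


def desired_replicas(requests_per_second, capacity_per_replica,
                     min_replicas=1, max_replicas=20):
    """Replicas needed for the current load, clamped to a safe range."""
    if capacity_per_replica <= 0:
        raise ValueError("capacity_per_replica must be positive")
    needed = math.ceil(requests_per_second / capacity_per_replica)
    # The floor preserves availability; the ceiling caps wasted resources.
    return max(min_replicas, min(max_replicas, needed))
```

    With a capacity of 100 requests per second per replica, a load of 950 requests per second asks for 10 replicas, while an idle system drops back to the minimum of 1.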

    5. Stress Testing and Data Edge-Case Testing

    There are two tests that need to be conducted before deploying your model into the production environment:

    a. Stress Testing – Test your model’s performance and latency at a volume at least one order of magnitude higher than you expect to encounter in production.

    b. Data Edge Case Testing – Try numerical and categorical values that fall outside the training set’s distribution, and make sure your system is robust to edge cases in the data (e.g. a sample where 95% of the features are null). This will help you identify weak spots and write hard-coded rules for cases where a sample is completely different from previous ones.
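
    Both tests can be sketched in a few lines. The `predict` function below is a toy stand-in for a real model, and the null threshold, the default prediction and the expected volume are assumed values chosen only for illustration.

```python
import time

NULL_FRACTION_LIMIT = 0.5  # assumed threshold for "too many missing features"
DEFAULT_PREDICTION = 0.0   # assumed hard-coded answer for unusable samples


def predict(sample):
    """Toy model: mean of the non-null features, guarded by a hard-coded rule."""
    present = [v for v in sample if v is not None]
    if not sample or 1 - len(present) / len(sample) > NULL_FRACTION_LIMIT:
        return DEFAULT_PREDICTION
    return sum(present) / len(present)


def measure_latency(predict_fn, samples):
    """Return per-call latencies in seconds for the given samples."""
    latencies = []
    for sample in samples:
        start = time.perf_counter()
        predict_fn(sample)
        latencies.append(time.perf_counter() - start)
    return latencies


# Stress test: ten times the volume assumed for production.
expected_volume = 1_000
stress_samples = [[1.0, 2.0, None]] * (expected_volume * 10)
latencies = measure_latency(predict, stress_samples)
print(f"max latency: {max(latencies):.6f}s")

# Edge case: a sample where 95% of the features are null.
mostly_null = [None] * 19 + [5.0]
print(predict(mostly_null))
```

    A real stress test would also send requests concurrently and over the network; this sequential loop only shows the shape of the measurement.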

    6. Fallback and Failure Planning

    Production disasters can be caused by a wide range of issues. First and foremost, it’s important to understand that every model is basically a piece of software, so it can fail like any other software. Additional factors that need to be monitored include changes in data schemas, distribution fluctuations, issues with new versions, and more.

    Hence, it is important to have a backup plan for when your monitoring identifies bugs, exceptions, or completely inaccurate predictions. This backup plan can be a simpler model, rule-based predictions, or even a rollback to an older version of the model.
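
    One way to wire in such a backup plan is a thin wrapper around the prediction call. In the sketch below, the model is assumed to be a binary classifier exposed as a plain callable, and `rule_based_prediction` with its `visits` feature is a hypothetical hand-written rule.

```python
import logging

logger = logging.getLogger("model_fallback")


def rule_based_prediction(features):
    """Hypothetical hand-written rule used when the model is unavailable."""
    return 1 if features.get("visits", 0) > 10 else 0


def predict_with_fallback(model, features, fallback=rule_based_prediction):
    """Return (prediction, source), where source is 'model' or 'fallback'."""
    try:
        prediction = model(features)
        # Treat an impossible output as a failure, not as a valid answer.
        if prediction not in (0, 1):
            raise ValueError(f"prediction out of range: {prediction!r}")
        return prediction, "model"
    except Exception:
        logger.exception("primary model failed; using fallback")
        return fallback(features), "fallback"
```

    Returning the source alongside the prediction makes it easy to count how often the fallback fires, which is itself a useful signal to monitor.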

    7. A/B Testing (Competitor Model)

    Always try to route a small percentage of the data to a different (preferably simpler) model or to a manually developed rule engine, so that you have predictions to compare against those of your current production model. This will “keep your model honest”, while confirming that the feature tweaking was helpful and that the flashy new library you are using is actually performing better.

    This is even more important if the model is part of a larger operation (e.g. models that change pricing, website changes driven by a model prediction, etc.) where the company is probably already using A/B testing and experimentation mechanisms.
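
    A common way to split traffic is to hash a stable identifier so that each user consistently lands in the same group. The sketch below assumes a string or numeric user ID is available and sends 5% of users to the competitor model; the function name and the share are illustrative choices.

```python
import hashlib


def assign_variant(user_id, challenger_share=0.05):
    """Deterministically route a user to 'production' or 'challenger'."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    return "challenger" if bucket < challenger_share else "production"
```

    Because the assignment depends only on the ID, the same user always sees the same model, which keeps the comparison between the two groups clean.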

    8. Monitor Data Drift / Model Drift

    Data is dynamic, but your model is probably not (except for online learning models). Data feeds often change due to the dynamic nature of businesses (changing customers, bugs in the ETL, etc.). There are also cases of human error. For example, a frontend developer can accidentally break the input form, which basically means that any feature generated from that user input will be populated by “null” values.

    Hence, it’s crucial to monitor incoming data feeds to make sure the model is predicting on the same (distribution of) data it originally learned from. To be more practical, you can start by monitoring these three basic types of values in each prediction sample:

    • Numerical – Make sure there are no values outside the range of numerical values in the train set. You can always compare the Min, Max and Avg of the values per column in the last X samples the model predicted on against the Min, Max and Avg of the train set’s column.
    • Categorical – Make sure there are no new categorical values that didn’t appear in the train set. For example, if one feature column contains US states, getting “Canada” at inference time becomes problematic.
    • Text – Monitor text length, language, most common words, and compare inference data against the train set data.
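
    The numerical and categorical checks above, plus a null-rate check for cases like the broken input form, can be sketched with plain Python lists standing in for columns. The function names are made up for the example, and text monitoring is left out to keep it short.

```python
def null_rate(values):
    """Fraction of values that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def numeric_summary(values):
    """Min, max and average of the non-null values in a column."""
    present = [v for v in values if v is not None]
    return {"min": min(present), "max": max(present),
            "avg": sum(present) / len(present)}


def out_of_range(train_values, recent_values):
    """Recent values that fall outside the range seen in the train set."""
    train = numeric_summary(train_values)
    return [v for v in recent_values
            if v is not None and not train["min"] <= v <= train["max"]]


def unseen_categories(train_values, recent_values):
    """Categorical values that never appeared in the train set."""
    recent = {v for v in recent_values if v is not None}
    return sorted(recent - set(train_values))
```

    Running these over the last X prediction samples on a schedule, and alerting when any of them returns something unexpected, is a reasonable starting point before moving on to statistical drift tests.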

    9. Write Tests

    Just like any other piece of software, your model should be integrated into the test environment. This can help you discover issues before they reach production. As mentioned earlier, if a frontend developer changes the form that users fill in, different data is fed into your model. This is the kind of change you need to know about in advance.

    You need to write comprehensive tests to catch potential issues, and update them periodically to stay on top of things.
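
    As a small example, the test below guards against the broken-form scenario using Python's built-in `unittest` module. The schema in `EXPECTED_FIELDS` and the `validate_input` helper are hypothetical; your own input contract would go in their place.

```python
import unittest

EXPECTED_FIELDS = {"age": int, "state": str}  # hypothetical input schema


def validate_input(record):
    """Return a list of schema problems; an empty list means the record is valid."""
    problems = []
    for field, expected_type in EXPECTED_FIELDS.items():
        if record.get(field) is None:
            problems.append(f"missing or null: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"wrong type: {field}")
    return problems


class ValidateInputTest(unittest.TestCase):
    def test_valid_record_passes(self):
        self.assertEqual(validate_input({"age": 31, "state": "NY"}), [])

    def test_broken_form_is_caught(self):
        # Simulates the frontend form change: "state" now arrives as null.
        self.assertEqual(
            validate_input({"age": 31, "state": None}),
            ["missing or null: state"],
        )
```

    Saved in a test file, this runs with `python -m unittest` as part of the same suite that covers the rest of your application.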

    All in all, smooth and error-free model deployment is a key factor when it comes to machine learning. The checklist above is a great way to get started, but you also need solid infrastructure, sound best practices, and seamless cross-department collaboration (data scientists, IT, developers and business stakeholders).
