Having been an applied data scientist at SAP for more than three years, I think a lot about a core problem: how to successfully ship a machine learning model in a product. Undeniably, model performance and engineering quality are fundamental. However, it is deeply understanding the key differences between machine learning in production and machine learning in research, and having a clear concept of how to design a machine learning system, that guides us to bring the true value of ML to enterprise users and customers.
I never tell people that I am doing “artificial intelligence”. Maybe AI is a big word that “hooks” some people, but that is not what I believe in. We are not building “rockets” or “robots”. In one respect, we are just data miners who take advantage of big data to find patterns that may reveal something previously overlooked, which I call “ML Insight”; in another, we try to automate and accelerate business processes with software and algorithms, which I call “Automation”. Therefore, in my dictionary, AI is short for “Automated Intelligence”. There is a famous phrase in data science circles, “all models are wrong, but some are useful”, which always reminds me of what I am working on day by day:
I never believe that ML models can do everything, but even when they can only do something, that something makes a big difference.
Enough of the fuzzy words. Below I would like to share some points I have learned from past work, and I am open to community discussion.
Business Purpose First
First of all, everything starts from a business purpose. It is a business problem that gives birth to an ML feature, not the other way around. (Here we are not talking about fundamental science that may solve human issues in the long term; business technology plays by different rules.) Therefore, if the problem and purpose are vague, so will be the model results. We always ask: why are we doing this, and what is this model actually going to predict? For example: Why are we sorting the lead pipeline? Why are we summarizing the text in an incident? Why are we detecting sentiment in a ticket? Why are we building a knowledge base in the service center? And more.
Industrial Innovation
Next, we map the problem to existing technology and ask: is this technology mature and robust enough? Applied scientists are not exactly like academic researchers, who always pursue the newest ideas and models with record-breaking accuracy. Of course, that doesn’t stop us from innovating.
But innovation must not come at the cost of being unstable and unusable. We take full responsibility for model output once the model is running in our products. If we can solve the problem with old but mature models, we don’t jump to fancy but complicated ones.
At the same time, we don’t just ship a model in its original form; we reshape it, customize it, and even rebuild it. That is true “industrial innovation”.
Cost vs. Value
ML can be very expensive. Nowadays almost everyone has heard of AI chips, from GPUs to TPUs, ASICs to FPGAs, and many tech giants are diving into this field. I may not be sharp enough to predict the future, but currently we have to admit that chips are still expensive, and so are model training and serving. Cost is not a big problem in research or at the demo stage. However, once models are in production, we have to account for, and reduce, every penny of cost. Popular models like BERT or GPT in NLP do a great job on accuracy; however, we would run out of budget if we applied them everywhere. When choosing between “simple” and “complicated”, we always think of Ockham’s Razor: if a cheap model can achieve 90 percent of the accuracy of an expensive one, why should we burn more money?
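To make that trade-off concrete, here is a minimal sketch of how such a comparison might look in practice. The model names, accuracies, and serving costs are hypothetical numbers I made up for illustration, not measurements from any product.

```python
# A rough back-of-the-envelope comparison of a cheap baseline vs. an
# expensive model; all numbers below are hypothetical.

candidates = {
    # name: (accuracy on a validation set, serving cost per 1k predictions in USD)
    "tfidf_logreg": (0.88, 0.002),   # hypothetical cheap baseline
    "bert_large":   (0.93, 0.450),   # hypothetical large transformer
}

def value_per_dollar(accuracy, cost_per_1k):
    """Crude utility: accuracy points bought per dollar of serving cost."""
    return accuracy / cost_per_1k

for name, (acc, cost) in candidates.items():
    print(f"{name}: accuracy={acc:.2f}, cost/1k=${cost:.3f}, "
          f"value/dollar={value_per_dollar(acc, cost):.1f}")
```

With these made-up numbers, the expensive model buys five extra accuracy points at more than 200 times the serving cost; whether that is worth it is exactly the business question, not a modeling one.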
Scalable and Automated ML
In the cloud, everything must be scalable and automated, and machine learning models are no exception. One of the major differences when we step into cloud ML is that we no longer serve as consultants to a single customer; instead we are expected to satisfy all cloud tenants. Otherwise, ML stays trapped in the “on-premise” pattern. Therefore, our solutions need to adapt to most customers, which also requires the ML pipeline to be truly automated. I strongly believe in the future of AutoML, and it will be a winning point in cloud ML. It is not an easy task at this moment, but it is always the goal when we design and customize models and pipelines.
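As a rough illustration of what “the same pipeline for every tenant” can mean, here is a minimal sketch assuming scikit-learn. The text-classification task, the function name, and the search grid are placeholders I chose for the example, not our actual pipeline.

```python
# A minimal sketch of a per-tenant automated pipeline: the same model search
# runs for every tenant's data, so nothing is hand-tuned per customer.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

def fit_tenant_model(texts, labels):
    """Run one identical automated search on any tenant's dataset."""
    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    search = GridSearchCV(
        pipeline,
        param_grid={
            "tfidf__ngram_range": [(1, 1), (1, 2)],
            "clf__C": [0.1, 1.0, 10.0],
        },
        cv=3,
        scoring="f1_macro",
    )
    return search.fit(texts, labels)   # best pipeline chosen per tenant
```

Real AutoML goes much further (feature engineering, architecture search, monitoring), but the design point is the same: the pipeline, not a person, adapts the model to each tenant.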
Evaluation Metrics
In job interviews, data scientists are usually asked metrics questions such as what “precision”, “recall”, and “F1 score” are. Once such metrics are combined with business purposes, there is no absolute good or bad. When comparing different models, we choose between higher “precision” and higher “recall” based on demands and resources, and a cost metric comes into the picture to adjust the original confusion matrix. If we cannot tolerate false positives (e.g. in fraud detection, the model predicts fraud when there actually is none, a.k.a. a false alarm), we choose the model with higher precision. However, if we have enough resources to tolerate false alarms and cannot afford to miss any fraud, we prefer the model with higher recall. Such trade-offs are common in most fields, for example in healthcare when ML helps doctors detect illness, or in social media when Facebook detects fake news.
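The sketch below shows how such a cost metric can reweight a confusion matrix in a fraud-detection setting; the counts and the dollar costs are made-up numbers for illustration only.

```python
# Precision/recall plus a hypothetical business cost matrix.
import numpy as np

# Confusion matrix counts, laid out as [[TN, FP], [FN, TP]].
cm = np.array([[900, 50],
               [10, 40]])
tn, fp, fn, tp = cm.ravel()

precision = tp / (tp + fp)   # how many flagged cases are real fraud
recall = tp / (tp + fn)      # how much real fraud we actually catch

# Hypothetical costs: a false alarm costs $5 of review time,
# a missed fraud costs a $500 loss; correct answers cost nothing.
cost = np.array([[0.0, 5.0],
                 [500.0, 0.0]])
total_cost = (cm * cost).sum()

print(f"precision={precision:.2f}, recall={recall:.2f}, "
      f"expected cost=${total_cost:.0f}")
```

Under this cost structure a missed fraud is 100 times worse than a false alarm, so the cheaper model to operate is the one with higher recall, regardless of which one has the prettier F1 score.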
Error Analysis
Models make mistakes, just as human beings do. If we want a model to perform better at inference time, we make it learn from its errors, the same way humans do. Shipping a model to production is far from the end; it is just the beginning of the model lifecycle. A model gets better chances to grow and evolve once it is running on real production data. Sometimes we develop online-learning methods when the model needs to adapt to new datasets and learn new patterns. At other times, we analyze the model’s errors offline. If possible, we also collect feedback from end users; in recommendation systems, for instance, we don’t know whether a result was correct until users accept or reject it.
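As a small sketch of what offline error analysis can look like: collect the misclassified records and group them by some attribute to see where the model fails most often. The `category` field and all the data here are hypothetical placeholders.

```python
# Group the model's mistakes by an attribute to find failure hot spots.
from collections import Counter

def error_breakdown(records, y_true, y_pred, key="category"):
    """Count errors per value of `key` among misclassified records."""
    errors = [r[key] for r, t, p in zip(records, y_true, y_pred) if t != p]
    return Counter(errors).most_common()

# Hypothetical ticket data and predictions.
records = [{"category": "billing"}, {"category": "login"}, {"category": "billing"}]
print(error_breakdown(records, y_true=[1, 0, 1], y_pred=[0, 0, 0]))
# -> [('billing', 2)]: most errors come from billing tickets,
#    so that is where to look for missing features or labels.
```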
Model Explanation
Most ML models are black boxes, and that is exactly why we need to make model output visible and explainable. Some models and algorithms are inherently explainable, such as linear regression and decision trees; most models, however, lack this capability, and researchers are working hard to unbox them, including neural networks. On one hand, if we can leverage feature contributions to explain model results, things become easier; in linear regression, for example, feature weights are directly available, and adding regularization methods can help. On the other hand, we can also look into the training data pool to find similar records and summarize the patterns and rules that explain why the model predicts a certain result on new data, though this requires some manual effort. After all, ML is “pattern recognition”, and much of the new data may already have appeared in the historical data.
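Here is a minimal sketch, assuming scikit-learn, of both routes mentioned above: reading feature weights from a linear model, and retrieving similar training records as precedents for a new prediction. The features, labels, and data are synthetic placeholders.

```python
# Two simple explanation routes: (1) linear feature weights,
# (2) nearest training records as precedents.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

X = np.random.rand(100, 4)                          # synthetic training features
y = (X[:, 0] + 0.5 * X[:, 1] > 0.8).astype(int)     # synthetic labels

model = LogisticRegression().fit(X, y)
# Route 1: a linear model's weights directly explain its decisions.
for name, w in zip(["f0", "f1", "f2", "f3"], model.coef_[0]):
    print(f"{name}: weight={w:+.2f}")

# Route 2: for a new record, surface the most similar training records.
nn = NearestNeighbors(n_neighbors=3).fit(X)
new_record = np.random.rand(1, 4)
_, idx = nn.kneighbors(new_record)
print("closest training records:", idx[0], "with labels", y[idx[0]])
```

The second route is exactly the manual-effort path: a human looks at the retrieved precedents and summarizes why the model behaved as it did.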
I am surely still missing many important aspects of machine learning in production, such as data privacy, model bias, and customer adoption; after all, it is a huge topic. Here I have just shared what I am most familiar with. If you are interested in this topic, you are welcome to discuss it or read more from the community.
End.
Additional Suggested Readings:
AI competitions don’t produce useful models
Choosing the Right Metric for Evaluating Machine Learning Models