This is the fifth and final part of the series, and probably the most important of all. Processes are what deliver the Strategy component with the help of the other three. Any problem with Processes will affect the other four. We are not only talking about the actual process of creating and maintaining machine learning models, but also about prioritising and making decisions around these workflows.
Where do ML processes come from?
ML is a new area, and most practitioners arrive fresh from another field such as software engineering, data engineering, business intelligence, academia, or straight out of higher education. ML, as of today, doesn’t have a standard way of “doing things”. Without standards, people make up their own based on their experience, which can lead to applying principles that are valid in their original field but not in Machine Learning.
I will briefly touch on some of the problems with each to demonstrate how, although very useful in their own fields, these processes cannot simply be carried over:
Software engineering: Agile/CI/CD development is all the rage nowadays, and many have tried to apply it to ML without modification. The main problem is that software engineering operates on a short time horizon and is rarely concerned with the past: the latest version of the code is all that matters. ML workflows, by contrast, depend on historical data and must be able to reproduce past models.
Data Engineering: With the emergence of Big Data, data engineering worked out its best practices. These rely on executing numerous relatively fixed but interdependent steps in the form of DAGs (directed acyclic graphs). This is the primary workflow for ML at the moment, but ML involves fewer, more dynamic and more varied steps, which makes static DAGs inconvenient (see the sketch after this list).
Business Intelligence: BI supports the business with analysis and visualisations. It is not overly concerned with how this is done; only the end result matters. Productionised ML requires robustness because it is directly exposed to users, and this is the major deficiency of BI-style workflows.
DevOps practices: Maintaining scalable infrastructure is no small feat. DevOps is a mature area with many success stories, but are these applicable to ML? ML infrastructure built on DevOps principles loses the dynamism and ease of experimentation that a fresh and fast-moving paradigm needs.
Academic practices: High-end research is one of the catalysts of ML. Despite this, its motivations are radically different from those of productionised ML. “Publish or perish” pushes you to chase SOTA performance with no regard for robustness, as solutions have little longevity after publication.
Educational practices: When you start your career, all you have is your education and a large amount of motivation. And we all know that if you only have a hammer, everything looks like a nail. University education covers only part of the toolkit of productionised ML (the part that can be taught in a classroom) and glosses over the rest (data and code quality, workflow, feedback, etc.). These gaps need to be filled on the job, not carried on as they are.
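To make the data engineering point concrete, here is a minimal Airflow-style sketch of such a static DAG. The task names and callables are hypothetical, and it assumes an Airflow 2.x-like API; the point is only that the structure is declared up front.

```python
# A minimal sketch of a static, data-engineering-style DAG (Airflow 2.x);
# the task names and callables are hypothetical, for illustration only.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    ...  # pull raw data from a source system

def transform():
    ...  # clean and feature-engineer the data

def train():
    ...  # fit and persist a model

with DAG(
    dag_id="ml_pipeline",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    train_task = PythonOperator(task_id="train", python_callable=train)

    # The structure is fixed at definition time. For ETL this is a feature;
    # for ML experimentation, where the steps themselves change run to run,
    # it forces a DAG rewrite for every new idea.
    extract_task >> transform_task >> train_task
```

For stable pipelines this rigidity is exactly what you want; for experimentation-heavy ML work, it gets in the way.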
What’s next
Establishing new processes in any field is hard. You need to try new ideas that will likely not work, while still running them long enough to gather sufficient evidence to reject them with confidence.
Ultimately, you need to break the entire workflow down into constituent pieces and first principles based on your strategy and technical constraints. Then iterate on potential solutions for each piece while validating whether those first principles hold and the expected benefits are observed. If not, update your hypothesis while maintaining business continuity.
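As an illustration (not a prescribed framework), here is a minimal, self-contained Python sketch of that loop; the pieces, candidate generator, and acceptance test are all hypothetical placeholders.

```python
# A hypothetical sketch of the break-down / iterate / validate loop.
# The pieces, candidates, and acceptance criteria are placeholders.
import random

PIECES = ["data_quality", "training", "deployment", "monitoring"]

def propose_solutions(piece):
    # Placeholder: candidates would come from your strategy and constraints.
    return [f"{piece}_option_{i}" for i in range(3)]

def gather_evidence(candidate):
    # Placeholder: run the candidate long enough to collect real evidence.
    return {"candidate": candidate, "score": random.random()}

def meets_first_principles(evidence, threshold=0.7):
    # Placeholder acceptance test against your first principles.
    return evidence["score"] >= threshold

for piece in PIECES:
    for candidate in propose_solutions(piece):
        if meets_first_principles(gather_evidence(candidate)):
            print(f"Adopting {candidate} for {piece}")
            break
    else:
        # No candidate validated: keep the existing process running
        # (business continuity) and revise the hypothesis.
        print(f"Revising hypothesis for {piece}")
```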
This series has now come to an end, but I will be writing more about potential solutions in the coming period. Please follow me if you would like to read my next posts on the topic.
If you liked this post, take a look at yesterday’s post about productivity issues related to metrics:
https://laszlo.substack.com/p/causes-of-machine-learnings-productivity-5c7