A Twitter thread has been doing the rounds recently about a couple of embarrassing failures in applying AI to COVID (lung CT) data. [link]
The referenced MIT Technology Review article is almost a year old, but I still feel it is important to write about this from an ML product design perspective. [link]
Luckily, the referenced Nature article is freely available, so one can work from the original sources, not just the cherry-picked examples. [link]
AI and COVID
Of course, when COVID hit, researchers tried to make a meaningful contribution under time pressure and with limited resources. This was combined with their traditional research->publish->move on workflow.
The results can be seen in the article. It is not pretty at all.
Using publicly available data and off-the-shelf models, teams released a preprint and a "product" as soon as relatively high statistical performance was reached.
These products were mostly unsuitable for clinical use.
I mean, who would rely on a product hacked together in a month to make calls about patients' lives??? This reminded me of Her Majesty's Government's great idea about DIY ventilators, another unserious effort.
Root cause analysis of the lack of usability revealed the blunders listed in the article. (There are some scary points in the article about how some of these models are actually used in clinical settings and doctors have signed NDAs, but that's a different question.)
So what is the takeaway?
First of all, "real" ML products are _not_ designed in this way.
ML products are usually not research questions but applied technology questions. They require a rigorous understanding of the entire problem and solution space, rather than chasing a headline metric with a novel method in order to publish a paper.
First, the headline metric is secondary. You actually need to solve the problem rather than optimise a proxy. This is probably the most overlooked aspect of ML product design. It is one thing to tune all the hyperparameters and get an F1 score worth working with; it is another to answer the dozens, if not hundreds, of other questions needed to justify that the problem was adequately solved.
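To make the proxy-metric trap concrete, here is a minimal, entirely synthetic sketch (the hospitals, counts, and the "model" are all invented, not taken from the article) of the source-confounding failure mode the Nature review describes: positives and negatives collected from different places, so a model can post a perfect F1 by learning the data source instead of the pathology.

```python
import random

def f1(preds, labels):
    """Plain F1 score over binary (0/1) predictions and labels."""
    tp = sum(p and l for p, l in zip(preds, labels))
    fp = sum(p and not l for p, l in zip(preds, labels))
    fn = sum((not p) and l for p, l in zip(preds, labels))
    if tp == 0:
        return 0.0
    prec, rec = tp / (tp + fp), tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)

# "Frankenstein" dataset: every COVID-positive scan came from hospital A,
# every negative from hospital B, so source and label are confounded.
train = [("A", 1) for _ in range(500)] + [("B", 0) for _ in range(500)]

# The "model" has silently learned the hospital, not the pathology.
predict = lambda hospital: 1 if hospital == "A" else 0

# Held-out split of the same dataset: the headline metric looks perfect.
print(f1([predict(h) for h, _ in train], [y for _, y in train]))  # 1.0

# Deployment: a new hospital C where labels no longer track the source.
random.seed(0)
deploy = [("C", random.randint(0, 1)) for _ in range(1000)]
print(f1([predict(h) for h, _ in deploy], [y for _, y in deploy]))  # 0.0
```

A well-tuned deep model on leaked or confounded data fails the same way; the hyperparameters were never the problem.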
Second, it is not always clear what problem is to be solved. Often ML teams attempt to solve an incredibly complex problem all at once with little preparation. Of course they fail: they can't invest enough effort in decomposing the problem and tackling each challenge individually. The primary goal of a POC is to identify (partly through the hundreds of questions above) a slice of the problem space that is, or is likely to be, economically solvable. Then use the benefits of this solution to invest in solving different, more difficult slices.
Is this all in POC?
Yes, this is still the POC phase. The purpose of the POC is to establish a good enough idea of whether further investment of resources into the solution is _economically_ feasible.
This doesn't mean proving that the entire problem is solvable; it just means showing that the company can make money. This is a fundamental change in the mental model. Most energy early in an ML product cycle goes into figuring out how to attribute economic value to each required step: training a new model, human-in-the-loop review, and so on. Then figure out the value chain and optimise the entire stack to make it feasible.
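As a sketch of what that value attribution can look like, here is a back-of-the-envelope calculation. Every number below is an invented placeholder (case volumes, costs, error rates, build cost), not data from the article; the point is the shape of the reasoning, not the figures.

```python
# All figures are invented placeholders for illustration.
cases_per_month = 10_000
manual_cost     = 4.00     # fully manual handling of one case
error_cost      = 50.00    # expected cost of one uncaught model error
automation_rate = 0.70     # slice of cases the model handles end-to-end
error_rate      = 0.002    # uncaught-error rate on the automated slice
build_cost      = 150_000  # one-off cost to train and productionise

# Baseline: every case handled manually.
baseline = cases_per_month * manual_cost

# With ML: the hard 30% still goes to humans; the automated 70%
# costs only its (rare) uncaught errors.
with_ml = (cases_per_month * (1 - automation_rate) * manual_cost
           + cases_per_month * automation_rate * error_rate * error_cost)

monthly_saving = baseline - with_ml           # ~27,300 per month
payback_months = build_cost / monthly_saving  # ~5.5 months
```

If `payback_months` came out at 50 rather than 5, the slice is not worth automating yet, no matter how good the F1 score looks.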
How about Deep Learning and MLOps?
It is important to mention that this has nothing to do with the "complexity" of the solution, another popular trope. It is perfectly fine to use deep learning techniques in a POC, provided there is enough understanding of the context that the data scientists know what they are doing.
It also doesn't say anything about productionisation and MLOps. From an ML product design perspective, these are engineering problems: they are assumed to be solvable, and their cost is measurable (within reason).
So the goal is to find a portion of the problem that can be solved to make money, then use this money to solve the rest of the problem until returns diminish. The POC's job is to answer whether this can _likely_ be done.