According to a Gartner prediction: “Through 2020, 80% of AI projects will remain alchemy, run by wizards whose talents will not scale in the organisation.” One of the primary reasons machine learning is poorly understood in corporate environments is the detachment of business value from statistics. In my company we championed a simple framework that gave all parties a shared language to reason about the various scenarios around a machine learning project.
Business leaders care about money and data scientists (DSes) care about statistical properties. Connecting these two is the key element of the framework. The underlying common cause of both concerns is the same: errors and their costs. Statistical models make two types of errors: False Positives (FP) and False Negatives (FN). A False Positive is when something is mistakenly selected; a False Negative is when something is missed. The fundamental recognition was that the monetary value (or risk) of these two types of errors is not the same. The cost of a False Positive is the extra effort (labour cost) of handling the incorrectly selected item, while the cost of a False Negative is the reputation risk arising from the incompleteness of the resulting product.
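The asymmetry can be made concrete with a small sketch. The cost figures below are hypothetical placeholders, not numbers from any real project:

```python
# A minimal sketch of the asymmetric-cost idea, with hypothetical numbers.
COST_FP = 2.0    # labour cost (e.g. dollars of analyst time) per wrongly selected item
COST_FN = 50.0   # estimated reputation/churn risk per missed relevant item

def total_error_cost(false_positives: int, false_negatives: int) -> float:
    """Monetary cost of a model's errors under the asymmetric cost model."""
    return false_positives * COST_FP + false_negatives * COST_FN

# Two models with the same total number of errors can have very different costs:
print(total_error_cost(100, 2))   # many FPs, few FNs -> 300.0
print(total_error_cost(2, 100))   # few FPs, many FNs -> 5004.0
```

The point of the sketch is that minimising the raw error count is the wrong objective; which error you minimise depends entirely on the cost ratio the business assigns.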
Let’s take a simple example to make this concrete: an analyst searching for relevant content for clients. The previously selected content forms a declarative problem definition, and the problem is hard to solve imperatively, so it is a good candidate for a machine learning solution. Partial automation would mean that an upstream model labels all content as “relevant” or “not relevant” and the analyst decides which of the selected pieces should go to the clients.
Partial automation usually forms a convenient middle step in machine learning projects, as it sits halfway between the fully manual and the fully automated solution, so the various scenarios can be evaluated in the same framework. What are the two types of errors and their consequences?
False Positives and Precision
A False Positive means the model tagged some content as relevant when it is actually not. An analyst must read it to make that judgement, which costs time and money. If this happens frequently, the analysts might not be able to physically sift through all of the content, and the product is unfeasible unless more of them are hired. False Positive errors therefore primarily cause labour cost. Precision is simply the quality of the selected articles, or in other words the lack of False Positives, expressed as a percentage.
False Negatives and Recall
This is the serious one. When the model makes a False Negative error, the analysts downstream have no chance of selecting a relevant piece because they won’t see it at all. That means the client won’t see it either, which damages the quality of the service and can eventually stop it being valuable to them. There is really no way to fix this other than making better models. Recall is the completeness of the selected articles, or in other words the lack of missed items, expressed as a percentage.
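Both definitions reduce to simple ratios over the error counts. A minimal sketch, with counts made up for illustration:

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of selected items that are actually relevant (lack of FPs)."""
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    """Fraction of relevant items that were selected (lack of FNs)."""
    return tp / (tp + fn)

# Hypothetical day: the model selects 80 truly relevant items (TP) and
# 20 irrelevant ones (FP), while missing 10 relevant ones (FN).
print(precision(80, 20))  # 0.8   -> 20% of the analyst's reading is wasted effort
print(recall(80, 10))     # ~0.89 -> ~11% of relevant content never reaches clients
```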
For a given model, DSes can set hyperparameters (typically the decision threshold) to trade Precision against Recall. Because Precision and Recall measure different, incompatible quantities from a statistical perspective, it is not possible to decide mathematically which model is the “best”; that is a purely business question. What is more important for the business: one more hour of analyst work spared, or one more slightly disappointed customer? These are not mathematical questions at all.
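One common way this tradeoff is exposed is a decision threshold on the model’s relevance scores; sweeping it shows precision rising as recall falls. A sketch with made-up scores and labels:

```python
# Sweeping a decision threshold over model scores trades precision for recall.
# The scores and labels below are illustrative data, not from a real model.
scores = [0.95, 0.90, 0.85, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1,    1,    1,    0,    1,    0,    1,    0,    0,    0]  # 1 = relevant

def precision_recall_at(threshold: float):
    """Precision and recall when everything scoring >= threshold is selected."""
    selected = [(s, y) for s, y in zip(scores, labels) if s >= threshold]
    tp = sum(y for _, y in selected)
    fp = len(selected) - tp
    fn = sum(labels) - tp
    prec = tp / (tp + fp) if selected else 1.0
    rec = tp / (tp + fn)
    return prec, rec

for t in (0.0, 0.5, 0.8):
    p, r = precision_recall_at(t)
    print(f"threshold={t:.1f}  precision={p:.2f}  recall={r:.2f}")
```

Raising the threshold selects fewer items, so the analysts review less junk (higher precision) but more relevant pieces slip through unseen (lower recall).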
To aid the business decision we can draw the various options on a chart:
The chart depicts several scenarios each with different business value:
Low precision, 100% recall: everything is selected and the analysts have to filter out the irrelevant content manually. Clearly not practical, but listed as a theoretical option because it ensures that nothing is missed by the end users.
Higher precision, slightly lower recall: by sacrificing a small amount of relevant content, a huge amount of irrelevant content can be filtered out automatically. This frees up the labour that was required to filter that content and makes the first step toward automation.
Even higher precision, lower recall: with careful consideration, further irrelevant content can be thrown out automatically, but this is the point at which the lost relevant content must be examined carefully to decide whether the system is still acceptable.
Pushing precision even higher to spare more labour cost makes the resulting system miss too much relevant content (recall becomes too low) and yields an unfeasible solution. Where the “Client acceptability limit” lies exactly should be decided by UX and product managers, as they represent the end users in the process.
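In practice the “Client acceptability limit” can be treated as a hard recall floor when choosing an operating point. A sketch with hypothetical (precision, recall) candidates:

```python
# Pick the highest-precision operating point whose recall still clears the
# client acceptability limit. All numbers below are hypothetical.
RECALL_FLOOR = 0.90  # set by UX / product managers, not by DS

candidates = [        # (precision, recall) pairs, e.g. from a threshold sweep
    (0.50, 1.00),     # select everything: nothing missed, lots of wasted labour
    (0.70, 0.97),
    (0.85, 0.92),
    (0.95, 0.70),     # spares the most labour but misses too much content
]

feasible = [(p, r) for p, r in candidates if r >= RECALL_FLOOR]
best = max(feasible, key=lambda pr: pr[0]) if feasible else None
print(best)  # -> (0.85, 0.92)
```

If `feasible` came back empty, no operating point of the current model would satisfy the end users, which is exactly the situation discussed next.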
With the current model, no hyperparameter selection can place the system in the position of Solution 4. The DS team must create a better model to deliver the performance characteristics of that solution.
Once the DS team is able to create a better model, the whole tradeoff negotiation between Precision and Recall must restart, as new options appear on the horizon. If no better model is possible, the best option from a business point of view should be selected by setting the appropriate hyperparameters. If that still doesn’t yield an economically feasible solution, a go/no-go decision must be made.
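The go/no-go decision itself can be framed in money: compare the error cost of the best model option against the fully manual baseline. All figures below are hypothetical assumptions, not data from the article:

```python
# Sketch of the go/no-go economics, with hypothetical volumes and costs.
N_ITEMS = 10_000    # content items per month
N_RELEVANT = 1_000  # truly relevant items among them
REVIEW_COST = 2.0   # labour cost to review one item
MISS_COST = 50.0    # reputation cost per missed relevant item

def monthly_cost(precision: float, recall: float) -> float:
    """Labour + reputation cost of running the model at this operating point."""
    tp = N_RELEVANT * recall       # relevant items the model surfaces
    selected = tp / precision      # total items the analysts must review
    missed = N_RELEVANT - tp       # relevant items never shown to anyone
    return selected * REVIEW_COST + missed * MISS_COST

manual = N_ITEMS * REVIEW_COST     # baseline: review everything, miss nothing
model = monthly_cost(0.85, 0.92)
print(manual, model)               # "go" if the model option is cheaper
```

Under these made-up numbers the model option is far cheaper than the manual baseline; with a different cost ratio or volume the same arithmetic could just as easily say no-go.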
Above we can see a convenient framework linking business value and statistics. This terminology helps non-technical team members participate in clarifying the business case for an end-to-end ML system with a couple of simple terms. Experience shows that an organisation can adopt phrases like “precision-sensitive problem”, meaning labour cost should be reduced, or “Is this a recall issue?”, meaning some important content is missed. This helps all parties stay on the same page about what should be done next. The framework should be applicable to more complicated modelling problems as well.
False Positive and False Negative errors have fundamentally different business costs and risks
For the same model, multiple business options are available through the Precision-Recall tradeoff.
Selecting from these is a business question and not a mathematical one
The shared vocabulary enables business and UX/PM teams to contribute domain-specific knowledge to DS projects.
Thank you for reading my article, please provide feedback in the comments and subscribe/share!