How Can Data Scientists Sell "Tech Debt" to Their Managers?
[Interesting content] Code Quality: Debunking the Speed vs Quality Myth with Empirical Data
Without exception, every time I present on clean architecture, refactoring, and readability in Data Science, I get a question from the audience: "This is all nice, but how can I sell this to my manager? They think this is a waste of time that doesn't add value and that we should ship features instead."
[Note] I wrote this as a post on LinkedIn, but I ended up sharing it with so many people that I might as well put it on my blog.
The problem with such a question is that while the notion is fundamentally wrong, it is very hard to mount an argument against it, because it relies on ignoring the basic principles of our work.
This is a bit like arguing with the Flat Earth Society.
Luckily, this space in software engineering is quite mature, and there is plenty of content on the topic. One of my favourite sources is CodeScene; they shared a great analysis of the business impact of bad code:
https://codescene.com/blog/code-quality-debunking-the-speed-vs-quality-myth-with-empirical-data
Their "Code Health" metric is proprietary, but we know it aggregates multiple code quality metrics and is weighted by lines of code across files. The charts show that moving your code from averagely bad to good doesn't fix more problems but makes you much faster.
The chart below indicates two phases of bad code:
- Very bad code wastes company resources by forcing the team to spend most of its time on fixes. The code is simply too tangled to build good solutions in; nothing ships because everyone is bug-fixing. Refactoring frees up time that can be spent on features.
- Mildly bad code slows the delivery of new features. The codebase doesn't break down, but it does hinder the implementation of new functionality. Refactoring makes the team faster, delivering more features per unit of time (see the back-of-the-envelope sketch after this list).
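To see why both phases matter to a manager, here is a back-of-the-envelope sketch. Every number in it is an illustrative assumption of mine, not a measurement from the CodeScene data:

```python
# Illustrative numbers only: estimate how much capacity refactoring
# could free up in each phase. None of these figures are measured.

team_hours_per_week = 5 * 40  # five people, full time

# Phase 1: very bad code, most time goes into bug-fixing.
bugfix_share_before, bugfix_share_after = 0.70, 0.30
freed = team_hours_per_week * (bugfix_share_before - bugfix_share_after)
print(f"Phase 1: {freed:.0f} hours/week freed for features")  # 80

# Phase 2: mildly bad code, every feature takes longer than it should.
days_per_feature_before, days_per_feature_after = 8, 5
speedup = days_per_feature_before / days_per_feature_after
print(f"Phase 2: {speedup:.1f}x more features per unit of time")  # 1.6x
```

Framing the argument in hours freed and features shipped, rather than in code quality for its own sake, is precisely what makes it sellable to a manager.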
Why is this important in data science projects?
We all know how many problems we have with data quality and processing pipelines. If you spend most of your time fixing bugs, you should start to suspect you are in the first category.
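One rough way to check whether you are in that first category is to look at how much of your recent commit history is bug-fixing. Below is a minimal sketch; the keyword list and the six-month window are arbitrary assumptions of mine, and matching commit messages is a noisy heuristic at best:

```python
# Crude heuristic sketch: what fraction of recent commits look like
# bug fixes? Keyword matching on commit messages is noisy; treat the
# result as a conversation starter, not a measurement.

import subprocess

FIX_KEYWORDS = ("fix", "bug", "hotfix", "patch")  # assumed conventions

log = subprocess.run(
    ["git", "log", "--since=6 months ago", "--pretty=%s"],
    capture_output=True, text=True, check=True,
).stdout.splitlines()

fixes = sum(1 for msg in log if any(k in msg.lower() for k in FIX_KEYWORDS))
if log:
    print(f"{fixes}/{len(log)} commits ({fixes / len(log):.0%}) look like fixes")
```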
On the other hand, data science and machine learning are about change. Everything is in constant flux: projects are unspecifiable and complex, the internal (business) and external (environment) contexts are dynamic, and unforeseen events influence project delivery.
Shipping new features is essential, and making this as fast as possible is of high business value.
Subscribe for similar content or continue for a primer on prioritising technical debt from the same authors:
https://laszlo.substack.com/p/interesting-content-adam-tornhill