

Deliberate Machine Learning
Since I started focusing on Code Quality for Data Science (CQ4DS), I have consumed a lot of content on abstract concepts in Software Engineering. A recurring trope is attacking OOP for its various faults, some perceived and some real but outdated, as in "Object-Oriented Programming is Bad" and "Stop Writing Classes". I will review these arguments and respond to them in the context of ML and DS workflows.
If you are interested in discussing the topic, join our community. Invite link for the CQ4DS discord: https://discord.com/invite/8uUZNMCad2

Up front, I should say that I am not a Software Engineer, just a Data Scientist who is curious about the topic and tries to apply the best bits to my (rather different) field, which means thinking a lot about the meta-game of programming. None of this should be taken as SWE advice: caveat emptor, YMMV, etc.
What are the typical arguments against OOP?
“Look at this monster. This is so bad. Here it is with a single function.”
Well, poorly written code is just that, poorly written code. Just because you can reimplement that better with an alternative technique doesn’t make that technique, by definition, better.
“Abstraction, encapsulation, polymorphism, and inheritance (4 OOP principles) are a bad idea in programming.”
Agreed, and in practice hardly anyone relies on these apart from polymorphism, and certainly not on inheritance.
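A minimal sketch of polymorphism without inheritance, assuming Python 3.9+: `typing.Protocol` describes the interface structurally, and any class with matching methods qualifies. The class names are hypothetical; the `fit`/`predict` shape mirrors the convention scikit-learn popularised, but nothing here imports or depends on scikit-learn.

```python
from typing import Protocol, Sequence


class Model(Protocol):
    """Anything with fit/predict qualifies; no base class is needed."""

    def fit(self, xs: Sequence[float], ys: Sequence[float]) -> "Model": ...

    def predict(self, xs: Sequence[float]) -> list[float]: ...


class MeanModel:
    """Predicts the mean of the training targets (hypothetical example)."""

    def fit(self, xs, ys):
        self.mean_ = sum(ys) / len(ys)
        return self

    def predict(self, xs):
        return [self.mean_ for _ in xs]


class LastValueModel:
    """Predicts the last training target seen (hypothetical example)."""

    def fit(self, xs, ys):
        self.last_ = ys[-1]
        return self

    def predict(self, xs):
        return [self.last_ for _ in xs]


def evaluate(model: Model, xs, ys, test_xs) -> list[float]:
    # Polymorphism: this works with any object exposing fit/predict.
    return model.fit(xs, ys).predict(test_xs)


for m in (MeanModel(), LastValueModel()):
    print(type(m).__name__, evaluate(m, [1, 2, 3], [2.0, 4.0, 9.0], [4, 5]))
```

Neither model class inherits from anything; the shared interface is the only contract.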
“Java has long names.”
Good thing we are using Python. Maybe use shorter, domain-relevant names.
“Too much abstraction makes code harder to understand while abstraction is supposed to mean _simplification_.”
Speculative Generalisation is a code smell that leads to unnecessary code. If a coupling is not challenged, don’t decouple. If it is, refactor; there are established techniques for that. And “abstraction” here means hiding details that, at this point, are not relevant to understanding what the code does.
“Chopping up functions into single line functions in the name of SRP (Single Responsibility Principle) is a bad idea.”
Yes, indeed it is. Unfortunately, SRP is often misinterpreted: http://www.softwareonthebrain.com/2022/01/the-misunderstood-single-responsibility.html. The focus should be on readability, ideally judged by someone other than you. How fast can a reader understand this? Is it too long? Chop it up. Is it so short that you need to jump around to understand it? Refactor. Don’t overdo either.
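A small, hypothetical illustration of overdoing it: both versions below compute the same thing, but the first splits a three-line calculation into one-liners that the reader has to jump between, while the second fits on screen and does one coherent job.

```python
# Over-chopped: every step is its own one-liner, so understanding one
# calculation means jumping between five definitions.
def _total(values):
    return sum(values)


def _count(values):
    return len(values)


def _mean(values):
    return _total(values) / _count(values)


def _shift(values, centre):
    return [v - centre for v in values]


def centre_over_chopped(values):
    return _shift(values, _mean(values))


# Readable: one short function with a single, clearly named purpose.
def centre(values: list[float]) -> list[float]:
    """Subtract the mean so the values are centred on zero."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


assert centre_over_chopped([1.0, 2.0, 6.0]) == centre([1.0, 2.0, 6.0]) == [-2.0, -1.0, 3.0]
```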
“I can do this with functions.”
Yes, especially in Python, where functions are first-class citizens. By the way, this doesn’t make the code “functional programming”. Classes are a familiar concept that almost everyone learns about at some point in their studies, so they are an excellent starting point. Unlike, for example, currying, which composition in functional programming leans on heavily.
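To make the comparison concrete, here is the same parametrised behaviour (a hypothetical value clipper) written three ways: as a class, as a closure, and with `functools.partial`. All three work; the class version is simply the one most readers will recognise on sight.

```python
from functools import partial


# 1. A class: parameters go into the constructor, behaviour into a method.
class Clipper:
    def __init__(self, low: float, high: float):
        self.low = low
        self.high = high

    def __call__(self, x: float) -> float:
        return min(max(x, self.low), self.high)


# 2. A closure: functions are first-class, so a function can return a function.
def make_clipper(low: float, high: float):
    def clip_value(x: float) -> float:
        return min(max(x, low), high)

    return clip_value


# 3. functools.partial: fix some arguments of a plain function.
def clip(x: float, low: float, high: float) -> float:
    return min(max(x, low), high)


clippers = [Clipper(0.0, 1.0), make_clipper(0.0, 1.0), partial(clip, low=0.0, high=1.0)]
print([c(1.7) for c in clippers])  # [1.0, 1.0, 1.0]
```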
“OOP, in general, is wrong, and all these new changes (Patterns, SOLID, TDD) are just bandaid.”
Software engineering is one of the most challenged and fastest-growing industries (if not the most) in the history of humankind. In the last 30 years, tens of millions of people have practised it, trillions of dollars have been invested in it, and it has permeated every corner of our lives. Of course, the dominant way of doing it will keep changing, and practitioners must keep up. The original ideas were not based on some deep understanding of software engineering; they were made up pretty much on the spot from craft experience. In fact, one can argue that what people do with classes today has nothing to do with the original OOP, precisely because practitioners figured out better ways of working, and those ways happen to still use classes as an abstraction.
“Changing state is a bad idea.”
Indeed it is. That is why classes are best used only to define the program’s structure at runtime (through Dependency Injection), to create parametrised behaviour, and to group interfaces (functions) together. In general, avoid changing state; persist it somewhere instead and read it on demand.
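A minimal sketch of that style, with hypothetical names (`FeatureStore`, `InMemoryStore`, `Scorer`, `build_scorer`) and Python 3.9+ assumed: the class receives its dependency through the constructor, holds only fixed parameters, and reads data on demand instead of mutating internal state. A frozen `dataclass` turns “no state changes” from a convention into something the interpreter enforces.

```python
from dataclasses import dataclass
from typing import Protocol


class FeatureStore(Protocol):
    def load(self, key: str) -> list[float]: ...


class InMemoryStore:
    """Hypothetical stand-in for a real store (database, object storage, ...)."""

    def __init__(self, data: dict[str, list[float]]):
        self._data = data

    def load(self, key: str) -> list[float]:
        # Return a copy so callers cannot mutate the stored data.
        return list(self._data[key])


@dataclass(frozen=True)
class Scorer:
    # Dependencies are passed in, not created inside: Dependency Injection.
    store: FeatureStore
    threshold: float  # parametrised behaviour, fixed at construction

    def score(self, key: str) -> list[bool]:
        # No state is changed here; inputs are read on demand from the store.
        return [v > self.threshold for v in self.store.load(key)]


# Composition root: the program's structure is decided here, once, at runtime.
def build_scorer() -> Scorer:
    store = InMemoryStore({"customer_42": [0.1, 0.7, 0.4]})
    return Scorer(store=store, threshold=0.5)


print(build_scorer().score("customer_42"))  # [False, True, False]
```

Swapping the in-memory store for a real one would only change `build_scorer`; `Scorer` itself stays untouched.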
“Functional programming will take over everything.”
LISP is the second-oldest language that is still around (60+ years old). If it could take over the world, it would have done so already. So you are pretty safe investing your time in class-based programming. And I am intentionally not calling it OOP, to avoid confusion with the 4 principles above.
But look at it from another angle: immutability is widely recognised as a benefit, both for classes and for data. List and dictionary comprehensions are core Pythonic practices. You can argue that practical programming adopted the parts of FP that were easily accessible. You can use them without knowing the difference between a monad and a monoid.
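For illustration, a small sketch of the accessible FP ideas that everyday Python already uses: a frozen `dataclass` for immutable data, `dataclasses.replace` to derive a modified copy instead of mutating, and list/dict comprehensions. `Sample` and its fields are hypothetical.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Sample:
    user_id: int
    clicks: int


samples = [Sample(1, 3), Sample(2, 0), Sample(3, 7)]

# Comprehensions build new collections instead of mutating existing ones.
active = [s for s in samples if s.clicks > 0]
clicks_by_user = {s.user_id: s.clicks for s in samples}

# "Changing" a frozen object means creating a new one.
updated = replace(samples[0], clicks=4)

try:
    samples[0].clicks = 4  # attempted in-place mutation
except FrozenInstanceError:
    print("frozen: mutation rejected")

print(len(active), clicks_by_user, samples[0].clicks, updated.clicks)
```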
So why do classes (and not OOP) matter? What matters at all?
Programming is primarily a communication exercise. You are expressing your solution to a business problem in a highly structured language. The next person who needs to change it should be able to understand clearly what you did and change it according to new requirements.
“Code is read 10x more than written.”
This implies that optimisation should focus first on readability and second on changeability. Because programming is communication, you need to communicate simply and directly, in a way that most relevant people (including newly joined juniors) can understand.
Classes provide a convenient structure that is easy to pick up and that most people are already familiar with. They have a simple lifecycle: you construct them somewhere, the constructor runs, and then their interfaces (functions) are “ready” to use, like a simple proto-service. I like to think about them this way because it means that, eventually, they will be easier to convert into services.
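A minimal sketch of the proto-service idea, with a hypothetical `ChurnPredictor` and a toy linear score: the object is constructed once, after which its public methods behave like endpoints. The `routes` dict only illustrates how the same interface could later sit behind a real service; no web framework is assumed or used.

```python
class ChurnPredictor:
    """Hypothetical proto-service: construct once, then call its interface."""

    def __init__(self, weights: dict[str, float], bias: float = 0.0):
        # The constructor runs once; after this the object is "ready".
        self._weights = dict(weights)
        self._bias = bias

    def predict(self, features: dict[str, float]) -> float:
        # A public method plays the role of a service endpoint.
        return self._bias + sum(
            w * features.get(name, 0.0) for name, w in self._weights.items()
        )

    def health(self) -> str:
        return "ok"


# Wiring done once, e.g. at application start-up.
predictor = ChurnPredictor(weights={"tenure": -0.1, "complaints": 0.5}, bias=0.2)

# Today these are plain method calls; later the same interface could be exposed
# as service routes without changing the class body.
routes = {"predict": predictor.predict, "health": predictor.health}
print(routes["health"](), routes["predict"]({"tenure": 2.0, "complaints": 1.0}))
```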
“I didn’t have time to write you a short letter, so I wrote you a long one instead.”
- Often attributed to Mark Twain, though the line goes back to Blaise Pascal
There is really no shortcut to simplicity. Complex problems need complex solutions, and you need to organise yourself to solve them. That takes conscious effort, which is tiring, so one looks for a remedy. It would be so simple to find some Holy Grail of programming technique where we just apply rules and the best solution falls out.
This is why these programming paradigms (SOLID, SRP, Design Patterns) are often taken out of context and applied without limit, until they become caricatures of their original intent.
“Some rules can be bent. Others can be broken.” - Morpheus
The rigorous nature of programming (and of its practitioners) also tends to turn any principle or paradigm into a law written in stone, enforced through intimidation (anyone had a bad code review over three irrelevant variable names?).
As programming is about communication, you should focus on whether something “makes sense” rather than whether it is “correct”, given that the definition of “correct” is dubious at best. Finding out what “makes sense” is a community activity. This is exactly why agility (the ability to react to change) is so important: most of the time, you have no idea whether what you did is the right thing, and you might need to change it later. It is also why refactoring is a thing. But this also makes getting things right the first time less important, which leads to a more relaxed and free programming (and communicating) experience where you can focus on what you want to do instead of getting every detail right.
Read more on this topic here: Data Science code quality hierarchy of needs
Summary
In general, I try to focus on delivering value through my content, which means that reviewing negative content is usually out of scope. However, I felt it was important to address concerns about classes because they come up so frequently in communities. I suspect that’s because people doubt whether they are investing their time and effort in the right subject. I don’t want people to be discouraged just because someone attacks an already outdated version of the paradigm. It’s much easier and more straightforward than you might think.
Why anti-OOP content is wrong?
scikit-learn is literally built with OOP in mind. It’s what provides consistency across all classifiers or regressors. It’s what allows you, the user, to be guaranteed to have fit and predict methods.
There are just so many things the community uses that are OOP-driven without people recognising it. PyTorch’s object-oriented architecture, for instance, brought simplicity without limiting what you could build, at a time when TensorFlow was far too verbose and difficult to build with.
Both are paradigms with their use cases. Functional Programming is nice for pipelines and orchestrating tasks. OOP is great for building entities, like User or Product. It’s not one or the other. It’s both. Apply when necessary.