3 Ways Domain Data Models Help Data Science Projects
Data Science is mostly about handling data (Doh… Thanks, Captain Obvious…) in productionised environments. Despite this, Data Science teams rarely define their own Domain Data Model. Omitting this crucial step makes their processes brittle and hard to change, and it saddles them with hard-to-reconcile technical debt from the very beginning of their projects.
What is a Domain Data Model?
I recently gave a presentation about our new course to a group of Data Scientists at a prominent startup. The Domain Data Model (or just Domain Model for short) is a core part of our process and terminology for making DS teams more effective.
About halfway through the presentation, I noticed confused faces whenever I said “Domain Model this and that”, so I asked whether they knew what a domain model is.
They didn’t. Lesson learned: Never assume your audience knows the terminology, hence this article.
The concept of a Domain Model comes from Domain-Driven Design (DDD) and enables a ubiquitous language (for non-native speakers like me, “ubiquitous” means “appearing or found everywhere”). The language is explicitly meant to be shared by everyone on the project: Software Engineers, Subject Matter Experts, Data Scientists and Executives.
Entities, Values and Aggregates
The Domain Model describes Domain Objects: Entities (objects with an identity, usually an ID), Values (a property value wrapped in its own class) and Aggregates (a group of related Entities handled as a unit). Together, these represent the domain in which the problem is solved.
Through these, you explicitly name the domain terms you will use later in your modelling and analysis. You also define how these domain terms are created from the data you have access to. For example, if your company calls users clients, you will have a “Client” class. If you are a food delivery company, the people who move food from A to B might be represented by a class called “Rider”, but if your company calls them “Couriers”, then your class will be called “Courier” as well.
Once you have these Entity classes, you define their properties explicitly with Value classes, along with their relationships to other Entities. You also define how to build these objects from data sources that usually don't define them explicitly (e.g. logs, database tables, pandas dataframes).
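To make the three kinds of Domain Object concrete, here is a minimal sketch using Python dataclasses. It is one possible way to write them, not our course's reference implementation, and it assumes a food delivery company that calls its riders “Couriers”. All class names (`Courier`, `Delivery`, `Distance`, `CourierShift`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Distance:
    """Value: a distance in kilometres, wrapped in its own class."""
    km: float

    def __post_init__(self) -> None:
        # A Value guards its own invariants when it is constructed.
        if self.km < 0:
            raise ValueError(f"Distance cannot be negative: {self.km}")


@dataclass(eq=False)
class Courier:
    """Entity: identified by its ID and named in the business's own terms."""
    courier_id: str
    name: str

    def __eq__(self, other: object) -> bool:
        # Entities are equal when their IDs match, whatever their other attributes.
        return isinstance(other, Courier) and other.courier_id == self.courier_id

    def __hash__(self) -> int:
        return hash(self.courier_id)


@dataclass
class Delivery:
    """Entity: one trip from A to B, made by a Courier."""
    delivery_id: str
    courier: Courier
    distance: Distance


@dataclass
class CourierShift:
    """Aggregate: a Courier together with the Deliveries made on one shift."""
    courier: Courier
    deliveries: list[Delivery] = field(default_factory=list)

    def add(self, delivery: Delivery) -> None:
        # Changes go through the aggregate, so it can keep itself consistent.
        if delivery.courier != self.courier:
            raise ValueError("Delivery belongs to a different courier")
        self.deliveries.append(delivery)

    def total_distance(self) -> Distance:
        return Distance(sum(d.distance.km for d in self.deliveries))


alice = Courier("c-1", "Alice")
shift = CourierShift(alice)
shift.add(Delivery("d-1", alice, Distance(2.5)))
shift.add(Delivery("d-2", alice, Distance(1.5)))
print(shift.total_distance())  # Distance(km=4.0)
```

Note that the class is called `Courier` because that is what the business says; if the company said “Rider”, the class would be `Rider`.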
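Building domain objects from raw data can live in a single factory method. The sketch below assumes records arrive as plain dicts, for example parsed log lines or the output of pandas' `DataFrame.to_dict(orient="records")`. The raw column names (`order_id`, `rider_id`, `dist_m`, `ts`) are hypothetical, chosen to show that the source data does not need to use the domain's terms: the translation from “rider” to `Courier` and from metres to kilometres happens in exactly one place.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Distance:
    """Value: distance in kilometres; must be non-negative."""
    km: float

    def __post_init__(self) -> None:
        if self.km < 0:
            raise ValueError(f"negative distance: {self.km}")


@dataclass(frozen=True)
class Courier:
    """Entity: only the ID is needed to relate a Delivery to its Courier."""
    courier_id: str


@dataclass(frozen=True)
class Delivery:
    """Entity built from one raw record; related to a Courier."""
    delivery_id: str
    courier: Courier
    distance: Distance
    delivered_at: datetime

    @classmethod
    def from_record(cls, record: dict) -> "Delivery":
        # Hypothetical raw schema: the source says "rider" and uses metres.
        return cls(
            delivery_id=str(record["order_id"]),
            courier=Courier(str(record["rider_id"])),
            distance=Distance(float(record["dist_m"]) / 1000),
            delivered_at=datetime.fromisoformat(record["ts"]),
        )


raw = {"order_id": 1017, "rider_id": "r-42", "dist_m": "2300", "ts": "2023-05-01T12:30:00"}
delivery = Delivery.from_record(raw)
print(delivery.distance)  # Distance(km=2.3)
```

Invalid input (a missing column, a negative distance) raises an exception at construction time, so a malformed `Delivery` never reaches the analysis.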
Let’s take a look at the benefits of using a Domain Model.
Benefit 1: Smoother analytics and coding in the Data Science project
One of the immediate effects of using a Domain Model (DM) is a conceptual separation between analysing data and creating data. The Domain Model is the interface between them. Your analysis depends purely on the DM. You also don't need to check at every step whether the data still meets its specification, which keeps your analytics code simpler.
Your data processing code will be simpler too, because the only thing it needs to produce is domain objects. It creates them from raw data, and if it cannot, it flags the problem in logs or raises an error. Instead of a many-to-many mapping between data sources and analyses, you define a single common layer between them.
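As a minimal sketch of analytics that depends purely on the DM, the function below works only with hypothetical `Delivery` and `Distance` objects. Because validation has already happened when those objects were constructed, the analysis contains no null checks, column-name lookups or unit conversions. The names and the chosen metric are illustrative assumptions.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Distance:
    """Value: kilometres, validated once when the object is created."""
    km: float

    def __post_init__(self) -> None:
        if self.km < 0:
            raise ValueError(f"negative distance: {self.km}")


@dataclass(frozen=True)
class Delivery:
    delivery_id: str
    courier_id: str
    distance: Distance


def average_distance_per_courier(deliveries: list[Delivery]) -> dict[str, float]:
    """Analytics written purely against domain objects.

    The domain layer guarantees valid, consistently named and unit-converted
    data, so this code only expresses the analysis itself.
    """
    by_courier: dict[str, list[float]] = {}
    for d in deliveries:
        by_courier.setdefault(d.courier_id, []).append(d.distance.km)
    return {courier_id: mean(kms) for courier_id, kms in by_courier.items()}


deliveries = [
    Delivery("d-1", "c-1", Distance(2.0)),
    Delivery("d-2", "c-1", Distance(4.0)),
    Delivery("d-3", "c-2", Distance(1.0)),
]
print(average_distance_per_courier(deliveries))  # {'c-1': 3.0, 'c-2': 1.0}
```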
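The processing side can be sketched as a single function whose only output is domain objects, with unconvertible records logged and set aside rather than passed downstream. The raw schema (`order_id`, `rider_id`, `dist_m`), the logger name and the choice to return rejects alongside valid objects are all assumptions for illustration; raising instead of logging is an equally valid policy.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("domain.build")  # hypothetical logger name


@dataclass(frozen=True)
class Delivery:
    delivery_id: str
    courier_id: str
    distance_km: float

    def __post_init__(self) -> None:
        if self.distance_km < 0:
            raise ValueError(f"negative distance: {self.distance_km}")


def to_delivery(record: dict) -> Delivery:
    # Hypothetical raw schema: "order_id", "rider_id", "dist_m" (metres).
    return Delivery(
        delivery_id=str(record["order_id"]),
        courier_id=str(record["rider_id"]),
        distance_km=float(record["dist_m"]) / 1000,
    )


def build_deliveries(records: list[dict]) -> tuple[list[Delivery], list[dict]]:
    """Turn raw records into domain objects; flag the ones that don't fit."""
    valid: list[Delivery] = []
    rejected: list[dict] = []
    for record in records:
        try:
            valid.append(to_delivery(record))
        except (KeyError, TypeError, ValueError) as exc:
            # Flag the bad record in the logs instead of passing it to analytics.
            logger.warning("Rejected record %r: %r", record, exc)
            rejected.append(record)
    return valid, rejected


records = [
    {"order_id": 1, "rider_id": "r-1", "dist_m": "1500"},
    {"order_id": 2, "rider_id": "r-2", "dist_m": "-10"},  # negative distance
    {"order_id": 3, "dist_m": "900"},  # missing rider_id
]
valid, rejected = build_deliveries(records)
```

Every downstream analysis consumes `valid`, whichever source the records came from; adding a new source means writing one more converter like `to_delivery`, not touching the analyses.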
Benefit 2: Smoother communication with Subject Matter Experts and Executives
Data Science projects rarely appear out of context.
You will be expected to work closely with Subject Matter Experts (SMEs) who have little or no technical knowledge. A ubiquitous language is shared by the entire team, not just the technical staff. A Domain Model helps you use the SMEs' terminology, reducing the friction of context switching and of translating their concepts into data science concepts. You can go straight from the meeting to the codebase and incorporate their feedback into your code.
When you report on your project, executives will be happy to hear that the DS team speaks the same language as the rest of the business. Having a domain model forces the DS team to be on the same page as the rest of the company.
Benefit 3: Smoother interoperation with Software and Data Engineering
DDD is a popular concept in programming. Just as Data Science does not work out of context, neither does Software Engineering. Software engineers have probably gone through the same journey and use the same ubiquitous language. Adopting this practice reduces translation friction when you productionise your model and keeps it low-effort to update based on feedback.
As you can see, using a Domain Model can be hugely beneficial to any Data Science project. Maintaining it is a high-ROI activity that pays off in communication with all external stakeholders.
Hypergolic (our consulting company) usually starts each engagement by building a domain model and defining data processing and analytics in terms of it. Get in touch to learn more about this at: https://hypergolic.co.uk/contact/
Working with Domain Models has a lot of nuance (Bounded Contexts, Context Mapping, Translation Boundaries) that doesn't fit into an introductory article. I intend to write about it in the future. Please subscribe if you would like to be notified.