I don't know what you think about DORA metrics, but I am always confused about which one is which, so I made this simple diagram to organise them.
On the surface, they measure speed and quality. Behind this, they assess the more abstract notion of "agility": the ability to change quickly and easily.
When you plan a change, you go from an idea to a change, possibly to a failure, and then you recover from that failure so you can finally benefit from the original idea.
There is one metric for each of the three state changes:
Lead Time for Changes: The amount of time it takes a commit to get into production. This measures going from the idea state to the change state.
Change Failure Rate: The percentage of deployments causing a failure in production. This measures going from the change state to the failure state.
Time to Restore Service: How long it takes an organisation to recover from a failure in production. This measures going from the failure state to the recovery state (with the change being in production).
And the fourth metric measures how frequently the above cycle happens per unit of time:
Deployment Frequency: How often an organisation successfully releases to production.
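To make the definitions concrete, here is a minimal sketch of how the four metrics could be computed from deployment records. The `Deployment` record and its fields (`committed_at`, `deployed_at`, `restored_at`) are assumptions made up for illustration, not a standard schema, and treating the deployment time as the start of a failure is a simplification.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Deployment:
    committed_at: datetime                # when the change was committed
    deployed_at: datetime                 # when it reached production
    restored_at: datetime | None = None   # set only if the deployment caused a failure

    @property
    def caused_failure(self) -> bool:
        return self.restored_at is not None


def dora_metrics(deployments: list[Deployment], window: timedelta) -> dict:
    if not deployments:
        raise ValueError("need at least one deployment in the window")
    failures = [d for d in deployments if d.caused_failure]
    return {
        # Lead Time for Changes: commit -> production, median over all deployments
        "lead_time": median(d.deployed_at - d.committed_at for d in deployments),
        # Deployment Frequency: releases per week within the observation window
        "deploys_per_week": len(deployments) / (window / timedelta(weeks=1)),
        # Change Failure Rate: share of deployments that caused a failure
        "change_failure_rate": len(failures) / len(deployments),
        # Time to Restore Service: failure -> recovery, median over failed deployments
        # (here the deployment time stands in for the start of the failure)
        "time_to_restore": median(d.restored_at - d.deployed_at for d in failures)
        if failures else None,
    }
```

Whether you aggregate with a median or a mean, and how you pick the observation window, are implementation choices left to you; the metric definitions only say what to measure, not how to aggregate it.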
While large organisations can certainly use these in a data-driven manner, small organisations sometimes struggle to measure them meaningfully.
If you have one ML model in production, what is your Deployment Frequency? What is your Change Failure Rate?
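To see why the numbers get shaky, here is a toy example with made-up figures: with only a handful of deployments, a single incident swings the Change Failure Rate dramatically.

```python
# Made-up numbers: one ML model, redeployed three times last quarter.
deployments_last_quarter = 3
failed_deployments = 1          # a single bad release is enough...
cfr = failed_deployments / deployments_last_quarter
print(f"Change Failure Rate: {cfr:.0%}")  # ...to report a 33% failure rate
```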
As all of the metrics come back to "agility", the best place to start is probably implementing general code quality best practices: they provide value before you can measure anything explicitly, and they will eventually yield meaningful metrics.
But in general, the mental model above can nevertheless be helpful.