It's probably horses for courses, but I don't like tests that fail just because I've *changed* something. I prefer them to fail when I *break* something. This means less cognitive overhead and is much more useful (I already know when I've changed something, because I've just run `git push`).

Also, pickling objects and comparing them against stored copies can make the test suite very fragile. For example, a schema change will break the tests even if nothing is actually wrong.

I've found a better strategy is to make vague but accurate assertions. For example, I don't know exactly how many p-values will be less than 0.01 for a linear regression model fitted to randomly generated synthetic data, but I expect it to be most of them (depending on how much bias I've added to the test data).
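A minimal sketch of what such a test might look like, assuming numpy and a pytest-style test function. The helpers `make_synthetic_data` and `ols_p_values`, the `signal` parameter (the effect baked into the synthetic data) and the 0.8 threshold are all hypothetical choices for illustration. The p-values use a normal approximation to the t-distribution, which is reasonable for large samples, rather than an exact t-test from a stats library.

```python
import math

import numpy as np


def make_synthetic_data(n=1000, n_features=10, signal=0.5, noise=1.0, seed=0):
    """Hypothetical helper: y depends linearly on every feature with effect `signal`."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, n_features))
    true_beta = np.full(n_features, signal)
    y = X @ true_beta + rng.normal(scale=noise, size=n)
    return X, y


def ols_p_values(X, y):
    """Two-sided p-values for OLS coefficients (normal approximation to the t-test)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    sigma2 = residuals @ residuals / (n - k)          # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    t_stats = beta / se
    # For large n - k the t-distribution is close to the standard normal.
    return np.array([math.erfc(abs(t) / math.sqrt(2)) for t in t_stats])


def test_most_coefficients_are_significant():
    X, y = make_synthetic_data()
    p_values = ols_p_values(X, y)
    # Vague but accurate: no exact count, just "most of them are significant".
    assert np.mean(p_values < 0.01) > 0.8
```

This test only fails if the model genuinely stops finding the signal, not when unrelated details such as the object layout change.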

Interesting stuff. ML has a lot to learn from SWE, and this fascinates me.
