The goal of maintaining production-quality code is that you can change your codebase with confidence, knowing it still meets the specifications it already satisfied. This lets you respond to new requests, try new solutions, and back out of them if they don’t work.
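A common way to get that confidence when refactoring a notebook is a regression test. You save the pipeline's predictions before refactoring, then check that the refactored code reproduces them. The sketch below assumes the predictions are a NumPy array stored with pickle. The helper names and the file name are hypothetical and not taken from the article. Instead of a bare True/False comparison, the check reports how many predictions changed and at which positions.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np

# Hypothetical helpers for illustration; not the article's code.


def save_reference(y: np.ndarray, path: Path) -> None:
    """Pickle the current pipeline's output so later runs can be checked against it."""
    with path.open("wb") as f:
        pickle.dump(np.asarray(y), f)


def load_reference(path: Path) -> np.ndarray:
    """Load a previously saved reference output."""
    with path.open("rb") as f:
        return pickle.load(f)


def mismatched_indices(y_new: np.ndarray, y_ref: np.ndarray) -> np.ndarray:
    """Return the (flattened) positions where the new output differs from the reference."""
    y_new, y_ref = np.asarray(y_new), np.asarray(y_ref)
    if y_new.shape != y_ref.shape:
        raise ValueError(f"output shape changed: {y_ref.shape} -> {y_new.shape}")
    return np.flatnonzero(y_new != y_ref)


def assert_matches_reference(y_new: np.ndarray, path: Path) -> None:
    """Fail with a message naming the changed positions, not just a bare False."""
    y_ref = load_reference(path)
    bad = mismatched_indices(y_new, y_ref)
    assert bad.size == 0, (
        f"{bad.size} of {y_ref.size} predictions changed; "
        f"first differing indices: {bad[:10].tolist()}"
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ref_path = Path(tmp) / "reference_predictions.pkl"  # hypothetical file name
        save_reference(np.array([0, 1, 1, 0]), ref_path)  # before refactoring
        assert_matches_reference(np.array([0, 1, 1, 0]), ref_path)  # after: passes
        print(mismatched_indices(np.array([0, 1, 0, 0]), load_reference(ref_path)))
```

The same idea extends to metrics or feature importances. Store them next to the predictions and compare with a tolerance (for example `np.testing.assert_allclose`) rather than exact equality.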
How can a Data Scientist refactor Jupyter notebooks towards production-quality code?
Thanks for mentioning Typer! I came across it years ago when I was first starting out with FastAPI, but never gave it much thought. Given the quality of tiangolo's work I should have paid it more attention, so this was a welcome reminder to reacquaint myself with it.
Hello, Laszlo. Thank you for the material; I have a few questions.
1. What exactly do you assert when comparing "y" with the optimal model results pkl file? A model can produce metrics, feature importances, predicted values, etc. Which of these should I consider?
2. In your assertion code I assume "y" stands for the predicted labels. So it returns False if any predicted label differs from the one in the ideal model's pkl file. But how does that help in testing? I wouldn't even know which value is wrong, let alone why.
3. You have get_names methods in both your TestNameLoader and SQLNameLoader classes. I am pretty sure this breaks one of the OOP principles.
4. The same goes for the run() methods of Process and Typer.
5. I don't see the purpose of the TestNameLoader class. It only saves the dataset to a pickle file and returns rows 0 to sampleCount from it. Shouldn't it be named "Dataset_file_loader" instead?
Maybe I have misunderstood something.
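For reference on questions 3 and 5, here is a minimal sketch of the pattern they describe, assuming the loaders are meant to be interchangeable. A shared abstract base class declares get_names(). A test double and a database-backed loader each implement it, and client code accepts any NameLoader. In OOP terms this arrangement is usually described as polymorphism through a common interface. The class names, constructor parameters and the SQLite setup are hypothetical. This is not the article's TestNameLoader or SQLNameLoader.

```python
import sqlite3
from abc import ABC, abstractmethod

# Hypothetical classes for illustration; not the article's code.


class NameLoader(ABC):
    """Shared interface: every loader promises a get_names() method."""

    @abstractmethod
    def get_names(self) -> list[str]:
        """Return a list of names from some source."""


class InMemoryNameLoader(NameLoader):
    """Test double: serves the first sample_count names from an in-memory list."""

    def __init__(self, names: list[str], sample_count: int) -> None:
        self._names = list(names)
        self._sample_count = sample_count

    def get_names(self) -> list[str]:
        return self._names[: self._sample_count]


class DatabaseNameLoader(NameLoader):
    """Production-style loader: runs a query on any DB-API 2.0 connection."""

    def __init__(self, connection, query: str) -> None:
        self._connection = connection
        self._query = query

    def get_names(self) -> list[str]:
        cursor = self._connection.cursor()
        cursor.execute(self._query)
        return [row[0] for row in cursor.fetchall()]


def count_names(loader: NameLoader) -> int:
    """Client code depends only on the interface, so either loader can be passed in."""
    return len(loader.get_names())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (name TEXT)")
    conn.executemany("INSERT INTO people VALUES (?)", [("Ada",), ("Grace",), ("Linus",)])
    print(count_names(DatabaseNameLoader(conn, "SELECT name FROM people")))
    print(count_names(InMemoryNameLoader(["Ada", "Grace", "Linus"], sample_count=2)))
```

Because `NameLoader` is abstract, it cannot be instantiated directly. Only concrete loaders that implement `get_names()` can be passed to client code.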