Search for code in all your notebooks with one simple command
Happy New Year, I hope everyone had a great holiday and is ready to start 2023!
As always, the Code Quality for Data Science (CQ4DS) discord is open for people who want to improve their coding skills. Invite link:
Searching in notebooks
When I need an infrequently used package, I often remember a specific use case I wrote some time ago, which means searching through a vast set of notebooks. I had various grep-based tools for this, but they are less than ideal because they also match cell outputs, which are not relevant.
I was working on some complex JSON files which were easy to manipulate as Python structures. Then it dawned on me that notebooks are just pure JSON files themselves. How difficult would it be to use them in the same way? Apparently, it is not very difficult. The gist of the entire package is:
for cell in json.loads(open(notebook).read())['cells']:
After this, the rest was a simple regex on
I wrapped this into a function and returned the values as a pandas dataframe.
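To make the idea concrete, here is a minimal sketch of such a function. It is not the actual nb_query implementation: the names search_notebooks and search_notebooks_df, the column names and the choice to match line by line are all assumptions made for illustration. It relies only on the notebook JSON layout, where each cell has a "cell_type" and a "source" that is either a string or a list of lines.

```python
import json
import re
from pathlib import Path

COLUMNS = ["notebook", "cell", "line", "code"]  # hypothetical column names


def search_notebooks(pattern, root="."):
    """Return one dict per line in a code cell under `root` matching `pattern`."""
    regex = re.compile(pattern)
    hits = []
    for path in sorted(Path(root).rglob("*.ipynb")):
        if ".ipynb_checkpoints" in path.parts:
            continue  # skip Jupyter's autosave copies
        try:
            notebook = json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError):
            continue  # not a readable notebook, ignore it
        if not isinstance(notebook, dict):
            continue
        for cell_index, cell in enumerate(notebook.get("cells", [])):
            if cell.get("cell_type") != "code":
                continue  # markdown and raw cells are ignored
            source = cell.get("source", "")
            # "source" may be a single string or a list of lines
            if isinstance(source, list):
                source = "".join(source)
            # Only the source is inspected, so cell outputs never match
            for line_number, line in enumerate(source.splitlines(), start=1):
                if regex.search(line):
                    hits.append(
                        {
                            "notebook": str(path),
                            "cell": cell_index,
                            "line": line_number,
                            "code": line,
                        }
                    )
    return hits


def search_notebooks_df(pattern, root="."):
    """Same search, returned as a pandas dataframe."""
    import pandas as pd  # imported lazily so the plain search works without pandas

    return pd.DataFrame(search_notebooks(pattern, root), columns=COLUMNS)
```

Calling search_notebooks_df(r'import numpy', root='.') would then give one row per matching line. Unlike grep, a hit in a cell's output is never reported, because only the "source" field is searched.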
I kept copying the above function into every notebook server I used, which is clearly not a sustainable solution. I thought I should convert it into a package, but the last time I tried to do this, it was a pain. And if I expose it on PyPI, I should do it “properly”.
At the same time, I bumped into the “Hypermodern Python” article series (6!! parts), which is an epic walk-through of all the recent technologies for a modern Python environment. This is one of those topics that you know you should do, but you don’t because of … you know … “reasons”.
On a December Saturday, I bit the bullet and decided to go through the entire series and live-chat it on CQ4DS: [link]
I will write more posts on this topic, so subscribe if you want to learn more about tools like Poetry, Black, GitHub Actions, nox, pre-commit hooks and many more.
nb_query quick start
!pip install nb-query
from nb_query import nb_query
nb_query('import numpy as np')
Package on PyPI: https://pypi.org/project/nb-query/
Repository on GitHub: https://github.com/xLaszlo/nb-query
Docs on RTD: https://nb-query.readthedocs.io/en/latest/
Hypermodern Python: https://medium.com/@cjolowicz/hypermodern-python-d44485d9d769
The project is WIP. If you have any feedback, join the CQ4DS discord and share it on the #hypermodern-python channel: