The Downsides of using Jupyter for Software Development (Part 2)
This is Part 2 of an article I published a few weeks ago. Please read Part 1 first if you haven't, at least the first few paragraphs where I state how much I love Jupyter Notebooks so you can understand the context here. Some definitions first:
IDE: Integrated Development Environment. What Software Developers use (Sublime Text, PyCharm, IntelliJ…). IDEs are designed to develop and maintain large code bases collaboratively.
ICE: Interactive Computing Environment. What Analysts & Scientists use (Excel, Jupyter, SAS, Matlab…). ICEs are designed to quickly analyze data and generate visualizations and reports.
In the previous article we started talking about why using .ipynb files in pipelines is a bad idea. Let's finish that first:
With .ipynb you are forcing your sub-par IDE on other developers
If you are a Data Scientist or Analyst reading this, you might have noticed I just called you a Developer. That was intentional. When you are developing code you are acting as a developer, and while it is understandable that you don’t produce the same quality of code as the people whose full-time job is to do so, you have no excuse not to at least try to follow good practices so devs don’t hate you too much. Assuming you are writing notebooks with a Python kernel, if you place an .ipynb file in a pipeline, you are effectively forcing other developers to use an environment that supports .ipynb files. The .py format is understood by all IDEs, so prototype your code in notebooks if you wish and then translate it to .py. On top of that, in 2023 it is exceptionally rare for developers to impose code formats on each other: plain source files open in any IDE (it used to be much more complicated). Please do not trigger the PTSD of senior devs, they have been through enough.
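As a rough illustration of what "translating to .py" involves, here is a minimal sketch that pulls the code cells out of a notebook using only the standard library. It assumes the notebook follows the nbformat 4 layout (a top-level "cells" list whose entries carry "cell_type" and "source"); the function name notebook_to_script and the tiny sample notebook are hypothetical, made up for this example.

```python
import json


def notebook_to_script(notebook: dict) -> str:
    """Concatenate the code cells of a parsed .ipynb file into one script."""
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells are dropped
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat allows a list of lines
            source = "".join(source)
        if source.strip():
            chunks.append(source.rstrip() + "\n")
    return "\n".join(chunks)


# A tiny hypothetical notebook, serialized the way an .ipynb file is on disk.
raw = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": "# Load data"},
        {"cell_type": "code", "source": ["import math\n", "radius = 2\n"]},
        {"cell_type": "code", "source": "area = math.pi * radius ** 2\n"},
    ]
})

script = notebook_to_script(json.loads(raw))
print(script)
```

In practice, `jupyter nbconvert --to script` does this job for you. Either way, the output still needs a human pass: IPython magics, display calls and out-of-order cells do not become a clean module by themselves.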
In many companies, I have met a lot of analysts and developers frustrated with each other for several reasons, as the relationship tends to be complicated (let's leave this for another post). Do not generate more unnecessary friction: use .py.
With code cells, fingers are fatter than ever
Yet another of the blessings that Jupyter brings is a curse in disguise when it comes to pipelines. Being able to run any code block in a cell at any time is extremely nice and powerful when prototyping code, but it is also unnecessarily dangerous when that cell block can introduce duplicates in the data warehouse. If you have worked extensively connected to live Data Warehouses/databases from notebooks and you have never at least experienced a bit of dread about making a mistake in a code block, you are either the most precise and disciplined person in the History of Mankind, or you are lying.
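To make the risk concrete, here is a small sketch using an in-memory SQLite database as a stand-in for a warehouse. The table names, columns and helper functions are all hypothetical. The loop simulates running the same cell twice: the naive insert doubles the rows, while the keyed table with INSERT OR IGNORE (SQLite syntax, other databases spell this differently) stays the same.

```python
import sqlite3


def load_naive(conn, rows):
    """Append rows blindly; every re-run adds the same rows again."""
    conn.executemany(
        "INSERT INTO sales_naive (order_id, amount) VALUES (?, ?)", rows
    )
    conn.commit()


def load_idempotent(conn, rows):
    """Skip rows whose primary key already exists, so re-runs are harmless."""
    conn.executemany(
        "INSERT OR IGNORE INTO sales_keyed (order_id, amount) VALUES (?, ?)", rows
    )
    conn.commit()


def count_rows(conn, table):
    # Table names come from this script only, never from user input.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_naive (order_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE sales_keyed (order_id INTEGER PRIMARY KEY, amount REAL)")

rows = [(1, 9.99), (2, 24.50)]

for _ in range(2):  # simulate hitting Shift+Enter twice on the same cell
    load_naive(conn, rows)
    load_idempotent(conn, rows)

naive_count = count_rows(conn, "sales_naive")
keyed_count = count_rows(conn, "sales_keyed")
print(naive_count, keyed_count)
```

Idempotent writes soften the damage, but they do not remove the underlying problem: in a notebook, nothing stops you from running the wrong cell against a live database.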
Enough about pipelines
This next part of the article is about why you shouldn’t use Jupyter systems as a substitute for IDEs.
The D in IDE is for Developer
Tools like nbdev transform Jupyter from an ICE into an IDE. nbdev has achieved this objective admirably, and what it has achieved from a technical point of view is impressive. However, I humbly believe that this kind of ICE-to-IDE conversion is fundamentally flawed and should be avoided if possible. There are of course exceptions where using these kinds of tools could be a good idea, for example if you are currently a small team of data scientists who only know how to use notebooks and need to deliver something quickly. Regardless, if you do something like this, you should have a migration to a “native” IDE in your backlog, to be done as soon as possible.
I'm aware that this has been a point of heated debate online and I don’t intend to engage in it (so don’t expect me in the comment section) - partly because I don’t find these kinds of debates productive, but also because I believe that, objectively, the polish that IDEs like PyCharm or Sublime Text provide is not comparable with Jupyter's when it comes to developing software. I’ve used both, and the comparison is simply unfair because:
I believe nbdev is popular because many Analysts & Scientists only know ICEs and have never learned how to use IDEs, or are even unaware of their existence.
Jupyter disincentivizes commenting code and writing documentation
Jupyter Markdown is awesome and it’s yet another reason to use Jupyter for writing analytics reports. But when it comes to developing complex codebases, docstrings and comments are critical, particularly when projects involve more than one person.
Similar to before, can you comment your code, write docstrings, etc, in Jupyter? Absolutely! Should you? Of course! What is the problem then? The problem is that embedded markdown is not directly translatable to code because Jupyter was not designed as an IDE. Consider this simple, self-explanatory example:
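A hypothetical sketch of the point (not the author's original example; the notebook contents and the function to_fahrenheit are made up here): the explanation written in a markdown cell lives next to the code, not in it, so once the code cell is executed or exported the function carries no documentation. The same explanation written as a docstring travels with the function.

```python
# A minimal notebook-like structure: one markdown cell documenting the
# function, followed by the code cell that defines it.
notebook = {
    "cells": [
        {
            "cell_type": "markdown",
            "source": "Converts a temperature from Celsius to Fahrenheit.",
        },
        {
            "cell_type": "code",
            "source": "def to_fahrenheit(celsius):\n    return celsius * 9 / 5 + 32\n",
        },
    ]
}

# Run only the code cells, as a kernel or a script export would.
namespace = {}
for cell in notebook["cells"]:
    if cell["cell_type"] == "code":
        exec(cell["source"], namespace)

# The markdown explanation never reached the function object.
notebook_doc = namespace["to_fahrenheit"].__doc__


# The same function written for a .py module, documented with a docstring.
def to_fahrenheit(celsius: float) -> float:
    """Convert a temperature from Celsius to Fahrenheit."""
    return celsius * 9 / 5 + 32


module_doc = to_fahrenheit.__doc__

print(notebook_doc)
print(module_doc)
```

Tools such as help(), IDE tooltips and documentation generators read the docstring. None of them see the markdown cell.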
Conclusion
Use the right tools for the right job and you will be more productive, your colleagues will appreciate you more, and you will be an overall better Data Person.