The Downsides of using Jupyter for Software Development (Part 2)

Juan González-Vallinas

Published Jan 11, 2023

This is Part 2 of an article I published a few weeks ago. Please read Part 1 first if you haven't, at least the first few paragraphs where I state how much I love Jupyter Notebooks so you can understand the context here. Some definitions first:

IDE: Integrated Developer Environment. What Software Developers use (Sublime Text, PyCharm, IntelliJ…). IDEs are designed to develop and maintain large code bases collaboratively.

ICE: Interactive Computing Environment. What Analysts & Scientists use (Excel, Jupyter, SAS, Matlab…). ICEs are designed to quickly analyze data and generate visualizations and reports.

In the previous article we started talking about why using .ipynb in pipelines is a bad idea. Let's finish that first:

With .ipynb you are forcing your sub-par IDE on other developers

If you are a Data Scientist or Analyst reading this, you might have noticed I just called you Developer. That was intentional. When you are developing code you are acting as a developer, and while it is understandable that you don’t produce the same quality code as the people whose job is to do so full time, you have no excuse not to at least try to follow good practices so devs don’t hate you too much. Assuming you are writing notebooks with a Python kernel, if you place an .ipynb file in a pipeline, you are effectively forcing other developers to use an environment that supports .ipynb files. The .py format is understood by all IDEs, so prototype your code in notebooks if you wish and then translate it to .py. On top of that, in 2023 it is exceptionally rare for developers to impose code formats on each other, because for plain source files IDEs are effectively interchangeable (it used to be much more complicated). Do not trigger the PTSD of senior devs please, they have been through enough.

In many companies, we have met a lot of analysts and developers frustrated with each other for several reasons, as the relationship tends to be complicated (let's leave this for another post). Do not generate more unnecessary friction, use .py.
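To make the "translate it to .py" step concrete, here is a minimal sketch using only the standard library. It assumes the notebook follows the nbformat v4 JSON layout (a top-level "cells" list whose entries carry "cell_type" and "source"); the notebook contents and the notebook_to_script helper are hypothetical, made up for illustration.

```python
import json

# A hypothetical, minimal notebook in the nbformat v4 JSON layout.
# An .ipynb file on disk is just this JSON text.
NOTEBOOK_JSON = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Load the data\n"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 1,
         "outputs": [],
         "source": ["rows = [1, 2, 3]\n", "total = sum(rows)"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 2,
         "outputs": [],
         "source": ["print(total)"]},
    ],
})


def notebook_to_script(notebook_text: str) -> str:
    """Return the code cells of a notebook as plain Python source."""
    notebook = json.loads(notebook_text)
    chunks = []
    for cell in notebook["cells"]:
        if cell["cell_type"] != "code":
            continue  # markdown and raw cells are skipped
        source = cell["source"]
        # "source" may be a single string or a list of line strings.
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(source.rstrip("\n"))
    return "\n\n".join(chunks) + "\n"


script = notebook_to_script(NOTEBOOK_JSON)
print(script)
```

The result is ordinary Python text that any editor, linter or diff tool can read. In practice the existing `jupyter nbconvert --to script` command does this job; the sketch only shows that a notebook needs a conversion step before it behaves like a .py file, and it does not handle notebook-only syntax such as IPython magics.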

With code cells, fingers are fatter than ever

Yet another of the blessings that Jupyter brings is a curse in disguise when it comes to pipelines. Being able to run any code block in a cell at any time is extremely nice and powerful when prototyping code, but it is also unnecessarily dangerous when that cell block can introduce duplicates in the data warehouse. If you have worked extensively connected to live Data Warehouses/databases from notebooks and you have never at least experienced a bit of dread about making a mistake in a code block, you are either the most precise and disciplined person in the History of Mankind, or you are lying.
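The duplicate risk is easy to reproduce. The sketch below uses an in-memory SQLite database as a stand-in for a data warehouse; the table names, the batch and both loader functions are hypothetical. The first loader mimics a typical "load" cell that gets run twice by accident, and the second shows one possible guard (a primary key plus SQLite's INSERT OR IGNORE), which is only one of several ways to make a load safe to re-run.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse (an assumption for the demo).
conn = sqlite3.connect(":memory:")
BATCH = [(1, 10.0), (2, 25.5)]  # hypothetical rows: (order_id, amount)

# A table with no uniqueness constraint, loaded with plain INSERTs.
conn.execute("CREATE TABLE sales (order_id INTEGER, amount REAL)")


def load_batch_naive(connection):
    """What a typical 'load' cell does: insert the batch, no guard."""
    connection.executemany(
        "INSERT INTO sales (order_id, amount) VALUES (?, ?)", BATCH
    )
    connection.commit()


load_batch_naive(conn)
load_batch_naive(conn)  # the same cell, executed a second time by mistake
naive_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]

# Same data, but the table has a primary key and the insert skips
# rows whose key already exists, so re-running is harmless.
conn.execute(
    "CREATE TABLE sales_safe (order_id INTEGER PRIMARY KEY, amount REAL)"
)


def load_batch_idempotent(connection):
    """Insert the batch, ignoring rows that are already present."""
    connection.executemany(
        "INSERT OR IGNORE INTO sales_safe (order_id, amount) VALUES (?, ?)",
        BATCH,
    )
    connection.commit()


load_batch_idempotent(conn)
load_batch_idempotent(conn)  # second run changes nothing
safe_count = conn.execute("SELECT COUNT(*) FROM sales_safe").fetchone()[0]

print(naive_count, safe_count)  # prints "4 2"
```

A guard like this helps, but it does not remove the underlying problem: in a notebook nothing stops anyone from running the unguarded cell again.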

Enough about pipelines

This next part of the article is about why you shouldn’t use Jupyter systems as a substitute for IDEs.

The D in IDE is for Developer

Tools like nbdev transform Jupyter from an ICE into an IDE. Nbdev has achieved this objective admirably and what it has achieved from a technical point of view is impressive. However, I humbly believe that this kind of ICE-to-IDE conversion is fundamentally flawed and should be avoided if possible. There are of course exceptions where using these kinds of tools could be a good idea, for example, if you are currently a small team of data scientists who only know how to use notebooks and need to deliver something quickly. Regardless, if you do something like this, you should have in your backlog a migration to a “native” IDE as soon as possible.

I'm aware that this has been a point of heated debate online and I don’t intend to engage in it (so don’t expect me in the comment section) - partly because I don’t find these kinds of debates productive, but also because I believe that, objectively, the polish that IDEs like PyCharm or Sublime Text provide is not comparable with Jupyter's when it comes to developing software. I’ve used both, and the comparison is simply unfair because:

The amount of code you can maintain comfortably in a “regular” IDE feels like almost an order of magnitude bigger than in Jupyter.
Mainstream IDEs have had decades to polish and a much bigger user base with which to test what works and what doesn’t.
As discussed above, ICEs like Jupyter do not incentivise good practices.

I believe nbdev is popular because many Analysts & Scientists only know ICEs and have never learned how to use IDEs, or are even unaware of their existence.

Jupyter disincentivizes commenting code and writing documentation

Jupyter Markdown is awesome and it’s yet another reason to use Jupyter for writing analytics reports. But when it comes to developing complex codebases, docstrings and comments are critical, particularly when projects involve more than one person.

Similar to before, can you comment your code, write docstrings, etc, in Jupyter? Absolutely! Should you? Of course! What is the problem then? The problem is that embedded markdown is not directly translatable to code because Jupyter was not designed as an IDE. Consider this simple, self-explanatory example:
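The point can be pictured with a small hypothetical sketch (an illustration under assumptions, with made-up cell contents and function names). A description written in a markdown cell lives next to the code cell, not inside the function, so the function itself carries no documentation. A docstring travels with the code and is visible to standard tools such as inspect.getdoc.

```python
import inspect

# Hypothetical notebook content: the explanation sits in a markdown cell...
markdown_cell = (
    "### clean_prices\n"
    "Drops negative prices and rounds the rest to two decimals."
)
# ...and the code cell below it contains no docstring or comments.
code_cell = (
    "def clean_prices(prices):\n"
    "    return [round(p, 2) for p in prices if p >= 0]\n"
)

# Running only the code cell, as a kernel or an export to .py would.
notebook_namespace = {}
exec(code_cell, notebook_namespace)
notebook_doc = inspect.getdoc(notebook_namespace["clean_prices"])
print(notebook_doc)  # None: the markdown text is not attached to the function


# The same function written the way an IDE workflow encourages.
def clean_prices(prices):
    """Drop negative prices and round the rest to two decimals."""
    return [round(p, 2) for p in prices if p >= 0]


module_doc = inspect.getdoc(clean_prices)
print(module_doc)
```

Anything that relies on docstrings, such as help(), editor tooltips or documentation generators, sees the second version's description and nothing for the first.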

Conclusion

Use the right tools for the right job and you will be more productive, your colleagues will appreciate you more, and you will be an overall better Data Person.
