How to Go From R to Python (and Back Again in Data Science)

Many moons ago I took my first course in Python. It was 2014, and I built a Flask app that accessed the Yelp API. It was fun, but at the time there wasn't much buzz about data science. DataCamp was only a year old.

With no need to practice every day, my learning in Python stagnated.

Years passed until I started a job where data science was being applied to solve business problems, and I needed to practice my programming skills every day. The language used at the company I worked for happened to be R, so that is what I learnt, and from there things began to snowball.

When you first start out in data science, one big question you might ask yourself is which language to learn. It seems so important at the time. The FOMO can kick in when you see data scientist jobs advertised that ask for a different language from the one you are learning. The practical answer is that it probably doesn't matter which language you learn first. Pick the language you are likely to use most regularly in your current situation. If your workplace uses R, learn data science in R. If you are looking for a new job and the companies in your area appear to favour Python, go with that. Whichever language you choose, if you apply yourself, after a year or so you may realise that learning another language could be very handy. That is the focus of this short weekly newsletter: how to make the transition to becoming a multilingual data scientist.

How to write code for common data science jobs in both R and Python

You are probably thinking that the same job can be done in many ways, and you are correct. So we'll focus on the cleanest and most legible code I can find in both languages, as well as how not to do it!

Seeing how the same task can be done within one language is one thing, but comparing across languages is even better. If you are already fluent in one language, you will find that you know more about another than you think. There are many differences, but identifying common threads and patterns can speed up the process of learning a different data science language.

For example, data frames are a common data structure in both Python and R. They are basically tables of data made up of rows and columns. One job you will often need to do is add new columns that manipulate or use values from existing columns.
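To make the idea concrete, here is a minimal pandas sketch of a tiny data frame. The column names `name` and `score` and their values are made up for illustration.

```python
import pandas as pd

# A small, made-up data frame: each dictionary key becomes a column,
# and each list holds that column's values, one per row.
df = pd.DataFrame({
    "name": ["alice", "bob", "carol"],
    "score": [82, 91, 77],
})

print(df)
print(df.shape)  # prints (3, 2), i.e. (rows, columns)
```

Each position in the lists becomes a row, so three values per column give three rows.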

How to add a new column to a data frame in Python and R using pandas and dplyr

Adding columns to a data frame using R

First, let's look at how we might add a column in R using either base R or dplyr from the tidyverse. My preference is the dplyr syntax with the pipe operator '%>%'. If you know SQL, dplyr's verbs such as select(), filter() and arrange() will feel familiar. Shell scripts employ the same pipe concept with their own operator, '|'. In each case, the result from the previous step is piped into the next step.
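pandas has no '%>%', but method chaining together with `DataFrame.pipe` gives a similar left-to-right flow in which each step receives the previous result. A minimal sketch, assuming the same made-up `name` and `score` columns; `keep_passing` is a hypothetical helper written only for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["alice", "bob", "carol"],
    "score": [82, 91, 77],
})

# Hypothetical helper step, defined only for this illustration.
def keep_passing(frame, cutoff):
    """Keep rows whose score is at or above the cutoff."""
    return frame[frame["score"] >= cutoff]

# Each step receives the result of the previous one, roughly like
#   df %>% filter(score >= 80) %>% arrange(desc(score))
# in dplyr. .pipe() passes the current data frame into keep_passing.
result = (
    df
    .pipe(keep_passing, cutoff=80)
    .sort_values("score", ascending=False)
)
print(result)
```

Wrapping the chain in parentheses lets each step sit on its own line, which reads much like a dplyr pipeline.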

[Image: How to add a new column to a data frame using R, with base R and dplyr examples]

Adding columns to a data frame in Python

In comparison, the same task in Python using pandas might look something like this:

[Image: Adding a new column to a data frame in Python using pandas and dfply]
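The original screenshot isn't reproduced here, so the following is only a rough sketch of the kind of code involved. It assumes a hypothetical `name` column that we want copied into a new upper-case column. The loop is the "how not to do it" version; `Series.apply` and the vectorised `.str.upper()` are the tidier options.

```python
import pandas as pd

df = pd.DataFrame({"name": ["alice", "bob", "carol"]})

# 1. How not to do it: an explicit Python loop over the column's values.
upper_names = []
for value in df["name"]:
    upper_names.append(value.upper())
df["name_loop"] = upper_names

# 2. Series.apply: calls a function (here str.upper) on each value.
df["name_apply"] = df["name"].apply(str.upper)

# 3. The .str accessor: a vectorised string method on the whole column.
df["name_upper"] = df["name"].str.upper()

print(df)
```

All three produce the same values; the last one is the shortest and closest to working with a whole vector at once.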

Comparing the approach in each language

The .apply and str.upper approaches achieve the same result and look most similar to the base R example. There is some similarity between the syntax of pandas and base R. The base R code for this task does feel a little more intuitive and legible to me, but that is just a personal bias I have right now.

In addition to pandas, the Python script above also demonstrates the dfply module. It is designed to look and feel like dplyr from R and borrows the concept of a pipe operator (this time using >> instead of %>%). If you are learning Python and are already conversant with the tidyverse, this package may be of interest to you.
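dfply itself isn't shown here. If you prefer a mutate-style workflow without an extra dependency, plain pandas' `DataFrame.assign` gets fairly close. This is a substitute technique, not dfply's API, and the column names are made up.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["alice", "bob", "carol"],
    "score": [82, 91, 77],
})

# DataFrame.assign returns a NEW data frame with the extra columns and
# leaves df unchanged, close in spirit to dplyr's
#   df %>% mutate(name_upper = toupper(name), passed = score >= 80)
# Each lambda receives the data frame being built.
result = df.assign(
    name_upper=lambda d: d["name"].str.upper(),
    passed=lambda d: d["score"] >= 80,
)
print(result)
```

Because `assign` returns a new object, it slots into method chains like the `.pipe` example above without modifying the original data.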

Why learning both Python and R matters

Learning to do the same job in different programming languages is part of the practice towards mastery in data science, and it will make you more confident and more valuable to your employers and customers.

Summary

Firstly, find a need to practice your programming skills every day. This newsletter aims to provide a quick reminder or lesson on how you might do something useful in data science using either R or Python.

Finally, let's summarise how to add a column to a data frame in Python and R:

  • In Python, the apply method from pandas is a simple approach, and if you are already familiar with the tidyverse, dfply may be worth a look.
  • Using the mutate() function from the dplyr package is a common approach to adding columns in R, but if you already know pandas, the base R example may be quicker to pick up.
  • Did you notice there was no for loop example in R? Base R handles whole columns of a data frame easily, because each column is treated as a vector.
  • Adding columns in either Python or R can be achieved with a similar amount of code.
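pandas behaves similarly to base R here: each column is a `Series`, the closest analogue to an R vector, and arithmetic between Series is applied element by element, so no loop is needed. A minimal sketch with hypothetical `price` and `quantity` columns:

```python
import pandas as pd

df = pd.DataFrame({
    "price": [10.0, 20.0, 5.0],
    "quantity": [3, 1, 4],
})

# Multiplying two columns works element by element, just as
#   df$revenue <- df$price * df$quantity
# does in base R. No explicit loop is required.
df["revenue"] = df["price"] * df["quantity"]

print(df["revenue"].tolist())  # [30.0, 20.0, 20.0]
```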

Next Steps

Let me know which data science tasks you would like to see compared in Python and R. We'll start with the basics and move on to more complicated jobs, as long as each one can be digested in under 10 minutes.

There is plenty more to explore with methods from the pandas and dplyr packages. One of the biggest tasks of a data scientist is data processing, and this is a job at which both packages excel.

If you enjoyed this article, you can subscribe and get notifications right in your LinkedIn feed. You can also find more related articles on my personal blog at machinatoonist.com.





I remember this sketch.


Thank you for sharing this Matt! I have a suggestion for a future one... how to fix/format Date column - the nightmare of anyone that works with Data :)

As one using R, I often heard it would be easier to learn Python if you know R. I cannot confirm this. Just slicing dataframes I find more intuitive and easier in R (base & tidy approach) than in Pandas. Maybe this could be one of the next topics?! Thank you Matt! 🙏🏻


More articles by Matt Rosinski
