The case for scripted data analysis (and against Excel and SPSS)

I just read a blog post that argues against using R for teaching and doing data analysis, and I have to respectfully disagree with it. While I prefer Python to R (for many reasons, including its superior syntax, general-purpose nature and, of course, speed), the post seems to argue that point-and-click applications such as SPSS are the better alternative because of their simplicity! I started out with SPSS myself, and I can say from personal experience that this is very bad advice! Why? Here are my top 5 reasons:

1- Lack of full customization in SPSS. If you can't change even simple hyperparameters or adjust the algorithm design to your needs, how can you call yourself a data analyst? SPSS is extremely rigid, unlike scripting-based analysis, where you have access to all sorts of flexible functions, file formats and workflows. Want to change how the algorithm works to suit your specific dataset? Want to change the error metric or how missing data is handled? Good luck doing that in SPSS without nightmares!

2- Lack of reproducibility. This is a big one! With a scripted analysis, you can just hit "run" again and get all your results back in a fully transparent manner. Say you discover that you accidentally normalized your data incorrectly. What can you do about it? If your analysis is scripted, you will likely need to change just one or two lines of code and click run, et voilà! Compare this to the nightmare of having to repeat the whole analysis click by click!

3- The open-source nature of R (and Python) means you get access to a very broad set of packages and analysis workflows. Almost all decent data analysis packages nowadays are implemented in R or Python, so why remain isolated from everyone?

4- Ability to handle big data. There is a limit to how much analysis you can do by loading full datasets into memory and performing full-batch analysis. Sooner or later, one will have to learn how to deal with large datasets, so why not learn the tools sooner?

5- Transferable knowledge. Once you've learned one scripting language, congratulations! You have just learned a good deal of programming, and you can bet it will be pretty easy to pick up any other scripting language in a short amount of time. The fundamentals of any programming language are the same, be it variables, conditional statements, loops or exception handling. In contrast, learning where to click and how to find functions in SPSS only helps you with SPSS, and is not transferable to any other application.
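
To make point 2 concrete, here is a minimal Python sketch of a scripted analysis. The data and the function names (`RAW_VALUES`, `normalize`, `analyze`) are made up for illustration, and z-scoring is just one assumed choice of normalization. The idea is that if the normalization turns out to be wrong, only one line needs to change before rerunning everything.

```python
import statistics

# Hypothetical toy measurements; in practice these would be read from a file.
RAW_VALUES = [12.0, 15.0, 11.0, 18.0, 14.0]


def normalize(values):
    """Rescale values to z-scores (mean 0, sample standard deviation 1)."""
    mean = statistics.mean(values)
    spread = statistics.stdev(values)  # the one line to edit if this choice was wrong
    return [(v - mean) / spread for v in values]


def analyze(values):
    """Run the whole pipeline, from raw data to summary, in one call."""
    scaled = normalize(values)
    return {"n": len(scaled), "min": min(scaled), "max": max(scaled)}


if __name__ == "__main__":
    # Rerunning the script repeats every step exactly, with no clicks to retrace.
    print(analyze(RAW_VALUES))
```

Switching to, say, the population standard deviation would mean replacing `statistics.stdev` with `statistics.pstdev` and hitting run again.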

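As a rough illustration of point 4, the sketch below computes a column mean by streaming over the rows instead of loading the whole dataset into memory. It uses only Python's standard library. The `streaming_mean` helper, the column names and the tiny in-memory CSV are assumptions made for the example, with the CSV standing in for a file too large to load at once.

```python
import csv
import io


def streaming_mean(rows, column):
    """Compute the mean of one column while holding a single row in memory."""
    count = 0
    total = 0.0
    for row in rows:
        count += 1
        total += float(row[column])
    if count == 0:
        raise ValueError("no rows to average")
    return total / count


# A small in-memory stand-in for a file too large to load at once.
# With a real file you would pass an open file handle to csv.DictReader instead.
SAMPLE_CSV = "id,value\n1,10.5\n2,20.0\n3,29.5\n"

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
mean_value = streaming_mean(reader, "value")
print(mean_value)
```

The same pattern (read a piece, update a running summary, discard the piece) carries over to larger tools that process data in chunks.
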
I would say that simple tools such as Excel and SPSS are only suited to beginner-level statistical analysis. But if you need to take data analysis to the next level, especially if you need to apply machine-learning prediction and classification models beyond simple regression (for example, random forests or artificial neural networks), there is no reason whatsoever not to take a couple of steps and learn one or more scripting languages. Doing so gives you freedom and flexibility, and it is really much easier than it initially seems.

Cheers.

Mohamed Amgad, MD, PhD