Thoughts on Commercial and Open Source Predictive Analytics Software

Competition in statistical (or predictive analytics) software was once waged among established commercial packages (SAS, SPSS, Matlab, and Stata, to name a few). Today, the battle is between largely the same set of commercial offerings, as a group, and their open source alternatives (R and Python, first and foremost). The rise of open source has led to a renaissance in how analytics are performed. Everyone in analytics, from beginner to expert, has a view on software. This post considers the current situation in predictive analytics software through my own experience, with SAS and R as exemplars of the best that commercial and open source software, respectively, have to offer.

Most statistical programmers pick a software package and stay with it. Software choices are notoriously sticky for a reason (or several): the time it takes to learn the software and the pain and suffering that go into implementing it have quite a lot to do with it.

And yet it is often the case that the software ‘picks’ us: most of us had the software we use, know, and love for statistical purposes bestowed upon us, the result of a choice made by others, on the basis of circumstance (through school, a job, whatever) that we subsequently stuck with. Or so it had been, until open source came around.

I learned SAS in grad school, honed my skills on the job (mostly at larger organizations with differing needs and objectives, often with large amounts of data), and at some point along the way became more or less a devotee. And then a funny thing happened: a company I was working for decided not to renew its SAS license. Suddenly we had a bunch of legacy code without the software to run it. We switched to Matlab. It was painful, but life moved on.

After several years spent coding I more or less stopped programming entirely to manage programmers, rather than be one. A few years in, I found myself leading a project to develop a statistical model for a company that did not have a SAS license. The intern working for me at the time happened to be straight out of college, where he’d studied computer science and had learned R. And so, through him and an R User Group, I made the switch.

With SAS and R representing the old guard and new wave in statistical software, respectively, what can be observed in comparing open source with established commercial alternatives? How does SAS, in particular, stand up to R?

If one were to play word association, some epithets for SAS might include the following:

  • Static: In fact, one benefit of the software is that it doesn’t change all that much. Once you’ve reached a certain level of competency with SAS, there are fewer and fewer surprises. You can become SAS Certified by learning the code base and know that whatever procedure you’re looking to run can be run more or less glitch free, with syntax that doesn’t change all that dramatically.
  • Stable: Stability goes more or less with it being static. You could insert ‘reliable’ here, too. There may be bugs, but they’re pretty few and far between at this point.
  • Well-supported: Support is out there. You can call SAS tech support for questions. There is a ‘Books by Users’ series to help explain how to do things in the software. I call a colleague, on the other hand, if I have an R question, or hit the internet.
  • Fast: Hands down, SAS rips through data. Getting the same speed sometimes takes work in R. Speed is a hotly debated topic, though. Let’s just say the winner depends on the circumstances, the race being run.

R, on the other hand, is:

  • Organic: SAS development follows a kind of central planning—R evolves. With R, versions change, packages get added (and mothballed?). Evolution can have an immediate effect on the people using the software. Though there may be thoughts as to where R is going, it’s hard to call it group think—the direction the software takes is influenced, to a great extent, by the user base. Many individuals and organizations are working on the software, not just one.
  • Democratic (with a small ‘d’): Innovations in R are more bottom-up than top-down. (You could, by the way, say pretty much the same thing about Python, I think.) R packages, which form the core of the software, are sort of a DIY way for individuals to influence the evolution of the software.
  • Creative: The evolution of R fits nicely somehow with the maker movement, which explains the appeal it has to entrepreneurs. Knowledge is transmitted from one user to the next. R is also a lower-level language, allowing the developer greater control. In contrast, SAS looks relatively stodgy. Think about those Apple commercials comparing a Mac to a PC.

Ultimately the choice of predictive analytics software is now more personal than ever. The open source revolution is here for predictive analytics software, with the winners and losers yet to be determined. One thing’s for certain, though: the great democratizing movement of open source has brought with it broader access to data and analytics, both in terms of the work that’s being done and who’s doing it.

I’ll revisit the topic of SAS and R in future posts. For now, just one parting thought: as with any trade, the tools used have implications for the work that’s done. So pick wisely, if you can, for now and the future.

My advice: don’t be parochial; everybody has something to offer. You may choose a favorite, but it can never hurt to learn it all!

Well articulated, Lee. I get the same question about which one is better. As you described, it really depends on what you are looking for and the type of usage. By the way, Python is another tool with usability similar to R's, though it is more programmer-friendly than statistician-friendly. However, especially for building algorithms, running scaled-up solutions, or embedding models in web applications, Python works wonders.

Great article Lee! Where does IBM's "Watson" or similar platforms fall in all of this?
