Is Statistics (for Data Science) the next Computer Science?

Image Credit: https://pixabay.com/illustrations/online-web-statistics-data-3539409/

Motivation: as a university student (Computer Science major), I am going to try to make reasonable predictions that put me ahead of the curve in 3-5 years' time - about the time it takes to develop a deep skillset in some domain.

Let's project a few things that we know with near 100% certainty.

1) There are going to be more developers than ever, whether trained through universities and colleges, online courses or coding boot camps. They have skills such as programming logic, object-oriented design, an understanding of web architecture, some skills specific to web apps, mobile apps or both, and perhaps algorithmic analysis skills.

As a computer science TA, I can say that OOP, basic logic and API usage are within the grasp of almost anyone. So non-academic training entities, such as coding boot camps, can produce developers faster than universities and colleges. They are also unconstrained by the need for instructors with a master's degree or higher, or by top-heavy university administrations. Thus, with barriers so low, the non-theoretical aspects of Computer Science will spread even further.

1.1) As advertised, they will go on to be hired by companies and build ever more mobile apps and web applications, which will in turn generate data, consume server/hardware resources and drive further waves of digitization.

A number of deep, less accessible Computer Science domains will remain untouched and subsequently be in heightened demand due to the side effects of this 'basic developer proliferation': Computer Systems (Rust, C/C++, concurrency and performance), Data Science (involving graduate-level math/statistics, not simple neural nets) and Data Engineering (distributed computing and other functions that support Big Data and enable data scientists).

On Computer Systems, we could discuss the increasing utility of C/C++ even in web browsers (see how Google Earth uses WebAssembly with threads), the rise of Rust or the increasing application of specialized firmware. However, the skillsets with the most career versatility and scalability are clear - Data Science/Engineering.

Data Science job postings have been growing in tandem with Computer Science job postings. That's a well-established fact. If we project more data entering the ecosystem, we can also project that the Data Science skillset will be in even more demand.

2) Advanced Deep Learning techniques will rise in application and demand within the next couple of years. The main bottleneck is Mathematics and Statistics knowledge rather than Computer Science knowledge. From my dabbling on Kaggle, the only barrier to entry for a basic developer seems to be the mathematical functions and intimidating language ("stochastic gradient descent", "categorical cross-entropy", etc.); neural nets themselves are quite intuitive.
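To make those two terms less intimidating, here is a minimal sketch in plain Python of what they refer to. The function names, the toy data and the learning rate are my own illustrative assumptions, not taken from any library or Kaggle kernel. It is a single-layer softmax classifier rather than a real deep network.

```python
import math
import random

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_cross_entropy(probs, true_class):
    """Loss = negative log of the probability given to the correct class."""
    return -math.log(probs[true_class])

def sgd_step(weights, features, true_class, learning_rate=0.1):
    """One stochastic gradient descent update on a single example.

    weights[k][j] links feature j to class k and is updated in place.
    Returns the loss measured before the update.
    """
    scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
    probs = softmax(scores)
    for k, row in enumerate(weights):
        # Gradient of the loss w.r.t. score k: prob_k - 1 for the true class, prob_k otherwise.
        error = probs[k] - (1.0 if k == true_class else 0.0)
        for j, x in enumerate(features):
            row[j] -= learning_rate * error * x
    return categorical_cross_entropy(probs, true_class)

# Made-up toy data: (features, class label). Purely illustrative.
toy_data = [([1.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 1.0], 2)]
toy_weights = [[0.0, 0.0] for _ in range(3)]

random.seed(0)
for _ in range(200):
    # "Stochastic" means each step uses one randomly chosen example.
    features, label = random.choice(toy_data)
    sgd_step(toy_weights, features, label)
```

The "gradient" here is just the difference between the predicted probability and the correct answer, and "descent" is the repeated nudging of each weight against that difference. The harder statistics sits in knowing when such a model is appropriate and how to trust its results, which a sketch like this does not cover.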

Further, as presented in this report by the US National Center for Education Statistics, the percentage growth in enrollment of Math/Stats majors is an order of magnitude lower than that of Computer/Information Science majors.

3) Math and Statistics are not as easily grasped as OOP and basic developer skills. However, they are two key facets of the data scientist, differentiating them from basic developers. These subjects still present solid barriers, whether cognitive or otherwise. In fact, many informal developer education courses tout their 'math-less' content, presumably because they know of the difficulties.

Conclusion

Thus, to take advantage of the wave of incoming basic developers and their subsequent data streams while maximizing career versatility, focusing on the mathematical and statistical skills necessary for data science/engineering would be a very defensible niche with a very high potential return. The demand (and thus potential pay) for data scientists/engineers depends on the scale, quality and accessibility of raw data.

On a personal note, contributing in some way to medical research in my lifetime is a key goal of mine. Data science is far more applicable to that domain than the other technical skillsets.

I also don't want to discount the Computer Systems domain, as performant code is the key to utilizing the hardware of many developing markets. Also, Rust seems to be gaining popularity (it is consistently one of the top 3 most loved languages), and I think the language itself is an attempt to make the usually difficult work done in C/C++ easier, in order to answer industry demands for performant, concurrent software. (Personally, I'm dabbling with Rust out of curiosity and it does seem safer than C/C++.)

Counter Arguments and Mitigating Factors

Data Science/Engineering is downstream from developer work and thus will only ever make up a small fraction of the digital workforce.

A: True. However, we're seeing ever-expanding CS programs in universities alone, not even counting non-university educated developers, while Math/Stats programs see only single-digit growth, if any, and seemingly only exist in universities.

Computer Science was once hard/unsexy/inaccessible too. Why can't that happen to Math/Stats?

A: The question would then be, "Why hasn't that happened to Math/Stats?" And it seemingly hasn't, even though Math/Stats has had the same kind of online support, through Khan Academy or otherwise.
