An Overview for Beginners in Data Science
In an earlier post, we wrote about the differences between a data analyst, a data scientist and a data engineer, and discussed how each profile has its own importance. In this post we look at Data Scientists more closely. It is mainly a summary of various blogs and should answer your basic FAQs on being a Data Scientist. Data Scientists have a background in statistics and math, along with the required coding skills. They generally analyse data, often end up modelling it, and in many cases are also involved in artificial intelligence [Source].
The following are the most common languages used by Data Scientists [Source]:
1. R: R’s flexibility in delivering the required data analytics output makes it a language recommended by many in Data Science. Compared to other languages, R is not too tough to learn, and it is available for free. However, R is used mainly for data analysis and modelling and is less versatile in its application areas.
2. Python: Popular as a fairly easy language to pick up, Python is used by various types of coders, including web developers. Since it is easy to learn and has a large online support community, it is a common choice in Data Science. Compared to R, however, it lacks certain modules and packages that R offers. It is also free.
3. SQL: SQL is the language used to query and manage relational databases. While managing and modifying databases is more of a Data Engineer's job, Data Scientists need to know SQL too and may be involved in manipulating databases themselves. This is especially true when dealing with large data sources.
4. Julia: Considered a language that might become really popular in the future, Julia is easily readable and well suited to numerical analysis. As of now, it does not have as many modules and packages available as R or Python.
5. SAS: SAS is a user-friendly platform because of its GUI, and it has a wide range of modules and functions that help with data analysis.
Other languages and tools used include Excel, SPSS, MATLAB and Scala, among others [Source1, Source2, Source3, Source4].
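To give a feel for how two of these languages work together, here is a minimal sketch in Python that uses SQL for a grouped summary and plain Python for an overall average. It relies only on Python's built-in sqlite3 and statistics modules. The "sales" table, its columns and the numbers in it are made up purely for illustration and do not come from any real dataset.

```python
import sqlite3
from statistics import mean

# Hypothetical example data: (region, amount) pairs invented for illustration.
rows = [
    ("north", 120.0),
    ("north", 80.0),
    ("south", 200.0),
    ("south", 100.0),
    ("south", 60.0),
]

# Create a throwaway in-memory relational database and load the rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

# SQL handles the grouping: number of sales and average amount per region.
query = (
    "SELECT region, COUNT(*), AVG(amount) "
    "FROM sales GROUP BY region ORDER BY region"
)
summary = conn.execute(query).fetchall()
conn.close()

# Python handles the rest of the analysis: the average across all sales.
overall_mean = mean(amount for _, amount in rows)

for region, count, average in summary:
    print(f"{region}: {count} sales, average {average}")
print(f"overall average: {overall_mean}")
```

In practice, the database would usually be an existing one maintained by a Data Engineer, and the analysis step would often use a dedicated data analysis library rather than plain Python.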
This might seem daunting to those who do not have the required background, so we would also like to show some ways to transition into Data Science. Complete beginners can check this link to build context for the way forward. Newbie coders could start with basic data analytics and explore the tools mentioned in this link. Some of them are fairly easy to use, while others require you to get familiar with the tool before you can create good content. This assumes you have some knowledge of statistics and data analysis to begin with. There are also GUI (Graphical User Interface) driven data science tools that you can check out in this link.
If you are still deciding whether or not you should be a part of the Data Science world, this post by PwC should help you understand what to expect. While the jobs analysed in it pertain to the USA, the points made are relevant elsewhere too. In this post, Burtch Works makes predictions on Data Science for 2018. This link shows the results of a survey done in 2018 on Data Science recruitment in India. One disclaimer: it does not specify the number of people surveyed or how the sample was chosen, but it can still be read as a basic overview of recruitment trends. We hope this post helps. Do let us know your queries and comments!
Contributors:
Haritha Songola is currently pursuing their master’s degree in Climate Change and Sustainability Studies. Their research work focuses on Food Security and Climate Change, Feasibility of Carbon Budget Scenarios and Sustainable Growth.
Manvirender Singh Rawat is the founder of Klaymatrix. He has worked on a wide array of projects in the development sector and is constantly trying to bring his data science expertise to them.