An Overview for Beginners in Data Science

In an earlier post, we wrote about the differences between a data analyst, a data scientist and a data engineer, and discussed how each profile has its own importance. In this post we look at Data Scientists more closely. It is mainly a summary of various blogs and should answer your basic FAQs on being a Data Scientist.

Data Scientists have a background in statistics and math, along with the required coding skills. They generally analyse data, often end up modelling it, and in many cases are also involved in artificial intelligence [Source]. The following are the most common languages used by Data Scientists [Source]:

1. R: R’s flexibility in producing the required analytical output makes it a language that many people in Data Science recommend. Compared to other languages, R is not too tough to learn, and it is free. However, R is used mainly for data analysis and modelling and is less versatile in other application areas.

2. Python: Known as a fairly easy language to pick up, Python is used by many types of coders, including web developers. Because it is easy to learn and has a large online support community, it is a strong option for Data Science. Compared to R, however, it lacks some specialised statistical packages. It is also free.

3. SQL: You need to know this to work with relational databases. While managing and modifying databases is more of a Data Engineer's job, Data Scientists need to know SQL too and may be involved in manipulating databases themselves. This is especially true when dealing with large data sources.

4. Julia: Considered a language that might become really popular in the future, Julia is easily readable and well suited to numerical analysis. As of right now, it does not have as many modules and packages as R or Python.

5. SAS: SAS is a user-friendly platform because of its GUI and has a wide range of modules and functions that help with data analysis.

Other languages and tools in use include Excel, SPSS, MATLAB and Scala, among others [Source1, Source2, Source3, Source4].
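To make the list above a little more concrete, here is a minimal sketch of how two of these languages are often combined: SQL does the aggregation inside a database, and Python cross-checks the result. The `sales` table, its columns and the sample figures are hypothetical and exist only for illustration; the sketch uses Python's built-in `sqlite3` and `statistics` modules, so nothing needs to be installed.

```python
import sqlite3
import statistics

# Hypothetical sample data: (region, sales amount). Not from any real source.
ROWS = [
    ("north", 120.0),
    ("north", 80.0),
    ("south", 200.0),
    ("south", 100.0),
    ("south", 150.0),
]


def load_sales(conn):
    """Create the illustrative 'sales' table and fill it with ROWS."""
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", ROWS)


def average_by_region(conn):
    """Let SQL compute the average amount per region."""
    query = (
        "SELECT region, AVG(amount) FROM sales "
        "GROUP BY region ORDER BY region"
    )
    return dict(conn.execute(query).fetchall())


# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
load_sales(conn)
sql_averages = average_by_region(conn)

# Cross-check one region in plain Python.
south_amounts = [amount for region, amount in ROWS if region == "south"]
python_average = statistics.mean(south_amounts)

print(sql_averages)
print(python_average)
```

On large data sources it usually makes sense to let the database do the grouping, as above, and to pull only the summarised rows into Python or R for further modelling.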

This might seem daunting to those who do not have the required background, so we would also like to show some ways to transition into Data Science. Complete beginners can check this link to build context for the way forward. Newbie coders could start with basic data analytics and explore the tools mentioned in this link. Some of them are fairly easy to use, while others require you to get familiar with the tool before you can create good content. This assumes you already have some knowledge of statistics and data analysis. There are also GUI (Graphical User Interface) driven data science tools that you can check out in this link.

If you are still deciding whether or not to be a part of the Data Science world, this post by PwC should help you understand what to expect. While the jobs analysed in it pertain to the USA, the points made are nevertheless relevant. In this post, Burtch Works makes predictions on Data Science for 2018. This link shows the results of a 2018 survey on Data Science recruitment in India. One caveat is that it does not specify the number of people surveyed or how the sample was chosen, but it can still be read as a basic overview of recruitment trends. We hope this post helps. Do let us know your queries and comments!

Contributors:

Haritha Songola is currently pursuing their master’s degree in Climate Change and Sustainability Studies. Their research work focuses on Food Security and Climate Change, Feasibility of Carbon Budget Scenarios and Sustainable Growth.

Manvirender Singh Rawat is the founder of Klaymatrix. He has worked on a wide array of projects in the development sector and is constantly trying to bring his data science expertise to them.
