Data Science: A New Learning
I started learning data science on Udemy a few days ago. It is a new subject for me, as I have never worked on anything related to data science or with languages such as Python or R. This is also my first blog post on data science.
Data is the new oil. This means that almost every modern IT system depends on capturing, storing and analysing data for various needs. All of this involves a multidisciplinary approach, using mathematical models, statistical graphs, databases and, of course, the business and scientific logic behind the data analysis.
Data science is the extraction of knowledge from data, using ideas from mathematics, statistics, machine learning, computer programming and data engineering.
The collection, curation and analysis of data has always been as social as it is technical. Even in the most automated, data-driven systems, there are always humans working behind the scenes, from the software developers and hardware operators who maintain invisible infrastructure to those who collect, label, annotate, clean, validate and manage data.
In this blog I am going to cover the basics of data science, which involve business analytics, data analytics, machine learning, statistics, mathematics, probability and Python.
Analysis Vs Analytics
Analysis is not the same as analytics. In business the two terms mean different things, yet the difference between them is often not clearly understood.
Analysis: -
Analysis looks backward over time: it examines what has already happened, often across multiple datasets. Analysis means separating a whole into parts and studying the parts individually as well as their relationships with one another.
Analytics: -
Analytics generally refers to the future: it explores potential future events. Analytics is essentially the application of logical and computational reasoning to the component parts obtained in an analysis. This reasoning is typically carried out with algorithms, and its aim is to find patterns that suggest what we can do in the future.
Analytics can be divided into two types:
1. Qualitative = intuition + analysis.
2. Quantitative = formulas + algorithms.
Example: -
Suppose you own a clothing store. You have a great understanding of what your customers need, or you have performed a detailed analysis of your women's clothing items and are sure which fashion trends to follow. You know which items are in demand. This is qualitative analytics. However, you might not know when to introduce a new collection. In that case, relying on past sales data and experience to decide which month would be best is an example of quantitative analytics.
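To make the quantitative side concrete, here is a minimal Python sketch. The sales figures and the `best_launch_month` helper are hypothetical, made up for illustration; the idea is simply to average past sales per month and pick the month with the highest average.

```python
from collections import defaultdict

# Hypothetical past sales records: (year, month, units sold).
# These numbers are invented purely for illustration.
past_sales = [
    (2021, "March", 120), (2021, "April", 180), (2021, "September", 150),
    (2022, "March", 130), (2022, "April", 200), (2022, "September", 140),
]

def best_launch_month(records):
    """Return the month with the highest average units sold across years,
    together with the per-month averages."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for _year, month, units in records:
        totals[month] += units
        counts[month] += 1
    averages = {m: totals[m] / counts[m] for m in totals}
    return max(averages, key=averages.get), averages

month, averages = best_launch_month(past_sales)
print(month)     # April
print(averages)  # average units per month
```

A real decision would weigh more than one number, but the point stands: the choice comes from formulas applied to past data rather than from intuition alone.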
Business Analytics and Data Analytics: -
Both business analytics and data analytics involve working with and manipulating data, extracting insights from data, and using that information to enhance business performance. So, what are the fundamental differences between these two functions?
Business Analytics focuses on the larger business implications of data and the actions that should result from them, such as whether a company should develop a new product line or prioritize one project over another. The term business analytics refers to a combination of skills, tools, and applications that allows businesses to measure and improve the effectiveness of core business functions such as marketing, customer service, sales, or IT.
Data analytics involves combing through massive datasets to reveal patterns and trends, draw conclusions about hypotheses, and support business decisions with data-based insights. Data analytics attempts to answer questions such as “What is the influence of geography or seasonal factors on customer preferences?” or “What is the likelihood a customer will defect to a competitor?”. The practice of data analytics encompasses many diverse techniques and approaches and is also frequently referred to as data science, data mining, data modelling, or big data analytics.
Probability: -
Probability is the likelihood of an event occurring. This event can be pretty much anything: getting heads, rolling a 4 or even bench pressing 225 lbs. We measure probability with numeric values between 0 and 1 (0 for an impossible event, 1 for a certain one), because we like to compare the relative likelihood of events. Observe the general probability formula.
Probability Formula:
• The probability of event X occurring equals the number of preferred outcomes over the number of outcomes in the sample space, assuming all outcomes are equally likely: P(X) = preferred outcomes / sample space.
• Preferred outcomes are the outcomes we want to occur or the outcomes we are interested in. We also refer to such outcomes as “favourable”.
• Sample space refers to all possible outcomes that can occur. Its “size” indicates the number of elements in it.
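As a quick sketch of the formula in Python (the `probability` helper is my own illustration, and it assumes every outcome in the sample space is equally likely, as with a fair coin or die):

```python
from fractions import Fraction

def probability(preferred, sample_space):
    """P(X) = number of preferred outcomes / number of outcomes in the sample space.

    Assumes every outcome in the sample space is equally likely.
    """
    sample_space = set(sample_space)
    preferred = set(preferred) & sample_space  # only count outcomes that can actually occur
    return Fraction(len(preferred), len(sample_space))

die = range(1, 7)
print(probability({4}, die))                       # 1/6  (rolling a 4)
print(probability({"heads"}, {"heads", "tails"}))  # 1/2  (getting heads)
print(probability({2, 4, 6}, die))                 # 1/2  (rolling an even number)
```

Using `Fraction` keeps the result as an exact "preferred over all" ratio instead of a rounded decimal.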
Probability Frequency Distribution
What is a probability frequency distribution?
A collection of the probabilities for each possible outcome of an event.
Why do we need frequency distributions?
We need the probability frequency distribution to try and predict future events when the expected value is unattainable.
What is a frequency?
Frequency is the number of times a given value or outcome appears in the sample space.
What is a frequency distribution table?
The frequency distribution table is a table matching each distinct outcome in the sample space to its associated frequency.
How do we obtain the probability frequency distribution from the frequency distribution table?
By dividing every frequency by the size of the sample space. (Think about the “favoured over all” formula.)
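Here is a minimal sketch of these steps in Python, using the sum of two fair dice as an assumed example: build the sample space, count frequencies to get the frequency distribution table, then divide each frequency by the size of the sample space.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: every (first die, second die) pair, 36 equally likely outcomes.
sample_space = list(product(range(1, 7), repeat=2))

# Frequency distribution table: each distinct sum -> how often it appears.
frequency = Counter(a + b for a, b in sample_space)

# Probability frequency distribution: divide each frequency by the sample-space size.
size = len(sample_space)
probability_distribution = {total: Fraction(count, size)
                            for total, count in sorted(frequency.items())}

for total, p in probability_distribution.items():
    print(f"sum={total:2d}  frequency={frequency[total]}  probability={p}")
```

A sum of 7 appears 6 times out of 36, so its probability is 1/6, and all the probabilities add up to 1.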
Statistics: -
Statistics is a form of mathematical analysis that uses quantified models, representations and synopses for a given set of experimental data or real-life studies. Statistics studies methodologies to gather, review, analyse and draw conclusions from data.
Some statistical measures include the following:
- Mean
- Regression Analysis
- Skewness
- Variance
Mean
· A mean is the mathematical average of a group of two or more numbers. The mean of a set of numbers can be computed in multiple ways, including the arithmetic mean (the sum of the values divided by how many there are), which can show how well a specific commodity performs on average over time, and the geometric mean (the n-th root of the product of n values), which suits compounding quantities such as the performance of an investor’s portfolio invested in that same commodity over the same period.
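A small Python sketch comparing the two means, using hypothetical yearly returns. It relies on `statistics.geometric_mean` and `math.prod`, which need Python 3.8 or newer.

```python
import math
import statistics

# Hypothetical yearly returns of an investment: +10%, -20%, +30%.
returns = [0.10, -0.20, 0.30]

# Arithmetic mean: the sum of the values divided by how many there are.
arithmetic = statistics.mean(returns)

# Geometric mean of the growth factors (1 + r), minus 1, gives the
# average compounded return per year.
growth_factors = [1 + r for r in returns]
geometric = statistics.geometric_mean(growth_factors) - 1

# Compounding the geometric average over all years reproduces the real total growth.
total_growth = math.prod(growth_factors)

print(f"arithmetic mean return: {arithmetic:.4f}")  # 0.0667
print(f"geometric mean return:  {geometric:.4f}")
print(math.isclose((1 + geometric) ** len(returns), total_growth))  # True
```

The geometric mean comes out lower: the arithmetic mean overstates the growth the portfolio actually achieved once returns compound.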
Regression Analysis
· Regression analysis estimates the extent to which specific factors influence a variable, for example how interest rates, the price of a product or service, or particular industries or sectors influence the price fluctuations of an asset. When this relationship is modelled as a straight line, it is called linear regression.
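Below is a minimal sketch of simple linear regression (a single factor) using ordinary least squares, written out by hand so the formula is visible. The `fit_line` helper and the interest-rate and price numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum of co-deviations of x and y / sum of squared deviations of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through (mean_x, mean_y)
    return slope, intercept

# Hypothetical data: interest rate (%) vs. an asset price.
rates = [1.0, 2.0, 3.0, 4.0, 5.0]
prices = [98.0, 95.5, 93.0, 91.0, 88.0]
slope, intercept = fit_line(rates, prices)
print(f"price ≈ {slope:.2f} * rate + {intercept:.2f}")  # price ≈ -2.45 * rate + 100.45
```

The negative slope says that, in this made-up data, each extra percentage point of interest goes with a price roughly 2.45 lower.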
Skewness
· Skewness describes how far a set of data departs from a symmetric (normal) distribution. Many data sets, including commodity returns and stock prices, have either positive skew, where the long tail extends to the right of the data average, or negative skew, where the long tail extends to the left of it.
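A minimal Python sketch of one common skewness measure, the population (Fisher-Pearson) coefficient. Other variants with small-sample corrections exist, so results from other tools may differ slightly. The data sets are arbitrary examples.

```python
import statistics

def skewness(data):
    """Population skewness: the mean cubed deviation from the mean,
    divided by the cube of the population standard deviation."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    if sd == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return sum((x - mean) ** 3 for x in data) / (len(data) * sd ** 3)

symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 10]        # one large value stretches the right tail
left_tailed = [-10, -3, -2, -2, -1, -1]   # one small value stretches the left tail

print(skewness(symmetric))                                    # 0.0
print(skewness(right_tailed) > 0, skewness(left_tailed) < 0)  # True True
```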
Variance
· Variance is a measure of the spread of numbers in a data set. It is the average of the squared distances of each number in the set from the mean. Variance can help determine the risk an investor might accept when buying an investment.
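A short Python sketch computing variance by hand and checking it against the standard library's `statistics` module; the data set is an arbitrary example.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)                        # 5.0
squared_distances = [(x - mean) ** 2 for x in data]

# Population variance: the average squared distance from the mean.
population_variance = sum(squared_distances) / len(data)
# Sample variance divides by n - 1 when the data is only a sample of a larger population.
sample_variance = sum(squared_distances) / (len(data) - 1)

print(population_variance)                                # 4.0
print(statistics.pvariance(data) == population_variance)  # True
print(round(sample_variance, 3))                          # 4.571
```

Squaring the distances stops values below the mean from cancelling out values above it.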
Introduction to Programming
What should we do when we have just started learning programming? We face many problems in our daily life. Some of them we can solve on our own, while more complicated ones can be solved with the help of a computer. For that, we must create a program.
Assume there is a problem that must be solved with a computer. There are certain steps to follow: we must first write an algorithm and then convert it into a programming language.
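For example, take a simple (hypothetical) problem: find the largest number in a list. We can first write the algorithm as plain steps and then translate it into Python. In practice Python's built-in `max()` already does this; the sketch only illustrates the algorithm-first approach.

```python
# Algorithm (written first, in plain language):
#   1. Take the first number as the largest seen so far.
#   2. Look at each remaining number in turn.
#   3. If it is bigger than the largest so far, remember it instead.
#   4. When every number has been checked, report the largest.

def find_largest(numbers):
    """The algorithm above, translated into Python."""
    if not numbers:
        raise ValueError("the list must contain at least one number")
    largest = numbers[0]            # step 1
    for n in numbers[1:]:           # step 2
        if n > largest:             # step 3
            largest = n
    return largest                  # step 4

print(find_largest([12, 7, 45, 3, 29]))  # 45
```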
What is Python?
Python is a programming language. It is used for web development (server side), software development, mathematics and system scripting.
What can Python do?
· Python can be used on a server to create web applications.
· Python can be used alongside software to create workflows.
· Python can connect to database systems. It can also read and modify files.
· Python can be used to handle big data.
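A minimal sketch of two of these points using only Python's standard library: `sqlite3` for a database and `pathlib` for reading and modifying a file. The table, column and file names are made up for illustration.

```python
import sqlite3
import tempfile
from pathlib import Path

# Connect to a database system: SQLite ships with Python; ":memory:" keeps it in RAM.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item TEXT, units INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("dress", 5), ("jacket", 2), ("dress", 3)])
rows = conn.execute(
    "SELECT item, SUM(units) FROM sales GROUP BY item ORDER BY item"
).fetchall()
conn.close()
print(rows)  # [('dress', 8), ('jacket', 2)]

# Read and modify a file: write a report, append a line, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    report = Path(tmp) / "report.txt"
    report.write_text("dress,8\n")
    with report.open("a") as f:
        f.write("jacket,2\n")
    lines = report.read_text().splitlines()
print(lines)  # ['dress,8', 'jacket,2']
```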
Benefits: -
· It has several technical advantages over other programming languages.
· Practical Applications
Technical Description: -
· Free and constantly updated
· Can be used in multiple domains
· Intuitive syntax that allows for complex quantitative computations.
Python’s popularity rests on two main pillars. The first is that it is an easy-to-learn programming language designed to be highly readable, with a clear and intuitive syntax. The second is that its user-friendliness does not take away from its strength: Python can execute a variety of complex computations and is widely used by specialists.
The Jupyter Notebook App is a server-client application that allows you to edit and run your code through a web browser.
Machine Learning: -
Machine learning is a method of data analysis that automates analytical model building. It is a branch of artificial intelligence based on the idea that systems can learn from data.
So, to summarise: data science is about collecting, summarising and extracting knowledge from data, and understanding how that data behaves in a technical environment. It draws on several key areas, i.e. machine learning, statistics, Python, probability, business analysis and business analytics. By using them, we can understand how data is behaving in a system.
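To illustrate "learning from data", here is a minimal sketch of one simple technique, k-nearest neighbours, which I chose for illustration. The customer data, labels and the `knn_predict` helper are hypothetical, and the features are not scaled, which a real project would usually do. No rules are hand-coded; the prediction comes entirely from the labelled examples.

```python
import math
from collections import Counter

def knn_predict(training_data, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest
    training examples, using Euclidean distance."""
    by_distance = sorted(training_data, key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _features, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical customer data: (visits per month, average spend) -> label.
training_data = [
    ((1, 10), "occasional"), ((2, 15), "occasional"), ((1, 20), "occasional"),
    ((8, 60), "loyal"), ((10, 75), "loyal"), ((9, 55), "loyal"),
]

print(knn_predict(training_data, (9, 65)))  # loyal
print(knn_predict(training_data, (2, 12)))  # occasional
```

Adding more labelled examples changes the predictions without changing the code, which is the sense in which the system "learns" from data.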