Enabling Foundation of Statistics

What is the problem?

In the vast realm of statistics, navigating through extensive content and frameworks can be overwhelming. While it is not necessary for every team member to be a statistician, understanding the underlying principles and assumptions of statistical inference is crucial in our data-driven world.

What is the solution?

To empower our teams, I recommend implementing a basic statistics onboarding program tailored to their specific needs. This onboarding initiative will primarily focus on essential aspects such as learning from data, proper data collection methods, effective analysis techniques, and compelling presentation of results.

What are the components of statistics onboarding? 

Statistics onboarding necessitates a well-rounded approach encompassing interactive sessions, comprehensive documents, and practical examples for each component. Let's explore some of these components:

Types of Analysis: 

  • Descriptive Analysis: Investigating past occurrences and understanding their causes.
  • Predictive Analysis: Forecasting future outcomes based on available data.
  • Prescriptive Analysis: Identifying strategies to influence and shape future outcomes.

Within these analysis types, we employ various methods and frameworks that cater to specific needs and contexts.

The Imperfection of Models:

It is important to acknowledge that all models are inherently flawed, albeit useful. Real-world complexities demand simplifications, and while these models may not capture every nuance, they provide valuable approximations based on their intended purpose.

Inferential Statistics:

Inferential statistics plays a pivotal role in analysis. By leveraging mathematical techniques, we infer trends, classify and segment groups, discover relationships between variables within samples, and generalize or predict from there.

Population vs. Sample Data:

Distinguishing between population (N) and sample (n) data is crucial. While the population represents the complete set of data of interest, studying it entirely is often impractical. Instead, we rely on sample data, which is easier, less time-consuming, and more cost-effective to collect. Parameters characterize population data, while statistics describe sample data.
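
As a minimal Python sketch (the population values below are entirely made up for illustration), the distinction is that the mean of the full population is a parameter, while the mean of a drawn sample is a statistic that estimates it:

import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population of N = 100,000 values (purely illustrative).
population = rng.normal(loc=50, scale=12, size=100_000)
mu = population.mean()  # parameter: describes the population

# In practice we usually only observe a sample of n = 500.
sample = rng.choice(population, size=500, replace=False)
x_bar = sample.mean()  # statistic: describes the sample, estimates mu

print(f"population mean (parameter): {mu:.2f}")
print(f"sample mean (statistic):     {x_bar:.2f}")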

Normal Distribution:

A majority of statistical tests and probability calculations rely on data conforming to a normal distribution, also known as the Bell Curve or Gaussian curve. Understanding and assessing normal distribution assumptions are vital in inferential statistics.
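
One common way to assess this assumption, sketched here with SciPy on simulated data, is a formal normality test such as D'Agostino-Pearson's, ideally paired with a visual check:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(loc=100, scale=15, size=300)  # simulated, roughly normal

# D'Agostino-Pearson test: the null hypothesis is that the data is normal.
stat, p_value = stats.normaltest(data)
print(f"p-value = {p_value:.3f}")
if p_value < 0.05:
    print("Evidence against normality; consider a non-parametric method.")
else:
    print("No strong evidence against normality.")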

Measures of Central Tendency:

Central tendency measures, such as the mean, mode, and median, allow us to analyze the average or most representative values in a dataset. The mean provides a reliable measure for numerical data, while the mode represents the most frequent category for categorical data. The median, or 50th percentile, identifies the middle value in a dataset.
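
A small Python sketch with illustrative numbers shows all three measures side by side:

import statistics

prices = [12, 15, 15, 18, 20, 22, 95]  # illustrative numerical data
browsers = ["chrome", "chrome", "safari", "edge", "chrome"]  # categorical

print(statistics.mean(prices))    # ~28.1, pulled upward by the outlier 95
print(statistics.median(prices))  # 18, the 50th-percentile (middle) value
print(statistics.mode(browsers))  # 'chrome', the most frequent category

Note how the single outlier (95) drags the mean well above the median; this robustness to outliers is why the median is often preferred for skewed data.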

Measures of Variability:

To gauge the spread of data, we employ measures of variability, including the range, interquartile range (IQR), variance, and standard deviation. The range captures the difference between the largest and smallest data points. The IQR is the difference between the 75th percentile (Q3) and the 25th percentile (Q1), describing where the middle 50% of values lie. Variance averages the squared deviations of data points from the mean; because squaring changes the units, prefer the standard deviation, its square root, which is expressed in the same units as the data and is therefore easier to interpret.
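
Here is a minimal NumPy sketch of all four measures on illustrative data (ddof=1 yields the sample versions of variance and standard deviation):

import numpy as np

data = np.array([4, 7, 7, 8, 9, 10, 12, 15, 21, 40])  # illustrative

data_range = data.max() - data.min()        # range: max minus min
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1                               # spread of the middle 50%
variance = data.var(ddof=1)                 # sample variance (squared units)
std_dev = data.std(ddof=1)                  # same units as the data

print(data_range, iqr, round(variance, 1), round(std_dev, 1))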

Modality:

Modality refers to the number of peaks in a distribution. Unimodal distributions have a single peak, bimodal distributions exhibit two peaks, and multimodal distributions display multiple peaks. Analyzing modality helps identify patterns and characteristics within the data.
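
As a rough illustration (the bimodal data is simulated, and the prominence threshold is an arbitrary choice for this sketch), peaks can be counted directly from a histogram:

import numpy as np
from scipy.signal import find_peaks

rng = np.random.default_rng(1)
# Simulated bimodal data: two groups centered at different values.
data = np.concatenate([rng.normal(30, 5, 500), rng.normal(70, 5, 500)])

counts, _ = np.histogram(data, bins=30)
peaks, _ = find_peaks(counts, prominence=20)  # prominence filters out noise
print(f"peaks detected: {len(peaks)}")  # expect 2 for this bimodal sample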

Skewness:

Skewness measures the asymmetry of a distribution. Positive or negative skewness indicates a departure from a perfectly symmetric normal distribution: positive skewness means a tail extending to the right, while negative skewness means a tail extending to the left. Pearson's skewness coefficient offers a simple, reliable estimate of this asymmetry.
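
A minimal sketch of Pearson's second skewness coefficient, 3 * (mean - median) / standard deviation, applied to simulated right-skewed data:

import numpy as np

def pearson_skewness(data):
    """Pearson's second skewness coefficient: 3 * (mean - median) / std."""
    data = np.asarray(data)
    return 3 * (data.mean() - np.median(data)) / data.std(ddof=1)

rng = np.random.default_rng(2)
right_skewed = rng.exponential(scale=2.0, size=1000)  # long right tail
print(f"skewness: {pearson_skewness(right_skewed):.2f}")  # positive value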

Kurtosis:

Kurtosis describes how heavy the tails of a distribution are compared to a normal distribution. High kurtosis means heavy tails and more outliers; low kurtosis means light tails and fewer outliers. A histogram will reveal both skewness and kurtosis at a glance, and probability plots offer another way to assess how closely the data follows a normal distribution.
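
A short SciPy illustration on simulated data, using a t-distribution with 3 degrees of freedom as an example of a heavy-tailed distribution:

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
normal_data = rng.normal(size=2000)
heavy_tailed = rng.standard_t(df=3, size=2000)  # fat-tailed t-distribution

# SciPy's default (Fisher) definition: a normal distribution scores ~0.
print(f"normal:       {stats.kurtosis(normal_data):.2f}")
print(f"heavy-tailed: {stats.kurtosis(heavy_tailed):.2f}")  # well above 0

For the visual check mentioned above, SciPy's stats.probplot(data, dist="norm", plot=plt) draws a probability plot against the normal distribution with matplotlib.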

Types of Data: There are 2 main data types.

  • Categorical Data: Examples: a person's gender, language, browser type, device type.
    • Nominal: Represents variables as labels with no quantitative value and no inherent order.
    • Ordinal: Represents discrete values with an ordered, ranked structure.
  • Numerical Data:
    • Continuous: Measurements, such as the height of a person.
    • Discrete: Can only take certain values, such as the number of users on a page.

There are other data types that derive from the main types, such as ratios and percentages. Additionally, for reporting there is the time (date/time) data type.
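
As an illustrative pandas sketch (all column names and values here are hypothetical), each of these types maps naturally onto a DataFrame column:

import pandas as pd

# Hypothetical analytics table mixing the data types above (illustrative).
df = pd.DataFrame({
    "browser": pd.Categorical(["chrome", "safari", "chrome"]),   # nominal
    "satisfaction": pd.Categorical([1, 3, 2], ordered=True),     # ordinal
    "height_cm": [172.5, 168.0, 181.2],                          # continuous
    "page_views": [3, 7, 1],                                     # discrete
    "visit_date": pd.to_datetime(
        ["2023-01-01", "2023-01-02", "2023-01-03"]),             # time
})
print(df.dtypes)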

Types of Data Visualizations: There are 4 main visualizations.

  • Histograms: Use for distributions of numerical variables.
  • Bar: Use for counts of occurrences (similar to a distribution) for categorical variables.
  • Scatter and Line: Use for relationships between two numerical variables.
  • Time Series: Use for how numerical variables change over time.
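
A matplotlib sketch on simulated data puts all four chart types side by side:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
fig, axes = plt.subplots(2, 2, figsize=(10, 8))

axes[0, 0].hist(rng.normal(50, 10, 500), bins=25)           # distribution
axes[0, 0].set_title("Histogram (numerical distribution)")

counts = {"chrome": 120, "safari": 80, "edge": 40}          # categorical
axes[0, 1].bar(list(counts.keys()), list(counts.values()))
axes[0, 1].set_title("Bar (categorical counts)")

x = rng.uniform(0, 10, 100)
axes[1, 0].scatter(x, 2 * x + rng.normal(0, 2, 100))        # relationship
axes[1, 0].set_title("Scatter (two numerical variables)")

days = np.arange(30)
axes[1, 1].plot(days, np.cumsum(rng.normal(1, 3, 30)))      # change over time
axes[1, 1].set_title("Time series (change over time)")

plt.tight_layout()
plt.show()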

What is the ROI?

By implementing this approach, we can significantly enhance the accuracy of our analyses and elevate the quality of our decision-making processes. It is crucial to empower our product managers by providing them with readily available dashboards and analyses. This enables them to rely on existing resources instead of requiring detailed explanations and walkthroughs for every instance.

With this onboarding we not only save valuable time and effort but also create room for additional improvements in our analytics capabilities. This allows us to allocate our resources more effectively, whether in refining existing analytics models or exploring new avenues for data-driven insights.
