Sampling Distribution Models and the Central Limit Theorem

After this, you should be able to:

  1. Derive the correct sampling distribution model when given the population parameters.
  2. Correctly apply the Central Limit Theorem to calculate probabilities associated with a sample proportion and sample mean.

Sampling Distributions

Parameters; Statistics

In real life, the parameters of populations are unknown and unknowable. For example, the mean height of US adult (18+) men is unknown and unknowable.

Rather than investigating the whole population, we take a sample, calculate a statistic related to the parameter of interest, and make an inference. 

The statistic's sampling distribution tells us how the value of the statistic varies from sample to sample.

DEFINITION: Sampling Distribution

The sampling distribution of a sample statistic calculated from a sample of n measurements is the probability distribution of values taken by the statistic in all possible samples of size n taken from the same population.

Based on all possible samples of size n.

Constructing a Sampling Distribution

  1. In some cases, the sampling distribution can be determined exactly.
  2. In other cases, it must be approximated by using a computer to draw some of the possible samples of size n and drawing a histogram of the resulting values of the statistic.
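The second approach can be sketched in a few lines of Python. This is only an illustration: the population (the six faces of a die), the statistic (the sample range) and the helper name `approximate_sampling_distribution` are hypothetical choices, not part of the original material.

```python
import random
from collections import Counter

def approximate_sampling_distribution(population, n, statistic, draws=10_000, seed=0):
    """Estimate the sampling distribution of `statistic` by repeated sampling.

    Draws `draws` random samples of size n (with replacement) from `population`
    and returns {statistic value: relative frequency}.
    """
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    counts = Counter()
    for _ in range(draws):
        sample = rng.choices(population, k=n)
        counts[statistic(sample)] += 1
    return {value: count / draws for value, count in sorted(counts.items())}

def sample_range(sample):
    return max(sample) - min(sample)

# Hypothetical population: the six faces of a die; statistic: the sample range.
dist = approximate_sampling_distribution([1, 2, 3, 4, 5, 6], n=5, statistic=sample_range)

# Crude text histogram: one '#' per percentage point.
for value, freq in dist.items():
    print(f"{value}: {'#' * round(freq * 100)} {freq:.3f}")
```

Increasing `draws` makes the histogram a closer approximation to the true sampling distribution.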

Sampling Distribution Models of Sample Proportions

Example: sampling distribution of p cap, the sample proportion

  1. If a coin is fair, the probability of a head on any toss of the coin is p = 0.5 (p is the population parameter).
  2. Imagine tossing this fair coin 4 times and calculating the proportion p cap of the 4 tosses that result in heads (note that p cap = x/4, where x is the number of heads in 4 tosses).
  3. Objective: determine the sampling distribution of p cap, the proportion of heads in 4 tosses of a fair coin.

There are 2^4 = 16 equally likely possible outcomes (1 = head, 0 = tail): (1,1,1,1) (1,1,1,0) (1,1,0,1) (1,0,1,1) (0,1,1,1) (1,1,0,0) (1,0,1,0) (1,0,0,1) (0,1,1,0) (0,1,0,1) (0,0,1,1) (1,0,0,0) (0,1,0,0) (0,0,1,0) (0,0,0,1) (0,0,0,0)

Counting the heads in each outcome gives the sampling distribution of p cap:

p cap        0       0.25    0.50    0.75    1.0
P(p cap)     0.0625  0.25    0.375   0.25    0.0625

E(p cap) = 0*0.0625 + 0.25*0.25 + 0.50*0.375 + 0.75*0.25 + 1.0*0.0625 = 0.5 = p (the probability of heads)
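As a check on the arithmetic, the 16 outcomes can be enumerated directly. The sketch below uses exact fractions; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

n = 4
outcomes = list(product([0, 1], repeat=n))   # 1 = head, 0 = tail
prob_each = Fraction(1, len(outcomes))       # fair coin: all 16 outcomes equally likely

# Accumulate the probability of each possible value of p cap = (number of heads) / n.
dist = Counter()
for outcome in outcomes:
    p_hat = Fraction(sum(outcome), n)
    dist[p_hat] += prob_each

mean = sum(value * prob for value, prob in dist.items())
variance = sum((value - mean) ** 2 * prob for value, prob in dist.items())

for value in sorted(dist):
    print(float(value), float(dist[value]))
print("E(p cap) =", mean, " Var(p cap) =", variance)
```

The mean comes out to 1/2 and the variance to 1/16, which equals p(1 - p)/n for p = 0.5 and n = 4.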

Var(p cap) = (0 - 0.5)^2*0.0625 + (0.25 - 0.5)^2*0.25 + (0.50 - 0.5)^2*0.375 + (0.75 - 0.5)^2*0.25 + (1.0 - 0.5)^2*0.0625 = 0.0625 = p(1-p)/n

The shape of the sampling distribution of p cap:

The sampling distribution of p cap is approximately normal when the sample size n is large enough. n large enough means np ≥ 10 and n(1-p) ≥ 10


(Figure: population distribution, p = 0.65.)

(Figure: sampling distribution of p cap for samples of size n.)



Example

  1. 8% of the American Caucasian male population is color blind, so p = 0.08.
  2. Use a computer to simulate random samples of size n = 1000 and compute the sample proportion p cap for each sample.
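A minimal simulation of this example might look as follows. The number of simulated samples (2000) and the random seed are arbitrary assumptions; only p = 0.08 and n = 1000 come from the example.

```python
import random
from math import sqrt
from statistics import mean, pstdev

p, n, num_samples = 0.08, 1000, 2000   # num_samples is an arbitrary choice
rng = random.Random(1)                  # fixed seed so the run is repeatable

p_hats = []
for _ in range(num_samples):
    # Each simulated man is color blind with probability p.
    color_blind = sum(rng.random() < p for _ in range(n))
    p_hats.append(color_blind / n)

print("mean of simulated p cap:", round(mean(p_hats), 4), "(p =", p, ")")
print("SD of simulated p cap:  ", round(pstdev(p_hats), 4),
      "(sqrt(pq/n) =", round(sqrt(p * (1 - p) / n), 4), ")")
```

The simulated values of p cap should center near 0.08 with a spread close to sqrt(0.08*0.92/1000), about 0.0086.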


The sampling distribution model for a sample proportion p cap

Provided that the sampled values are independent and the sample size n is large enough, the sampling distribution of p cap is modeled by a normal distribution with mean E(p cap) = p and standard deviation SD(p cap) = sqrt(pq/n),

where q = 1 - p and where n large enough means np ≥ 10 and nq ≥ 10. The Central Limit Theorem will be a formal statement of this fact.

Example: Binge drinking by college students

  1. A study by the Harvard School of Public Health: 44% of college students binge drink.
  2. At a particular college, 244 students were surveyed; 36% admitted to binge drinking in the past week
  3. Assume the value 0.44 given in the Harvard study is the proportion p of college students that binge drink; that is 0.44 is the population proportion p
  4. Compute the probability that in a sample of 244 students, 36% or less have engaged in binge drinking.
  5. Let p cap be the proportion in a sample of 244 that engage in binge drinking
  6. We want to compute P(p cap ≤ 0.36).

E(p cap) = p = 0.44; SD(p cap) = sqrt(pq/n) = sqrt(0.44*0.56/244) ≈ 0.0318

Since np = 244*0.44 = 107.36 and nq = 244*0.56 = 136.64 are both greater than 10, we can model the sampling distribution of p cap with a normal distribution:

P(p cap ≤ 0.36) = P(z ≤ (0.36 - 0.44)/0.0318) = P(z ≤ -2.52) ≈ 0.0059
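The calculation can be checked numerically with Python's standard library. The function name `prob_sample_proportion_at_most` is a hypothetical helper written for this sketch; it applies the normal model with mean p and standard deviation sqrt(pq/n) and refuses to run when np or nq is below 10.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_proportion_at_most(p_hat, p, n):
    """P(p cap <= p_hat) under the normal model with mean p and SD sqrt(pq/n)."""
    q = 1 - p
    if n * p < 10 or n * q < 10:
        raise ValueError("normal model not appropriate: need np >= 10 and nq >= 10")
    sd = sqrt(p * q / n)
    return NormalDist(mu=p, sigma=sd).cdf(p_hat)

# Binge drinking example: p = 0.44, n = 244, observed proportion 0.36.
prob = prob_sample_proportion_at_most(0.36, p=0.44, n=244)
print(round(prob, 4))
```

A probability this small (about 0.006) says that a sample proportion of 36% or less would be unusual if the true proportion were 0.44.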

Example: Snapchat by college students

  1. A recent scientifically valid survey: 77% of college students use Snapchat.
  2. 1136 college students surveyed; 75% reported that they use Snapchat
  3. Assume the value 0.77 given in the survey is the proportion p of college students that use Snapchat; that is 0.77 is the population proportion p
  4. Compute the probability that in a sample of 1136 students, 75% or fewer use Snapchat
  5. Let p cap be the proportion in a sample of 1136 that use Snapchat
  6. We want to compute P(p cap ≤ 0.75).
  7. E(p cap) = p = 0.77; SD(p cap) = sqrt(pq/n) = sqrt(0.77*0.23/1136) ≈ 0.0125
  8. Since np = 1136*0.77 = 874.72 and nq = 1136*0.23 = 261.28 are both greater than 10, we can model the sampling distribution of p cap with a normal distribution:

P(p cap ≤ 0.75) = P(z ≤ (0.75 - 0.77)/0.0125) = P(z ≤ -1.60) ≈ 0.0548
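One way to see how well the normal model works here is to compare it with the exact binomial probability. The sketch below assumes the number of Snapchat users x in the sample is Binomial(n, p), that is, that the 1136 students respond independently; the helper `binomial_cdf` is written for this illustration and is not part of the original example.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist

def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); each pmf term is built in log space to avoid overflow."""
    total = 0.0
    for i in range(k + 1):
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log(1 - p))
        total += exp(log_pmf)
    return total

n, p = 1136, 0.77
k = 852                                   # 75% of 1136 students

normal_approx = NormalDist(p, sqrt(p * (1 - p) / n)).cdf(k / n)
exact = binomial_cdf(k, n, p)
print(f"normal model: {normal_approx:.4f}   exact binomial: {exact:.4f}")
```

The two values should be close but not identical: the normal model is an approximation, and the binomial count is discrete.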

Sampling Distribution Models of Sample Means

Another Population Parameter of Frequent Interest: the Population Mean µ

  1. To estimate the unknown value of µ, the sample mean xbar is often used.
  2. We need to examine the sampling distribution of the sample mean xbar (the probability distribution of all possible values of xbar based on a sample of size n).

Example

Professor Stickler has a large statistics class of over 300 students. He asked them the ages of their cars and obtained the following probability distribution.

  1. A simple random sample (SRS) of size n = 2 is to be drawn from the population.
  2. Find the sampling distribution of the sample mean xbar for samples of size n = 2.
  3. There are 7 possible ages (ages 2 through 8).
  4. There is a total of 7^2 = 49 possible samples of size 2.
  5. Each of the 49 possible samples has a corresponding sample mean and probability.

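Probability distribution of xbar

xbar      2      2.5    3      3.5    4       4.5     5       5.5     6       6.5     7       7.5     8
p(xbar)   1/196  2/196  5/196  8/196  12/196  18/196  24/196  26/196  28/196  24/196  21/196  18/196  9/196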

This is the sampling distribution of xbar because it specifies the probability associated with each possible value of xbar.

From the sampling distribution above: P(4 ≤ xbar ≤ 6) = p(4) + p(4.5) + p(5) + p(5.5) + p(6) = 12/196 + 18/196 + 24/196 + 26/196 + 28/196 = 108/196 ≈ 0.551
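The enumeration of all 49 samples can be reproduced with exact fractions. One caveat: the text only shows the population probabilities for ages 2, 3, 4 and 8 (1/14, 1/14, 2/14 and 3/14). The weights used below for ages 5, 6 and 7 are an assumption, inferred as the values that reproduce the xbar probabilities quoted in the text (1/196, 2/196, 5/196, and so on).

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Population distribution of car age: P(X = age) = weight / 14.
# Weights for ages 5, 6 and 7 are inferred from the xbar probabilities (assumption).
weights = {2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 3, 8: 3}
population = {age: Fraction(w, 14) for age, w in weights.items()}

# Enumerate all 7^2 = 49 ordered samples of two independent draws.
xbar_dist = Counter()
for (a, pa), (b, pb) in product(population.items(), repeat=2):
    xbar_dist[Fraction(a + b, 2)] += pa * pb

def expected(dist):
    return sum(x * p for x, p in dist.items())

def variance(dist):
    mu = expected(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

prob_4_to_6 = sum(p for x, p in xbar_dist.items() if 4 <= x <= 6)
print("P(4 <= xbar <= 6) =", prob_4_to_6)   # Fraction reduces 108/196 to 27/49
print("E(X) =", expected(population), " E(xbar) =", expected(xbar_dist))
print("Var(X) =", variance(population), " Var(xbar) =", variance(xbar_dist))
```

Under these weights the mean of xbar equals the population mean, 40/7 ≈ 5.714, and the variance of xbar is exactly half the population variance, as expected for n = 2.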

Expected Value and Standard Deviation of the Sampling Distribution of xbar

Population probability distribution: E(X) = 2(1/14) + 3(1/14) + 4(2/14) + … + 8(3/14) = 5.714, so the population mean is E(X) = µ = 5.714.

Sampling distribution of xbar:

E(Xbar) = 2(1/196) + 2.5(2/196) + 3(5/196) + 3.5(8/196) + 4(12/196) + 4.5(18/196) + 5(24/196) + 5.5(26/196) + 6(28/196) + 6.5(24/196) + 7(21/196) + 7.5(18/196) + 8(9/196) = 5.714

Mean of the sampling distribution of xbar: E(Xbar) = 5.714 = µ

Sampling Distribution of the Sample Mean Xbar: Example

An example: a fair 6-sided die is thrown; let X represent the number of dots showing on the upper face. The probability distribution of X is P(X = x) = 1/6 for x = 1, 2, 3, 4, 5, 6.

Suppose we want to estimate µ from the mean xbar of a sample of size n = 2. What is the sampling distribution of xbar in this situation?
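For the die, the sampling distribution of xbar can be worked out exactly by listing every possible sequence of throws. The function below is a sketch written for this illustration; its name and the `faces` parameter are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def xbar_distribution(n, faces=range(1, 7)):
    """Exact sampling distribution of the mean of n fair-die throws."""
    faces = list(faces)
    prob_each = Fraction(1, len(faces) ** n)   # every sequence of throws is equally likely
    dist = Counter()
    for throws in product(faces, repeat=n):
        dist[Fraction(sum(throws), n)] += prob_each
    return dict(sorted(dist.items()))

dist2 = xbar_distribution(2)
for xbar, prob in dist2.items():
    print(f"{float(xbar):>4}: {prob}")
```

For n = 2 there are 11 possible values of xbar, from 1 to 6 in steps of 0.5, and the probabilities rise to a peak of 6/36 at xbar = 3.5 and fall off symmetrically. Calling `xbar_distribution(3)` or `xbar_distribution(4)` shows the distribution becoming more concentrated around 3.5 as n grows.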

Properties of the Sampling Distribution of xbar:

If a random sample of size n is drawn from a population with mean µ and standard deviation σ, then E(xbar) = µ and SD(xbar) = σ/sqrt(n).

Consequences




THE CENTRAL LIMIT THEOREM (The “World is Normal” Theorem)

Normal Populations: Important Fact: If the population is normally distributed, then the sampling distribution of xbar is normally distributed for any sample size n.

Non-normal Populations: What can we say about the shape of the sampling distribution of xbar when the population from which the sample is selected is not normal?


The Central Limit Theorem (for the sample mean xbar):

If a random sample of n observations is selected from a population (any population), then when n is sufficiently large, the sampling distribution of xbar will be approximately normal. (The larger the sample size, the better will be the normal approximation to the sampling distribution of xbar.)
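A small simulation can make the theorem concrete. The sketch below assumes a strongly right-skewed population (exponential with mean 1), which is a choice made for this illustration only, and tracks the skewness of the simulated sample means: a value near 0 indicates a roughly symmetric, bell-like shape.

```python
import random
from statistics import fmean, pstdev

def skewness(values):
    """Sample skewness: average cubed z-score (0 for a symmetric distribution)."""
    m, s = fmean(values), pstdev(values)
    return fmean(((v - m) / s) ** 3 for v in values)

def simulate_sample_means(n, num_samples=5000, seed=0):
    """Means of num_samples samples of size n from an exponential population with mean 1."""
    rng = random.Random(seed)
    return [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(num_samples)]

for n in (1, 5, 30):
    means = simulate_sample_means(n)
    print(f"n={n:>2}  mean={fmean(means):.3f}  SD={pstdev(means):.3f}  "
          f"skewness={skewness(means):.2f}")
```

As n increases, the mean of the sample means stays near the population mean of 1, their standard deviation shrinks roughly like 1/sqrt(n), and the skewness moves toward 0.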

The Importance of the Central Limit Theorem

When we select simple random samples of size n, the sample means xbar will vary from sample to sample. We can model the distribution of these sample means with a probability model that is normal with mean µ and standard deviation σ/sqrt(n), where µ and σ are the population mean and standard deviation.

How Large Should n Be? To apply the Central Limit Theorem, we will consider a sample size to be large when n > 30.

Summary

The Central Limit Theorem (for the sample proportion p cap)

If x “successes” occur in a random sample of n observations selected from a population (any population), then when n is sufficiently large, the sampling distribution of p cap = x/n will be approximately normal. (The larger the sample size, the better will be the normal approximation to the sampling distribution of p cap.)

When we select simple random samples of size n from a population with “success” probability p and observe x “successes”, the sample proportions p cap = x/n will vary from sample to sample. We can model the distribution of these sample proportions with a probability model that is normal with mean p and standard deviation sqrt(pq/n), where q = 1 - p.

How Large Should n Be? To apply the Central Limit Theorem, we will consider a sample size n to be large when np ≥ 10 and n(1-p) ≥ 10.

Population Parameters and Sample Statistics

  1. The value of a population parameter is a fixed number; it is NOT random, but its value is not known.
  2. The value of a sample statistic is calculated from sample data.
  3. The value of a sample statistic will vary from sample to sample (sampling distributions).

Example 1

  1. The probability distribution of 6-month incomes of account executives has mean $20,000 and standard deviation $5,000
  2. A single executive’s income is $20,000. Can it be said that this executive’s income exceeds 50% of all account executive incomes?

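ANSWER: No. P(X < $20,000) cannot be determined: no information is given about the shape of the distribution of X, so we do not know the median of 6-month incomes.

  3. n = 64 account executives are randomly selected. What is the probability that the sample mean exceeds $20,500?

ANSWER: Since n = 64 > 30, the Central Limit Theorem applies, and xbar is approximately normal with E(xbar) = $20,000 and SD(xbar) = 5,000/sqrt(64) = $625. Then P(xbar > $20,500) = P(z > (20,500 - 20,000)/625) = P(z > 0.8) ≈ 0.2119.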
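Part 3 can be checked numerically. This sketch assumes the Central Limit Theorem applies (n = 64 > 30), so that xbar is treated as normal with mean µ and standard deviation σ/sqrt(n); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 20_000, 5_000, 64
sd_xbar = sigma / sqrt(n)                        # 5000 / 8 = 625
prob = 1 - NormalDist(mu, sd_xbar).cdf(20_500)   # upper-tail probability P(xbar > 20,500)
print(sd_xbar, round(prob, 4))
```

Note the contrast with part 2: nothing can be said about a single income without knowing the shape of the population, but the mean of 64 incomes can be modeled regardless of that shape.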

Example 2

  1. A sample of size n=16 is drawn from a normally distributed population with E(X)=20 and SD(X)=8.

2. Do we need the Central Limit Theorem to solve part a or part b?

NO. We are given that the population is normal, so the sampling distribution of the mean will also be normal for any sample size n. The CLT is not needed.


Example 3

Battery life X~N(20, 10). Guarantee: avg. battery life in a case of 24 exceeds 16 hrs. Find the probability that a randomly selected case meets the guarantee.
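A sketch of the calculation follows. The notation N(20, 10) does not say whether 10 is the standard deviation or the variance of battery life, so the code evaluates both readings; which one the author intended is an assumption either way. Because the population is stated to be normal, xbar is normal for n = 24 without appealing to the Central Limit Theorem. The helper name `prob_mean_exceeds` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_exceeds(threshold, mu, sigma, n):
    """P(xbar > threshold) when xbar is normal with mean mu and SD sigma/sqrt(n)."""
    return 1 - NormalDist(mu, sigma / sqrt(n)).cdf(threshold)

mu, n, guarantee = 20, 24, 16

# Two readings of N(20, 10): the 10 as a standard deviation, or as a variance.
for label, sigma in (("SD = 10", 10.0), ("variance = 10", sqrt(10))):
    prob = prob_mean_exceeds(guarantee, mu, sigma, n)
    print(f"{label}: P(xbar > 16) = {prob:.4f}")
```

Reading 10 as the standard deviation gives a probability of about 0.975 that a case meets the guarantee; reading it as the variance gives a probability very close to 1.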

Example 4

Example 5

12% of students at NCSU are left-handed. What is the probability that in a sample of 100 students, the sample proportion that are left-handed is less than 11%?
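A worked version of this calculation, using the normal model for a sample proportion with p = 0.12, q = 0.88 and n = 100, might run as follows. Here \hat{p} stands for what the text calls p cap, and the rounded values are approximate.

```latex
\begin{align*}
np &= 100(0.12) = 12 \ge 10, \qquad nq = 100(0.88) = 88 \ge 10 \\
SD(\hat{p}) &= \sqrt{\frac{pq}{n}} = \sqrt{\frac{(0.12)(0.88)}{100}} \approx 0.0325 \\
P(\hat{p} < 0.11) &\approx P\!\left(z < \frac{0.11 - 0.12}{0.0325}\right) \approx P(z < -0.31) \approx 0.378
\end{align*}
```

Both np and nq are at least 10, so the normal model is reasonable, and a sample proportion below 11% is not unusual when the population proportion is 12%.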


Thank You !!!

