All these problems can be modelled as questions about the mean of certain
random variables on large populations. THEORETICALLY, the mean of such a
random variable can be determined by recording the values of the random
variable for each member of the population, adding up the results, and
dividing the sum by the number of observations. Presidential elections are
an attempt to apply exactly this procedure to settle the first question.
However, in most cases it is too impractical and expensive to answer questions
of this kind by determining all values of the random variables involved.
SAMPLING is a less expensive way of solving problems such as those above. In sampling
a random variable X, only the values of X for members of the sample need to
be recorded. Then the mean for the sample taken is calculated; and it is hoped
that this mean will be a good approximation to the mean of the whole
population. In developing sampling techniques, statisticians have to answer
the following question:
"How good an approximation to the population mean is the sample mean likely to be?"
It seems intuitively clear that larger samples should yield better
approximations than smaller ones. But, for a given sample size, how good is
the approximation? And, conversely, if we need a reliable approximation of the
population mean to within, say, 3% of the actual value, how large a sample should
be taken?
In order to answer these questions, we need to introduce some precise
mathematical terminology. There are basically two ways of conducting sampling:
sampling with replacement, and sampling without replacement. In SAMPLING WITH
REPLACEMENT, the same member of the population can be included in the sample
more than once; in SAMPLING WITHOUT REPLACEMENT, each member of the population
can be included in the sample at most once. To understand the origin of
this terminology, consider the following situation: Suppose you have an urn
with a large number of balls in it. Each ball is either black or
white. You want to sample ten balls
to get an idea of the percentage of black
balls in the urn. In sampling with replacement, you repeat the following
procedure 10 times: You pick a ball at random from the urn, record its color,
and put it back into the urn ("replace it"). In sampling without replacement,
you repeat the following procedure 10 times:
You pick a ball at random from the urn, record its color,
and do NOT put it back into the urn (you put it onto a pile on the side, for
example). If the sample size is very small relative to the population size,
it really does not matter whether you use sampling with or without replacement,
but if the sample size is of the same order of magnitude as the population size,
it does matter which of the two methods is used.
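The two procedures can be sketched in a few lines of Python. This is only an illustration: the urn contents (1000 balls, 300 of them black), the seed, and the helper name percent_black are assumptions made for the example, not part of the tutorial.

```python
import random

# Hypothetical urn: 1000 balls, 300 of them black (numbers chosen for illustration).
urn = ["black"] * 300 + ["white"] * 700
rng = random.Random(0)  # fixed seed so the run is repeatable

# With replacement: every draw is made from the full urn, so the same ball
# (the same index) may be picked more than once.
with_idx = [rng.randrange(len(urn)) for _ in range(10)]

# Without replacement: ten distinct balls are taken out of the urn.
without_idx = rng.sample(range(len(urn)), 10)

def percent_black(indices):
    """Percentage of black balls among the balls with the given indices."""
    return 100 * sum(urn[i] == "black" for i in indices) / len(indices)

print(percent_black(with_idx), percent_black(without_idx))
```

With 10 draws from 1000 balls, a repeated ball is rare, which is why the two methods behave almost identically here.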
Suppose the population has size 1000. How many different samples of size 5 can be taken if the sampling is done without replacement?
In sampling without replacement, the samples are combinations of elements
of the population.
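Assuming that the order in which the members are drawn does not matter, the number of such samples is the number of 5-element combinations of a 1000-element set. A quick way to evaluate it, using Python's standard math.comb:

```python
import math

N = 1000  # population size from the question above
n = 5     # sample size

# Unordered samples without replacement are combinations: N! / (n! (N - n)!)
num_samples = math.comb(N, n)
print(num_samples)  # 8250291250200
```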
Sampling has a variety of uses, but we are only concerned with sampling as
a tool to estimate the mean of a random variable X. The population on which
X is defined is often called the PARENT POPULATION. The mean of X on the parent
population is called the POPULATION MEAN and is denoted by m; the standard
deviation of X on the parent population is called the POPULATION STANDARD
DEVIATION, and is denoted by s.
Now let us suppose that samples of size
n are taken. Each such sample has a SAMPLE MEAN that usually differs
from sample to sample and from the population mean m. Thus the sample mean
is a new random variable. It is denoted by X with a bar on top (for
typographical reasons, in this tutorial we will use X' instead).
Note that the probability
space on which X' is defined is not the parent population, but the set
of all possible samples of size n. The mean of this new random variable
will be denoted by m_X', and its standard deviation is denoted by
s_X'. The standard deviation of X' is often called
the STANDARD ERROR, since it tells us how far off the sample mean typically
is from the population mean that it is supposed to approximate.
If the sample size is n and the population size is N, then
we have the following formulas for the mean and the variance of the sample
mean (the variance is the square of the standard error):
m_X' = m, no matter whether we sample with or without replacement;
s_X'^2 = s^2/n, if the sampling is done with replacement;
s_X'^2 = (s^2/n) * (N - n)/(N - 1), if the sampling is done without replacement.
Note that these formulas nicely correspond to our intuitions about sampling:
The first tells us that if we average the sample means for all possible
samples, then we get exactly the population mean. The denominator n in
the formula for the variance of the sample mean
implies that the larger the sample size,
the smaller this variance will be, that is, the closer a typical sample
mean will be to the population mean. If the sample size
n is very small relative to the population size N, then
the difference between sampling with
or without replacement will be negligible.
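These formulas can be checked by brute force on a tiny population, by listing every possible sample. The sketch below uses a made-up population of six values. For sampling without replacement it compares against the standard finite population correction, (s^2/n) * (N - n)/(N - 1), which is stated here as an assumption of the example.

```python
from itertools import combinations, product
from statistics import fmean, pvariance

population = [1, 2, 4, 7, 10, 12]  # hypothetical values of X
N, n = len(population), 3

m = fmean(population)        # population mean
s2 = pvariance(population)   # population variance (divides by N)

# Sample means over all ordered samples with replacement (all equally likely)
means_with = [fmean(sample) for sample in product(population, repeat=n)]
# Sample means over all samples without replacement (all equally likely)
means_without = [fmean(sample) for sample in combinations(population, n)]

# Mean of the sample mean equals m in both cases
print(fmean(means_with), fmean(means_without), m)
# Variance of the sample mean, with replacement: s^2 / n
print(pvariance(means_with), s2 / n)
# Variance of the sample mean, without replacement: (s^2 / n) (N - n) / (N - 1)
print(pvariance(means_without), (s2 / n) * (N - n) / (N - 1))
```

Each printed line should show matching values, up to floating-point rounding.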
The amount of liquid in a large Coke bottle (in liters) is a random variable with mean 2 and standard deviation 0.05. Out of a consignment of 1000 Coke bottles, 16 are sampled for the amount of liquid they contain. What is the standard deviation of the sample mean?
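A minimal sketch of the arithmetic, under two assumptions: that 0.05 is the standard deviation s of the parent population, and that the usual finite population correction applies when the 16 bottles are drawn without replacement from the 1000. Because 16 is small relative to 1000, the two values differ very little.

```python
import math

s = 0.05   # population standard deviation (liters)
n = 16     # sample size
N = 1000   # consignment size

# Standard error when sampling with replacement: s / sqrt(n)
se_with = s / math.sqrt(n)
# Standard error without replacement (finite population correction applied)
se_without = se_with * math.sqrt((N - n) / (N - 1))

print(round(se_with, 4), round(se_without, 4))  # 0.0125 0.0124
```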
In the above example, the mean was known beforehand. In real applications, the mean is not known, and sampling is used to get an approximate idea of how large this unknown mean is. For example, sampling might be used for quality control of consignments of Coke bottles: If the sample mean is too low, the consignment should be rejected. But how low is "too low"? After all, a low sample mean could be purely a result of an unlucky choice of the sample, even if the mean for the consignment is exactly 2 liters, as it should be. So we want to be able to answer questions like: "If a sample of size 16 is drawn from a population with unknown mean, what is the probability that the sample mean is more than 0.1 units smaller than the population mean?"
In order to answer such questions, we need to know the probability
distribution of the sample mean. The Central Limit Theorem tells us that
if samples of size n are drawn from practically any population, and if n is
large, then the sample mean is an approximately normally distributed random
variable with mean m and standard deviation s divided by the square root of
the sample size n, where m is the population
mean and s is the population standard deviation.
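A small simulation can make this statement plausible. The sketch below is an illustration, not a proof. As a deliberately non-normal parent population it assumes an exponential distribution with mean 1 and standard deviation 1, and the sample size, number of trials, and seed are arbitrary choices.

```python
import random
from statistics import NormalDist, fmean, pstdev

rng = random.Random(1)  # fixed seed so the run is repeatable
n = 50                  # sample size
trials = 10000          # number of samples drawn

# Hypothetical parent population: exponential with mean 1 and standard
# deviation 1, which is strongly skewed and far from normal.
m, s = 1.0, 1.0
se = s / n ** 0.5       # standard deviation predicted for the sample mean

sample_means = [fmean(rng.expovariate(1.0) for _ in range(n))
                for _ in range(trials)]

print(fmean(sample_means), m)     # should be close to each other
print(pstdev(sample_means), se)   # should be close to each other

# Fraction of sample means within one standard error of m, compared with
# the corresponding probability P(-1 < Z < 1) for a standard normal Z.
inside = sum(abs(x - m) < se for x in sample_means) / trials
print(inside, NormalDist().cdf(1) - NormalDist().cdf(-1))
```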
The amount of liquid in a large Coke bottle (in liters) is a random variable with mean 2 and standard deviation 0.05. Out of a consignment of 1000 Coke bottles, 16 are sampled for the amount of liquid they contain. What is the probability that the sample mean is less than 1.99?
Here m = 2 and s = 0.05. Let X' denote the sample mean. We want to find P(X' < 1.99). Since X' has an approximately normal distribution with a mean of 2 and a standard deviation of 0.05/4 = 0.0125, this probability can be expressed in terms of the standard normal random variable Z as P(Z < (1.99 - 2)/0.0125) = P(Z < -0.8).
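Instead of a table of the standard normal distribution, the probability can be evaluated with Python's statistics.NormalDist. The sketch assumes the normal approximation described above and a standard error of s divided by the square root of n.

```python
from statistics import NormalDist

m, s, n = 2.0, 0.05, 16
se = s / n ** 0.5          # standard deviation of the sample mean: 0.0125
z = (1.99 - m) / se        # standardized value, about -0.8
p = NormalDist().cdf(z)    # P(Z < z) for the standard normal Z

print(round(z, 2), round(p, 4))  # -0.8 0.2119
```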
Suppose the mean number of miles driven by the owner of a Nissan in one year is 14,000, with a standard deviation of 1,000 miles. If 400 Nissan owners are polled about their mileage in 1999, what is the probability that the sample mean will be between 13,900 and 14,100 miles?
Here m = 14,000 and s = 1,000.
Let X' denote the sample mean.
We want to find P(13,900 < X' < 14,100).
Since X' has an approximately normal distribution with a mean of
14,000 and a
standard deviation of 1,000/20 = 50, the latter can be expressed in
terms of the standard normal random variable Z as P(-2 < Z < 2).
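As a check on the arithmetic, the same probability can be computed numerically. The sketch again assumes the normal approximation and models the sample mean directly as a normal random variable with mean 14,000 and standard deviation 50.

```python
from statistics import NormalDist

m, s, n = 14_000, 1_000, 400
se = s / n ** 0.5                          # 1,000 / 20 = 50
sample_mean = NormalDist(mu=m, sigma=se)   # approximate distribution of X'

# P(13,900 < X' < 14,100), which equals P(-2 < Z < 2)
p = sample_mean.cdf(14_100) - sample_mean.cdf(13_900)
print(round(p, 4))  # 0.9545
```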
Congratulations, you have successfully completed the last tutorial!
Good luck on the final!
© Winfried Just
Last modified November 14, 1999.
This tutorial was compiled by WJ TUTORIALMAKER 0.21, a program written by Winfried Just.