
Psy 207B Introduction
to Statistics
Erwin Segal
Inferential Statistics and
Sampling Distributions
The purpose of inferential statistics
is to ask questions and make decisions about the world
that extend beyond the data at hand.
-
Does smoking cause lung cancer?
-
Do children taught to read using a phonics method learn faster
than those using a whole word method?
-
Does smoking of marijuana improve the quality of life for
terminal AIDS patients?
-
Does the teaching of sexual abstinence decrease the number
of unwanted pregnancies?
-
Will a particular advertisement increase the sales of Dell
Computers?
Such questions are often answered
by considering possible outcomes of experiments under different possible
conditions of the world. For example, if smoking is a cause of lung cancer,
then a greater proportion of people who smoke than of those who do not should
develop lung cancer. If smoking is not a cause of lung cancer, then the
proportion of smokers who have lung cancer should not differ from the proportion
of non-smokers who have the disease. By collecting data and using principles
of probability, researchers can decide which of these possibilities they
believe to be true. The data collected are considered to be a sample
taken from a much larger population. The population has the "true" properties
that exist in the world. The sample may or may not represent the population
accurately.
In the example
introduced in Chapter 10 of the text, if marijuana aids the appetite of
AIDS patients, then patients should eat more after taking a THC pill than
after taking a placebo. Statistical analyses of data will let us answer
such questions as "Does marijuana aid the appetite of AIDS patients?" rationally.
Three
Important Concepts
Three important
concepts lie beneath most inferential statistics: population, sample, and
sampling distribution.
1. Populations
are the sets of individuals that the statistician
is interested in learning about. They (in some sense) exist in the real
world independent of any direct actions by the researcher, although the
researcher is interested in possible measures on them. Some examples are the heights
of adults in the world; the appetites of people with late-stage AIDS;
the lung capacity of NFL linemen; the probability of heads in a flip of a coin;
the voting preference of eligible voters in New York; and the average number of cigarettes
smoked daily by someone one month before the detection of lung cancer.
Each of these variables in its respective population could be represented
in a frequency distribution, or frequency density function. The population
distribution can be thought of as having a mean, represented by μ
(the Greek letter mu), and a standard deviation, represented by σ
(the Greek letter sigma). μ and σ
are often called population parameters.
There is a
major problem that often interferes with the use of population parameters
to help us make statistical decisions. We very seldom know exactly what
the population parameters actually are. One task of inferential statistics
is estimating what some of these parameters are. Another important task
is to guess what some population parameters might be and then see whether
these parameters are consistent with the data collected.
2. Samples
are the source of data actually collected by the statistician. A sample
is a set of individuals taken from the population and the variable of interest
is actually measured on each of them. For statistics one almost always
needs the sample to be a random sample
from the population. A random sample is defined as one in which each individual
in the population has an equal chance of being selected. A sample has a
certain size, represented by N, and other statistics are computed from
it, often the mean, X̄,
and standard deviation, s.
3. Sampling
Distributions: 'Sampling distribution' is one of the
hardest concepts to understand in statistics. It is a probability distribution
of a statistic where that statistic is computed from each of an infinite
number of random samples generated in the same way as the sample we have
taken from the population. For example, if a random sample with N=10 is
taken from a population with μ = 100 and σ = 10,
a sampling distribution of means would be the probability distribution
of means computed from an infinite number of random samples with N=10 taken
from that population. If the properties of the population were known, then
the properties of the sampling distribution of a statistic (X̄'s,
s's, proportions of hits, etc.) can be computed.
One critical
issue is that we do not know whether
the population we are interested in has the parameters that we are using
to generate the sampling distribution. The sampling distribution only tells us that IF these are the
parameters of the population, THEN it is the probability
distribution of the statistic in mind.
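The N=10 example above can be approximated by simulation. The sketch below is only an illustration: it assumes the population is normally distributed (the example states only the mean and standard deviation), and it stands in 20,000 random samples for the "infinite number" in the definition. The function name and the seed are made up for this example.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, num_samples, seed=0):
    """Approximate a sampling distribution of means by drawing many random samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(num_samples):
        # One "experiment": draw n scores from the assumed normal population.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

means = simulate_sample_means(mu=100, sigma=10, n=10, num_samples=20000)
print("mean of the sample means:", round(statistics.fmean(means), 2))
print("standard deviation of the sample means:", round(statistics.stdev(means), 2))
```

The mean of the simulated means should land close to 100, and their standard deviation should be clearly smaller than the population's 10.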
Hypothesis
testing, parameter estimation, and power of tests all flow from the concept
of sampling distributions. In Hypothesis Testing, the probability
(or likelihood) of getting outcomes similar to those we actually obtained
if the hypotheses we make about the population are true informs the
decisions we make. If our sample has some property, such as a mean, that
is not likely to occur in the sampling distribution it is tested against,
we conclude that the population from which the sampling distribution is
generated is NOT the population from which we have a random sample. We
thus reject the hypothesis that is being tested, and decide that some alternative
hypothesis is correct.
In Parameter
Estimation, we ask which are the least likely sampling distributions under which our
actual sample statistic would still have a high enough probability of occurring
randomly for us to say it may have come from that population. These least
likely sampling distributions set the outside bounds of our parameter estimate.
We know that our population parameter most likely, to a specified probability,
falls within these bounds. We claim that the true population parameter
lies within these bounds.
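One way to picture these bounds is the sketch below. It is an illustration under stated assumptions, not a general recipe: the population standard deviation is taken as known (σ = 10), the sampling distribution of the mean is taken as normal, and the sample mean of 104 is an invented value. The function name is also made up.

```python
from statistics import NormalDist

def mean_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Bounds on the population mean when sigma is assumed known."""
    standard_error = sigma / n ** 0.5
    # z cutoff that leaves (1 - confidence) / 2 in each tail (about 1.96 for 95%)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return sample_mean - z * standard_error, sample_mean + z * standard_error

# Hypothetical data: N = 10 scores with a sample mean of 104, sigma assumed to be 10.
low, high = mean_confidence_interval(sample_mean=104, sigma=10, n=10)
print("lower bound:", round(low, 2))
print("upper bound:", round(high, 2))

# If the population mean really were at the lower bound, a sample mean of 104
# or more would occur only 2.5% of the time: the "least likely" case we accept.
tail = 1 - NormalDist(low, 10 / 10 ** 0.5).cdf(104)
print("tail probability at the lower bound:", round(tail, 3))
```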
In evaluating
the Power of a test, we ask how likely we are to get a random sample that
is far enough in the tails of our tested distribution (the null hypothesis)
to decide that it is false when in reality it is false, i.e., when the
sampling distribution we are REALLY sampling from is NOT the sampling distribution
we are testing. We know that the more powerful the test, the more likely
we are to reject FALSE (null) hypotheses.
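As a rough sketch of this idea, the code below computes the power of a two-tailed test on a mean. It assumes a normal sampling distribution with known σ, and the "true" mean of 108 is an invented value used only to show the calculation. The function name is hypothetical.

```python
from statistics import NormalDist

def power_two_tailed(mu_null, mu_true, sigma, n, alpha=0.05):
    """Probability of rejecting H0: mu = mu_null when the real mean is mu_true."""
    standard_error = sigma / n ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Sample means outside these cutoffs fall in the tails of the tested distribution.
    lower = mu_null - z_crit * standard_error
    upper = mu_null + z_crit * standard_error
    # Sampling distribution we are REALLY sampling from.
    true_dist = NormalDist(mu_true, standard_error)
    return true_dist.cdf(lower) + (1 - true_dist.cdf(upper))

# Tested mean 100, assumed true mean 108, sigma 10, for several sample sizes.
for n in (10, 20, 40):
    print("N =", n, "power =", round(power_two_tailed(100, 108, 10, n), 3))
```

When the true mean equals the tested mean, the same function returns alpha, the chance of rejecting a null hypothesis that is actually true.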
These are
the things that we do in inferential statistics.
Populations and samples
-
A population is a set of individuals defined according to
some principle (e.g. coin flips, smokers, people in America, DNA molecules
in a fruit fly).
-
Each individual in a population has the potential to be measured
in one or more ways (e.g. heads or tails, number of cigarettes smoked in
a day, how many calories are consumed in a given day, whether she is pregnant,
how many white blood cells/ml).
-
One can conceive of a distribution of these measures in the
population. The distribution must have a mean, represented by μ,
and a standard deviation, represented by σ. In statistics,
we estimate some of the properties of the whole population by using the
properties of a sample of the measures and a knowledge of probability.
-
A sample is a subset of N scores selected from that population.
This sample will also have a mean, X̄,
and a standard deviation, s.
-
Many samples of size N have the possibility of being selected
from the population. For each of them statistics can be computed.
Sampling Distributions
-
An experiment is a selection of a sample of n individuals
from a population and measuring each individual in the sample in some way.
This sample of measures is a single sample point in a sampling distribution
of all such samples.
-
A sampling distribution is the
distribution of a statistic computed on each possible sample of size N
selected from a population. E.g. the sampling distribution of means is
the distribution of the means of these samples. Assume that I take a sample
of five people from this class and compute the mean exam grade, and then
take another sample of five people, and so on until the means of all such
samples are computed. If I then make a frequency distribution of the means
of these samples, that frequency distribution would be the sampling distribution
of the means.
-
It may actually be impossible to select each possible sample,
measure each individual in it and compute the appropriate statistic. Often
this requires selecting an infinite (or astronomically large) number of
samples. Our classroom example would require over 250 million samples and
sample means.
-
Inferential statistics depends on mathematical principles
for evaluating statistics computed from a small number of samples (often
only 1) against the probabilities of obtaining similar samples from theoretical
sampling distributions.
-
A random sample of size
N is one of the samples of size N that can come from the population. If
it is truly a random sample, every one of the possible samples had the same
likelihood of being selected.
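Selecting every possible sample becomes practical once the population is tiny. The sketch below uses six made-up exam grades and samples of three, drawn without replacement, rather than the real class, so that all 20 possible samples can be listed. The grades are invented for illustration only.

```python
from itertools import combinations
from statistics import fmean

# Hypothetical exam grades for a tiny "class" of six students (made-up data).
grades = [62, 70, 75, 81, 88, 94]
sample_size = 3

# Every possible sample of three students, each student used at most once per sample.
sample_means = [fmean(sample) for sample in combinations(grades, sample_size)]

print("number of possible samples:", len(sample_means))
print("population mean:", round(fmean(grades), 3))
print("mean of all sample means:", round(fmean(sample_means), 3))
```

The last two printed values agree: averaging the means of all possible samples gives back the population mean.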
Let us look at the mathematical relationship between the
properties of sampling distributions
and properties of the populations they
derive from.
-
If a population has a mean μ and
a standard deviation σ, then the sampling distribution
of means has the same mean, μ, now called the
expected
value of the mean, E(X̄) = μ,
and a standard deviation,
σ_X̄ = σ/√N,
called the standard error of the mean.
-
E(X̄) = μ states that the
mean of the means of the samples is equal to the mean of the population.
This is also the reason that the mean is called an unbiased estimator.
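Both properties (the sampling distribution of means has the population mean as its mean, and the population standard deviation divided by √N as its standard deviation) follow from a short derivation. It is sketched here under the assumption that the N scores X_1, ..., X_N are drawn independently from the same population; independence is needed for the variance step.

```latex
\begin{aligned}
E(\bar{X}) &= E\!\left(\frac{1}{N}\sum_{i=1}^{N} X_i\right)
            = \frac{1}{N}\sum_{i=1}^{N} E(X_i)
            = \frac{1}{N}\,N\mu = \mu \\[4pt]
\operatorname{Var}(\bar{X}) &= \frac{1}{N^{2}}\sum_{i=1}^{N}\operatorname{Var}(X_i)
            = \frac{N\sigma^{2}}{N^{2}} = \frac{\sigma^{2}}{N}
\quad\Longrightarrow\quad
\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}}
\end{aligned}
```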
Sampling
distributions of the Mean
Consider a population, F, the
distribution of IQ’s. It's a normal distribution with a mean, μ
= 100, and standard deviation, σ = 15.
Let's randomly select an individual from F
and compute the mean of this sample of one. This is the same experiment as randomly selecting an
individual and reporting her measure. What would you expect this "mean"
to be? It is expected to be 100.
That
is because on the average if this "experiment" is run many, many times
and the distribution of means (scores) is plotted, it would have μ=100.
(This is our assumption in designing the problem.) The distribution would
also have a standard deviation, or if we think of each measure as a sample
of one, a standard error of the mean,
σ_X̄ = 15/√1 = 15.
This means that if you guessed the score as 100, you would miss your guess
on the average, roughly speaking, by about 15 IQ points. Thus, a sampling
distribution of the mean with n = 1 gives a distribution which looks like
the distribution of the population, but can now be thought of as a probability
distribution.
What if for an experiment we randomly sample 2 people and
compute X̄? You would again
predict X̄ to be 100, because E(X̄) = μ = 100.
With n=2 on the average your prediction would be closer
than with n=1. Since the standard error of the sampling distribution of
means is σ_X̄ = 15/√2 = 10.61,
you would miss your estimate on the average by only 10.61. This sampling
distribution has the same mean as the population, but a smaller standard
error (standard deviation).
If we randomly sample 10 people in our experiment, we
would again expect X̄ to be
100. But on the average we would be off by only 15/√10 = 4.74.
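The standard errors in these IQ examples can be checked with a few lines of code. This is a minimal sketch; the function name is made up, and n = 100 is an extra value added only to show the trend.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: sigma divided by the square root of n."""
    return sigma / math.sqrt(n)

# IQ population with sigma = 15, for samples of size 1, 2, 10, and 100.
for n in (1, 2, 10, 100):
    print(f"n = {n:>3}: standard error = {standard_error(15, n):.2f}")
```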
-
As the size of the sample goes up, the standard error of the
mean goes down. As n approaches infinity,
σ/√n approaches
0 and X̄ approaches μ.
-
This is one part of the Central
Limit Theorem. The other part is that the sampling distribution
of the mean approaches the normal distribution as n gets larger.
A sampling distribution
is a probability distribution. If it were normally distributed, about 68%
of the means of random samples would be within one standard error of the
mean of the population; 95% of the sample means would be within about 2
standard errors of the population mean. Only about 5 out of every 100 samples
would have a mean more than 2 (actually 1.96) standard errors away from the
population mean.
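These percentages can be read off the standard normal distribution directly instead of from a printed table. The sketch below uses Python's standard library; the helper name is made up.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def prob_within(k):
    """Probability that a sample mean falls within k standard errors of the population mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 1.96):
    print(f"within {k} standard errors: {prob_within(k):.4f}")
```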
Hypothesis
testing 1
One can ask whether a random sample
is a sample from a hypothesized sampling distribution by considering probabilities
of events. If it is an unlikely point in a distribution, we can assume that it did not
come from that sampling distribution.
-
The assumption that underlies all of statistics is that more
probable events are more likely to occur than less probable events.
-
Some examples: Is a coin unbiased? That is, do the flips, or
Bernoulli trials, come from a population where p(heads) = .5?
-
Do open book exams improve exam scores?
-
Do more people prefer Coca Cola or Pepsi Cola?
-
Does marijuana improve appetite of cancer patients?
-
In standard hypothesis testing we identify a meaningful theoretical
distribution which is called the Null Hypothesis
(H0). We propose for the sake
of argument that this hypothesis accurately describes the population from
which our sample comes. A sampling distribution of interest is derived
from that hypothesis. We then ask whether the data we have from our
(hopefully) random sample is likely to have come from that distribution.
If the sample, or one like it, has a reasonably high probability of coming from
the population defined by H0, we conclude H0 may be true. If
the probability of getting a sample like the one we got is too low, we
conclude that H0 is in error, and it does not truly describe
reality.
-
For coin flips H0 states that the coin is unbiased.
But H0 may be in error.
It may be that another hypothesis, H1, the hypothesis that the
coin is biased, is true. This particular coin may be unbalanced, and in
the long run could come up heads at some proportion other than .5. Heads may
appear 10, 60, 85 or even 90 percent of the time if the coin is flipped many
times.
How can we test whether H0 is in error?
-
We identify the sampling distribution that we want
to test by specifying its mean, μ, and its
standard deviation, σ. H0 is the
hypothesis that the sample we have comes from that distribution. We can
then test if the actual results are reasonably likely under the condition
that H0 is true.
-
One standard technique is to accept H0 as reasonable
if a discrepancy as large as or larger than the one observed between X̄,
the mean of the sample, and μ, the mean of the
sampling distribution per hypothesis, is likely to occur more
than once in 20 experiments (i.e., has more than a 5% chance).
-
The technical terminology states that we will set α
(the Greek letter alpha) at the .05 level.
-
α is defined as the probability
of rejecting a true null hypothesis.
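The meaning of this probability can be seen by simulation: run many experiments in which the null hypothesis really is true, and count how often it gets rejected. The sketch below assumes a fair coin flipped 100 times per experiment, a two-tailed cutoff of 1.96, and the normal approximation to the number of heads with mean np and standard deviation √(np(1 − p)). The function name, seed, and number of experiments are made up for illustration.

```python
import random

def type_one_error_rate(num_experiments=20000, flips=100, seed=1):
    """Fraction of experiments with a truly fair coin in which H0 is rejected."""
    rng = random.Random(seed)
    mu = flips * 0.5                    # expected number of heads under H0
    sigma = (flips * 0.5 * 0.5) ** 0.5  # standard deviation under H0
    rejections = 0
    for _ in range(num_experiments):
        heads = sum(rng.random() < 0.5 for _ in range(flips))
        z = (heads - mu) / sigma
        if abs(z) > 1.96:               # two-tailed test at the .05 level
            rejections += 1
    return rejections / num_experiments

rate = type_one_error_rate()
print("proportion of true null hypotheses rejected:", rate)
```

The proportion comes out near .05 rather than exactly .05, partly because of sampling variation in the simulation and partly because the number of heads can only take whole-number values.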
Example of Hypothesis testing
We want to test whether a coin is truly fair. The null
hypothesis H0 is that a sample of coin flips with this coin
would come from a sampling distribution with p(Heads) = .5. Let us test
with α = .05.
-
We need to generate a sampling distribution for n Bernoulli
trials of coin flips of a fair coin. Let's consider n=100.
-
The sampling distribution of the number of Heads is the binomial
distribution. We know for the distribution of number of heads, or for any
distribution of Bernoulli trials, μ = np and
σ = √(np(1 − p)).
In this case μ = 100*(.5) = 50 and
σ = √(100*(.5)*(.5)) = 5.
-
We need to do the experiment. That is, we need to flip the
coin 100 times. Let's assume that we did this, and got 58 Heads.
-
If H0 is true, the expected number of Heads, E(X), is 50.
Is it reasonable that we got 58? Let's find out. We know that the binomial
is very close to a normal distribution, so we can compute a z score and
evaluate this result by using the normal tables.
-
We can find a z score and see the likelihood of getting a
score this large by chance.
-
What is the z-score of 58 out of 100 heads?
z = (58 − 50)/5 = 1.60
Looking this up in the z tables, we see that a z score this
large or larger has a probability of .0548. Since we had no reason to think
that heads was more likely than tails, we use a two tailed test. That is,
we will consider how far from a 50-50 split our data are regardless of direction.
There is a 5.48% chance of having a z score as large as 1.60 and there is
an equal chance of having a z score as small as -1.60 if H0 is
true. Thus we could have a score this deviant over 10% of the time by chance
alone. This is not considered far enough from the expected value to reject
H0.
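The whole coin example can be reproduced in a few lines. The sketch below follows the steps of the example (binomial mean and standard deviation, z score, normal-curve probabilities). The exact binomial calculation at the end is an addition for comparison and is not part of the table-based method used above.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, heads = 100, 0.5, 58

mu = n * p                     # expected number of heads under H0
sigma = sqrt(n * p * (1 - p))  # standard deviation under H0
z = (heads - mu) / sigma

# Normal-curve probabilities, as read from a z table.
one_tail = 1 - NormalDist().cdf(z)
two_tail = 2 * one_tail

# Extra check (an addition): exact binomial probability of 58 or more heads,
# doubled for the two-tailed test. Doubling is valid here because p = .5
# makes the distribution symmetric.
exact_upper = sum(comb(n, k) for k in range(heads, n + 1)) / 2 ** n
exact_two_tail = 2 * exact_upper

print(f"z = {z:.2f}")
print(f"one-tailed probability = {one_tail:.4f}")
print(f"two-tailed probability = {two_tail:.4f}")
print(f"exact two-tailed probability = {exact_two_tail:.4f}")
print("reject H0 at the .05 level?", two_tail < 0.05)
```

Either way the two-tailed probability is well above .05, so the decision not to reject H0 does not depend on the normal approximation.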