One sample t-tests
Independent sample t-tests
Correlated sample or direct difference t-tests
t-tests for significance of r
In order to do statistical research we need to consider sampling distributions of several different kinds. These sampling distributions are probability distributions of a statistic computed from samples of a certain size taken from a hypothesized population. The null hypothesis, H0, is that the data in the experiment come from the hypothesized population. By statistical analysis using z tests, t tests, F tests, or other tests probably not covered in this class, we evaluate whether it is reasonable to assume that the experimental data we have came from the hypothesized population. Alpha (α) is our mathematical definition of reasonable. If, under the condition that H0 is true, the probability of our data is less than α, we reject H0 and conclude that it was false. If, under the condition that H0 is true, the probability is greater than α, we conclude that the sample statistic could have come from the hypothesized population and we accept H0.
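The decision rule above can be sketched in a few lines of Python for the simplest case, a two-tailed z test where the population standard deviation is assumed to be known. The function name and the numbers (a hypothesized mean of 100, σ = 15, a sample mean of 106 from N = 25) are made up for illustration and do not come from the text.

```python
import math
from statistics import NormalDist

def z_test_two_tailed(sample_mean, mu0, sigma, n, alpha=0.05):
    """Return (z, p, reject) for a two-tailed one-sample z test of H0: mu = mu0."""
    standard_error = sigma / math.sqrt(n)       # SD of the sampling distribution of means
    z = (sample_mean - mu0) / standard_error
    p = 2 * (1 - NormalDist().cdf(abs(z)))      # probability of a z at least this extreme if H0 is true
    return z, p, p < alpha                      # reject H0 only when p is less than alpha

# Hypothetical numbers, chosen only to illustrate the rule.
z, p, reject = z_test_two_tailed(sample_mean=106, mu0=100, sigma=15, n=25)
print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {reject}")
```

With these numbers the probability falls just under α = .05, so H0 would be rejected; with a stricter α of .01 it would not.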
Usually when we do research we use the empirical data that we collect to help us make all of our decisions. We almost never know the mean or the standard deviation of the population or populations of interest. Therefore we usually set up our H0's and H1's based on certain experimental designs.
The basic idea of inferential statistics is that our experimental data can be treated as coming from a random sample from some population. We have to know some of the properties of the sampling distributions that spawn our sample statistics.
1. Sampling distributions of means: Consider a population of scores which has a mean, μ, and a standard deviation, σ. If a mean, $\bar{X}$, is computed from a random sample taken from the population it can be thought of as an estimate of the population mean. Each random sample, however, will have its own mean; the means will differ from each other due to random variation. A distribution of the means of samples of the same size, N, is called a sampling distribution of means. If the original population is normally distributed, or N is large, the sampling distribution of means is (at least approximately) normal, with its mean equal to the mean of the population, μ.
The mean of a sampling distribution of a statistic is called the expected value of that statistic. Mathematically, $E(\bar{X}) = \mu$. We say the expected value of $\bar{X}$ equals μ. We can use $\bar{X}$ to estimate μ. Because $E(\bar{X}) = \mu$, $\bar{X}$ is called an unbiased estimator of the population mean. Interestingly, a sample mean is not likely to be very different from the population mean, especially if the sample size is large. We can use probability theory to estimate how far from μ a sample mean is likely to be.
The standard deviation of the sampling distribution of means depends on the standard deviation of the original population and the size of the samples. This standard deviation is usually called the standard error of the mean, $\sigma_{\bar{X}} = \sigma / \sqrt{N}$.
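A small simulation can make the sampling distribution of means concrete. The sketch below draws many random samples of size N from a normal population, keeps each sample mean, and compares the mean and standard deviation of those means with μ and with σ/√N. The population values (μ = 100, σ = 15), N = 25, and the number of samples are assumptions chosen only for illustration.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

MU, SIGMA = 100.0, 15.0   # hypothetical population mean and standard deviation
N = 25                    # size of each random sample
NUM_SAMPLES = 20_000      # number of sample means in the simulated sampling distribution

# One mean per random sample: together these form the sampling distribution of means.
sample_means = [
    statistics.fmean(random.gauss(MU, SIGMA) for _ in range(N))
    for _ in range(NUM_SAMPLES)
]

mean_of_means = statistics.fmean(sample_means)   # should be close to MU
sd_of_means = statistics.pstdev(sample_means)    # should be close to SIGMA / sqrt(N)
theoretical_se = SIGMA / math.sqrt(N)

print(f"mean of sample means = {mean_of_means:.2f} (population mean = {MU})")
print(f"SD of sample means   = {sd_of_means:.2f} (sigma / sqrt(N) = {theoretical_se:.2f})")
```

Increasing N in the sketch shrinks the spread of the sample means, which is why a sample mean from a large sample is unlikely to be far from the population mean.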
2. Sampling distributions of variances or standard deviations: The variance and the standard deviation of a population can also be estimated by computations based on sample scores. You may not remember, but the variance of a group is theoretically defined as the average of the squared deviations from the mean, and the standard deviation as the positive square root of the variance. By definition a variance is the sum of squared deviations divided by the number of scores. We can compute a sample variance, $S^2 = \sum (X - \bar{X})^2 / N$, for many samples of size N from a population and thus generate a sampling distribution of variances. However, the mean of this sampling distribution (i.e. the expected value) does not equal the variance of the population but is slightly too small. Thus the variance of a sample computed this way, i.e. dividing the sum of squares by N, is a biased estimate of the population variance. We have defined $s^2 = \sum (X - \bar{X})^2 / (N - 1)$ as the sample variance because $E(s^2) = \sigma^2$. Therefore the mean of a sampling distribution of $s^2$ = $\sigma^2$. More critically, the estimate of the standard error of a sampling distribution of means based on a sample is $s_{\bar{X}} = s / \sqrt{N}$.
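The bias described above is easy to see by simulation. The sketch below computes the sum of squared deviations for many small samples and divides it once by N and once by N − 1, then averages each version over all samples. The population (normal, σ² = 100), the sample size of 5, and the number of samples are hypothetical values picked for illustration.

```python
import random

random.seed(2)  # fixed seed so the run is repeatable

MU, SIGMA = 0.0, 10.0     # hypothetical population; population variance is 100
N = 5                     # small samples make the bias easy to see
NUM_SAMPLES = 40_000

divided_by_n = []         # sum of squares / N       (biased)
divided_by_n_minus_1 = [] # sum of squares / (N - 1) (unbiased)

for _ in range(NUM_SAMPLES):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    mean = sum(sample) / N
    sum_sq = sum((x - mean) ** 2 for x in sample)   # sum of squared deviations from the sample mean
    divided_by_n.append(sum_sq / N)
    divided_by_n_minus_1.append(sum_sq / (N - 1))

mean_biased = sum(divided_by_n) / NUM_SAMPLES
mean_unbiased = sum(divided_by_n_minus_1) / NUM_SAMPLES

print(f"population variance             = {SIGMA ** 2:.1f}")
print(f"average of SS / N over samples  = {mean_biased:.1f}")
print(f"average of SS / (N - 1)         = {mean_unbiased:.1f}")
```

Dividing by N gives an average near (N − 1)/N times the population variance, here about 80, while dividing by N − 1 gives an average near 100.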
We can think of the t distributions as sampling distributions of the t statistic computed from each random sample drawn from a particular population. The properties of the t distributions are well known, and there are tables analogous to the z tables from which the probability of getting t values in any region under the curve can be found. Specific values are in Table D on Page 527 in our text.


• Often we have a hypothesis for μ, but not for σ, e.g. IQ should be greater than 100
• We often must estimate the standard error of the sampling
distribution of means because we have no mathematical principle by which
it can be derived.
• If we don't know σ we cannot convert our distribution to a unit normal distribution.
• The t-distributions are a substitute for the unit normal
distribution.
• t scores are computed in almost the same way that z-scores are
• They are used instead of z when z seems to be appropriate,
but the variance of the distribution has to be estimated from the data
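As a sketch of how a t score is computed "in almost the same way" as a z score, the example below tests the IQ hypothesis mentioned above with a one sample t. The only change from z is that σ is replaced by its estimate from the data. The ten scores are invented for illustration; the critical value 1.833 is the usual one-tailed table value for df = 9 and α = .05.

```python
import math

def one_sample_t(scores, mu0):
    """Return (t, df) for H0: the population mean equals mu0."""
    n = len(scores)
    mean = sum(scores) / n
    sum_sq = sum((x - mean) ** 2 for x in scores)
    s = math.sqrt(sum_sq / (n - 1))        # estimate of sigma (divides by N - 1)
    standard_error = s / math.sqrt(n)      # estimated standard error of the mean
    return (mean - mu0) / standard_error, n - 1

# Hypothetical IQ scores for N = 10 people (made up for this example).
iq_scores = [104, 110, 98, 115, 107, 101, 112, 96, 109, 108]
t, df = one_sample_t(iq_scores, mu0=100)

# One-tailed critical t for df = 9 and alpha = .05, read from a standard t table.
T_CRITICAL = 1.833
print(f"t({df}) = {t:.2f}, reject H0: {t > T_CRITICAL}")
```

The test is one-tailed here because the hypothesis is directional (IQ greater than 100); a two-tailed test would use a larger critical value.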
Properties of t distributions
• All t distributions are symmetrical with μ = 0; their standard deviation is slightly greater than 1 and approaches 1 as df increases, but
• t-distributions are not normal in form
• they are steeper in the middle and have a greater percent
of the area in the tails. (leptokurtic)
• There are many t-distributions, which have somewhat
different shapes. They differ dependent upon a parameter called degrees
of freedom or df.
• This parameter is based on the number of independent
or freely selected entries in the variance used in the computation of the
standard error
• The fewer the df the more leptokurtic the distribution.
That is, a greater percent of the area is in the tails of t’s with fewer
df.
• As df approach infinity the t-distributions approach
the normal distribution in form.
• α is the proportion of the area in the tails of the distributions
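The heavier tails of the t distributions can be checked by simulation. The sketch below treats t as a sampling distribution: it draws many samples from a normal population, computes a t statistic from each, and counts how often |t| exceeds 1.96, the cutoff that leaves 5% in the tails of the unit normal distribution. The sample sizes (5 and 30) and the number of samples are assumptions chosen for illustration.

```python
import math
import random

random.seed(3)  # fixed seed so the run is repeatable

def simulated_t_values(n, num_samples, mu=0.0, sigma=1.0):
    """Draw samples of size n from a normal population and return one t statistic per sample."""
    t_values = []
    for _ in range(num_samples):
        sample = [random.gauss(mu, sigma) for _ in range(n)]
        mean = sum(sample) / n
        s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
        t_values.append((mean - mu) / (s / math.sqrt(n)))
    return t_values

def tail_proportion(values, cutoff=1.96):
    """Proportion of values farther from zero than the cutoff (both tails)."""
    return sum(abs(v) > cutoff for v in values) / len(values)

# Keyed by degrees of freedom (n - 1).
tail_props = {n - 1: tail_proportion(simulated_t_values(n, 20_000)) for n in (5, 30)}

for df, prop in tail_props.items():
    print(f"df = {df:2d}: proportion of |t| > 1.96 = {prop:.3f} (unit normal: 0.050)")
```

With few df a noticeably larger share of the t values falls beyond ±1.96, and with more df the share moves toward the normal 5%.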
Direct difference t-test: Homework

t-test for significance of r: Homework
There is one more kind of t-test to learn. This one does not compare differences between means, but is more like a one sample t-test. It evaluates whether a Pearson correlation coefficient could reasonably come from a population in which the correlation between the two sets of numbers is zero. If a correlation, r, is computed from a random sample of n pairs of scores taken from a population with ρ = 0, then r divided by the standard error based on that computed correlation follows a t distribution with n-2 degrees of freedom. If the computed t is close enough to zero that a value that large would occur with a probability greater than α, we accept H0. If r is so large that it would occur with a probability less than α, we reject H0 and conclude that there is a real relation between the two sets of numbers. We can compute r and then compute t.
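The usual form of this test is $t = r \sqrt{n - 2} / \sqrt{1 - r^2}$ with n − 2 degrees of freedom. A minimal sketch of the two steps, computing r and then t, is shown below. The eight pairs of scores are invented for illustration, and the critical value 2.447 is the two-tailed table value for df = 6 and α = .05.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sum_products / math.sqrt(ss_x * ss_y)

def t_for_r(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical paired scores (made up for this example).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2, 1, 4, 3, 6, 5, 8, 7]

r = pearson_r(xs, ys)
t = t_for_r(r, len(xs))

# Two-tailed critical t for df = 6 and alpha = .05, read from a standard t table.
T_CRITICAL = 2.447
print(f"r = {r:.3f}, t({len(xs) - 2}) = {t:.2f}, reject H0: {abs(t) > T_CRITICAL}")
```

The formula is undefined when r is exactly 1 or −1, so a real analysis would need to handle that case separately.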

