Chapter 8 Estimation and
Confidence Intervals
Homework #8 (Hilary Term Week 3): Chapter 8, Exercises 2, 8, 14, 16, 24 & 26.
$X_i \sim N(\mu, \sigma^2)$
$\Rightarrow P(\mu - 1.96\,\sigma < X_i < \mu + 1.96\,\sigma) = 0.95$
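As a quick numerical check of the 95% statement, here is a minimal sketch using Python's standard library. The values of `mu` and `sigma` are hypothetical and chosen only for illustration; any values give the same probability.

```python
from statistics import NormalDist

# Hypothetical population parameters, chosen only for illustration.
mu, sigma = 100.0, 15.0
X = NormalDist(mu, sigma)

# P(mu - 1.96*sigma < X < mu + 1.96*sigma)
prob = X.cdf(mu + 1.96 * sigma) - X.cdf(mu - 1.96 * sigma)
print(round(prob, 3))  # approximately 0.95
```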
The population parameters $\mu$ and $\sigma^2$ were assumed known and the
objective was to form some conclusions about possible values of the sample mean
$\bar{X}$.
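$X_i \sim N(\mu, \sigma^2)$ {or $X_i$ from any distribution with mean $\mu$ and variance $\sigma^2$, with $n > 30$}
$\Rightarrow \bar{X} \sim N(\mu, \sigma^2/n)$
$\Rightarrow P(\mu - 1.96\,\sigma/\sqrt{n} < \bar{X} < \mu + 1.96\,\sigma/\sqrt{n}) = 0.95$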
A more interesting question to ask is: given the values of $\bar{X}$ (or
$s^2$), what can be said about the population parameters
$\mu$ (or $\sigma^2$)?
Point Estimation: A single
value is used to provide the best estimate of the parameter of
interest.
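$\bar{X} \sim N(\mu, \sigma^2/n)$ (approximately, for any distribution of the $X_i$, provided $n$ is large!)
$\Rightarrow P(\mu - 1.96\,\sigma/\sqrt{n} < \bar{X} < \mu + 1.96\,\sigma/\sqrt{n}) = 0.95$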
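The claim that this holds even for non-Normal data can be illustrated by simulation. The sketch below is not part of the original notes: it assumes an exponential population (mean 1, standard deviation 1), a sample size of 36 and a fixed random seed, all chosen arbitrarily, and counts how often the sample mean lands inside the 95% band.

```python
import random
from math import sqrt

rng = random.Random(0)           # fixed seed so the run is repeatable
mu, sigma, n = 1.0, 1.0, 36      # exponential(1): mean 1, standard deviation 1
reps = 5000
half_width = 1.96 * sigma / sqrt(n)

inside = 0
for _ in range(reps):
    # Sample mean of n draws from a clearly non-Normal distribution
    xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
    if mu - half_width < xbar < mu + half_width:
        inside += 1

coverage = inside / reps
print(coverage)  # expected to be close to 0.95
```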
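Rearranging this inequality,
1. $(\mu - 1.96\,\sigma/\sqrt{n} < \bar{X}) \Rightarrow (\mu < \bar{X} + 1.96\,\sigma/\sqrt{n})$
2. $(\bar{X} < \mu + 1.96\,\sigma/\sqrt{n}) \Rightarrow (\bar{X} - 1.96\,\sigma/\sqrt{n} < \mu)$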
The interval $[\bar{X} - 1.96\,\sigma/\sqrt{n} < \mu < \bar{X} + 1.96\,\sigma/\sqrt{n}]$ is referred to as the 95%
confidence interval for $\mu$.
The interval $[\bar{X} - 1.64\,\sigma/\sqrt{n} < \mu < \bar{X} + 1.64\,\sigma/\sqrt{n}]$ is referred to as the 90%
confidence interval for $\mu$.
The greater the degree of
confidence required, the wider the confidence interval has to
be.
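The relationship between confidence level and interval width can be sketched as follows. The function name `z_interval` and the sample figures (mean 50, $\sigma$ = 12, n = 64) are hypothetical. The critical value is computed with `NormalDist.inv_cdf` rather than read from a table, so the 90% multiplier comes out as about 1.645 where the notes use 1.64.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when sigma is known."""
    # Two-sided critical value, e.g. about 1.96 for 95%
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical sample summary, for illustration only
ci90 = z_interval(50.0, 12.0, 64, confidence=0.90)
ci95 = z_interval(50.0, 12.0, 64, confidence=0.95)
ci99 = z_interval(50.0, 12.0, 64, confidence=0.99)
for label, (lo, hi) in (("90%", ci90), ("95%", ci95), ("99%", ci99)):
    print(label, round(lo, 2), round(hi, 2), "width", round(hi - lo, 2))
```

Each step up in confidence produces a wider interval around the same sample mean.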
$\sigma^2$ Unknown?
Replace $\sigma^2$ with $s^2$ (as long as $n > 30$)
The interval $[\bar{X} - 1.96\,s/\sqrt{n} < \mu < \bar{X} + 1.96\,s/\sqrt{n}]$ is referred to as the 95%
confidence interval for $\mu$.
The interval $[\bar{X} - 1.64\,s/\sqrt{n} < \mu < \bar{X} + 1.64\,s/\sqrt{n}]$ is referred to as the 90%
confidence interval for $\mu$.
Importance of Large Sample
Size
1. Central Limit Theorem:
Sampling distribution of the sample mean could be assumed to be Normally
distributed.
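$Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$
2. Unknown $\sigma^2$: Replace $\sigma^2$ with $s^2$
$Z = \dfrac{\bar{X} - \mu}{s/\sqrt{n}} \sim N(0,1)$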
1. When the sample is small, the Central Limit Theorem can no longer be used,
so we must know (or simply assume/hope) that the
underlying distribution is Normal.
$X_i \sim N(\mu, \sigma^2) \Rightarrow \bar{X} \sim N(\mu, \sigma^2/n) \Rightarrow Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$
2. Claim: If the population is Normally distributed, the following
statistic,
$T = \dfrac{\bar{X} - \mu}{s/\sqrt{n}} \sim t_{n-1}$
has a distribution called
the t distribution with $n-1$ degrees of freedom.
"Student" was Gosset's
pseudonym (Guinness brewery, Dublin)
The shape of the t
distribution depends on the number of degrees of freedom (= n – 1). The t
distribution is similar in appearance to the standard normal (Z) distribution in
that it is symmetric about zero. For small sample sizes, it has wider (i.e.
fatter) tails than the standard normal distribution. For n > 25 or 30, there
is little difference between the t distribution and the Z
distribution.
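The fatter tails can be seen in a small simulation. This is a sketch under stated assumptions: samples of size 5 from a standard Normal population, a fixed seed, and 10,000 repetitions, all chosen arbitrarily. If T behaved like Z, about 5% of the T values would fall beyond ±1.96; with only 4 degrees of freedom the share is noticeably larger.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(1)    # fixed seed so the run is repeatable
n, reps = 5, 10000        # small sample size: 4 degrees of freedom
beyond = 0
for _ in range(reps):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # T statistic with true mean mu = 0; stdev uses the n - 1 divisor
    t = mean(sample) / (stdev(sample) / sqrt(n))
    if abs(t) > 1.96:
        beyond += 1

tail_rate = beyond / reps
print(tail_rate)  # well above the 0.05 expected under N(0,1)
```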
The interval $[\bar{X} - t_{0.025,\,n-1}\,s/\sqrt{n} < \mu < \bar{X} + t_{0.025,\,n-1}\,s/\sqrt{n}]$ is referred to as the 95%
confidence interval for $\mu$.
$n \to \infty \Rightarrow (n - 1) \to \infty \Rightarrow t_{0.025,\,n-1} \to 1.96$
Example: Given the sample
data, $\bar{X} = 40$, $s = 10$ and
$n = 36$, calculate the 99% confidence interval estimate of the population mean
$\mu$. If the sample size were
20, how would the method of calculation and width of the interval be
altered?
n = 36: The 99% confidence
interval for $\mu$ is $[\bar{X} - t_{0.005,\,n-1}\,s/\sqrt{n} < \mu < \bar{X} + t_{0.005,\,n-1}\,s/\sqrt{n}]$
$= [40 - 2.75 \times 10/6 < \mu < 40 + 2.75 \times 10/6] = [35.42, 44.58]$
or, as n > 25 or 30: The
99% confidence interval for $\mu$ is $[\bar{X} - 2.57\,s/\sqrt{n} < \mu < \bar{X} + 2.57\,s/\sqrt{n}]$
$= [40 - 2.57 \times 10/6 < \mu < 40 + 2.57 \times 10/6] = [35.72, 44.28]$
n = 20: The 99% confidence
interval for $\mu$ is $[\bar{X} - t_{0.005,\,n-1}\,s/\sqrt{n} < \mu < \bar{X} + t_{0.005,\,n-1}\,s/\sqrt{n}]$
$= [40 - 2.861 \times 10/\sqrt{20} < \mu < 40 + 2.861 \times 10/\sqrt{20}] = [33.60, 46.40]$
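The three intervals in this example can be reproduced with a few lines of Python. The helper name `interval` is hypothetical, and the critical values (2.75, 2.57, 2.861) are copied from the notes rather than computed. Note that 2.75 is the usual table entry for 30 degrees of freedom; the exact value for 35 degrees of freedom is slightly smaller.

```python
from math import sqrt

def interval(xbar, s, n, crit):
    """Interval xbar +/- crit * s / sqrt(n), rounded to 2 decimals."""
    half_width = crit * s / sqrt(n)
    return round(xbar - half_width, 2), round(xbar + half_width, 2)

t_n36 = interval(40, 10, 36, 2.75)    # t critical value as used in the notes
z_n36 = interval(40, 10, 36, 2.57)    # Normal critical value for 99%
t_n20 = interval(40, 10, 20, 2.861)   # t critical value, 19 degrees of freedom

print(t_n36)  # (35.42, 44.58)
print(z_n36)  # (35.72, 44.28)
print(t_n20)  # (33.6, 46.4)
```

With the smaller sample the interval widens for two reasons: the t critical value is larger, and the standard error $s/\sqrt{n}$ grows as n falls.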
$\pi$: proportion of the
population that has a particular characteristic, e.g. unemployed, FF voter,
…
$p$: proportion of a sample
that has a particular characteristic, e.g. unemployed, FF voter,
…
$n$: sample size
$n$: number of trials
$x$: number of "successes" within $n$ trials
$\pi$: probability of "success" in any individual trial
$(1-\pi)$: probability of "failure" in any individual trial
$P(x) = {}^{n}C_{x}\, \pi^{x} (1-\pi)^{n-x}$
Claims:
1. $E(x) = n\pi$ (intuitive)
2. $Var(x) = n\pi(1-\pi)$ (not so intuitive)
See previous notes for
proofs.
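The two claims can also be checked numerically by summing over the binomial probabilities directly. This is a minimal sketch; the function name `binom_pmf` and the values n = 10 and $\pi$ = 0.3 are hypothetical.

```python
from math import comb

def binom_pmf(x, n, pi):
    """P(x) = nCx * pi^x * (1 - pi)^(n - x)."""
    return comb(n, x) * pi**x * (1 - pi)**(n - x)

n, pi = 10, 0.3   # hypothetical values for illustration
pmf = [binom_pmf(x, n, pi) for x in range(n + 1)]

total = sum(pmf)                                          # should be 1
mean = sum(x * p for x, p in enumerate(pmf))              # should equal n*pi
var = sum((x - mean) ** 2 * p for x, p in enumerate(pmf)) # should equal n*pi*(1-pi)

print(round(total, 6), round(mean, 6), round(var, 6))  # 1.0 3.0 2.1
```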
$x \sim B(n, \pi)$, with mean $n\pi$ and variance $n\pi(1-\pi)$, but remember (see
previous notes if necessary)
$x \sim N(n\pi, n\pi(1-\pi))$ approximately [if $n\pi > 5$ and $n(1-\pi) > 5$]
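To see how close the Normal approximation is when the rule of thumb holds, the sketch below compares an exact binomial probability with its Normal counterpart. The values n = 50 and $\pi$ = 0.4 are hypothetical, and the continuity correction (evaluating the Normal at 22.5 rather than 22) is an addition not discussed in these notes.

```python
from math import comb, sqrt
from statistics import NormalDist

n, pi = 50, 0.4   # hypothetical values; n*pi = 20 and n*(1-pi) = 30, both > 5
rule_ok = n * pi > 5 and n * (1 - pi) > 5

# Exact binomial probability P(x <= 22)
exact = sum(comb(n, x) * pi**x * (1 - pi)**(n - x) for x in range(23))

# Normal approximation with mean n*pi and variance n*pi*(1-pi),
# using a continuity correction (assumption: not part of the notes)
approx = NormalDist(n * pi, sqrt(n * pi * (1 - pi))).cdf(22.5)

print(rule_ok, round(exact, 3), round(approx, 3))
```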
Sample proportion = (number of "successes") / (number of trials)
i.e., $p = x/n$
$p \sim\ ?(?, ?)$
· $x \sim N(n\pi, n\pi(1-\pi))$ and $p$ is a linear
transformation of $x$
$\Rightarrow p \sim N(?, ?)$
· $E(p)$?
$E(p) = E(x/n) = E(x)/n = n\pi/n = \pi$ (as expected)
· $Var(p)$?
$Var(p) = Var(x/n) = Var(x)/n^2 = n\pi(1-\pi)/n^2 = \pi(1-\pi)/n$
Therefore,
$p \sim N(\pi, \pi(1-\pi)/n)$
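The mean and variance of the sample proportion can be illustrated by simulation. This sketch assumes a hypothetical population proportion of 0.4, samples of size 50, a fixed seed and 10,000 repetitions; none of these come from the notes.

```python
import random
from statistics import fmean, pvariance

rng = random.Random(2)           # fixed seed so the run is repeatable
n, pi, reps = 50, 0.4, 10000     # hypothetical values for illustration

# Each repetition: n Bernoulli trials, then the sample proportion p = x / n
props = [sum(rng.random() < pi for _ in range(n)) / n for _ in range(reps)]

mean_p = fmean(props)            # should be close to pi = 0.4
var_p = pvariance(props)         # should be close to pi*(1-pi)/n = 0.0048

print(round(mean_p, 3), round(var_p, 4))
```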
Example: Given the sample
data $p = 0.4$, $n = 50$, calculate the 99% confidence interval estimate of
the true proportion $\pi$.
$p \sim N(\pi, \pi(1-\pi)/n)$
Therefore, the 99%
confidence interval for $\pi$ can be written down
as:
$[p - 2.57\sqrt{p(1-p)/n},\ p + 2.57\sqrt{p(1-p)/n}]$
$= [0.22, 0.58]$
Note: The known
$p(1-p)/n$ is being used as a replacement for the unknown
$\pi(1-\pi)/n$.
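The proportion example can be reproduced as follows. The function name `proportion_interval` is hypothetical, and the critical value 2.57 is taken from the notes. The sample proportion stands in for the unknown population proportion inside the standard error.

```python
from math import sqrt

def proportion_interval(p, n, z):
    """Interval p +/- z * sqrt(p * (1 - p) / n) for a population proportion."""
    # The sample proportion p replaces the unknown population proportion
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

lo, hi = proportion_interval(0.4, 50, 2.57)
print(round(lo, 2), round(hi, 2))  # 0.22 0.58
```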