Chapter 8 Estimation and Confidence Intervals

Homework #8 (Hilary Term Week 3): Chapter 8, Exercises 2, 8, 14, 16, 24 & 26.

A Very Brief Review

Chapter 6: The Normal Probability Distribution

X_i ~ N(μ, σ²)

⇒ P(μ - 1.96σ < X_i < μ + 1.96σ) = 0.95

Chapter 7: Sampling Methods and the Central Limit Theorem

The population parameters μ and σ² were assumed known and the objective was to form some conclusions about possible values of the sample mean x̄.

X_i ~ N(μ, σ²) {or X_i ~ ?(μ, σ²) with n > 30}

⇒ x̄ ~ N(μ, σ²/n)

⇒ P(μ - 1.96σ/√n < x̄ < μ + 1.96σ/√n) = 0.95
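The two probability statements in this review can be checked numerically. The following is a minimal sketch using only the Python standard library; the population values (mean 100, standard deviation 15) and the function name mean_interval are hypothetical, chosen purely for illustration.

```python
from statistics import NormalDist

# Standard normal distribution Z ~ N(0, 1).
z = NormalDist(mu=0.0, sigma=1.0)

# P(-1.96 < Z < 1.96): the central probability behind both statements.
central_prob = z.cdf(1.96) - z.cdf(-1.96)


def mean_interval(mu, sigma, n, z_value=1.96):
    """Interval that contains the sample mean with probability of about 0.95."""
    half_width = z_value * sigma / n ** 0.5
    return mu - half_width, mu + half_width


# Hypothetical population: mu = 100, sigma = 15 (not from the notes).
print(round(central_prob, 4))
for n in (1, 36, 144):
    print(n, mean_interval(100, 15, n))
```

With n = 1 the interval is the one for a single observation X_i; as n grows the interval for the sample mean shrinks in proportion to 1/√n.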

Estimation

A more interesting question to ask is: given the values of x̄ (or s²), what can be said about the population parameters μ (or σ²)?

Point Estimation: A single value is used to provide the best estimate of the parameter of interest.

Interval Estimation: "Interval estimates are better for the consumer of the statistics, since they not only show the estimate of the parameter but also give an idea of the confidence which the researcher has in that estimate."

Estimation: Large Sample Size (n > 30)

x̄ ~ N(μ, σ²/n) (for all distributions of X_i!)

⇒ P(μ - 1.96σ/√n < x̄ < μ + 1.96σ/√n) = 0.95

Rearranging this inequality,

1. (μ - 1.96σ/√n < x̄) ⇒ (μ < x̄ + 1.96σ/√n)

2. (x̄ < μ + 1.96σ/√n) ⇒ (x̄ - 1.96σ/√n < μ)

The interval [x̄ - 1.96σ/√n < μ < x̄ + 1.96σ/√n] is referred to as the 95% confidence interval for μ.

The interval [x̄ - 1.64σ/√n < μ < x̄ + 1.64σ/√n] is referred to as the 90% confidence interval for μ.

The greater the degree of confidence required, the wider the confidence interval has to be.
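The trade-off between confidence and width can be illustrated with a short sketch. It assumes σ is known and looks up the z value from the standard normal distribution instead of a table; the function name z_confidence_interval and the figures x̄ = 40, σ = 10, n = 36 are illustrative assumptions. The computed 90% multiplier is about 1.645, which the notes write as 1.64.

```python
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Large-sample confidence interval for the mean with sigma known."""
    alpha = 1.0 - confidence
    # Upper alpha/2 point of N(0, 1), e.g. about 1.96 for 95% confidence.
    z_value = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z_value * sigma / n ** 0.5
    return x_bar - half_width, x_bar + half_width


# Illustrative figures only: x_bar = 40, sigma = 10, n = 36.
for level in (0.90, 0.95, 0.99):
    low, high = z_confidence_interval(40, 10, 36, level)
    print(level, round(low, 2), round(high, 2), "width", round(high - low, 2))
```

The printed widths increase with the confidence level, as stated above.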

σ² Unknown?

Replace the population variance σ² with the sample variance s² (as long as n > 30)

The interval [x̄ - 1.96s/√n < μ < x̄ + 1.96s/√n] is referred to as the 95% confidence interval for μ.

The interval [x̄ - 1.64s/√n < μ < x̄ + 1.64s/√n] is referred to as the 90% confidence interval for μ.
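When only raw data are available, both x̄ and s have to be computed from the sample. The sketch below shows one way this might look; the data are simulated (64 draws from a hypothetical population), and the function name large_sample_ci is an assumption made for illustration.

```python
import random
from statistics import NormalDist, mean, stdev


def large_sample_ci(data, confidence=0.95):
    """Confidence interval for the mean using s in place of sigma (n > 30)."""
    n = len(data)
    if n <= 30:
        raise ValueError("the large-sample interval assumes n > 30")
    x_bar = mean(data)
    s = stdev(data)  # sample standard deviation (divisor n - 1)
    z_value = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z_value * s / n ** 0.5
    return x_bar - half_width, x_bar + half_width


# Simulated data for illustration only: n = 64 observations.
random.seed(8)
sample = [random.gauss(50.0, 12.0) for _ in range(64)]
ci_95 = large_sample_ci(sample, 0.95)
ci_90 = large_sample_ci(sample, 0.90)
print(ci_95)
print(ci_90)
```

Both intervals are centred on the same x̄; only the multiplier of s/√n changes.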

Estimation: Small Sample Size

Importance of Large Sample Size

1. Central Limit Theorem: The sampling distribution of the sample mean could be assumed to be Normal.

Z = (x̄ - μ)/(σ/√n) ~ N(0,1)

2. Unknown σ²: Replace σ² with s²

Z = (x̄ - μ)/(s/√n) ~ N(0,1)

Small Sample Size (n < 30)?

1. Given that the Central Limit Theorem can no longer be used, we must know (or simply assume/hope) that the underlying distribution is Normal.

X_i ~ N(μ, σ²) ⇒ x̄ ~ N(μ, σ²/n) ⇒ Z = (x̄ - μ)/(σ/√n) ~ N(0,1)

2. Claim: If the population is Normally distributed, the following statistic,

T = (x̄ - μ)/(s/√n) ~ t_n-1

has a distribution called the t distribution with n-1 degrees of freedom.

Student's t distribution

"Student" was Gosset's pseudonym (Guinness brewery, Dublin)

The shape of the t distribution depends on the number of degrees of freedom (= n – 1). The t distribution is similar in appearance to the standard normal (Z) distribution in that it is symmetric about zero. For small sample sizes, it has wider (i.e. fatter) tails than the standard normal distribution. For n > 25 or 30, there is little or no practical difference between the t distribution and the Z distribution.
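The "fatter tails" can be made concrete by comparing densities. The sketch below codes the standard t density formula directly (the function name t_pdf is an illustrative choice) and compares it with the standard normal density three units from zero, for several degrees of freedom.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


normal_pdf = NormalDist().pdf

# Ratio of t density to normal density in the tail (x = 3).
for df in (4, 9, 29, 1000):
    ratio = t_pdf(3.0, df) / normal_pdf(3.0)
    print(df, round(ratio, 2))
```

The ratio is well above 1 for small degrees of freedom and approaches 1 as the degrees of freedom increase.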

Use of t Table(s)

The interval [x̄ - t_0.025,n-1·s/√n < μ < x̄ + t_0.025,n-1·s/√n] is referred to as the 95% confidence interval for μ.

Examples:

n = 10 ⇒ (n – 1) = 9 ⇒ t_0.025,n-1 = 2.262

n = 20 ⇒ (n – 1) = 19 ⇒ t_0.025,n-1 = 2.093

n = ∞ ⇒ (n – 1) = ∞ ⇒ t_0.025,n-1 = 1.96
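The table values above can be reproduced without a table. The following sketch is one possible approach, not the method behind the printed tables: it integrates the t density numerically (Simpson's rule) and finds the critical value by bisection. The function names are illustrative assumptions, and a large number of degrees of freedom (10000) stands in for the limiting case.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(x, df, steps=2000):
    """P(T > x) for x > 0, using Simpson's rule on [0, x] and symmetry."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3.0


def t_critical(alpha, df):
    """Value t such that P(T > t) = alpha, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha:
            lo = mid  # tail still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2


for df in (9, 19, 10000):
    print(df, round(t_critical(0.025, df), 3))
```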

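Example: Given the sample data, x̄ = 40, s = 10 and n = 36, calculate the 99% confidence interval estimate of the population mean μ. If the sample size were 20, how would the method of calculation and width of the interval be altered?

n = 36: The 99% confidence interval for μ is [x̄ - t_0.005,n-1·s/√n < μ < x̄ + t_0.005,n-1·s/√n] =

[40 - 2.75 × 10/6 < μ < 40 + 2.75 × 10/6] = [35.42, 44.58]

(2.75 is the table entry for 30 degrees of freedom; the value for 35 degrees of freedom is about 2.72, which gives a marginally narrower interval.)

or as n > 25 or 30: The 99% confidence interval for μ is [x̄ - 2.57s/√n < μ < x̄ + 2.57s/√n] =

[40 - 2.57 × 10/6 < μ < 40 + 2.57 × 10/6] = [35.72, 44.28]

n = 20: The z approximation is no longer appropriate, so the t value must be used (assuming a Normal population). The 99% confidence interval for μ is [x̄ - t_0.005,n-1·s/√n < μ < x̄ + t_0.005,n-1·s/√n] =

[40 - 2.861 × 10/√20 < μ < 40 + 2.861 × 10/√20] = [33.60, 46.40]

The smaller sample gives a wider interval.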
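The arithmetic in this example can be checked with a few lines of code. The sketch below simply plugs the critical values quoted in the notes (2.75, 2.57 and 2.861) into x̄ ± critical value × s/√n; the function name interval is an illustrative choice.

```python
import math


def interval(x_bar, s, n, critical_value):
    """Confidence interval x_bar +/- critical_value * s / sqrt(n), to 2 d.p."""
    half_width = critical_value * s / math.sqrt(n)
    return round(x_bar - half_width, 2), round(x_bar + half_width, 2)


print(interval(40, 10, 36, 2.75))   # n = 36, t table value used in the notes
print(interval(40, 10, 36, 2.57))   # n = 36, z value
print(interval(40, 10, 20, 2.861))  # n = 20, t value with 19 d.f.
```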

Estimating a Proportion

π: proportion of the population that has a particular characteristic, e.g. unemployed, FF voter, …

p: proportion of a sample that has a particular characteristic, e.g. unemployed, FF voter, …

n: sample size

Review: The Binomial Distribution

n: number of trials

x: number of "successes" within n trials

π: probability of "success" in any individual trial

(1-π): probability of "failure" in any individual trial

P(x) = ⁿC_x π^x (1-π)^(n-x)

Claims:

1. E(x) = nπ (intuitive)

2. Var(x) = nπ(1-π) (not so intuitive)

See previous notes for proofs.

x ~ B(n, π), with mean nπ and variance nπ(1-π), but remember (see previous notes if necessary)

x ~ N(nπ, nπ(1-π)) approximately [if nπ > 5 and n(1-π) > 5]
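Both claims and the Normal approximation can be checked numerically for a particular case. The sketch below uses hypothetical values n = 50 and π = 0.4 (so nπ = 20 and n(1-π) = 30 both exceed 5). The continuity correction (20.5 rather than 20) is a common refinement added here as an assumption; it is not part of these notes.

```python
import math
from statistics import NormalDist


def binomial_pmf(x, n, prob):
    """P(x successes in n trials) with success probability prob."""
    return math.comb(n, x) * prob ** x * (1 - prob) ** (n - x)


# Hypothetical values for illustration.
n, prob = 50, 0.4

pmf = [binomial_pmf(x, n, prob) for x in range(n + 1)]
mean = sum(x * p for x, p in enumerate(pmf))
variance = sum((x - mean) ** 2 * p for x, p in enumerate(pmf))
print(mean, n * prob)                   # E(x) against n*pi
print(variance, n * prob * (1 - prob))  # Var(x) against n*pi*(1 - pi)

# P(x <= 20): exact binomial sum against the Normal approximation.
exact = sum(pmf[:21])
approx = NormalDist(mu=n * prob,
                    sigma=math.sqrt(n * prob * (1 - prob))).cdf(20.5)
print(exact, approx)
```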

Estimating a Proportion

Sample proportion = (number of "successes") / (number of trials)

i.e., p = x/n

p ~ ?(?,?)

• x ~ N(nπ, nπ(1-π)) and p is a linear transformation of x

⇒ p ~ N(?,?)

• E(p)?

E(p) = E(x/n) = E(x)/n = nπ/n = π (as expected)

• Var(p)?

Var(p) = Var(x/n) = Var(x)/n² = nπ(1-π)/n² = π(1-π)/n

Therefore,

p ~ N(π, π(1-π)/n)
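The mean and variance of the sample proportion can also be checked by simulation. The sketch below draws many samples of size n from a population with a hypothetical true proportion of 0.4 and compares the mean and variance of the resulting sample proportions with π and π(1-π)/n. The seed, the number of samples and all names are illustrative assumptions.

```python
import random
from statistics import mean, pvariance

random.seed(2024)
true_pi, n, n_samples = 0.4, 50, 10000  # hypothetical values


def sample_proportion(n, prob):
    """Proportion of successes in n independent trials."""
    successes = sum(1 for _ in range(n) if random.random() < prob)
    return successes / n


proportions = [sample_proportion(n, true_pi) for _ in range(n_samples)]

print(mean(proportions), true_pi)                         # E(p) against pi
print(pvariance(proportions), true_pi * (1 - true_pi) / n)  # Var(p) against pi(1-pi)/n
```

The simulated values will not match exactly, but they should be close to the theoretical ones.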

Example: Given the sample data p = 0.4, n = 50, calculate the 99% confidence interval estimate of the true proportion π.

p ~ N(π, π(1-π)/n)

Therefore, the 99% confidence interval for π can be written down as:

[p - 2.57{p(1-p)/n}^0.5, p + 2.57{p(1-p)/n}^0.5]

= [0.4 - 2.57 × 0.0693, 0.4 + 2.57 × 0.0693] = [0.22, 0.58]

Note: The known p(1-p)/n is being used as a replacement for the unknown π(1-π)/n.
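The calculation in this example can be sketched in code as follows. The function name proportion_ci is an illustrative choice, and the z value 2.57 is the one used in the notes.

```python
import math


def proportion_ci(p_hat, n, z_value):
    """Confidence interval for a proportion, using p_hat(1 - p_hat)/n
    as the estimate of the unknown variance pi(1 - pi)/n."""
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z_value * standard_error, p_hat + z_value * standard_error


low, high = proportion_ci(0.4, 50, 2.57)
print(round(low, 2), round(high, 2))  # 0.22 0.58
```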
