The methods of the last page, in which we derived a formula for the sample size necessary for estimating a population proportion \(p\), work just fine when the population in question is very large. When we have smaller, finite populations, however, such as the students in a high school or the residents of a small town, the formula we derived previously requires a slight modification. Let's start, as usual, by taking a look at an example.
A researcher is studying the population of a small town in India of \(N=2000\) people. She's interested in estimating \(p\) for several yes/no questions on a survey.
How many people \(n\) does she have to randomly sample (without replacement) to ensure that her estimates \(\hat{p}\) are within \(\epsilon=0.04\) of the true proportions \(p\)?
We can't even begin to address the answer to this question until we derive a confidence interval for a proportion for a small, finite population!
An approximate \((1-\alpha)100\%\) confidence interval for a proportion \(p\) of a small population is:
\(\hat{p}\pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}\left(\dfrac{N-n}{N-1}\right)}\)
We'll use the example above, where possible, to make the proof more concrete. Suppose we take a random sample, \(X_1, X_2, \ldots, X_n\), without replacement, of size \(n\) from a population of size \(N\). In the case of the example, \(N=2000\). Suppose also, unknown to us, that for a particular survey question there are \(N_1\) respondents who would respond "yes" to the question, and therefore \(N-N_1\) respondents who would respond "no." That is, our small finite population splits into \(N_1\) yes respondents and \(N-N_1\) no respondents.
If that's the case, the true proportion (but unknown to us) of yes respondents is:
\(p=\dfrac{N_1}{N}\)
while the true proportion (but unknown to us) of no respondents is:
\(1-p=1-\dfrac{N_1}{N}\)
Now, let \(X\) denote the number of respondents in the sample who say yes, so that:
\(X=\sum\limits_{i=1}^n X_i\)
where \(X_i=1\) if respondent \(i\) answers yes, and \(X_i=0\) if respondent \(i\) answers no. Then, the proportion in the sample who say yes is:
\(\hat{p}=\dfrac{X}{n}=\dfrac{\sum\limits_{i=1}^n X_i}{n}\)
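Then, \(X=\sum\limits_{i=1}^n X_i\) is a hypergeometric random variable with mean:
\(E(X)=n\dfrac{N_1}{N}=np\)
and variance:
\(Var(X)=n\dfrac{N_1}{N}\left(1-\dfrac{N_1}{N}\right)\left(\dfrac{N-n}{N-1}\right)=np(1-p)\left(\dfrac{N-n}{N-1}\right)\)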
It follows that \(\hat{p}=X/n\) has mean \(E(\hat{p})=p\) and variance:
\(Var(\hat{p})=\dfrac{p(1-p)}{n}\left(\dfrac{N-n}{N-1}\right)\)
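The mean and variance of \(\hat{p}\) under sampling without replacement can be checked numerically. The following is a minimal sketch, not part of the original lesson: it sums the hypergeometric probabilities exactly and compares the result with the closed form \(\frac{p(1-p)}{n}\cdot\frac{N-n}{N-1}\). The function name `phat_moments` and the values `N1 = 600` and `n = 50` are made up for illustration; only \(N=2000\) comes from the example.

```python
from math import comb

def phat_moments(N, N1, n):
    """Exact mean and variance of p-hat = X/n, where X counts the 'yes'
    respondents in a sample of size n drawn without replacement from a
    population of N people, N1 of whom would answer 'yes'."""
    total = comb(N, n)                 # number of possible samples
    lo = max(0, n - (N - N1))          # smallest possible value of X
    hi = min(n, N1)                    # largest possible value of X
    mean = 0.0
    second_moment = 0.0
    for x in range(lo, hi + 1):
        # Hypergeometric probability P(X = x)
        prob = comb(N1, x) * comb(N - N1, n - x) / total
        mean += (x / n) * prob
        second_moment += (x / n) ** 2 * prob
    return mean, second_moment - mean ** 2

N, N1, n = 2000, 600, 50               # N1 and n are hypothetical values
p = N1 / N
exact_mean, exact_var = phat_moments(N, N1, n)
formula_var = p * (1 - p) / n * (N - n) / (N - 1)

print(exact_mean, p)
print(exact_var, formula_var)
```

Under these assumptions the two printed pairs should agree up to floating-point rounding.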
Then, the Central Limit Theorem tells us that:
\(\dfrac{\hat{p}-p}{\sqrt{\dfrac{p(1-p)}{n}\left(\dfrac{N-n}{N-1}\right)}}\)
follows an approximate standard normal distribution. Now, it's just a matter of doing the typical confidence interval derivation, in which we start with a probability statement, manipulate the quantity inside the parentheses, and substitute sample estimates where necessary. We've done that a number of times now, so skipping all of the details here, we get that an approximate \((1-\alpha)100\%\) confidence interval for \(p\) of a small population is:
\(\hat{p}\pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}\left(\dfrac{N-n}{N-1}\right)}\)
By the way, it is worthwhile noting that if the sample size \(n\) is much smaller than the population size \(N\), that is, if \(n \ll N\), then:
\(\dfrac{N-n}{N-1}\approx 1\)
and the confidence interval for \(p\) of a small population becomes quite similar to the confidence interval for \(p\) of a large population:
\(\hat{p}\pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}\)
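To make the effect of the finite population factor \((N-n)/(N-1)\) concrete, here is a minimal sketch that computes both intervals side by side. The function names and the sample counts (120 yes answers out of 400 sampled) are hypothetical and chosen only for illustration; \(N=2000\) is taken from the example.

```python
from math import sqrt
from statistics import NormalDist

def finite_population_ci(x, n, N, conf=0.95):
    """Approximate CI for p when n people are sampled without
    replacement from a population of size N and x of them say yes."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    fpc = (N - n) / (N - 1)                        # finite population factor
    margin = z * sqrt(p_hat * (1 - p_hat) / n * fpc)
    return p_hat - margin, p_hat + margin

def large_population_ci(x, n, conf=0.95):
    """Approximate CI for p when the population is treated as very large."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical sample: 120 "yes" answers among 400 people sampled.
small_town = finite_population_ci(120, 400, N=2000)
huge_city = finite_population_ci(120, 400, N=2_000_000)
uncorrected = large_population_ci(120, 400)

print("N = 2000:     ", small_town)
print("N = 2,000,000:", huge_city)
print("no correction:", uncorrected)
```

With \(N=2000\) the sample is a fifth of the population, so the corrected interval is noticeably narrower; with the much larger \(N\) the factor is close to 1 and the two intervals nearly coincide.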
A researcher is studying the population of a small town in India of \(N=2000\) people. She's interested in estimating \(p\) for several yes/no questions on a survey.
How many people \(n\) does she have to randomly sample (without replacement) to ensure that her estimates \(\hat{p}\) are within \(\epsilon=0.04\) of the true proportion \(p\)?
Now that we know the correct formula for the confidence interval for \(p\) of a small population, we can follow the same procedure we did for determining the sample size for estimating a proportion \(p\) of a large population. The researcher's goal is to estimate \(p\) so that the error is no larger than 0.04. That is, the goal is to calculate a 95% confidence interval such that:
\(\hat{p}\pm \epsilon=\hat{p}\pm 0.04\)
Now, we know the formula for an approximate \((1-\alpha)100\%\) confidence interval for a proportion \(p\) of a small population is:
\(\hat{p}\pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}\left(\dfrac{N-n}{N-1}\right)}\)
So, again, we should proceed by equating the terms appearing after each of the above \(\pm\) signs, and solving for \(n\). That is, equate:
\(\epsilon=z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}\left(\dfrac{N-n}{N-1}\right)}\)
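and solve for \(n\). Doing the algebra yields:
\(n=\dfrac{z^2_{\alpha/2}\hat{p}(1-\hat{p})/\epsilon^2}{\dfrac{N-1}{N}+\dfrac{z^2_{\alpha/2}\hat{p}(1-\hat{p})}{N\epsilon^2}}\)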
That looks simply dreadful! Let's make it look a little more friendly to the eyes:
\(n=\dfrac{m}{1+\dfrac{m-1}{N}}\)
where \(m\) is defined as the sample size necessary for estimating the proportion \(p\) for a large population, that is, when a correction for the population being small and finite is not made. That is:
\(m=\dfrac{z^2_{\alpha/2}\hat{p}(1-\hat{p})}{\epsilon^2}\)
Now, before we make the calculation for our particular example, let's take a step back and summarize what we have just learned.
Estimating a population proportion \(p\) of a small finite population
The sample size necessary for estimating a population proportion \(p\) of a small finite population with \((1-\alpha)100\%\) confidence and error no larger than \(\epsilon\) is:
\(n=\dfrac{m}{1+\dfrac{m-1}{N}}\)
where:
\(m=\dfrac{z^2_{\alpha/2}\hat{p}(1-\hat{p})}{\epsilon^2}\)
is the sample size necessary for estimating the proportion \(p\) for a large population.
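The summary above can be turned into a short calculation. The sketch below is not part of the original lesson and rests on a few assumptions: the function names are hypothetical, the default \(\hat{p}=0.5\) is the conservative choice that maximizes \(\hat{p}(1-\hat{p})\) when no prior estimate is available, and rounding the result up with `ceil` is one common convention. The inputs \(N=2000\), \(\epsilon=0.04\) and 95% confidence come from the example.

```python
from math import ceil, sqrt
from statistics import NormalDist

def large_population_sample_size(eps, p_hat=0.5, conf=0.95):
    """m: sample size needed when no finite population correction is made."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    return z ** 2 * p_hat * (1 - p_hat) / eps ** 2

def finite_population_sample_size(N, eps, p_hat=0.5, conf=0.95):
    """n = m / (1 + (m - 1) / N) for a population of size N."""
    m = large_population_sample_size(eps, p_hat, conf)
    return m / (1 + (m - 1) / N)

N, eps = 2000, 0.04                    # values from the example
m = large_population_sample_size(eps)  # assumes p_hat = 0.5
n = finite_population_sample_size(N, eps)

# Plugging n back into the margin of error should recover eps.
z = NormalDist().inv_cdf(0.975)
margin = z * sqrt(0.5 * 0.5 / n * (N - n) / (N - 1))

print("m (large population):", m, "-> rounded up:", ceil(m))
print("n (finite population):", n, "-> rounded up:", ceil(n))
print("margin of error at n:", margin)
```

Note that the unrounded \(m\) is carried through here; rounding \(m\) up before substituting it into the formula for \(n\) can change the final answer by one.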