This paper introduces three foundational concepts in inferential statistics. First, it explains the normal distribution — its bell-shaped, symmetrical form and the role of z-scores in measuring distance from the mean. Second, it discusses the central limit theorem, describing how sample means tend toward a normal distribution as sample size increases, along with the requirements for simple random sampling. Third, it covers point estimates and confidence intervals, explaining how margin of error and confidence levels are used to estimate unknown population parameters. Concrete numerical examples illustrate each concept, making this a useful introductory reference for applied statistics.
The normal distribution is symmetrical and, when graphed on the Cartesian plane, is shaped like a bell. Its mean, median, and mode all coincide at a single point, the peak of the curve, and frequencies decrease gradually toward both tails.
This is, of course, a model for understanding a problem. Like any other statistical tool, it yields probabilistic statements rather than certain predictions. However, the model does have real practical value. Many real-world quantities are approximately normally distributed, so the model offers at least a guide for understanding and predicting behavior mathematically.
Suppose X is a normally distributed variable with mean μ and variance σ², so that its standard deviation is σ. Any probability involving X can be computed by converting to the z-score, Z = (X − μ) / σ. For example, if the mean IQ score for all test-takers is 100 and the standard deviation is 10, the z-score of someone with a raw IQ score of 127 is Z = (127 − 100) / 10 = 2.7.
The z-score measures how many standard deviations X lies from its mean, which makes it a natural way to express distance from the mean. For instance, a score 27 points above the mean is far more unusual if the standard deviation is 10 (z = 2.7) than if the standard deviation is 20 (z = 1.35). The z-score puts these distances in context relative to the spread of the data.
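The comparison above can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; the helper name `z_score` is introduced here for illustration, and the probabilities assume that IQ scores follow a normal distribution exactly.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


# Values from the IQ example: raw score 127, mean 100
z_narrow = z_score(127, 100, 10)   # standard deviation of 10
z_wide = z_score(127, 100, 20)     # standard deviation of 20

# Probability of scoring below 127 in each case, read from the
# standard normal distribution (mean 0, standard deviation 1)
standard_normal = NormalDist()
p_narrow = standard_normal.cdf(z_narrow)
p_wide = standard_normal.cdf(z_wide)

print(f"sd = 10: z = {z_narrow:.2f}, P(X < 127) = {p_narrow:.4f}")
print(f"sd = 20: z = {z_wide:.2f}, P(X < 127) = {p_wide:.4f}")
```

The same raw distance of 27 points corresponds to a larger z-score, and so to a rarer outcome, when the standard deviation is smaller.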
The central limit theorem states that the sum of a large number of independent, identically distributed variables with finite variance is approximately normally distributed, regardless of the underlying distribution. The theorem is important because it is the reason that many statistical procedures work. Whatever the population distribution, as the sample size n increases, the sample mean becomes approximately normally distributed around the population mean μ, with standard deviation σ/√n, which shrinks as n increases.
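A small simulation can make this behavior concrete. The sketch below is illustrative only: the choice of an exponential population (strongly skewed, with mean 1 and standard deviation 1), the sample sizes, the number of trials, and the helper name `sample_means` are all assumptions made for the example.

```python
import random
import statistics


def sample_means(n, trials, rng):
    """Return `trials` sample means, each computed from n exponential(1) draws."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(trials)
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable
results = {}

for n in (5, 30, 200):
    means = sample_means(n, trials=5000, rng=rng)
    centre = statistics.fmean(means)   # should be close to the population mean, 1
    spread = statistics.stdev(means)   # should be close to sigma / sqrt(n)
    results[n] = (centre, spread)
    print(f"n = {n:3d}: mean of sample means = {centre:.3f}, "
          f"sd = {spread:.3f}, sigma/sqrt(n) = {1 / n ** 0.5:.3f}")
```

Even though the individual draws are far from normal, the sample means cluster around the population mean, and their spread tracks σ/√n as n grows.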
To use the central limit theorem, the observations must be independent, and the sample size must be large enough for the normal approximation to be reasonable. Each sample should be a random sample from the population or otherwise follow the population distribution. When sampling without replacement, the sample size should also be less than ten percent of the entire population, so that the observations can be treated as approximately independent.
Simple random sampling refers to any sampling method in which a population has N objects, the sample consists of n objects, and all possible samples of n objects are equally likely to occur. This method allows researchers to use established statistical methods to analyze sample results. Confidence intervals are then constructed by extending a margin of error from the sample mean in both directions, giving a range of plausible values for the population mean.
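As a rough illustration, the sketch below draws a simple random sample and builds a confidence interval around the sample mean. The population is invented for the example (10,000 values generated around a mean of 100), and the interval uses the normal critical value with the sample standard deviation, which is an approximation that is usually considered acceptable for a sample of this size.

```python
import random
import statistics
from statistics import NormalDist

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population of N = 10,000 values, invented for illustration
population = [rng.gauss(100, 10) for _ in range(10_000)]

# Simple random sample: random.sample draws without replacement, and every
# possible subset of n objects is equally likely to be chosen
n = 100
sample = rng.sample(population, n)

sample_mean = statistics.fmean(sample)   # point estimate of the population mean
sample_sd = statistics.stdev(sample)

# Critical value for a 95% confidence level (about 1.96)
confidence = 0.95
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Margin of error, then the interval on either side of the sample mean
margin_of_error = z_star * sample_sd / n ** 0.5
lower = sample_mean - margin_of_error
upper = sample_mean + margin_of_error

print(f"point estimate = {sample_mean:.2f}")
print(f"{confidence:.0%} confidence interval = ({lower:.2f}, {upper:.2f})")
```

Here n is 1% of the population, so the ten percent guideline is comfortably satisfied.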
For example, consider a population in which each member has a certain characteristic with probability p (and q = 1 − p). Given a random sample of n from the population, we can find the mean and standard deviation of the proportion of that sample that has the characteristic. If X₁, X₂, …, Xₙ are n independent and identically distributed random variables with mean μ and standard deviation σ, then Sₙ = X₁ + X₂ + … + Xₙ is the sample sum. It can be shown that E(Sₙ) = nμ and SD(Sₙ) = σ√n. Coding each observation as 1 if it has the characteristic and 0 otherwise gives μ = p and σ = √(pq), so the sample proportion Sₙ/n has mean p and standard deviation √(pq/n). The central limit theorem states that as n grows large, the distribution of the standardized sum (Sₙ − nμ) / (σ√n) approaches the standard normal distribution.
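The sample-proportion case can be checked numerically. In the sketch below, the values p = 0.3 and n = 400, the number of trials, and all variable names are hypothetical choices made for illustration. Each observation is coded as 1 if it has the characteristic and 0 otherwise, so that μ = p and σ = √(pq).

```python
import random
import statistics

p = 0.3      # hypothetical probability of the characteristic
q = 1 - p
n = 400      # hypothetical sample size

# Theoretical values for 0/1 observations: mu = p, sigma = sqrt(p * q)
mu, sigma = p, (p * q) ** 0.5
expected_sum = n * mu                 # E(S_n) = n * mu
sd_sum = sigma * n ** 0.5             # SD(S_n) = sigma * sqrt(n)
sd_proportion = (p * q / n) ** 0.5    # SD(S_n / n) = sqrt(p * q / n)

# Simulate many samples and record the sample sum S_n for each one
rng = random.Random(1)  # fixed seed so the run is repeatable
trials = 5000
sums = [sum(1 for _ in range(n) if rng.random() < p) for _ in range(trials)]

proportions = [s / n for s in sums]
standardized = [(s - expected_sum) / sd_sum for s in sums]

# Share of standardized sums within 1.96 of zero; roughly 0.95 if the
# normal approximation is reasonable
within = sum(1 for z in standardized if abs(z) <= 1.96) / trials

print(f"mean of proportions = {statistics.fmean(proportions):.4f} (theory {p})")
print(f"sd of proportions   = {statistics.stdev(proportions):.4f} "
      f"(theory {sd_proportion:.4f})")
print(f"share of standardized sums within 1.96 = {within:.3f}")
```

If the approximation holds, the simulated mean and standard deviation of the proportions should sit close to p and √(pq/n), and the standardized sums should behave roughly like standard normal values.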
"Estimating population mean with margin of error"