This paper presents a detailed statistical analysis across five problem sets. It examines measures of central tendency (mean, median, mode) and their robustness to outliers, calculates standard deviations and ranges for comparative data sets, converts raw scores to z-scores and T-scores, applies normal distribution principles to standardized test data, and analyzes Pearson correlation coefficients between two variables. The analysis demonstrates how outliers affect different statistical measures, the importance of choosing appropriate descriptive statistics, and how data truncation can meaningfully alter correlation results.
The first problem set required calculation of five key descriptive statistics across four datasets. Dataset a yielded a mean of 6, median of 6, mode of 6, standard deviation of 2.55, and range of 7. Dataset b produced a mean of 8.25, median of 9, no mode (N/A), standard deviation of 1.708, and range of 4. Dataset c showed a mean of 82, median of 82, mode of 82, standard deviation of 30.56, and range of 1. Dataset d resulted in a mean of 15.6, median of 9, no mode (N/A), standard deviation of 16.502, and range of 39.
These datasets illustrate the importance of computing multiple descriptive statistics rather than relying on any single measure. When all three measures of central tendency align (as in datasets a and c), the result is consistent with a roughly symmetric distribution, although agreement alone does not prove symmetry. Conversely, when mean and median diverge (as in datasets b and d), asymmetry or outliers are likely present. The range provides a quick sense of data spread, but because it depends only on the two most extreme values, it is highly sensitive to them. Standard deviation offers a more refined measure of variability by accounting for the distance of every observation from the mean.
The variation in these statistics across datasets demonstrates why selecting the appropriate measure for a given context is fundamental to statistical analysis. A researcher must consider not only what the numbers are, but what they represent about the underlying data distribution.
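The five statistics discussed above can be computed with Python's standard `statistics` module. The following is a minimal sketch, not the paper's own procedure: the helper name `describe` and the `sample` values are hypothetical. The values were chosen because they happen to reproduce the figures reported for dataset d, under the assumption that the sample (n − 1) standard deviation was used; the original data are not listed in the text.

```python
import statistics

def describe(values):
    """Return mean, median, mode (or None), sample SD, and range as a dict."""
    modes = statistics.multimode(values)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Simplification: report a mode only when exactly one value is most frequent.
        "mode": modes[0] if len(modes) == 1 else None,
        # Sample standard deviation (n - 1 in the denominator).
        "sd": statistics.stdev(values),
        "range": max(values) - min(values),
    }

# Hypothetical values, not taken from the paper.
sample = [6, 8, 9, 10, 45]
summary = describe(sample)

for name, value in summary.items():
    print(name, value)
```

Treating every tie as "no mode" is a simplification made for this sketch; a bimodal dataset would need to be reported differently.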
Comparison of datasets b and d reveals how outliers distort different statistical measures. Both datasets have the same median (9) but differ markedly in mean, standard deviation, and range. Dataset b has a mean of 8.25 and standard deviation of 1.708, while dataset d has a mean of 15.6 and standard deviation of 16.502. The explanation lies in the presence of an outlier value of 45 in dataset d, which widens the range to 39 compared to 4 in dataset b.
The standard deviation in dataset d is substantially larger than in dataset b because standard deviation is a function of squared deviations from the mean. The outlier value of 45 creates a much larger deviation than any value in dataset b, and when squared, this deviation heavily influences the overall standard deviation calculation. This demonstrates that standard deviation, like the mean, is sensitive to extreme values.
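The effect of squaring can be made concrete by measuring how much of the total sum of squared deviations comes from a single extreme value. The sketch below uses hypothetical values (not taken from the paper) that contain one outlier of 45, and assumes the sample (n − 1) form of the standard deviation.

```python
import statistics

# Hypothetical values with one outlier (45); not taken from the paper.
values = [6, 8, 9, 10, 45]
mean = statistics.mean(values)

# Squared deviation of each observation from the mean.
squared_devs = [(x - mean) ** 2 for x in values]
total = sum(squared_devs)

# Share of the total sum of squares contributed by the largest value.
outlier_share = squared_devs[values.index(max(values))] / total

# Sample standard deviation rebuilt from the squared deviations.
sd = (total / (len(values) - 1)) ** 0.5

print(f"outlier share of squared deviations: {outlier_share:.1%}")
print(f"sample SD: {sd:.3f}")
```

In this illustration the single outlier accounts for roughly 79% of the total squared deviation, which is why it dominates the resulting standard deviation.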
The mean is higher in dataset d (15.6) than in dataset b (8.25) because the mean uses the magnitude of every value: the outlier 45 inflates the sum of dataset d far more than it adds to the number of observations, pulling the mean upward substantially. The median, by contrast, is relatively unaffected by this outlier. It depends only on position: it is the middle value when the number of observations is odd, or the average of the two middle values when the number is even. Because the outlier changes only the far end of the ordered data and not its center, the median is 9 in both datasets b and d.
This robustness makes the median the best measure of central tendency for dataset d. Because the median is not skewed by the outlier, it provides a more representative description of the typical value in the distribution. In practice, when analyzing data suspected of containing outliers, statisticians often report both mean and median to give audiences a complete picture of central tendency.
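The contrast between the two measures can be illustrated by replacing one value with an outlier and comparing how far each statistic moves. This is a minimal sketch with hypothetical data; the two lists are not the paper's datasets and differ only in their largest value.

```python
import statistics

# Hypothetical data: the two lists differ only in their largest value.
without_outlier = [6, 8, 9, 10, 12]
with_outlier = [6, 8, 9, 10, 45]

for label, data in (("without outlier", without_outlier),
                    ("with outlier", with_outlier)):
    print(label, "mean:", statistics.mean(data), "median:", statistics.median(data))

# How far each measure moves when the outlier is introduced.
mean_shift = statistics.mean(with_outlier) - statistics.mean(without_outlier)
median_shift = statistics.median(with_outlier) - statistics.median(without_outlier)

print("mean shift:", round(mean_shift, 2))
print("median shift:", median_shift)
```

Here the mean rises by 6.6 while the median does not move at all, because the middle position of the ordered data is unchanged.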
The third problem set involved converting raw scores to standardized forms. Raw scores of 35.0, 56.0, 57.5, and 55.0 were converted to z-scores of −3.0, 1.2, 1.5, and 1.0, respectively. A z-score expresses how many standard deviations a raw score lies from the mean, with the sign giving the direction. A z-score of −3.0 indicates a value three standard deviations below the mean, which represents extremely low performance. A z-score of 1.5 indicates a value 1.5 standard deviations above the mean, which represents solid above-average performance.
The corresponding T-scores were 20, 62, 65, and 60. T-scores rescale z-scores to a mean of 50 and a standard deviation of 10 (T = 50 + 10z), which avoids negative values and makes the scores more intuitive to interpret while preserving each score's relative position. A T-score of 20 is very low, while a T-score of 65 represents strong performance relative to the population.
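Both conversions are simple linear rescalings: z = (x − mean) / SD and T = 50 + 10z. The sketch below applies them to the four raw scores. The mean of 50 and standard deviation of 5 are an assumption inferred from the reported raw-score and z-score pairs, since the text does not state them, and the function names are illustrative.

```python
def z_score(raw, mean, sd):
    """Number of standard deviations that `raw` lies from the mean."""
    return (raw - mean) / sd

def t_score(z):
    """Rescale a z-score to a T-score (mean 50, standard deviation 10)."""
    return 50 + 10 * z

# Assumption: mean 50 and SD 5 are inferred from the reported pairs,
# not stated in the text.
MEAN, SD = 50.0, 5.0
raw_scores = [35.0, 56.0, 57.5, 55.0]

z_scores = [z_score(x, MEAN, SD) for x in raw_scores]
t_scores = [t_score(z) for z in z_scores]

for raw, z, t in zip(raw_scores, z_scores, t_scores):
    print(f"raw={raw:5.1f}  z={z:+.1f}  T={t:.0f}")
```

Because T is a linear function of z, a score one standard deviation above the mean (z = 1.0) always maps to a T-score of 60.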
The percentage of the population expected to score above each raw score was also calculated, assuming a normal distribution: 99.865, 11.507, 6.6807, and 16.00, respectively. These are upper-tail areas. The corresponding percentile ranks (the percentage scoring at or below each raw score) are their complements, roughly 0.135, 88.49, 93.32, and 84.00. For the raw score of 35.0, this means that approximately 99.865% of test-takers are expected to score higher, which places the score in the extreme lower tail of the distribution, as its z-score of −3.0 indicates.
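Under the normal assumption, the proportion of the population at or below a given z-score is the standard normal cumulative distribution function, and the proportion above is its complement. A minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def percent_at_or_below(z):
    """Percentile rank: percentage of the population at or below z."""
    return 100 * standard_normal.cdf(z)

def percent_above(z):
    """Upper-tail area: percentage of the population above z."""
    return 100 * (1 - standard_normal.cdf(z))

for z in (-3.0, 1.2, 1.5, 1.0):
    print(f"z={z:+.1f}  at or below={percent_at_or_below(z):.3f}%  "
          f"above={percent_above(z):.3f}%")
```

The two percentages always sum to 100, so it matters which tail a reported figure refers to: for z = −3.0 the area above is about 99.865%, while the area at or below is only about 0.135%.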
"Pearson correlation with full and truncated data; causation versus correlation"
This analysis demonstrates that statistical measures must be chosen carefully based on data distribution and context. No single measure of central tendency or correlation coefficient tells the complete story without interpretation. Outliers affect different statistics in different ways: the mean and standard deviation are highly sensitive, while the median is robust. Standardized scores like z-scores and T-scores enable comparison across different distributions. Correlation analysis must account for data structure, truncation, and the critical distinction between association and causation. Competent statistical reasoning requires not merely computing numbers, but understanding what they mean and communicating those meanings clearly to an audience.