This paper compares two scholarly articles, Gall's "Figuring out the Importance of Research Results: Statistical Significance versus Practical Significance" and Levin's "What if There Were No More Bickering About Statistical Significance Tests?", and examines their differing perspectives on hypothesis testing, sampling strategies, and statistical power. It explores where significance testing can and cannot be usefully applied, particularly in educational research, and considers how measurement error, verbal data collection, and researcher resources affect reliability. It concludes by reflecting on how these readings inform the author's own research priorities, especially regarding at-risk populations in educational settings.
Gall's "Figuring out the Importance of Research Results: Statistical Significance versus Practical Significance" offers a thoughtful, if somewhat indecisive, view of the statistical methods used to test the null hypothesis. He focuses more on what makes research results important than on the question of when results lack significance. He moves back and forth on the subject, suggesting that null hypothesis testing is repetitive given the level of certainty required, and that the conditions it depends on (such as random sampling from a defined population) must be satisfied but are inherently limited.
Levin's "What if There Were No More Bickering About Statistical Significance Tests?" is a well-reasoned, if somewhat pointed, response to "those who advocate replacing statistical hypothesis testing with alternative data-analysis strategies" (Research in the Schools, 1998). Together, these two articles provide a useful lens through which to examine the ongoing debate over statistical versus practical significance in research.
In Gall's article, the problems to which significance testing can be usefully applied are those connected to educational practice. As he states: "My concern in this paper is with the importance of research results for the improvement of educational practice" (Statistical Significance vs. Practical Significance of Research Results, 2012). Levin's article, by contrast, focuses on the lack of contextual clarity in discussions of statistical significance, using the example of a hypothetical Group A and Group B treating six elderly patients. In doing so, Levin implies that significance testing can be applied to almost any set of tasks performed repeatedly over time.
At the same time, Levin argues that significance testing is frequently misapplied in educational research. He critiques certain assertions about statistical power, noting: "Some of Nix and Barnette's assertions about statistical power and a study's publishability are similarly misleading. First, the authors state that the problem is of special concern in educational research, where '. . . effect sizes may be subtle, but at the same time, may indicate meritorious improvements in instruction and other classroom methods'" (Research in the Schools, 1998). Levin does not dispute the underlying idea but objects to its execution, arguing the claim was misleading because it rested on assumptions of reliability derived from sampling error rather than from reduced measurement error.
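The link between subtle effect sizes and statistical power can be made concrete with a short calculation. The sketch below is illustrative only and does not come from Levin or from Nix and Barnette: it assumes a two-sided, two-sample comparison with equal group sizes, uses a normal approximation in place of the exact t-test, and treats a standardized effect size of 0.2 as a hypothetical stand-in for a "subtle" effect. The function name `two_sample_power` is likewise an assumption made for this example.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test.

    Normal approximation (an assumption of this sketch): with equal
    group sizes, the test statistic is shifted by
    effect_size * sqrt(n_per_group / 2) when the effect is real.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # two-sided critical value
    shift = effect_size * sqrt(n_per_group / 2)  # noncentrality under the alternative
    # Probability of landing in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Hypothetical "subtle" effect (0.2) at several group sizes.
    for n in (30, 100, 400):
        print(f"n per group = {n:4d}  approximate power = {two_sample_power(0.2, n):.2f}")
```

Under these assumptions, a small effect is unlikely to be detected with a few dozen participants per group and becomes detectable only with several hundred, which is one reason subtle but worthwhile instructional improvements can go unreported.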
Levin also critiques how statistical jargon is used, arguing it obscures rather than clarifies meaning. He writes: "What a misrepresentation of the F-test and its operating characteristics! The error mean square (MSE) is an unbiased estimator of the population variance (σ²) that is not systematically affected by sample size…" (Research in the Schools, 1998). Rather than introducing new frameworks of his own, he largely focuses on exposing flaws in others' reasoning. He does, however, use the example of at-risk patients in a medical facility — a population type more commonly studied in business and organizational research — to illustrate his points. Hospitals and schools similarly conduct research on at-risk populations in order to develop predictive hypotheses about reliability.
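The property stated in the F-test quotation, that the error mean square is an unbiased estimator of σ² and is not systematically affected by sample size, can be checked with a small simulation. The sketch below is an illustration added here, not Levin's own demonstration, and it covers only the stated property because the quoted passage is truncated. The design (three groups, normally distributed scores, σ = 2, 2,000 replications) and the function names `error_mean_square` and `average_mse` are all hypothetical choices made for this example.

```python
import random


def error_mean_square(groups):
    """Within-group (error) mean square for a one-way layout."""
    k = len(groups)
    total_n = sum(len(g) for g in groups)
    ss_within = 0.0
    for g in groups:
        mean = sum(g) / len(g)
        ss_within += sum((x - mean) ** 2 for x in g)
    # Error degrees of freedom: total observations minus number of groups.
    return ss_within / (total_n - k)


def average_mse(n_per_group, sigma=2.0, k=3, reps=2000, seed=0):
    """Average MSE over many simulated studies with the same true sigma."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0.0
    for _ in range(reps):
        groups = [[rng.gauss(0.0, sigma) for _ in range(n_per_group)]
                  for _ in range(k)]
        total += error_mean_square(groups)
    return total / reps


if __name__ == "__main__":
    # True population variance is sigma ** 2 = 4.0 in every condition.
    for n in (5, 50):
        print(f"n per group = {n:3d}  average MSE = {average_mse(n):.3f}")
```

In runs of this kind the average MSE stays close to the true variance of 4.0 whether each group has 5 or 50 members. What changes with sample size is how much a single study's MSE varies around that value, not where it is centered.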
Gall addresses comparable concerns in an educational setting, asking: "For example, suppose the research sample consists of fifth-graders and they are found to be reading at the third-grade level on a particular standardized reading test. How well does the typical fifth-grader read, and how well does the typical third-grader read?" (Statistical Significance vs. Practical Significance of Research Results, 2012). This kind of practical framing illustrates why effect size and real-world meaning matter alongside formal measures of significance.
"Critiques of sampling methods and verbal data reliability"
"Why statistical power matters and how to improve it"
"Author's research priorities shaped by Gall and Levin"