Description of the statistical terms used in statistics, as well as a summary of research methods.
Linear Regression models (Meier, Chapter 18 / 19)
These are used in order to determine whether a correlation (or relationship) exists between one element and another and, if so, in which direction (negative or positive).
The two variables are plotted on a graph: the independent variable (X) on the horizontal axis and the dependent variable (Y) on the vertical axis. A straight line is fitted through the points. The steepness and direction of that line is called the 'slope'. The point where the line crosses the Y axis (i.e. the value of Y when X is zero) is called the 'intercept'.
The slope of the line is equal to the change in Y (DV) for a one-unit change in X (IV). The slope's direction and gradient describe the relationship between X and Y.
Linear regression, like the previous models, is used to apply results from a sample to the population as a whole. Linear regression is also useful for prediction. For instance, linear regression may be used to determine whether there is a correlation between vehicle collisions and rainy days. If there is a positive one, one can predict that the more rainy days there are, the greater the number of collisions.
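As an illustration of slope, intercept, and prediction, here is a minimal sketch of a least-squares line fit in plain Python. The function name fit_line and the rainy-day figures are made up for this example and do not come from Meier.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of a straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of X and Y divided by the variation in X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # Intercept: value of Y on the line when X is zero.
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: rainy days per month (X) and collisions per month (Y).
rainy_days = [2, 4, 6, 8, 10]
collisions = [11, 15, 19, 23, 27]

slope, intercept = fit_line(rainy_days, collisions)
# Predicted collisions for a month with 12 rainy days.
predicted_at_12 = intercept + slope * 12
```

A positive slope here means that collisions rise as rainy days rise. The invented data lie exactly on a line, which real data would not.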
Goodness of Fit
We will want to know the amount of error, i.e. how well the regression line fits the data. The distance between a data point and the regression line is known as the error, and a calculation exists to find it. A second goodness of fit measure is the standard error of the estimate, which summarises the typical size of those errors, i.e. how far the observed values of Y tend to fall from the values predicted by the line. Thirdly, the coefficient of determination measures the proportion of the total variation in the dependent variable (Y) that is explained by the independent variable (X). Complex calculations exist for this. (All of these calculations can also be worked out by special computer programs.)
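The three measures can be sketched in a few lines of Python. This is an illustration under assumptions: the data are invented, the function name goodness_of_fit is hypothetical, and the standard error of the estimate uses the common n - 2 divisor.

```python
import math


def goodness_of_fit(xs, ys):
    """Return (errors, standard error of the estimate, coefficient of determination)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    # Error: vertical distance of each observed Y from the regression line.
    errors = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sum_sq_error = sum(e ** 2 for e in errors)
    total_variation = sum((y - mean_y) ** 2 for y in ys)
    # Standard error of the estimate (assumed divisor: n - 2).
    std_error_estimate = math.sqrt(sum_sq_error / (n - 2))
    # Coefficient of determination: share of variation in Y explained by X.
    r_squared = 1 - sum_sq_error / total_variation
    return errors, std_error_estimate, r_squared


# Hypothetical data for illustration only.
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]
errors, std_error_estimate, r_squared = goodness_of_fit(x_values, y_values)
```

With these invented numbers the line explains 60% of the variation in Y, so r_squared is 0.6.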
Linear regression has various assumptions:
1. For any value in X, the errors in predicting Y are normally distributed with a mean of zero.
2. Errors do not get larger as X becomes larger; rather, the errors remain constant along the whole line regardless of the X value.
3. The errors are independent of one another.
4. Both IV and DV (X and Y) must be interval variables (i.e. numerical data).
5. The relationships between X and Y are linear.
Ignoring these assumptions will result in faulty statistical conclusions.
Topic 2: Comparing 2 Groups
A researcher may run the same study on two different groups with one, for instance, acting as control and the other as experimental. He may then want to know whether differences are observed between the two groups.
1. A research hypothesis and a null hypothesis are drawn up, stating respectively that (a) a significant difference will be found between the two groups, and (b) no significant difference will be found between them.
e.g. Alternative hypothesis (H1): Employees who have taken the program will have higher job scores.
Null hypothesis (H0): There is no difference in scores between employees who have taken the program and employees who have not.
2. The mean and standard deviation of each group are calculated.
3. The standard error of each group is calculated.
4. The aggregate standard error for both groups together is calculated.
5. With all of those calculations in hand, the t-score can be worked out. Suppose the resulting t-score is -.95 and the t-table gives a probability of more than .1 for it. We can then conclude that there is more than 1 chance in 10 that a difference of this size would appear even if there were no real difference between the two groups.
In other words, the program cannot be shown to have had a significant impact.
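Steps 2 to 5 can be sketched in Python as follows. This is a rough illustration, not Meier's worked example: the job scores are invented, the function name difference_of_means_t is hypothetical, and the t-table lookup is left out because it depends on the degrees of freedom.

```python
import math
import statistics


def difference_of_means_t(group_a, group_b):
    """t-score for two independent groups, using a separate standard error per group."""
    # Step 2: mean and (sample) standard deviation of each group.
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    sd_a, sd_b = statistics.stdev(group_a), statistics.stdev(group_b)
    # Step 3: standard error of each group.
    se_a = sd_a / math.sqrt(len(group_a))
    se_b = sd_b / math.sqrt(len(group_b))
    # Step 4: aggregate standard error for the difference between the groups.
    se_overall = math.sqrt(se_a ** 2 + se_b ** 2)
    # Step 5: t-score for the difference of means.
    t_score = (mean_a - mean_b) / se_overall
    return t_score, se_overall


# Hypothetical job scores: employees who took the program vs. those who did not.
took_program = [78, 82, 85, 88, 92]
no_program = [75, 80, 84, 86, 90]
t_score, se_overall = difference_of_means_t(took_program, no_program)
```

The resulting t-score would then be looked up in a t-table to find its probability.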
Understanding the three major difference of means tests
The test above is used to determine differences when the two groups are independent, i.e. not paired in any way (e.g. a contrast between a tax return filed by a male in Nevada and a tax return filed by a female in Chicago). Two other tests are also used: one for independent samples with equal variances, and one for dependent samples (when the two groups are paired, e.g. the same group being tested on a pre-test and a post-test).
Variations between t-tests are the following:
a. T-test on independent samples that have unequal variances. The problem here is determining the sampling error in order to judge accurately whether the two samples really are different. This test is also used when the number of cases in each sample is different or when the number of cases in one or both of the samples is small.
Steps:
1. Hypothesis is formulated. Means and SD for groups are worked out
2. Standard error for each group is calculated as is overall standard error
3. The t-score for the difference of means is calculated
4. The t-score is looked up and probability result applied to hypothesis.
b. T-test for independent samples with equal variances.
Because you want to avoid a Type I error, it is important to make sure that both groups really do have equal variances. To do so, perform a Levene test. Alternatively, unless one is absolutely certain that both groups have equal variances, the test for unequal variances can be used instead.
The only difference in calculation from the unequal-variances test is that, instead of a standard error being worked out for each group, a single overall (pooled) standard error is calculated. The t-score is then calculated and looked up to assess the probability.
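A minimal sketch of the equal-variances calculation follows, assuming the usual pooled-variance formula. The data and the function name pooled_t are invented for illustration, and the Levene test itself is not shown.

```python
import math
import statistics


def pooled_t(group_a, group_b):
    """t-score for two independent groups, assuming equal variances (pooled)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance of group A
    var_b = statistics.variance(group_b)  # sample variance of group B
    # One pooled variance, weighting each group by its degrees of freedom.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    # Single overall standard error built from the pooled variance.
    se_pooled = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t_score = (statistics.mean(group_a) - statistics.mean(group_b)) / se_pooled
    degrees_of_freedom = n_a + n_b - 2
    return t_score, degrees_of_freedom


# Hypothetical scores; the groups deliberately differ in size.
group_one = [10, 12, 14, 16, 18]
group_two = [9, 11, 13]
pooled_t_score, pooled_df = pooled_t(group_one, group_two)
```

The t-score and degrees of freedom together determine which row of the t-table to consult.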
c. T-test with dependent samples
In this case the items are paired (e.g. with a pre-test and a post-test), so the difference between each individual's pre- and post-test scores is calculated. The mean, SD, and SE of those differences are then calculated. The t-score is calculated and its probability assessed in order to see, for instance, whether a difference has been found between pre- and post-test.
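The paired calculation can be sketched as below. The pre- and post-test scores are invented and the function name paired_t is hypothetical. The probability lookup is again left to a t-table.

```python
import math
import statistics


def paired_t(pre_scores, post_scores):
    """t-score for dependent (paired) samples, computed on the differences."""
    # Difference for each individual: post-test minus pre-test.
    differences = [post - pre for pre, post in zip(pre_scores, post_scores)]
    mean_diff = statistics.mean(differences)
    sd_diff = statistics.stdev(differences)
    se_diff = sd_diff / math.sqrt(len(differences))
    t_score = mean_diff / se_diff
    degrees_of_freedom = len(differences) - 1
    return t_score, degrees_of_freedom


# Hypothetical pre- and post-test scores for the same five individuals.
pre_test = [60, 65, 70, 75, 80]
post_test = [64, 66, 75, 78, 82]
paired_t_score, paired_df = paired_t(pre_test, post_test)
```

Because each individual serves as their own comparison, only the five differences enter the calculation, not the two sets of raw scores.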
Proportions
The t-test can also be used to investigate whether there are differences between two sample proportions.
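One common way to do this is sketched below. It is an assumption, since the text does not give the formula: the standard error of each proportion is taken as the square root of p(1 - p)/n, and the two are combined as in the independent-samples test. The figures and the function name proportions_t are invented.

```python
import math


def proportions_t(p_a, n_a, p_b, n_b):
    """t-score for the difference between two sample proportions (assumed formula)."""
    # Standard error of each sample proportion.
    se_a = math.sqrt(p_a * (1 - p_a) / n_a)
    se_b = math.sqrt(p_b * (1 - p_b) / n_b)
    # Overall standard error for the difference between the proportions.
    se_overall = math.sqrt(se_a ** 2 + se_b ** 2)
    return (p_a - p_b) / se_overall


# Hypothetical samples: 60% of 100 people in one group, 50% of 100 in another.
proportion_t_score = proportions_t(0.60, 100, 0.50, 100)
```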