A fourth method for finding reliability, also utilizing a single administration of a single form, is based on the consistency of responses to all items in the test. Such a formula was developed by Kuder and Richardson (1937). Rather than requiring two half-scores, this technique is based on an examination of performance on each item. Of the various formulas developed in the original article, the most widely applicable, commonly known as Kuder-Richardson formula 20, is the following:
$$r_{\text{KR20}} = \frac{n}{n-1}\left(1 - \frac{\sum p_i q_i}{s^2}\right) \qquad \text{(h)}$$
where $n$ is the number of items on the test, $s^2$ is the variance of the total test scores, $p_i$ is the proportion of persons getting the $i$-th item correct, and $q_i = 1 - p_i$ is the proportion getting it wrong. The product of $p_i$ and $q_i$ is computed for each item, and these products are then summed over all $n$ items to give $\sum p_i q_i$.
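The computation described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `kr20` and the response data are hypothetical, and the population variance is assumed for the total scores so that it is on the same footing as the item terms $p_i q_i$.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson formula 20 for a persons-by-items table of 0/1 scores."""
    n_persons = len(responses)
    n_items = len(responses[0])

    # Variance of the total test scores (population form, an assumption here).
    totals = [sum(row) for row in responses]
    total_var = pvariance(totals)

    # Sum of p_i * q_i over all items, where p_i is the proportion correct.
    sum_pq = 0.0
    for i in range(n_items):
        p = sum(row[i] for row in responses) / n_persons
        sum_pq += p * (1 - p)

    return (n_items / (n_items - 1)) * (1 - sum_pq / total_var)


# Hypothetical data: 6 examinees (rows) answering 4 items (columns).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]

print(round(kr20(scores), 3))  # 0.833
```

For these data $\sum p_i q_i = 30/36$ and the total-score variance is $80/36$, so the coefficient is $(4/3)(1 - 0.375) = 0.833$.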
Under the assumption that all items are of equal difficulty, i.e., $p_i$ is the same for all items, Kuder and Richardson gave an alternative to the above expression in the following form:
$$r_{\text{KR21}} = \frac{n}{n-1}\left(1 - \frac{\bar{x}\,(n - \bar{x})}{n s^2}\right) \qquad \text{(i)}$$
where $\bar{x}$, which equals $np$, is the mean of the total scores. This formula is known as Kuder-Richardson formula 21.
The coefficient computed from formula (i) is a more conservative estimate of reliability than that obtained from formula (h). Formula (i), which was derived from (h) by making $p_i$ the same for all items, is simpler from a computational standpoint because the $n$ products $p_i q_i$ do not have to be calculated.
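The step from (h) to (i), and the reason (i) is the more conservative of the two, can be written out as a short sketch. If every item has the same difficulty $p$, then $\bar{x} = np$ and

$$\sum_{i=1}^{n} p_i q_i = n p (1 - p) = \bar{x}\left(1 - \frac{\bar{x}}{n}\right) = \frac{\bar{x}\,(n - \bar{x})}{n},$$

so dividing by $s^2$ gives the term $\bar{x}(n - \bar{x}) / (n s^2)$ that appears in formula 21. When the difficulties differ, write $\bar{p}$ for their mean and $\sigma_p^2$ for their population variance (symbols introduced here only for illustration). Then

$$\sum_{i=1}^{n} p_i q_i = n\bar{p}(1 - \bar{p}) - n\sigma_p^2 \le n\bar{p}(1 - \bar{p}),$$

which means formula (i) subtracts a term at least as large as the one in formula (h). The K-R 21 coefficient therefore cannot exceed the K-R 20 coefficient, and the two agree only when all items are equally difficult.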
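To illustrate the application of formula (i), assume that a test containing 70 items has a mean of 50 and a variance of 100.
Then, on applying (i), the estimated reliability of this test is:
$$r_{\text{KR21}} = \frac{70}{69}\left(1 - \frac{50\,(70 - 50)}{70 \times 100}\right) = \frac{70}{69} \times \frac{6}{7} \approx 0.87$$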
The K-R 21 formula is quick and easy to calculate, and when used appropriately, it can be extremely useful in determining a test’s overall reliability.
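Because formula 21 needs only the number of items, the mean, and the variance, the worked example above (70 items, mean 50, variance 100) can be checked with a very small function. The name `kr21` and its parameter names are assumptions made for this sketch.

```python
def kr21(n_items, mean, variance):
    """Kuder-Richardson formula 21 from summary statistics of the total scores."""
    correction = n_items / (n_items - 1)
    # mean * (n - mean) / n stands in for the sum of p_i * q_i
    # under the equal-difficulty assumption.
    item_term = mean * (n_items - mean) / (n_items * variance)
    return correction * (1 - item_term)


r = kr21(n_items=70, mean=50, variance=100)
print(round(r, 2))  # 0.87
```

The exact value is $60/69$, which rounds to 0.87.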
Unlike the split-half method, which splits the test just once, the K-R 21 in effect estimates the reliability of a test that has been split into all possible halves, and it needs no Spearman-Brown type adjustment to correct for the splits.
To use the K-R 21, the following criteria should be kept in mind:
- The entire test should be aimed at tapping a single domain. If the test is not clearly focused on a single underlying concept, the reliability value will be underestimated.
- The test is scored on the basis of each item being either right or wrong.
- All items have about the same degree of difficulty. The formula works best (produces its highest reliability estimate) when the difficulty index is approximately 0.50 for each item.
Nowadays, most researchers use a test of internal reliability known as Cronbach’s alpha (Cronbach, 1951). It essentially calculates the average of all possible split-half reliability coefficients. A computed alpha coefficient will normally vary between ‘1’ (implying perfect internal reliability) and ‘0’ (implying no internal reliability), although negative values can arise when items covary negatively.
A figure of 0.80 is typically employed as a rule of thumb to denote an acceptable level of internal reliability, though many writers work with a slightly lower figure.
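The statement that alpha averages all possible split-half coefficients can be checked numerically. The sketch below rests on several assumptions that are not in the text: the split-half coefficient is taken in its Flanagan-Rulon form, $2\,(1 - (s_a^2 + s_b^2)/s_t^2)$, the halves have equal size (so the number of items must be even), population variances are used throughout, and alpha is computed from the usual item-variance formula. All function names and the rating data are hypothetical.

```python
from itertools import combinations
from statistics import pvariance


def cronbach_alpha(rows):
    """Alpha from item variances and total-score variance."""
    n = len(rows[0])
    item_vars = [pvariance([r[i] for r in rows]) for i in range(n)]
    total_var = pvariance([sum(r) for r in rows])
    return (n / (n - 1)) * (1 - sum(item_vars) / total_var)


def split_half(rows, half_a):
    """Flanagan-Rulon split-half coefficient for one split of the items."""
    n = len(rows[0])
    half_b = [i for i in range(n) if i not in half_a]
    a = [sum(r[i] for i in half_a) for r in rows]
    b = [sum(r[i] for i in half_b) for r in rows]
    total = [x + y for x, y in zip(a, b)]
    return 2 * (1 - (pvariance(a) + pvariance(b)) / pvariance(total))


def mean_split_half(rows):
    """Average the split-half coefficient over every equal-sized split."""
    n = len(rows[0])  # assumed even
    # Keep item 0 in the first half so that each split is counted once.
    splits = [(0,) + rest for rest in combinations(range(1, n), n // 2 - 1)]
    return sum(split_half(rows, s) for s in splits) / len(splits)


# Hypothetical ratings: 5 respondents (rows) on 4 items (columns).
ratings = [
    [4, 5, 3, 4],
    [2, 3, 2, 2],
    [5, 4, 4, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]

print(cronbach_alpha(ratings))
print(mean_split_half(ratings))  # agrees with alpha up to rounding error
```

With four items there are only three distinct splits into halves. The number grows quickly with test length, which is one reason the closed-form coefficient is preferred in practice.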
The coefficient is defined as:
$$\alpha = \frac{n}{n-1}\left(1 - \frac{\sum s_i^2}{s_t^2}\right)$$
where $n$ is the number of items, $s_i^2$ is the variance of scores on item $i$, and $s_t^2$ is the variance of the total test scores. Although the Kuder-Richardson formulas are applicable only when test items are scored “0” (wrong) or “1” (right), Cronbach’s alpha is a general formula for estimating the reliability of a test consisting of items on which two or more scoring weights are assigned to answers.
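A minimal sketch of the alpha computation follows, assuming population variances and a persons-by-items table of scores. The function name and both data sets are hypothetical. On items scored 0 or 1 the item variance equals $p_i q_i$, so alpha then takes the same value as Kuder-Richardson formula 20, while the same function also accepts multi-point items.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """Cronbach's alpha for a persons-by-items table of item scores."""
    n = len(rows[0])
    # Variance of each item across persons, then variance of total scores.
    item_vars = [pvariance([r[i] for r in rows]) for i in range(n)]
    total_var = pvariance([sum(r) for r in rows])
    return (n / (n - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical right/wrong data: 6 examinees, 4 items.
binary = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]

# Hypothetical 5-point ratings: 5 respondents, 4 items.
likert = [
    [4, 5, 3, 4],
    [2, 3, 2, 2],
    [5, 4, 4, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]

print(round(cronbach_alpha(binary), 3))  # 0.833, the K-R 20 value for these data
print(cronbach_alpha(likert))
```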
Although Cronbach’s alpha has the advantage of identifying which items are or are not contributing to the overall reliability, the disadvantage is that every item has to be individually assessed for variability.
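One common way to see which items contribute is to recompute alpha with each item left out in turn. The sketch below illustrates that idea under stated assumptions (population variances, hypothetical function names and data) and is not presented as the procedure the cited authors prescribe. An item whose removal raises alpha is a candidate for revision.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """Cronbach's alpha for a persons-by-items table of item scores."""
    n = len(rows[0])
    item_vars = [pvariance([r[i] for r in rows]) for i in range(n)]
    total_var = pvariance([sum(r) for r in rows])
    return (n / (n - 1)) * (1 - sum(item_vars) / total_var)


def alpha_if_deleted(rows):
    """Return {item index: alpha recomputed without that item}."""
    n = len(rows[0])
    result = {}
    for drop in range(n):
        reduced = [[v for i, v in enumerate(r) if i != drop] for r in rows]
        result[drop] = cronbach_alpha(reduced)
    return result


# Hypothetical ratings: the last item does not follow the pattern of the others.
ratings = [
    [4, 5, 4, 3],
    [2, 2, 3, 4],
    [5, 4, 5, 2],
    [1, 2, 1, 3],
    [3, 3, 3, 4],
]

full = cronbach_alpha(ratings)
print("all items:", round(full, 3))
for item, value in alpha_if_deleted(ratings).items():
    flag = "  <- alpha rises without this item" if value > full else ""
    print("without item", item, round(value, 3), flag)
```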