R: Alternative estimates of test reliabiity

splitHalf {psych}

R Documentation

Alternative estimates of test reliabiity

Description

Eight alternative estimates of test reliability include the six discussed by Guttman (1945), four discussed by ten Berge and Zergers (1978) (\mu_0 \dots \mu_3) as well as \beta (the worst split half, Revelle, 1979), the glb (greatest lowest bound) discussed by Bentler and Woodward (1980), and \omega_h and \omega_t (McDonald, 1999; Zinbarg et al., 2005). Greatest and lowest split-half values are found by brute force or sampling.

Usage

splitHalf(r,raw=FALSE,brute=FALSE,n.sample=15000,covar=FALSE,check.keys=TRUE,
           key=NULL,ci=.05,use="pairwise")
guttman(r,key=NULL) 
tenberge(r)
glb(r,key=NULL)
glb.fa(r,key=NULL)

Arguments

`r`	A correlation or covariance matrix or raw data matrix.
`raw`	return a vector of split half reliabilities
`brute`	Use brute force to try all combinations of n take n/2. Be careful. For e.g., n=24, this is 1,352,078 possible splits!
`n.sample`	If brute is false, how many samples of split halves should be tried? (16 items takes 12,780)
`covar`	Should the covariances or correlations be used for reliability calculations
`check.keys`	If TRUE, any item with a negative loading on the first factor will be flipped in sign
`key`	a vector of -1, 0, 1 to select or reverse key items. See `scoreItems` for an example of keying. If the key vector is less than the number of variables, then item numbers to be reverse can be specified.
`use`	Should we find the correlations using "pairwise" or "complete" (see ?cor)
`ci`	The alpha level to use for the confidence intervals of the split half estimates

Details

Surprisingly, more than a century after Spearman (1904) introduced the concept of reliability to psychologists, there are still multiple approaches for measuring it. Although very popular, Cronbach's \alpha (1951) underestimates the reliability of a test and over estimates the first factor saturation. Using splitHalf for tests with 16 or fewer items, all possible splits may be found fairly easily. For tests with 17 or more items, n.sample splits are randomly found. Thus, for 16 or fewer items, the upper and lower bounds are precise. For 17 or more items, they are close but will probably slightly underestimate the highest and overestimate the lowest reliabilities.

The guttman function includes the six estimates discussed by Guttman (1945), four of ten Berge and Zergers (1978), as well as Revelle's \beta (1979) using splitHalf. The companion function, omega calculates omega hierarchical (\omega_h) and omega total (\omega_t).

Guttman's first estimate \lambda_1 assumes that all the variance of an item is error:

\lambda_1 = 1 - \frac{tr(\vec{V_x})}{V_x} = \frac{V_x - tr(\vec{V}_x)}{V_x}

This is a clear underestimate.

The second bound, \lambda_2, replaces the diagonal with a function of the square root of the sums of squares of the off diagonal elements. Let C_2 = \vec{1}( \vec{V}-diag(\vec{V})^2 \vec{1}' , then

\lambda_2 = \lambda_1 + \frac{\sqrt{\frac{n}{n-1}C_2}}{V_x} = \frac{V_x - tr(\vec{V}_x) + \sqrt{\frac{n}{n-1}C_2} }{V_x}

Effectively, this is replacing the diagonal with n * the square root of the average squared off diagonal element.

Guttman's 3rd lower bound, \lambda_3, also modifies \lambda_1 and estimates the true variance of each item as the average covariance between items and is, of course, the same as Cronbach's \alpha.

\lambda_3 = \lambda_1 + \frac{\frac{V_X - tr(\vec{V}_X)}{n (n-1)}}{V_X} = \frac{n \lambda_1}{n-1} = \frac{n}{n-1}\Bigl(1 - \frac{tr(\vec{V})_x}{V_x}\Bigr) = \frac{n}{n-1} \frac{V_x - tr(\vec{V}_x)}{V_x} = \alpha

This is just replacing the diagonal elements with the average off diagonal elements. \lambda_2 \geq \lambda_3 with \lambda_2 > \lambda_3 if the covariances are not identical.

\lambda_3 and \lambda_2 are both corrections to \lambda_1 and this correction may be generalized as an infinite set of successive improvements. (Ten Berge and Zegers, 1978)

\mu_r = \frac{1}{V_x} \bigl( p_o + (p_1 + (p_2 + \dots (p_{r-1} +( p_r)^{1/2})^{1/2} \dots )^{1/2})^{1/2} \bigr), r = 0, 1, 2, \dots

where

p_h = \sum_{i\ne j}\sigma_{ij}^{2h}, h = 0, 1, 2, \dots r-1

and

p_h = \frac{n}{n-1}\sigma_{ij}^{2h}, h = r

tenberge and Zegers (1978). Clearly \mu_0 = \lambda_3 = \alpha and \mu_1 = \lambda_2. \mu_r \geq \mu_{r-1} \geq \dots \mu_1 \geq \mu_0, although the series does not improve much after the first two steps.

Guttman's fourth lower bound, \lambda_4 was originally proposed as any spit half reliability but has been interpreted as the greatest split half reliability. If \vec{X} is split into two parts, \vec{X}_a and \vec{X}_b, with correlation r_{ab} then

\lambda_4 = 2\Bigl(1 - \frac{V_{X_a} + V_{X_b}}{V_X} \Bigr) = \frac{4 r_{ab}}{V_x} = \frac{4 r_{ab}}{V_{X_a} + V_{X_b}+ 2r_{ab}V_{X_a} V_{X_b}}

which is just the normal split half reliability, but in this case, of the most similar splits. For 16 or fewer items, this is found by trying all possible splits. For 17 or more items, this is estimated by taking n.sample random splits.

\lambda_5, Guttman's fifth lower bound, replaces the diagonal values with twice the square root of the maximum (across items) of the sums of squared interitem covariances

\lambda_5 = \lambda_1 + \frac{2 \sqrt{\bar{C_2}}}{V_X}.

Although superior to \lambda_1, \lambda_5 underestimates the correction to the diagonal. A better estimate would be analogous to the correction used in \lambda_3:

\lambda_{5+} = \lambda_1 + \frac{n}{n-1}\frac{2 \sqrt{\bar{C_2}}}{V_X}.

\lambda_6,Guttman's final bound considers the amount of variance in each item that can be accounted for the linear regression of all of the other items (the squared multiple correlation or smc), or more precisely, the variance of the errors, e_j^2, and is

\lambda_6 = 1 - \frac{\sum e_j^2}{V_x} = 1 - \frac{\sum(1-r_{smc}^2)}{V_x}

The smc is found from all the items. A modification to Guttman \lambda_6, \lambda_6* reported by the score.items function is to find the smc from the entire pool of items given, not just the items on the selected scale.

Guttman's \lambda_4 is the greatest split half reliability. Although originally found here by combining the output from three different approaches,this has now been replaced by using splitHalf to find the maximum value by brute force (for 16 or fewer items) or by taking a substantial number of random splits.

The algorithms that had been tried before included:

a) Do an ICLUST of the reversed correlation matrix. ICLUST normally forms the most distinct clusters. By reversing the correlations, it will tend to find the most related clusters. Truly a weird approach but tends to work.

b) Alternatively, a kmeans clustering of the correlations (with the diagonal replaced with 0 to make pseudo distances) can produce 2 similar clusters.

c) Clusters identified by assigning items to two clusters based upon their order on the first principal factor. (Highest to cluster 1, next 2 to cluster 2, etc.)

These three procedures will produce keys vectors for assigning items to the two splits. The maximum split half reliability is found by taking the maximum of these three approaches. This is not elegant but is fast.

The brute force and the sampling procedures seem to provide more stable and larger estimates.

Yet another procedure, implemented in splitHalf is actually form all possible (for n items <= 16) or sample 15,000 (or more) split halfs corrected for test length. This function returns the best and worst splits as item keys that can be used for scoring purposes, if desired. Can do up to 24 items in reasonable time, but gets much slower for more than about 24 items. To do all possible splits of 24 items considers 1,352,078 splits. This will give an exact value, but this will not differ that much from random samples. For a 24 item problem with exactly 2 factors (by simulation), the worst split half is much lower than just random sampling would indicate. s

Timings on a MacPro for a 24 item problem with a 2.4 GHz 8 core are .24 secs for the default 10,000 samples, .678 for 30,000 samples and 22.58 sec for all possible. The values of the maximum split for these sample sizes were .799,/.804 and .800 for three replications of the default sample size of 10000, .805 and .806 for two sets of 30,000 and .812 for an exhaustive search.

There are three greatest lower bound functions. One, glb finds the greatest split half reliability, \lambda_4. This considers the test as set of items and examines how best to partition the items into splits. The other two, glb.fa and glb.algebraic, are alternative ways of weighting the diagonal of the matrix.

glb.fa estimates the communalities of the variables from a factor model where the number of factors is the number with positive eigen values. Then reliability is found by

glb = 1 - \frac{\sum e_j^2}{V_x} = 1 - \frac{\sum(1- h^2)}{V_x}

This estimate will differ slightly from that found by glb.algebraic, written by Andreas Moeltner which uses calls to csdp in the Rcsdp package. His algorithm, which more closely matches the description of the glb by Jackson and Woodhouse, seems to have a positive bias (i.e., will over estimate the reliability of some items; they are said to be = 1) for small sample sizes. More exploration of these two algorithms is underway.

Compared to glb.algebraic, glb.fa seems to have less (positive) bias for smallish sample sizes (n < 500) but larger for large (> 1000) sample sizes. This interacts with the number of variables so that equal bias sample size differs as a function of the number of variables. The differences are, however small. As samples sizes grow, glb.algebraic seems to converge on the population value while glb.fa has a positive bias.

Value

`beta`	The worst split half reliability. This is an estimate of the general factor saturation.
`maxrb`	The maximimum split half reliability. This is Guttman's lambda 4
`alpha`	Also known as Guttman's Lambda 3
`ci`	The 2.5%, 50%, and 97.5% values of the raw or sampled split half. Note that it necessary to specify raw=TRUE to get these.
`tenberge$mu1`	tenBerge mu 1 is functionally alpha
`tenberge$mu2`	one of the sequence of estimates mu1 ... mu3
`glb`	glb found from factor analysis

Author(s)

William Revelle

References

Cronbach, L.J. (1951) Coefficient alpha and the internal strucuture of tests. Psychometrika, 16, 297-334.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10 (4), 255-282.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14 (1), 57-74.

Revelle, W. and Condon, D.M. (2019) Reliability from alpha to omega: A tutorial. Psychological Assessment, 31, 12, 1395-1411. DOI: 10.1037/pas0000754. https://osf.io/preprints/psyarxiv/2y3w9 Preprint available from PsyArxiv

Revelle, W. and Zinbarg, R. E. (2009) Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 2009.

Ten Berge, J. M. F., & Zegers, F. E. (1978). A series of lower bounds to the reliability of a test. Psychometrika, 43 (4), 575-579.

Zinbarg, R. E., Revelle, W., Yovel, I., & Li, W. (2005). Cronbach's \alpha , Revelle's \beta , and McDonald's \omega_h ): Their relations with each other and two alternative conceptualizations of reliability. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1007/s11336-003-0974-7")} Psychometrika, 70 (1), 123-133.

Examples

data(attitude)
splitHalf(attitude)
splitHalf(attitude,covar=TRUE) #do it on the covariances
temp <- splitHalf(attitude,raw=TRUE)
temp$ci #to show the confidence intervals, you need to specify that raw=TRUE

glb(attitude)
glb.fa(attitude)
if(require(Rcsdp)) {glb.algebraic(cor(attitude)) }
guttman(attitude)

#to show the histogram of all possible splits for the ability test
#sp <- splitHalf(psychTools::ability,raw=TRUE)  #this saves the results
#hist(sp$raw,breaks=101,ylab="SplitHalf reliability",main="SplitHalf 
#    reliabilities of a test with 16 ability items")
sp <- splitHalf(psychTools::bfi[1:10],key=c(1,9,10))

[Package psych version 1.9.11 ]