ICLUST: Item Cluster Analysis

A common problem in the social sciences is to construct scales or composites of items to measure constructs of theoretical interest and practical importance. This process frequently involves administering a battery of items from which those that meet certain criteria are selected. These criteria might be rational, empirical, or factorial. A similar problem is to analyze the adequacy of scales that already have been formed and to decide whether the putative constructs are measured properly. Both of these problems have been discussed in numerous texts, as well as in myriad articles. Proponents of various methods have argued for the importance of face validity, discriminant validity, construct validity, factorial homogeneity, and theoretical importance.

Revelle (1979) proposed that hierarchical cluster analysis could be used to estimate a new coefficient (beta) that estimates the general factor saturation of a test. More recently, Zinbarg, Revelle, Yovel and Li (2005) compared McDonald's omega to Cronbach's alpha and Revelle's beta and concluded that omega is the best estimate of general factor saturation. An algorithm for estimating omega using R is available as part of the "psych" package at http://personality-project.org/r.
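
To make these quantities concrete: for a composite of n standardized items with correlation matrix R, alpha can be computed directly from the matrix, while beta is the worst split-half reliability, which ICLUST estimates in the course of the hierarchical clustering. A minimal sketch of the alpha computation (std.alpha is a hypothetical helper, not a function in the psych package):

 #standardized alpha from a correlation matrix R: the composite variance is
 #sum(R) and the n unit item variances sum to n, so
 #alpha = (n/(n-1)) * (1 - n/sum(R))
 std.alpha <- function(R) {
    n <- nrow(R)
    (n/(n - 1)) * (1 - n/sum(R))
 }
 std.alpha(Harman74.cor$cov)   #should agree with the $alpha of 0.91 reported below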

Although ICLUST was originally written in FORTRAN for a CDC 6400 and later converted for IBM mainframes, Cooksey and Soutar (2005) report using a PC adaptation of the original FORTRAN program. A modified version, written in Lightspeed Pascal, has been available for the Macintosh for several years, but it lacks some of the graphic features of the original program.

I have now written a completely new version of ICLUST in R. It is still under development but is included as part of the "psych" package. Although early testing suggests it is stable, please let me know if you run into problems, and email me if you would like help with this version of ICLUST or want additional features.

The program currently provides three primary functions: clustering, loadings, and graphics. There is no documentation yet, although most of the requirements can be worked out by reading the code.

Clustering 24 tests of mental ability

A sample output using the 24 variable problem from Harman can be represented both graphically and in terms of the cluster order. Note that the graphic is created with Graphviz, using the dot language: ICLUST.graph produces the dot code for Graphviz. I have also tried doing this with Rgraphviz, with less than wonderful results. The dot code can be viewed directly in Graphviz or tweaked using commercial software packages (e.g., OmniGraffle).

Note that for this problem, with these parameters, the data formed one large cluster. (This is consistent with the Very Simple Structure (VSS) output as well, which shows a clear one-factor solution for complexity-1 data; see the sketch below.) See further below for an example with the same data set but more stringent parameter settings.
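
The VSS comparison can be run directly; a sketch, assuming the VSS function of the current "psych" package (argument names and defaults vary across versions):

 library(psych)
 vss.out <- VSS(Harman74.cor$cov)   #Very Simple Structure fits for 1 to 8 factors
 vss.out                            #the complexity-1 criterion favors one factor for these data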

 
 r.mat <- Harman74.cor$cov
 iq.clus <- ICLUST(r.mat)
 ICLUST.graph(iq.clus,title="ICLUST of 24 mental variables",out.file="iclust.dot")
 
 
 
$title
[1] "ICLUST"

$clusters
      VisualPerception                  Cubes         PaperFormBoard                  Flags     GeneralInformation  PargraphComprehension 
                     1                      1                      1                      1                      1                      1 
    SentenceCompletion     WordClassification            WordMeaning               Addition                   Code           CountingDots 
                     1                      1                      1                      1                      1                      1 
StraightCurvedCapitals        WordRecognition      NumberRecognition      FigureRecognition           ObjectNumber           NumberFigure 
                     1                      1                      1                      1                      1                      1 
            FigureWord              Deduction       NumericalPuzzles       ProblemReasoning       SeriesCompletion     ArithmeticProblems 
                     1                      1                      1                      1                      1                      1 

$corrected
     [,1]
[1,]    1

$loadings
                       [,1]
VisualPerception       0.55
Cubes                  0.34
PaperFormBoard         0.38
Flags                  0.44
GeneralInformation     0.60
PargraphComprehension  0.59
SentenceCompletion     0.57
WordClassification     0.60
WordMeaning            0.59
Addition               0.41
Code                   0.51
CountingDots           0.43
StraightCurvedCapitals 0.55
WordRecognition        0.39
NumberRecognition      0.37
FigureRecognition      0.48
ObjectNumber           0.44
NumberFigure           0.49
FigureWord             0.42
Deduction              0.56
NumericalPuzzles       0.55
ProblemReasoning       0.56
SeriesCompletion       0.63
ArithmeticProblems     0.60

$fit
$fit$clusterfit
[1] 0.76

$fit$factorfit
[1] 0.76


$results
    Item/Cluster Item/Cluster similarity correlation alpha1 alpha2 beta1 beta2 size1 size2 rbar1 rbar2   r1   r2 alpha beta rbar size
C1            V9           V5       1.00        0.72   0.72   0.72  0.72  0.72     1     1  0.72  0.72 0.78 0.78  0.84 0.84 0.72    2
C2           V12          V10       1.00        0.58   0.58   0.58  0.58  0.58     1     1  0.58  0.58 0.66 0.66  0.74 0.74 0.58    2
C3           V18          V17       1.00        0.45   0.45   0.45  0.45  0.45     1     1  0.45  0.45 0.53 0.53  0.62 0.62 0.45    2
C4           V23          V20       1.00        0.51   0.51   0.51  0.51  0.51     1     1  0.51  0.51 0.59 0.59  0.67 0.67 0.51    2
C5           V13          V11       1.00        0.54   0.54   0.54  0.54  0.54     1     1  0.54  0.54 0.61 0.61  0.70 0.70 0.54    2
C6            V7           V6       1.00        0.72   0.72   0.72  0.72  0.72     1     1  0.72  0.72 0.78 0.78  0.84 0.84 0.72    2
C7            V4           V1       0.98        0.47   0.47   0.48  0.47  0.48     1     1  0.47  0.48 0.55 0.55  0.64 0.64 0.47    2
C8           V16          V14       0.98        0.41   0.43   0.41  0.43  0.41     1     1  0.43  0.41 0.50 0.49  0.58 0.58 0.41    2
C9            C1           C6       0.93        0.78   0.84   0.84  0.84  0.84     2     2  0.72  0.72 0.86 0.86  0.90 0.87 0.69    4
C10           C4          V22       0.91        0.56   0.67   0.56  0.67  0.56     2     1  0.51  0.56 0.71 0.62  0.74 0.71 0.49    3
C11          V21          V24       0.86        0.45   0.51   0.53  0.51  0.53     1     1  0.51  0.53 0.56 0.58  0.62 0.62 0.45    2
C12          C10          C11       0.86        0.58   0.74   0.62  0.71  0.62     3     2  0.49  0.45 0.76 0.67  0.79 0.74 0.43    5
C13           C9           V8       0.85        0.64   0.90   0.64  0.87  0.64     4     1  0.69  0.64 0.90 0.69  0.90 0.78 0.64    5
C14           C8          V15       0.84        0.41   0.58   0.41  0.58  0.41     2     1  0.41  0.41 0.61 0.49  0.64 0.59 0.37    3
C15           C2           C5       0.82        0.59   0.74   0.70  0.74  0.70     2     2  0.58  0.54 0.74 0.72  0.79 0.74 0.49    4
C16           V3           C7       0.81        0.41   0.41   0.64  0.41  0.64     1     2  0.41  0.47 0.48 0.65  0.66 0.58 0.39    3
C17          V19           C3       0.80        0.40   0.41   0.62  0.41  0.62     1     2  0.41  0.45 0.48 0.64  0.65 0.57 0.38    3
C18          C12          C16       0.78        0.57   0.79   0.66  0.74  0.58     5     3  0.43  0.39 0.79 0.68  0.82 0.72 0.37    8
C19          C17          C14       0.76        0.49   0.64   0.64  0.57  0.59     3     3  0.38  0.37 0.66 0.65  0.74 0.65 0.32    6
C20          C18          C19       0.76        0.59   0.82   0.74  0.72  0.65     8     6  0.37  0.32 0.81 0.73  0.86 0.74 0.30   14
C21          C20          C13       0.71        0.62   0.86   0.90  0.74  0.78    14     5  0.30  0.64 0.85 0.78  0.90 0.77 0.32   19
C22          C15          C21       0.65        0.55   0.79   0.90  0.74  0.77     4    19  0.49  0.32 0.65 0.89  0.91 0.71 0.31   23
C23          C22           V2       0.62        0.35   0.91   0.35  0.71  0.35    23     1  0.31  0.35 0.91 0.37  0.91 0.52 0.30   24

$cor
     [,1]
[1,]    1

$alpha
[1] 0.91

$size
[1] 24 


  

Graphviz (or OmniGraffle) can open the dot file (iclust.dot) and will show the figure. The dot file can also be edited with an ordinary text editor to make the figure cleaner.
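
If Graphviz is installed, the dot code can also be rendered without opening the GUI, either from the shell or from within R; for example (assuming the dot executable is on the search path):

 #render the dot code produced by ICLUST.graph into a PNG file
 system("dot -Tpng iclust.dot -o iclust.png")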

24 Mental tests -- more stringent clustering rules

The previous solution showed that all 24 mental tests formed one high-level cluster (g). An alternative is to form smaller, tighter clusters. These were, of course, formed along the way in the previous analysis, but it is useful to find the correlations, loadings, and fit statistics for a multiple-cluster solution.

Note that the cluster fit, and the more stringent factor fit, are worse for this solution than for the previous one. Also note that one of the "clusters" is a single variable (Cubes).


r.mat <- Harman74.cor$cov
iq.clus <- ICLUST(r.mat,nclusters=4)
graph.out <- file.choose()
ICLUST.graph(iq.clus,graph.out,title = "ICLUST of 24 mental variables")
iq.clus

$title
[1] "ICLUST"

$clusters
                       V2 C13 C15 C20
VisualPerception        0   0   0   1
Cubes                   1   0   0   0
PaperFormBoard          0   0   0   1
Flags                   0   0   0   1
GeneralInformation      0   1   0   0
PargraphComprehension   0   1   0   0
SentenceCompletion      0   1   0   0
WordClassification      0   1   0   0
WordMeaning             0   1   0   0
Addition                0   0   1   0
Code                    0   0   1   0
CountingDots            0   0   1   0
StraightCurvedCapitals  0   0   1   0
WordRecognition         0   0   0   1
NumberRecognition       0   0   0   1
FigureRecognition       0   0   0   1
ObjectNumber            0   0   0   1
NumberFigure            0   0   0   1
FigureWord              0   0   0   1
Deduction               0   0   0   1
NumericalPuzzles        0   0   0   1
ProblemReasoning        0   0   0   1
SeriesCompletion        0   0   0   1
ArithmeticProblems      0   0   0   1

$corrected
      V2  C13  C15  C20
V2  1.00 0.26 0.21 0.41
C13 0.24 0.90 0.47 0.71
C15 0.19 0.40 0.79 0.67
C20 0.38 0.62 0.55 0.86

$loadings
                         V2  C13  C15  C20
VisualPerception       0.32 0.38 0.39 0.57
Cubes                  1.00 0.24 0.19 0.38
PaperFormBoard         0.32 0.31 0.15 0.42
Flags                  0.23 0.38 0.22 0.45
GeneralInformation     0.28 0.77 0.39 0.50
PargraphComprehension  0.23 0.77 0.31 0.53
SentenceCompletion     0.16 0.80 0.32 0.49
WordClassification     0.16 0.67 0.40 0.56
WordMeaning            0.20 0.79 0.27 0.54
Addition               0.06 0.29 0.63 0.35
Code                   0.15 0.36 0.61 0.48
CountingDots           0.14 0.21 0.65 0.40
StraightCurvedCapitals 0.24 0.40 0.62 0.51
WordRecognition        0.10 0.31 0.27 0.41
NumberRecognition      0.13 0.26 0.22 0.41
FigureRecognition      0.27 0.28 0.27 0.56
ObjectNumber           0.00 0.29 0.36 0.47
NumberFigure           0.26 0.25 0.43 0.54
FigureWord             0.11 0.29 0.27 0.46
Deduction              0.29 0.51 0.27 0.58
NumericalPuzzles       0.31 0.36 0.50 0.54
ProblemReasoning       0.23 0.49 0.30 0.58
SeriesCompletion       0.35 0.54 0.40 0.62
ArithmeticProblems     0.21 0.50 0.54 0.54

$fit
$fit$clusterfit
[1] 0.51

$fit$factorfit
[1] 0.16


$results
    Item/Cluster Item/Cluster similarity correlation alpha1 alpha2 beta1 beta2 size1 size2 rbar1 rbar2   r1   r2 alpha beta rbar size
C1            V9           V5       1.00        0.72   0.72   0.72  0.72  0.72     1     1  0.72  0.72 0.78 0.78  0.84 0.84 0.72    2
C2           V12          V10       1.00        0.58   0.58   0.58  0.58  0.58     1     1  0.58  0.58 0.66 0.66  0.74 0.74 0.58    2
C3           V18          V17       1.00        0.45   0.45   0.45  0.45  0.45     1     1  0.45  0.45 0.53 0.53  0.62 0.62 0.45    2
C4           V23          V20       1.00        0.51   0.51   0.51  0.51  0.51     1     1  0.51  0.51 0.59 0.59  0.67 0.67 0.51    2
C5           V13          V11       1.00        0.54   0.54   0.54  0.54  0.54     1     1  0.54  0.54 0.61 0.61  0.70 0.70 0.54    2
C6            V7           V6       1.00        0.72   0.72   0.72  0.72  0.72     1     1  0.72  0.72 0.78 0.78  0.84 0.84 0.72    2
C7            V4           V1       0.98        0.47   0.47   0.48  0.47  0.48     1     1  0.47  0.48 0.55 0.55  0.64 0.64 0.47    2
C8           V16          V14       0.98        0.41   0.43   0.41  0.43  0.41     1     1  0.43  0.41 0.50 0.49  0.58 0.58 0.41    2
C9            C1           C6       0.93        0.78   0.84   0.84  0.84  0.84     2     2  0.72  0.72 0.86 0.86  0.90 0.87 0.69    4
C10           C4          V22       0.91        0.56   0.67   0.56  0.67  0.56     2     1  0.51  0.56 0.71 0.62  0.74 0.71 0.49    3
C11          V21          V24       0.86        0.45   0.51   0.53  0.51  0.53     1     1  0.51  0.53 0.56 0.58  0.62 0.62 0.45    2
C12          C10          C11       0.86        0.58   0.74   0.62  0.71  0.62     3     2  0.49  0.45 0.76 0.67  0.79 0.74 0.43    5
C13           C9           V8       0.85        0.64   0.90   0.64  0.87  0.64     4     1  0.69  0.64 0.90 0.69  0.90 0.78 0.64    5
C14           C8          V15       0.84        0.41   0.58   0.41  0.58  0.41     2     1  0.41  0.41 0.61 0.49  0.64 0.59 0.37    3
C15           C2           C5       0.82        0.59   0.74   0.70  0.74  0.70     2     2  0.58  0.54 0.74 0.72  0.79 0.74 0.49    4
C16           V3           C7       0.81        0.41   0.41   0.64  0.41  0.64     1     2  0.41  0.47 0.48 0.65  0.66 0.58 0.39    3
C17          V19           C3       0.80        0.40   0.41   0.62  0.41  0.62     1     2  0.41  0.45 0.48 0.64  0.65 0.57 0.38    3
C18          C12          C16       0.78        0.57   0.79   0.66  0.74  0.58     5     3  0.43  0.39 0.79 0.68  0.82 0.72 0.37    8
C19          C17          C14       0.76        0.49   0.64   0.64  0.57  0.59     3     3  0.38  0.37 0.66 0.65  0.74 0.65 0.32    6
C20          C18          C19       0.76        0.59   0.82   0.74  0.72  0.65     8     6  0.37  0.32 0.81 0.73  0.86 0.74 0.30   14
C21            0            0       0.00        0.00   0.00   0.00  0.00  0.00     0     0  0.00  0.00 0.00 0.00  0.00 0.00 0.00    0
C22            0            0       0.00        0.00   0.00   0.00  0.00  0.00     0     0  0.00  0.00 0.00 0.00  0.00 0.00 0.00    0
C23            0            0       0.00        0.00   0.00   0.00  0.00  0.00     0     0  0.00  0.00 0.00 0.00  0.00 0.00 0.00    0

$cor
      V2  C13  C15  C20
V2  1.00 0.24 0.19 0.38
C13 0.24 1.00 0.40 0.62
C15 0.19 0.40 1.00 0.55
C20 0.38 0.62 0.55 1.00

$alpha
  V2  C13  C15  C20 
1.00 0.90 0.79 0.86 

$size
 V2 C13 C15 C20 
  1   5   4  14 
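
The fit statistics of the two solutions can be compared directly from the returned lists; for example (iq.clus1 and iq.clus4 are simply names for the two analyses):

 iq.clus1 <- ICLUST(r.mat)              #the one general cluster solution
 iq.clus4 <- ICLUST(r.mat,nclusters=4)  #the four cluster solution
 iq.clus1$fit$clusterfit                #0.76 in the first analysis
 iq.clus4$fit$clusterfit                #0.51 in this analysis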

ICLUST - R Code


#ICLUST  - a function to form homogeneous item composites
# based upon Revelle, W. (1979). Hierarchical cluster analysis and the internal structure of tests. Multivariate Behavioral Research, 14, 57-74.
#
# pseudo-code
#	find similarity matrix
#		original is either covariance or correlation
#		corrected is disattenuated
#find most similar pair
#if size of pair > min size, apply beta criterion
#	if beta new > min(beta 1, beta 2) combine pair
#update similarity matrix
#repeat until finished
#then report various summary statistics
#example code
#r.mat<- Harman74.cor$cov
# print(ICLUST(r.mat),digits=2)

#ICLUST is the main function and calls other routines

"ICLUST" <- 
 function (r.mat,nclusters=1,alpha=3,beta=1,beta.size=4,alpha.size=3,correct=TRUE,reverse=TRUE,beta.min=.5,output=1,digits=2,labels=NULL,cut=0,n.iterations=0,title="ICLUST") {#should allow for raw data, correlation or covariances
 #ICLUST.options <- list(n.clus=1,alpha=3,beta=2,beta.size=4,alpha.size=3,correct=TRUE,reverse=TRUE,beta.min=.5,output=1,digits=2) 
	ICLUST.options <- list(n.clus=nclusters,alpha=alpha,beta=beta,beta.size=beta.size,alpha.size=alpha.size,correct=correct,reverse=reverse,beta.min=beta.min,output=output,digits=digits) 
	if(dim(r.mat)[1]!=dim(r.mat)[2]) {r.mat <- cor(r.mat,use="pairwise") }    #cluster correlation matrices, find correlations if not square matrix
	iclust.results <- ICLUST.cluster(r.mat,ICLUST.options)
	loading <- cluster.loadings(iclust.results$clusters,r.mat,digits=digits)
	fits <- cluster.fit(r.mat,loading$loadings,iclust.results$clusters,digits=digits)
	sorted <- ICLUST.sort(ic.load=loading$loadings,labels=labels,cut=cut) #sort the loadings
	
	#now, iterate the cluster solution to clean it up (if desired)
	
		clusters <- iclust.results$clusters
		old.clusters <- clusters
		old.fit <- fits$clusterfit
		clusters <- factor2cluster(loading$loadings,cut=cut,loading=FALSE)
		load <- cluster.loadings(clusters,r.mat,digits=digits)
		if (n.iterations > 0) {  #it is possible to iterate the solution to perhaps improve it 
		for (steps in 1:n.iterations) {   #
			load <- cluster.loadings(clusters,r.mat,digits=digits)
			clusters <- factor2cluster(load$loadings,cut=cut,loading=FALSE)   #use the loadings from the current iteration
			if(dim(clusters)[2]!=dim(old.clusters)[2]) {change <- 999 
			load <- cluster.loadings(clusters,r.mat,digits=digits) } else {
			change <- sum(abs(clusters)-abs(old.clusters)) }  #how many items are changing?
			fit <- cluster.fit(r.mat,load$loadings,clusters,digits=digits)
		old.clusters <- clusters
		print(paste("iterations ",steps," change in clusters ", change, "current fit " , fit$clusterfit))
		if ((abs(change) < 1) | (fit$clusterfit <= old.fit)) {break}    #stop iterating if it gets worse or there is no change in cluster definitions
		old.fit <- fit$clusterfit
					}
		}
	p.fit <- cluster.fit(r.mat,load$loadings,clusters,digits=digits)
	p.sorted <- ICLUST.sort(ic.load=load$loadings,labels=labels,cut=cut)
	purified <- cluster.cor(clusters,r.mat,digits=digits)
	list(title=title,clusters=iclust.results$clusters,corrected=loading$corrected,loadings=loading$loadings,fit=fits,results=iclust.results$results,cor=loading$cor,alpha=loading$alpha,size=loading$size,sorted=sorted,p.fit = p.fit,p.sorted = p.sorted,purified=purified)
}   
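
#example: the list returned by ICLUST also contains the sorted and purified solutions
#(a sketch; see the final list() call above for all of the components)
# ic <- ICLUST(Harman74.cor$cov,nclusters=4)
# ic$sorted      #loadings sorted by cluster membership
# ic$purified    #cluster statistics after purifying the item assignments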

"ICLUST.sort"<- function (ic.load,cut=0,labels=NULL,loading=FALSE) {
     nclust <- dim(ic.load)[2]
     nitems <- dim(ic.load)[1]
    if (length(labels)==0) {
    var.labels <- rownames(ic.load)} else {var.labels=labels}
    if (length(var.labels)==0) {var.labels =paste('V',1:nitems,sep='')} #unlabeled variables
    if(loading) {loadings <- ic.load$loadings} else {loadings <- ic.load}
    
   
  loads <- data.frame(item=seq(1:nitems),content=var.labels,cluster=rep(0,nitems),loadings)
 
  #first find the maximum for each row and assign it to that cluster
   loads$cluster <- apply(abs(loadings),1,which.max)
  for (i in 1:nitems) {if (abs(loadings[i,loads$cluster[i]]) < cut) {loads$cluster[i] <- nclust+1}} #assign the ones that missed the cut a location
 
  ord <- sort(loads$cluster,index.return=TRUE)
  loads[1:nitems,] <- loads[ord$ix,]
  rownames(loads)[1:nitems] <- rownames(loads)[ord$ix]
  
  items <- c(table(loads$cluster),1)   #how many items are in each cluster?
  if(length(items) < (nclust+1)) {items <- rep(0,(nclust+1))   #this is a rare case where some clusters don't have anything in them
    for (i in 1:(nclust+1)) {items[i] <- sum(loads$cluster==i) }  }
    
  #now sort the loadings that have their highest loading on each cluster
   first <- 1
	for (i in 1:nclust) {
	if(items[i]>0 ) {
	last <- first + items[i]- 1
	ord <- sort(abs(loads[first:last,i+3]),decreasing=TRUE,index.return=TRUE)
   loads[first:last,] <- loads[ord$ix+first-1,]
    rownames(loads)[first:last] <- rownames(loads)[ord$ix+first-1]
    first <- first + items[i]}
    }
    if (first < nitems) loads[first:nitems,"cluster"] <- 0   #assign items less than cut to 0
 ICLUST.sort <- list(sorted=loads) } 
 
 
 
 

ICLUST.cluster <- function (r.mat,ICLUST.options) {#should allow for raw data, correlation or covariances

#options:  alpha =1  (minimum alpha)  2 (average alpha)    3 (maximum alpha)
#          beta =1   (minimum beta)   2 (average beta)     3 (maximum beta)
#          correct  for reliability
#          reverse score items if negative correlations
#          stop clustering if beta for new clusters < beta.min
#          output =1 (short)    2  (show steps)     3 show rejects as we go

#initialize  various arrays and get ready for the first pass
output <- ICLUST.options$output
num.var <- nrow(r.mat)
keep.clustering <- TRUE          #used to determine when we are finished clustering
results <- data.frame(matrix(rep(0,18*(num.var-1)),ncol=18))
names(results) <- c("Item/Cluster", "Item/Cluster","similarity","correlation","alpha1","alpha2",
"beta1","beta2","size1","size2","rbar1","rbar2","r1","r2","alpha","beta","rbar","size")
rownames(results) <- paste("C",1:(num.var-1),sep="")

clusters <- diag(1,nrow =nrow(r.mat))    #original cluster structure is 1 item clusters
rownames(clusters) <- rownames(r.mat)
colnames(clusters) <- paste("V",1:num.var,sep="")
count=1

#master loop 

while (keep.clustering) {   #loop until we figure out we should stop
#find similarities
 #we will do most of the work on a copy of the r.mat
cluster.stats <- cluster.cor(clusters,r.mat,FALSE)  
sim.mat <- cluster.stats$cor    #the correlation matrix
diag(sim.mat) <- 0   #we don't want 1's on the diagonal to mess up the maximum 

#two ways to estimate reliability -- for 1 item clusters, max correlation, for >1, alpha
#this use of initial max should be an option
if (ICLUST.options$correct) {   #find the largest and smallest similarities for each variable      
	row.range <- apply(sim.mat,1,range,na.rm=TRUE)     
	row.max <- pmax(abs(row.range[1,]),abs(row.range[2,]))  #find the largest absolute similarity
   } else {row.max <- rep(1, nrow(sim.mat)) }         #don't correct for largest similarity
   
   item.rel <-  cluster.stats$alpha
for (i in 1: length(item.rel)) { if (cluster.stats$size[i]<2) {
	item.rel[i] <- row.max[i]
	#figure out item betas here?
	}}
sq.max <- diag(1/sqrt(item.rel))     #used to correct for reliabilities
 #this is the corrected for maximum r similarities
if (ICLUST.options$correct) {sim <- sq.max %*% sim.mat %*% sq.max  
     } else {sim <- sim.mat}
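#note: the matrix product above applies the classic correction for attenuation,
#r(xy)/sqrt(rel(x)*rel(y)), to every pair of clusters at once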
diag(sim) <- NA                  #we need to not consider the diagonal when looking for maxima
#find the most similar pair    and apply tests if we should combine
test.alpha <- FALSE
test.beta <- FALSE

while(!(test.alpha&test.beta)){

max.cell <- which.max(sim)             #global maximum
if (length(max.cell) < 1) {
   keep.clustering <- FALSE
   break}   #there are no non-NA values left
sign.max <- 1
if ( ICLUST.options$reverse ) {            #normal case is to reflect if necessary
	min.cell <- which.min(sim)             #location of global minimum
	if (sim[max.cell] <  abs(sim[min.cell] )) { 
 	  	sign.max <- -1
		max.cell <- min.cell }
	if (sim[max.cell] < 0.0) {sign.max <- -1 }}  
	               #this is a weird case where all the similarities are negative 
	
max.col <- trunc(max.cell/nrow(sim))+1    #convert the linear index into a row and column
max.row <- max.cell - (max.col-1)*nrow(sim) #need to fix the case of first column
if (max.row < 1) {max.row <- nrow(sim)
 max.col <- max.col-1 }

#combine these two rows if the various criterion are passed
beta.combined <-  2* sign.max*sim.mat[max.cell]/(1+sign.max* sim.mat[max.cell])   #unweighted beta
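#note: this is the Spearman-Brown formula for doubling the length of a test:
#two subclusters that correlate r form a composite whose (worst) split-half
#reliability, beta, is 2r/(1+r)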

size1 <- cluster.stats$size[max.row]
if(size1 < 2) {V1 <- 1
	beta1 <-  item.rel[max.row]
	alpha1 <-  item.rel[max.row]
	rbar1 <- item.rel[max.row] } else {
	rbar1 <- results[cluster.names[max.row],"rbar"]
	beta1 <- results[cluster.names[max.row],"beta"]
	alpha1 <- results[cluster.names[max.row],"alpha"]}
	
 V1 <- size1 + size1*(size1-1) * rbar1
 
size2 <- cluster.stats$size[max.col]
if(size2 < 2) {V2 <- 1 
	beta2 <-  item.rel[max.col] 
	alpha2 <-  item.rel[max.col] 
	rbar2 <- item.rel[max.col] } else {
	rbar2 <- results[cluster.names[max.col],"rbar"]
	beta2 <- results[cluster.names[max.col],"beta"]
	alpha2 <- results[cluster.names[max.col],"alpha"]}

 V2 <- size2 + size2 * (size2-1) * rbar2
	
Cov12 <- sign.max* sim.mat[max.cell] * sqrt(V1*V2)
V12 <- V1 + V2 + 2 * Cov12
size12 <- size1 + size2
alpha <- (V12 - size12)*(size12/(size12-1))/V12
rbar <- alpha/(size12-alpha*(size12-1))
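#note: alpha here is the standardized alpha (in a correlation metric the item
#variances sum to size12), and rbar back-solves the Spearman-Brown relation
#alpha = n*rbar/(1+(n-1)*rbar) for the average inter-item correlation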

#what is the correlation of this new cluster with the two subclusters?
#this considers item overlap problems
c1 <-  sign.max*rbar1*size1*size1 + sign.max* Cov12   #corrects for item overlap
c2 <-  rbar2*size2*size2 + Cov12     #only flip one of the two correlations with the combined cluster
if(size1 > size2) {r1 <- c1/sqrt(V1*V12)
r2 <- sign.max* c2/sqrt(V2*V12) } else {r1 <-sign.max* c1/sqrt(V1*V12)    
                                     #flip the smaller of the two clusters
r2 <-  c2/sqrt(V2*V12) }

#test if we should combine these two clusters  
#first, does alpha increase?
test.alpha <- TRUE

if (ICLUST.options$alpha.size < min(size1,size2)) {
  switch(ICLUST.options$alpha, {if (alpha < min(alpha1,alpha2)) {if (output>2) {print(
  paste ('do not combine ', cluster.names[max.row],"with", cluster.names[max.col],
  'new alpha =', round (alpha,2),'old alpha1 =',round( alpha1,2),"old alpha2 =",round(alpha2,2)))}
												test.alpha <- FALSE }},
  {if (alpha < mean(c(alpha1,alpha2))) {if (output>2) {print(paste ('do not combine ',
  cluster.names[max.row],"with", cluster.names[max.col],'new alpha =', round (alpha,2),
  'old alpha1 =',round( alpha1,2),"old alpha2 =",round(alpha2,2)))}
												test.alpha <- FALSE  }},
  {if (alpha < max(alpha1,alpha2)) {if (output>2) {print(paste ('do not combine ',
  cluster.names[max.row],"with", cluster.names[max.col],'new alpha =', round (alpha,2),
  'old alpha1 =',round( alpha1,2),"old alpha2 =",round(alpha2,2)))}
												test.alpha <- FALSE  }}) #end switch  
  }   #end if options$alpha.size

#second, does beta increase ?
test.beta <- TRUE          

if (ICLUST.options$beta.size < min(size1,size2)) {
  switch(ICLUST.options$beta, {if (beta.combined < min(beta1,beta2)) {if (output>2) {print(
  paste ('do not combine ', cluster.names[max.row],"with", cluster.names[max.col],'new beta =',
  round (beta.combined,2),'old beta1 =',round( beta1,2),"old beta2 =",round(beta2,2)))}
												test.beta <- FALSE }},
  {if (beta.combined < mean(c(beta1,beta2))) {if (output>2) {print(paste ('do not combine ', 
  cluster.names[max.row],"with", cluster.names[max.col],'new beta =', round (beta.combined,2),
  'old beta1 =',round( beta1,2),"old beta2 =",round(beta2,2)))}
												test.beta <- FALSE  }},
  {if (beta.combined < max(beta1,beta2)) {if (output>2) {print(paste ('do not combine ',
  cluster.names[max.row],"with", cluster.names[max.col],'new beta =', round (beta.combined,2),
  'old beta1 =',round( beta1,2),"old beta2 =",round(beta2,2)))}
												test.beta <- FALSE  }}) #end switch  
												
  }   #end if options$beta.size






if(test.beta&test.alpha)   {break } else  {
if (beta.combined < ICLUST.options$beta.min) {
			keep.clustering <- FALSE       #the most similar pair is not very similar, we should quit
			break}  else {sim[max.row,max.col] <- NA 
		sim[max.col,max.row] <- NA }
    }   #end of test.beta&test.alpha
    
}    #end of while test.alpha&test.beta.loop

#combine and summarize
if (keep.clustering) 
   {          # we have passed the alpha and beta tests, now combine these two variables
clusters[,max.row] <- clusters[,max.row] + sign.max *  clusters[,max.col]  
cluster.names <- colnames(clusters)

#summarize the results
results[count,1] <- cluster.names[max.row]
results[count,2] <- cluster.names[max.col]
results[count,"similarity"] <- sim[max.cell]
results[count,"correlation"] <- sim.mat[max.cell]
results[count,"alpha1"] <- item.rel[max.row]
results[count,"alpha2"] <- item.rel[max.col]
size1 <- cluster.stats$size[max.row]
size2 <- cluster.stats$size[max.col]
results[count,"size1"] <- size1
results[count,"size2"] <- size2
results[count,"beta1"] <-  beta1
results[count,"beta2"] <-  beta2
results[count,"rbar1"] <-  rbar1
results[count,"rbar2"] <-  rbar2
results[count,"r1"] <- r1
results[count,"r2"] <- r2


results[count,"beta"] <- beta.combined

results[count,'alpha'] <- alpha
results[count,'rbar'] <- rbar
results[count,"size"] <- size12
results[count,3:18] <- round(results[count,3:18],ICLUST.options$digits)
#update
cluster.names[max.row] <- paste("C",count,sep="")
colnames(clusters) <- cluster.names
clusters <- clusters[,-max.col]
cluster.names<- colnames(clusters)

#row.max <- row.max[-max.col]    



	}  #end of combine section

if(output > 1) print(results[count,],digits=2)
count=count+1 
if ((num.var - count) < ICLUST.options$n.clus) {keep.clustering <- FALSE}
if(num.var - count < 1) {keep.clustering <- FALSE}   #only one cluster left
	
}  #end of keep clustering loop

ICLUST.cluster <- list(results=results,clusters=clusters,number=num.var-count)
}   # end ICLUST.cluster

cluster.loadings <- 
function (keys, r.mat, correct = TRUE,digits=2) 
{
    if (!is.matrix(keys)) 
        keys <- as.matrix(keys)
    item.covar <- r.mat %*% keys          #item by cluster covariances
    covar <- t(keys) %*% item.covar  #variance/covariance of clusters
    var <- diag(covar)
    sd.inv <- 1/sqrt(var)
    
    key.count <- diag(t(keys) %*% keys)    #how many items in each cluster?
    if (correct) {
                  cluster.correct <- diag((key.count/(key.count - 1)))
                   for (i in 1:ncol(keys)) {
                         if (key.count[i]==1) {   #fix the case of 1 item keys
                   	     cluster.correct[i,i] <- 1
                   	} else { cluster.correct[i,i] <- key.count[i]/(key.count[i]-1)
                   	item.covar[,i] <- item.covar[,i] - keys[,i]} 
                   }   #i loop
                   correction.factor <- keys %*% cluster.correct
                   correction.factor[ correction.factor < 1] <- 1
                  item.covar <- item.covar * correction.factor
                  }
   
      
    ident.sd <- diag(sd.inv, ncol = length(sd.inv))
    cluster.loading <-  item.covar %*% ident.sd
    cluster.correl <- ident.sd %*% covar %*% ident.sd
    
    
   
    
    key.alpha <- ((var - key.count)/var) * (key.count/(key.count - 1))
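    #this is the same standardized alpha as in ICLUST.cluster: assuming unit item
    #variances, alpha = (k/(k-1))*(Vt-k)/Vt for a k-item cluster with total variance Vt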
    key.alpha[is.nan(key.alpha)] <- 1
    key.alpha[!is.finite(key.alpha)] <- 1
    colnames(cluster.loading) <- colnames(keys)
    colnames(cluster.correl) <- colnames(keys)
    rownames(cluster.correl) <- colnames(keys)
    rownames(cluster.loading) <- rownames(r.mat)
    
    if( ncol(keys) >1)  {cluster.corrected <- correct.cor(cluster.correl, t(key.alpha))} else {cluster.corrected <- cluster.correl}
    
    return(list(loadings=round(cluster.loading,digits), cor=round(cluster.correl,digits),corrected=round(cluster.corrected,digits), sd = round(sqrt(var),digits), alpha = round(key.alpha,digits),
            size = key.count))
    }






cluster.fit <- function(original,load,clusters,diagonal=FALSE,digits=2) {
      

 sqoriginal <- original*original    #squared correlations
 totaloriginal <- sum(sqoriginal) - diagonal*sum(diag(sqoriginal) )   #sum of squared correlations - the diagonal
        load <- as.matrix(load) 
        clusters <- as.matrix(clusters)
        model <- load %*% t(load)                 #reproduce the correlation matrix by the factor law R=  FF'
        residual <- original-model              #find the residual  R* = R - FF'
        sqresid <- residual*residual            #square the residuals
        totalresid <- sum(sqresid)- diagonal * sum(diag(sqresid) )      #sum squared residuals - the main diagonal
        fit <- 1-totalresid/totaloriginal       #fit is 1 - sum squared residuals/sum squared original (of the off-diagonal elements)
        clusters <- abs(clusters)
       model.1 <- (load * clusters) %*%  t(load*clusters)
       residual <- original - model.1
       sqresid <- residual*residual            #square the residuals
        totalresid <- sum(sqresid)- diagonal * sum(diag(sqresid) )      #sum squared residuals - the main diagonal
        fit.1 <- 1-totalresid/totaloriginal       #fit is 1 - sum squared residuals/sum squared original (of the off-diagonal elements)
 cluster.fit <- list(clusterfit=round(fit.1,digits),factorfit=round(fit,digits))
 }
  
 
 

part of a short guide to R
Version of August 20, 2006
William Revelle
Department of Psychology
Northwestern University