Appendix Section 1: Study 1 Expanded description of simulation code

As described in the text, we generated multiple sets of artificial data with known structure in order to analyze the effect of bipolar vs. unipolar scales in the measurement of affect. Structural analyses of these data sets were done using exploratory factor analysis and the Very Simple Structure criterion (Revelle and Rocklin, 1979) to determine the most interpretable number of factors.

These analyses were all done using the free, open-source statistical and data-handling system R (R Development Core Team, 2004; http://www.r-project.org/). To facilitate the work of others who want to replicate or extend our analyses, we include in this appendix the original R code for all analyses reported. The code in this appendix may be copied directly into R and executed. A more extensive electronic copy of the code in this appendix is also available at http://personality-project.org/r/measuringemotion.html . Very Simple Structure (Revelle and Rocklin, 1979) has been adapted for the R environment; it is available at http://personality-project.org/r/r.vss.html and is included in the R package "psych" available at http://personality-project.org/r/ .

Generating Data

The basic model for all items follows classical test theory: an observed score X is the sum of a true score and an error score (X = T + E). To generate two-dimensional data, we generated two independent true scores (T1 and T2), each sampled from a normal distribution with mean 0 and standard deviation 1. The score for each item i was then found as Xi = wT(L1iT1 + L2iT2) + wEEi, where the true-score and error weights (wT and wE) have squared values summing to 1. The loadings (L1i and L2i) of the items on the two true-score dimensions were generated so as to create circumplex items (items with equal communalities that are distributed uniformly in a two-dimensional space). To do so, we generated 16, 36, or 72 equally spaced items at angles from 0 to 2 pi radians (0-360 degrees). Each item loaded on the first true score as the cosine of its angle and on the second as the sine of its angle. Item communalities were specified by adding (appropriately weighted) random normal error to each item.
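The equal spacing of the circumplex loadings can be verified with a few lines of R (an illustrative sketch, not part of the simulation code proper):

```r
#sketch: 16 equally spaced circumplex items; loadings are the cosine and sine of each angle
nvar  <- 16
radia <- seq(0, 2*pi, len=nvar+1)   #nvar+1 angles from 0 to 2*pi
rad   <- radia[radia < 2*pi]        #drop the endpoint, which duplicates 0
L1 <- cos(rad)                      #loadings on the first true-score dimension
L2 <- sin(rad)                      #loadings on the second true-score dimension
round(L1^2 + L2^2, 10)              #every item has the same (unit) communality: all 1
```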

Sample sizes were chosen to represent typical sample sizes in the literature, as well as the very large sample we report in the text; thus, sample sizes of 200, 800, and 3,200 were examined. Each simulation was conducted twice, once with bipolar items (generated as described above) and once with unipolar items. Following Russell and Carroll's (1999) assumption of how unipolar items are formed, we collapsed all item scores < 0 to zero. This truncation induces positive skew in each item.

One purpose of this simulation was to show that the factor structure of items, though affected by non-uniform item skew, can still be recovered using factor analysis. To further increase the skew of some items, we subtracted a constant (1.0) from true score T2 before adding error and before truncation. Because each simulated item was a mixture of T1 and T2, the greater an item's positive loading on T2, the greater its skew; the greater its negative loading on T2, the smaller its skew.
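The effect of truncation, and of the subtracted constant, on skew can be previewed with a small sketch (the skew formula is the same one defined in the skew utility later in this appendix; the seed and sample size are arbitrary):

```r
#sketch: truncating a normal variable at zero induces positive skew,
#and shifting scores downward before truncation increases that skew
set.seed(42)                                              #arbitrary seed
skew <- function(x) sum((x - mean(x))^3)/(length(x)*sd(x)^3)
x  <- rnorm(10000)                                        #bipolar item: roughly symmetric
x0 <- pmax(x, 0)                                          #unipolar item: scores < 0 collapsed to 0
x1 <- pmax(x - 1, 0)                                      #shifted down first: even more truncation
skew(x)     #near zero
skew(x0)    #clearly positive
skew(x1)    #larger still
```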

Overview of the R code (the code blocks below may be copied directly into R and executed). The following R commands define three functions (simulate.items, categorical.item, and truncate.item) that create artificial items with a circumplex structure and the properties of real items used to measure mood and affect. They also define two utility functions (angle and skew). Functions in R may be defined with default parameter values that can be overridden when the function is called. In the following code, the parameters that may be specified are the number of variables (nvar), the number of subjects (nsub), whether the structure is circumplex or simple (circum), the item reliability (avloading, whose square root is the true-score weight), and the offsets added to the two true scores (xbias and ybias).

The primary function (simulate.items) generates items formed from two independent dimensions. The items can have either a circumplex or simple structure. True scores of items are assumed to be bivariate normal, bipolar and to lie in a two dimensional space. Default values are included in the function definition, but other values may be specified when the function is executed. The # sign indicates a comment.

simulate.items <- function (nvar = 72 ,nsub = 500, 
    circum = TRUE, avloading =.6,  xbias=0,  ybias = -1) #give default values to parameters
	{ #begin function 

	trueweight <- sqrt(avloading)	#true score weight is the square root of the reliability (avloading)
	errorweight <- sqrt(1-trueweight*trueweight)	#squared true and error weights sum to 1

	truex <- rnorm(nsub) + xbias	#normal true scores for dimension 1, shifted by xbias
	truey <- rnorm(nsub) + ybias	#normal true scores for dimension 2, shifted by ybias

	if (circum) {	#make a vector of radians (the whole way around the circle) if circumplex
		radia <- seq(0, 2*pi, len=nvar+1)
		rad <- radia[which(radia < 2*pi)]	#drop the last angle, which duplicates 0
	} else rad <- rep(seq(0, 3*pi/2, len=4), nvar/4)	#simple structure: items in four clusters

	error <- matrix(rnorm(nsub*nvar), nsub)	#create normal error scores

	#true score matrix for each item reflects structure in radians
	trueitem <- outer(truex, cos(rad)) + outer(truey,sin(rad)) 

	item <- trueweight*trueitem + errorweight*error	#observed item = true score + error score

	return (item) 
	}   #the value of the function is the item matrix, ready for further analysis

Two other functions convert the normally distributed item scores into discrete categories and then truncate all values smaller than 0 to be equal to 0. The first function (categorical.item) converts continuous variables to discrete categorical variables with scores ranging from -3 to +3. The second function (truncate.item) converts items so that they range from 0 to +3 only, effectively truncating bipolar scales into unipolar ones.


categorical.item <-function (item) 
	{
	item = round(item)       #round all items to nearest integer value
	item[(item<=-3)] <- -3   #items <= -3 become -3
	item[(item>3) ] <-  3    #items >3 become 3 
	return(item) 
	}           #the function returns these categorical items 

truncate.item <- function(item,cutpoint=0) #truncate values less than cutpoint to zero
	{
	item[item < cutpoint] <- 0	#item values below the cutpoint become zero
	return(item)
	}  
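As a check, applying categorical.item and then truncate.item is equivalent to rounding and clipping scores into the 0 to +3 range; a self-contained sketch (the two functions are restated, and the input values are hypothetical):

```r
#sketch: categorize-then-truncate equals round-then-clip to [0, 3]
categorical.item <- function(item) {
	item <- round(item)
	item[item <= -3] <- -3
	item[item > 3] <- 3
	item
	}
truncate.item <- function(item, cutpoint=0) {
	item[item < cutpoint] <- 0
	item
	}
x <- c(-4.2, -0.6, 0.4, 1.5, 3.7)      #hypothetical continuous item scores
truncate.item(categorical.item(x))     #0 0 0 2 3 (round(1.5) is 2: R rounds half to even)
pmin(pmax(round(x), 0), 3)             #same result
```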

Finally, for both real and artificial items, we report various item statistics, including angular location and skew. These are found with the angle and skew utility functions. Although it is typical to describe factor analysis results in terms of item factor loadings, it is sometimes useful to organize items in terms of polar coordinates (that is, the angular distances from factor 1, with vector lengths as communalities). This is particularly useful when examining items in a two-dimensional space, as is the case here. The angle function extracts two factors, rotates to the varimax criterion, and converts from Cartesian factor loadings into polar coordinates. To make the results more readable, we report them in degrees rather than radians.

angle <- function(x) 
	{ 
	f <- factanal(x, factors=2, rotation="varimax")	#passing "varimax" positionally would be taken as the data argument
	fload <- f$loadings 
	commun <- rowSums(fload*fload) 
	theta <- sign(fload[,2])*180*acos(fload[,1]/sqrt(commun))/pi	#vector angle (-180: 180) 
	angle <- data.frame(x=fload[,1], y=fload[,2], communality=commun, angle=theta) 
	return(angle) 
	} 
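The Cartesian-to-polar conversion inside angle can be checked by hand on a known loading pair (hypothetical loadings, bypassing factanal): an item loading .5 on each factor has communality .5 and sits at 45 degrees.

```r
#sketch: polar coordinates of a hypothetical item loading equally on both factors
fx <- 0.5                                          #loading on factor 1
fy <- 0.5                                          #loading on factor 2
commun <- fx^2 + fy^2                              #communality = .5
theta  <- sign(fy)*180*acos(fx/sqrt(commun))/pi    #same formula as in angle()
theta                                              #45 degrees
```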

A major threat to the interpretability of item-by-item correlations is skew: large differences in skew between items will drastically attenuate their correlations. The following function computes the skew of any item x.

skew <- function (x, na.rm = FALSE) 
	{
	if (na.rm) x <- x[!is.na(x)]                #remove missing values
	sum((x - mean(x))^3)/(length(x) * sd(x)^3)  #third central moment divided by sd^3
	}
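Two hand checks of the skew function on small hypothetical vectors (note that sd uses the n-1 denominator, so values differ slightly from the conventional moment estimator):

```r
#sketch: verifying the skew function on tiny inputs
skew <- function (x, na.rm = FALSE)
	{
	if (na.rm) x <- x[!is.na(x)]
	sum((x - mean(x))^3)/(length(x) * sd(x)^3)
	}
skew(c(1, 2, 3))      #0: symmetric data have no skew
skew(c(0, 0, 0, 1))   #0.75: a single high value pulls the tail to the right
```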

R code used in the simulation (which calls the functions listed above). To compare the effects of the number of subjects and the number of items, the five functions defined earlier, together with the VSS code found in the psych package (http://personality-project.org/r/), are called in six sections, for 16, 36, and 72 unipolar or bipolar items. Each section contains a loop varying the number of subjects (200, 800, 3,200). The output is evaluated in terms of the Very Simple Structure criterion and chi-square goodness of fit.

samplesize <- c(200,800,3200) #examine the effect of three sample sizes

nvar <- 16                    #examine the effect of the number of variables - 16 unipolar items
vss.16.unipolar <- list()	#results will be stored here
for (i in 1:3)                #generate three data sets of varying sample size 
	{ 
	items <- simulate.items(nvar=nvar, nsub=samplesize[i]) #nvar must be passed explicitly; the default would otherwise remain 72
	catitem <- categorical.item(items)
	truncitem <- truncate.item(catitem)
	vss.16.unipolar[[i]] <- list(VSS(truncitem,rotate="varimax")) #examine VSS for the Varimax rotated solution
	}

nvar <- 36                    #examine the effect of the number of variables - 36 unipolar items
vss.36.unipolar <- list()	#results will be stored here
for (i in 1:3)                #generate three data sets of varying sample size 
	{ 
	items <- simulate.items(nvar=nvar, nsub=samplesize[i]) #nvar must be passed explicitly; the default would otherwise remain 72
	catitem <- categorical.item(items)
	truncitem <- truncate.item(catitem)
	vss.36.unipolar[[i]] <- list(VSS(truncitem,rotate="varimax")) #examine VSS for the Varimax rotated solution
	}

nvar <- 72			#examine the effect of the number of variables - 72 unipolar items
vss.72.unipolar <- list()	#results will be stored here
for (i in 1:3)               	#generate three data sets of varying sample size 
	{ 	
	items <- simulate.items(nvar=nvar, nsub=samplesize[i]) #nvar must be passed explicitly; the default would otherwise remain 72
	catitem <- categorical.item(items)
	truncitem <- truncate.item(catitem)
	vss.72.unipolar[[i]] <- list(VSS(truncitem,rotate="varimax")) #examine VSS for the Varimax rotated solution
	}

nvar <- 16			# examine the effect of the number of variables - 16 bipolar items 
vss.16.bipolar <- list()	#results will be stored here
for (i in 1:3)           	#generate three data sets of varying sample size 
	{ 
	items <- simulate.items(nvar=nvar, nsub=samplesize[i]) #nvar must be passed explicitly; the default would otherwise remain 72
	catitem <- categorical.item(items)
	vss.16.bipolar[[i]] <- list(VSS(catitem,rotate="varimax"))  #examine VSS for the Varimax rotated solution
	}

nvar <- 36			# examine the effect of the number of variables - 36 bipolar items 
vss.36.bipolar <- list()	#results will be stored here
for (i in 1:3)           	#generate three data sets of varying sample size 
	{ 
	items <- simulate.items(nvar=nvar, nsub=samplesize[i]) #nvar must be passed explicitly; the default would otherwise remain 72
	catitem <- categorical.item(items)
	vss.36.bipolar[[i]] <- list(VSS(catitem,rotate="varimax"))  #examine VSS for the Varimax rotated solution
	}
                 
nvar <- 72			# examine the effect of the number of variables -72 bipolar items 
vss.72.bipolar <- list()	#results will be stored here
for (i in 1:3)           	#generate three data sets of varying sample size 
	{ 
	items <- simulate.items(nvar=nvar, nsub=samplesize[i]) #nvar must be passed explicitly; the default would otherwise remain 72
	catitem <- categorical.item(items)
	vss.72.bipolar[[i]] <- list(VSS(catitem,rotate="varimax"))  #examine VSS for the Varimax rotated solution
	}

Generating a plot

Below is the code for creating a plot displaying a subset of the above analyses, for 72 bipolar, 72 unipolar, and 16 unipolar items, with each of the sample sizes.

#Generating plots: this next set generates a 3 by 3 plot: 
#72 bipolar and unipolar, and 16 unipolar VSS plots for Ns=200,800, 3200

plot.new()             #set up a new plot page
par(mfrow=c(3,3))      #3 rows and 3 columns allow us to compare results
for (i in 1:3)         #for the 3 sample sizes show the VSS plots for 72 bipolar items
	{ 
	x <- as.data.frame(vss.72.bipolar[[i]])
	VSS.plot(x,paste("N= ",samplesize[i], "\n 72 bipolar variables")) 
	}
for (i in 1:3)		#for the 3 sample sizes show the VSS plots for 72 unipolar items
	{ 
	x <- as.data.frame(vss.72.unipolar[[i]])
	VSS.plot(x,paste("N= ",samplesize[i], "\n 72 unipolar variables")) 
	}
for (i in 1:3)    	#for the 3 sample sizes show the VSS plots for 16 unipolar items
	{ 
	x <- as.data.frame(vss.16.unipolar[[i]])
	VSS.plot(x,paste("N= ",samplesize[i], "\n 16 unipolar variables")) 
	}