Simulating multivariate structures using R

The following examples show how to simulate a multivariate structure with a particular measurement model and a particular structural model. They produce data suitable for demonstrations of regression, correlation, factor analysis, or structural equation modeling. See the mvtnorm package for more elegant ways to simulate data with a specified covariance matrix. The procedures shown here are meant to help the user think about structural models.

The basic logic is in terms of a measurement (factor) model relating observed variables to a set of unobserved factors. Then we have an effects model that describes how the latent variables are interrelated. (This is the basic logic of structural equation modeling, but of course, here we are doing it in reverse.)

First we create a function (mes) that does the work. The parameters passed to this function are a factor (measurement) model, specifying how each observed variable relates to a number of latent factors, and an effects model, a matrix of path coefficients among the latent variables. The number of cases may also be specified (the default is 1000).
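
In matrix terms, what the function below does is to generate the observed scores as Z %*% effect %*% t(fmodel) + E %*% diag(sqrt(1 - diag(fmodel %*% t(fmodel)))), where Z and E are matrices of independent standard normal deviates representing the latent factors and the unique errors.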

On this page, as in most of my examples, the code may be copied directly into R.




 mes <- function(fmodel, effect, numberofcases=1000) {   #define a general function in terms of a factor model and an effects matrix
 
  numberofvariables <- dim(fmodel)[1]      #problem size is determined by the input to the function
  numberoflatent <- dim(fmodel)[2]
  tmodel <- t(fmodel)                      #transpose of the factor model
  #fmodel %*% tmodel                       #if uncommented, this shows the resulting measurement structure
 
  communality <- diag(fmodel %*% tmodel)   #find how much to weight true scores and errors, given the measurement model
  uniqueness <- 1 - communality
  errorweight <- diag(sqrt(uniqueness))    #how much to weight the errors
 
  latentscores <- matrix(rnorm(numberofcases*numberoflatent), numberofcases)   #create true scores for the latent variables
  #round(cor(latentscores),2)              #if uncommented, this shows the sample correlation matrix of the (still uncorrelated) factors
  latentscores <- latentscores %*% effect  #combine the latent scores to reflect the structural relations between the factors
  #round(cor(latentscores),2)              #note that the factors are now correlated
 
  truescores <- latentscores %*% tmodel
  #round(cor(truescores),2)                #show the true score correlation matrix (without error)
  error <- matrix(rnorm(numberofcases*numberofvariables), numberofcases)       #create normally distributed error scores
  error <- error %*% errorweight
  observedscore <- truescores + error
  allscores <- data.frame(observedscore, truescores)
  return(allscores) }                      #end of function mes
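
As a quick illustration of how the function is used (a made-up check, not one of the examples that follow), two variables loading .8 and .6 on a single factor should correlate at about the product of their loadings:

 #a quick check of mes: two variables loading .8 and .6 on one factor
 fcheck <- matrix(c(.8, .6), nrow=2, ncol=1)
 echeck <- matrix(1, nrow=1, ncol=1)      #one factor, no structural paths
 check.df <- mes(fcheck, echeck)
 round(cor(check.df[, 1:2]), 2)           #the off diagonal value should be about .8*.6 = .48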
 
 
 

The first example is the classic multiple regression of two predictors and one criterion. We are assuming no errors of measurement and predictors that are uncorrelated in the population. The values of beta for X1 and X2 may be changed for alternative demonstrations. Note that the unique path for the Y variable is set to the square root of the variance not predicted by X1 and X2, so that Y has unit variance.

 #example 1   2 predictors, 1 criterion variable   no errors of measurement
  
 
 
 beta1 <- .5
 beta2 <- .6
 uniquey <- sqrt(1-(beta1^2 + beta2^2)) 
 fmodel <- matrix(c (1,0,0,
                     0,1,0,
                     0,0,1), 
                     nrow=3,ncol=3,byrow=TRUE)
 effect <- matrix(c ( 1,0,beta1,
                      0,1,beta2,
                      0,0,uniquey),nrow=3,ncol=3,byrow=TRUE)
 
 observed.df <- mes(fmodel,effect)
 names(observed.df) <- c("X1","X2", "Y","T1","T2","Ty")
 round(var(observed.df),2)
 summary(lm(Y ~ X1+X2,data=observed.df))
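
Because the latent scores are formed as Z %*% effect with Z uncorrelated, their population correlation matrix is t(effect) %*% effect; with no measurement error this is also the expected correlation matrix of the observed variables. An added check (not part of the original example) is to compare it with the sample values shown above:

 #added check: the population correlations implied by the effect matrix
 round(t(effect) %*% effect, 2)           #compare with round(var(observed.df),2) for X1, X2, and Y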


 

This next example adds a correlation between the two predictors, still with no measurement error.

  #example 2   2 predictors, 1 criterion variable   no errors of measurement correlated predictors
  
  
 beta1 <- .5
 beta2 <- .6
 rx1x2 <- .5
 uniquex2 <- sqrt(1-rx1x2^2)       #so that X2 has unit variance and correlates rx1x2 with X1
 uniquey <- sqrt(1-(beta1^2 + beta2^2)) 
 fmodel <- matrix(c (1,0,0,
                     0,1,0,
                     0,0,1), 
                     nrow=3,ncol=3,byrow=TRUE)
 effect <- matrix(c ( 1,rx1x2,beta1,
                      0,uniquex2,beta2,
                      0,0,uniquey),nrow=3,ncol=3,byrow=TRUE)
 
  observed.df <- mes(fmodel,effect)
 names(observed.df) <- c("X1","X2", "Y","T1","T2","Ty")
 round(var(observed.df),2)
  round(cor(observed.df),2)
 summary(lm(Y ~ X1+X2,data=observed.df))
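
A quick added check (not part of the original code) is that the sample correlation between the two predictors should be close to the value of rx1x2 that was specified:

 #added check: the predictors should correlate at about rx1x2
 round(cor(observed.df$X1, observed.df$X2), 2)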

  

Here we modify example 1 by introducing errors of measurement into the criterion variable. This does not affect the relationships among the true scores, but it does affect those among the observed scores.

 
  #example 3  2 predictors, 1 criterion variable   errors in measurement of criterion
  
  
 beta1 <- .5
 beta2 <- .6
 uniquey <- sqrt(1-(beta1^2 + beta2^2)) 
 fmodel <- matrix(c (1,0,0,
                     0,1,0,
                     0,0,.5), 
                     nrow=3,ncol=3,byrow=TRUE)
 effect <- matrix(c ( 1,0,beta1,
                      0,1,beta2,
                      0,0,uniquey),nrow=3,ncol=3,byrow=TRUE)
                      
   observed.df <- mes(fmodel,effect)
 names(observed.df) <- c("X1","X2", "Y","T1","T2","Ty")
 round(var(observed.df),2)
  round(cor(observed.df),2)
 summary(lm(Y ~ X1+X2,data=observed.df))
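
One way to see the point made in the text (an added check, not in the original code) is to compare the correlation of a predictor with the error-free criterion (Ty) to its correlation with the observed criterion (Y):

 #added check: error in Y attenuates the observed, but not the true score, correlations
 round(cor(observed.df$X1, observed.df$Ty), 2)   #close to the population path of .5
 round(cor(observed.df$X1, observed.df$Y), 2)    #close to .5*.5 = .25, attenuated by the criterion loading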
 
 

  

Now we introduce errors of measurement in the predictors. An interesting exercise is to vary the amount of error for the X1 and X2 variables, and to compare the effect of changing the beta weights.

 #example 4  2 predictors, 1 criterion variable   errors in measurement of predictors
 
 beta1 <- .5
 beta2 <- .6
 uniquey <- sqrt(1-(beta1^2 + beta2^2)) 
 fmodel <- matrix(c (.8,0,0,
                     0,.6,0,
                     0,0,1), 
                     nrow=3,ncol=3,byrow=TRUE)
 effect <- matrix(c ( 1,0,beta1,
                      0,1,beta2,
                      0,0,uniquey),nrow=3,ncol=3,byrow=TRUE)
                      
   observed.df <- mes(fmodel,effect)
 names(observed.df) <- c("X1","X2", "Y","T1","T2","Ty")
 round(var(observed.df),2)
  round(cor(observed.df),2)
 summary(lm(Y ~ X1+X2,data=observed.df))
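
Because the simulation fixes the loadings, the classic correction for attenuation can be demonstrated directly (an added check, not in the original code): dividing the observed correlation by the square roots of the two reliabilities (here .8 for X1 and 1 for Y) recovers the latent correlation:

 #added check: disattenuate the observed X1-Y correlation using the known reliabilities
 rxy <- cor(observed.df$X1, observed.df$Y)
 round(rxy, 2)                            #about .8*.5 = .40
 round(rxy/(.8*1), 2)                     #corrected for attenuation, about beta1 = .50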
 

Now we introduce errors of measurement into both predictors as well as the criterion variable.

  #example 5  2 predictors, 1 criterion variable   errors in measurement
 
 beta1 <- .5
 beta2 <- .6
 uniquey <- sqrt(1-(beta1^2 + beta2^2)) 
 
 fmodel <- matrix(c (.8,0,0,
                     0,.6,0,
                     0,0,.5), 
                     nrow=3,ncol=3,byrow=TRUE)

effect <- matrix(c ( 1,0,beta1,
                      0,1,beta2,
                      0,0,uniquey),nrow=3,ncol=3,byrow=TRUE)
                      
  observed.df <- mes(fmodel,effect)
 names(observed.df) <- c("X1","X2", "Y","T1","T2","Ty")
 round(var(observed.df),2)
  round(cor(observed.df),2)
 summary(lm(Y ~ X1+X2,data=observed.df))
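
For any of these models, the population covariance matrix of the observed variables can be written down directly and compared with the sample values (an added check, not in the original code): it is fmodel %*% t(effect) %*% effect %*% t(fmodel) plus a diagonal matrix of uniquenesses.

 #added check: the population covariance matrix implied by the full model
 implied <- fmodel %*% t(effect) %*% effect %*% t(fmodel) + diag(1 - diag(fmodel %*% t(fmodel)))
 round(implied, 2)                        #compare with the X1, X2, Y block of var(observed.df) above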
 
 
 

Now, finally, we can consider what happens when we have errors of measurement in the predictors as well as the criterion but, more importantly, multiple indicators of all of the latent variables. This is, of course, an example of the general problem encountered in structural equation modeling. By having multiple indicators of the latent variables, we are able to estimate the errors of measurement that affected all of the previous models but could not be estimated there. For this case, notice that the factor model (fmodel) has more than one variable loading on each factor. The effects matrix is the same as before.

This particular example then rescales the observed score matrix to put the variables into more "realistic" terms. The model may be seen as having three measures of ability (GREV, GREQ, GREA), two measures of motivation (achievement motivation and anxiety), and three measures of performance (Prelims, GPA, MA). These labels are, of course, arbitrary and can be changed easily.

 
 #example 6    8 variables on 3 factors, with the first two factors predicting the third


 title<-  "data set for Psychology 405: Psychometric Theory"  #<---title goes here

 
                             #measurement (factor) model for 3 factors and 8 variables
 
 numberofvariables <- 8                 #these must be defined here; inside mes they are found from the size of fmodel
 numberoflatent <- 3
 fmodel <- matrix(c (.9, 0, 0,
                     .8, 0, 0,
                     .7, .7, 0,
                      0, .6, 0,
                      0, -.8, 0,
                      0, 0, .7,
                      0, 0, .6,
                      0, 0, .5), 
                      nrow=numberofvariables,ncol=numberoflatent,byrow=TRUE)
                     
                     
 
                                        #structural model for the 3 factors (the off diagonal elements are the structural coefficients; the last diagonal element is the unique path for the criterion factor)
                                        
 beta1 <- .7
 beta2 <- .6
 uniquey <- sqrt(1-(beta1^2 + beta2^2)) 
 
 effect <- matrix(c ( 1,0,beta1,
                      0,1,beta2,
                      0,0,uniquey),nrow=3,ncol=3,byrow=TRUE)
                  
 observedscore <- mes(fmodel,effect)    #1000 subjects is the default; the result contains both the observed and the true scores
 
 
 round(cor(observedscore),2)          #show the correlation matrix
                                      #give the data "realistic" properties
 
 GREV=round(observedscore[,1]*100+500,0)
 GREQ=round(observedscore[,2]*100+500,0)
 GREA=round(observedscore[,3]*100+500,0)
 Ach=round(observedscore[,4]*10+50,0)
 Anx=round(-observedscore[,5]*10+50,0)
 Prelim=round(observedscore[,6]+10,0)
 GPA=round(observedscore[,7]*.5+4,2)
 MA=round(observedscore[,8]*.5+3,1)
 
 data=data.frame(GREV,GREQ,GREA,Ach,Anx,Prelim,GPA,MA)
 summary(data)                     #basic summary statistics
 round(cor(data),2)                #show the resulting correlations
                                   #apart from rounding and the sign reversal of Anx, these match the matrix shown above
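
One way to see that the structure can be recovered from the simulated data (an added check, not part of the original example) is to fit a three-factor model with the factanal function in base R; the pattern of loadings should resemble fmodel, since the linear rescaling does not change the correlations:

 #added check: recover the factor structure from the simulated data
 f3 <- factanal(data, factors=3, rotation="varimax")
 print(f3$loadings, cutoff=.3)            #the pattern of loadings should resemble fmodel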
                                   
                                   
                                   
 
 #this data set has been saved and may be used for other analyses
 
 datafilename="http://personality-project.org/R/datasets/psychometrics.prob2.txt" 
 dataset =read.table(datafilename,header=TRUE)  #read the data file
 
 
 
part of a short guide to R

Version of May 4, 2004 - revised October 22, 2005 to be somewhat more readable
William Revelle, Department of Psychology
Northwestern University