DeclareDesign Community

Simulate an artificial data matrix based on 5-point Likert responses

Hi everyone, I need your help. Could someone tell me how to simulate matrix data based on Likert-scale responses (5-point Likert scale: 1 = Extremely Unhappy, 2 = Unhappy, 3 = Neutral, 4 = Happy, 5 = Extremely Happy)? I need to simulate artificial data for 5 different constructs/domains with 10 items each, for a sample size of N = 100. I also assume each item has a factor loading of 0.7. I have already installed the fabricatr, likert and psych packages and tried to create this simulated data, but wasn't able to do it. Could someone help me get this working? Thank you in advance.

fabricatr is likely not the best tool for creating random matrices - most of its functionality is built around data frames.

Cribbing heavily from http://dwoll.de/rexrepos/posts/multFA.html :

N <- 100                                    # sample size

# 50 x 5 loading matrix: each of the 5 factors has 10 items loading at 0.7
Lambda <- diag(.7, 5)[rep(1:5, each = 10), ]

P <- nrow(Lambda)                           # number of items (50)
Q <- ncol(Lambda)                           # number of factors (5)

FF <- MASS::mvrnorm(N, rep(0, Q), diag(Q))  # factor scores
E  <- MASS::mvrnorm(N, rep(0, P), diag(P))  # item-level noise
X  <- FF %*% t(Lambda) + E                  # latent continuous responses

Then you can use whatever likert functions you like to map the Gaussian variables to discrete scales.
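For example, one rough way to do the mapping is just to bin each latent column with cut() - the cutpoints below are arbitrary, so pick whatever gives you the marginal distribution you want for your Likert items:

# Bin each latent column into 5 ordered categories (1 = Extremely Unhappy, ..., 5 = Extremely Happy)
cuts <- c(-Inf, -1.5, -0.5, 0.5, 1.5, Inf)
likert_data <- apply(X, 2, function(col) cut(col, breaks = cuts, labels = FALSE))
likert_data <- as.data.frame(likert_data)
colnames(likert_data) <- paste0("item_", 1:ncol(likert_data))
head(likert_data)

If you want to sanity-check the structure, something like psych::fa(likert_data, nfactors = 5) should roughly recover the five factors (binning attenuates the loadings somewhat), and for the likert package's plotting you would first convert the columns to ordered factors with your response labels.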

Thank you for the response.

A very belated reply to this, as I think it answers a question I had. I am simulating data where, before and after an intervention, each participant responds to N items that are scored correct/incorrect, so the resulting data are integers. I was originally simulating this with a binomial distribution, but then wondered whether it would be better to simulate as continuous normal data and then convert to integers.
There are two reasons I was pondering this: first, I have to estimate a treatment effect to be added. In reality these effects can only be whole integers, but I wondered if it made more sense to model the effect as continuous and then convert the resulting score to an integer.
The second reason is that I am guessing this may become more critical when the number of items N is small.
I am sure I could check this empirically to see what difference it makes, but if there is a good reason for preferring one approach to simulation (i.e. simulate with rnorm and then convert to integers, or simulate with the binomial), that would be good to know.

Generally I would match the distribution used for data generation to the assumptions of your model, which is the best case for a given sample size. If you took a Gaussian variate and binned it, that would correspond to a probability model with an ordered-probit (oprobit) link - I'm told that is old-fashioned for IRT, but it can work well.
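A rough sketch of the two data-generating processes for a single post-test score (all the numbers here - effect size, intercept, item count - are made up for illustration):

set.seed(1)
n_people <- 200
n_items  <- 20
effect   <- 0.5                       # treatment effect (latent or probability scale)
treat    <- rep(0:1, each = n_people / 2)

# Approach 1: binomial counts, treatment shifts the success probability
p_correct   <- plogis(-0.2 + effect * treat)
score_binom <- rbinom(n_people, size = n_items, prob = p_correct)

# Approach 2: latent Gaussian ability, binned item by item (probit-style)
ability      <- rnorm(n_people, mean = effect * treat, sd = 1)
latent       <- matrix(rnorm(n_people * n_items), n_people, n_items) + ability
score_probit <- rowSums(latent > 0)   # count of items answered correctly

Either way the observed scores are integer counts; the difference is whether the treatment effect lives on the probability scale or on the latent continuous scale.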

You can also simulate how badly your model fit degrades when its assumptions don't match the data-generating process.
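Continuing the sketch above, you could, say, generate the scores under one process, fit models that assume each of the two, and compare the estimated treatment effects (or CI coverage) over many replications:

# Fit a binomial GLM and a plain linear model to the probit-generated scores
fit_glm <- glm(cbind(score_probit, n_items - score_probit) ~ treat,
               family = binomial)
fit_lm  <- lm(score_probit ~ treat)
coef(fit_glm)["treat"]   # effect on the log-odds scale
coef(fit_lm)["treat"]    # effect in raw number-correct units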