# Multi-Group Factor Analysis

This example shows how to use lslx to conduct a multi-group factor analysis. It uses the data set HolzingerSwineford1939 from the package lavaan, so lavaan must be installed.

## Model Specification

In the following specification, x1 to x9 are assumed to be measurements of three latent factors: visual, textual, and speed.

```r
model_mgfa <- "visual  :=> 1 * x1 + x2 + x3
textual :=> 1 * x4 + x5 + x6
speed   :=> 1 * x7 + x8 + x9"
```

The operator :=> means that the latent factor on the left-hand side (LHS) is defined by the observed variables on the right-hand side (RHS). In this model, visual is mainly measured by x1 to x3, textual by x4 to x6, and speed by x7 to x9. The loadings of x1, x4, and x7 are fixed at 1 to set the scales of the factors. The specification applies to both groups. Details of the model syntax can be found in the section Model Syntax via ?lslx.
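To make the meaning of :=> concrete, the sketch below expands the specification into one row per factor-indicator pair and flags the loadings fixed at 1. The helper parse_loadings() is hypothetical and written only for illustration; it is not how lslx parses models, and it handles only the simple "factor :=> 1 * x + y + z" form used here.

```r
# HYPOTHETICAL helper (not part of lslx): expand each ':=>' line into
# factor-indicator rows and flag loadings written as '1 * x'.
parse_loadings <- function(model) {
  lines <- trimws(strsplit(model, "\n", fixed = TRUE)[[1]])
  lines <- lines[nzchar(lines)]
  rows <- lapply(lines, function(line) {
    sides <- trimws(strsplit(line, ":=>", fixed = TRUE)[[1]])
    terms <- trimws(strsplit(sides[2], "+", fixed = TRUE)[[1]])
    fixed <- grepl("^1[[:space:]]*\\*", terms)     # '1 * x1' means fixed at 1
    indicator <- trimws(sub("^.*\\*", "", terms))  # drop the '1 *' prefix
    data.frame(factor = sides[1], indicator = indicator,
               fixed_at_1 = fixed, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

model_mgfa <- "visual  :=> 1 * x1 + x2 + x3
textual :=> 1 * x4 + x5 + x6
speed   :=> 1 * x7 + x8 + x9"

loading_table <- parse_loadings(model_mgfa)
print(loading_table)
```

Each factor gets three indicators, and exactly one loading per factor (x1, x4, x7) is fixed for scale setting.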

## Object Initialization

lslx is written as an R6 class. Every time we conduct an analysis with lslx, an lslx object must be initialized. The following code initializes an lslx object named lslx_mgfa.

```r
library(lslx)
lslx_mgfa <- lslx$new(model = model_mgfa,
                      data = lavaan::HolzingerSwineford1939,
                      group_variable = "school",
                      reference_group = "Pasteur")
```

```
An 'lslx' R6 class is initialized via 'data' argument.
  Response Variables: x1 x2 x3 x4 x5 x6 x7 x8 x9
  Latent Factors: visual textual speed
  Groups: Grant-White Pasteur
  Reference Group: Pasteur
NOTE: Because Pasteur is set as reference, coefficients in other groups actually represent increments from the reference.
```

Here, lslx is the object generator for lslx objects, and new is the built-in method of lslx that generates a new lslx object. The initialization requires users to specify a model (argument model) and a data set to be fitted (argument data). The data set must contain all the observed variables specified in the given model. Because this example considers a multi-group analysis, the variable for group labeling (argument group_variable) must also be specified.

In lslx, two types of parameterization can be used in multi-group analysis. The first type is the same as in traditional multi-group SEM: the model parameters in each group are treated separately. The second type sets one group as the reference and treats the model parameters in the other groups as increments with respect to the reference. Under the second type, group heterogeneity can be explored efficiently by treating the increments as penalized parameters, because an increment shrunk to zero means that the corresponding parameter is equal across groups. In this example, Pasteur is set as the reference (argument reference_group). Hence, the parameters in Grant-White reflect differences from Pasteur.

## Model Respecification

After an lslx object is initialized, the heterogeneity of a multi-group model can be quickly respecified with the $free_heterogeneity(), $fix_heterogeneity(), and $penalize_heterogeneity() methods. Since Pasteur is set as the reference, the parameters respecified in Grant-White represent differences from Pasteur. The following code sets the loadings x2<-visual, x3<-visual, x5<-textual, x6<-textual, x8<-speed, and x9<-speed, as well as all intercepts x1<-1 to x9<-1, in Grant-White as penalized parameters.

```r
lslx_mgfa$penalize_heterogeneity(block = c("y<-1", "y<-f"), group = "Grant-White")
```

```
The relation x1<-1 under Grant-White is set as PENALIZED with starting value = 0.
The relation x2<-1 under Grant-White is set as PENALIZED with starting value = 0.
The relation x3<-1 under Grant-White is set as PENALIZED with starting value = 0.
The relation x4<-1 under Grant-White is set as PENALIZED with starting value = 0.
The relation x5<-1 under Grant-White is set as PENALIZED with starting value = 0.
The relation x6<-1 under Grant-White is set as PENALIZED with starting value = 0.
The relation x7<-1 under Grant-White is set as PENALIZED with starting value = 0.
The relation x8<-1 under Grant-White is set as PENALIZED with starting value = 0.
The relation x9<-1 under Grant-White is set as PENALIZED with starting value = 0.
The relation x2<-visual under Grant-White is set as PENALIZED with starting value = 0.
The relation x3<-visual under Grant-White is set as PENALIZED with starting value = 0.
The relation x5<-textual under Grant-White is set as PENALIZED with starting value = 0.
The relation x6<-textual under Grant-White is set as PENALIZED with starting value = 0.
The relation x8<-speed under Grant-White is set as PENALIZED with starting value = 0.
The relation x9<-speed under Grant-White is set as PENALIZED with starting value = 0.
NOTE: Because Pasteur is set as reference, a relation under other group actually represents an increment.
NOTE: Please check whether the starting value for the increment represents a difference.
```

Since the homogeneity of latent factor means may not be a reasonable assumption when examining measurement invariance, the following code relaxes it by freeing the factor mean increments (block "f<-1") in Grant-White.

```r
lslx_mgfa$free_block(block = "f<-1", group = "Grant-White")
```

```
The relation visual<-1 under Grant-White is set as FREE with starting value = 0.
The relation textual<-1 under Grant-White is set as FREE with starting value = 0.
The relation speed<-1 under Grant-White is set as FREE with starting value = 0.
NOTE: Because Pasteur is set as reference, a relation under other group actually represents an increment.
NOTE: Please check whether the starting value for the increment represents a difference.
```

For more methods that modify a specified model, see the section Set-Related Method via ?lslx.
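Because lslx objects are R6 objects, they have reference semantics: methods such as $penalize_heterogeneity() and $free_block() change lslx_mgfa in place, which is why their results are not reassigned. The sketch below illustrates this R6 pattern with R6Class() from the R6 package (installed as a dependency of lslx). The generator ToyAnalysis and its fields are hypothetical and unrelated to lslx internals.

```r
library(R6)

# HYPOTHETICAL toy generator (not part of lslx): it only mimics the R6
# pattern of a generator whose $new() builds an object with fields and methods.
ToyAnalysis <- R6Class("ToyAnalysis",
  public = list(
    model = NULL,
    fitting = NULL,
    initialize = function(model) {
      self$model <- model         # $new() forwards its arguments here
    },
    fit = function() {
      self$fitting <- "fitted"    # the method changes the object's own field
      invisible(self)
    }
  )
)

toy <- ToyAnalysis$new(model = "visual :=> 1 * x1 + x2 + x3")
same_toy <- toy   # copies the reference, not the object
same_toy$fit()    # no reassignment needed
toy$fitting       # both names point to the same, now fitted, object
```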

## Model Fitting

After an lslx object is initialized, the $fit_mcp() method can be used to fit the specified model to the given data with the minimax concave penalty (MCP).

```r
lslx_mgfa$fit_mcp()
```

```
CONGRATS: Algorithm converges under EVERY specified penalty level.
  Specified Tolerance for Convergence: 0.001
  Specified Maximal Number of Iterations: 100
```

All fitting results are stored in the fitting field of lslx_mgfa.
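For intuition about what $fit_mcp() penalizes, here is a base-R sketch of the minimax concave penalty in its standard form, evaluated at the lambda and delta that HBIC selects in the summary of this example. It assumes that lslx's delta plays the role of the MCP concavity parameter; see ?lslx for the exact definition lslx uses. The function mcp_penalty() is illustrative, not an lslx function.

```r
# Standard MCP form. ASSUMPTION: lslx's 'delta' is the concavity parameter.
mcp_penalty <- function(theta, lambda, delta) {
  a <- abs(theta)
  ifelse(a <= lambda * delta,
         lambda * a - a^2 / (2 * delta),  # lasso-like near zero, tapering off
         lambda^2 * delta / 2)            # flat: large values get no extra shrinkage
}

lambda <- 0.134  # selected lambda reported by summarize(selector = "hbic")
delta  <- 3.063  # selected delta reported by summarize(selector = "hbic")
theta  <- c(0, 0.1, 0.3, 0.531, 1)
round(mcp_penalty(theta, lambda, delta), 4)
```

Small increments are penalized almost like the lasso and can be shrunk exactly to zero, whereas increments beyond lambda × delta (about 0.41 here) receive a constant penalty, so large group differences such as the x3<-1 increment of -0.531 are not shrunk further.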

## Model Summarizing

Unlike traditional SEM analysis, lslx fits the model to the data under all of the penalty levels considered. To summarize the fitting results, a selector for choosing an optimal penalty level must be specified. Available selectors are described in the section Penalty Level Selection via ?lslx. The following code summarizes the fitting results under the penalty level selected by Haughton’s Bayesian information criterion (HBIC).

```r
lslx_mgfa$summarize(selector = "hbic")
```

```
General Information
   number of observations                           301
   number of complete observations                  301
   number of missing patterns                      none
   number of groups                                   2
   number of responses                                9
   number of factors                                  3
   number of free coefficients                       48
   number of penalized coefficients                  15

Numerical Conditions
   selected lambda                                0.134
   selected delta                                 3.063
   selected step                                   none
   objective value                                0.485
   objective gradient absolute maximum            0.001
   objective Hessian convexity                    0.187
   number of iterations                          11.000
   loss value                                     0.430
   number of non-zero coefficients               50.000
   degrees of freedom                            58.000
   robust degrees of freedom                     60.646
   scaling factor                                 1.046

Fit Indices
   root mean square error of approximation (rmsea)   0.090
   comparative fit index (cfi)                       0.919
   non-normed fit index (nnfi)                       0.900
   standardized root mean of residual (srmr)         0.085

Likelihood Ratio Test
                    statistic       df   p-value
   unadjusted         129.424   58.000     0.000
   mean-adjusted      123.777   58.000     0.000

Root Mean Square Error of Approximation Test
                     estimate    lower    upper
   unadjusted           0.090    0.065    0.115
   mean-adjusted        0.089    0.063    0.114

Coefficient Test (Group = "Pasteur", Std.Error = "sandwich")

  Factor Loading (reference component)
                         type  estimate  std.error  z-value  P(>|z|)   lower   upper
   x1<-visual         fixed     1.000          -        -        -       -       -
   x2<-visual          free     0.604      0.143    4.211    0.000   0.323   0.885
   x3<-visual          free     0.789      0.157    5.027    0.000   0.481   1.096
   x4<-textual        fixed     1.000          -        -        -       -       -
   x5<-textual         free     1.120      0.067   16.599    0.000   0.988   1.252
   x6<-textual         free     0.932      0.064   14.678    0.000   0.808   1.057
   x7<-speed          fixed     1.000          -        -        -       -       -
   x8<-speed           free     1.200      0.134    8.947    0.000   0.937   1.463
   x9<-speed           free     1.040      0.208    5.005    0.000   0.633   1.448

  Covariance (reference component)
                         type  estimate  std.error  z-value  P(>|z|)   lower   upper
   textual<->visual    free     0.406      0.135    3.017    0.003   0.142   0.671
   speed<->visual      free     0.169      0.066    2.565    0.010   0.040   0.298
   speed<->textual     free     0.173      0.060    2.899    0.004   0.056   0.290

  Variance (reference component)
                         type  estimate  std.error  z-value  P(>|z|)   lower   upper
   visual<->visual     free     0.801      0.230    3.489    0.000   0.351   1.252
   textual<->textual   free     0.880      0.135    6.532    0.000   0.616   1.144
   speed<->speed       free     0.305      0.083    3.684    0.000   0.143   0.467
   x1<->x1             free     0.556      0.181    3.077    0.002   0.202   0.910
   x2<->x2             free     1.269      0.172    7.370    0.000   0.931   1.606
   x3<->x3             free     0.881      0.131    6.744    0.000   0.625   1.136
   x4<->x4             free     0.446      0.070    6.328    0.000   0.308   0.584
   x5<->x5             free     0.502      0.083    6.019    0.000   0.339   0.666
   x6<->x6             free     0.263      0.058    4.518    0.000   0.149   0.377
   x7<->x7             free     0.849      0.113    7.516    0.000   0.628   1.071
   x8<->x8             free     0.516      0.094    5.469    0.000   0.331   0.701
   x9<->x9             free     0.656      0.118    5.573    0.000   0.426   0.887

  Intercept (reference component)
                         type  estimate  std.error  z-value  P(>|z|)   lower   upper
   x1<-1               free     4.914      0.095   51.569    0.000   4.727   5.101
   x2<-1               free     6.087      0.080   75.899    0.000   5.930   6.245
   x3<-1               free     2.487      0.093   26.780    0.000   2.305   2.669
   x4<-1               free     2.778      0.087   31.915    0.000   2.608   2.949
   x5<-1               free     4.035      0.103   39.171    0.000   3.833   4.237
   x6<-1               free     1.926      0.075   25.776    0.000   1.779   2.072
   x7<-1               free     4.432      0.087   51.183    0.000   4.263   4.602
   x8<-1               free     5.569      0.074   75.578    0.000   5.425   5.714
   x9<-1               free     5.409      0.070   77.099    0.000   5.272   5.547

Coefficient Test (Group = "Grant-White", Std.Error = "sandwich")

  Factor Loading (increment component)
                         type  estimate  std.error  z-value  P(>|z|)   lower   upper
   x1<-visual         fixed     0.000          -        -        -       -       -
   x2<-visual           pen     0.000          -        -        -       -       -
   x3<-visual           pen     0.000          -        -        -       -       -
   x4<-textual        fixed     0.000          -        -        -       -       -
   x5<-textual          pen     0.000          -        -        -       -       -
   x6<-textual          pen     0.000          -        -        -       -       -
   x7<-speed          fixed     0.000          -        -        -       -       -
   x8<-speed            pen     0.000          -        -        -       -       -
   x9<-speed            pen     0.000          -        -        -       -       -

  Covariance (increment component)
                         type  estimate  std.error  z-value  P(>|z|)   lower   upper
   textual<->visual    free     0.020      0.144    0.136    0.892  -0.263   0.303
   speed<->visual      free     0.144      0.105    1.363    0.173  -0.063   0.351
   speed<->textual     free     0.050      0.108    0.461    0.645  -0.163   0.263

  Variance (increment component)
                         type  estimate  std.error  z-value  P(>|z|)   lower   upper
   visual<->visual     free    -0.085      0.198   -0.427    0.669  -0.473   0.304
   textual<->textual   free    -0.011      0.167   -0.063    0.950  -0.337   0.316
   speed<->speed       free     0.170      0.094    1.801    0.072  -0.015   0.355
   x1<->x1             free     0.094      0.178    0.530    0.596  -0.254   0.442
   x2<->x2             free    -0.329      0.221   -1.490    0.136  -0.761   0.104
   x3<->x3             free    -0.277      0.138   -2.000    0.045  -0.548  -0.006
   x4<->x4             free    -0.103      0.094   -1.101    0.271  -0.286   0.080
   x5<->x5             free    -0.126      0.103   -1.220    0.223  -0.327   0.076
   x6<->x6             free     0.174      0.093    1.874    0.061  -0.008   0.356
   x7<->x7             free    -0.250      0.133   -1.886    0.059  -0.510   0.010
   x8<->x8             free    -0.109      0.142   -0.769    0.442  -0.387   0.169
   x9<->x9             free    -0.126      0.142   -0.884    0.377  -0.404   0.153

  Intercept (increment component)
                         type  estimate  std.error  z-value  P(>|z|)   lower   upper
   visual<-1           free     0.050      0.132    0.377    0.706  -0.209   0.309
   textual<-1          free     0.576      0.120    4.789    0.000   0.340   0.812
   speed<-1            free    -0.072      0.089   -0.807    0.419  -0.245   0.102
   x1<-1                pen     0.000          -        -        -       -       -
   x2<-1                pen     0.000          -        -        -       -       -
   x3<-1                pen    -0.531      0.117   -4.520    0.000  -0.761  -0.301
   x4<-1                pen     0.000          -        -        -       -       -
   x5<-1                pen     0.000          -        -        -       -       -
   x6<-1                pen     0.000          -        -        -       -       -
   x7<-1                pen    -0.440      0.108   -4.065    0.000  -0.652  -0.228
   x8<-1                pen     0.000          -        -        -       -       -
   x9<-1                pen     0.000          -        -        -       -       -
```

In this example, every penalized loading increment in Grant-White is shrunk to zero, so all of the loadings appear invariant across the two groups. However, the intercepts of x3 and x7 do not appear to be invariant: their increments are estimated as -0.531 and -0.440, respectively. The $summarize() method also reports significance tests for the coefficients. By default, lslx calculates standard errors with the sandwich formula whenever raw data are available. Sandwich standard errors are generally valid even when the model is misspecified and the data are not normal. However, they may not be valid after an optimal penalty level has been selected, because they do not account for the uncertainty of the selection step.
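As a hand check on the summary, the base-R sketch below recomputes a few printed quantities from the rounded values shown above: the degrees of freedom (2 groups × (9 means + 45 variances and covariances) − 50 non-zero coefficients), the RMSEA under an assumed multi-group formula (the assumption is stated in the code and happens to reproduce the printed 0.090), and the Grant-White intercepts of x3 and x7 as reference value plus increment, with normal-approximation Wald tests. Because the inputs are rounded, the recomputed z-values differ slightly from the printed ones (about -4.54 versus -4.520 for x3<-1), and these tests inherit the post-selection caveat above.

```r
# All inputs are copied from the printed summary, so they are rounded.

# 1. Degrees of freedom: sample moments minus non-zero coefficients.
n_groups <- 2; n_responses <- 9; n_nonzero <- 50
moments_per_group <- n_responses + n_responses * (n_responses + 1) / 2  # means + covariances
dof <- n_groups * moments_per_group - n_nonzero
dof  # 58, as printed

# 2. RMSEA from the unadjusted likelihood ratio statistic.
# ASSUMPTION: multi-group form sqrt(G * max(T - df, 0) / (df * N)).
n_obs <- 301; t_stat <- 129.424
rmsea <- sqrt(n_groups * max(t_stat - dof, 0) / (dof * n_obs))
round(rmsea, 3)  # 0.09, matching the printed 0.090

# 3. Group-specific intercepts and Wald tests for the non-invariant items.
intercepts <- data.frame(
  term      = c("x3<-1", "x7<-1"),
  pasteur   = c(2.487, 4.432),    # reference (Pasteur) estimates
  increment = c(-0.531, -0.440),  # Grant-White increments
  se        = c(0.117, 0.108)     # sandwich standard errors of the increments
)
intercepts$grant_white <- intercepts$pasteur + intercepts$increment
z_crit <- qnorm(0.975)
intercepts$z     <- intercepts$increment / intercepts$se
intercepts$p     <- 2 * pnorm(-abs(intercepts$z))
intercepts$lower <- intercepts$increment - z_crit * intercepts$se
intercepts$upper <- intercepts$increment + z_crit * intercepts$se
print(intercepts, digits = 3)
```

The implied Grant-White intercepts are 1.956 for x3 and 3.992 for x7, and both increments have 95% intervals that exclude zero, in line with the conclusion that these two intercepts are not invariant.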