
ConfirmatoryFA

ConfirmatoryFA(in_dataset, id_column, desc_columns, out_folder='./ResultsCFA/', loadings_df=None, model=None, threshold=0.4, iterations=None, mean=False, median=False, verbose=False, save_parameters=False, overwrite=False)

Confirmatory Factor Analysis

ConfirmatoryFA performs a confirmatory factor analysis (CFA) to test a hypothesized model of the relationships between observed variables and latent constructs. It outputs factor scores and goodness-of-fit statistics, including the chi-square test, RMSEA, CFI and TLI, and generates an HTML report of the results. See [1] for an introduction to these metrics.
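For reference, the fit indices named above have standard textbook definitions, sketched below from the chi-square statistics of the fitted model (m) and of the baseline/independence model (b). This is not the script's own code: the sample-size convention in RMSEA varies between software packages (some use N rather than N - 1), so values may differ slightly from the HTML report.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean square error of approximation (using the N - 1 convention)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Comparative fit index: fitted model (m) versus baseline model (b)."""
    d_m = max(chi2_m - df_m, 0.0)   # model misfit beyond its degrees of freedom
    d_b = max(chi2_b - df_b, d_m)   # baseline misfit, never below the model's
    return 1.0 if d_b == 0 else 1.0 - d_m / d_b


def tli(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Tucker-Lewis index, based on chi-square / df ratios."""
    ratio_b = chi2_b / df_b
    ratio_m = chi2_m / df_m
    return (ratio_b - ratio_m) / (ratio_b - 1.0)
```

For example, a model with chi-square 50 on 20 degrees of freedom and 301 subjects gives rmsea(50.0, 20, 301) of about 0.0707.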

Using EFA Scores or CFA Scores

Both methods can be used to derive factor scores. Since there is no clear consensus on which is preferable (see [2]), the script outputs both sets of factor scores. As shown in [3], the two sets are highly correlated, so the choice comes down to the user's preference.
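To check the agreement between the two sets of scores on your own data, a minimal sketch is to correlate them factor by factor. It assumes both score tables are loaded as pandas DataFrames indexed by subject ID and sharing factor column names; the actual file and column names written by the script are not specified here, and score_agreement is a hypothetical helper.

```python
import pandas as pd


def score_agreement(efa_scores: pd.DataFrame, cfa_scores: pd.DataFrame) -> pd.Series:
    """Pearson correlation between EFA and CFA scores for each shared factor.

    Series.corr aligns the two columns on their index (e.g. subject ID),
    so both frames should use the same index values.
    """
    shared = [c for c in efa_scores.columns if c in cfa_scores.columns]
    return pd.Series({c: efa_scores[c].corr(cfa_scores[c]) for c in shared})
```

Factors present in only one of the two tables are skipped rather than raising an error.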

Input Specifications

The dataset can contain multiple descriptive columns before the variables of interest; specify how many using --desc-columns. By default, rows with missing values are removed. To impute missing data instead, use the --mean or --median option (be cautious when doing this).
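The input handling described above can be sketched with pandas as follows. This is an illustration under assumptions, not the script's implementation: it assumes the descriptive columns (including the ID column) come first, that the remaining variables are numeric, and that listwise deletion only considers the analysis variables. prepare_data is a hypothetical helper.

```python
import pandas as pd


def prepare_data(df: pd.DataFrame, desc_columns: int,
                 mean: bool = False, median: bool = False) -> pd.DataFrame:
    """Return the analysis variables with missing values handled."""
    if mean and median:
        raise ValueError("Choose at most one of mean or median imputation.")
    # Drop the leading descriptive columns (ID, demographics, ...).
    data = df.iloc[:, desc_columns:]
    if mean:
        return data.fillna(data.mean())      # column-wise mean imputation
    if median:
        return data.fillna(data.median())    # column-wise median imputation
    # Default: remove every row that still contains a missing value.
    return data.dropna()
```

The returned frame keeps the original row index, so subject IDs can be recovered from the input with df.loc[result.index].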

References

[1] Costa, V., & Sarmento, R. Confirmatory Factor Analysis. arXiv:1905.05598. https://arxiv.org/ftp/arxiv/papers/1905/1905.05598.pdf

[2] Cross Validated. Whether to use EFA or CFA to predict latent variables scores. https://stats.stackexchange.com/questions/346499/whether-to-use-efa-or-cfa-to-predict-latent-variables-scores

[3] NeuroStatX, pull request #11. https://github.com/gagnonanthony/NeuroStatX/pull/11

Example Usage

ConfirmatoryFA --in-dataset dataset.csv --id-column ID --desc-columns 1 \
    --out-folder ./output/ --loadings-df loadings.csv --threshold 0.40 \
    --mean -v -f

Parameters

in_dataset : Input dataset(s) to use in the factor analysis. If multiple files are provided, they are merged on the subject ID column. For multiple inputs, repeat the flag: --in-dataset df1 --in-dataset df2 […]

id_column : Name of the column containing the subject’s ID tag. Required for proper handling of IDs and merging multiple datasets.

desc_columns : Number of descriptive columns at the beginning of the dataset to exclude from statistics and descriptive tables.

out_folder : Path of the folder in which the results will be written. If not specified, defaults to ./ResultsCFA/ in the current folder.

loadings_df : Path to the file containing the loadings from the EFA (e.g. loadings.csv). Columns must be factors and rows must be variables.

model : Model specification for the CFA. Each factor must be provided as a quoted string, using one --model flag per factor (e.g. --model "factor1 =~ var1 + var2 + var3" --model "factor2 =~ var4 + var5").

threshold : Loading threshold used to assign variables to each factor in the CFA model. For example, if set to 0.40, only variables with loadings higher than 0.40 on a factor are assigned to that factor.

iterations : Number of bootstrap iterations to perform when fitting the model.

mean : Impute missing values in the original dataset based on the column mean.

median : Impute missing values in the original dataset based on the column median.

verbose : If true, produce verbose output.

save_parameters : If true, save the parameters used in the analysis in a text file.

overwrite : If true, force overwriting of existing output files.
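When several files are passed through in_dataset, they are merged on id_column. A minimal pandas sketch of that step follows; the join type is an assumption (an inner join keeps only subjects present in every file), and the script may handle unmatched subjects differently. merge_datasets is a hypothetical helper.

```python
from functools import reduce
from typing import List

import pandas as pd


def merge_datasets(frames: List[pd.DataFrame], id_column: str) -> pd.DataFrame:
    """Merge several datasets on the subject ID column.

    Assumption: inner join, so subjects missing from any file are dropped.
    """
    return reduce(
        lambda left, right: left.merge(right, on=id_column, how="inner"),
        frames,
    )
```

Note that pandas appends _x/_y suffixes by default when two files share a non-ID column name, so variable names should be unique across input files.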
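The loadings_df, threshold and model parameters together define which variables load on which factor. The sketch below shows one way to turn an EFA loading matrix into lavaan-style "factor =~ var1 + var2" lines, and to parse such lines back into a factor-to-indicators mapping. Both helpers are hypothetical; the sketch compares raw loadings strictly against the threshold (whether the script uses absolute loadings is not stated) and assigns a cross-loading variable to every factor it passes the threshold on.

```python
from typing import Dict, List

import pandas as pd


def model_from_loadings(loadings: pd.DataFrame, threshold: float = 0.4) -> List[str]:
    """Build one 'factor =~ var1 + var2' line per factor from an EFA loading matrix.

    Rows are variables and columns are factors, as required for loadings_df.
    """
    specs = []
    for factor in loadings.columns:
        col = loadings[factor]
        # Keep variables whose loading on this factor is strictly above the threshold.
        selected = col[col > threshold].index
        if len(selected) > 0:
            specs.append(f"{factor} =~ " + " + ".join(map(str, selected)))
    return specs


def parse_model(specs: List[str]) -> Dict[str, List[str]]:
    """Map each latent factor to its indicators, e.g. 'f1 =~ v1 + v2' -> {'f1': ['v1', 'v2']}.

    Handles only plain measurement lines (no fixed loadings such as 1*v1, no '~~').
    """
    model: Dict[str, List[str]] = {}
    for spec in specs:
        factor, sep, rhs = spec.partition("=~")
        if not sep:
            raise ValueError(f"missing '=~' in model line: {spec!r}")
        model[factor.strip()] = [v.strip() for v in rhs.split("+") if v.strip()]
    return model
```

Assuming the loadings file stores variable names in its first column, it could be read with pd.read_csv("loadings.csv", index_col=0) before calling model_from_loadings. Parsing a --model specification also makes it easy to check that every indicator exists as a column in the dataset before fitting.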
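The iterations parameter controls bootstrapping, but this page does not say which quantities are bootstrapped (parameter estimates, fit indices, or the chi-square test). As a generic illustration only, the sketch below is a nonparametric percentile bootstrap: subjects (rows) are resampled with replacement and a statistic is recomputed each time. bootstrap_ci is a hypothetical helper, and statistic stands in for whatever the script actually recomputes.

```python
from typing import Callable, Optional, Tuple

import numpy as np


def bootstrap_ci(data: np.ndarray,
                 statistic: Callable[[np.ndarray], float],
                 iterations: int = 1000,
                 alpha: float = 0.05,
                 seed: Optional[int] = None) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for statistic(data)."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    estimates = np.empty(iterations)
    for i in range(iterations):
        # Resample subjects (rows) with replacement and recompute the statistic.
        idx = rng.integers(0, n, size=n)
        estimates[i] = statistic(data[idx])
    # Percentile interval: the alpha/2 and 1 - alpha/2 quantiles of the estimates.
    lower, upper = np.quantile(estimates, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)
```

For example, statistic=lambda d: np.corrcoef(d, rowvar=False)[0, 1] would bootstrap the correlation between the first two variables. More iterations give more stable interval bounds at the cost of run time.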