cli.PartialLeastSquareRegression

PartialLeastSquareRegression(in_graph, out_folder, attributes=None, weight='membership', splits=10, permutations=1000, scoring=ScoringMethod.r2, processes=1, plot_distributions=False, verbose=False, save_parameters=False, overwrite=False)

Partial Least Squares Regression (PLSR)

Performs a Partial Least Squares Regression (PLSR) on a graph, using the nodes’ attributes as predictors and the edges’ weights as the response variable. The script first runs a cross-validation to determine the optimal number of components for the PLSR model. It then runs a permutation test to determine whether the model is statistically significant. Finally, it outputs the PLSR coefficients and statistics, along with plots of the PLSR coefficients and of the distributions of the attributes and edges’ weights.

Preprocessing

The script scales the data to zero mean and unit variance and applies a log transformation to the edges’ weights (for now, it assumes that the weights represent membership values resulting from a fuzzy clustering analysis).
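As a rough illustration of this preprocessing, the sketch below log-transforms a few hypothetical membership values and then standardizes them. The use of the natural logarithm, the population standard deviation, and the order of the two operations are assumptions; the script may differ on each point.

```python
import math
import statistics


def log_transform(weights):
    """Natural-log transform of strictly positive edge weights (assumed base e)."""
    return [math.log(w) for w in weights]


def standardize(values):
    """Rescale values to zero mean and unit variance (population std, assumed)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# Hypothetical membership values in (0, 1], as produced by a fuzzy clustering.
memberships = [0.05, 0.20, 0.35, 0.90]

# Assumed order: log transform first, then scaling.
scaled = standardize(log_transform(memberships))
```

The log transform is applied before scaling here because standardized values can be negative, and the logarithm is undefined for them.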

Nodes’ Attributes

The script takes a single graph file as input, which must be in .gexf format. It fetches the attributes from the graph file and performs the PLSR analysis on the attributes and edges’ weights. If no attributes are provided, the script uses all attributes found in the graph file. To add attributes to the nodes in the graph file, please see AddNodesAttributes.

Scoring Options

The script performs a permutation test to determine whether the model is statistically significant. By default, the p-value is computed from the R2 score, but the user can choose another scoring method. The available scores are listed in [1]. The single-tailed p-value is computed as:

p-value = ∑(score_perm >= score) / (nb_permutations)
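The formula above translates directly into a few lines of Python. The function name and the scores below are hypothetical, chosen only to show the counting; they are not taken from the script.

```python
def permutation_p_value(score, permuted_scores):
    """Fraction of permuted scores that are at least as large as the observed score."""
    exceed = sum(1 for s in permuted_scores if s >= score)
    return exceed / len(permuted_scores)


# Hypothetical observed R2 and R2 values from 10 permuted models.
observed_r2 = 0.42
permuted_r2 = [0.10, -0.05, 0.45, 0.30, 0.02, 0.42, 0.15, 0.08, -0.12, 0.21]

# Two permuted scores (0.45 and 0.42) reach the observed score, so p = 2 / 10.
p_value = permutation_p_value(observed_r2, permuted_r2)
```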

Coefficient Significance

The script also computes a p-value for each coefficient using the permutation test, by comparing the coefficients obtained from the PLSR model with those obtained from the permuted models. The two-tailed p-value is computed as:

p-value = ∑(abs(coef_perm) >= abs(coef)) / (nb_permutations)
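Applied per coefficient, the formula above can be sketched as follows. The function name and the values are hypothetical; the sketch assumes one coefficient vector per permutation, with coefficients in the same attribute order as the fitted model.

```python
def coefficient_p_values(coefs, permuted_coefs):
    """Two-tailed p-value per coefficient.

    coefs: coefficients of the fitted model, one per attribute.
    permuted_coefs: list of coefficient vectors, one per permutation.
    """
    n_perm = len(permuted_coefs)
    p_values = []
    for j, coef in enumerate(coefs):
        # Count permutations whose coefficient is at least as extreme in magnitude.
        exceed = sum(1 for perm in permuted_coefs if abs(perm[j]) >= abs(coef))
        p_values.append(exceed / n_perm)
    return p_values


# Hypothetical coefficients for two attributes and four permutations.
coefs = [0.8, -0.1]
permuted = [[0.2, 0.3], [-0.9, 0.05], [0.1, -0.2], [0.4, 0.0]]

p_values = coefficient_p_values(coefs, permuted)
```

Comparing absolute values is what makes the test two-tailed: a permuted coefficient counts as extreme whether it is large and positive or large and negative.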

References

[1] Scikit-learn scoring methods (https://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter)

Example Usage

PartialLeastSquareRegression --in-graph graph.gexf --out-folder output_folder -v -s

Parameters

in_graph : Graph file containing the data for the PLSR model.

out_folder : Output folder.

attributes : Attribute names to include in the PLSR model. If none are provided, all attributes will be included.

weight : Edge attribute to use as the weight (response variable) for the PLSR model.

splits : Number of splits to use for the cross-validation.

permutations : Number of permutations to use for the permutation testing.

scoring : Scoring method to use for the permutation testing.

processes : Number of processes to use for the cross-validation.

verbose : If true, produce verbose output.

plot_distributions : If true, will plot the distributions of the edges’ weights.

save_parameters : If true, will save the input parameters to a .txt file.

overwrite : If true, force overwriting of existing output files.