cli.ComputeGraphNetwork

ComputeGraphNetwork(in_dataset, id_column, desc_columns, out_folder='./graph_results/', verbose=False, overwrite=False, save_parameters=False, plot_distribution=False, import_data=False, layout=NetworkLayout.Spring, weight='membership')

Graph Network Clustering Visualization

ComputeGraphNetwork is a script that computes an undirected weighted graph network from the c-partitioned membership matrix produced by fuzzy clustering. It is designed to work seamlessly with FuzzyClustering. Mapping membership matrices to a graph network allows the later use of graph theory statistics such as shortest path, betweenness centrality, etc. The concept of this script was initially proposed in [1].
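The mapping from a membership matrix to a graph can be sketched with NetworkX as follows. This is only an illustration, not the script's actual implementation: it assumes that each subject and each cluster becomes a node and that every subject is linked to every cluster by an edge carrying the membership value. The subject IDs, cluster names, and the membership_graph helper are hypothetical.

```python
import networkx as nx

# Hypothetical membership matrix: one row per subject, one column per cluster.
# Each row sums to 1, as expected from a fuzzy c-partition.
memberships = {
    "sub-01": {"c1": 0.7, "c2": 0.2, "c3": 0.1},
    "sub-02": {"c1": 0.1, "c2": 0.8, "c3": 0.1},
    "sub-03": {"c1": 0.3, "c2": 0.3, "c3": 0.4},
}


def membership_graph(memberships, weight="membership"):
    """Build an undirected weighted graph linking subjects to clusters.

    Assumption: the membership value is stored as an edge attribute whose
    name is given by `weight`.
    """
    graph = nx.Graph()
    for subject, row in memberships.items():
        for cluster, value in row.items():
            graph.add_edge(subject, cluster, **{weight: value})
    return graph


graph = membership_graph(memberships)

# Once the graph exists, graph theory statistics become available.
# Both calls below ignore the edge weights and only count hops.
hops = nx.shortest_path_length(graph, "sub-01", "sub-02")
centrality = nx.betweenness_centrality(graph)
```

Here two subjects are always two hops apart because they only connect through a shared cluster node; statistics that take the membership values into account would need an explicit choice of how a membership translates into a distance.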

Layout Algorithms

To generate a graph network, the position of each node must be determined from its connections to other nodes and from the weight of those connections. These connections are called edges; in a weighted graph network, each edge carries a weight. The available algorithms are:

Kamada Kawai Layout: Uses the Kamada-Kawai path-length cost function. Not the optimal solution for large networks, as it is computationally intensive. For details, see [2].

Spectral Layout: Positions are determined using the eigenvectors of the graph Laplacian. For details, see [2].

Spring Layout: Uses the Fruchterman-Reingold force-directed algorithm. Suitable for large networks with a high number of nodes. For details, see [2]. This is the default method.
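As a rough sketch, the three layouts correspond to the following NetworkX functions documented in [2]. The LAYOUTS mapping, the compute_positions helper, and the small example graph are hypothetical and only serve the illustration; how the script itself dispatches on NetworkLayout is not shown here.

```python
import networkx as nx

# Hypothetical mapping from a layout name to the NetworkX layout function.
LAYOUTS = {
    "kamada_kawai": nx.kamada_kawai_layout,
    "spectral": nx.spectral_layout,
    "spring": nx.spring_layout,
}


def compute_positions(graph, layout="spring", weight="membership"):
    """Return a dict mapping each node to its 2D coordinates.

    All three NetworkX layout functions accept a `weight` keyword naming
    the edge attribute to use.
    """
    return LAYOUTS[layout](graph, weight=weight)


# Small example graph: two subjects connected to two clusters.
graph = nx.Graph()
graph.add_weighted_edges_from(
    [
        ("sub-01", "c1", 0.7),
        ("sub-01", "c2", 0.3),
        ("sub-02", "c1", 0.2),
        ("sub-02", "c2", 0.8),
    ],
    weight="membership",
)

positions = compute_positions(graph, layout="spring")
```

Note that the layouts do not interpret the weight in the same way: the spring layout treats a larger weight as a stronger attraction, whereas the Kamada-Kawai layout uses the weight as a path length when computing distances between nodes.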

Importing Data Within The .gml File

If the --import-data flag is set, the descriptive data will be imported into the .gml file and stored as node attributes. This is useful when the graph network is later used in visualization scripts or in statistical analysis (see AverageWeightedPath or Plsr). It ensures robust handling of the data and reduces the probability of data mismatch between subjects.
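The idea of carrying descriptive data inside the graph file can be sketched with NetworkX as follows. This is an illustration under stated assumptions rather than the script's own code: the subject ID, the descriptive variables (age, sex), and the file name graph_network.gml are hypothetical.

```python
import os
import tempfile

import networkx as nx

# Minimal graph: one subject linked to two clusters.
graph = nx.Graph()
graph.add_edge("sub-01", "c1", membership=0.7)
graph.add_edge("sub-01", "c2", membership=0.3)

# Hypothetical descriptive data, keyed by subject ID.
descriptive = {"sub-01": {"age": 12, "sex": "F"}}

# Attach each subject's descriptive values to its node as attributes.
nx.set_node_attributes(graph, descriptive)

# Write the graph to a .gml file, then read it back to confirm that the
# attributes travel with the nodes.
path = os.path.join(tempfile.mkdtemp(), "graph_network.gml")
nx.write_gml(graph, path)
restored = nx.read_gml(path)
```

Because the attributes are stored on the node that carries the subject ID, downstream scripts can read them directly from the graph instead of re-merging a separate table.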

References

[1] Ariza-Jiménez, L., Villa, L. F., & Quintero, O. L. (2019). Memberships Networks for High-Dimensional Fuzzy Clustering Visualization. Applied Computer Sciences in Engineering (Vol. 1052, pp. 263–273). Springer International Publishing. (https://doi.org/10.1007/978-3-030-31019-6_23)

[2] NetworkX Documentation (https://networkx.org/documentation/stable/reference/drawing.html)

Example Usage

ComputeGraphNetwork --in-dataset cluster_membership.xlsx \
    --id-column subjectkey --desc-columns 1 --out-folder output/

For large graphs (~10 000 nodes), the spring layout may take about 5 minutes to run, depending on your hardware.

Parameters

in_dataset : Input dataset containing the membership values for each cluster.

id_column : Name of the column containing the subject’s ID tag. Required for proper handling of IDs.

desc_columns : Number of descriptive columns at the beginning of the dataset.

out_folder : Path of the folder in which the results will be written. If not specified, the current folder and a default name will be used.

verbose : If true, produce verbose output.

overwrite : If true, force overwriting of existing output files.

save_parameters : If true, save the parameters used in a .txt file.

plot_distribution : If true, plot the membership distribution and delta.

import_data : If true, import the data from the input dataset into the graph network file.

layout : Layout algorithm used to determine the node positions.

weight : Name of the column containing the edge weight. Default is ‘membership’.