viz
plot_clustering_results(lst, title, metric, output, errorbar=None, annotation=None)
Function to plot goodness-of-fit indicators resulting from a clustering model. The resulting plot is saved to the path given by the output argument.
Parameters
lst : List of values to plot.
title : Title of the plot.
metric : Metric name.
output : Output filename.
errorbar : List of values to plot as error bars (CI, SD, etc.). Defaults to None.
annotation : Annotation to add directly on the plot. Defaults to None.
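A minimal sketch of this kind of plot, using plain matplotlib. It is not the package's implementation: the function name plot_metric_sketch is hypothetical, and it assumes the values correspond to k = 2, 3, ... clusters, which the reference above does not state.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the figure can be written to disk
import matplotlib.pyplot as plt


def plot_metric_sketch(values, title, metric, output, errorbar=None, annotation=None):
    """Hypothetical sketch: plot one goodness-of-fit value per candidate k."""
    # Assumption: the first value corresponds to k = 2 clusters.
    ks = list(range(2, 2 + len(values)))
    fig, ax = plt.subplots(figsize=(6, 4))
    # yerr=None simply draws the line without error bars.
    ax.errorbar(ks, values, yerr=errorbar, marker="o", capsize=3)
    ax.set_xticks(ks)
    ax.set_xlabel("Number of clusters")
    ax.set_ylabel(metric)
    ax.set_title(title)
    if annotation is not None:
        # Free text in the top-left corner, positioned in axes coordinates.
        ax.text(0.02, 0.95, annotation, transform=ax.transAxes, va="top")
    fig.savefig(output, bbox_inches="tight")
    plt.close(fig)
    return ks
```

Passing the per-k standard deviations (or CI half-widths) as errorbar draws them as vertical bars around each point.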
plot_dendrogram(X, output, title='Dendrograms', annotation=None)
Function to plot a dendrogram showing hierarchical clustering. Useful to visually determine the appropriate number of clusters. Adapted from: https://towardsdatascience.com/cheat-sheet-to-implementing-7-methods-for-selecting-optimal-number-of-clusters-in-python-898241e1d6ad
Parameters
X : Data on which clustering will be performed.
output : Output filename and path.
title : Title for the plot. Defaults to 'Dendrograms'.
annotation : Annotation to add directly on the plot. Defaults to None.
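A minimal sketch of a dendrogram built with SciPy's hierarchical clustering. The linkage method used by plot_dendrogram is not documented here; this sketch assumes Ward linkage, and the name plot_dendrogram_sketch is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage


def plot_dendrogram_sketch(X, output, title="Dendrograms", annotation=None):
    """Hypothetical sketch: dendrogram of the rows of X, saved to `output`."""
    # Assumption: Ward linkage on Euclidean distances between samples.
    Z = linkage(np.asarray(X, dtype=float), method="ward")
    fig, ax = plt.subplots(figsize=(8, 4))
    dendrogram(Z, ax=ax, no_labels=True)  # hide per-sample leaf labels
    ax.set_title(title)
    ax.set_ylabel("Merge distance")
    if annotation is not None:
        ax.text(0.02, 0.95, annotation, transform=ax.transAxes, va="top")
    fig.savefig(output, bbox_inches="tight")
    plt.close(fig)
    return Z  # linkage matrix of shape (n_samples - 1, 4)
```

A long vertical stretch with no merges suggests a natural cut; the returned linkage matrix can then be cut into flat clusters with scipy.cluster.hierarchy.fcluster(Z, t=3, criterion="maxclust").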
plot_parallel_plot(X, labels, output, mean_values=False, cmap='magma', title='Parallel Coordinates plot.')
Function to plot a parallel coordinates plot to visualize differences between clusters. Useful to highlight marked differences in feature values between clusters and to interpret them. Adapted from: https://towardsdatascience.com/the-art-of-effective-visualization-of-multi-dimensional-data-6c7202990c57
Parameters
X : Input dataset of shape (S, F).
labels : Array of hard cluster memberships of shape (S,).
output : Filename of the png file.
mean_values : If True, plots the mean value of each feature for each cluster instead of every sample. Defaults to False.
cmap : Colormap to use for the plot. Defaults to 'magma'. See https://matplotlib.org/stable/tutorials/colors/colormaps.html
title : Title of the plot. Defaults to 'Parallel Coordinates plot.'
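A minimal sketch of a parallel coordinates plot with plain matplotlib, including the mean_values option. It is not the package's implementation: the name parallel_plot_sketch is hypothetical, and the per-feature min-max scaling is an assumption added so that all axes share a common range.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np


def parallel_plot_sketch(X, labels, output, mean_values=False, cmap="magma",
                         title="Parallel Coordinates plot."):
    """Hypothetical sketch: one polyline per sample (or per cluster mean)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels).tolist()
    # Sample distinct colors from the colormap, one per cluster.
    colors = plt.get_cmap(cmap)(np.linspace(0.1, 0.9, len(clusters)))
    # Assumption: min-max scale each feature to [0, 1]; constant features map to 0.
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xs = (X - X.min(axis=0)) / span
    xs = np.arange(X.shape[1])
    fig, ax = plt.subplots(figsize=(8, 4))
    for color, c in zip(colors, clusters):
        rows = Xs[labels == c]
        if mean_values:
            rows = rows.mean(axis=0, keepdims=True)  # a single line per cluster
        for i, row in enumerate(rows):
            ax.plot(xs, row, color=color, alpha=1.0 if mean_values else 0.3,
                    label=str(c) if i == 0 else "_nolegend_")
    n_lines = len(ax.lines)
    ax.set_xticks(xs)
    ax.set_title(title)
    ax.legend(title="Cluster")
    fig.savefig(output, bbox_inches="tight")
    plt.close(fig)
    return n_lines
```

Only the first line of each cluster carries a legend label, so the legend shows one entry per cluster even when every sample is drawn.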
radar_plot(X, labels, output, frame='circle', title='Radar plot', cmap='magma')
Function to plot a radar plot of all features in the original dataset, stratified by cluster. For each feature, a t-test between cluster means is also computed and the result is annotated directly on the plot. Known limitation: with a large number of clusters, the significance annotations clutter the plot; this is planned to be fixed in a future version.
Parameters
X : Input dataset of shape (S, F).
labels : Array of hard cluster memberships of shape (S,).
output : Filename of the png file.
frame : Shape of the radar plot frame. Choices are 'circle' or 'polygon'. Defaults to 'circle'.
title : Title of the plot. Defaults to 'Radar plot'.
cmap : Colormap to use for the plot. Defaults to 'magma'. See https://matplotlib.org/stable/tutorials/colors/colormaps.html
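A minimal sketch of the two pieces described above: a circular-frame radar plot of per-cluster feature means on matplotlib polar axes, and a t-test per feature for each pair of clusters. It is not the package's implementation. The name radar_sketch and the feature names F0, F1, ... are hypothetical, the 'polygon' frame is not covered, and the reference does not say which t-test variant is used, so Welch's test (equal_var=False) is an assumption. The sketch returns the p-values instead of drawing them on the plot.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats


def radar_sketch(X, labels, output, title="Radar plot", cmap="magma"):
    """Hypothetical sketch: per-cluster feature means on a circular radar plot."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels).tolist()
    n_feat = X.shape[1]
    # One spoke per feature; repeat the first angle to close each polygon.
    angles = np.linspace(0, 2 * np.pi, n_feat, endpoint=False)
    closed = np.append(angles, angles[0])
    colors = plt.get_cmap(cmap)(np.linspace(0.1, 0.9, len(clusters)))
    fig, ax = plt.subplots(figsize=(5, 5), subplot_kw={"projection": "polar"})
    means = {}
    for color, c in zip(colors, clusters):
        means[c] = X[labels == c].mean(axis=0)
        ring = np.append(means[c], means[c][0])
        ax.plot(closed, ring, color=color, label=str(c))
        ax.fill(closed, ring, color=color, alpha=0.15)
    ax.set_xticks(angles)
    ax.set_xticklabels([f"F{j}" for j in range(n_feat)])  # hypothetical names
    ax.set_title(title)
    ax.legend(title="Cluster", loc="upper right", bbox_to_anchor=(1.3, 1.1))
    fig.savefig(output, bbox_inches="tight")
    plt.close(fig)
    # Assumption: Welch t-test per feature (axis=0) for every pair of clusters.
    pvalues = {}
    for i, a in enumerate(clusters):
        for b in clusters[i + 1:]:
            res = stats.ttest_ind(X[labels == a], X[labels == b],
                                  axis=0, equal_var=False)
            pvalues[(a, b)] = res.pvalue  # array of shape (n_feat,)
    return means, pvalues
```

With many clusters the number of pairs grows as n(n - 1)/2, which is consistent with the clutter limitation noted above when every pair is annotated.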
sort_int_labels_legend(ax, title=None)
Function to automatically sort integer labels in a matplotlib legend in numerical order, keeping each label paired with its matching handle.
Parameters
ax : Axes object.
title : Title of the legend. Defaults to None.
Returns
ax.legend : Axes legend object
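A minimal sketch of the sorting step, using the standard Axes.get_legend_handles_labels and Axes.legend calls. It is not the package's implementation; the name sort_int_labels_legend_sketch is hypothetical, and it assumes every label can be parsed with int().

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt


def sort_int_labels_legend_sketch(ax, title=None):
    """Hypothetical sketch: rebuild the legend with labels in numeric order."""
    handles, labels = ax.get_legend_handles_labels()
    # Sort (handle, label) pairs together by the label's integer value,
    # so "10" comes after "2" instead of before it (as in string order).
    pairs = sorted(zip(handles, labels), key=lambda hl: int(hl[1]))
    if pairs:
        handles, labels = zip(*pairs)
    return ax.legend(handles, labels, title=title)
```

Sorting the pairs, rather than the labels alone, is what keeps each color or marker attached to the right cluster number.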