Introduction to NeuroStatX

Loading and Preparing Data

The NeuroStatX package provides modular data loading utilities for tabular and network-based data commonly encountered in neurocognitive and connectomic research. This tutorial introduces the two main loader classes:

  • DatasetLoader for structured behavioral, cognitive, or demographic data
  • GraphLoader for network data (e.g., constructing a graph from fuzzy membership values)

DatasetLoader

The DatasetLoader class holds tabular data and gives automatic access to basic information about it. You can also interact with the data, apply functions to it, and save it in multiple formats.

Here is a brief list of the underlying functions:

  • Load data from multiple formats.
  • Import data from existing objects.
  • Access metadata.
  • Join datasets together.
  • Save in multiple formats.
  • Apply custom functions.

Basic Usage

To cover the basic usage of the DatasetLoader class, let’s use the data/example.csv spreadsheet. You can find this dataset in the data/ folder of the NeuroStatX GitHub repository. It contains 50 dummy subjects, with their sex, age, and cognitive and behavioral scores.

from neurostatx.io.loader import DatasetLoader
# Initialize the loader
df = DatasetLoader()
# Load the dataframe
df.load_data("data/example.csv")

Accessing basic metadata

Now that our example dataset is loaded, we can access some basic information, such as the number of subjects or the number of variables it contains.

print(df.get_metadata())
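
The exact structure returned by get_metadata() is not shown in this tutorial. As a rough sketch of the kind of information involved (number of subjects, number of variables), here is a plain pandas equivalent. The table below is a hypothetical stand-in with invented column names and values, not the contents of data/example.csv.

```python
import pandas as pd

# Hypothetical stand-in table (names and values invented for illustration).
table = pd.DataFrame({
    "ID": ["s1", "s2", "s3"],
    "Sex": ["F", "M", "F"],
    "Age": [21, 34, 28],
    "Score": [0.4, 0.9, 0.7],
})

# A minimal summary, similar in spirit to dataset metadata.
n_subjects, n_variables = table.shape
summary = {
    "n_subjects": n_subjects,
    "n_variables": n_variables,
    "columns": list(table.columns),
}
print(summary)
```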

Filtering methods

Once the data is loaded, you might want to drop some columns, or extract only the descriptive columns from the dataset. The DatasetLoader class has built-in functions to do this. Let’s start by viewing the top 5 rows of the data.

df.get_data().head(5)

We can see two descriptive columns, Age and Sex. Now, let’s try extracting them.

df.get_descriptive_columns([1,2]).head(5) # Since they are at positions 1 and 2 in the columns

However, they were not removed from the actual dataset, simply extracted. Let’s truly remove them from the dataset now using the drop_columns() function.

df.drop_columns([1,2])
df.get_data().head(5)
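
The difference between extracting and dropping columns can be sketched with plain pandas. This is only an illustration of the idea, not the NeuroStatX implementation: the table is hypothetical, and the positions are assumed to be zero-based here, which may differ from how DatasetLoader counts them.

```python
import pandas as pd

# Hypothetical stand-in table (names and values invented for illustration).
table = pd.DataFrame({
    "ID": ["s1", "s2", "s3"],
    "Sex": ["F", "M", "F"],
    "Age": [21, 34, 28],
    "Score": [0.4, 0.9, 0.7],
})

positions = [1, 2]  # zero-based positions, an assumption for this sketch

# Extracting selects the columns; `table` itself still has all four.
descriptive = table.iloc[:, positions]

# Dropping produces a frame without those columns.
remaining = table.drop(columns=table.columns[positions])

print(list(descriptive.columns))
print(list(remaining.columns))
```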

As you can see, they were successfully removed from the dataset! You now have the basics for loading and manipulating datasets. In the following tutorials, you will also learn to apply functions and use this class to process and organize your data. Let’s move on to the GraphLoader class.


GraphLoader

The GraphLoader class is a versatile class to load, build, save, and visualize graph network objects. As with the DatasetLoader class, any function can be applied through this class, provided it is designed to run on a graph network object. Here is a brief list of currently implemented functions:

  • Load a graph network from a file in any format.
  • Build a graph from an edge list.
  • Compute the layout of the graph.
  • Add attributes to, or fetch attributes from, nodes or edges.
  • Visualize the graph network.
  • Save the graph network in any format.

Basic Usage

Since we do not have a graph network yet, let’s build one from scratch by creating a small dataframe and feeding it to the GraphLoader class.

import pandas as pd
from neurostatx.io.loader import GraphLoader
# Dummy dataframe containing 3 columns and 3 rows.
data = pd.DataFrame({
    'source': ['c1', 'c2', 'c3'],
    'target': ['c2', 'c3', 'c1'],
    'weight': [0.2, 0.5, 0.7]
})
# Build our graph.
G = GraphLoader().build_graph(data, edge_attr="weight")
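
For readers who want to see what building a graph from an edge list involves, here is a minimal sketch using networkx directly. Whether build_graph() relies on this exact call internally is an assumption; the sketch only shows the general technique, where each row becomes one weighted edge.

```python
import networkx as nx
import pandas as pd

# Same three-row edge list as in the snippet above.
edges = pd.DataFrame({
    "source": ["c1", "c2", "c3"],
    "target": ["c2", "c3", "c1"],
    "weight": [0.2, 0.5, 0.7],
})

# Each row becomes an edge; the 'weight' column is stored as an edge attribute.
nx_graph = nx.from_pandas_edgelist(
    edges, source="source", target="target", edge_attr="weight"
)

print(sorted(nx_graph.nodes()))
print(nx_graph["c1"]["c2"]["weight"])
```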

We now have a graph network to play around with! As you can see in the code snippet above, it contains three nodes (c1, c2, c3) and three weighted edges (c1 -> c2, c2 -> c3, and c3 -> c1). Now let’s try visualizing it. First, we need to compute a layout, meaning we need to determine where the nodes will sit in the network space. There are multiple options to do this, but let’s stick to the default for now (the spring layout). For a weighted graph, this layout can use the edge weights to determine the relative position of each node.

G.layout(weight="weight")
G.visualize("test.png", weight="weight")

If you open test.png, you will see your graph network with the weight encoded as edge color and size!

[Figure: the rendered graph network]
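
To make the layout step more concrete, here is a small sketch of a weighted spring layout computed with networkx. It is an illustration of the technique rather than the code behind layout(); the fixed seed is an addition for reproducibility. In networkx, a larger edge weight means a stronger attractive force between the two nodes.

```python
import networkx as nx

# Rebuild the same three-node weighted graph.
nx_graph = nx.Graph()
nx_graph.add_weighted_edges_from([
    ("c1", "c2", 0.2),
    ("c2", "c3", 0.5),
    ("c3", "c1", 0.7),
])

# Compute 2D positions; the seed makes the result repeatable between runs.
positions = nx.spring_layout(nx_graph, weight="weight", seed=42)

for node, (x, y) in positions.items():
    print(node, round(float(x), 3), round(float(y), 3))
```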

If you want, you can play around with the visualize() function to get the look you desire. Here is a modified version of the previous graph. The node positions stay the same, since we did not recompute the layout.

G.visualize("test.png",
            weight="weight",
            centroid_node_shape=1000,
            centroid_node_color="lightgray",
            colormap="viridis",
            title="Beautiful Graph Network",
            legend_title="Weight")

[Figure: the customized graph network]

Adding node and edge attributes

An interesting aspect of graph networks is that you can attach data to both nodes and edges. For example, if nodes are subjects, we could add demographic data such as age to each node. First, let’s look at which data is currently contained within each node.

G.get_graph().nodes(data=True)

We see that only the node positions are available for now. They come from our previous computation of the graph layout. Let’s now add age to each node and see what happens.

# Create a dictionary of age for each node.
age = {
    "c1": {"age": 20},
    "c2": {"age": 26},
    "c3": {"age": 32}
}
# Add it to the graph object.
G.add_node_attribute(age)
G.get_graph().nodes(data=True)
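
The nested dictionary format used above (node, then attribute name, then value) matches what networkx accepts for node attributes. As a sketch under the assumption that add_node_attribute() behaves similarly, the same operation in plain networkx looks like this:

```python
import networkx as nx

# A bare three-node cycle, without layout positions.
nx_graph = nx.Graph()
nx_graph.add_edges_from([("c1", "c2"), ("c2", "c3"), ("c3", "c1")])

age = {
    "c1": {"age": 20},
    "c2": {"age": 26},
    "c3": {"age": 32},
}

# Each inner dictionary is merged into the matching node's attributes.
nx.set_node_attributes(nx_graph, age)

print(dict(nx_graph.nodes(data=True)))
```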

You can see that each node now has its associated age. To add edge attributes, you can repeat the previous process but use the add_edge_attribute() function. Here is a small example.

# Create a dictionary mapping each pair of nodes to its edge attribute.
length = {
    ("c1", "c2"): {"length": 100},
    ("c2", "c3"): {"length": 200},
    ("c3", "c1"): {"length": 150}
}
# Add it to the graph.
G.add_edge_attribute(length)
G.get_graph().edges(data=True)
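
Edge attributes follow the same pattern, keyed by node pairs instead of single nodes. The sketch below uses plain networkx and assumes an undirected graph; it shows that adding a new attribute leaves existing ones, such as the weight, in place.

```python
import networkx as nx

# Same weighted graph as before.
nx_graph = nx.Graph()
nx_graph.add_weighted_edges_from([
    ("c1", "c2", 0.2),
    ("c2", "c3", 0.5),
    ("c3", "c1", 0.7),
])

length = {
    ("c1", "c2"): {"length": 100},
    ("c2", "c3"): {"length": 200},
    ("c3", "c1"): {"length": 150},
}

# Merge the new attribute into each edge; 'weight' is kept alongside it.
nx.set_edge_attributes(nx_graph, length)

for u, v, attrs in nx_graph.edges(data=True):
    print(u, v, attrs)
```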

You can even see the previous weight data that we initially set! Now, we could recompute a graph layout using the length rather than weight, and visualize the results.

# Layout computation.
G.layout(weight="length")
G.visualize("test_length.png", weight="length")

[Figure: the graph network laid out using the length attribute]

There it is! You recomputed and visualized the graph network using the length data rather than the weight. Now, you can save your graph network using save_graph(), and then load it back using load_graph() similarly to the DatasetLoader.
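
The signatures of save_graph() and load_graph() are not shown in this tutorial, so the following is only a sketch of a save-and-reload round trip using networkx and the GraphML format. The file name and the choice of GraphML are assumptions made for illustration.

```python
import os
import tempfile

import networkx as nx

nx_graph = nx.Graph()
nx_graph.add_weighted_edges_from([
    ("c1", "c2", 0.2),
    ("c2", "c3", 0.5),
    ("c3", "c1", 0.7),
])

# Write to a temporary folder, then read the file back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "graph.graphml")  # hypothetical file name
    nx.write_graphml(nx_graph, path)
    reloaded = nx.read_graphml(path)

# Nodes and edge weights survive the round trip.
print(sorted(reloaded.nodes()))
print(reloaded["c3"]["c1"]["weight"])
```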