flowstab.network_clustering¶

Module for network clustering and related operations

Attributes¶

logger

Classes¶

`BaseClustering`	Base class for autocovariance matrix clustering with the Louvain algorithm.
`Clustering`	Symmetric Clustering.
`FlowIntegralClustering`	FlowIntegralClustering.
`Partition`	Node partition as a list of node sets and a node to cluster dict.
`SparseClustering`	Symmetric Clustering using sparse matrices.

Functions¶

`avg_norm_var_information`(clusters_lists_list[, ...])	Returns the average normalized variation of information for the
`jaccard_distance`(clusters1, clusters2)	Returns the Jaccard distance between two clustering.
`n_random_seeds`(n)
`norm_mutual_information`(clusters1, clusters2)	Normalized mutial information between two non-overlapping clustering.
`norm_var_information`(clusters1, clusters2[, N, ...])	Normalized variation of information between two non-overlapping clustering.
`run_multi_louvain`(clustering, num_repeat, **kwargs)	Helper function to run multiple (serial) instances of the Louvain algorithm for the
`sort_clusters`(cluster_list_to_sort, cluster_list_model)	Quick heuristic to sort a list of clusters in order to closely match another list
`static_clustering`(A[, t, rnd_seed, discrete_time_rw, ...])	Initializes a clustering instance to optimize the continuous time

Module Contents¶

class flowstab.network_clustering.BaseClustering(T=None, p1=None, p2=None, S=None, source_cluster_list=None, source_node_to_cluster_dict=None, target_cluster_list=None, target_node_to_cluster_dict=None, rnd_state=None, rnd_seed=None, S_threshold=None)[source]¶

Base class for autocovariance matrix clustering with the Louvain algorithm.

This class provides a base implementation for clustering autocovariance matrices using the Louvain algorithm. It can be used to cluster nodes in a network based on their autocovariance matrices.

Initialize a BaseClustering object.

Parameters:

T (numpy.ndarray) – NxN transition matrix, T[i,j] is the probability of going from node i to node j between t1 and t2.
p1 (numpy.ndarray) – Nx1 probability density at t1. Default is the uniform probability.
p2 (numpy.ndarray) – Nx1 probability density at t2. Default is p1 @ T.
S (numpy.ndarray) – NxN covariance matrix. Default is diag(p1) @ T - outer(p1,p2).
source_cluster_list (list) – List of sets of nodes describing the source partition. Default is singleton clusters.
source_node_to_cluster_dict (dict) – Dictionary with mapping between nodes and source cluster number. Default is singleton clusters.
target_cluster_list (list) – List of sets of nodes describing the target partition. Default is singleton clusters.
target_node_to_cluster_dict (dict) – Dictionary with mapping between nodes and target cluster number. Default is singleton clusters.
rnd_state (np.random.RandomState) – Random state object. Default creates a new one.
rnd_seed (int) – Seed for the random object. Default is a random seed.
S_threshold (float) – Smallest values of S. Used to trim insignificantly small values.
note:: (..) – At least T or S must be provided to initialize the clustering.
note:: – Clusters can either be initialized with a cluster_list or a node_to_cluster_dict.

abstract compute_stability(R=None)[source]¶

Compute the stability of the clustering.

Parameters:: R (numpy.ndarray) – The clustered autocovariance matrix.
Returns:: stability – The stability of the clustering.
Return type:: float

find_louvain_clustering(delta_r_threshold=np.finfo(float).eps, n_meta_iter_max=1000, n_sub_iter_max=1000, verbose=False, rnd_seed=None, save_progress=False, print_num_loops=False)[source]¶

Returns n_meta_loop

rnd_seed is used to set to state of the random number generator. Default is to keep the current state

S_threshold = None[source]¶

p1 = None[source]¶

p2 = None[source]¶

source_part[source]¶

target_part[source]¶

class flowstab.network_clustering.Clustering(p1=None, p2=None, T=None, S=None, cluster_list=None, node_to_cluster_dict=None, rnd_state=None, rnd_seed=None, S_threshold=None)[source]¶

Bases: BaseClustering

Symmetric Clustering.

Finds the best partition that optimizes the stability between two times defined as the trace of the clustered autocovariance matrix.

At least T must be given to initialize the clustering.

Clusters can either be initilized with a cluster_list or a node_to_cluster_dict.

Parameters:

T (numpy.ndarrays) – NxN transition matrix, T[i,j] is the probability of going from node i to node j between t1 and t2.
p1 (numpy.ndarrays) – Nx1 probability density at t1. Default is the uniform probability.
p2 (numpy.ndarrays) – Nx1 probability density at t2. Default is p1 @ T.
S (numpy.ndarrays) – NxN covariance matrix. Default is diag(p1) @ T - outer(p1,p2).
cluster_list (list) – list of set of nodes describing the partition. Default is singleton clusters.
node_to_cluster_dict (dict) – dictionary with mapping between nodes and cluster number. Default is singleton clusters.
rnd_state (np.random.RandomState) – Random state object. Default creates a new one.
rnd_seed (int) – Seed for the random object. Default is a random seed.
S_threshold (float) – Smallest values of S. Used to trim insignificantly small values.

compute_stability(R=None)[source]¶

Returns the stability of the clusters given in cluster_list computed between times t1 and t2

Here, for symmetric clustering, we only care about the diagonal of R.

Returns n_meta_loop

rnd_seed is used to set to state of the random number generator. Default is to keep the current state

partition[source]¶

target_part = None[source]¶

class flowstab.network_clustering.FlowIntegralClustering(T_list=None, p1=None, time_list=None, T_inter_list=None, integral_time_grid=None, reverse_time=False, rtol=1e-08, verbose=False)[source]¶

FlowIntegralClustering.

Class to finds the best partition that optimizes the integral of the autocovariance between two times computed as

int_t1^t2 [diag(p1) @ T(t1,t) @ np.diag(1/pt) @ T(t1,t).T @ np.diag(p1) - outer(p1,p1)] dt

T_list or T_inter_list must be given to initialize the clustering.

Clusters can either be initilized with a cluster_list or a node_to_cluster_dict.

Parameters:

T_list (list of scipy.sparse.csr matrices or numpy.ndarray) – list of K succesive NxN transition matrices, Tk[i,j] is the probability of going from node i to node j between t1 and tk+1.
T_inter_list (list of scipy.sparse.csr matrices or numpy.ndarray) – list of K succesive NxN inter event transition matrices, T_inter_k[i,j] is the probability of going from node i to node j between tk and tk+1.
p1 (numpy.ndarrays) – Nx1 probability density at t1. Default is the uniform probability.
time_list (list or numpy.array) – list of K+1 time instants corresponding to the tk of the transition matrices. Default is unit times.
integral_time_grid (list) – list of times until which to compute the integral. The final times and indices used are stored in _t_integral_grid and _k_integral_grid. Default is [time_list[0], time_list[-1]]
reverse_time (bool) – Whether to reverse time when computing T_list and I_list from T_inter_list, for backward flow stability. Default is False (forward flow stability). If T_list is provided in input, it must have been computed with the corresponding time direction.
rtol (float) – Relative tolerance to set I values to zero. Values smaller than I.max()*rtol are set to zero to keep I sparse.

find_louvain_clustering(k=0, delta_r_threshold=np.finfo(float).eps, n_meta_iter_max=1000, n_sub_iter_max=1000, verbose=False, rnd_seed=None, save_progress=False, cluster_list=None, node_to_cluster_dict=None, rnd_state=None, S_threshold=None)[source]¶

Louvain algorithm to find the best partition.

The best partition is saved in `self.partition[k]

Parameters:

k (int) – Index of the covariance integral to cluster. self.I_list[k] will be used.
delta_r_threshold (float, optional) – Minimal value for an improvement of the quality function. The default is np.finfo(float).eps.
n_meta_iter_max (int, optional) – Maximum number of meta iterations. The default is 1000.
n_sub_iter_max (int, optional) – Maximum number of sub iterations. The default is 1000.
verbose (bool, int, optional) – Degree of verbosity. The default is False.
rnd_seed (int) – Seed for the random object. Default is a random seed.
save_progress (bool, optional) – Whether to save the progress in the Clustering.partition_progress and Clustering.stability_progress. The default is False.
cluster_list (list of sets, optional) – list of set of nodes describing the partition. Default is singleton clusters.
node_to_cluster_dict (dict, optional) – dictionary with mapping between nodes and cluster number. Default is singleton clusters.
rnd_state (np.random.RandomState) – Random state object. Default creates a new one.
S_threshold (float.) – Smallest values of the covariance. Used to trim insignificantly small values. Default is None (no thresholding)

Returns:

n_loop – Number of meta loops of the louvain algorithm.

Return type:

int

clustering[source]¶

is_nparray = False[source]¶

is_sparse = False[source]¶

is_sparse_stoch = False[source]¶

num_nodes[source]¶

p1 = None[source]¶

partition[source]¶

reversed_time = False[source]¶

time_list[source]¶

class flowstab.network_clustering.Partition(num_nodes: int, cluster_list: Sequence[set[int]] | None = None, node_to_cluster_dict: dict | None = None, check_integrity: bool = False)[source]¶

Node partition as a list of node sets and a node to cluster dict.

A Partition object represents a node partition that can be described as a list of node sets and a node to cluster dict. It provides methods to manipulate and analyze the partition.

Initialize a Partition object.

Parameters:

num_nodes (int) – Number of nodes in the partition.
cluster_list (Collection | None, optional) – List of clusters with each cluster being a set of nodes.
node_to_cluster_dict (dict | None, optional) – Mapping that maps each node to the index of the corresponding cluster in cluster_list.
check_integrity (bool, optional) – If True, check that all nodes are in only one cluster.
note:: (..) – If both cluster_list and node_to_cluster_dict are provided, a ValueError is raised. If neither is provided, the default clustering is one node per cluster.

__repr__()[source]¶

Return a string representation of the Partition object.

Returns:: A string representation of the Partition object.
Return type:: str

__str__()[source]¶

check_integrity()[source]¶

Check the integrity of the partition.

This method checks that all nodes are in only one cluster, i.e. non-overlapping clusters whose union is the full set of nodes.

Raises:: ValueError – If the partition is not valid.

get_indicator(sparse=True)[source]¶

Get the indicator matrix for the partition.

Parameters:: sparse (bool, optional) – If True, return a sparse indicator matrix. Default is True.
Returns:: The indicator matrix for the partition.
Return type:: ndarray or sparse matrix

get_num_clusters(non_empty=False)[source]¶

Get the number of clusters in the partition.

Parameters:: non_empty (bool, optional) – If True, only count non-empty clusters. Default is False.
Returns:: The number of clusters in the partition.
Return type:: int

iter_cluster_node_index()[source]¶

Iterate over the node indices for each cluster.

Yields:: list – The node indices for each cluster.

move_node(node, c_f)[source]¶

Move a node to cluster c_f.

Parameters:

node (int) – The node to move.
c_f (int) – The index of the cluster to move the node to.

remove_empty_clusters()[source]¶

Remove empty clusters from cluster_list and node_to_cluster_dict and reindex clusters.

This method removes any empty clusters from the cluster_list and updates the node_to_cluster_dict accordingly. It also reindexes the clusters to ensure that the indices are consecutive.

num_nodes[source]¶

class flowstab.network_clustering.SparseClustering(p1=None, p2=None, T=None, S=None, cluster_list=None, node_to_cluster_dict=None, rnd_state=None, rnd_seed=None)[source]¶

Bases: Clustering

Symmetric Clustering using sparse matrices.

TODO: fix formatting; complete if needed

Finds the best partition that optimizes the stability between two times defined as the trace of the clustered autocovariance matrix.

At least T must be given to initialize the clustering.

Clusters can either be initilized with a cluster_list or a node_to_cluster_dict.

Parameters:

T (scipy csr_sparse matrix) – NxN transition matrix, T[i,j] is the probability of going from node i to node j between t1 and t2.
p1 (numpy.ndarrays) – Nx1 probability density at t1. Default is the uniform probability.
p2 (numpy.ndarrays) – Nx1 probability density at t2. Default is p1 @ T.
S (SparseAutocovMat) – NxN covariance matrix. Default is diag(p1) @ T - outer(p1,p2).
cluster_list (list) – list of set of nodes describing the partition. Default is singleton clusters.
node_to_cluster_dict (dict) – dictionary with mapping between nodes and cluster number. Default is singleton clusters.
rnd_state (np.random.RandomState) – Random state object. Default creates a new one.
rnd_seed (int) – Seed for the random object. Default is a random seed.
S_threshold (float) – Smallest values of S. Used to trim insignificantly small values.

TODO: adding docstring

compute_stability(R=None)[source]¶

Returns the stability of the clusters given in cluster_list computed between times t1 and t2

Here, for symmetric clustering, we only care about the diagonal of R.

p1 = None[source]¶

p2 = None[source]¶

partition[source]¶

source_part[source]¶

target_part = None[source]¶

flowstab.network_clustering.avg_norm_var_information(clusters_lists_list, num_samples=None)[source]¶

Returns the average normalized variation of information for the clusters_list in clusters_lists_list.

By default uses all N*(N-1)/2 combinations possible of cluster_list pairs, where N = len(clusters_lists_list) or uses num_samples pairs if num_samples < N*(N-1)/2.

flowstab.network_clustering.jaccard_distance(clusters1, clusters2)[source]¶

Returns the Jaccard distance between two clustering.

inputs can be node_to_cluster dictionaries or cluster lists of node sets

flowstab.network_clustering.n_random_seeds(n)[source]¶

flowstab.network_clustering.norm_mutual_information(clusters1, clusters2)[source]¶

Normalized mutial information between two non-overlapping clustering.

The mutual information is normalized by the max of the two individual entropies.

\[NMI = (H(C1)+H(C2)-H(C1,C2))/max(H(C1),H(C2))\]

inputs can be node_to_cluster dictionaries, cluster lists of node sets or instances of Partition.

flowstab.network_clustering.norm_var_information(clusters1, clusters2, N=None, use_clust_list=False)[source]¶

Normalized variation of information between two non-overlapping clustering.

\[\hat{V}(C_1,C_2) = ({H(C1|C2)+H(C2|C1)})/{log_2 N}\]

inputs can be node_to_cluster dictionaries, cluster lists of node sets or instances of Partition.

flowstab.network_clustering.run_multi_louvain(clustering, num_repeat, **kwargs)[source]¶

Helper function to run multiple (serial) instances of the Louvain algorithm for the clustering given in clutering.

Parameters:

clustering (Clustering or SparseClustering instance)
num_repeat (int) – Number of repetition of the algorithm.
**kwargs (key word args) – Arguments passed when initializing the clusterings.

Returns:

n_loops (list) – list with the number of louvain loop for each repetition.
cluster_lists (list) – list with the cluster lists of each repetition..
stabilities (list) – list with the value of the stability for each repetition.
seeds (list) – list with the random seeds used in each repetition.

flowstab.network_clustering.sort_clusters(cluster_list_to_sort, cluster_list_model, thresh_ratio=0.3)[source]¶: Quick heuristic to sort a list of clusters in order to closely match another list

flowstab.network_clustering.static_clustering(A, t=1, rnd_seed=None, discrete_time_rw=False, linearized=False, directed=False)[source]¶

Initializes a clustering instance to optimize the continuous time Markov stability from the graph given by the adjacency matrix A.

Parameters:

A (scipy sparse csr matrix or numpy ndarray) – Adjacency matrix.
t (int or float, optional) – Markov time (resolution parameter). The default is 1.
rnd_seed (int, optional) – The default is None.
discrete_time_rw (bool, optional) – If true, powers of the transition matrix to the power of t are used for varying resolution. t must be an int. The default is False.
linearized (bool, optional) – If true, the linearized version of the Markov Stability is used. if linearized=false and discrete_time_rw==false, the matrix exponential of the random walk Laplacian is used to compute the transition matrix (slow). The default is False.
directed (bool, optional) – If true, the network must be strongly connected. The default is False.

Raises:

TypeError – ‘A must be numpy array or scipy csr.’.
ValueError – ‘sum of degrees equal 0’.

Returns:

Clustering instance initialized for Markov Stability optimization.

Return type:

Instance of `Clustering’ or ‘SparseClustering’

flowstab.network_clustering.logger[source]¶

Flow Stability

Documentation

flowstab.network_clustering¶

Attributes¶

Classes¶

Functions¶

Module Contents¶