flowstab.network_clustering =========================== .. py:module:: flowstab.network_clustering .. autoapi-nested-parse:: Module for network clustering and related operations Copyright (C) 2021 Alexandre Bovet Attributes ---------- .. autoapisummary:: flowstab.network_clustering.logger Classes ------- .. autoapisummary:: flowstab.network_clustering.BaseClustering flowstab.network_clustering.Clustering flowstab.network_clustering.FlowIntegralClustering flowstab.network_clustering.Partition flowstab.network_clustering.SparseClustering Functions --------- .. autoapisummary:: flowstab.network_clustering.avg_norm_var_information flowstab.network_clustering.jaccard_distance flowstab.network_clustering.n_random_seeds flowstab.network_clustering.norm_mutual_information flowstab.network_clustering.norm_var_information flowstab.network_clustering.run_multi_louvain flowstab.network_clustering.sort_clusters flowstab.network_clustering.static_clustering Module Contents --------------- .. py:class:: BaseClustering(T=None, p1=None, p2=None, S=None, source_cluster_list=None, source_node_to_cluster_dict=None, target_cluster_list=None, target_node_to_cluster_dict=None, rnd_state=None, rnd_seed=None, S_threshold=None) Base class for autocovariance matrix clustering with the Louvain algorithm. This class provides a base implementation for clustering autocovariance matrices using the Louvain algorithm. It can be used to cluster nodes in a network based on their autocovariance matrices. Initialize a BaseClustering object. :param T: NxN transition matrix, T[i,j] is the probability of going from node i to node j between t1 and t2. :type T: numpy.ndarray :param p1: Nx1 probability density at t1. Default is the uniform probability. :type p1: numpy.ndarray :param p2: Nx1 probability density at t2. Default is p1 @ T. :type p2: numpy.ndarray :param S: NxN covariance matrix. Default is diag(p1) @ T - outer(p1,p2). :type S: numpy.ndarray :param source_cluster_list: List of sets of nodes describing the source partition. Default is singleton clusters. :type source_cluster_list: list :param source_node_to_cluster_dict: Dictionary with mapping between nodes and source cluster number. Default is singleton clusters. :type source_node_to_cluster_dict: dict :param target_cluster_list: List of sets of nodes describing the target partition. Default is singleton clusters. :type target_cluster_list: list :param target_node_to_cluster_dict: Dictionary with mapping between nodes and target cluster number. Default is singleton clusters. :type target_node_to_cluster_dict: dict :param rnd_state: Random state object. Default creates a new one. :type rnd_state: np.random.RandomState :param rnd_seed: Seed for the random object. Default is a random seed. :type rnd_seed: int :param S_threshold: Smallest values of S. Used to trim insignificantly small values. :type S_threshold: float :param .. note::: At least `T` or `S` must be provided to initialize the clustering. :param .. note::: Clusters can either be initialized with a cluster_list or a node_to_cluster_dict. .. py:method:: compute_stability(R=None) :abstractmethod: Compute the stability of the clustering. :param R: The clustered autocovariance matrix. :type R: numpy.ndarray :returns: **stability** -- The stability of the clustering. :rtype: float .. py:method:: find_louvain_clustering(delta_r_threshold=np.finfo(float).eps, n_meta_iter_max=1000, n_sub_iter_max=1000, verbose=False, rnd_seed=None, save_progress=False, print_num_loops=False) Returns n_meta_loop `rnd_seed` is used to set to state of the random number generator. Default is to keep the current state .. py:attribute:: S_threshold :value: None .. py:attribute:: p1 :value: None .. py:attribute:: p2 :value: None .. py:attribute:: source_part .. py:attribute:: target_part .. py:class:: Clustering(p1=None, p2=None, T=None, S=None, cluster_list=None, node_to_cluster_dict=None, rnd_state=None, rnd_seed=None, S_threshold=None) Bases: :py:obj:`BaseClustering` Symmetric Clustering. Finds the best partition that optimizes the stability between two times defined as the trace of the clustered autocovariance matrix. At least `T` must be given to initialize the clustering. Clusters can either be initilized with a cluster_list or a node_to_cluster_dict. :param T: NxN transition matrix, T[i,j] is the probability of going from node i to node j between t1 and t2. :type T: numpy.ndarrays :param p1: Nx1 probability density at t1. Default is the uniform probability. :type p1: numpy.ndarrays :param p2: Nx1 probability density at t2. Default is p1 @ T. :type p2: numpy.ndarrays :param S: NxN covariance matrix. Default is diag(p1) @ T - outer(p1,p2). :type S: numpy.ndarrays :param cluster_list: list of set of nodes describing the partition. Default is singleton clusters. :type cluster_list: list :param node_to_cluster_dict: dictionary with mapping between nodes and cluster number. Default is singleton clusters. :type node_to_cluster_dict: dict :param rnd_state: Random state object. Default creates a new one. :type rnd_state: np.random.RandomState :param rnd_seed: Seed for the random object. Default is a random seed. :type rnd_seed: int :param S_threshold: Smallest values of S. Used to trim insignificantly small values. :type S_threshold: float .. py:method:: compute_stability(R=None) Returns the stability of the clusters given in `cluster_list` computed between times `t1` and `t2` Here, for symmetric clustering, we only care about the diagonal of R. .. py:method:: find_louvain_clustering(delta_r_threshold=np.finfo(float).eps, n_meta_iter_max=1000, n_sub_iter_max=1000, verbose=False, rnd_seed=None, save_progress=False, print_num_loops=False) Returns n_meta_loop `rnd_seed` is used to set to state of the random number generator. Default is to keep the current state .. py:attribute:: partition .. py:attribute:: target_part :value: None .. py:class:: FlowIntegralClustering(T_list=None, p1=None, time_list=None, T_inter_list=None, integral_time_grid=None, reverse_time=False, rtol=1e-08, verbose=False) FlowIntegralClustering. Class to finds the best partition that optimizes the integral of the autocovariance between two times computed as int_t1^t2 [diag(p1) @ T(t1,t) @ np.diag(1/pt) @ T(t1,t).T @ np.diag(p1) - outer(p1,p1)] dt `T_list` or `T_inter_list` must be given to initialize the clustering. Clusters can either be initilized with a cluster_list or a node_to_cluster_dict. :param T_list: list of K succesive NxN transition matrices, Tk[i,j] is the probability of going from node i to node j between `t1` and `tk+1`. :type T_list: list of scipy.sparse.csr matrices or numpy.ndarray :param T_inter_list: list of K succesive NxN inter event transition matrices, T_inter_k[i,j] is the probability of going from node i to node j between `tk` and `tk+1`. :type T_inter_list: list of scipy.sparse.csr matrices or numpy.ndarray :param p1: Nx1 probability density at t1. Default is the uniform probability. :type p1: numpy.ndarrays :param time_list: list of K+1 time instants corresponding to the `tk` of the transition matrices. Default is unit times. :type time_list: list or numpy.array :param integral_time_grid: list of times until which to compute the integral. The final times and indices used are stored in _t_integral_grid and _k_integral_grid. Default is [time_list[0], time_list[-1]] :type integral_time_grid: list :param reverse_time: Whether to reverse time when computing T_list and I_list from T_inter_list, for backward flow stability. Default is False (forward flow stability). If T_list is provided in input, it must have been computed with the corresponding time direction. :type reverse_time: bool :param rtol: Relative tolerance to set I values to zero. Values smaller than I.max()*rtol are set to zero to keep I sparse. :type rtol: float .. py:method:: find_louvain_clustering(k=0, delta_r_threshold=np.finfo(float).eps, n_meta_iter_max=1000, n_sub_iter_max=1000, verbose=False, rnd_seed=None, save_progress=False, cluster_list=None, node_to_cluster_dict=None, rnd_state=None, S_threshold=None) Louvain algorithm to find the best partition. The best partition is saved in `self.partition[k] :param k: Index of the covariance integral to cluster. self.I_list[k] will be used. :type k: int :param delta_r_threshold: Minimal value for an improvement of the quality function. The default is np.finfo(float).eps. :type delta_r_threshold: float, optional :param n_meta_iter_max: Maximum number of meta iterations. The default is 1000. :type n_meta_iter_max: int, optional :param n_sub_iter_max: Maximum number of sub iterations. The default is 1000. :type n_sub_iter_max: int, optional :param verbose: Degree of verbosity. The default is False. :type verbose: bool, int, optional :param rnd_seed: Seed for the random object. Default is a random seed. :type rnd_seed: int :param save_progress: Whether to save the progress in the Clustering.partition_progress and Clustering.stability_progress. The default is False. :type save_progress: bool, optional :param cluster_list: list of set of nodes describing the partition. Default is singleton clusters. :type cluster_list: list of sets, optional :param node_to_cluster_dict: dictionary with mapping between nodes and cluster number. Default is singleton clusters. :type node_to_cluster_dict: dict, optional :param rnd_state: Random state object. Default creates a new one. :type rnd_state: np.random.RandomState :param S_threshold: Smallest values of the covariance. Used to trim insignificantly small values. Default is None (no thresholding) :type S_threshold: float. :returns: **n_loop** -- Number of meta loops of the louvain algorithm. :rtype: int .. py:attribute:: clustering .. py:attribute:: is_nparray :value: False .. py:attribute:: is_sparse :value: False .. py:attribute:: is_sparse_stoch :value: False .. py:attribute:: num_nodes .. py:attribute:: p1 :value: None .. py:attribute:: partition .. py:attribute:: reversed_time :value: False .. py:attribute:: time_list .. py:class:: Partition(num_nodes: int, cluster_list: Sequence[set[int]] | None = None, node_to_cluster_dict: dict | None = None, check_integrity: bool = False) Node partition as a list of node sets and a node to cluster dict. A Partition object represents a node partition that can be described as a list of node sets and a node to cluster dict. It provides methods to manipulate and analyze the partition. Initialize a Partition object. :param num_nodes: Number of nodes in the partition. :type num_nodes: int :param cluster_list: List of clusters with each cluster being a set of nodes. :type cluster_list: Collection | None, optional :param node_to_cluster_dict: Mapping that maps each node to the index of the corresponding cluster in `cluster_list`. :type node_to_cluster_dict: dict | None, optional :param check_integrity: If True, check that all nodes are in only one cluster. :type check_integrity: bool, optional :param .. note::: If both `cluster_list` and `node_to_cluster_dict` are provided, a ValueError is raised. If neither is provided, the default clustering is one node per cluster. .. py:method:: __repr__() Return a string representation of the Partition object. :returns: A string representation of the Partition object. :rtype: str .. py:method:: __str__() .. py:method:: check_integrity() Check the integrity of the partition. This method checks that all nodes are in only one cluster, i.e. non-overlapping clusters whose union is the full set of nodes. :raises ValueError: If the partition is not valid. .. py:method:: get_indicator(sparse=True) Get the indicator matrix for the partition. :param sparse: If True, return a sparse indicator matrix. Default is True. :type sparse: bool, optional :returns: The indicator matrix for the partition. :rtype: ndarray or sparse matrix .. py:method:: get_num_clusters(non_empty=False) Get the number of clusters in the partition. :param non_empty: If True, only count non-empty clusters. Default is False. :type non_empty: bool, optional :returns: The number of clusters in the partition. :rtype: int .. py:method:: iter_cluster_node_index() Iterate over the node indices for each cluster. :Yields: *list* -- The node indices for each cluster. .. py:method:: move_node(node, c_f) Move a node to cluster `c_f`. :param node: The node to move. :type node: int :param c_f: The index of the cluster to move the node to. :type c_f: int .. py:method:: remove_empty_clusters() Remove empty clusters from `cluster_list` and `node_to_cluster_dict` and reindex clusters. This method removes any empty clusters from the `cluster_list` and updates the `node_to_cluster_dict` accordingly. It also reindexes the clusters to ensure that the indices are consecutive. .. py:attribute:: num_nodes .. py:class:: SparseClustering(p1=None, p2=None, T=None, S=None, cluster_list=None, node_to_cluster_dict=None, rnd_state=None, rnd_seed=None) Bases: :py:obj:`Clustering` Symmetric Clustering using sparse matrices. TODO: fix formatting; complete if needed Finds the best partition that optimizes the stability between two times defined as the trace of the clustered autocovariance matrix. At least `T` must be given to initialize the clustering. Clusters can either be initilized with a cluster_list or a node_to_cluster_dict. :param T: NxN transition matrix, T[i,j] is the probability of going from node i to node j between t1 and t2. :type T: scipy csr_sparse matrix :param p1: Nx1 probability density at t1. Default is the uniform probability. :type p1: numpy.ndarrays :param p2: Nx1 probability density at t2. Default is p1 @ T. :type p2: numpy.ndarrays :param S: NxN covariance matrix. Default is diag(p1) @ T - outer(p1,p2). :type S: SparseAutocovMat :param cluster_list: list of set of nodes describing the partition. Default is singleton clusters. :type cluster_list: list :param node_to_cluster_dict: dictionary with mapping between nodes and cluster number. Default is singleton clusters. :type node_to_cluster_dict: dict :param rnd_state: Random state object. Default creates a new one. :type rnd_state: np.random.RandomState :param rnd_seed: Seed for the random object. Default is a random seed. :type rnd_seed: int :param S_threshold: Smallest values of S. Used to trim insignificantly small values. :type S_threshold: float TODO: adding docstring .. py:method:: compute_stability(R=None) Returns the stability of the clusters given in `cluster_list` computed between times `t1` and `t2` Here, for symmetric clustering, we only care about the diagonal of R. .. py:attribute:: p1 :value: None .. py:attribute:: p2 :value: None .. py:attribute:: partition .. py:attribute:: source_part .. py:attribute:: target_part :value: None .. py:function:: avg_norm_var_information(clusters_lists_list, num_samples=None) Returns the average normalized variation of information for the clusters_list in clusters_lists_list. By default uses all N*(N-1)/2 combinations possible of cluster_list pairs, where N = len(clusters_lists_list) or uses num_samples pairs if num_samples < N*(N-1)/2. .. py:function:: jaccard_distance(clusters1, clusters2) Returns the Jaccard distance between two clustering. inputs can be node_to_cluster dictionaries or cluster lists of node sets .. py:function:: n_random_seeds(n) .. py:function:: norm_mutual_information(clusters1, clusters2) Normalized mutial information between two non-overlapping clustering. The mutual information is normalized by the max of the two individual entropies. .. math:: NMI = (H(C1)+H(C2)-H(C1,C2))/max(H(C1),H(C2)) inputs can be node_to_cluster dictionaries, cluster lists of node sets or instances of Partition. .. py:function:: norm_var_information(clusters1, clusters2, N=None, use_clust_list=False) Normalized variation of information between two non-overlapping clustering. .. math:: \hat{V}(C_1,C_2) = ({H(C1|C2)+H(C2|C1)})/{log_2 N} inputs can be node_to_cluster dictionaries, cluster lists of node sets or instances of Partition. .. py:function:: run_multi_louvain(clustering, num_repeat, **kwargs) Helper function to run multiple (serial) instances of the Louvain algorithm for the clustering given in `clutering`. :param clustering: :type clustering: Clustering or SparseClustering instance :param num_repeat: Number of repetition of the algorithm. :type num_repeat: int :param \*\*kwargs: Arguments passed when initializing the clusterings. :type \*\*kwargs: key word args :returns: * **n_loops** (*list*) -- list with the number of louvain loop for each repetition. * **cluster_lists** (*list*) -- list with the cluster lists of each repetition.. * **stabilities** (*list*) -- list with the value of the stability for each repetition. * **seeds** (*list*) -- list with the random seeds used in each repetition. .. py:function:: sort_clusters(cluster_list_to_sort, cluster_list_model, thresh_ratio=0.3) Quick heuristic to sort a list of clusters in order to closely match another list .. py:function:: static_clustering(A, t=1, rnd_seed=None, discrete_time_rw=False, linearized=False, directed=False) Initializes a clustering instance to optimize the continuous time Markov stability from the graph given by the adjacency matrix `A`. :param A: Adjacency matrix. :type A: scipy sparse csr matrix or numpy ndarray :param t: Markov time (resolution parameter). The default is 1. :type t: int or float, optional :param rnd_seed: The default is None. :type rnd_seed: int, optional :param discrete_time_rw: If true, powers of the transition matrix to the power of `t` are used for varying resolution. `t` must be an int. The default is False. :type discrete_time_rw: bool, optional :param linearized: If true, the linearized version of the Markov Stability is used. if linearized=false and discrete_time_rw==false, the matrix exponential of the random walk Laplacian is used to compute the transition matrix (slow). The default is False. :type linearized: bool, optional :param directed: If true, the network must be strongly connected. The default is False. :type directed: bool, optional :raises TypeError: 'A must be numpy array or scipy csr.'. :raises ValueError: 'sum of degrees equal 0'. :returns: Clustering instance initialized for Markov Stability optimization. :rtype: Instance of `Clustering' or 'SparseClustering' .. py:data:: logger