flowstab.network_clustering
===========================

.. py:module:: flowstab.network_clustering

.. autoapi-nested-parse::

   Module for network clustering and related operations

   Copyright (C) 2021 Alexandre Bovet <alexandre.bovet@maths.ox.ac.uk>


Attributes
----------

.. autoapisummary::

   flowstab.network_clustering.logger


Classes
-------

.. autoapisummary::

   flowstab.network_clustering.BaseClustering
   flowstab.network_clustering.Clustering
   flowstab.network_clustering.FlowIntegralClustering
   flowstab.network_clustering.Partition
   flowstab.network_clustering.SparseClustering


Functions
---------

.. autoapisummary::

   flowstab.network_clustering.avg_norm_var_information
   flowstab.network_clustering.jaccard_distance
   flowstab.network_clustering.n_random_seeds
   flowstab.network_clustering.norm_mutual_information
   flowstab.network_clustering.norm_var_information
   flowstab.network_clustering.run_multi_louvain
   flowstab.network_clustering.sort_clusters
   flowstab.network_clustering.static_clustering


Module Contents
---------------

.. py:class:: BaseClustering(T=None, p1=None, p2=None, S=None, source_cluster_list=None, source_node_to_cluster_dict=None, target_cluster_list=None, target_node_to_cluster_dict=None, rnd_state=None, rnd_seed=None, S_threshold=None)

   Base class for autocovariance matrix clustering with the Louvain algorithm.

   This class provides a base implementation for clustering autocovariance
   matrices using the Louvain algorithm.
   It can be used to cluster nodes in a network based on their autocovariance
   matrices.


   Initialize a BaseClustering object.

   :param T: NxN transition matrix, T[i,j] is the probability of going from
             node i to node j between t1 and t2.
   :type T: numpy.ndarray
   :param p1: Nx1 probability density at t1. Default is the uniform probability.
   :type p1: numpy.ndarray
   :param p2: Nx1 probability density at t2. Default is p1 @ T.
   :type p2: numpy.ndarray
   :param S: NxN covariance matrix. Default is diag(p1) @ T - outer(p1,p2).
   :type S: numpy.ndarray
   :param source_cluster_list: List of sets of nodes describing the source partition. Default is
                               singleton clusters.
   :type source_cluster_list: list
   :param source_node_to_cluster_dict: Dictionary with mapping between nodes and source cluster number.
                                       Default is singleton clusters.
   :type source_node_to_cluster_dict: dict
   :param target_cluster_list: List of sets of nodes describing the target partition.
                               Default is singleton clusters.
   :type target_cluster_list: list
   :param target_node_to_cluster_dict: Dictionary with mapping between nodes and target cluster number.
                                       Default is singleton clusters.
   :type target_node_to_cluster_dict: dict
   :param rnd_state: Random state object. Default creates a new one.
   :type rnd_state: np.random.RandomState
   :param rnd_seed: Seed for the random object. Default is a random seed.
   :type rnd_seed: int
   :param S_threshold: Smallest values of S. Used to trim insignificantly small values.
   :type S_threshold: float
   :param .. note::: At least `T` or `S` must be provided to initialize the clustering.
   :param .. note::: Clusters can either be initialized with a cluster_list or a
                     node_to_cluster_dict.


   .. py:method:: compute_stability(R=None)
      :abstractmethod:


      Compute the stability of the clustering.

      :param R: The clustered autocovariance matrix.
      :type R: numpy.ndarray

      :returns: **stability** -- The stability of the clustering.
      :rtype: float


   .. py:method:: find_louvain_clustering(delta_r_threshold=np.finfo(float).eps, n_meta_iter_max=1000, n_sub_iter_max=1000, verbose=False, rnd_seed=None, save_progress=False, print_num_loops=False)

      Returns n_meta_loop

      `rnd_seed` is used to set to state of the random number generator.
      Default is to keep the current state


   .. py:attribute:: S_threshold
      :value: None


   .. py:attribute:: p1
      :value: None


   .. py:attribute:: p2
      :value: None


   .. py:attribute:: source_part


   .. py:attribute:: target_part


.. py:class:: Clustering(p1=None, p2=None, T=None, S=None, cluster_list=None, node_to_cluster_dict=None, rnd_state=None, rnd_seed=None, S_threshold=None)

   Bases: :py:obj:`BaseClustering`


   Symmetric Clustering.

   Finds the best partition that optimizes the stability between two times
   defined as the trace of the clustered autocovariance matrix.

   At least `T` must be given to initialize the clustering.

   Clusters can either be initilized with a cluster_list or a
   node_to_cluster_dict.

   :param T: NxN transition matrix, T[i,j] is the probability of going from
             node i to node j between t1 and t2.
   :type T: numpy.ndarrays
   :param p1: Nx1 probability density at t1. Default is the uniform probability.
   :type p1: numpy.ndarrays
   :param p2: Nx1 probability density at t2. Default is p1 @ T.
   :type p2: numpy.ndarrays
   :param S: NxN covariance matrix. Default is diag(p1) @ T - outer(p1,p2).
   :type S: numpy.ndarrays
   :param cluster_list: list of set of nodes describing the partition. Default is singleton
                        clusters.
   :type cluster_list: list
   :param node_to_cluster_dict: dictionary with mapping between nodes and cluster number.
                                Default is singleton clusters.
   :type node_to_cluster_dict: dict
   :param rnd_state: Random state object. Default creates a new one.
   :type rnd_state: np.random.RandomState
   :param rnd_seed: Seed for the random object. Default is a random seed.
   :type rnd_seed: int
   :param S_threshold: Smallest values of S. Used to trim insignificantly small values.
   :type S_threshold: float


   .. py:method:: compute_stability(R=None)

      Returns the stability of the clusters given in `cluster_list`
      computed between times `t1` and `t2`

      Here, for symmetric clustering, we only care about the diagonal
      of R.


   .. py:method:: find_louvain_clustering(delta_r_threshold=np.finfo(float).eps, n_meta_iter_max=1000, n_sub_iter_max=1000, verbose=False, rnd_seed=None, save_progress=False, print_num_loops=False)

      Returns n_meta_loop

      `rnd_seed` is used to set to state of the random number generator.
      Default is to keep the current state


   .. py:attribute:: partition


   .. py:attribute:: target_part
      :value: None


.. py:class:: FlowIntegralClustering(T_list=None, p1=None, time_list=None, T_inter_list=None, integral_time_grid=None, reverse_time=False, rtol=1e-08, verbose=False)

   FlowIntegralClustering.

   Class to finds the best partition that optimizes the integral of the
   autocovariance between two times computed as

   int_t1^t2 [diag(p1) @ T(t1,t) @ np.diag(1/pt) @ T(t1,t).T @ np.diag(p1) - outer(p1,p1)] dt

   `T_list` or `T_inter_list` must be given to initialize the clustering.

   Clusters can either be initilized with a cluster_list or a node_to_cluster_dict.


   :param T_list: list of K succesive NxN transition matrices, Tk[i,j] is the probability of
                  going from node i to node j between `t1` and `tk+1`.
   :type T_list: list of scipy.sparse.csr matrices or numpy.ndarray
   :param T_inter_list: list of K succesive NxN inter event transition matrices,
                        T_inter_k[i,j] is the probability of
                        going from node i to node j between `tk` and `tk+1`.
   :type T_inter_list: list of scipy.sparse.csr matrices or numpy.ndarray
   :param p1: Nx1 probability density at t1. Default is the uniform probability.
   :type p1: numpy.ndarrays
   :param time_list: list of K+1 time instants corresponding to the `tk` of the transition
                     matrices. Default is unit times.
   :type time_list: list or numpy.array
   :param integral_time_grid: list of times until which to compute the integral. The final times and
                              indices used are stored in _t_integral_grid and _k_integral_grid.
                              Default is [time_list[0], time_list[-1]]
   :type integral_time_grid: list
   :param reverse_time: Whether to reverse time when computing T_list and I_list from T_inter_list,
                        for backward flow stability. Default is False (forward flow stability).
                        If T_list is provided in input, it must have been computed with the
                        corresponding time direction.
   :type reverse_time: bool
   :param rtol: Relative tolerance to set I values to zero.
                Values smaller than I.max()*rtol are set to zero to
                keep I sparse.
   :type rtol: float


   .. py:method:: find_louvain_clustering(k=0, delta_r_threshold=np.finfo(float).eps, n_meta_iter_max=1000, n_sub_iter_max=1000, verbose=False, rnd_seed=None, save_progress=False, cluster_list=None, node_to_cluster_dict=None, rnd_state=None, S_threshold=None)

      Louvain algorithm to find the best partition.

      The best partition is saved in `self.partition[k]


      :param k: Index of the covariance integral to cluster. self.I_list[k] will be used.
      :type k: int
      :param delta_r_threshold: Minimal value for an improvement of the quality function. The default is np.finfo(float).eps.
      :type delta_r_threshold: float, optional
      :param n_meta_iter_max: Maximum number of meta iterations. The default is 1000.
      :type n_meta_iter_max: int, optional
      :param n_sub_iter_max: Maximum number of sub iterations. The default is 1000.
      :type n_sub_iter_max: int, optional
      :param verbose: Degree of verbosity. The default is False.
      :type verbose: bool, int, optional
      :param rnd_seed: Seed for the random object. Default is a random seed.
      :type rnd_seed: int
      :param save_progress: Whether to save the progress in the Clustering.partition_progress
                            and Clustering.stability_progress. The default is False.
      :type save_progress: bool, optional
      :param cluster_list: list of set of nodes describing the partition. Default is singleton
                           clusters.
      :type cluster_list: list of sets, optional
      :param node_to_cluster_dict: dictionary with mapping between nodes and cluster number. Default is singleton
                                   clusters.
      :type node_to_cluster_dict: dict, optional
      :param rnd_state: Random state object. Default creates a new one.
      :type rnd_state: np.random.RandomState
      :param S_threshold: Smallest values of the covariance. Used to trim insignificantly small values. Default is None (no thresholding)
      :type S_threshold: float.

      :returns: **n_loop** -- Number of meta loops of the louvain algorithm.
      :rtype: int


   .. py:attribute:: clustering


   .. py:attribute:: is_nparray
      :value: False


   .. py:attribute:: is_sparse
      :value: False


   .. py:attribute:: is_sparse_stoch
      :value: False


   .. py:attribute:: num_nodes


   .. py:attribute:: p1
      :value: None


   .. py:attribute:: partition


   .. py:attribute:: reversed_time
      :value: False


   .. py:attribute:: time_list


.. py:class:: Partition(num_nodes: int, cluster_list: Sequence[set[int]] | None = None, node_to_cluster_dict: dict | None = None, check_integrity: bool = False)

   Node partition as a list of node sets and a node to cluster dict.

   A Partition object represents a node partition that can be described as
   a list of node sets and a node to cluster dict. It provides methods to
   manipulate and analyze the partition.

   Initialize a Partition object.

   :param num_nodes: Number of nodes in the partition.
   :type num_nodes: int
   :param cluster_list: List of clusters with each cluster being a set of nodes.
   :type cluster_list: Collection | None, optional
   :param node_to_cluster_dict: Mapping that maps each node to the index of the corresponding
                                cluster in `cluster_list`.
   :type node_to_cluster_dict: dict | None, optional
   :param check_integrity: If True, check that all nodes are in only one cluster.
   :type check_integrity: bool, optional
   :param .. note::: If both `cluster_list` and `node_to_cluster_dict` are provided,
                     a ValueError is raised. If neither is provided, the default
                     clustering is one node per cluster.


   .. py:method:: __repr__()

      Return a string representation of the Partition object.

      :returns: A string representation of the Partition object.
      :rtype: str


   .. py:method:: __str__()


   .. py:method:: check_integrity()

      Check the integrity of the partition.

      This method checks that all nodes are in only one cluster, i.e.
      non-overlapping clusters whose union is the full set of nodes.

      :raises ValueError: If the partition is not valid.


   .. py:method:: get_indicator(sparse=True)

      Get the indicator matrix for the partition.

      :param sparse: If True, return a sparse indicator matrix. Default is True.
      :type sparse: bool, optional

      :returns: The indicator matrix for the partition.
      :rtype: ndarray or sparse matrix


   .. py:method:: get_num_clusters(non_empty=False)

      Get the number of clusters in the partition.

      :param non_empty: If True, only count non-empty clusters. Default is False.
      :type non_empty: bool, optional

      :returns: The number of clusters in the partition.
      :rtype: int


   .. py:method:: iter_cluster_node_index()

      Iterate over the node indices for each cluster.

      :Yields: *list* -- The node indices for each cluster.


   .. py:method:: move_node(node, c_f)

      Move a node to cluster `c_f`.

      :param node: The node to move.
      :type node: int
      :param c_f: The index of the cluster to move the node to.
      :type c_f: int


   .. py:method:: remove_empty_clusters()

      Remove empty clusters from `cluster_list` and  `node_to_cluster_dict`
      and reindex clusters.

      This method removes any empty clusters from the `cluster_list` and
      updates the `node_to_cluster_dict` accordingly. It also reindexes the
      clusters to ensure that the indices are consecutive.


   .. py:attribute:: num_nodes


.. py:class:: SparseClustering(p1=None, p2=None, T=None, S=None, cluster_list=None, node_to_cluster_dict=None, rnd_state=None, rnd_seed=None)

   Bases: :py:obj:`Clustering`


   Symmetric Clustering using sparse matrices.

   TODO: fix formatting; complete if needed

   Finds the best partition that optimizes the stability between two times
   defined as the trace of the clustered autocovariance matrix.


   At least `T` must be given to initialize the clustering.

   Clusters can either be initilized with a cluster_list or a node_to_cluster_dict.


   :param T: NxN transition matrix, T[i,j] is the probability of going from node i to
             node j between t1 and t2.
   :type T: scipy csr_sparse matrix
   :param p1: Nx1 probability density at t1. Default is the uniform probability.
   :type p1: numpy.ndarrays
   :param p2: Nx1 probability density at t2. Default is p1 @ T.
   :type p2: numpy.ndarrays
   :param S: NxN covariance matrix. Default is diag(p1) @ T - outer(p1,p2).
   :type S: SparseAutocovMat
   :param cluster_list: list of set of nodes describing the partition. Default is singleton
                        clusters.
   :type cluster_list: list
   :param node_to_cluster_dict: dictionary with mapping between nodes and cluster number. Default is singleton
                                clusters.
   :type node_to_cluster_dict: dict
   :param rnd_state: Random state object. Default creates a new one.
   :type rnd_state: np.random.RandomState
   :param rnd_seed: Seed for the random object. Default is a random seed.
   :type rnd_seed: int
   :param S_threshold: Smallest values of S. Used to trim insignificantly small values.
   :type S_threshold: float

   TODO: adding docstring


   .. py:method:: compute_stability(R=None)

      Returns the stability of the clusters given in `cluster_list`
      computed between times `t1` and `t2`

      Here, for symmetric clustering, we only care about the diagonal
      of R.


   .. py:attribute:: p1
      :value: None


   .. py:attribute:: p2
      :value: None


   .. py:attribute:: partition


   .. py:attribute:: source_part


   .. py:attribute:: target_part
      :value: None


.. py:function:: avg_norm_var_information(clusters_lists_list, num_samples=None)

   Returns the average normalized variation of information for the
   clusters_list in clusters_lists_list.

   By default uses all N*(N-1)/2 combinations possible of cluster_list
   pairs, where N = len(clusters_lists_list)
   or uses num_samples pairs if num_samples < N*(N-1)/2.


.. py:function:: jaccard_distance(clusters1, clusters2)

   Returns the Jaccard distance between two clustering.

   inputs can be node_to_cluster dictionaries or cluster lists of node sets


.. py:function:: n_random_seeds(n)

.. py:function:: norm_mutual_information(clusters1, clusters2)

   Normalized mutial information between two non-overlapping clustering.

   The mutual information is normalized by the max of the two individual
   entropies.

   .. math::
       NMI = (H(C1)+H(C2)-H(C1,C2))/max(H(C1),H(C2))

   inputs can be node_to_cluster dictionaries, cluster lists of node sets
   or instances of Partition.


.. py:function:: norm_var_information(clusters1, clusters2, N=None, use_clust_list=False)

   Normalized variation of information between two non-overlapping clustering.

   .. math::

       \hat{V}(C_1,C_2) = ({H(C1|C2)+H(C2|C1)})/{log_2 N}

   inputs can be node_to_cluster dictionaries, cluster lists of node sets
   or instances of Partition.


.. py:function:: run_multi_louvain(clustering, num_repeat, **kwargs)

   Helper function to run multiple (serial) instances of the Louvain algorithm for the
   clustering given in `clutering`.

   :param clustering:
   :type clustering: Clustering or SparseClustering instance
   :param num_repeat: Number of repetition of the algorithm.
   :type num_repeat: int
   :param \*\*kwargs: Arguments passed when initializing the clusterings.
   :type \*\*kwargs: key word args

   :returns: * **n_loops** (*list*) -- list with the number of louvain loop for each repetition.
             * **cluster_lists** (*list*) -- list with the cluster lists of each repetition..
             * **stabilities** (*list*) -- list with the value of the stability for each repetition.
             * **seeds** (*list*) -- list with the random seeds used in each repetition.


.. py:function:: sort_clusters(cluster_list_to_sort, cluster_list_model, thresh_ratio=0.3)

   Quick heuristic to sort a list of clusters in order to closely match another list


.. py:function:: static_clustering(A, t=1, rnd_seed=None, discrete_time_rw=False, linearized=False, directed=False)

   Initializes a clustering instance to optimize the continuous time
   Markov stability from the graph given by the adjacency matrix `A`.

   :param A: Adjacency matrix.
   :type A: scipy sparse csr matrix or numpy ndarray
   :param t: Markov time (resolution parameter). The default is 1.
   :type t: int or float, optional
   :param rnd_seed: The default is None.
   :type rnd_seed: int, optional
   :param discrete_time_rw: If true, powers of the transition matrix to the power of `t` are used for varying resolution.
                            `t` must be an int. The default is False.
   :type discrete_time_rw: bool, optional
   :param linearized: If true, the linearized version of the Markov Stability is used.
                      if linearized=false and discrete_time_rw==false, the matrix exponential
                      of the random walk Laplacian is used to compute the transition matrix (slow).
                      The default is False.
   :type linearized: bool, optional
   :param directed: If true, the network must be strongly connected. The default is False.
   :type directed: bool, optional

   :raises TypeError: 'A must be numpy array or scipy csr.'.
   :raises ValueError: 'sum of degrees equal 0'.

   :returns: Clustering instance initialized for Markov Stability optimization.
   :rtype: Instance of `Clustering' or 'SparseClustering'


.. py:data:: logger