Asif Rahman

Group-by and count in NumPy

Posted on 2022-01-26

The crosstab function takes one or more array-like objects of equal length and returns a contingency table of counts, together with the levels that label each axis of the table. A pure NumPy implementation of this kind of pivot table is useful in environments where we don't want to import the pandas package.

from typing import Tuple
import numpy as np


def crosstab(*args) -> Tuple[Tuple[np.ndarray, ...], np.ndarray]:
    """
    Contingency table of counts.

    Parameters
    ----------
    *args : array-like
        One-dimensional arrays of discrete categorical data, all of the
        same length. Each array is one variable (one axis of the table).

    Returns
    -------
    levels : Tuple[np.ndarray, ...]
        The sorted unique levels of each categorical variable, in the
        same order as the input arrays.
    count : np.ndarray
        The cross-tabulated counts. Axis ``i`` has one entry per level
        in ``levels[i]``.

    Examples
    --------
    ```python
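    categorical = [1, 3, 2, 3]
    covariate = [5, 3, 3, 4]
    levels, count = crosstab(categorical, covariate)
    # levels -> (array([1, 2, 3]), array([3, 4, 5]))
    # count  -> [[0, 0, 1], [1, 0, 0], [1, 1, 0]]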
    ```
    """
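    # For each variable: its sorted unique levels, and for every observation
    # the position of its value within those levels.
    levels, indices = zip(*[np.unique(a, return_inverse=True) for a in args])
    # One axis per variable, one cell per combination of levels.
    count = np.zeros(list(map(len, levels)), dtype=int)
    # np.add.at is unbuffered, so repeated index tuples are each counted.
    np.add.at(count, indices, 1)
    return levels, count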
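
To see what each step of the function does, here is a minimal sketch that unrolls it for the two arrays from the docstring example. The variable names (row_levels, row_idx, col_levels, col_idx) are made up for illustration and do not appear in the function itself.

```python
import numpy as np

categorical = [1, 3, 2, 3]
covariate = [5, 3, 3, 4]

# Step 1: sorted unique levels, plus each observation's position in them.
row_levels, row_idx = np.unique(categorical, return_inverse=True)
col_levels, col_idx = np.unique(covariate, return_inverse=True)
# row_levels -> [1 2 3], row_idx -> [0 2 1 2]
# col_levels -> [3 4 5], col_idx -> [2 0 0 1]

# Step 2: an empty table with one row per categorical level
# and one column per covariate level.
count = np.zeros((len(row_levels), len(col_levels)), dtype=int)

# Step 3: add 1 at every (row, column) pair. np.add.at is unbuffered,
# so a pair that occurs several times is counted several times.
np.add.at(count, (row_idx, col_idx), 1)

print(count)
# [[0 0 1]
#  [1 0 0]
#  [1 1 0]]
```

Reading the result: row 2 (categorical level 3) has a 1 in column 0 (covariate level 3) and a 1 in column 1 (covariate level 4), which matches the second and fourth observations.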