Asif Rahman

Group-by and count in NumPy

Posted on 2022-01-26

The crosstab function takes one or more equal-length array-like objects and returns a contingency table of counts, with one axis per input and one cell per combination of levels. A pure NumPy implementation of a pivot table like this is useful in environments where we don't want to import pandas just for pandas.crosstab.
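For a single array, group-by-and-count is one call to np.unique. With return_counts=True it gives each distinct level and how often it occurs. With return_inverse=True it also gives the integer code of every element, and that is the piece a multi-array version can build on. The sketch below is illustrative only. The data is made up.

```python
import numpy as np

# Made-up example data: one categorical array.
categorical = np.array([1, 3, 2, 3])

# levels: sorted distinct values
# codes:  position of each element within levels
# counts: number of occurrences of each level
levels, codes, counts = np.unique(categorical, return_inverse=True, return_counts=True)

print(levels)  # [1 2 3]
print(codes)   # [0 2 1 2]
print(counts)  # [1 1 2]

# The integer codes alone are enough to recover the counts.
assert (np.bincount(codes, minlength=len(levels)) == counts).all()
```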

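```python
from typing import Tuple
import numpy as np

def crosstab(*args) -> Tuple[Tuple[np.ndarray, ...], np.ndarray]:
    """
    Contingency table of counts.

    Parameters
    ----------
    *args : array-like
        One or more equal-length arrays of discrete categorical data.

    Returns
    -------
    actual_levels : Tuple[np.ndarray]
        The actual levels of the categorical variables.
    count : np.ndarray
        The counts of the categorical variables cross-tabulated.

    Examples
    --------
    >>> categorical = [1,3,2,3]
    >>> covariate = [5,3,3,4]
    >>> levels, count = crosstab(categorical, covariate)
    >>> count
    array([[0, 0, 1],
           [1, 0, 0],
           [1, 1, 0]])
    """
    # Encode each array as integer indices into its sorted unique levels.
    levels, indices = zip(*[np.unique(a, return_inverse=True) for a in args])
    # One axis per input array, one cell per combination of levels.
    count = np.zeros(list(map(len, levels)), dtype=int)
    # Unbuffered add: every repeated index tuple increments its cell.
    np.add.at(count, indices, 1)
    return levels, count
```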
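np.add.at is unbuffered, so a combination of levels that appears several times is counted every time. A plain count[indices] += 1 would count it only once. On large inputs, np.add.at has historically been slow. A common alternative, and not the approach used above, flattens each tuple of codes into one integer with np.ravel_multi_index and then counts those integers with np.bincount. The sketch below assumes all inputs have the same length. The name crosstab_bincount is hypothetical.

```python
from typing import Tuple
import numpy as np

def crosstab_bincount(*args) -> Tuple[Tuple[np.ndarray, ...], np.ndarray]:
    """Hypothetical bincount-based variant of crosstab (same return layout)."""
    # Encode each array as integer codes into its sorted unique levels.
    levels, codes = zip(*[np.unique(a, return_inverse=True) for a in args])
    shape = tuple(len(lv) for lv in levels)
    # Map each tuple of codes to a single flat index into an array of `shape`.
    flat = np.ravel_multi_index(codes, shape)
    # Count each flat index; minlength keeps cells for combinations never seen.
    count = np.bincount(flat, minlength=int(np.prod(shape))).reshape(shape)
    return levels, count

levels, count = crosstab_bincount([1, 3, 2, 3], [5, 3, 3, 4])
print(count)
# [[0 0 1]
#  [1 0 0]
#  [1 1 0]]
```

np.bincount makes a single pass over the flat indices and allocates the full table at once. That table has the same size as the one allocated by np.zeros, so memory use does not change.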