Group-by and count in NumPy
The crosstab function takes one or more array-like objects of equal length and returns the distinct levels of each, together with a contingency table of counts. A pure NumPy implementation of a pivot table like this is useful in environments where we don’t want to import the pandas package.
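For a single variable, group-by and count is already covered by `np.unique` with `return_counts=True`. The sketch below uses made-up data and is only meant as orientation; crosstab extends the same idea to several variables at once.

```python
import numpy as np

# Illustrative data, not taken from the original post.
values = np.array([1, 3, 2, 3])

# Sorted distinct levels and the number of times each one occurs.
levels, counts = np.unique(values, return_counts=True)

print(levels)  # [1 2 3]
print(counts)  # [1 1 2]
```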
from typing import Tuple

import numpy as np


def crosstab(*args) -> Tuple[Tuple[np.ndarray, ...], np.ndarray]:
    """
    Contingency table of counts.

    Parameters
    ----------
    args : list of array-like
        Arrays of discrete categorical data.

    Returns
    -------
    actual_levels : Tuple[np.ndarray]
        The actual levels of the categorical variables.
    count : np.ndarray
        The counts of the categorical variables cross-tabulated.

    Examples
    --------
    ```python
    categorical = [1,3,2,3]
    covariate = [5,3,3,4]
    levels, count = crosstab(categorical, covariate)
    ```
    """
    # Sorted unique levels of each variable, plus each observation's level index.
    levels, indices = zip(*[np.unique(a, return_inverse=True) for a in args])
    # One axis per variable, one cell per combination of levels.
    count = np.zeros(list(map(len, levels)), dtype=int)
    # Unbuffered add, so repeated index combinations are each counted.
    np.add.at(count, indices, 1)
    return levels, count
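
As a worked example, the snippet below runs the data from the docstring through the function and prints the result. To keep it runnable on its own, it restates crosstab in compact form (without the docstring or type hints); the printed values were worked out by hand from the four input pairs.

```python
import numpy as np


def crosstab(*args):
    # Compact restatement so that this example is self-contained.
    levels, indices = zip(*[np.unique(a, return_inverse=True) for a in args])
    count = np.zeros(list(map(len, levels)), dtype=int)
    np.add.at(count, indices, 1)
    return levels, count


categorical = [1, 3, 2, 3]
covariate = [5, 3, 3, 4]
levels, count = crosstab(categorical, covariate)

print(levels[0])  # [1 2 3]  -> row labels (categorical)
print(levels[1])  # [3 4 5]  -> column labels (covariate)
print(count)
# [[0 0 1]
#  [1 0 0]
#  [1 1 0]]
```

Row i and column j of count hold the number of observations with categorical level `levels[0][i]` and covariate level `levels[1][j]`. `np.add.at` is used instead of `count[indices] += 1` because the latter applies only one increment per distinct index combination when a combination occurs more than once.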