This snippet transforms a tall pandas DataFrame of time-series data into a NumPy array while preserving the grouping. This is a common use case for me when preparing training data for recurrent neural networks: each training sample belongs to a group (EventID below), its feature values (FeatureValue) are ordered by time (DateTime), and I also need the length of each sample, which is required to train an RNN on variable-length sequences.
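
    import numpy as np

    event_col = 'EventID'
    time_col = 'DateTime'
    value_col = 'FeatureValue'

    # (time, value) pairs for every row; assumes df is already in time order within each event
    xt = df.loc[:, [time_col, value_col]].values
    # Reset the index so the group labels are positional row numbers into xt
    g = df.reset_index(drop=True).groupby(event_col)
    # One (length, 2) array per event
    xtg = [xt[i.values, :] for k, i in g.groups.items()]
    SignalLengths = [len(i.values) for k, i in g.groups.items()]
    # dtype=object is needed for variable-length sequences; NumPy >= 1.24 raises an error on ragged input without it
    X_signal = np.array(xtg, dtype=object)
    EventIDs = list(g.groups.keys())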
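
To make the transformation concrete, here is a minimal, self-contained sketch of the same idea run on a made-up DataFrame. The helper name group_sequences and the toy data are assumptions for illustration only. Two details differ from the snippet above: the rows are explicitly sorted by event and time first, and the output array is filled element by element so that NumPy never tries to stack the per-event arrays.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two events, rows deliberately interleaved and out of time order.
df = pd.DataFrame({
    "EventID": ["a", "b", "a", "b", "a"],
    "DateTime": pd.to_datetime([
        "2020-01-02", "2020-01-01", "2020-01-01", "2020-01-02", "2020-01-03",
    ]),
    "FeatureValue": [2.0, 10.0, 1.0, 20.0, 3.0],
})


def group_sequences(df, event_col, time_col, value_col):
    """Hypothetical helper: split a tall DataFrame into one (length, 2) array per event."""
    # Sort by event, then by time, so each sequence is in time order (an added step).
    ordered = df.sort_values([event_col, time_col]).reset_index(drop=True)
    xt = ordered[[time_col, value_col]].to_numpy()

    # GroupBy.indices maps each group key to an array of positional row numbers.
    indices = ordered.groupby(event_col).indices

    event_ids, lengths = [], []
    X = np.empty(len(indices), dtype=object)  # 1-D container, one slot per event
    for j, (key, rows) in enumerate(indices.items()):
        event_ids.append(key)
        lengths.append(len(rows))
        X[j] = xt[rows, :]  # assign one slot at a time, so nothing is stacked
    return X, lengths, event_ids


X_signal, SignalLengths, EventIDs = group_sequences(
    df, "EventID", "DateTime", "FeatureValue"
)
print(EventIDs)
print(SignalLengths)
print(X_signal.shape)
```

Each element of X_signal is a two-column array of (DateTime, FeatureValue) rows for one event, and SignalLengths holds the matching sequence lengths in the same order as EventIDs.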