Transform Grouped Pandas DataFrame to Numpy Array
This snippet transforms a tall Pandas DataFrame with
time-series data into a Numpy array while preserving the
grouping. This is a common use case for me when
preparing training data for recurrent neural networks,
where each training sample belongs to a group (EventID
below), feature values (FeatureValue) are ordered by time
(DateTime), and I want to get the length of each sample
(needed to train an RNN with variable-length sequences).
EventID | DateTime | FeatureValue |
---|---|---|
1 | 0 | 80 |
1 | 5 | 90 |
2 | 0 | 75 |
2 | 10 | 80 |
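To experiment with the snippet, the example table above can be rebuilt as a DataFrame. This is only a minimal sketch: the variable name df is what the snippet expects, and the explicit sort is an assumption added here because the grouping step relies on rows already being in time order within each event.

```python
import pandas as pd

# Sample data from the table above: two events, two time steps each.
df = pd.DataFrame({
    'EventID': [1, 1, 2, 2],
    'DateTime': [0, 5, 0, 10],
    'FeatureValue': [80, 90, 75, 80],
})

# The snippet does not sort, so make the time ordering explicit
# if the input is not already guaranteed to be ordered.
df = df.sort_values(['EventID', 'DateTime']).reset_index(drop=True)
```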
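import numpy as np

event_col = 'EventID'
time_col = 'DateTime'
value_col = 'FeatureValue'
# (time, value) pairs for every row, as a 2-D array
xt = df.loc[:, [time_col, value_col]].values
# Reset the index so the group labels are positional row numbers into xt
g = df.reset_index(drop=True).groupby(event_col)
# One (n_steps, 2) array per event
xtg = [xt[i.values, :] for k, i in g.groups.items()]
# Number of time steps in each event
SignalLengths = [len(i.values) for k, i in g.groups.items()]
# Stacks to shape (n_events, n_steps, 2) when all events have the same length
X_signal = np.array(xtg)
EventIDs = list(g.groups.keys())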
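When events have different numbers of time steps, the per-event arrays cannot be stacked directly into one regular array. One common workaround, sketched below, is to zero-pad every sequence to the longest length and keep the true lengths alongside. This is not part of the original snippet: the sample data, the names sequences, signal_lengths, event_ids and X_padded, and the choice of zero as the padding value are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: event 1 has three time steps, event 2 has two.
df = pd.DataFrame({
    'EventID': [1, 1, 1, 2, 2],
    'DateTime': [0, 5, 10, 0, 10],
    'FeatureValue': [80, 90, 85, 75, 80],
})
event_col, time_col, value_col = 'EventID', 'DateTime', 'FeatureValue'

# Sort so that rows within each event are in time order.
df = df.sort_values([event_col, time_col]).reset_index(drop=True)

# One (n_steps, 2) array per event, in order of first appearance.
groups = df.groupby(event_col, sort=False)
sequences = [grp[[time_col, value_col]].to_numpy() for _, grp in groups]
event_ids = [key for key, _ in groups]
signal_lengths = np.array([len(seq) for seq in sequences])

# Zero-pad every sequence to the length of the longest one.
X_padded = np.zeros(
    (len(sequences), signal_lengths.max(), 2),
    dtype=sequences[0].dtype,
)
for row, seq in enumerate(sequences):
    X_padded[row, :len(seq), :] = seq
```

Because 0 can also be a legitimate time or feature value, signal_lengths (rather than the padding value itself) should be used to tell real steps from padding, for example when building a mask for the RNN.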