From your expected result, it appears that each “group” is based on contiguous id
values. For this, you can use the compare-cumsum-groupby pattern, and then use agg
to get the min
and max
values.
# Sample data.
df = pd.DataFrame(
{'id': [1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 2, 1],
'A': [0.0, 6.4, 27.0, 27.1, 27.4, 27.7, 30.6, 31.0, 36.6, 36.9, 37.1, 37.1, 37.3, 37.8, 38.9, 39.2, 39.3, 39.5, 39.7, 42.6, 42.6, 42.8, 44.9, 45.6, 51.0],
'B': [6.4, 27.0, 27.1, 27.4, 27.7, 30.6, 31.0, 36.6, 36.9, 37.1, 37.1, 37.3, 37.8, 38.9, 39.2, 39.3, 39.5, 39.7, 42.6, 42.6, 42.8, 44.9, 45.6, 51.0, 51.8]}
)
# Solution.
>>> (df
.groupby(df['id'].ne(df['id'].shift()).cumsum())
.agg({'id': 'first', 'A': 'min', 'B': 'max'})
.set_index('id'))
A B
id
1 0.0 6.4
2 6.4 30.6
1 30.6 39.7
2 39.7 42.8
1 42.8 45.6
2 45.6 51.0
1 51.0 51.8
It seems data are not numeric, so first change them to numeric and then use GroupBy.agg
with named aggregation working in pandas 0.25+:
df[['b','c']] = df[['b','c']].astype(float)
df1 = (df.groupby('a').agg(min_b=('b','min'),
max_b=('b','max'),
min_c=('c','min'),
max_c=('c','max'))
.reset_index())
print (df1)
a min_b max_b min_c max_c
0 1 0.0 51.0 6.4 51.8
1 2 6.4 45.6 27.0 51.0
Setup:
a b c
0 1 0.0 6.400000000
1 2 6.400000000 27.0
2 2 27.0 27.100000000
3 2 27.100000000 27.400000000
4 2 27.400000000 27.700000000
...
...
...
If need each groups separately:
df[['b','c']] = df[['b','c']].astype(float)
df1 = (df.groupby(df['a'].ne(df['a'].shift()).cumsum())
.agg(a=('a','first'),
min_b=('b','min'),
max_b=('b','max'),
min_c=('c','min'),
max_c=('c','max'))
.reset_index(drop=True))
print (df1)
a min_b max_b min_c max_c
0 1 0.0 0.0 6.4 6.4
1 2 6.4 27.7 27.0 30.6
2 1 30.6 39.5 31.0 39.7
3 2 39.7 42.6 42.6 42.8
4 1 42.8 44.9 44.9 45.6
5 2 45.6 45.6 51.0 51.0
6 1 0.0 51.0 6.4 51.8
7 2 6.4 6.4 30.6 30.6
8 1 30.6 30.6 39.5 39.5