pandas: Group by/Aggregate using aggregate function#

import pandas as pd

Loading data into Pandas DataFrame#

The dataset is from Kaggle - Pokemon.

data = pd.read_csv('data/Pokemon.csv')
data

Hide code cell output

# Name Type 1 Type 2 Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
0 1 Bulbasaur Grass Poison 318 45 49 49 65 65 45 1 False
1 2 Ivysaur Grass Poison 405 60 62 63 80 80 60 1 False
2 3 Venusaur Grass Poison 525 80 82 83 100 100 80 1 False
3 3 VenusaurMega Venusaur Grass Poison 625 80 100 123 122 120 80 1 False
4 4 Charmander Fire NaN 309 39 52 43 60 50 65 1 False
... ... ... ... ... ... ... ... ... ... ... ... ... ...
795 719 Diancie Rock Fairy 600 50 100 150 100 150 50 6 True
796 719 DiancieMega Diancie Rock Fairy 700 50 160 110 160 110 110 6 True
797 720 HoopaHoopa Confined Psychic Ghost 600 80 110 60 150 130 70 6 True
798 720 HoopaHoopa Unbound Psychic Dark 680 80 160 60 170 130 80 6 True
799 721 Volcanion Fire Water 600 80 110 120 130 90 70 6 True

800 rows × 13 columns

Aggregate functions like mean(), sum(), min(), and max() in pandas only work with numerical data. Non-numeric columns (such as strings or booleans) cannot be averaged or summed, so you must select only numeric columns when using these functions to avoid errors and get meaningful results.

Group by 1 column#

# Select only numeric columns for aggregation
numeric_cols = data.select_dtypes(include='number').columns
data.groupby(['Type 1'])[numeric_cols].mean().sort_values('Type 1').head(10)

Hide code cell output

# Total HP Attack Defense Sp. Atk Sp. Def Speed Generation
Type 1
Bug 334.492754 378.927536 56.884058 70.971014 70.724638 53.869565 64.797101 61.681159 3.217391
Dark 461.354839 445.741935 66.806452 88.387097 70.225806 74.645161 69.516129 76.161290 4.032258
Dragon 474.375000 550.531250 83.312500 112.125000 86.375000 96.843750 88.843750 83.031250 3.875000
Electric 363.500000 443.409091 59.795455 69.090909 66.295455 90.022727 73.704545 84.500000 3.272727
Fairy 449.529412 413.176471 74.117647 61.529412 65.705882 78.529412 84.705882 48.588235 4.117647
Fighting 363.851852 416.444444 69.851852 96.777778 65.925926 53.111111 64.703704 66.074074 3.370370
Fire 327.403846 458.076923 69.903846 84.769231 67.769231 88.980769 72.211538 74.442308 3.211538
Flying 677.750000 485.000000 70.750000 78.750000 66.250000 94.250000 72.500000 102.500000 5.500000
Ghost 486.500000 439.562500 64.437500 73.781250 81.187500 79.343750 76.468750 64.343750 4.187500
Grass 344.871429 421.142857 67.271429 73.214286 70.800000 77.500000 70.428571 61.928571 3.357143

Group by hierarchical columns#

data.groupby(['Type 1', 'Type 2'])[numeric_cols].count().head()

Hide code cell output

# Total HP Attack Defense Sp. Atk Sp. Def Speed Generation
Type 1 Type 2
Bug Electric 2 2 2 2 2 2 2 2 2
Fighting 2 2 2 2 2 2 2 2 2
Fire 2 2 2 2 2 2 2 2 2
Flying 14 14 14 14 14 14 14 14 14
Ghost 1 1 1 1 1 1 1 1 1