pandas: Group by/Aggregate using aggregate function#
import pandas as pd
Loading data into Pandas DataFrame#
The dataset is from Kaggle - Pokemon.
data = pd.read_csv('data/Pokemon.csv')
data
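Aggregate functions like mean(), sum(), min(), and max() are mainly useful on numeric data. String columns cannot be averaged: in pandas 2.0 and later, a grouped mean() that includes one raises a TypeError instead of silently dropping it. Select only the numeric columns before aggregating to avoid errors and get meaningful results. Boolean columns can technically be summed or averaged (they are treated as 0 and 1), but select_dtypes(include='number') leaves them out as well.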
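To make the column selection concrete, here is a minimal sketch on a tiny hand-made frame. The `toy` DataFrame and its values are invented for illustration and are not taken from the Kaggle file; only the column names mimic it.

```python
import pandas as pd

# Hypothetical stand-in for the Pokemon data (values are made up)
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Type 1": ["Fire", "Fire", "Water", "Water"],
    "HP": [40, 60, 50, 70],
    "Attack": [50, 70, 45, 65],
    "Legendary": [False, False, False, True],
})

# Keep only integer/float columns; string and boolean columns are excluded
numeric_cols = toy.select_dtypes(include="number").columns
print(list(numeric_cols))

# Averaging just those columns per group sidesteps the string column "Name"
means = toy.groupby("Type 1")[numeric_cols].mean()
print(means)
```

Selecting the columns once and reusing `numeric_cols` keeps every later aggregation consistent.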
Group by 1 column#
# Select only numeric columns for aggregation
numeric_cols = data.select_dtypes(include='number').columns
# groupby sorts the result by group key by default, so no extra sort is needed
data.groupby('Type 1')[numeric_cols].mean().head(10)
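A single call can also apply several aggregate functions at once through `agg`. The sketch below again uses a made-up `toy` frame rather than the real dataset, and the output names `mean_hp` and `max_attack` are hypothetical labels chosen for this example.

```python
import pandas as pd

# Hypothetical data; values are invented for illustration
toy = pd.DataFrame({
    "Type 1": ["Fire", "Fire", "Water", "Water"],
    "HP": [40, 60, 50, 70],
    "Attack": [50, 70, 45, 65],
})

# A list of function names yields one (column, function) pair per result column
summary = toy.groupby("Type 1")[["HP", "Attack"]].agg(["mean", "max"])
print(summary)

# Named aggregation: output_name=(input_column, function)
named = toy.groupby("Type 1").agg(
    mean_hp=("HP", "mean"),
    max_attack=("Attack", "max"),
)
print(named)
```

The list form is handy for quick exploration, while named aggregation gives flat, readable column names.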
Group by hierarchical columns#
data.groupby(['Type 1', 'Type 2'])[numeric_cols].count().head()
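Grouping by two columns produces a result indexed by a two-level MultiIndex, and by default rows whose group key is missing are left out. The sketch below illustrates both points on an invented `toy` frame. It assumes that some rows have no `Type 2` value, and it uses the `dropna=False` option, which requires pandas 1.1 or later.

```python
import pandas as pd

# Hypothetical data; one row has no secondary type
toy = pd.DataFrame({
    "Type 1": ["Fire", "Fire", "Fire", "Water"],
    "Type 2": ["Flying", "Flying", None, "Ice"],
    "HP": [78, 80, 39, 90],
})

# Default behaviour: the row with a missing "Type 2" is dropped from the groups
counts = toy.groupby(["Type 1", "Type 2"])[["HP"]].count()
print(counts)

# Look up one group with a tuple of keys, one per index level
print(counts.loc[("Fire", "Flying"), "HP"])

# dropna=False keeps missing keys as their own group
counts_all = toy.groupby(["Type 1", "Type 2"], dropna=False)[["HP"]].count()
print(counts_all)
```

Whether to keep missing keys depends on the question being asked: counting single-type entries needs `dropna=False`, while comparing dual-type combinations does not.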