pandas: Reshape MultiIndexed Series#

Package Import#

import pandas as pd
import numpy as np

Dataset Import#

The dataset used in this notebook is from Kaggle - Pokemon.

data = pd.read_csv('data/Pokemon.csv')
data

Hide code cell output

# Name Type 1 Type 2 Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
0 1 Bulbasaur Grass Poison 318 45 49 49 65 65 45 1 False
1 2 Ivysaur Grass Poison 405 60 62 63 80 80 60 1 False
2 3 Venusaur Grass Poison 525 80 82 83 100 100 80 1 False
3 3 VenusaurMega Venusaur Grass Poison 625 80 100 123 122 120 80 1 False
4 4 Charmander Fire NaN 309 39 52 43 60 50 65 1 False
... ... ... ... ... ... ... ... ... ... ... ... ... ...
795 719 Diancie Rock Fairy 600 50 100 150 100 150 50 6 True
796 719 DiancieMega Diancie Rock Fairy 700 50 160 110 160 110 110 6 True
797 720 HoopaHoopa Confined Psychic Ghost 600 80 110 60 150 130 70 6 True
798 720 HoopaHoopa Unbound Psychic Dark 680 80 160 60 170 130 80 6 True
799 721 Volcanion Fire Water 600 80 110 120 130 90 70 6 True

800 rows × 13 columns

Reshape a MultiIndexed Series#

data.groupby(['Type 1', 'Legendary'])['#'].count()

Hide code cell output

Type 1    Legendary
Bug       False         69
Dark      False         29
          True           2
Dragon    False         20
          True          12
Electric  False         40
          True           4
Fairy     False         16
          True           1
Fighting  False         27
Fire      False         47
          True           5
Flying    False          2
          True           2
Ghost     False         30
          True           2
Grass     False         67
          True           3
Ground    False         28
          True           4
Ice       False         22
          True           2
Normal    False         96
          True           2
Poison    False         28
Psychic   False         43
          True          14
Rock      False         40
          True           4
Steel     False         23
          True           4
Water     False        108
          True           4
Name: #, dtype: int64

What if we want to convert the above Series into a DataFrame, with Type 1 as rows, Legendary as columns, and the counts as values?

  • Use unstack() after groupby() and count():

data.groupby(['Type 1', 'Legendary'])['#'].count().unstack()

Hide code cell output

Legendary False True
Type 1
Bug 69.0 NaN
Dark 29.0 2.0
Dragon 20.0 12.0
Electric 40.0 4.0
Fairy 16.0 1.0
Fighting 27.0 NaN
Fire 47.0 5.0
Flying 2.0 2.0
Ghost 30.0 2.0
Grass 67.0 3.0
Ground 28.0 4.0
Ice 22.0 2.0
Normal 96.0 2.0
Poison 28.0 NaN
Psychic 43.0 14.0
Rock 40.0 4.0
Steel 23.0 4.0
Water 108.0 4.0

But there’s a better way to do this by using pivot_table(), which can do more things like aggregation, filling missing values, and margins, which unstack() cannot do. See this article for more details.