pandas: Convert Continuous Data to Categorical Data

Package Import

import pandas as pd
import numpy as np

Dataset Import

The dataset used in this notebook is the Pokemon dataset from Kaggle.

data = pd.read_csv('data/Pokemon.csv')
data

       #  Name                   Type 1   Type 2  Total   HP  Attack  Defense  Sp. Atk  Sp. Def  Speed  Generation  Legendary
  0    1  Bulbasaur              Grass    Poison    318   45      49       49       65       65     45           1      False
  1    2  Ivysaur                Grass    Poison    405   60      62       63       80       80     60           1      False
  2    3  Venusaur               Grass    Poison    525   80      82       83      100      100     80           1      False
  3    3  VenusaurMega Venusaur  Grass    Poison    625   80     100      123      122      120     80           1      False
  4    4  Charmander             Fire     NaN       309   39      52       43       60       50     65           1      False
...  ...  ...                    ...      ...       ...  ...     ...      ...      ...      ...    ...         ...        ...
795  719  Diancie                Rock     Fairy     600   50     100      150      100      150     50           6       True
796  719  DiancieMega Diancie    Rock     Fairy     700   50     160      110      160      110    110           6       True
797  720  HoopaHoopa Confined    Psychic  Ghost     600   80     110       60      150      130     70           6       True
798  720  HoopaHoopa Unbound     Psychic  Dark      680   80     160       60      170      130     80           6       True
799  721  Volcanion              Fire     Water     600   80     110      120      130       90     70           6       True

800 rows × 13 columns

Convert continuous data to categorical data

What if we want Attack to be categorized as follows: up to 50 is ‘Weak’, above 50 and up to 100 is ‘Normal’, above 100 and up to 150 is ‘Strong’, and above 150 is ‘nani?!’?

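Use pd.cut(<column>, bins=<bins>, labels=<labels>) to convert continuous data to categorical data. Pass labels by keyword, because the third positional argument of pd.cut is right, not labels. Here, we convert ‘Attack’ into 4 categories: ‘Weak’, ‘Normal’, ‘Strong’, ‘nani?!’.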
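The bin edges deserve a closer look, because pd.cut builds right-closed intervals by default: with edges [0, 50, 100, 150, 200] the bins are (0, 50], (50, 100], (100, 150], and (150, 200]. The following is a minimal sketch on a handful of made-up values (the sample Series is illustrative and not taken from the dataset) showing where boundary values land and how right=False shifts them:

```python
import pandas as pd

# Made-up attack values chosen to sit on or near the bin edges.
sample = pd.Series([49, 50, 51, 100, 150, 151])
bins = [0, 50, 100, 150, 200]
labels = ['Weak', 'Normal', 'Strong', 'nani?!']

# Default (right=True): each bin includes its right edge, so 50 is 'Weak'.
right_closed = pd.cut(sample, bins=bins, labels=labels)

# right=False: each bin includes its left edge instead, so 50 is 'Normal'.
left_closed = pd.cut(sample, bins=bins, labels=labels, right=False)

# Side-by-side comparison of the two conventions.
print(pd.DataFrame({
    'value': sample,
    'right=True': right_closed,
    'right=False': left_closed,
}))
```

Which convention fits depends on how a boundary value such as 100 should be read. The notebook's code below keeps the default.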

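# Work on a copy so that data keeps its numeric Attack column
df = data.copy()
# Bins are (0, 50], (50, 100], (100, 150], (150, 200]; values outside them become NaN
df['Attack'] = pd.cut(df['Attack'], bins=[0, 50, 100, 150, 200], labels=['Weak', 'Normal', 'Strong', 'nani?!'])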
df

       #  Name                   Type 1   Type 2  Total   HP  Attack  Defense  Sp. Atk  Sp. Def  Speed  Generation  Legendary
  0    1  Bulbasaur              Grass    Poison    318   45    Weak       49       65       65     45           1      False
  1    2  Ivysaur                Grass    Poison    405   60  Normal       63       80       80     60           1      False
  2    3  Venusaur               Grass    Poison    525   80  Normal       83      100      100     80           1      False
  3    3  VenusaurMega Venusaur  Grass    Poison    625   80  Normal      123      122      120     80           1      False
  4    4  Charmander             Fire     NaN       309   39  Normal       43       60       50     65           1      False
...  ...  ...                    ...      ...       ...  ...     ...      ...      ...      ...    ...         ...        ...
795  719  Diancie                Rock     Fairy     600   50  Normal      150      100      150     50           6       True
796  719  DiancieMega Diancie    Rock     Fairy     700   50  nani?!      110      160      110    110           6       True
797  720  HoopaHoopa Confined    Psychic  Ghost     600   80  Strong       60      150      130     70           6       True
798  720  HoopaHoopa Unbound     Psychic  Dark      680   80  nani?!       60      170      130     80           6       True
799  721  Volcanion              Fire     Water     600   80  Strong      120      130       90     70           6       True

800 rows × 13 columns
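
One thing to watch for: values that fall outside the outermost edges are not assigned to any bin and become NaN, and with right-closed bins starting at 0 a value of exactly 0 is excluded as well unless include_lowest=True is passed. The sketch below, again on made-up values rather than the dataset, counts the unbinned values and then uses np.inf as an open-ended upper edge. That open-ended edge is one possible choice, not what the notebook does.

```python
import numpy as np
import pandas as pd

# Made-up values, including 0 and one above the notebook's top edge of 200.
sample = pd.Series([0, 45, 80, 120, 160, 250])
labels = ['Weak', 'Normal', 'Strong', 'nani?!']

# Fixed edges as in the notebook: 0 and 250 fall outside (0, 200] and become NaN.
fixed = pd.cut(sample, bins=[0, 50, 100, 150, 200], labels=labels)
print(fixed.isna().sum())  # number of values left without a category

# Open-ended top bin, plus include_lowest=True so that 0 lands in 'Weak'.
open_ended = pd.cut(
    sample,
    bins=[0, 50, 100, 150, np.inf],
    labels=labels,
    include_lowest=True,
)

# Count rows per category, listed in bin order rather than by frequency.
print(open_ended.value_counts(sort=False))
```

Running the same isna() check on df['Attack'] after binning is a quick way to confirm that the chosen edges cover every row.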