pandas: Convert Strings to Numbers

Package Import

import pandas as pd
import numpy as np

Dataset Import

The dataset used in this notebook is the Pokemon dataset from Kaggle.

data = pd.read_csv('data/Pokemon.csv')
data

       #                   Name   Type 1  Type 2  Total   HP  Attack  Defense  Sp. Atk  Sp. Def  Speed  Generation  Legendary
  0    1              Bulbasaur    Grass  Poison    318   45      49       49       65       65     45           1      False
  1    2                Ivysaur    Grass  Poison    405   60      62       63       80       80     60           1      False
  2    3               Venusaur    Grass  Poison    525   80      82       83      100      100     80           1      False
  3    3  VenusaurMega Venusaur    Grass  Poison    625   80     100      123      122      120     80           1      False
  4    4             Charmander     Fire     NaN    309   39      52       43       60       50     65           1      False
...  ...                    ...      ...     ...    ...  ...     ...      ...      ...      ...    ...         ...        ...
795  719                Diancie     Rock   Fairy    600   50     100      150      100      150     50           6       True
796  719    DiancieMega Diancie     Rock   Fairy    700   50     160      110      160      110    110           6       True
797  720    HoopaHoopa Confined  Psychic   Ghost    600   80     110       60      150      130     70           6       True
798  720     HoopaHoopa Unbound  Psychic    Dark    680   80     160       60      170      130     80           6       True
799  721              Volcanion     Fire   Water    600   80     110      120      130       90     70           6       True

800 rows × 13 columns

Convert strings to numbers

df = pd.DataFrame({'col1': ['1.1', '2.2', '3.3'], 'col2': ['4.4', '5.5', '6.6'], 'col3': ['7.7', '8.8', '-']})
df, df.dtypes

(  col1 col2 col3
 0  1.1  4.4  7.7
 1  2.2  5.5  8.8
 2  3.3  6.6    -,
 col1    object
 col2    object
 col3    object
 dtype: object)

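df.astype() can convert multiple columns at once when given a dict that maps column names to dtypes. With the default errors='raise', a value that cannot be converted raises an exception. With errors='ignore', a column that fails to convert is left unchanged.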

df.astype({'col1': 'float', 'col2': 'float'}, errors='raise').dtypes

col1    float64
col2    float64
col3     object
dtype: object

df.astype({'col1': 'float', 'col2': 'float', 'col3': 'float'}, errors='ignore').dtypes

col1    float64
col2    float64
col3     object
dtype: object
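
For contrast, the following minimal sketch shows what happens when the unparseable column is included while errors is left at its default of 'raise'. The frame df_demo and the flag raised are illustrative names, not part of the original notebook.

```python
import pandas as pd

# Illustrative frame: col3 contains '-', which cannot be parsed as a float
df_demo = pd.DataFrame({'col1': ['1.1', '2.2', '3.3'],
                        'col3': ['7.7', '8.8', '-']})

try:
    # errors='raise' is the default, so the bad value in col3 aborts the call
    df_demo.astype({'col1': 'float', 'col3': 'float'})
    raised = False
except ValueError as exc:
    raised = True
    print(f"astype failed: {exc}")
```

Because the call fails as a whole, none of the columns are converted, which is one reason to prefer a conversion that handles bad values explicitly.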

A more robust way to convert strings to numbers is pd.to_numeric() with errors='coerce', which replaces values that cannot be parsed with NaN.

pd.to_numeric(df.col3, errors='coerce')

0    7.7
1    8.8
2    NaN
Name: col3, dtype: float64

pd.to_numeric(df.col3, errors='coerce').fillna(0)

0    7.7
1    8.8
2    0.0
Name: col3, dtype: float64

df = df.apply(pd.to_numeric, errors='coerce').fillna(0)
df, df.dtypes
df, df.dtypes

(   col1  col2  col3
 0   1.1   4.4   7.7
 1   2.2   5.5   8.8
 2   3.3   6.6   0.0,
 col1    float64
 col2    float64
 col3    float64
 dtype: object)
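
Since errors='coerce' silently turns bad values into NaN, it can be useful to check which entries failed before filling them with 0. The sketch below is one possible way to do that. The names raw, converted, failed, failed_counts, and bad_values are illustrative and do not appear in the original notebook.

```python
import pandas as pd

# Illustrative raw data with one unparseable entry in col3
raw = pd.DataFrame({'col1': ['1.1', '2.2', '3.3'],
                    'col3': ['7.7', '8.8', '-']})

# Convert every column, turning unparseable values into NaN
converted = raw.apply(pd.to_numeric, errors='coerce')

# True where a value was present in the raw data but could not be parsed
failed = converted.isna() & raw.notna()

# Number of failed conversions per column
failed_counts = failed.sum()

# The original strings in col3 that did not parse
bad_values = raw['col3'][failed['col3']].tolist()

print(failed_counts)
print(bad_values)
```

Inspecting failed_counts and bad_values first helps decide whether 0 is a sensible fill value or whether the offending rows should be handled differently.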