pandas: Build DataFrame from Multiple Files (Row-wise)#
Package Import#
import pandas as pd
import numpy as np
Dataset Import#
The dataset used in this notebook is from Kaggle - Pokemon.
data = pd.read_csv('data/Pokemon.csv')
data
Show code cell output
Hide code cell output
# | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
3 | 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
4 | 4 | Charmander | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
795 | 719 | Diancie | Rock | Fairy | 600 | 50 | 100 | 150 | 100 | 150 | 50 | 6 | True |
796 | 719 | DiancieMega Diancie | Rock | Fairy | 700 | 50 | 160 | 110 | 160 | 110 | 110 | 6 | True |
797 | 720 | HoopaHoopa Confined | Psychic | Ghost | 600 | 80 | 110 | 60 | 150 | 130 | 70 | 6 | True |
798 | 720 | HoopaHoopa Unbound | Psychic | Dark | 680 | 80 | 160 | 60 | 170 | 130 | 80 | 6 | True |
799 | 721 | Volcanion | Fire | Water | 600 | 80 | 110 | 120 | 130 | 90 | 70 | 6 | True |
800 rows × 13 columns
Build a DataFrame from multiple files (row-wise)#
Suppose we want to join Pokemon_0.csv
and Pokemon_1.csv
row-wise:
Use
pd.concat
and pass in the list of dataframes we want to join
df0 = pd.read_csv('data/Pokemon_0.csv')
df1 = pd.read_csv('data/Pokemon_1.csv')
df = pd.concat([df0, df1])
df.head()
Show code cell output
Hide code cell output
# | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
3 | 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
4 | 4 | Charmander | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |
But here, we’ll see some unexpected indices, as they are not consecutive (reset at the first row of 2nd dataframe):
df.iloc[95:105]
Show code cell output
Hide code cell output
# | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
95 | 88 | Grimer | Poison | NaN | 325 | 80 | 80 | 50 | 40 | 50 | 25 | 1 | False |
96 | 89 | Muk | Poison | NaN | 500 | 105 | 105 | 75 | 65 | 100 | 50 | 1 | False |
97 | 90 | Shellder | Water | NaN | 305 | 30 | 65 | 100 | 45 | 25 | 40 | 1 | False |
98 | 91 | Cloyster | Water | Ice | 525 | 50 | 95 | 180 | 85 | 45 | 70 | 1 | False |
99 | 92 | Gastly | Ghost | Poison | 310 | 30 | 35 | 30 | 100 | 35 | 80 | 1 | False |
0 | 93 | Haunter | Ghost | Poison | 405 | 45 | 50 | 45 | 115 | 55 | 95 | 1 | False |
1 | 94 | Gengar | Ghost | Poison | 500 | 60 | 65 | 60 | 130 | 75 | 110 | 1 | False |
2 | 94 | GengarMega Gengar | Ghost | Poison | 600 | 60 | 65 | 80 | 170 | 95 | 130 | 1 | False |
3 | 95 | Onix | Rock | Ground | 385 | 35 | 45 | 160 | 30 | 45 | 70 | 1 | False |
4 | 96 | Drowzee | Psychic | NaN | 328 | 60 | 48 | 45 | 43 | 90 | 42 | 1 | False |
And, we just use ignore_index=True
to workaround:
df = pd.concat([df0, df1], ignore_index=True)
df.iloc[95:105]
Show code cell output
Hide code cell output
# | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
95 | 88 | Grimer | Poison | NaN | 325 | 80 | 80 | 50 | 40 | 50 | 25 | 1 | False |
96 | 89 | Muk | Poison | NaN | 500 | 105 | 105 | 75 | 65 | 100 | 50 | 1 | False |
97 | 90 | Shellder | Water | NaN | 305 | 30 | 65 | 100 | 45 | 25 | 40 | 1 | False |
98 | 91 | Cloyster | Water | Ice | 525 | 50 | 95 | 180 | 85 | 45 | 70 | 1 | False |
99 | 92 | Gastly | Ghost | Poison | 310 | 30 | 35 | 30 | 100 | 35 | 80 | 1 | False |
100 | 93 | Haunter | Ghost | Poison | 405 | 45 | 50 | 45 | 115 | 55 | 95 | 1 | False |
101 | 94 | Gengar | Ghost | Poison | 500 | 60 | 65 | 60 | 130 | 75 | 110 | 1 | False |
102 | 94 | GengarMega Gengar | Ghost | Poison | 600 | 60 | 65 | 80 | 170 | 95 | 130 | 1 | False |
103 | 95 | Onix | Rock | Ground | 385 | 35 | 45 | 160 | 30 | 45 | 70 | 1 | False |
104 | 96 | Drowzee | Psychic | NaN | 328 | 60 | 48 | 45 | 43 | 90 | 42 | 1 | False |