pandas: Build DataFrame from Multiple Files (Row-wise)#

Package Import#

import pandas as pd
import numpy as np

The dataset used in this notebook is from Kaggle - Pokemon.

data = pd.read_csv('data/Pokemon.csv')

data

Show code cell output

Hide code cell output

	#	Name	Type 1	Type 2	Total	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary
0	1	Bulbasaur	Grass	Poison	318	45	49	49	65	65	45	1	False
1	2	Ivysaur	Grass	Poison	405	60	62	63	80	80	60	1	False
2	3	Venusaur	Grass	Poison	525	80	82	83	100	100	80	1	False
3	3	VenusaurMega Venusaur	Grass	Poison	625	80	100	123	122	120	80	1	False
4	4	Charmander	Fire	NaN	309	39	52	43	60	50	65	1	False
...	...	...	...	...	...	...	...	...	...	...	...	...	...
795	719	Diancie	Rock	Fairy	600	50	100	150	100	150	50	6	True
796	719	DiancieMega Diancie	Rock	Fairy	700	50	160	110	160	110	110	6	True
797	720	HoopaHoopa Confined	Psychic	Ghost	600	80	110	60	150	130	70	6	True
798	720	HoopaHoopa Unbound	Psychic	Dark	680	80	160	60	170	130	80	6	True
799	721	Volcanion	Fire	Water	600	80	110	120	130	90	70	6	True

800 rows × 13 columns

Suppose we want to join Pokemon_0.csv and Pokemon_1.csv row-wise:

df0 = pd.read_csv('data/Pokemon_0.csv')
df1 = pd.read_csv('data/Pokemon_1.csv')
df = pd.concat([df0, df1])
df.head()

Show code cell output

Hide code cell output

	#	Name	Type 1	Type 2	Total	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary
0	1	Bulbasaur	Grass	Poison	318	45	49	49	65	65	45	1	False
1	2	Ivysaur	Grass	Poison	405	60	62	63	80	80	60	1	False
2	3	Venusaur	Grass	Poison	525	80	82	83	100	100	80	1	False
3	3	VenusaurMega Venusaur	Grass	Poison	625	80	100	123	122	120	80	1	False
4	4	Charmander	Fire	NaN	309	39	52	43	60	50	65	1	False

But here, we’ll see some unexpected indices, as they are not consecutive (reset at the first row of 2nd dataframe):

df.iloc[95:105]

Show code cell output

Hide code cell output

	#	Name	Type 1	Type 2	Total	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary
95	88	Grimer	Poison	NaN	325	80	80	50	40	50	25	1	False
96	89	Muk	Poison	NaN	500	105	105	75	65	100	50	1	False
97	90	Shellder	Water	NaN	305	30	65	100	45	25	40	1	False
98	91	Cloyster	Water	Ice	525	50	95	180	85	45	70	1	False
99	92	Gastly	Ghost	Poison	310	30	35	30	100	35	80	1	False
0	93	Haunter	Ghost	Poison	405	45	50	45	115	55	95	1	False
1	94	Gengar	Ghost	Poison	500	60	65	60	130	75	110	1	False
2	94	GengarMega Gengar	Ghost	Poison	600	60	65	80	170	95	130	1	False
3	95	Onix	Rock	Ground	385	35	45	160	30	45	70	1	False
4	96	Drowzee	Psychic	NaN	328	60	48	45	43	90	42	1	False

And, we just use ignore_index=True to workaround:

df = pd.concat([df0, df1], ignore_index=True)
df.iloc[95:105]

Show code cell output

Hide code cell output

	#	Name	Type 1	Type 2	Total	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary
95	88	Grimer	Poison	NaN	325	80	80	50	40	50	25	1	False
96	89	Muk	Poison	NaN	500	105	105	75	65	100	50	1	False
97	90	Shellder	Water	NaN	305	30	65	100	45	25	40	1	False
98	91	Cloyster	Water	Ice	525	50	95	180	85	45	70	1	False
99	92	Gastly	Ghost	Poison	310	30	35	30	100	35	80	1	False
100	93	Haunter	Ghost	Poison	405	45	50	45	115	55	95	1	False
101	94	Gengar	Ghost	Poison	500	60	65	60	130	75	110	1	False
102	94	GengarMega Gengar	Ghost	Poison	600	60	65	80	170	95	130	1	False
103	95	Onix	Rock	Ground	385	35	45	160	30	45	70	1	False
104	96	Drowzee	Psychic	NaN	328	60	48	45	43	90	42	1	False