# pandas: Create DataFrame

## Package Import

In [1]:
import pandas as pd
import numpy as np

## Dataset Import

The dataset used in this notebook is from [Kaggle - Pokemon](https://www.kaggle.com/datasets/abcsds/pokemon).

In [2]:
data = pd.read_csv('data/Pokemon.csv')

In [3]:
data

,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...
795,719,Diancie,Rock,Fairy,600,50,100,150,100,150,50,6,True
796,719,DiancieMega Diancie,Rock,Fairy,700,50,160,110,160,110,110,6,True
797,720,HoopaHoopa Confined,Psychic,Ghost,600,80,110,60,150,130,70,6,True
798,720,HoopaHoopa Unbound,Psychic,Dark,680,80,160,60,170,130,80,6,True


## Manually Create a DataFrame

From a dictionary\
The column order follows the order in which the keys were inserted:

In [4]:
df = pd.DataFrame({'Column 1': [100,200], 'Column 2': [300,400]})
df

,Column 1,Column 2
0,100,300
1,200,400
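
To make the ordering rule concrete, here is a minimal sketch. The names `df_order` and `df_reordered` are made up for illustration. It shows that the columns follow key insertion order, and that the `columns` argument can select and reorder the keys:

```python
import pandas as pd

# Keys are inserted in the order 'b', then 'a'
df_order = pd.DataFrame({'b': [1, 2], 'a': [3, 4]})
print(list(df_order.columns))      # ['b', 'a']

# Passing `columns` selects and reorders the dictionary keys
df_reordered = pd.DataFrame({'b': [1, 2], 'a': [3, 4]}, columns=['a', 'b'])
print(list(df_reordered.columns))  # ['a', 'b']
```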


From a NumPy array of random values, with column names:

In [5]:
pd.DataFrame(np.random.rand(4, 8), columns=list('abcdefgh'))

,a,b,c,d,e,f,g,h
0,0.155808,0.426032,0.82806,0.22079,0.990389,0.74713,0.048883,0.967204
1,0.459662,0.909124,0.942143,0.585818,0.165209,0.396012,0.585996,0.01202
2,0.451281,0.454347,0.902485,0.801921,0.007451,0.954774,0.510324,0.733164
3,0.027769,0.490229,0.210495,0.879985,0.3704,0.412179,0.689901,0.27735
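
The values above change on every run because no seed is set. If reproducible output is wanted, one option is a seeded NumPy `Generator`. This is only a sketch: the seed `42` and the name `df_rand` are arbitrary choices, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# A seeded generator produces the same "random" values on every run
rng = np.random.default_rng(42)

# rng.random((4, 8)) returns a 4x8 array of floats in [0, 1)
df_rand = pd.DataFrame(rng.random((4, 8)), columns=list('abcdefgh'))
print(df_rand.shape)  # (4, 8)
```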


From a dictionary that includes a Series (the Series is aligned on the index, and labels it does not contain become NaN):

In [6]:
pd.DataFrame({'col1': [0,1,2,3], 'col2': pd.Series([2,3], index=[2,3])}, index=[0,1,2,3])

,col1,col2
0,0,
1,1,
2,2,2.0
3,3,3.0
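
The output above shows `2.0` and `3.0` rather than `2` and `3`. A short sketch of why, using the illustrative names `s` and `df_aligned`: the Series only has labels 2 and 3, so rows 0 and 1 get NaN, and the column is stored as `float64` to hold those missing values.

```python
import pandas as pd

s = pd.Series([2, 3], index=[2, 3])
df_aligned = pd.DataFrame({'col1': [0, 1, 2, 3], 'col2': s}, index=[0, 1, 2, 3])

# Rows whose labels are missing from the Series are NaN
print(df_aligned['col2'].isna().tolist())  # [True, True, False, False]

# NaN forces the integer data in col2 to be stored as float64
print(df_aligned['col2'].dtype)            # float64
```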


From a NumPy ndarray:

In [7]:
df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                   columns=['a', 'b', 'c'])
df

,a,b,c
0,1,2,3
1,4,5,6
2,7,8,9


From a NumPy structured array (an ndarray whose dtype defines named fields); `columns` selects and orders the fields:

In [8]:
d = np.array([(1,2,3), (4,5,6), (7,8,9)], dtype=[("a", "i4"), ("b", "i4"), ("c", "i4")])
df = pd.DataFrame(data=d, columns=['c', 'a'])
df

,c,a
0,3,1
1,6,4
2,9,7
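
As a quick check on where the column labels come from, the sketch below inspects the field names of a structured array and compares the default columns with a selected subset. The names `df_all` and `df_sub` are illustrative only.

```python
import numpy as np
import pandas as pd

d = np.array([(1, 2, 3), (4, 5, 6)],
             dtype=[('a', 'i4'), ('b', 'i4'), ('c', 'i4')])

# The field names live on the dtype
print(d.dtype.names)         # ('a', 'b', 'c')

# Without `columns`, every field becomes a column
df_all = pd.DataFrame(d)
print(list(df_all.columns))  # ['a', 'b', 'c']

# With `columns`, only the listed fields are kept, in the given order
df_sub = pd.DataFrame(d, columns=['c', 'a'])
print(list(df_sub.columns))  # ['c', 'a']
```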


From a Series or DataFrame (`index` selects and reorders labels from the source):

In [9]:
ser = pd.Series([1,2,3], index=['a','b','c'])
df = pd.DataFrame(data=ser, index=['c', 'a'], columns=['hehe'])
df

,hehe
c,3
a,1
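
The same result can be spelled out in two explicit steps, which may make the selection easier to follow. This is an alternative sketch rather than the notebook's own method, and `df_alt` is an illustrative name: `Series.reindex` picks the labels, then `Series.to_frame` names the column.

```python
import pandas as pd

ser = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

# Step 1: select labels 'c' and 'a' in that order
# Step 2: turn the Series into a one-column DataFrame named 'hehe'
df_alt = ser.reindex(['c', 'a']).to_frame(name='hehe')
print(df_alt)
```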


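When constructing from an existing DataFrame, `index` and `columns` select labels from the original rather than relabel it. Any requested column that does not exist in the original is filled with NaN, so the new columns should normally be a subset of the original columns.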

In [10]:
df1 = pd.DataFrame([1,2,3], index=['a','b','c'], columns=['x'])
df2 = pd.DataFrame(data=df1, index=['c', 'a'])
df3 = pd.DataFrame(data=df1, index=['c', 'a'], columns=['z'])
print(df2, '\n',df3)

   x
c  3
a  1 
     z
c NaN
a NaN
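
A small sketch that mixes an existing column with a missing one shows both behaviours at once. The names `df_mixed` and `df_reindexed` are illustrative. `DataFrame.reindex` is shown alongside as a more explicit way to express the same selection.

```python
import pandas as pd

df1 = pd.DataFrame([1, 2, 3], index=['a', 'b', 'c'], columns=['x'])

# 'x' exists in df1 and is carried over; 'z' does not and is filled with NaN
df_mixed = pd.DataFrame(data=df1, index=['c', 'a'], columns=['x', 'z'])
print(df_mixed)

# reindex states the same label selection explicitly
df_reindexed = df1.reindex(index=['c', 'a'], columns=['x', 'z'])
print(df_reindexed)
```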
