pandas: Create DataFrame
Package Import
import pandas as pd
import numpy as np
Dataset Import
The dataset used in this notebook is the Pokemon dataset from Kaggle.
data = pd.read_csv('data/Pokemon.csv')
data
Manually Create a DataFrame
From a Dictionary
Column order follows the insertion order of the dictionary keys:
df = pd.DataFrame({'Column 1': [100,200], 'Column 2': [300,400]})
df
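To make the ordering rule concrete, here is a minimal sketch with hypothetical data (the names `df_order` and `df_reordered` are not part of the original notebook). It also shows that passing `columns` alongside a dictionary selects and reorders the keys:

```python
import pandas as pd

# Hypothetical example data: the keys are inserted as 'b' first, then 'a'.
df_order = pd.DataFrame({'b': [1, 2], 'a': [3, 4]})
print(list(df_order.columns))  # ['b', 'a']

# Passing `columns` overrides the insertion order of the keys.
df_reordered = pd.DataFrame({'b': [1, 2], 'a': [3, 4]}, columns=['a', 'b'])
print(list(df_reordered.columns))  # ['a', 'b']
```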
From a 2-D NumPy array of random values, with column names:
pd.DataFrame(np.random.rand(4, 8), columns=list('abcdefgh'))
From a dictionary that includes a Series (the Series is aligned to the DataFrame index by label):
pd.DataFrame({'col1': [0,1,2,3], 'col2': pd.Series([2,3], index=[2,3])}, index=[0,1,2,3])
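The example above relies on label alignment: the Series only has labels 2 and 3, so the other rows have nothing to align with. A short sketch of what that looks like, using the illustrative names `partial` and `df_aligned`:

```python
import pandas as pd

partial = pd.Series([2, 3], index=[2, 3])
df_aligned = pd.DataFrame({'col1': [0, 1, 2, 3], 'col2': partial},
                          index=[0, 1, 2, 3])

# Rows 0 and 1 have no matching label in `partial`, so col2 is NaN there.
print(df_aligned['col2'].isna().tolist())  # [True, True, False, False]

# The NaN values force col2 to a float dtype, even though the Series held integers.
print(df_aligned['col2'].dtype)  # float64
```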
From a NumPy ndarray:
df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                  columns=['a', 'b', 'c'])
df
From a NumPy structured array (an ndarray with named fields); `columns` selects and orders the fields:
d = np.array([(1,2,3), (4,5,6), (7,8,9)], dtype=[("a", "i4"), ("b", "i4"), ("c", "i4")])
df = pd.DataFrame(data=d, columns=['c', 'a'])
df
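As a quick check of how the field names map to columns, the sketch below rebuilds the same structured array and inspects it. The name `df_fields` is only illustrative:

```python
import numpy as np
import pandas as pd

d = np.array([(1, 2, 3), (4, 5, 6), (7, 8, 9)],
             dtype=[("a", "i4"), ("b", "i4"), ("c", "i4")])

# The field names of the structured dtype act as the available column labels.
print(d.dtype.names)  # ('a', 'b', 'c')

# Only the requested fields are kept, in the requested order.
df_fields = pd.DataFrame(data=d, columns=['c', 'a'])
print(list(df_fields.columns))  # ['c', 'a']
print(df_fields['c'].tolist())  # [3, 6, 9]
```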
From a Series or DataFrame (the `index` argument selects and reorders rows by label):
ser = pd.Series([1,2,3], index=['a','b','c'])
df = pd.DataFrame(data=ser, index=['c', 'a'], columns=['hehe'])
df
When constructing from an existing DataFrame, the `index` and `columns` arguments select labels from the original. Any requested column that does not exist in the original is filled with NaN.
df1 = pd.DataFrame([1,2,3], index=['a','b','c'], columns=['x'])
df2 = pd.DataFrame(data=df1, index=['c', 'a'])
df3 = pd.DataFrame(data=df1, index=['c', 'a'], columns=['z'])
print(df2, df3, sep='\n')
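The NaN behaviour can be verified directly. The sketch below repeats the construction and then shows `DataFrame.reindex` as a comparable, more explicit way to select labels from an existing DataFrame. The `reindex` call is a suggested alternative, not something the original notebook uses, and `df_sel` is an illustrative name:

```python
import pandas as pd

df1 = pd.DataFrame([1, 2, 3], index=['a', 'b', 'c'], columns=['x'])

# 'z' is not a column of df1, so the whole column is NaN.
df3 = pd.DataFrame(data=df1, index=['c', 'a'], columns=['z'])
print(df3['z'].isna().all())  # True

# Selecting existing labels with reindex keeps the original values.
df_sel = df1.reindex(index=['c', 'a'], columns=['x'])
print(df_sel['x'].tolist())  # [3, 1]
```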