pandas: Pandas Basics Exploration#

Loading data into Pandas DataFrame#

import pandas as pd

data = pd.read_csv('data/Pokemon.csv')

data.shape, data.ndim

Get headers#

data.columns

First and last rows#

data.head(1), data.tail(1)

High level description of the data#

data.describe()

Show code cell output

Hide code cell output

	#	Total	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation
count	800.000000	800.00000	800.000000	800.000000	800.000000	800.000000	800.000000	800.000000	800.00000
mean	362.813750	435.10250	69.258750	79.001250	73.842500	72.820000	71.902500	68.277500	3.32375
std	208.343798	119.96304	25.534669	32.457366	31.183501	32.722294	27.828916	29.060474	1.66129
min	1.000000	180.00000	1.000000	5.000000	5.000000	10.000000	20.000000	5.000000	1.00000
25%	184.750000	330.00000	50.000000	55.000000	50.000000	49.750000	50.000000	45.000000	2.00000
50%	364.500000	450.00000	65.000000	75.000000	70.000000	65.000000	70.000000	65.000000	3.00000
75%	539.250000	515.00000	80.000000	100.000000	90.000000	95.000000	90.000000	90.000000	5.00000
max	721.000000	780.00000	255.000000	190.000000	230.000000	194.000000	230.000000	180.000000	6.00000

Get specific columns#

multiple_columns = data[['Name', 'HP', 'Legendary']]
print(multiple_columns.head())

Get specific rows#

# Get the first row, which contains the feature names
middle_row = data.iloc[0]
print(middle_row)

Get specific coordinates item in the DataFrame#

# Get the name of the first 3 rows
item = data.iloc[[0, 1, 2], 1]
print(item)

Remove a column#

Here, if you do not use the columns parameter, you need to specify the axis (0 for rows and 1 for columns).

# Here, 'Total' column is the sum of HP,Attack,Defense, SP. Atk, SP. Def, Speed.
dropped_data = data.drop('Total', axis=1)
dropped_data.head(3)

Show code cell output

Hide code cell output

	#	Name	Type 1	Type 2	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary
0	1	Bulbasaur	Grass	Poison	45	49	49	65	65	45	1	False
1	2	Ivysaur	Grass	Poison	60	62	63	80	80	60	1	False
2	3	Venusaur	Grass	Poison	80	82	83	100	100	80	1	False

Alternatively, you can use the columns parameter to specify the column(s) to drop without needing to specify the axis.

dropped_data = data.drop(columns='Total')
dropped_data.head(3)

Show code cell output

Hide code cell output

	#	Name	Type 1	Type 2	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary
0	1	Bulbasaur	Grass	Poison	45	49	49	65	65	45	1	False
1	2	Ivysaur	Grass	Poison	60	62	63	80	80	60	1	False
2	3	Venusaur	Grass	Poison	80	82	83	100	100	80	1	False

inplace=True modifies the original DataFrame without needing to reassign it.

data.drop(columns='Total', inplace=True)
data.head(3)

Show code cell output

Hide code cell output

	#	Name	Type 1	Type 2	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary
0	1	Bulbasaur	Grass	Poison	45	49	49	65	65	45	1	False
1	2	Ivysaur	Grass	Poison	60	62	63	80	80	60	1	False
2	3	Venusaur	Grass	Poison	80	82	83	100	100	80	1	False

Add a column#

# So, in order to add the 'Total' column again, we do the summation
data['Total'] = data.iloc[:, 4:10].sum(axis=1)
data.head(3)

Show code cell output

Hide code cell output

	#	Name	Type 1	Type 2	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary	Total
0	1	Bulbasaur	Grass	Poison	45	49	49	65	65	45	1	False	318
1	2	Ivysaur	Grass	Poison	60	62	63	80	80	60	1	False	405
2	3	Venusaur	Grass	Poison	80	82	83	100	100	80	1	False	525

`iloc` and `loc` in Pandas#

`iloc`#

# Read single row
data.iloc[3]

# Read multiple rows
data.iloc[[3,6]]

Show code cell output

Hide code cell output

	#	Name	Type 1	Type 2	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary	Total
3	3	VenusaurMega Venusaur	Grass	Poison	80	100	123	122	120	80	1	False	625
6	6	Charizard	Fire	Flying	78	84	78	109	85	100	1	False	534

# Read a range of rows
data.iloc[3:7]

Show code cell output

Hide code cell output

	#	Name	Type 1	Type 2	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary	Total
3	3	VenusaurMega Venusaur	Grass	Poison	80	100	123	122	120	80	1	False	625
4	4	Charmander	Fire	NaN	39	52	43	60	50	65	1	False	309
5	5	Charmeleon	Fire	NaN	58	64	58	80	65	80	1	False	405
6	6	Charizard	Fire	Flying	78	84	78	109	85	100	1	False	534

`loc`#

The loc property in Pandas is used to access a group of rows and columns by labels or a boolean array. Unlike iloc, which uses integer-based indexing, loc uses the actual labels of the index and columns. This makes it very useful for selecting data based on meaningful row or column names.

data.loc[data['Legendary'] == True].head()

Show code cell output

Hide code cell output

	#	Name	Type 1	Type 2	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary	Total
156	144	Articuno	Ice	Flying	90	85	100	95	125	85	1	True	580
157	145	Zapdos	Electric	Flying	90	90	85	125	90	100	1	True	580
158	146	Moltres	Fire	Flying	90	100	90	125	85	90	1	True	580
162	150	Mewtwo	Psychic	NaN	106	110	90	154	90	130	1	True	680
163	150	MewtwoMega Mewtwo X	Psychic	Fighting	106	190	100	154	100	130	1	True	780

If you have a condition to filter rows, then you must pass the columns that you want to select after the comma as a sequence. e.g., [condition, ['col1', 'col2']]

data.loc[data['Attack'] == 49, ['Name']]

Show code cell output

Hide code cell output

	Name
0	Bulbasaur
166	Chikorita
506	Finneon

Filtering data#

We can filter with multiple conditions at ease.

data.loc[(data['Type 1'] == 'Grass') & (data['Type 2'] == 'Poison')].head(3)

Show code cell output

Hide code cell output

	#	Name	Type 1	Type 2	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary	Total
0	1	Bulbasaur	Grass	Poison	45	49	49	65	65	45	1	False	318
1	2	Ivysaur	Grass	Poison	60	62	63	80	80	60	1	False	405
2	3	Venusaur	Grass	Poison	80	82	83	100	100	80	1	False	525

data.loc[(data['Type 1'] == 'Grass') | (data['Type 2'] == 'Poison')][0::10].loc[data['HP'] > 50].head(3)

Show code cell output

Hide code cell output

	#	Name	Type 1	Type 2	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary	Total
50	45	Vileplume	Grass	Poison	75	80	85	110	90	50	1	False	490
101	94	Gengar	Ghost	Poison	60	65	60	130	75	110	1	False	500
197	182	Bellossom	Grass	NaN	75	80	95	90	100	50	2	False	490

Regex Filtering#

import re

Explicit filtering#

data.loc[data['Name'].str.contains('Mega')].head(3)

Show code cell output

Hide code cell output

	#	Name	Type 1	Type 2	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary	Total
3	3	VenusaurMega Venusaur	Grass	Poison	80	100	123	122	120	80	1	False	625
7	6	CharizardMega Charizard X	Fire	Dragon	78	130	111	130	85	100	1	False	634
8	6	CharizardMega Charizard Y	Fire	Flying	78	104	78	159	115	100	1	False	634

Regex filtering, case sensitive by default#

data.loc[data['Type 1'].str.contains('Fire|grass', regex=True)].head(3)

Show code cell output

Hide code cell output

	#	Name	Type 1	Type 2	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary	Total
4	4	Charmander	Fire	NaN	39	52	43	60	50	65	1	False	309
5	5	Charmeleon	Fire	NaN	58	64	58	80	65	80	1	False	405
6	6	Charizard	Fire	Flying	78	84	78	109	85	100	1	False	534

Regex filtering, case insensitive#

data.loc[data['Type 1'].str.contains('Fire|grass', flags=re.IGNORECASE, regex=True)].head(10)

Show code cell output

Hide code cell output

	#	Name	Type 1	Type 2	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary	Total
0	1	Bulbasaur	Grass	Poison	45	49	49	65	65	45	1	False	318
1	2	Ivysaur	Grass	Poison	60	62	63	80	80	60	1	False	405
2	3	Venusaur	Grass	Poison	80	82	83	100	100	80	1	False	525
3	3	VenusaurMega Venusaur	Grass	Poison	80	100	123	122	120	80	1	False	625
4	4	Charmander	Fire	NaN	39	52	43	60	50	65	1	False	309
5	5	Charmeleon	Fire	NaN	58	64	58	80	65	80	1	False	405
6	6	Charizard	Fire	Flying	78	84	78	109	85	100	1	False	534
7	6	CharizardMega Charizard X	Fire	Dragon	78	130	111	130	85	100	1	False	634
8	6	CharizardMega Charizard Y	Fire	Flying	78	104	78	159	115	100	1	False	634
42	37	Vulpix	Fire	NaN	38	41	40	50	65	65	1	False	299

Using regex#

data.loc[data['Name'].str.contains('pi[a-z]*', flags=re.IGNORECASE, regex=True)].head(3)

Show code cell output

Hide code cell output

	#	Name	Type 1	Type 2	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary	Total
13	10	Caterpie	Bug	NaN	45	30	35	20	20	45	1	False	195
20	16	Pidgey	Normal	Flying	40	45	40	35	35	56	1	False	251
21	17	Pidgeotto	Normal	Flying	63	60	55	50	50	71	1	False	349

data.loc[data['Name'].str.contains('^pi[a-z]*', flags=re.IGNORECASE, regex=True)].head(3)

Show code cell output

Hide code cell output

	#	Name	Type 1	Type 2	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary	Total
20	16	Pidgey	Normal	Flying	40	45	40	35	35	56	1	False	251
21	17	Pidgeotto	Normal	Flying	63	60	55	50	50	71	1	False	349
22	18	Pidgeot	Normal	Flying	83	80	75	70	70	101	1	False	479

data.loc[data['Name'].str.contains('pi.*', flags=re.IGNORECASE, regex=True)].head(10)

Show code cell output

Hide code cell output

	#	Name	Type 1	Type 2	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary	Total
13	10	Caterpie	Bug	NaN	45	30	35	20	20	45	1	False	195
20	16	Pidgey	Normal	Flying	40	45	40	35	35	56	1	False	251
21	17	Pidgeotto	Normal	Flying	63	60	55	50	50	71	1	False	349
22	18	Pidgeot	Normal	Flying	83	80	75	70	70	101	1	False	479
23	18	PidgeotMega Pidgeot	Normal	Flying	83	80	80	135	80	121	1	False	579
30	25	Pikachu	Electric	NaN	35	55	40	50	50	90	1	False	320
42	37	Vulpix	Fire	NaN	38	41	40	50	65	65	1	False	299
76	70	Weepinbell	Grass	Poison	65	90	50	85	45	55	1	False	390
84	78	Rapidash	Fire	NaN	65	100	70	80	80	105	1	False	500
136	127	Pinsir	Bug	NaN	65	125	100	55	70	85	1	False	500

Sorting#

Sort by 1 column, ascending#

data.sort_values('Type 1').head(3)

Show code cell output

Hide code cell output

	#	Name	Type 1	Type 2	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary	Total
600	540	Sewaddle	Bug	Grass	45	53	70	40	60	42	5	False	310
136	127	Pinsir	Bug	NaN	65	125	100	55	70	85	1	False	500
457	412	Burmy	Bug	NaN	40	29	45	29	45	36	4	False	224

Sort by 1 column, descending#

data.sort_values('Type 1', ascending=False).head(3)

Show code cell output

Hide code cell output

	#	Name	Type 1	Type 2	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary	Total
371	339	Barboach	Water	Ground	50	48	43	46	41	60	3	False	288
97	90	Shellder	Water	NaN	30	65	100	45	25	40	1	False	305
240	222	Corsola	Water	Rock	55	55	85	65	85	35	2	False	380

Sort by multiple columns, 1 = ascending, 0 = descending#

data.sort_values(['Type 1', 'HP'], ascending=[1, 0]).head(3)

Show code cell output

Hide code cell output

	#	Name	Type 1	Type 2	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary	Total
520	469	Yanmega	Bug	Flying	86	76	86	116	56	95	4	False	515
698	637	Volcarona	Bug	Fire	85	60	65	135	105	100	5	False	550
231	214	Heracross	Bug	Fighting	80	125	75	40	95	85	2	False	500

Reset index#

In the cell below, you can see the index is not consecutive (the leftmost column). We can reset it with the reset_index() method. By default, it adds the old index as a new column. If you do not want that, use the drop=True parameter.

new_data = data.loc[(data['Type 1'] == 'Grass') | (data['Type 2'] == 'Poison')][0::10].loc[data['HP'] > 50].head(3)
new_data

Show code cell output

Hide code cell output

	#	Name	Type 1	Type 2	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary	Total
50	45	Vileplume	Grass	Poison	75	80	85	110	90	50	1	False	490
101	94	Gengar	Ghost	Poison	60	65	60	130	75	110	1	False	500
197	182	Bellossom	Grass	NaN	75	80	95	90	100	50	2	False	490

new_data.reset_index(drop=True, inplace=True)
new_data

Show code cell output

Hide code cell output

	#	Name	Type 1	Type 2	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary	Total
0	45	Vileplume	Grass	Poison	75	80	85	110	90	50	1	False	490
1	94	Gengar	Ghost	Poison	60	65	60	130	75	110	1	False	500
2	182	Bellossom	Grass	NaN	75	80	95	90	100	50	2	False	490

Group by/Aggregate using aggregate function#

Aggregate functions like mean(), sum(), min(), and max() in pandas only work with numerical data. Non-numeric columns (such as strings or booleans) cannot be averaged or summed, so you must select only numeric columns when using these functions to avoid errors and get meaningful results.

Group by 1 column#

# Select only numeric columns for aggregation
numeric_cols = data.select_dtypes(include='number').columns
data.groupby(['Type 1'])[numeric_cols].mean().sort_values('Type 1').head(10)

Show code cell output

Hide code cell output

	#	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Total
Type 1
Bug	334.492754	56.884058	70.971014	70.724638	53.869565	64.797101	61.681159	3.217391	378.927536
Dark	461.354839	66.806452	88.387097	70.225806	74.645161	69.516129	76.161290	4.032258	445.741935
Dragon	474.375000	83.312500	112.125000	86.375000	96.843750	88.843750	83.031250	3.875000	550.531250
Electric	363.500000	59.795455	69.090909	66.295455	90.022727	73.704545	84.500000	3.272727	443.409091
Fairy	449.529412	74.117647	61.529412	65.705882	78.529412	84.705882	48.588235	4.117647	413.176471
Fighting	363.851852	69.851852	96.777778	65.925926	53.111111	64.703704	66.074074	3.370370	416.444444
Fire	327.403846	69.903846	84.769231	67.769231	88.980769	72.211538	74.442308	3.211538	458.076923
Flying	677.750000	70.750000	78.750000	66.250000	94.250000	72.500000	102.500000	5.500000	485.000000
Ghost	486.500000	64.437500	73.781250	81.187500	79.343750	76.468750	64.343750	4.187500	439.562500
Grass	344.871429	67.271429	73.214286	70.800000	77.500000	70.428571	61.928571	3.357143	421.142857

Group by hierarchical columns#

data.groupby(['Type 1', 'Type 2'])[numeric_cols].count().head()

Show code cell output

Hide code cell output

		#	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Total
Type 1	Type 2
Bug	Electric	2	2	2	2	2	2	2	2	2
	Fighting	2	2	2	2	2	2	2	2	2
	Fire	2	2	2	2	2	2	2	2	2
	Flying	14	14	14	14	14	14	14	14	14
	Ghost	1	1	1	1	1	1	1	1	1

Quizzes#

1. Import the pandas library as pd.#

import pandas as pd

2. Read CSV into df#

df = pd.read_csv('data/Pokemon.csv')

3. Show first 5 rows#

df.head(5)

Show code cell output

Hide code cell output

	#	Name	Type 1	Type 2	Total	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary
0	1	Bulbasaur	Grass	Poison	318	45	49	49	65	65	45	1	False
1	2	Ivysaur	Grass	Poison	405	60	62	63	80	80	60	1	False
2	3	Venusaur	Grass	Poison	525	80	82	83	100	100	80	1	False
3	3	VenusaurMega Venusaur	Grass	Poison	625	80	100	123	122	120	80	1	False
4	4	Charmander	Fire	NaN	309	39	52	43	60	50	65	1	False

4. Show last 3 rows#

df.tail(3)

Show code cell output

Hide code cell output

	#	Name	Type 1	Type 2	Total	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary
797	720	HoopaHoopa Confined	Psychic	Ghost	600	80	110	60	150	130	70	6	True
798	720	HoopaHoopa Unbound	Psychic	Dark	680	80	160	60	170	130	80	6	True
799	721	Volcanion	Fire	Water	600	80	110	120	130	90	70	6	True

5. Show all column names#

df.columns

6. Select rows index 10 to 15#

df.iloc[10:16]

Show code cell output

Hide code cell output

	#	Name	Type 1	Type 2	Total	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary
10	8	Wartortle	Water	NaN	405	59	63	80	65	80	58	1	False
11	9	Blastoise	Water	NaN	530	79	83	100	85	105	78	1	False
12	9	BlastoiseMega Blastoise	Water	NaN	630	79	103	120	135	115	78	1	False
13	10	Caterpie	Bug	NaN	195	45	30	35	20	20	45	1	False
14	11	Metapod	Bug	NaN	205	50	20	55	25	25	30	1	False
15	12	Butterfree	Bug	Flying	395	60	45	50	90	80	70	1	False

7. Select row index 50#

df.iloc[50]

8. Filter rows Type 1 == Grass#

df.loc[df['Type 1'] == 'Grass'].head()

Show code cell output

Hide code cell output

	#	Name	Type 1	Type 2	Total	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary
0	1	Bulbasaur	Grass	Poison	318	45	49	49	65	65	45	1	False
1	2	Ivysaur	Grass	Poison	405	60	62	63	80	80	60	1	False
2	3	Venusaur	Grass	Poison	525	80	82	83	100	100	80	1	False
3	3	VenusaurMega Venusaur	Grass	Poison	625	80	100	123	122	120	80	1	False
48	43	Oddish	Grass	Poison	320	45	50	55	75	65	30	1	False

9. Filter rows HP > 100#

df.loc[df['HP'] > 100].head()

Show code cell output

Hide code cell output

	#	Name	Type 1	Type 2	Total	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary
44	39	Jigglypuff	Normal	Fairy	270	115	45	20	45	25	20	1	False
45	40	Wigglytuff	Normal	Fairy	435	140	70	45	85	50	45	1	False
96	89	Muk	Poison	NaN	500	105	105	75	65	100	50	1	False
120	112	Rhydon	Ground	Rock	485	105	130	120	45	45	40	1	False
121	113	Chansey	Normal	NaN	450	250	5	5	35	105	50	1	False

10. Filter rows Name contains chu#

df.loc[df['Name'].str.contains('chu')].head()

Show code cell output

Hide code cell output

	#	Name	Type 1	Type 2	Total	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary
30	25	Pikachu	Electric	NaN	320	35	55	40	50	50	90	1	False
31	26	Raichu	Electric	NaN	485	60	90	55	90	80	110	1	False
186	172	Pichu	Electric	NaN	205	20	40	15	35	35	60	2	False
257	238	Smoochum	Ice	Psychic	305	45	30	15	85	65	65	2	False

11. Sort by Attack descending#

df.sort_values('Attack', ascending=False).head()

Show code cell output

Hide code cell output

	#	Name	Type 1	Type 2	Total	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary
163	150	MewtwoMega Mewtwo X	Psychic	Fighting	780	106	190	100	154	100	130	1	True
232	214	HeracrossMega Heracross	Bug	Fighting	600	80	185	115	40	105	75	2	False
429	386	DeoxysAttack Forme	Psychic	NaN	600	50	180	20	180	20	150	3	True
426	384	RayquazaMega Rayquaza	Dragon	Flying	780	105	180	100	180	100	115	3	True
424	383	GroudonPrimal Groudon	Ground	Fire	770	100	180	160	150	90	90	3	True

12. Reset index#

new_df = df.reset_index(drop=True)
new_df.head()

Show code cell output

Hide code cell output

	#	Name	Type 1	Type 2	Total	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary
0	1	Bulbasaur	Grass	Poison	318	45	49	49	65	65	45	1	False
1	2	Ivysaur	Grass	Poison	405	60	62	63	80	80	60	1	False
2	3	Venusaur	Grass	Poison	525	80	82	83	100	100	80	1	False
3	3	VenusaurMega Venusaur	Grass	Poison	625	80	100	123	122	120	80	1	False
4	4	Charmander	Fire	NaN	309	39	52	43	60	50	65	1	False

13. Select columns Name, Type 1, HP#

df[['Name', 'Type 1', 'HP']].head()

Show code cell output

Hide code cell output

	Name	Type 1	HP
0	Bulbasaur	Grass	45
1	Ivysaur	Grass	60
2	Venusaur	Grass	80
3	VenusaurMega Venusaur	Grass	80
4	Charmander	Fire	39

14. Filter Type 1 == Grass and Type 2 == Poison#

df.loc[(df['Type 1'] == 'Grass') & (df['Type 2'] == 'Poison')].head()

Show code cell output

Hide code cell output

	#	Name	Type 1	Type 2	Total	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary
0	1	Bulbasaur	Grass	Poison	318	45	49	49	65	65	45	1	False
1	2	Ivysaur	Grass	Poison	405	60	62	63	80	80	60	1	False
2	3	Venusaur	Grass	Poison	525	80	82	83	100	100	80	1	False
3	3	VenusaurMega Venusaur	Grass	Poison	625	80	100	123	122	120	80	1	False
48	43	Oddish	Grass	Poison	320	45	50	55	75	65	30	1	False

15. Mean HP of Legendary#

df.loc[df['Legendary'] == True]['HP'].mean()

16. Rows with even indices#

even_indices_df = df.iloc[::2]
even_indices_df.head()

Show code cell output

Hide code cell output

	#	Name	Type 1	Type 2	Total	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary
0	1	Bulbasaur	Grass	Poison	318	45	49	49	65	65	45	1	False
2	3	Venusaur	Grass	Poison	525	80	82	83	100	100	80	1	False
4	4	Charmander	Fire	NaN	309	39	52	43	60	50	65	1	False
6	6	Charizard	Fire	Flying	534	78	84	78	109	85	100	1	False
8	6	CharizardMega Charizard Y	Fire	Flying	634	78	104	78	159	115	100	1	False

17. Save even indices to filtered_pokemon.csv#

even_indices_df = df.iloc[::2]
even_indices_df.to_csv('data/filtered_pokemon.csv', index=True)
even_indices_df = pd.read_csv('data/filtered_pokemon.csv', index_col=0)
even_indices_df.head()

Show code cell output

Hide code cell output

	#	Name	Type 1	Type 2	Total	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary
0	1	Bulbasaur	Grass	Poison	318	45	49	49	65	65	45	1	False
2	3	Venusaur	Grass	Poison	525	80	82	83	100	100	80	1	False
4	4	Charmander	Fire	NaN	309	39	52	43	60	50	65	1	False
6	6	Charizard	Fire	Flying	534	78	84	78	109	85	100	1	False
8	6	CharizardMega Charizard Y	Fire	Flying	634	78	104	78	159	115	100	1	False

18. First 10 rows Name, Type 1, Attack#

df[['Name', 'Type 1', 'Attack']].head(10)

Show code cell output

Hide code cell output

	Name	Type 1	Attack
0	Bulbasaur	Grass	49
1	Ivysaur	Grass	62
2	Venusaur	Grass	82
3	VenusaurMega Venusaur	Grass	100
4	Charmander	Fire	52
5	Charmeleon	Fire	64
6	Charizard	Fire	84
7	CharizardMega Charizard X	Fire	130
8	CharizardMega Charizard Y	Fire	104
9	Squirtle	Water	48

19. Max Defense#

df['Defense'].max()

20. Rows Speed above mean#

mean_speed = df['Speed'].mean()
new_df = df.loc[df['Speed'] > mean_speed]
print('mean speed: ', mean_speed)
new_df.head()

Show code cell output

Hide code cell output

mean speed:  68.2775

	#	Name	Type 1	Type 2	Total	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary
2	3	Venusaur	Grass	Poison	525	80	82	83	100	100	80	1	False
3	3	VenusaurMega Venusaur	Grass	Poison	625	80	100	123	122	120	80	1	False
5	5	Charmeleon	Fire	NaN	405	58	64	58	80	65	80	1	False
6	6	Charizard	Fire	Flying	534	78	84	78	109	85	100	1	False
7	6	CharizardMega Charizard X	Fire	Dragon	634	78	130	111	130	85	100	1	False

21. First 3 rows Name and Attack#

df[['Name', 'Attack']].iloc[:3]

Show code cell output

Hide code cell output

	Name	Attack
0	Bulbasaur	49
1	Ivysaur	62
2	Venusaur	82

22. Count per Type 1#

df.groupby(['Type 1'])['#'].count().sort_index()

23. Rows Generation 3 or 5#

df.loc[(df['Generation'] == 3) | (df['Generation'] == 5)].head()

Show code cell output

Hide code cell output

	#	Name	Type 1	Type 2	Total	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary
272	252	Treecko	Grass	NaN	310	40	45	35	65	55	70	3	False
273	253	Grovyle	Grass	NaN	405	50	65	45	85	65	95	3	False
274	254	Sceptile	Grass	NaN	530	70	85	65	105	85	120	3	False
275	254	SceptileMega Sceptile	Grass	Dragon	630	70	110	75	145	85	145	3	False
276	255	Torchic	Fire	NaN	310	45	60	40	70	50	45	3	False

24. Replace Fire with Flame in Type 1#

df_copy = df.copy()
df_copy['Type 1'] = df_copy['Type 1'].str.replace('Fire', 'Flame')
df_copy.head()

Show code cell output

Hide code cell output

	#	Name	Type 1	Type 2	Total	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary
0	1	Bulbasaur	Grass	Poison	318	45	49	49	65	65	45	1	False
1	2	Ivysaur	Grass	Poison	405	60	62	63	80	80	60	1	False
2	3	Venusaur	Grass	Poison	525	80	82	83	100	100	80	1	False
3	3	VenusaurMega Venusaur	Grass	Poison	625	80	100	123	122	120	80	1	False
4	4	Charmander	Flame	NaN	309	39	52	43	60	50	65	1	False

25. Rows Name startswith P#

df.loc[df['Name'].str.startswith('P')].head()

Show code cell output

Hide code cell output

	#	Name	Type 1	Type 2	Total	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary
20	16	Pidgey	Normal	Flying	251	40	45	40	35	35	56	1	False
21	17	Pidgeotto	Normal	Flying	349	63	60	55	50	50	71	1	False
22	18	Pidgeot	Normal	Flying	479	83	80	75	70	70	101	1	False
23	18	PidgeotMega Pidgeot	Normal	Flying	579	83	80	80	135	80	121	1	False
30	25	Pikachu	Electric	NaN	320	35	55	40	50	50	90	1	False

26. Mean Attack per Type 1#

df.groupby('Type 1')['Attack'].mean().sort_index()

27. Sort by Type 1 asc, Attack desc#

df.sort_values(['Type 1', 'Attack'], ascending=[1, 0]).head()

Show code cell output

Hide code cell output

	#	Name	Type 1	Type 2	Total	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary
232	214	HeracrossMega Heracross	Bug	Fighting	600	80	185	115	40	105	75	2	False
137	127	PinsirMega Pinsir	Bug	Flying	600	65	155	120	65	90	105	1	False
19	15	BeedrillMega Beedrill	Bug	Poison	495	65	150	40	15	80	145	1	False
229	212	ScizorMega Scizor	Bug	Steel	600	70	150	140	65	100	75	2	False
650	589	Escavalier	Bug	Steel	495	70	135	105	60	105	20	5	False

28. Mean Defense per Type 1#

df.groupby('Type 1')['Defense'].mean().sort_index()

29. Legendary with Attack > 100#

df.loc[(df['Legendary'] == True) & (df['Attack'] > 100)].head()

Show code cell output

Hide code cell output

	#	Name	Type 1	Type 2	Total	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary
162	150	Mewtwo	Psychic	NaN	680	106	110	90	154	90	130	1	True
163	150	MewtwoMega Mewtwo X	Psychic	Fighting	780	106	190	100	154	100	130	1	True
164	150	MewtwoMega Mewtwo Y	Psychic	NaN	780	106	150	70	194	120	140	1	True
263	244	Entei	Fire	NaN	580	115	115	85	90	75	100	2	True
270	250	Ho-oh	Fire	Flying	680	106	130	90	110	154	90	2	True

30. New DataFrame Name and Total renamed Overall#

new_df = df[['Name', 'Total']].rename(columns={'Total': 'Overall'})
new_df.head()

Show code cell output

Hide code cell output

	Name	Overall
0	Bulbasaur	318
1	Ivysaur	405
2	Venusaur	525
3	VenusaurMega Venusaur	625
4	Charmander	309

31. Filter Type 1 == Water and Attack >= 80#

df.loc[(df['Type 1'] == 'Water') & (df['Attack'] >= 80)].head()

Show code cell output

Hide code cell output

	#	Name	Type 1	Type 2	Total	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary
11	9	Blastoise	Water	NaN	530	79	83	100	85	105	78	1	False
12	9	BlastoiseMega Blastoise	Water	NaN	630	79	103	120	135	115	78	1	False
60	55	Golduck	Water	NaN	500	80	82	78	95	80	85	1	False
67	62	Poliwrath	Water	Fighting	510	90	95	95	70	90	70	1	False
98	91	Cloyster	Water	Ice	525	50	95	180	85	45	70	1	False

32. Generation 4 Name and Defense#

df.loc[df['Generation'] == 4][['Name', 'Defense']].head()

Show code cell output

Hide code cell output

	Name	Defense
432	Turtwig	64
433	Grotle	85
434	Torterra	105
435	Chimchar	44
436	Monferno	52

33. Mean Total per (Type 1, Type 2)#

df.groupby(['Type 1', 'Type 2'])['Total'].mean().head(10)

34. Filter Total between 400 and 500#

df.loc[(df['Total'] >= 400) & (df['Total'] <= 500)].head()

Show code cell output

Hide code cell output

	#	Name	Type 1	Type 2	Total	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary
1	2	Ivysaur	Grass	Poison	405	60	62	63	80	80	60	1	False
5	5	Charmeleon	Fire	NaN	405	58	64	58	80	65	80	1	False
10	8	Wartortle	Water	NaN	405	59	63	80	65	80	58	1	False
19	15	BeedrillMega Beedrill	Bug	Poison	495	65	150	40	15	80	145	1	False
22	18	Pidgeot	Normal	Flying	479	83	80	75	70	70	101	1	False

pandas: Pandas Basics Exploration#

Loading data into Pandas DataFrame#

Get headers#

First and last rows#

High level description of the data#

Get specific columns#

Get specific rows#

Get specific coordinates item in the DataFrame#

Remove a column#

Add a column#

iloc and loc in Pandas#

iloc#

loc#

Filtering data#

Regex Filtering#

Explicit filtering#

Regex filtering, case sensitive by default#

Regex filtering, case insensitive#

Using regex#

Sorting#

Sort by 1 column, ascending#

Sort by 1 column, descending#

Sort by multiple columns, 1 = ascending, 0 = descending#

Reset index#

Group by/Aggregate using aggregate function#

Group by 1 column#

Group by hierarchical columns#

Quizzes#

1. Import the pandas library as pd.#

2. Read CSV into df#

3. Show first 5 rows#

4. Show last 3 rows#

5. Show all column names#

6. Select rows index 10 to 15#

7. Select row index 50#

8. Filter rows Type 1 == Grass#

9. Filter rows HP > 100#

10. Filter rows Name contains chu#

11. Sort by Attack descending#

12. Reset index#

13. Select columns Name, Type 1, HP#

14. Filter Type 1 == Grass and Type 2 == Poison#

15. Mean HP of Legendary#

16. Rows with even indices#

17. Save even indices to filtered_pokemon.csv#

18. First 10 rows Name, Type 1, Attack#

19. Max Defense#

20. Rows Speed above mean#

21. First 3 rows Name and Attack#

22. Count per Type 1#

23. Rows Generation 3 or 5#

24. Replace Fire with Flame in Type 1#

25. Rows Name startswith P#

26. Mean Attack per Type 1#

27. Sort by Type 1 asc, Attack desc#

28. Mean Defense per Type 1#

29. Legendary with Attack > 100#

30. New DataFrame Name and Total renamed Overall#

31. Filter Type 1 == Water and Attack >= 80#

32. Generation 4 Name and Defense#

33. Mean Total per (Type 1, Type 2)#

34. Filter Total between 400 and 500#

This Page

`iloc` and `loc` in Pandas#

`iloc`#

`loc`#