pandas: Pandas Basics Exploration#

Loading data into Pandas DataFrame#

The dataset is from Kaggle - Pokemon.

import pandas as pd
data = pd.read_csv('data/Pokemon.csv')
data.shape, data.ndim

Hide code cell output

((800, 13), 2)

Get headers#

data.columns

Hide code cell output

Index(['#', 'Name', 'Type 1', 'Type 2', 'Total', 'HP', 'Attack', 'Defense',
       'Sp. Atk', 'Sp. Def', 'Speed', 'Generation', 'Legendary'],
      dtype='object')

First and last rows#

data.head(1), data.tail(1)

Hide code cell output

(   #       Name Type 1  Type 2  Total  HP  Attack  Defense  Sp. Atk  Sp. Def  \
 0  1  Bulbasaur  Grass  Poison    318  45      49       49       65       65   
 
    Speed  Generation  Legendary  
 0     45           1      False  ,
        #       Name Type 1 Type 2  Total  HP  Attack  Defense  Sp. Atk  \
 799  721  Volcanion   Fire  Water    600  80     110      120      130   
 
      Sp. Def  Speed  Generation  Legendary  
 799       90     70           6       True  )

High level description of the data#

data.describe()

Hide code cell output

# Total HP Attack Defense Sp. Atk Sp. Def Speed Generation
count 800.000000 800.00000 800.000000 800.000000 800.000000 800.000000 800.000000 800.000000 800.00000
mean 362.813750 435.10250 69.258750 79.001250 73.842500 72.820000 71.902500 68.277500 3.32375
std 208.343798 119.96304 25.534669 32.457366 31.183501 32.722294 27.828916 29.060474 1.66129
min 1.000000 180.00000 1.000000 5.000000 5.000000 10.000000 20.000000 5.000000 1.00000
25% 184.750000 330.00000 50.000000 55.000000 50.000000 49.750000 50.000000 45.000000 2.00000
50% 364.500000 450.00000 65.000000 75.000000 70.000000 65.000000 70.000000 65.000000 3.00000
75% 539.250000 515.00000 80.000000 100.000000 90.000000 95.000000 90.000000 90.000000 5.00000
max 721.000000 780.00000 255.000000 190.000000 230.000000 194.000000 230.000000 180.000000 6.00000

Get specific columns#

multiple_columns = data[['Name', 'HP', 'Legendary']]
print(multiple_columns.head())

Hide code cell output

                    Name  HP  Legendary
0              Bulbasaur  45      False
1                Ivysaur  60      False
2               Venusaur  80      False
3  VenusaurMega Venusaur  80      False
4             Charmander  39      False

Get specific rows#

# Get the first row, which contains the feature names
middle_row = data.iloc[0]
print(middle_row)

Hide code cell output

#                     1
Name          Bulbasaur
Type 1            Grass
Type 2           Poison
Total               318
HP                   45
Attack               49
Defense              49
Sp. Atk              65
Sp. Def              65
Speed                45
Generation            1
Legendary         False
Name: 0, dtype: object

Get specific coordinates item in the DataFrame#

# Get the name of the first 3 rows
item = data.iloc[[0, 1, 2], 1]
print(item)

Hide code cell output

0    Bulbasaur
1      Ivysaur
2     Venusaur
Name: Name, dtype: object

Remove a column#

  • Here, if you do not use the columns parameter, you need to specify the axis (0 for rows and 1 for columns).

# Here, 'Total' column is the sum of HP,Attack,Defense, SP. Atk, SP. Def, Speed.
dropped_data = data.drop('Total', axis=1)
dropped_data.head(3)

Hide code cell output

# Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
0 1 Bulbasaur Grass Poison 45 49 49 65 65 45 1 False
1 2 Ivysaur Grass Poison 60 62 63 80 80 60 1 False
2 3 Venusaur Grass Poison 80 82 83 100 100 80 1 False
  • Alternatively, you can use the columns parameter to specify the column(s) to drop without needing to specify the axis.

dropped_data = data.drop(columns='Total')
dropped_data.head(3)

Hide code cell output

# Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
0 1 Bulbasaur Grass Poison 45 49 49 65 65 45 1 False
1 2 Ivysaur Grass Poison 60 62 63 80 80 60 1 False
2 3 Venusaur Grass Poison 80 82 83 100 100 80 1 False
  • inplace=True modifies the original DataFrame without needing to reassign it.

data.drop(columns='Total', inplace=True)
data.head(3)

Hide code cell output

# Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
0 1 Bulbasaur Grass Poison 45 49 49 65 65 45 1 False
1 2 Ivysaur Grass Poison 60 62 63 80 80 60 1 False
2 3 Venusaur Grass Poison 80 82 83 100 100 80 1 False

Add a column#

# So, in order to add the 'Total' column again, we do the summation
data['Total'] = data.iloc[:, 4:10].sum(axis=1)
data.head(3)

Hide code cell output

# Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary Total
0 1 Bulbasaur Grass Poison 45 49 49 65 65 45 1 False 318
1 2 Ivysaur Grass Poison 60 62 63 80 80 60 1 False 405
2 3 Venusaur Grass Poison 80 82 83 100 100 80 1 False 525

iloc and loc in Pandas#

iloc#

# Read single row
data.iloc[3]

Hide code cell output

#                                 3
Name          VenusaurMega Venusaur
Type 1                        Grass
Type 2                       Poison
HP                               80
Attack                          100
Defense                         123
Sp. Atk                         122
Sp. Def                         120
Speed                            80
Generation                        1
Legendary                     False
Total                           625
Name: 3, dtype: object
# Read multiple rows
data.iloc[[3,6]]

Hide code cell output

# Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary Total
3 3 VenusaurMega Venusaur Grass Poison 80 100 123 122 120 80 1 False 625
6 6 Charizard Fire Flying 78 84 78 109 85 100 1 False 534
# Read a range of rows
data.iloc[3:7]

Hide code cell output

# Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary Total
3 3 VenusaurMega Venusaur Grass Poison 80 100 123 122 120 80 1 False 625
4 4 Charmander Fire NaN 39 52 43 60 50 65 1 False 309
5 5 Charmeleon Fire NaN 58 64 58 80 65 80 1 False 405
6 6 Charizard Fire Flying 78 84 78 109 85 100 1 False 534

loc#

The loc property in Pandas is used to access a group of rows and columns by labels or a boolean array. Unlike iloc, which uses integer-based indexing, loc uses the actual labels of the index and columns. This makes it very useful for selecting data based on meaningful row or column names.

data.loc[data['Legendary'] == True].head()

Hide code cell output

# Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary Total
156 144 Articuno Ice Flying 90 85 100 95 125 85 1 True 580
157 145 Zapdos Electric Flying 90 90 85 125 90 100 1 True 580
158 146 Moltres Fire Flying 90 100 90 125 85 90 1 True 580
162 150 Mewtwo Psychic NaN 106 110 90 154 90 130 1 True 680
163 150 MewtwoMega Mewtwo X Psychic Fighting 106 190 100 154 100 130 1 True 780

If you have a condition to filter rows, then you must pass the columns that you want to select after the comma as a sequence. e.g., [condition, ['col1', 'col2']]

data.loc[data['Attack'] == 49, ['Name']]

Hide code cell output

Name
0 Bulbasaur
166 Chikorita
506 Finneon

Filtering data#

We can filter with multiple conditions at ease.

data.loc[(data['Type 1'] == 'Grass') & (data['Type 2'] == 'Poison')].head(3)

Hide code cell output

# Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary Total
0 1 Bulbasaur Grass Poison 45 49 49 65 65 45 1 False 318
1 2 Ivysaur Grass Poison 60 62 63 80 80 60 1 False 405
2 3 Venusaur Grass Poison 80 82 83 100 100 80 1 False 525
data.loc[(data['Type 1'] == 'Grass') | (data['Type 2'] == 'Poison')][0::10].loc[data['HP'] > 50].head(3)

Hide code cell output

# Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary Total
50 45 Vileplume Grass Poison 75 80 85 110 90 50 1 False 490
101 94 Gengar Ghost Poison 60 65 60 130 75 110 1 False 500
197 182 Bellossom Grass NaN 75 80 95 90 100 50 2 False 490

Regex Filtering#

import re

Explicit filtering#

data.loc[data['Name'].str.contains('Mega')].head(3)

Hide code cell output

# Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary Total
3 3 VenusaurMega Venusaur Grass Poison 80 100 123 122 120 80 1 False 625
7 6 CharizardMega Charizard X Fire Dragon 78 130 111 130 85 100 1 False 634
8 6 CharizardMega Charizard Y Fire Flying 78 104 78 159 115 100 1 False 634

Regex filtering, case sensitive by default#

data.loc[data['Type 1'].str.contains('Fire|grass', regex=True)].head(3)

Hide code cell output

# Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary Total
4 4 Charmander Fire NaN 39 52 43 60 50 65 1 False 309
5 5 Charmeleon Fire NaN 58 64 58 80 65 80 1 False 405
6 6 Charizard Fire Flying 78 84 78 109 85 100 1 False 534

Regex filtering, case insensitive#

data.loc[data['Type 1'].str.contains('Fire|grass', flags=re.IGNORECASE, regex=True)].head(10)

Hide code cell output

# Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary Total
0 1 Bulbasaur Grass Poison 45 49 49 65 65 45 1 False 318
1 2 Ivysaur Grass Poison 60 62 63 80 80 60 1 False 405
2 3 Venusaur Grass Poison 80 82 83 100 100 80 1 False 525
3 3 VenusaurMega Venusaur Grass Poison 80 100 123 122 120 80 1 False 625
4 4 Charmander Fire NaN 39 52 43 60 50 65 1 False 309
5 5 Charmeleon Fire NaN 58 64 58 80 65 80 1 False 405
6 6 Charizard Fire Flying 78 84 78 109 85 100 1 False 534
7 6 CharizardMega Charizard X Fire Dragon 78 130 111 130 85 100 1 False 634
8 6 CharizardMega Charizard Y Fire Flying 78 104 78 159 115 100 1 False 634
42 37 Vulpix Fire NaN 38 41 40 50 65 65 1 False 299

Using regex#

data.loc[data['Name'].str.contains('pi[a-z]*', flags=re.IGNORECASE, regex=True)].head(3)

Hide code cell output

# Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary Total
13 10 Caterpie Bug NaN 45 30 35 20 20 45 1 False 195
20 16 Pidgey Normal Flying 40 45 40 35 35 56 1 False 251
21 17 Pidgeotto Normal Flying 63 60 55 50 50 71 1 False 349
data.loc[data['Name'].str.contains('^pi[a-z]*', flags=re.IGNORECASE, regex=True)].head(3)

Hide code cell output

# Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary Total
20 16 Pidgey Normal Flying 40 45 40 35 35 56 1 False 251
21 17 Pidgeotto Normal Flying 63 60 55 50 50 71 1 False 349
22 18 Pidgeot Normal Flying 83 80 75 70 70 101 1 False 479
data.loc[data['Name'].str.contains('pi.*', flags=re.IGNORECASE, regex=True)].head(10)

Hide code cell output

# Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary Total
13 10 Caterpie Bug NaN 45 30 35 20 20 45 1 False 195
20 16 Pidgey Normal Flying 40 45 40 35 35 56 1 False 251
21 17 Pidgeotto Normal Flying 63 60 55 50 50 71 1 False 349
22 18 Pidgeot Normal Flying 83 80 75 70 70 101 1 False 479
23 18 PidgeotMega Pidgeot Normal Flying 83 80 80 135 80 121 1 False 579
30 25 Pikachu Electric NaN 35 55 40 50 50 90 1 False 320
42 37 Vulpix Fire NaN 38 41 40 50 65 65 1 False 299
76 70 Weepinbell Grass Poison 65 90 50 85 45 55 1 False 390
84 78 Rapidash Fire NaN 65 100 70 80 80 105 1 False 500
136 127 Pinsir Bug NaN 65 125 100 55 70 85 1 False 500

Sorting#

Sort by 1 column, ascending#

data.sort_values('Type 1').head(3)

Hide code cell output

# Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary Total
600 540 Sewaddle Bug Grass 45 53 70 40 60 42 5 False 310
136 127 Pinsir Bug NaN 65 125 100 55 70 85 1 False 500
457 412 Burmy Bug NaN 40 29 45 29 45 36 4 False 224

Sort by 1 column, descending#

data.sort_values('Type 1', ascending=False).head(3)

Hide code cell output

# Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary Total
371 339 Barboach Water Ground 50 48 43 46 41 60 3 False 288
97 90 Shellder Water NaN 30 65 100 45 25 40 1 False 305
240 222 Corsola Water Rock 55 55 85 65 85 35 2 False 380

Sort by multiple columns, 1 = ascending, 0 = descending#

data.sort_values(['Type 1', 'HP'], ascending=[1, 0]).head(3)

Hide code cell output

# Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary Total
520 469 Yanmega Bug Flying 86 76 86 116 56 95 4 False 515
698 637 Volcarona Bug Fire 85 60 65 135 105 100 5 False 550
231 214 Heracross Bug Fighting 80 125 75 40 95 85 2 False 500

Reset index#

In the cell below, you can see the index is not consecutive (the leftmost column). We can reset it with the reset_index() method. By default, it adds the old index as a new column. If you do not want that, use the drop=True parameter.

new_data = data.loc[(data['Type 1'] == 'Grass') | (data['Type 2'] == 'Poison')][0::10].loc[data['HP'] > 50].head(3)
new_data

Hide code cell output

# Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary Total
50 45 Vileplume Grass Poison 75 80 85 110 90 50 1 False 490
101 94 Gengar Ghost Poison 60 65 60 130 75 110 1 False 500
197 182 Bellossom Grass NaN 75 80 95 90 100 50 2 False 490
new_data.reset_index(drop=True, inplace=True)
new_data

Hide code cell output

# Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary Total
0 45 Vileplume Grass Poison 75 80 85 110 90 50 1 False 490
1 94 Gengar Ghost Poison 60 65 60 130 75 110 1 False 500
2 182 Bellossom Grass NaN 75 80 95 90 100 50 2 False 490

Group by/Aggregate using aggregate function#

Aggregate functions like mean(), sum(), min(), and max() in pandas only work with numerical data. Non-numeric columns (such as strings or booleans) cannot be averaged or summed, so you must select only numeric columns when using these functions to avoid errors and get meaningful results.

Group by 1 column#

# Select only numeric columns for aggregation
numeric_cols = data.select_dtypes(include='number').columns
data.groupby(['Type 1'])[numeric_cols].mean().sort_values('Type 1').head(10)

Hide code cell output

# HP Attack Defense Sp. Atk Sp. Def Speed Generation Total
Type 1
Bug 334.492754 56.884058 70.971014 70.724638 53.869565 64.797101 61.681159 3.217391 378.927536
Dark 461.354839 66.806452 88.387097 70.225806 74.645161 69.516129 76.161290 4.032258 445.741935
Dragon 474.375000 83.312500 112.125000 86.375000 96.843750 88.843750 83.031250 3.875000 550.531250
Electric 363.500000 59.795455 69.090909 66.295455 90.022727 73.704545 84.500000 3.272727 443.409091
Fairy 449.529412 74.117647 61.529412 65.705882 78.529412 84.705882 48.588235 4.117647 413.176471
Fighting 363.851852 69.851852 96.777778 65.925926 53.111111 64.703704 66.074074 3.370370 416.444444
Fire 327.403846 69.903846 84.769231 67.769231 88.980769 72.211538 74.442308 3.211538 458.076923
Flying 677.750000 70.750000 78.750000 66.250000 94.250000 72.500000 102.500000 5.500000 485.000000
Ghost 486.500000 64.437500 73.781250 81.187500 79.343750 76.468750 64.343750 4.187500 439.562500
Grass 344.871429 67.271429 73.214286 70.800000 77.500000 70.428571 61.928571 3.357143 421.142857

Group by hierarchical columns#

data.groupby(['Type 1', 'Type 2'])[numeric_cols].count().head()

Hide code cell output

# HP Attack Defense Sp. Atk Sp. Def Speed Generation Total
Type 1 Type 2
Bug Electric 2 2 2 2 2 2 2 2 2
Fighting 2 2 2 2 2 2 2 2 2
Fire 2 2 2 2 2 2 2 2 2
Flying 14 14 14 14 14 14 14 14 14
Ghost 1 1 1 1 1 1 1 1 1

Quizzes#

1. Import the pandas library as pd.#

import pandas as pd

2. Read CSV into df#

df = pd.read_csv('data/Pokemon.csv')

3. Show first 5 rows#

df.head(5)

Hide code cell output

# Name Type 1 Type 2 Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
0 1 Bulbasaur Grass Poison 318 45 49 49 65 65 45 1 False
1 2 Ivysaur Grass Poison 405 60 62 63 80 80 60 1 False
2 3 Venusaur Grass Poison 525 80 82 83 100 100 80 1 False
3 3 VenusaurMega Venusaur Grass Poison 625 80 100 123 122 120 80 1 False
4 4 Charmander Fire NaN 309 39 52 43 60 50 65 1 False

4. Show last 3 rows#

df.tail(3)

Hide code cell output

# Name Type 1 Type 2 Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
797 720 HoopaHoopa Confined Psychic Ghost 600 80 110 60 150 130 70 6 True
798 720 HoopaHoopa Unbound Psychic Dark 680 80 160 60 170 130 80 6 True
799 721 Volcanion Fire Water 600 80 110 120 130 90 70 6 True

5. Show all column names#

df.columns

Hide code cell output

Index(['#', 'Name', 'Type 1', 'Type 2', 'Total', 'HP', 'Attack', 'Defense',
       'Sp. Atk', 'Sp. Def', 'Speed', 'Generation', 'Legendary'],
      dtype='object')

6. Select rows index 10 to 15#

df.iloc[10:16]

Hide code cell output

# Name Type 1 Type 2 Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
10 8 Wartortle Water NaN 405 59 63 80 65 80 58 1 False
11 9 Blastoise Water NaN 530 79 83 100 85 105 78 1 False
12 9 BlastoiseMega Blastoise Water NaN 630 79 103 120 135 115 78 1 False
13 10 Caterpie Bug NaN 195 45 30 35 20 20 45 1 False
14 11 Metapod Bug NaN 205 50 20 55 25 25 30 1 False
15 12 Butterfree Bug Flying 395 60 45 50 90 80 70 1 False

7. Select row index 50#

df.iloc[50]

Hide code cell output

#                    45
Name          Vileplume
Type 1            Grass
Type 2           Poison
Total               490
HP                   75
Attack               80
Defense              85
Sp. Atk             110
Sp. Def              90
Speed                50
Generation            1
Legendary         False
Name: 50, dtype: object

8. Filter rows Type 1 == Grass#

df.loc[df['Type 1'] == 'Grass'].head()

Hide code cell output

# Name Type 1 Type 2 Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
0 1 Bulbasaur Grass Poison 318 45 49 49 65 65 45 1 False
1 2 Ivysaur Grass Poison 405 60 62 63 80 80 60 1 False
2 3 Venusaur Grass Poison 525 80 82 83 100 100 80 1 False
3 3 VenusaurMega Venusaur Grass Poison 625 80 100 123 122 120 80 1 False
48 43 Oddish Grass Poison 320 45 50 55 75 65 30 1 False

9. Filter rows HP > 100#

df.loc[df['HP'] > 100].head()

Hide code cell output

# Name Type 1 Type 2 Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
44 39 Jigglypuff Normal Fairy 270 115 45 20 45 25 20 1 False
45 40 Wigglytuff Normal Fairy 435 140 70 45 85 50 45 1 False
96 89 Muk Poison NaN 500 105 105 75 65 100 50 1 False
120 112 Rhydon Ground Rock 485 105 130 120 45 45 40 1 False
121 113 Chansey Normal NaN 450 250 5 5 35 105 50 1 False

10. Filter rows Name contains chu#

df.loc[df['Name'].str.contains('chu')].head()

Hide code cell output

# Name Type 1 Type 2 Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
30 25 Pikachu Electric NaN 320 35 55 40 50 50 90 1 False
31 26 Raichu Electric NaN 485 60 90 55 90 80 110 1 False
186 172 Pichu Electric NaN 205 20 40 15 35 35 60 2 False
257 238 Smoochum Ice Psychic 305 45 30 15 85 65 65 2 False

11. Sort by Attack descending#

df.sort_values('Attack', ascending=False).head()

Hide code cell output

# Name Type 1 Type 2 Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
163 150 MewtwoMega Mewtwo X Psychic Fighting 780 106 190 100 154 100 130 1 True
232 214 HeracrossMega Heracross Bug Fighting 600 80 185 115 40 105 75 2 False
429 386 DeoxysAttack Forme Psychic NaN 600 50 180 20 180 20 150 3 True
426 384 RayquazaMega Rayquaza Dragon Flying 780 105 180 100 180 100 115 3 True
424 383 GroudonPrimal Groudon Ground Fire 770 100 180 160 150 90 90 3 True

12. Reset index#

new_df = df.reset_index(drop=True)
new_df.head()

Hide code cell output

# Name Type 1 Type 2 Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
0 1 Bulbasaur Grass Poison 318 45 49 49 65 65 45 1 False
1 2 Ivysaur Grass Poison 405 60 62 63 80 80 60 1 False
2 3 Venusaur Grass Poison 525 80 82 83 100 100 80 1 False
3 3 VenusaurMega Venusaur Grass Poison 625 80 100 123 122 120 80 1 False
4 4 Charmander Fire NaN 309 39 52 43 60 50 65 1 False

13. Select columns Name, Type 1, HP#

df[['Name', 'Type 1', 'HP']].head()

Hide code cell output

Name Type 1 HP
0 Bulbasaur Grass 45
1 Ivysaur Grass 60
2 Venusaur Grass 80
3 VenusaurMega Venusaur Grass 80
4 Charmander Fire 39

14. Filter Type 1 == Grass and Type 2 == Poison#

df.loc[(df['Type 1'] == 'Grass') & (df['Type 2'] == 'Poison')].head()

Hide code cell output

# Name Type 1 Type 2 Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
0 1 Bulbasaur Grass Poison 318 45 49 49 65 65 45 1 False
1 2 Ivysaur Grass Poison 405 60 62 63 80 80 60 1 False
2 3 Venusaur Grass Poison 525 80 82 83 100 100 80 1 False
3 3 VenusaurMega Venusaur Grass Poison 625 80 100 123 122 120 80 1 False
48 43 Oddish Grass Poison 320 45 50 55 75 65 30 1 False

15. Mean HP of Legendary#

df.loc[df['Legendary'] == True]['HP'].mean()

Hide code cell output

np.float64(92.73846153846154)

16. Rows with even indices#

even_indices_df = df.iloc[::2]
even_indices_df.head()

Hide code cell output

# Name Type 1 Type 2 Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
0 1 Bulbasaur Grass Poison 318 45 49 49 65 65 45 1 False
2 3 Venusaur Grass Poison 525 80 82 83 100 100 80 1 False
4 4 Charmander Fire NaN 309 39 52 43 60 50 65 1 False
6 6 Charizard Fire Flying 534 78 84 78 109 85 100 1 False
8 6 CharizardMega Charizard Y Fire Flying 634 78 104 78 159 115 100 1 False

17. Save even indices to filtered_pokemon.csv#

even_indices_df = df.iloc[::2]
even_indices_df.to_csv('data/filtered_pokemon.csv', index=True)
even_indices_df = pd.read_csv('data/filtered_pokemon.csv', index_col=0)
even_indices_df.head()

Hide code cell output

# Name Type 1 Type 2 Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
0 1 Bulbasaur Grass Poison 318 45 49 49 65 65 45 1 False
2 3 Venusaur Grass Poison 525 80 82 83 100 100 80 1 False
4 4 Charmander Fire NaN 309 39 52 43 60 50 65 1 False
6 6 Charizard Fire Flying 534 78 84 78 109 85 100 1 False
8 6 CharizardMega Charizard Y Fire Flying 634 78 104 78 159 115 100 1 False

18. First 10 rows Name, Type 1, Attack#

df[['Name', 'Type 1', 'Attack']].head(10)

Hide code cell output

Name Type 1 Attack
0 Bulbasaur Grass 49
1 Ivysaur Grass 62
2 Venusaur Grass 82
3 VenusaurMega Venusaur Grass 100
4 Charmander Fire 52
5 Charmeleon Fire 64
6 Charizard Fire 84
7 CharizardMega Charizard X Fire 130
8 CharizardMega Charizard Y Fire 104
9 Squirtle Water 48

19. Max Defense#

df['Defense'].max()

Hide code cell output

np.int64(230)

20. Rows Speed above mean#

mean_speed = df['Speed'].mean()
new_df = df.loc[df['Speed'] > mean_speed]
print('mean speed: ', mean_speed)
new_df.head()

Hide code cell output

mean speed:  68.2775
# Name Type 1 Type 2 Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
2 3 Venusaur Grass Poison 525 80 82 83 100 100 80 1 False
3 3 VenusaurMega Venusaur Grass Poison 625 80 100 123 122 120 80 1 False
5 5 Charmeleon Fire NaN 405 58 64 58 80 65 80 1 False
6 6 Charizard Fire Flying 534 78 84 78 109 85 100 1 False
7 6 CharizardMega Charizard X Fire Dragon 634 78 130 111 130 85 100 1 False

21. First 3 rows Name and Attack#

df[['Name', 'Attack']].iloc[:3]

Hide code cell output

Name Attack
0 Bulbasaur 49
1 Ivysaur 62
2 Venusaur 82

22. Count per Type 1#

df.groupby(['Type 1'])['#'].count().sort_index()

Hide code cell output

Type 1
Bug          69
Dark         31
Dragon       32
Electric     44
Fairy        17
Fighting     27
Fire         52
Flying        4
Ghost        32
Grass        70
Ground       32
Ice          24
Normal       98
Poison       28
Psychic      57
Rock         44
Steel        27
Water       112
Name: #, dtype: int64

23. Rows Generation 3 or 5#

df.loc[(df['Generation'] == 3) | (df['Generation'] == 5)].head()

Hide code cell output

# Name Type 1 Type 2 Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
272 252 Treecko Grass NaN 310 40 45 35 65 55 70 3 False
273 253 Grovyle Grass NaN 405 50 65 45 85 65 95 3 False
274 254 Sceptile Grass NaN 530 70 85 65 105 85 120 3 False
275 254 SceptileMega Sceptile Grass Dragon 630 70 110 75 145 85 145 3 False
276 255 Torchic Fire NaN 310 45 60 40 70 50 45 3 False

24. Replace Fire with Flame in Type 1#

df_copy = df.copy()
df_copy['Type 1'] = df_copy['Type 1'].str.replace('Fire', 'Flame')
df_copy.head()

Hide code cell output

# Name Type 1 Type 2 Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
0 1 Bulbasaur Grass Poison 318 45 49 49 65 65 45 1 False
1 2 Ivysaur Grass Poison 405 60 62 63 80 80 60 1 False
2 3 Venusaur Grass Poison 525 80 82 83 100 100 80 1 False
3 3 VenusaurMega Venusaur Grass Poison 625 80 100 123 122 120 80 1 False
4 4 Charmander Flame NaN 309 39 52 43 60 50 65 1 False

25. Rows Name startswith P#

df.loc[df['Name'].str.startswith('P')].head()

Hide code cell output

# Name Type 1 Type 2 Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
20 16 Pidgey Normal Flying 251 40 45 40 35 35 56 1 False
21 17 Pidgeotto Normal Flying 349 63 60 55 50 50 71 1 False
22 18 Pidgeot Normal Flying 479 83 80 75 70 70 101 1 False
23 18 PidgeotMega Pidgeot Normal Flying 579 83 80 80 135 80 121 1 False
30 25 Pikachu Electric NaN 320 35 55 40 50 50 90 1 False

26. Mean Attack per Type 1#

df.groupby('Type 1')['Attack'].mean().sort_index()

Hide code cell output

Type 1
Bug          70.971014
Dark         88.387097
Dragon      112.125000
Electric     69.090909
Fairy        61.529412
Fighting     96.777778
Fire         84.769231
Flying       78.750000
Ghost        73.781250
Grass        73.214286
Ground       95.750000
Ice          72.750000
Normal       73.469388
Poison       74.678571
Psychic      71.456140
Rock         92.863636
Steel        92.703704
Water        74.151786
Name: Attack, dtype: float64

27. Sort by Type 1 asc, Attack desc#

df.sort_values(['Type 1', 'Attack'], ascending=[1, 0]).head()

Hide code cell output

# Name Type 1 Type 2 Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
232 214 HeracrossMega Heracross Bug Fighting 600 80 185 115 40 105 75 2 False
137 127 PinsirMega Pinsir Bug Flying 600 65 155 120 65 90 105 1 False
19 15 BeedrillMega Beedrill Bug Poison 495 65 150 40 15 80 145 1 False
229 212 ScizorMega Scizor Bug Steel 600 70 150 140 65 100 75 2 False
650 589 Escavalier Bug Steel 495 70 135 105 60 105 20 5 False

28. Mean Defense per Type 1#

df.groupby('Type 1')['Defense'].mean().sort_index()

Hide code cell output

Type 1
Bug          70.724638
Dark         70.225806
Dragon       86.375000
Electric     66.295455
Fairy        65.705882
Fighting     65.925926
Fire         67.769231
Flying       66.250000
Ghost        81.187500
Grass        70.800000
Ground       84.843750
Ice          71.416667
Normal       59.846939
Poison       68.821429
Psychic      67.684211
Rock        100.795455
Steel       126.370370
Water        72.946429
Name: Defense, dtype: float64

29. Legendary with Attack > 100#

df.loc[(df['Legendary'] == True) & (df['Attack'] > 100)].head()

Hide code cell output

# Name Type 1 Type 2 Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
162 150 Mewtwo Psychic NaN 680 106 110 90 154 90 130 1 True
163 150 MewtwoMega Mewtwo X Psychic Fighting 780 106 190 100 154 100 130 1 True
164 150 MewtwoMega Mewtwo Y Psychic NaN 780 106 150 70 194 120 140 1 True
263 244 Entei Fire NaN 580 115 115 85 90 75 100 2 True
270 250 Ho-oh Fire Flying 680 106 130 90 110 154 90 2 True

30. New DataFrame Name and Total renamed Overall#

new_df = df[['Name', 'Total']].rename(columns={'Total': 'Overall'})
new_df.head()

Hide code cell output

Name Overall
0 Bulbasaur 318
1 Ivysaur 405
2 Venusaur 525
3 VenusaurMega Venusaur 625
4 Charmander 309

31. Filter Type 1 == Water and Attack >= 80#

df.loc[(df['Type 1'] == 'Water') & (df['Attack'] >= 80)].head()

Hide code cell output

# Name Type 1 Type 2 Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
11 9 Blastoise Water NaN 530 79 83 100 85 105 78 1 False
12 9 BlastoiseMega Blastoise Water NaN 630 79 103 120 135 115 78 1 False
60 55 Golduck Water NaN 500 80 82 78 95 80 85 1 False
67 62 Poliwrath Water Fighting 510 90 95 95 70 90 70 1 False
98 91 Cloyster Water Ice 525 50 95 180 85 45 70 1 False

32. Generation 4 Name and Defense#

df.loc[df['Generation'] == 4][['Name', 'Defense']].head()

Hide code cell output

Name Defense
432 Turtwig 64
433 Grotle 85
434 Torterra 105
435 Chimchar 44
436 Monferno 52

33. Mean Total per (Type 1, Type 2)#

df.groupby(['Type 1', 'Type 2'])['Total'].mean().head(10)

Hide code cell output

Type 1  Type 2  
Bug     Electric    395.500000
        Fighting    550.000000
        Fire        455.000000
        Flying      419.500000
        Ghost       236.000000
        Grass       384.000000
        Ground      345.000000
        Poison      347.916667
        Rock        435.000000
        Steel       509.714286
Name: Total, dtype: float64

34. Filter Total between 400 and 500#

df.loc[(df['Total'] >= 400) & (df['Total'] <= 500)].head()

Hide code cell output

# Name Type 1 Type 2 Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
1 2 Ivysaur Grass Poison 405 60 62 63 80 80 60 1 False
5 5 Charmeleon Fire NaN 405 58 64 58 80 65 80 1 False
10 8 Wartortle Water NaN 405 59 63 80 65 80 58 1 False
19 15 BeedrillMega Beedrill Bug Poison 495 65 150 40 15 80 145 1 False
22 18 Pidgeot Normal Flying 479 83 80 75 70 70 101 1 False