IST 718 | LAB 3¶

01-23-20

V1 DATA SOURCE: Coaches9¶

O: OBTAIN¶

import pandas as pd
data = pd.read_csv("Coaches9.csv")

data.head()

S: SCRUB¶

Change the columns with dollar amounts as strings to floats
Inspect and possibly replace missing values

attrs = list(data[data.columns[3:]].columns)
data[attrs] = data[attrs].replace({'\$':'', ',': '', '--':None}, regex=True).astype('float')

data.dtypes

School           object
Conference       object
Coach            object
SchoolPay       float64
TotalPay        float64
Bonus           float64
BonusPaid       float64
AssistantPay    float64
Buyout          float64
dtype: object

data.isnull().sum()

School           0
Conference       0
Coach            0
SchoolPay        4
TotalPay         4
Bonus           22
BonusPaid       41
AssistantPay     0
Buyout          22
dtype: int64

Replacing Missing Values¶

Four options:

Drop the schools with missing values
Drop the attributes with missing values (e.g. 'BonusPaid')
Replace the missing values with some value (e.g. mean, 0)
Find additional data

For V1, we are going to

Drop the schools without SchoolPay
Drop the attribute BonusPaid
Fill Bonus & Buyout with the attribute mean*

For V2, we are going to use #4 and a more advanced version of #3* taking within conference mean

data.dropna(subset=['SchoolPay'], inplace=True)

data.drop('BonusPaid', axis=1, inplace=True)

median_bonus = data['Bonus'].median()
median_buyout = data['Buyout'].median()
data["Bonus"].fillna(median_bonus, inplace = True)
data["Buyout"].fillna(median_buyout, inplace = True)

data.isnull().sum()

School          0
Conference      0
Coach           0
SchoolPay       0
TotalPay        0
Bonus           0
AssistantPay    0
Buyout          0
dtype: int64

	School	Conference	Coach	SchoolPay	TotalPay	Bonus	BonusPaid	AssistantPay	Buyout
0	Air Force	Mt. West	Troy Calhoun	885000	885000	247000	--	$0	--
1	Akron	MAC	Terry Bowden	$411,000	$412,500	$225,000	$50,000	$0	$688,500
2	Alabama	SEC	Nick Saban	$8,307,000	$8,307,000	$1,100,000	$500,000	$0	$33,600,000
3	Alabama at Birmingham	C-USA	Bill Clark	$900,000	$900,000	$950,000	$165,471	$0	$3,847,500
4	Appalachian State	Sun Belt	Scott Satterfield	$712,500	$712,500	$295,000	$145,000	$0	$2,160,417