01-23-20
import pandas as pd
data = pd.read_csv("Coaches9.csv")
data.head()
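# Every column from the fourth onward holds numeric values stored as text
attrs = list(data.columns[3:])
# Strip "$" and thousands separators, turn the "--" placeholder into a missing value, then cast to float
data[attrs] = data[attrs].replace({r'\$': '', ',': '', '--': None}, regex=True).astype('float')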
data.dtypes
data.isnull().sum()
Four options:
For V1, we are going to drop the rows that are missing SchoolPay, drop the BonusPaid column, and fill the missing Bonus & Buyout values with the attribute median.
For V2, we are going to use #4 and a more advanced version of #3: taking the within-conference mean.
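The V2 idea of filling with a within-conference mean could look roughly like the sketch below. It runs on a small made-up frame, not on the real data: the `Conference` column name, the toy values, and the helper `fill_with_group_mean` are all assumptions for illustration. Falling back to the overall mean when a conference has no observed values is also an added choice, not something stated in the notes above.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the coaches data; the 'Conference' column name
# and every value here are assumptions made for illustration only.
toy = pd.DataFrame({
    "Conference": ["A", "A", "A", "B", "B", "C"],
    "Bonus": [100.0, np.nan, 300.0, 50.0, np.nan, np.nan],
})

def fill_with_group_mean(df, group_col, value_col):
    """Fill NaNs in value_col with the mean of the row's group.

    Groups with no observed values fall back to the overall mean
    (an assumption of this sketch).
    """
    # transform("mean") returns one value per row, aligned to df's index
    group_mean = df.groupby(group_col)[value_col].transform("mean")
    filled = df[value_col].fillna(group_mean)
    return filled.fillna(df[value_col].mean())

toy["Bonus"] = fill_with_group_mean(toy, "Conference", "Bonus")
print(toy)
```

Assigning the result back to the column, rather than filling in place, keeps the step explicit and avoids modifying a temporary copy.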
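# Drop the rows with no SchoolPay value
data.dropna(subset=['SchoolPay'], inplace=True)
# Drop the BonusPaid column entirely
data.drop('BonusPaid', axis=1, inplace=True)
# Fill the remaining gaps in Bonus and Buyout with each column's median
median_bonus = data['Bonus'].median()
median_buyout = data['Buyout'].median()
# Assign back instead of calling fillna(inplace=True) on a selected column,
# which newer pandas versions warn about and may not apply to the frame
data["Bonus"] = data["Bonus"].fillna(median_bonus)
data["Buyout"] = data["Buyout"].fillna(median_buyout)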
data.isnull().sum()