IST718 | BREAKOUT WEEK 3¶

Reading

What is your asessment of the sustainability journey?

How do consumers make choices? Yesterday? Today? Tomorrow?

What do we need to evaluate a sustainability journey?

What techniques can we use to make a recommendation engine?

Practice

Pick a country - any country - what does the data tell?

Could we build a model for predicting sustainable claim?

Should we build a model for predicting whether a product will result in a sustainable claim?

Develop a graphic to support your team's answer.

import pandas as pd
import numpy as np
from scipy.stats import uniform
import statsmodels.formula.api as smf
import seaborn as sns
import matplotlib.pyplot as plt

O: OBTAIN¶

url = "https://raw.githubusercontent.com/2SUBDA/Breakouts/Week3/Case3SalesProducts.csv"
items = pd.read_csv(url, error_bad_lines=False)

items.head()

items.dtypes

Country                  object
OrderMethod              object
RetailerType             object
ProductLine              object
ProductType              object
Product                  object
Year                      int64
Quarter                  object
SustainableClaim          int64
SustainableMarketing      int64
Revenue                 float64
Quantity                  int64
GrossMargin             float64
dtype: object

np.random.seed(1234)
items['runiform'] = uniform.rvs(loc = 0, scale = 1, size = len(items))
items_train = items[items['runiform'] >= 0.33]
items_test = items[items['runiform'] < 0.33]

my_model = str('Revenue ~ Quantity + SustainableClaim + SustainableMarketing')
train_model_fit = smf.ols(my_model, data = items_train).fit()
print(train_model_fit.summary())

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                Revenue   R-squared:                       0.136
Model:                            OLS   Adj. R-squared:                  0.136
Method:                 Least Squares   F-statistic:                     3112.
Date:                Wed, 22 Jan 2020   Prob (F-statistic):               0.00
Time:                        09:30:38   Log-Likelihood:            -7.3728e+05
No. Observations:               59273   AIC:                         1.475e+06
Df Residuals:                   59269   BIC:                         1.475e+06
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
========================================================================================
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
Intercept             2.546e+04    872.335     29.184      0.000    2.37e+04    2.72e+04
Quantity                14.6617      0.166     88.080      0.000      14.335      14.988
SustainableClaim      1.741e+04    749.616     23.228      0.000    1.59e+04    1.89e+04
SustainableMarketing -4625.3628    781.817     -5.916      0.000   -6157.727   -3092.998
==============================================================================
Omnibus:                    61303.368   Durbin-Watson:                   1.235
Prob(Omnibus):                  0.000   Jarque-Bera (JB):          7368770.263
Skew:                           4.961   Prob(JB):                         0.00
Kurtosis:                      56.714   Cond. No.                     8.91e+03
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 8.91e+03. This might indicate that there are
strong multicollinearity or other numerical problems.

items['Country'].value_counts()

United States     7482
Canada            5923
France            5779
Germany           5397
Japan             5359
United Kingdom    5102
Netherlands       4199
Switzerland       4103
Italy             4018
Austria           3862
Mexico            3845
Belgium           3710
Australia         3665
China             3652
Spain             3557
Singapore         3443
Finland           3409
Korea             3399
Brazil            3288
Sweden            2925
Denmark           2358
Name: Country, dtype: int64

Testing with Denmark¶

denmark = items[items['Country'] == 'Denmark']

denmark['SustainableClaim'].value_counts()

1    1300
0    1058
Name: SustainableClaim, dtype: int64

df = denmark.groupby(['Year','SustainableClaim']).sum()

df.reset_index(inplace=True)

df

df.pivot('Year', 'SustainableClaim', 'Revenue').plot(kind="bar")
plt.title('denmark')

Text(0.5, 1.0, 'denmark')

NOW IN A LOOP!¶

for country in set(items['Country'].values):
    country_df = items[items['Country'] == country]
    df = country_df.groupby(['Year','SustainableClaim']).sum()
    df.reset_index(inplace=True)
    df.pivot('Year', 'SustainableClaim', 'Revenue').plot(kind="bar")
    plt.title(country)
    plt.show()

	Country	OrderMethod	RetailerType	ProductLine	ProductType	Product	Year	Quarter	SustainableClaim	SustainableMarketing	Revenue	Quantity	GrossMargin
0	United States	Fax	Outdoors Shop	Camping Equipment	Cooking Gear	TrailChef Deluxe Cook Set	2012	Q1 2012	1	0	59628.66	489	0.347548
1	United States	Fax	Outdoors Shop	Camping Equipment	Cooking Gear	TrailChef Double Flame	2012	Q1 2012	0	1	35950.32	252	0.474274
2	United States	Fax	Outdoors Shop	Camping Equipment	Tents	Star Dome	2012	Q1 2012	1	0	89940.48	147	0.352772
3	United States	Fax	Outdoors Shop	Camping Equipment	Tents	Star Gazer 2	2012	Q1 2012	1	0	165883.41	303	0.282938
4	United States	Fax	Outdoors Shop	Camping Equipment	Sleeping Bags	Hibernator Lite	2012	Q1 2012	1	0	119822.20	1415	0.291450

	Year	SustainableClaim	SustainableMarketing	Revenue	Quantity	GrossMargin	runiform
0	2012	0	383	4678059.55	170989	204.590129	190.908413
1	2012	1	147	11215207.01	180199	182.929616	246.010735
2	2013	0	439	6881559.00	165780	234.224984	207.095384
3	2013	1	176	13168906.71	211958	187.305995	246.407795
4	2014	0	236	4399289.58	100666	124.233882	123.766425
5	2014	1	102	7555510.08	132507	104.625091	135.537813