Reading
What is your assessment of the sustainability journey?
How do consumers make choices? Yesterday? Today? Tomorrow?
What do we need to evaluate a sustainability journey?
What techniques can we use to make a recommendation engine?
Practice
Pick a country - any country - what does the data tell us?
Could we build a model for predicting a sustainable claim?
Should we build a model for predicting whether a product will result in a sustainable claim?
Develop a graphic to support your team's answer.
import pandas as pd
import numpy as np
from scipy.stats import uniform
import statsmodels.formula.api as smf
import seaborn as sns
import matplotlib.pyplot as plt
url = "https://raw.githubusercontent.com/2SUBDA/Breakouts/Week3/Case3SalesProducts.csv"
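# error_bad_lines was removed in pandas 2.0; on_bad_lines='skip' (pandas >= 1.3) drops malformed rows
items = pd.read_csv(url, on_bad_lines='skip')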
items.head()
items.dtypes
np.random.seed(1234)
items['runiform'] = uniform.rvs(loc = 0, scale = 1, size = len(items))
items_train = items[items['runiform'] >= 0.33]
items_test = items[items['runiform'] < 0.33]
my_model = 'Revenue ~ Quantity + SustainableClaim + SustainableMarketing'
train_model_fit = smf.ols(my_model, data = items_train).fit()
print(train_model_fit.summary())
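The split above creates items_test, but the model is never scored on it. Below is a minimal sketch of a hold-out check. It assumes the same column names as the formula above. The `demo` data frame, its coefficients, and the helper `holdout_rmse` are made up for illustration so the snippet runs without the CSV.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def holdout_rmse(fitted, test, target):
    """Root mean squared error of a fitted formula model on the given rows."""
    predicted = fitted.predict(test)
    return float(np.sqrt(np.mean((test[target] - predicted) ** 2)))


# Synthetic stand-in for `items` (hypothetical values, not the course data).
rng = np.random.default_rng(1234)
n = 500
demo = pd.DataFrame({
    "Quantity": rng.integers(1, 100, size=n),
    "SustainableClaim": rng.integers(0, 2, size=n),
    "SustainableMarketing": rng.integers(0, 2, size=n),
})
demo["Revenue"] = (20.0 * demo["Quantity"]
                   + 150.0 * demo["SustainableClaim"]
                   + rng.normal(0, 50, size=n))

# Same idea as the split above: roughly 67% train, 33% test.
demo["runiform"] = rng.uniform(size=n)
demo_train = demo[demo["runiform"] >= 0.33]
demo_test = demo[demo["runiform"] < 0.33]

demo_fit = smf.ols(
    "Revenue ~ Quantity + SustainableClaim + SustainableMarketing",
    data=demo_train,
).fit()

train_rmse = holdout_rmse(demo_fit, demo_train, "Revenue")
test_rmse = holdout_rmse(demo_fit, demo_test, "Revenue")
print(f"train RMSE: {train_rmse:.1f}, test RMSE: {test_rmse:.1f}")
```

With the real data, the equivalent call would be holdout_rmse(train_model_fit, items_test, 'Revenue'). A test RMSE far above the training RMSE would suggest the model does not generalize well.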
items['Country'].value_counts()
denmark = items[items['Country'] == 'Denmark']
denmark['SustainableClaim'].value_counts()
# Sum only Revenue; summing every column can fail or give nonsense on non-numeric columns
df = denmark.groupby(['Year', 'SustainableClaim'], as_index=False)['Revenue'].sum()
df
# pandas 2.0+ requires keyword arguments for pivot
df.pivot(index='Year', columns='SustainableClaim', values='Revenue').plot(kind="bar")
plt.title('Denmark')
# Repeat the Denmark chart for every country in the data
for country in items['Country'].unique():
    country_df = items[items['Country'] == country]
    df = country_df.groupby(['Year', 'SustainableClaim'], as_index=False)['Revenue'].sum()
    df.pivot(index='Year', columns='SustainableClaim', values='Revenue').plot(kind="bar")
    plt.title(country)
    plt.show()
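For the Practice question about predicting a sustainable claim, one possible approach is a logistic regression, since the outcome is yes/no. The sketch below is illustrative only. It assumes SustainableClaim is coded 0/1 and uses Quantity and SustainableMarketing as predictors, which is a guess rather than a recommendation from the case. The `claims` data frame and its relationship between marketing and claims are synthetic.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data (hypothetical, not the course data).
rng = np.random.default_rng(1234)
n = 1000
claims = pd.DataFrame({
    "Quantity": rng.integers(1, 100, size=n),
    "SustainableMarketing": rng.integers(0, 2, size=n),
})
# Made-up relationship: marketing raises the odds of a sustainable claim.
log_odds = -1.0 + 2.0 * claims["SustainableMarketing"] + 0.01 * claims["Quantity"]
claims["SustainableClaim"] = rng.binomial(1, 1.0 / (1.0 + np.exp(-log_odds)))

# Train/test split, roughly 67% / 33%.
is_train = rng.uniform(size=n) >= 0.33
claims_train = claims[is_train]
claims_test = claims[~is_train]

# Logistic regression through the same formula interface used for OLS.
claim_fit = smf.logit(
    "SustainableClaim ~ Quantity + SustainableMarketing",
    data=claims_train,
).fit(disp=False)

# Score on held-out rows: predicted probability -> 0/1 label at a 0.5 cutoff.
predicted_prob = claim_fit.predict(claims_test)
predicted_claim = (predicted_prob >= 0.5).astype(int)
accuracy = float((predicted_claim == claims_test["SustainableClaim"]).mean())

# Baseline: always guess the more common class in the test set.
share_claim = float(claims_test["SustainableClaim"].mean())
baseline = max(share_claim, 1.0 - share_claim)
print(f"accuracy: {accuracy:.2f}, majority-class baseline: {baseline:.2f}")
```

Comparing accuracy against the majority-class baseline speaks to the "should we" question: a model that barely beats the baseline may not be worth building.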