HW5 -- Artificial Artificial Intelligence

In [1]:
import pandas as pd
import numpy as np

neg = pd.read_csv('AMT_neg.csv')
pos = pd.read_csv('AMT_pos.csv')

Initial EDA

In [5]:
neg[:3]
Out[5]:
HITId HITTypeId Title Description Keywords Reward CreationTime MaxAssignments RequesterAnnotation AssignmentDurationInSeconds ... RejectionTime RequesterFeedback WorkTimeInSeconds LifetimeApprovalRate Last30DaysApprovalRate Last7DaysApprovalRate Input.text Answer.sentiment.label Approve Reject
0 3IQ9O0AYW6ZI3GD740H32KGG2SWITJ 3N0K7CX2I27L2NR2L8D93MF8LIRA5J Sentiment analysis Sentiment analysis sentiment, text $0.02 Fri Nov 01 12:08:17 PDT 2019 3 BatchId:3821423;OriginalHitTemplateId:928390909; 10800 ... NaN NaN 44 0% (0/0) 0% (0/0) 0% (0/0) Missed Opportunity\nI had been very excited to... Neutral NaN NaN
1 3IQ9O0AYW6ZI3GD740H32KGG2SWITJ 3N0K7CX2I27L2NR2L8D93MF8LIRA5J Sentiment analysis Sentiment analysis sentiment, text $0.02 Fri Nov 01 12:08:17 PDT 2019 3 BatchId:3821423;OriginalHitTemplateId:928390909; 10800 ... NaN NaN 7 0% (0/0) 0% (0/0) 0% (0/0) Missed Opportunity\nI had been very excited to... Negative NaN NaN
2 3IQ9O0AYW6ZI3GD740H32KGG2SWITJ 3N0K7CX2I27L2NR2L8D93MF8LIRA5J Sentiment analysis Sentiment analysis sentiment, text $0.02 Fri Nov 01 12:08:17 PDT 2019 3 BatchId:3821423;OriginalHitTemplateId:928390909; 10800 ... NaN NaN 449 0% (0/0) 0% (0/0) 0% (0/0) Missed Opportunity\nI had been very excited to... Positive NaN NaN

3 rows × 31 columns

In [6]:
pos[:3]
Out[6]:
HITId HITTypeId Title Description Keywords Reward CreationTime MaxAssignments RequesterAnnotation AssignmentDurationInSeconds ... RejectionTime RequesterFeedback WorkTimeInSeconds LifetimeApprovalRate Last30DaysApprovalRate Last7DaysApprovalRate Input.text Answer.sentiment.label Approve Reject
0 3VMV5CHJZ8F47P7CECH0H830NF4GTP 3N0K7CX2I27L2NR2L8D93MF8LIRA5J Sentiment analysis Sentiment analysis sentiment, text $0.02 Fri Nov 01 12:11:19 PDT 2019 3 BatchId:3821427;OriginalHitTemplateId:928390909; 10800 ... NaN NaN 355 0% (0/0) 0% (0/0) 0% (0/0) funny like a clown\nGreetings again from the d... Positive NaN NaN
1 3VMV5CHJZ8F47P7CECH0H830NF4GTP 3N0K7CX2I27L2NR2L8D93MF8LIRA5J Sentiment analysis Sentiment analysis sentiment, text $0.02 Fri Nov 01 12:11:19 PDT 2019 3 BatchId:3821427;OriginalHitTemplateId:928390909; 10800 ... NaN NaN 487 0% (0/0) 0% (0/0) 0% (0/0) funny like a clown\nGreetings again from the d... Neutral NaN NaN
2 3VMV5CHJZ8F47P7CECH0H830NF4GTP 3N0K7CX2I27L2NR2L8D93MF8LIRA5J Sentiment analysis Sentiment analysis sentiment, text $0.02 Fri Nov 01 12:11:19 PDT 2019 3 BatchId:3821427;OriginalHitTemplateId:928390909; 10800 ... NaN NaN 1052 0% (0/0) 0% (0/0) 0% (0/0) funny like a clown\nGreetings again from the d... Positive NaN NaN

3 rows × 31 columns

In [7]:
neg.columns.tolist()
Out[7]:
['HITId',
 'HITTypeId',
 'Title',
 'Description',
 'Keywords',
 'Reward',
 'CreationTime',
 'MaxAssignments',
 'RequesterAnnotation',
 'AssignmentDurationInSeconds',
 'AutoApprovalDelayInSeconds',
 'Expiration',
 'NumberOfSimilarHITs',
 'LifetimeInSeconds',
 'AssignmentId',
 'WorkerId',
 'AssignmentStatus',
 'AcceptTime',
 'SubmitTime',
 'AutoApprovalTime',
 'ApprovalTime',
 'RejectionTime',
 'RequesterFeedback',
 'WorkTimeInSeconds',
 'LifetimeApprovalRate',
 'Last30DaysApprovalRate',
 'Last7DaysApprovalRate',
 'Input.text',
 'Answer.sentiment.label',
 'Approve',
 'Reject']

How many unique turkers worked on each dataframe?

In [31]:
def get_unique(df, column):
    unique = np.unique(df[column], return_counts=True)
    df = pd.DataFrame(zip(unique[0], unique[1]))
    return len(unique[0]), unique, df

num_neg, unique_neg, u_neg_df = get_unique(neg, 'WorkerId')    
num_pos, unique_pos, u_pos_df = get_unique(pos, 'WorkerId')

print(num_neg, 'Turkers worked on NEG batch')
print(num_pos, 'Turkers worked on POS batch')
53 Turkers worked on NEG batch
38 Turkers worked on POS batch
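
The same counts can also come straight from pandas without the `np.unique` round trip. This is only a minimal sketch on a made-up frame (the `W1`, `W2`, `W3` worker IDs are invented for illustration, not taken from the AMT files), using `Series.nunique` and `Series.value_counts`:

```python
import pandas as pd

# Toy stand-in for an AMT batch; these WorkerId values are invented.
toy = pd.DataFrame({'WorkerId': ['W1', 'W2', 'W1', 'W3', 'W1', 'W2']})

# Number of distinct workers in the batch.
num_workers = toy['WorkerId'].nunique()

# HITs per worker, sorted from most to least active.
hits_per_worker = toy['WorkerId'].value_counts()

print(num_workers, 'Turkers worked on the toy batch')
print(hits_per_worker)
```

`hits_per_worker.min()` and `hits_per_worker.max()` then give the min and max HITs per turker in one step.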

How many HITS did each unique turker do?

In [32]:
u_neg_df.plot(kind='bar',x=0,y=1)
Out[32]:
<matplotlib.axes._subplots.AxesSubplot at 0x11cdcb978>
In [33]:
u_pos_df.plot(kind='bar',x=0,y=1)
Out[33]:
<matplotlib.axes._subplots.AxesSubplot at 0x11c5d1748>

What are the min and max number of HITs completed by a single turker?

In [39]:
print('For {}, the min was: {} and the max was: {}'.format('neg', unique_neg[1].min(), unique_neg[1].max())) 
print('For {}, the min was: {} and the max was: {}'.format('pos', unique_pos[1].min(), unique_pos[1].max())) 
For neg, the min was: 1 and the max was: 37
For pos, the min was: 1 and the max was: 40

Did a specific sentiment take longer for turkers to assess?

In [20]:
import seaborn as sns
import matplotlib.pyplot as plt
sns.catplot(x="Answer.sentiment.label", 
            y="WorkTimeInSeconds", 
            kind="bar", 
            order=['Negative', 'Neutral', 'Positive'], 
            data=neg);
plt.title('Negative')
Out[20]:
Text(0.5, 1, 'Negative')
In [19]:
sns.catplot(x="Answer.sentiment.label", 
            y="WorkTimeInSeconds", 
            kind="bar", 
            order=['Negative', 'Neutral', 'Positive'], 
            data=pos)
plt.title('Positive')
Out[19]:
Text(0.5, 1, 'Positive')
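
The bars in a seaborn bar plot show the mean by default, and a handful of very long assignments can dominate a mean. A numeric summary with the median next to it may help when reading the plots. This is a sketch on invented numbers, not on the AMT data:

```python
import pandas as pd

# Invented work times; only the column names match the AMT export.
toy = pd.DataFrame({
    'Answer.sentiment.label': ['Negative', 'Negative', 'Neutral',
                               'Positive', 'Positive'],
    'WorkTimeInSeconds': [10, 30, 50, 5, 500],
})

# Count, mean and median work time for each answer label.
summary = (toy.groupby('Answer.sentiment.label')['WorkTimeInSeconds']
              .agg(['count', 'mean', 'median']))
print(summary)
```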

How many assignments had a response time under 10 seconds?

In [44]:
response_time = neg[neg['WorkTimeInSeconds'] < 10]
response_time_check = neg[neg['WorkTimeInSeconds'] > 10]
In [45]:
len(response_time)
Out[45]:
48
In [46]:
len(response_time_check)
Out[46]:
312
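
Two things are easy to miss with this kind of filter: rows at exactly 10 seconds fall into neither `< 10` nor `> 10`, and the row count is a count of assignments, not of turkers. A small sketch on invented data (the threshold name and worker IDs are assumptions for illustration) that handles both:

```python
import pandas as pd

# Invented assignments; W2 has a row at exactly 10 seconds.
toy = pd.DataFrame({
    'WorkerId': ['W1', 'W1', 'W2', 'W3', 'W3'],
    'WorkTimeInSeconds': [4, 9, 10, 120, 7],
})

THRESHOLD = 10  # seconds, the same cutoff used above

fast = toy[toy['WorkTimeInSeconds'] < THRESHOLD]
not_fast = toy[toy['WorkTimeInSeconds'] >= THRESHOLD]  # >= keeps the 10 s row

n_fast_assignments = len(fast)                 # rows under the cutoff
n_fast_workers = fast['WorkerId'].nunique()    # distinct turkers behind them

print(n_fast_assignments, 'fast assignments from', n_fast_workers, 'turkers')
```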

Checking for potential bots

Did anyone have a consistent average low response time?

In [74]:
count = pos.groupby(['WorkerId'])['HITId'].count()
work_time = pos.groupby(['WorkerId'])['WorkTimeInSeconds'].mean()
new_df = pd.DataFrame([work_time, count]).T
new_df[:5]
Out[74]:
WorkTimeInSeconds HITId
WorkerId
A13CLN8L5HFT46 7.230769 13.0
A18WFPSLFV4FKY 47.000000 2.0
A1IQV3QUWRA8G1 22.000000 1.0
A1N1ULK71RHVMM 10.000000 3.0
A1S2MN0E9BHPVA 173.444444 27.0

Did anyone have a consistent average high response time?

In [75]:
new_df['WorkTimeInMin'] = new_df['WorkTimeInSeconds']/60
new_df[:5]
Out[75]:
WorkTimeInSeconds HITId WorkTimeInMin
WorkerId
A13CLN8L5HFT46 7.230769 13.0 0.120513
A18WFPSLFV4FKY 47.000000 2.0 0.783333
A1IQV3QUWRA8G1 22.000000 1.0 0.366667
A1N1ULK71RHVMM 10.000000 3.0 0.166667
A1S2MN0E9BHPVA 173.444444 27.0 2.890741
In [86]:
count = pos.groupby(['WorkerId', 'Answer.sentiment.label'])['Answer.sentiment.label'].count()
# count = pos.groupby(['WorkerId'])['Answer.sentiment.label'].count()
count
Out[86]:
WorkerId        Answer.sentiment.label
A13CLN8L5HFT46  Neutral                    2
                Positive                  11
A18WFPSLFV4FKY  Positive                   2
A1IQV3QUWRA8G1  Positive                   1
A1N1ULK71RHVMM  Negative                   1
                                          ..
AMC42JMQA8A5U   Positive                   1
AO2WNSGOXAX52   Neutral                    3
                Positive                   1
AOMFEAWQHU3D8   Neutral                    1
                Positive                   6
Name: Answer.sentiment.label, Length: 74, dtype: int64

Did anyone answer ONLY pos/neg/neutral?

In [117]:
pnn = pd.DataFrame()
pnn['Neutral'] = pos.groupby('WorkerId')['Answer.sentiment.label'].apply(lambda x: (x=='Neutral').sum())
pnn['Positive'] = pos.groupby('WorkerId')['Answer.sentiment.label'].apply(lambda x: (x=='Positive').sum())
pnn['Negative'] = pos.groupby('WorkerId')['Answer.sentiment.label'].apply(lambda x: (x=='Negative').sum())
pnn['Total'] = pos.groupby('WorkerId')['Answer.sentiment.label'].apply(lambda x: x.count())
pnn[:5]
Out[117]:
Neutral Positive Negative Total
WorkerId
A13CLN8L5HFT46 2 11 0 13
A18WFPSLFV4FKY 0 2 0 2
A1IQV3QUWRA8G1 0 1 0 1
A1N1ULK71RHVMM 0 2 1 3
A1S2MN0E9BHPVA 2 21 4 27
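
A worker-by-label table like this can also be built in one call with `pd.crosstab`. The sketch below uses invented rows (only the column names come from the AMT export) and then pulls out the workers who used exactly one label:

```python
import pandas as pd

# Invented answers for three hypothetical workers.
toy = pd.DataFrame({
    'WorkerId': ['W1', 'W1', 'W1', 'W2', 'W2', 'W3'],
    'Answer.sentiment.label': ['Positive', 'Positive', 'Neutral',
                               'Positive', 'Negative', 'Positive'],
})

label_cols = ['Neutral', 'Positive', 'Negative']

# One row per worker, one column per label; missing labels become 0.
counts = pd.crosstab(toy['WorkerId'], toy['Answer.sentiment.label'])
counts = counts.reindex(columns=label_cols, fill_value=0)
counts['Total'] = counts[label_cols].sum(axis=1)

# Workers whose answers all fall under a single label.
single_label = counts[(counts[label_cols] > 0).sum(axis=1) == 1]
print(counts)
print(single_label)
```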

This is getting a little confusing, let's just look at our top performers

In [122]:
top = pnn.sort_values(by=['Total'], ascending=False)
In [123]:
top[:10]
Out[123]:
Neutral Positive Negative Total
WorkerId
A681XM15AN28F 13 20 7 40
A1Y66T7FKJ8PJA 5 23 7 35
A33ENZVC1XB4BA 0 34 0 34
A1S2MN0E9BHPVA 2 21 4 27
A37L5E8MHHQGZM 6 13 3 22
AE03LUY7RH400 4 10 7 21
A2G44A4ZPWRPXU 4 12 2 18
A1YK1IKACUJMV4 0 15 0 15
A3AW887GI0NLKF 3 10 2 15
A3HAEQW13YPT6A 0 14 0 14

Interesting!! Looking from here, we have three workers who ONLY chose positive.

Let's look at their response time to see if we can determine if they are a bot!!

In [130]:
top['Avg_WorkTimeInSeconds'] = pos.groupby('WorkerId')['WorkTimeInSeconds'].apply(lambda x: x.mean())
top['Avg_WorkTimeInMin'] = pos.groupby('WorkerId')['WorkTimeInSeconds'].apply(lambda x: x.mean()/60)
top['Min_WorkTimeInMin'] = pos.groupby('WorkerId')['WorkTimeInSeconds'].apply(lambda x: x.min()/60)
top['Max_WorkTimeInMin'] = pos.groupby('WorkerId')['WorkTimeInSeconds'].apply(lambda x: x.max()/60)
In [131]:
top[:10]
Out[131]:
Neutral Positive Negative Total Avg_WorkTimeInSeconds Avg_WorkTimeInMin Min_WorkTimeInMin Max_WorkTimeInMin
WorkerId
A681XM15AN28F 13 20 7 40 13.575000 0.226250 0.100000 0.833333
A1Y66T7FKJ8PJA 5 23 7 35 695.857143 11.597619 0.216667 22.000000
A33ENZVC1XB4BA 0 34 0 34 366.647059 6.110784 0.616667 9.916667
A1S2MN0E9BHPVA 2 21 4 27 173.444444 2.890741 0.400000 4.983333
A37L5E8MHHQGZM 6 13 3 22 346.272727 5.771212 2.150000 8.283333
AE03LUY7RH400 4 10 7 21 102.238095 1.703968 0.100000 3.433333
A2G44A4ZPWRPXU 4 12 2 18 221.277778 3.687963 0.383333 7.383333
A1YK1IKACUJMV4 0 15 0 15 593.600000 9.893333 1.716667 11.000000
A3AW887GI0NLKF 3 10 2 15 269.400000 4.490000 1.616667 7.216667
A3HAEQW13YPT6A 0 14 0 14 442.928571 7.382143 0.866667 11.100000

Even more interesting! These three don't appear to be bots, based on our current metric, which is time variability.

HOWEVER, worker A681XM15AN28F appears to only work for an average of 13 seconds per review which doesn't seem like enough time to read and judge a review...
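
The two signals used so far (average work time and whether a worker only ever picks one label) can be put side by side as flags. This is a rough sketch on invented data; the 30-second cutoff is a hypothetical value chosen for the example, not something derived from the batches:

```python
import pandas as pd

# Invented data: W1 is fast but varied, W2 is slow but always 'Positive'.
toy = pd.DataFrame({
    'WorkerId': ['W1'] * 4 + ['W2'] * 4,
    'WorkTimeInSeconds': [8, 12, 9, 11, 300, 45, 600, 120],
    'Answer.sentiment.label': ['Positive', 'Neutral', 'Positive', 'Negative',
                               'Positive', 'Positive', 'Positive', 'Positive'],
})

per_worker = toy.groupby('WorkerId').agg(
    total=('WorkTimeInSeconds', 'count'),
    avg_seconds=('WorkTimeInSeconds', 'mean'),
    max_seconds=('WorkTimeInSeconds', 'max'),
    n_labels=('Answer.sentiment.label', 'nunique'),
)

MIN_AVG_SECONDS = 30  # hypothetical cutoff for "too fast to have read it"
per_worker['too_fast'] = per_worker['avg_seconds'] < MIN_AVG_SECONDS
per_worker['single_label'] = per_worker['n_labels'] == 1
print(per_worker)
```

Neither flag proves anything on its own: a batch of only positive reviews makes an all-positive worker look suspicious when they may simply be right.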

PART 2: Second submission to AMT

TOO MANY REVIEWERS!

Here is when we realized that doing a kappa score with over 30 individual reviewers would be tricky. So we resubmitted to AMT and required the turkers to be 'Master', in the hopes that this additional barrier to entry would reduce the number of turkers working on the project.
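
For reference, one agreement statistic that does not need the same people to rate every item is Fleiss' kappa, which only assumes a fixed number of ratings per item. This notebook does not compute it; the sketch below is an assumption about how it could be done, with an invented two-review example and a hypothetical `fleiss_kappa` helper:

```python
import numpy as np
import pandas as pd

def fleiss_kappa(counts):
    """Fleiss' kappa for an items x categories table of rating counts.

    Assumes every row sums to the same number of ratings.
    """
    counts = np.asarray(counts, dtype=float)
    n_items = counts.shape[0]
    n_raters = counts[0].sum()
    # Share of all ratings that fell into each category.
    p_j = counts.sum(axis=0) / (n_items * n_raters)
    # Observed agreement within each item.
    P_i = ((counts ** 2).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
    P_bar = P_i.mean()
    # Agreement expected by chance.
    P_e = (p_j ** 2).sum()
    return (P_bar - P_e) / (1 - P_e)

# Invented ratings: three answers for each of two reviews.
toy = pd.DataFrame({
    'review': ['r1', 'r1', 'r1', 'r2', 'r2', 'r2'],
    'label': ['Negative', 'Negative', 'Neutral',
              'Neutral', 'Neutral', 'Neutral'],
})
table = pd.crosstab(toy['review'], toy['label'])
kappa = fleiss_kappa(table.to_numpy())
print(round(kappa, 3))
```

The statistic is undefined when every rating falls into a single category, because the chance agreement is then 1 and the denominator is 0.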

In [138]:
v2 = pd.read_csv('HW5_amt_v2.csv')
v2[:5]
len(v2)
Out[138]:
293

This time, I didn't separate the df into pos and neg before submitting to AMT, so we have to reimport the labels.

In [136]:
labels = pd.read_csv('all_JK_extremes_labeled.csv')
In [139]:
len(labels)
Out[139]:
98

Oops! That's right, we replicated each review * 3 so three separate people could look at each review

In [160]:
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the same three copies
labels2 = pd.concat([labels] * 3, ignore_index=True)
In [161]:
len(labels2)
Out[161]:
294
In [162]:
labels2.sort_values(by='0')
Out[162]:
0 PoN
76 #LetRottenTomatoesRotSquad\nI am a simple guy... P
174 #LetRottenTomatoesRotSquad\nI am a simple guy... P
272 #LetRottenTomatoesRotSquad\nI am a simple guy... P
116 A 'Triumph of the Will' for Nihilists\n'Joker... N
18 A 'Triumph of the Will' for Nihilists\n'Joker... N
... ... ...
227 lose of both time and money\nThis was one of ... N
31 lose of both time and money\nThis was one of ... N
207 poor plot\nPoor plot. i find no reason for jo... N
11 poor plot\nPoor plot. i find no reason for jo... N
109 poor plot\nPoor plot. i find no reason for jo... N

294 rows × 2 columns

Shoot! I realized I had to delete some emojis for the csv to be accepted by AMT, so the reviews themselves won't actually be matching... solution: Create two 'for-matching' columns made up of the first 5 words of each review

In [171]:
v2['for_matching'] = v2.apply(lambda x: x['Input.text'].split()[:5], axis=1)
In [172]:
labels2['for_matching'] = labels2.apply(lambda x: x['0'].split()[:5], axis=1)
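
A list-valued column like `for_matching` cannot be used as a merge key because lists are not hashable, so one option is to join the first five words back into a string. The sketch below uses invented reviews and a hypothetical `first_words` helper. It assumes the deleted characters never fall within the first five words and that those words are unique per review:

```python
import pandas as pd

def first_words(text, n=5):
    """Return the first n whitespace-separated words as a single string."""
    return ' '.join(text.split()[:n])

# Invented label file: one row per review, with a trailing ':)' that the
# uploaded copy lost.
labels_toy = pd.DataFrame({
    '0': ['Great film\nLoved every minute of it :)',
          'poor plot\nPoor plot. i find no reason'],
    'PoN': ['P', 'N'],
})
# Invented AMT answers: several rows per review.
answers_toy = pd.DataFrame({
    'Input.text': ['poor plot\nPoor plot. i find no reason',
                   'Great film\nLoved every minute of it',
                   'poor plot\nPoor plot. i find no reason'],
    'Answer.sentiment.label': ['Negative', 'Positive', 'Neutral'],
})

labels_toy['key'] = labels_toy['0'].apply(first_words)
answers_toy['key'] = answers_toy['Input.text'].apply(first_words)

# Many answers map to one label row; validate raises if keys repeat in labels.
merged = answers_toy.merge(labels_toy[['key', 'PoN']], on='key',
                           how='left', validate='many_to_one')
print(merged[['key', 'Answer.sentiment.label', 'PoN']])
```

With a merge, the label file does not need to be tripled, and any review without a match shows up as a missing `PoN` instead of silently shifting the rows.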

Annnnnd why did I do that when I could just sort the df and apply the PoN

In [176]:
sorted_labels = labels2.sort_values(by='0')
sorted_labels[:6]
Out[176]:
0 PoN for_matching
76 #LetRottenTomatoesRotSquad\nI am a simple guy... P [#LetRottenTomatoesRotSquad, I, am, a, simple]
174 #LetRottenTomatoesRotSquad\nI am a simple guy... P [#LetRottenTomatoesRotSquad, I, am, a, simple]
272 #LetRottenTomatoesRotSquad\nI am a simple guy... P [#LetRottenTomatoesRotSquad, I, am, a, simple]
116 A 'Triumph of the Will' for Nihilists\n'Joker... N [A, 'Triumph, of, the, Will']
18 A 'Triumph of the Will' for Nihilists\n'Joker... N [A, 'Triumph, of, the, Will']
214 A 'Triumph of the Will' for Nihilists\n'Joker... N [A, 'Triumph, of, the, Will']
In [226]:
sorted_v2 = v2.sort_values(by='Input.text')
sorted_v2[sorted_v2.columns[-5:]][:6]
Out[226]:
Input.text Answer.sentiment.label Approve Reject for_matching
229 #LetRottenTomatoesRotSquad\nI am a simple guy... Positive NaN NaN [#LetRottenTomatoesRotSquad, I, am, a, simple]
228 #LetRottenTomatoesRotSquad\nI am a simple guy... Positive NaN NaN [#LetRottenTomatoesRotSquad, I, am, a, simple]
227 #LetRottenTomatoesRotSquad\nI am a simple guy... Positive NaN NaN [#LetRottenTomatoesRotSquad, I, am, a, simple]
53 A 'Triumph of the Will' for Nihilists\n'Joker... Neutral NaN NaN [A, 'Triumph, of, the, Will']
55 A 'Triumph of the Will' for Nihilists\n'Joker... Negative NaN NaN [A, 'Triumph, of, the, Will']
54 A 'Triumph of the Will' for Nihilists\n'Joker... Negative NaN NaN [A, 'Triumph, of, the, Will']
In [192]:
all_df = sorted_v2.copy()
# all_df['PoN'] = sorted_labels['PoN'].tolist()
# THIS DIDN'T WORK BECAUSE I DIDN'T WAIT UNTIL ALL WERE DONE FROM AMT. RESEARCHER ERROR BUT OMG I HATE MYSELF
In [193]:
len(all_df)
Out[193]:
293
In [194]:
293/3
Out[194]:
97.66666666666667

Confirming that YEP. 293 isn't divisible by 3, meaning I didn't wait until the last turker finished. omg.
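
A more direct check than dividing by 3 is to count assignments per HIT and list the ones that are short. A sketch on invented IDs, assuming each review is one HIT with `MaxAssignments` of 3:

```python
import pandas as pd

# Invented download in which HIT 'H2' is still missing one assignment.
toy = pd.DataFrame({
    'HITId': ['H1', 'H1', 'H1', 'H2', 'H2', 'H3', 'H3', 'H3'],
    'WorkerId': ['W1', 'W2', 'W3', 'W1', 'W2', 'W2', 'W3', 'W4'],
})

EXPECTED = 3  # assignments requested per HIT

per_hit = toy.groupby('HITId').size()
incomplete = per_hit[per_hit < EXPECTED]
print(incomplete)
```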

Reuploading now -- WITH BETTER CODE AND BETTER VARIABLE NAMES!

In [224]:
turker = pd.read_csv('HW5_amt_294.csv')
print(len(turker))
turker[turker.columns[-5:]][:5]
294
Out[224]:
Last7DaysApprovalRate Input.text Answer.sentiment.label Approve Reject
0 0% (0/0) Everyone praised an overrated movie.\nOverrat... Negative NaN NaN
1 0% (0/0) Everyone praised an overrated movie.\nOverrat... Negative NaN NaN
2 0% (0/0) Everyone praised an overrated movie.\nOverrat... Negative NaN NaN
3 0% (0/0) What idiotic FIlm\nI can say that Phoenix is ... Negative NaN NaN
4 0% (0/0) What idiotic FIlm\nI can say that Phoenix is ... Negative NaN NaN
In [197]:
# Getting labels...
labels = pd.read_csv('all_JK_extremes_labeled.csv')
# X3
labels = pd.concat([labels] * 3, ignore_index=True)  # DataFrame.append was removed in pandas 2.0
print(len(labels))
labels[:5]
294
Out[197]:
0 PoN
0 Everyone praised an overrated movie.\nOverrat... N
1 What idiotic FIlm\nI can say that Phoenix is ... N
2 Terrible\nThe only thing good about this movi... N
3 Watch Taxi Driver instead\nThis is a poor att... N
4 I learned one thing.\nIt borrows a lot of ele... N

NOW, TO SORT!

In [198]:
sorted_labels = labels.sort_values(by=['0'])
sorted_turker = turker.sort_values(by=['Input.text'])
In [199]:
sorted_labels[:5]
Out[199]:
0 PoN
76 #LetRottenTomatoesRotSquad\nI am a simple guy... P
174 #LetRottenTomatoesRotSquad\nI am a simple guy... P
272 #LetRottenTomatoesRotSquad\nI am a simple guy... P
116 A 'Triumph of the Will' for Nihilists\n'Joker... N
18 A 'Triumph of the Will' for Nihilists\n'Joker... N
In [206]:
sorted_turker['Input.text'][:5]
Out[206]:
228     #LetRottenTomatoesRotSquad\nI am a simple guy...
229     #LetRottenTomatoesRotSquad\nI am a simple guy...
230     #LetRottenTomatoesRotSquad\nI am a simple guy...
56      A 'Triumph of the Will' for Nihilists\n'Joker...
55      A 'Triumph of the Will' for Nihilists\n'Joker...
Name: Input.text, dtype: object

OMG HOORAY HOORAY HOORAY!!

NOTE: FUN FACT!! I can type here and then hit the Esc key followed by M to turn this cell into markdown!!

In [223]:
# YUCK THIS IS SO AGGRAVATING!! The line below doesn't work because assignment still aligns on the index,
# so the P and N didn't match up
# sorted_turker['PoN'] = sorted_labels['PoN']
sorted_turker['PoN'] = sorted_labels['PoN'].tolist()
sorted_turker[sorted_turker.columns[-5:]][:5]
Out[223]:
Input.text Answer.sentiment.label Approve Reject PoN
228 #LetRottenTomatoesRotSquad\nI am a simple guy... Positive NaN NaN P
229 #LetRottenTomatoesRotSquad\nI am a simple guy... Positive NaN NaN P
230 #LetRottenTomatoesRotSquad\nI am a simple guy... Positive NaN NaN P
56 A 'Triumph of the Will' for Nihilists\n'Joker... Negative NaN NaN N
55 A 'Triumph of the Will' for Nihilists\n'Joker... Negative NaN NaN N

PART 3: ANALYZE

First, let's clean ALL the things

In [237]:
all_df = sorted_turker[['Input.text', 'WorkerId', 'Answer.sentiment.label', 'PoN']]
In [238]:
all_df[:5]
Out[238]:
Input.text WorkerId Answer.sentiment.label PoN
228 #LetRottenTomatoesRotSquad\nI am a simple guy... A681XM15AN28F Positive P
229 #LetRottenTomatoesRotSquad\nI am a simple guy... A2XFO0X6RCS98M Positive P
230 #LetRottenTomatoesRotSquad\nI am a simple guy... AURYD2FH3FUOQ Positive P
56 A 'Triumph of the Will' for Nihilists\n'Joker... A1T79J0XQXDDGC Negative N
55 A 'Triumph of the Will' for Nihilists\n'Joker... A2XFO0X6RCS98M Negative N
In [242]:
all_df_all = all_df.copy()
# NOTE: taking the first letter maps both 'Negative' and 'Neutral' to 'N'
all_df_all['APoN'] = all_df_all.apply(lambda x: x['Answer.sentiment.label'][0], axis=1)
In [243]:
all_df_all
Out[243]:
Input.text WorkerId Answer.sentiment.label PoN APoN
228 #LetRottenTomatoesRotSquad\nI am a simple guy... A681XM15AN28F Positive P P
229 #LetRottenTomatoesRotSquad\nI am a simple guy... A2XFO0X6RCS98M Positive P P
230 #LetRottenTomatoesRotSquad\nI am a simple guy... AURYD2FH3FUOQ Positive P P
56 A 'Triumph of the Will' for Nihilists\n'Joker... A1T79J0XQXDDGC Negative N N
55 A 'Triumph of the Will' for Nihilists\n'Joker... A2XFO0X6RCS98M Negative N N
... ... ... ... ... ...
265 Venice 76 review\nI have just watched the Joke... ARLGZWN6W91WD Positive N P
266 Venice 76 review\nI have just watched the Joke... A38DC3BG1ZCVZ2 Positive N P
93 lose of both time and money\nThis was one of t... A2XFO0X6RCS98M Negative N N
94 lose of both time and money\nThis was one of t... A3EZ0H07TSDAPW Negative N N
95 lose of both time and money\nThis was one of t... ASB8T0H7L99RF Negative N N

294 rows × 5 columns
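
Since 'Negative' and 'Neutral' share a first letter, an explicit mapping keeps the two apart. This is a sketch on three invented rows; the `'U'` code for neutral is an assumption made up for the example:

```python
import pandas as pd

# 'U' is a made-up code for Neutral so it cannot collide with 'N'.
LABEL_TO_CODE = {'Positive': 'P', 'Negative': 'N', 'Neutral': 'U'}

toy = pd.DataFrame({
    'Answer.sentiment.label': ['Positive', 'Negative', 'Neutral'],
    'PoN': ['P', 'N', 'N'],
})

# First-letter approach: Neutral collapses into 'N'.
toy['APoN_first_letter'] = toy['Answer.sentiment.label'].str[0]

# Explicit mapping: Neutral stays distinct.
toy['APoN'] = toy['Answer.sentiment.label'].map(LABEL_TO_CODE)
toy['agree'] = toy['PoN'] == toy['APoN']
print(toy)
```

With the mapping, a neutral answer on a negative review counts as a disagreement instead of a match.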

In [244]:
all_df_all['agree'] = all_df_all.apply(lambda x: x['PoN'] == x['APoN'], axis=1)
In [263]:
all_df_all[-10:]
Out[263]:
Input.text WorkerId Answer.sentiment.label PoN APoN agree
38 This is extremely bad...\nThis whole film make... A3EZ0H07TSDAPW Negative N N True
216 Took my 65 year old mother to see it.\nI saw t... A3EZ0H07TSDAPW Positive N P False
217 Took my 65 year old mother to see it.\nI saw t... A2XFO0X6RCS98M Positive N P False
218 Took my 65 year old mother to see it.\nI saw t... AKSJ3C5O3V9RB Positive N P False
264 Venice 76 review\nI have just watched the Joke... A3EZ0H07TSDAPW Positive N P False
265 Venice 76 review\nI have just watched the Joke... ARLGZWN6W91WD Positive N P False
266 Venice 76 review\nI have just watched the Joke... A38DC3BG1ZCVZ2 Positive N P False
93 lose of both time and money\nThis was one of t... A2XFO0X6RCS98M Negative N N True
94 lose of both time and money\nThis was one of t... A3EZ0H07TSDAPW Negative N N True
95 lose of both time and money\nThis was one of t... ASB8T0H7L99RF Negative N N True

Let's see how many agree!

In [361]:
gdf = pd.DataFrame(all_df_all.groupby(['Input.text','PoN'])['agree'].mean())
gdf_forplot = gdf.copy()
gdf
Out[361]:
agree
Input.text PoN
#LetRottenTomatoesRotSquad\nI am a simple guy. I watch films, I enjoy life's simple pleasures.One thing I despise doing in life (because it feels pointless) is being far too judgemental or complex as I watch a film that is portraying a message or story in its own way.Will I compare Rambo to Ace Ventura? No. Will I compare Saving Private Ryan to Endgame? No. I watch each film individually and if it completes its own mission of what it wanted to deliver, then I'm a happy bunny. I will not let life's current politics, or the sensitivity of the pathetic world's expectations to override a successful STORY. For me, a story is the most important part of a film personally.With that being said... JOKER is a successful STORY. In fact, I will call it a masterpiece because of how it delivers the realistic, sympathetic character build ups of one of the most popular villains in the history of time and space.And one of the main reasons I became attached and cared for the character was because of PHOENIX. What a fantastic actor. He nailed it. That is all I need to say. I absolutely loved the Dark Knight Trilogy and Heath Ledger's Joker... But where Ledger successfully gave us a realistic Joker for the first time, PHOENIX has now taken that championship with respect. PHOENIX has now given us the same in his own way, with a deeper and more engrossing background. Additionally, with an outstanding Joker performance (the dancing and laughing)... Absolutely spot on.The mystery of the Joker is still retained in this movie... And even though its an "Origins" story... I still feel like The Joker is a complete mystery.PHOENIX is in almost every scene, and its because of him and his execution and commitment to the role which makes this movie a standalone masterpiece. And it brings the respect back to The Joker character after that terrible representation of... What's his name again?If you want a super hero movie, this movie is not that movie. 
Joker's mission as a movie is to take the audience on a crazy trip inside the mind of the Joker... Its not about cinematics... Or explosions... Or gadgets... Or casting... It's about "Why?"And now I see "Why?"So now ask yourself... "Why so serious?" P 1.000000
A 'Triumph of the Will' for Nihilists\n'Joker' is a sick, disgusting, dark, evil, violent, total piece of scheisse. It will seriously affect emotionally troubled young people already pulled at by suicide, addictive behaviors, and hopelessness. It's like a 'Triumph of the Will' for nihilists. I would defer from voting for Joaquin Phoenix for best actor until i see video of his normal non-acting self...it may be he wasn't acting in 'Joker' at all and is one majorly screwed up dude. And, what a weird alien body. He's naked from the waist up a lot in 'Joker' and, from the waist up, is, let's say, disquieting. All taut skin stretched tightly over an alien bone structure. The bones stick up and out at odd angles, in odd places, like the exoskeleton of a medical experiment gone wrong. Not a single pleasant moment in the movie. Not a moment of joy, nor beauty, nor happiness. No catharsis from all the misery and violence, just a stomach churning emptiness at the thought that the last fifty years have come down to this, and this is a circle of despair. A movie without hope, without humanity, showing a world that no sane person would live in. N 1.000000
A Breath of Fresh Cinema\nBursting with emotion. This film has won me over and dug me out of a ditch of cheap CGI fan fare.Absolutely outstanding. Really well filmed. It's pace, tone and character impressions really pay off. Most of the time you like your sat on the edge of Joaquin's feelings.Joker is a classic. And a cinema masterpiece. It was beautifully filmed. P 1.000000
A MASTERPIECE\nJoaquin Phoenix's performance is different. He is over acting some times. Phoenix is not being true to the character. He is what he is because what he experienced. The childhood trauma and the time he spend in the asylum. If He is out in the society because he is fit to do so. The seven medication doesn't seem to have a bit of effect on him.After the metro incident he is breaking a chain to free himself from what the society need him to be. There is no change in his ways. The character didn't evolve. He is same at the beginning and at the end.Seventy percent of the film's screen time is dedicated to Arthur Fleck. All the other characters are ill written, only to support him. They all lack a soul. Even Robert deniero disappoints us. He lost his aura somewhere in the nineties.\nThe film is spoon feeding us. The viewers are not idiots we know what realism is. The scene where he realizes the character Sophie is his figment of imagination is an utter failure. Apology to Andrei tarskovski. And the scene he is showing fatherly affection towards Murray is like a worst of the worst psycho moment.\nAfter the incident of metro killings, the plot loses its grip. I clearly couldn't define what happening and why. The script hides in the smoke of violence. That is too disturbing to watch. And I forget to mention the letter. A Pathetic turn of the film. The film 'saw' is better than this.\nI hate the communist tone used in the film. By showing us a glimpse of modern times, Todd Phillips making a point. The joker is saying he is not political. It is kind of a doublespeak. In fact the film is a doublespeak. Creating chaos without the intention of creating one. The poor class killing the rich for the societies inequality.What I loved in the Christopher Nolan's joker is his philosophy. He is an agent of chaos. He like to watch the world burn. While the Batman is an agent of order. He may be a clown but he is deep inside. 
The Nolan's joker can make an elaborate plan and execute it efficiently. But Philp's joker is mere a delusional shallow clown. I ask myself, Why the movie is called 'Joker'. Remove the names of Wayne family and give them another names. Replace Gotham by Manhattan. The result will not be a DC joker instead it will give us a psycho clown movie. Todd Philips used an icon for his own gain.A predictable ending. Giving the society a false message.for what? The film is not worth it. Yes, it is a masterpiece. Whisper it a trillion times, may be it will became one. N 0.333333
A brilliant movie\nThis movie is slow but never makes you feel bored. The story is narrated in a wonderful manner. This movie sets a good example for others comic based movies that how can a negative character can be portrayed in such a manner that you though know joker in not a good man but will be feeling bad for him at some point.\nThe acting done by Joaquin Phoenix who played the role of joker was amazing. He was flawless and did proper justice with the character of joker which very few actors were able to do. Joaquin Phoenix is a brilliant method actor.\nThis is a movie made to see in movie theatre not at tv or any streaming app. P 1.000000
... ... ...
The mirror of society\nActing 10/10\nActors 10/10\nSound 8/10\nSoundtrack 9/10\nDolby Atmos 6/10\nDrama 9/10\nAction 3/10\nSuspense 9/10\nStory 10/10Conclusion:\nWhen you ever want to know why exist joker and batman you must see this film.Jaquin phoenix play his role very intense he play the feelings of the character authentic i never think about that the character is not real i look outside and i can find one normal person who can have the problems from the playing character. Joker is more than a dc story joker is the mirror of the society. N 0.000000
This is extremely bad...\nThis whole film makes no sense, it's, apperently, completely overhyped.\nI went to the theatre without any expectations, prejudice, etc. but man, what a bunch of...\nThis is my first review on IMDB, I had to join, seeing this film was rated a 9.0 N 1.000000
Took my 65 year old mother to see it.\nI saw the movie after the opening weekend and loved it. I think it's a masterpiece. Convinced my 65 year old mother too see it, who couldn't even remember whose nemesis the Joker is. She had read about all the negative press about the movie being too violent and taking light of mental health issues.She loved the movie. Sympathized for Arthur and was rooting for the Joker (to a certain point) She was surprised about the small amount of violence is depicted after all the press putdown and enjoyed the "social commentary" the movie conveys.Her score: 9 / 10Mom is always right. N 0.000000
Venice 76 review\nI have just watched the Joker in Venice and I will say if Joaquin doesn't get an Oscar this year then something is wrong with this world. This perfomance is just jaw-dropping, it glues you to the screen and doesn't let go till the end. Story is very good and has some interesting connections with Batman lore(especially one you can't guess from trailers). There are some scenes that are so tense ,well-acted and imaginative that push this movie to 10. Cinematography and direction are great, Todd has proven himself as a director. Robert is also good in his "small" role. Will definitely see it again as soon as it out in October. N 0.000000
lose of both time and money\nThis was one of the worse movies I have seen. It was over acted by all actors, the story dragged on and on. I saw this movie for a bargain matinee and thought I was ripped off! The only good part was the comfy chairs and the half priced snacks!\nPhoenix was so over the top and not in a good way. PHOENIX STOP DANCING!!! Deniro played another version of all the parts he has played in the past few years!\n2 stars for the chair and 1 star for the snacks, -1 stars for wasting my day! N 1.000000

98 rows × 1 columns

OK so this actually gave us something we want... BUT PLEASE TELL ME THE BETTER WAY!!
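
One possibly tidier way is to group on a short ID instead of the full review text, and to keep an integer count of matching workers next to the mean. The sketch below uses invented rows and assumes each review corresponds to exactly one `HITId`:

```python
import pandas as pd

# Invented answers: three workers per HIT.
toy = pd.DataFrame({
    'HITId': ['H1'] * 3 + ['H2'] * 3 + ['H3'] * 3,
    'PoN':   ['P'] * 3 + ['N'] * 3 + ['N'] * 3,
    'APoN':  ['P', 'P', 'P', 'P', 'P', 'P', 'N', 'P', 'N'],
})

# Column-wise comparison, no row-by-row apply needed.
toy['agree'] = toy['PoN'] == toy['APoN']

per_review = toy.groupby('HITId').agg(
    PoN=('PoN', 'first'),
    agree=('agree', 'mean'),
    n_correct=('agree', 'sum'),
)
print(per_review)
```

Filtering on `n_correct == 3` or `n_correct == 0` avoids comparing floats, and the index stays short enough to read.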

In [362]:
three_agreed = gdf[gdf['agree'] == 1]
len(three_agreed)
Out[362]:
33
In [363]:
three_agreed_but_wrong = gdf[gdf['agree'] == 0]
len(three_agreed_but_wrong)
Out[363]:
31
In [364]:
disparity = gdf[(gdf['agree'] > 0) & (gdf['agree'] < 1)]
len(disparity)
Out[364]:
34
In [365]:
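# Use the computed lengths instead of retyping the numbers by hand
quickdf = pd.DataFrame({'labels': ['agreed', 'agreed, incorrect', 'disparity'], 'counts': [len(three_agreed), len(three_agreed_but_wrong), len(disparity)]})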
quickdf
Out[365]:
labels counts
0 agreed 33
1 agreed, incorrect 31
2 disparity 34
In [366]:
quickdf.plot(kind='bar', x='labels', y='counts')
Out[366]:
<matplotlib.axes._subplots.AxesSubplot at 0x1203e7198>

Lol that is not super useful

In [367]:
# # three_agreed = gdf[(gdf['agree'] == 1) & (gdf['PoN'] == 'P')]
# # len(three_agreed)
# # gdf['PoN']
# gdf = gdf.reset_index()
# p_three_agreed = gdf[(gdf['agree'] == 1) & (gdf['PoN'] == 'P')]
# len(p_three_agreed)
In [368]:
# gdf = gdf.reset_index(drop=True)
# n_three_agreed = gdf[(gdf['agree'] == 1) & (gdf['PoN'] == 'N')]
# len(n_three_agreed)
In [369]:
# gdf = gdf.reset_index(drop=True)
# p_three_agreed_wrong = gdf[(gdf['agree'] == 0) & (gdf['PoN'] == 'P')]
# len(p_three_agreed_wrong)
In [370]:
# gdf = gdf.reset_index(drop=True)
# n_three_agreed_wrong = gdf[(gdf['agree'] == 0) & (gdf['PoN'] == 'N')]
# len(n_three_agreed_wrong)
In [371]:
# gdf = gdf.reset_index(drop=True)
# p_disparity = gdf[((gdf['agree'] / 1) != 0 ) & (gdf['PoN'] == 'P')]
# len(p_disparity)
In [372]:
# gdf = gdf.reset_index(drop=True)
# n_disparity = gdf[(gdf['agree'] / 1 != 0 ) & (gdf['PoN'] == 'N')]
# len(n_disparity)
In [373]:
# quickdf = pd.DataFrame({'labels': ['positive', 'negative'], 'counts': [18,15]})
# quickdf.plot(kind='bar', x='labels', y='counts')
In [374]:
# sns.catplot(x="Answer.sentiment.label", 
#             y="WorkTimeInSeconds", 
#             kind="bar", 
#             order=['Negative', 'Neutral', 'Positive'], 
#             data=gdp_forplot);
# plt.title('By Polarity')
gdf_forplot = gdf_forplot.reset_index()
In [380]:
gdf_forplot.groupby(['agree'])['agree'].count()
Out[380]:
agree
0.000000    31
0.333333    17
0.666667    17
1.000000    33
Name: agree, dtype: int64
In [382]:
gdf_forplot.groupby(['agree','PoN'])['agree'].count()
Out[382]:
agree     PoN
0.000000  N      14
          P      17
0.333333  N      10
          P       7
0.666667  N       9
          P       8
1.000000  N      15
          P      18
Name: agree, dtype: int64
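
To turn these counts into a single number per polarity, one option is a majority vote: a review counts as correct when at least two of the three workers matched the label. The sketch below retypes the counts from the output above into a small table; the row index (number of matching workers, 0 to 3) is my relabeling of the 0, 0.33, 0.67 and 1 agreement levels:

```python
import pandas as pd

# Reviews per (number of matching workers, PoN), copied from the output above.
counts = pd.DataFrame(
    {'N': [14, 10, 9, 15], 'P': [17, 7, 8, 18]},
    index=pd.Index([0, 1, 2, 3], name='n_workers_matching'),
)

totals = counts.sum()                        # reviews per polarity
majority_correct = counts.loc[[2, 3]].sum()  # at least 2 of 3 workers matched
majority_rate = majority_correct / totals

print(totals)
print(majority_rate)
```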