VECTORIZATION (Pandas style!)

STEP 1: Import ALL the things

Import libraries

In [153]:
##########################################
# NOTE: I'm toying with the idea of requiring the library just above 
# when I use it so it makes more sense in context
##########################################
# import os
# import pandas as pd
# from nltk.tokenize import word_tokenize, sent_tokenize
# from nltk.sentiment import SentimentAnalyzer
# from nltk.sentiment.util import *
# from nltk.probability import FreqDist
# from nltk.sentiment.vader import SentimentIntensityAnalyzer
# sid = SentimentIntensityAnalyzer()

Import data from files

In [154]:
import os
def get_data_from_files(path):
    directory = os.listdir(path)
    results = []
    for file in directory:
        f=open(path+file)
        results.append(f.read())
        f.close()
    return results

neg = get_data_from_files('../neg_cornell/')
pos = get_data_from_files('../pos_cornell/')

STEP 2a: Turn that fresh text into a pandas DF

In [155]:
import pandas as pd
neg_df = pd.DataFrame(neg)
pos_df = pd.DataFrame(pos)

STEP 2b: Label it

In [156]:
pos_df['PoN'] = 'P'
neg_df['PoN'] = 'N'

STEP 2c: Combine the dfs

In [157]:
all_df = neg_df.append(pos_df)
In [158]:
all_df
Out[158]:
0 PoN
0 bad . bad . \nbad . \nthat one word seems to p... N
1 isn't it the ultimate sign of a movie's cinema... N
2 " gordy " is not a movie , it is a 90-minute-... N
3 disconnect the phone line . \ndon't accept the... N
4 when robert forster found himself famous again... N
... ... ...
995 one of the funniest carry on movies and the th... P
996 i remember making a pact , right after `patch ... P
997 barely scrapping by playing at a nyc piano bar... P
998 if the current trends of hollywood filmmaking ... P
999 capsule : the director of cure brings a weird ... P

2000 rows × 2 columns

STEP 3: TOKENIZE (and clean)!!

In [159]:
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.sentiment import SentimentAnalyzer
from nltk.sentiment.util import *
In [160]:
## Came back and added sentences for tokinization for "Summary experiment"
def get_sentence_tokens(review):
    return sent_tokenize(review)
    
all_df['sentences'] = all_df.apply(lambda x: get_sentence_tokens(x[0]), axis=1)
all_df['num_sentences'] = all_df.apply(lambda x: len(x['sentences']), axis=1)
In [161]:
def get_tokens(sentence):
    tokens = word_tokenize(sentence)
    clean_tokens = [word.lower() for word in tokens if word.isalpha()]
    return clean_tokens

all_df['tokens'] = all_df.apply(lambda x: get_tokens(x[0]), axis=1)
all_df['num_tokens'] = all_df.apply(lambda x: len(x['tokens']), axis=1)
In [162]:
all_df
Out[162]:
0 PoN sentences num_sentences tokens num_tokens
0 bad . bad . \nbad . \nthat one word seems to p... N [bad ., bad ., bad ., that one word seems to p... 67 [bad, bad, bad, that, one, word, seems, to, pr... 1071
1 isn't it the ultimate sign of a movie's cinema... N [isn't it the ultimate sign of a movie's cinem... 32 [is, it, the, ultimate, sign, of, a, movie, ci... 553
2 " gordy " is not a movie , it is a 90-minute-... N [ " gordy " is not a movie , it is a 90-minute... 23 [gordy, is, not, a, movie, it, is, a, sesame, ... 478
3 disconnect the phone line . \ndon't accept the... N [disconnect the phone line ., don't accept the... 37 [disconnect, the, phone, line, do, accept, the... 604
4 when robert forster found himself famous again... N [when robert forster found himself famous agai... 29 [when, robert, forster, found, himself, famous... 386
... ... ... ... ... ... ...
995 one of the funniest carry on movies and the th... P [one of the funniest carry on movies and the t... 25 [one, of, the, funniest, carry, on, movies, an... 434
996 i remember making a pact , right after `patch ... P [i remember making a pact , right after `patch... 40 [i, remember, making, a, pact, right, after, p... 652
997 barely scrapping by playing at a nyc piano bar... P [barely scrapping by playing at a nyc piano ba... 23 [barely, scrapping, by, playing, at, a, nyc, p... 345
998 if the current trends of hollywood filmmaking ... P [if the current trends of hollywood filmmaking... 34 [if, the, current, trends, of, hollywood, film... 730
999 capsule : the director of cure brings a weird ... P [capsule : the director of cure brings a weird... 45 [capsule, the, director, of, cure, brings, a, ... 641

2000 rows × 6 columns

STEP 4: Remove Stopwords

In [163]:
from nltk.corpus import stopwords
stop_words = set(stopwords.words("english"))
def remove_stopwords(sentence):
    filtered_text = []
    for word in sentence:
        if word not in stop_words:
            filtered_text.append(word)
    return filtered_text
all_df['no_sw'] = all_df.apply(lambda x: remove_stopwords(x['tokens']),axis=1)
all_df['num_no_sw'] = all_df.apply(lambda x: len(x['no_sw']),axis=1)
In [164]:
all_df
Out[164]:
0 PoN sentences num_sentences tokens num_tokens no_sw num_no_sw
0 bad . bad . \nbad . \nthat one word seems to p... N [bad ., bad ., bad ., that one word seems to p... 67 [bad, bad, bad, that, one, word, seems, to, pr... 1071 [bad, bad, bad, one, word, seems, pretty, much... 515
1 isn't it the ultimate sign of a movie's cinema... N [isn't it the ultimate sign of a movie's cinem... 32 [is, it, the, ultimate, sign, of, a, movie, ci... 553 [ultimate, sign, movie, cinematic, ineptitude,... 297
2 " gordy " is not a movie , it is a 90-minute-... N [ " gordy " is not a movie , it is a 90-minute... 23 [gordy, is, not, a, movie, it, is, a, sesame, ... 478 [gordy, movie, sesame, street, skit, bad, one,... 239
3 disconnect the phone line . \ndon't accept the... N [disconnect the phone line ., don't accept the... 37 [disconnect, the, phone, line, do, accept, the... 604 [disconnect, phone, line, accept, charges, any... 323
4 when robert forster found himself famous again... N [when robert forster found himself famous agai... 29 [when, robert, forster, found, himself, famous... 386 [robert, forster, found, famous, appearing, ja... 185
... ... ... ... ... ... ... ... ...
995 one of the funniest carry on movies and the th... P [one of the funniest carry on movies and the t... 25 [one, of, the, funniest, carry, on, movies, an... 434 [one, funniest, carry, movies, third, medical,... 241
996 i remember making a pact , right after `patch ... P [i remember making a pact , right after `patch... 40 [i, remember, making, a, pact, right, after, p... 652 [remember, making, pact, right, patch, adams, ... 361
997 barely scrapping by playing at a nyc piano bar... P [barely scrapping by playing at a nyc piano ba... 23 [barely, scrapping, by, playing, at, a, nyc, p... 345 [barely, scrapping, playing, nyc, piano, bar, ... 177
998 if the current trends of hollywood filmmaking ... P [if the current trends of hollywood filmmaking... 34 [if, the, current, trends, of, hollywood, film... 730 [current, trends, hollywood, filmmaking, conti... 428
999 capsule : the director of cure brings a weird ... P [capsule : the director of cure brings a weird... 45 [capsule, the, director, of, cure, brings, a, ... 641 [capsule, director, cure, brings, weird, compl... 340

2000 rows × 8 columns

STEP 5: Create a Frequency Distribution

In [165]:
from nltk.probability import FreqDist
def get_most_common(tokens):
    fdist = FreqDist(tokens)
    return fdist.most_common(12)
all_df['topwords_unfil'] = all_df.apply(lambda x: get_most_common(x['tokens']),axis=1)
In [166]:
def get_most_common(tokens):
    fdist = FreqDist(tokens)
    return fdist.most_common(12)
all_df['topwords_fil'] = all_df.apply(lambda x: get_most_common(x['no_sw']),axis=1)

all_df
Out[166]:
0 PoN sentences num_sentences tokens num_tokens no_sw num_no_sw topwords_unfil topwords_fil
0 bad . bad . \nbad . \nthat one word seems to p... N [bad ., bad ., bad ., that one word seems to p... 67 [bad, bad, bad, that, one, word, seems, to, pr... 1071 [bad, bad, bad, one, word, seems, pretty, much... 515 [(the, 60), (a, 35), (to, 34), (of, 24), (this... [(movie, 17), (bad, 8), (one, 7), (meyer, 6), ...
1 isn't it the ultimate sign of a movie's cinema... N [isn't it the ultimate sign of a movie's cinem... 32 [is, it, the, ultimate, sign, of, a, movie, ci... 553 [ultimate, sign, movie, cinematic, ineptitude,... 297 [(the, 28), (a, 18), (of, 16), (to, 14), (i, 1... [(movie, 7), (one, 6), (first, 5), (much, 4), ...
2 " gordy " is not a movie , it is a 90-minute-... N [ " gordy " is not a movie , it is a 90-minute... 23 [gordy, is, not, a, movie, it, is, a, sesame, ... 478 [gordy, movie, sesame, street, skit, bad, one,... 239 [(the, 25), (and, 21), (to, 18), (is, 17), (a,... [(gordy, 8), (movie, 5), (one, 4), (stupid, 4)...
3 disconnect the phone line . \ndon't accept the... N [disconnect the phone line ., don't accept the... 37 [disconnect, the, phone, line, do, accept, the... 604 [disconnect, phone, line, accept, charges, any... 323 [(the, 41), (of, 17), (a, 17), (to, 16), (and,... [(hanging, 9), (sisters, 5), (ryan, 4), (time,...
4 when robert forster found himself famous again... N [when robert forster found himself famous agai... 29 [when, robert, forster, found, himself, famous... 386 [robert, forster, found, famous, appearing, ja... 185 [(the, 21), (it, 11), (i, 10), (to, 10), (of, ... [(film, 5), (movie, 5), (american, 4), (perfek...
... ... ... ... ... ... ... ... ... ... ...
995 one of the funniest carry on movies and the th... P [one of the funniest carry on movies and the t... 25 [one, of, the, funniest, carry, on, movies, an... 434 [one, funniest, carry, movies, third, medical,... 241 [(the, 26), (and, 21), (of, 11), (a, 10), (is,... [(nookey, 9), (hawtrey, 5), (carry, 4), (dr, 4...
996 i remember making a pact , right after `patch ... P [i remember making a pact , right after `patch... 40 [i, remember, making, a, pact, right, after, p... 652 [remember, making, pact, right, patch, adams, ... 361 [(the, 44), (of, 29), (and, 19), (a, 15), (it,... [(music, 8), (heart, 7), (craven, 6), (movie, ...
997 barely scrapping by playing at a nyc piano bar... P [barely scrapping by playing at a nyc piano ba... 23 [barely, scrapping, by, playing, at, a, nyc, p... 345 [barely, scrapping, playing, nyc, piano, bar, ... 177 [(a, 23), (is, 16), (the, 13), (and, 10), (of,... [(like, 4), (hutton, 3), (old, 3), (high, 2), ...
998 if the current trends of hollywood filmmaking ... P [if the current trends of hollywood filmmaking... 34 [if, the, current, trends, of, hollywood, film... 730 [current, trends, hollywood, filmmaking, conti... 428 [(the, 49), (of, 31), (and, 19), (in, 18), (to... [(one, 7), (like, 5), (l, 5), (hollywood, 4), ...
999 capsule : the director of cure brings a weird ... P [capsule : the director of cure brings a weird... 45 [capsule, the, director, of, cure, brings, a, ... 641 [capsule, director, cure, brings, weird, compl... 340 [(the, 33), (to, 28), (and, 21), (a, 18), (of,... [(computer, 11), (kurosawa, 8), (one, 5), (see...

2000 rows × 10 columns

In [167]:
def get_fdist(tokens):
    return (FreqDist(tokens))
    
all_df['freq_dist'] = all_df.apply(lambda x: get_fdist(x['no_sw']),axis=1)

STEP 6: Try Different Sentiment Analysis Tools

VADER

In [168]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer
sid = SentimentIntensityAnalyzer()
def get_vader_score(review):
    return sid.polarity_scores(review)

all_df['vader_all'] = all_df.apply(lambda x: get_vader_score(x[0]),axis=1)
In [169]:
def separate_vader_score(vader_score, key):
    return vader_score[key]

all_df['v_compound'] = all_df.apply(lambda x: separate_vader_score(x['vader_all'], 'compound'),axis=1)
all_df['v_neg'] = all_df.apply(lambda x: separate_vader_score(x['vader_all'], 'neg'),axis=1)
all_df['v_neu'] = all_df.apply(lambda x: separate_vader_score(x['vader_all'], 'neu'),axis=1)
all_df['v_pos'] = all_df.apply(lambda x: separate_vader_score(x['vader_all'], 'pos'),axis=1)
In [170]:
all_df
Out[170]:
0 PoN sentences num_sentences tokens num_tokens no_sw num_no_sw topwords_unfil topwords_fil freq_dist vader_all v_compound v_neg v_neu v_pos
0 bad . bad . \nbad . \nthat one word seems to p... N [bad ., bad ., bad ., that one word seems to p... 67 [bad, bad, bad, that, one, word, seems, to, pr... 1071 [bad, bad, bad, one, word, seems, pretty, much... 515 [(the, 60), (a, 35), (to, 34), (of, 24), (this... [(movie, 17), (bad, 8), (one, 7), (meyer, 6), ... {'bad': 8, 'one': 7, 'word': 1, 'seems': 1, 'p... {'neg': 0.134, 'neu': 0.715, 'pos': 0.151, 'co... 0.9695 0.134 0.715 0.151
1 isn't it the ultimate sign of a movie's cinema... N [isn't it the ultimate sign of a movie's cinem... 32 [is, it, the, ultimate, sign, of, a, movie, ci... 553 [ultimate, sign, movie, cinematic, ineptitude,... 297 [(the, 28), (a, 18), (of, 16), (to, 14), (i, 1... [(movie, 7), (one, 6), (first, 5), (much, 4), ... {'ultimate': 1, 'sign': 1, 'movie': 7, 'cinema... {'neg': 0.135, 'neu': 0.729, 'pos': 0.136, 'co... 0.1722 0.135 0.729 0.136
2 " gordy " is not a movie , it is a 90-minute-... N [ " gordy " is not a movie , it is a 90-minute... 23 [gordy, is, not, a, movie, it, is, a, sesame, ... 478 [gordy, movie, sesame, street, skit, bad, one,... 239 [(the, 25), (and, 21), (to, 18), (is, 17), (a,... [(gordy, 8), (movie, 5), (one, 4), (stupid, 4)... {'gordy': 8, 'movie': 5, 'sesame': 1, 'street'... {'neg': 0.185, 'neu': 0.74, 'pos': 0.075, 'com... -0.9970 0.185 0.740 0.075
3 disconnect the phone line . \ndon't accept the... N [disconnect the phone line ., don't accept the... 37 [disconnect, the, phone, line, do, accept, the... 604 [disconnect, phone, line, accept, charges, any... 323 [(the, 41), (of, 17), (a, 17), (to, 16), (and,... [(hanging, 9), (sisters, 5), (ryan, 4), (time,... {'disconnect': 1, 'phone': 2, 'line': 1, 'acce... {'neg': 0.101, 'neu': 0.744, 'pos': 0.155, 'co... 0.9861 0.101 0.744 0.155
4 when robert forster found himself famous again... N [when robert forster found himself famous agai... 29 [when, robert, forster, found, himself, famous... 386 [robert, forster, found, famous, appearing, ja... 185 [(the, 21), (it, 11), (i, 10), (to, 10), (of, ... [(film, 5), (movie, 5), (american, 4), (perfek... {'robert': 2, 'forster': 3, 'found': 1, 'famou... {'neg': 0.073, 'neu': 0.843, 'pos': 0.083, 'co... 0.7445 0.073 0.843 0.083
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
995 one of the funniest carry on movies and the th... P [one of the funniest carry on movies and the t... 25 [one, of, the, funniest, carry, on, movies, an... 434 [one, funniest, carry, movies, third, medical,... 241 [(the, 26), (and, 21), (of, 11), (a, 10), (is,... [(nookey, 9), (hawtrey, 5), (carry, 4), (dr, 4... {'one': 1, 'funniest': 1, 'carry': 4, 'movies'... {'neg': 0.056, 'neu': 0.799, 'pos': 0.145, 'co... 0.9913 0.056 0.799 0.145
996 i remember making a pact , right after `patch ... P [i remember making a pact , right after `patch... 40 [i, remember, making, a, pact, right, after, p... 652 [remember, making, pact, right, patch, adams, ... 361 [(the, 44), (of, 29), (and, 19), (a, 15), (it,... [(music, 8), (heart, 7), (craven, 6), (movie, ... {'remember': 1, 'making': 1, 'pact': 1, 'right... {'neg': 0.072, 'neu': 0.734, 'pos': 0.195, 'co... 0.9985 0.072 0.734 0.195
997 barely scrapping by playing at a nyc piano bar... P [barely scrapping by playing at a nyc piano ba... 23 [barely, scrapping, by, playing, at, a, nyc, p... 345 [barely, scrapping, playing, nyc, piano, bar, ... 177 [(a, 23), (is, 16), (the, 13), (and, 10), (of,... [(like, 4), (hutton, 3), (old, 3), (high, 2), ... {'barely': 1, 'scrapping': 1, 'playing': 1, 'n... {'neg': 0.069, 'neu': 0.719, 'pos': 0.212, 'co... 0.9964 0.069 0.719 0.212
998 if the current trends of hollywood filmmaking ... P [if the current trends of hollywood filmmaking... 34 [if, the, current, trends, of, hollywood, film... 730 [current, trends, hollywood, filmmaking, conti... 428 [(the, 49), (of, 31), (and, 19), (in, 18), (to... [(one, 7), (like, 5), (l, 5), (hollywood, 4), ... {'current': 1, 'trends': 1, 'hollywood': 4, 'f... {'neg': 0.095, 'neu': 0.723, 'pos': 0.182, 'co... 0.9975 0.095 0.723 0.182
999 capsule : the director of cure brings a weird ... P [capsule : the director of cure brings a weird... 45 [capsule, the, director, of, cure, brings, a, ... 641 [capsule, director, cure, brings, weird, compl... 340 [(the, 33), (to, 28), (and, 21), (a, 18), (of,... [(computer, 11), (kurosawa, 8), (one, 5), (see... {'capsule': 1, 'director': 1, 'cure': 3, 'brin... {'neg': 0.134, 'neu': 0.798, 'pos': 0.068, 'co... -0.9914 0.134 0.798 0.068

2000 rows × 16 columns

DIY SUMMARY

In [171]:
def get_weighted_freq_dist(review, freq_dist):
    max_freq = max(freq_dist.values())
    for word in freq_dist.keys():
        freq_dist[word] = (freq_dist[word]/max_freq)
    return freq_dist

all_df['weighted_freq_dist'] = all_df.apply(lambda x: get_weighted_freq_dist(x['sentences'], x['freq_dist']),axis=1)
In [178]:
def get_sentence_score(review, freq_dist):
    sentence_scores = {}
    for sent in review:
        for word in nltk.word_tokenize(sent.lower()):
            if word in freq_dist.keys():
                if len(sent.split(' ')) < 30:
                    if sent not in sentence_scores.keys():
                        sentence_scores[sent] = freq_dist[word]
                    else:
                        sentence_scores[sent] += freq_dist[word]
    return sentence_scores

all_df['sentence_scores'] = all_df.apply(lambda x: get_sentence_score(x['sentences'], x['freq_dist']),axis=1)
In [190]:
def get_summary_sentences(sentence_scores):
    sorted_sentences = sorted(sentence_scores.items(), key=lambda kv: kv[1], reverse=True)
    return ''.join(sent[0] for sent in sorted_sentences[:5])

all_df['summary_sentences'] = all_df.apply(lambda x: get_summary_sentences(x['sentence_scores']), axis=1)
In [195]:
all_df
Out[195]:
0 PoN sentences num_sentences tokens num_tokens no_sw num_no_sw topwords_unfil topwords_fil freq_dist vader_all v_compound v_neg v_neu v_pos summary sentence_score sentence_scores summary_sentences
0 bad . bad . \nbad . \nthat one word seems to p... N [bad ., bad ., bad ., that one word seems to p... 67 [bad, bad, bad, that, one, word, seems, to, pr... 1071 [bad, bad, bad, one, word, seems, pretty, much... 515 [(the, 60), (a, 35), (to, 34), (of, 24), (this... [(movie, 17), (bad, 8), (one, 7), (meyer, 6), ... {'bad': 0.47058823529411764, 'one': 0.41176470... {'neg': 0.134, 'neu': 0.715, 'pos': 0.151, 'co... 0.9695 0.134 0.715 0.151 {'bad': 0.47058823529411764, 'one': 0.41176470... {'bad .': 1.4117647058823528, 'that one word s... {'bad .': 1.4117647058823528, 'that one word s... but the wretched dialogue goes along well with...
1 isn't it the ultimate sign of a movie's cinema... N [isn't it the ultimate sign of a movie's cinem... 32 [is, it, the, ultimate, sign, of, a, movie, ci... 553 [ultimate, sign, movie, cinematic, ineptitude,... 297 [(the, 28), (a, 18), (of, 16), (to, 14), (i, 1... [(movie, 7), (one, 6), (first, 5), (much, 4), ... {'ultimate': 0.14285714285714285, 'sign': 0.14... {'neg': 0.135, 'neu': 0.729, 'pos': 0.136, 'co... 0.1722 0.135 0.729 0.136 {'ultimate': 0.14285714285714285, 'sign': 0.14... {'isn't it the ultimate sign of a movie's cine... {'isn't it the ultimate sign of a movie's cine... the action scenes are just as bland , since th...
2 " gordy " is not a movie , it is a 90-minute-... N [ " gordy " is not a movie , it is a 90-minute... 23 [gordy, is, not, a, movie, it, is, a, sesame, ... 478 [gordy, movie, sesame, street, skit, bad, one,... 239 [(the, 25), (and, 21), (to, 18), (is, 17), (a,... [(gordy, 8), (movie, 5), (one, 4), (stupid, 4)... {'gordy': 1.0, 'movie': 0.625, 'sesame': 0.125... {'neg': 0.185, 'neu': 0.74, 'pos': 0.075, 'com... -0.9970 0.185 0.740 0.075 {'gordy': 1.0, 'movie': 0.625, 'sesame': 0.125... {' " gordy " is not a movie , it is a 90-minut... {' " gordy " is not a movie , it is a 90-minut... " gordy " is not a movie , it is a 90-minute-...
3 disconnect the phone line . \ndon't accept the... N [disconnect the phone line ., don't accept the... 37 [disconnect, the, phone, line, do, accept, the... 604 [disconnect, phone, line, accept, charges, any... 323 [(the, 41), (of, 17), (a, 17), (to, 16), (and,... [(hanging, 9), (sisters, 5), (ryan, 4), (time,... {'disconnect': 0.1111111111111111, 'phone': 0.... {'neg': 0.101, 'neu': 0.744, 'pos': 0.155, 'co... 0.9861 0.101 0.744 0.155 {'disconnect': 0.1111111111111111, 'phone': 0.... {'disconnect the phone line .': 0.444444444444... {'disconnect the phone line .': 0.444444444444... cell-phones ring every five minutes , and ever...
4 when robert forster found himself famous again... N [when robert forster found himself famous agai... 29 [when, robert, forster, found, himself, famous... 386 [robert, forster, found, famous, appearing, ja... 185 [(the, 21), (it, 11), (i, 10), (to, 10), (of, ... [(film, 5), (movie, 5), (american, 4), (perfek... {'robert': 0.4, 'forster': 0.6, 'found': 0.2, ... {'neg': 0.073, 'neu': 0.843, 'pos': 0.083, 'co... 0.7445 0.073 0.843 0.083 {'robert': 0.4, 'forster': 0.6, 'found': 0.2, ... {'when robert forster found himself famous aga... {'when robert forster found himself famous aga... when robert forster found himself famous again...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
995 one of the funniest carry on movies and the th... P [one of the funniest carry on movies and the t... 25 [one, of, the, funniest, carry, on, movies, an... 434 [one, funniest, carry, movies, third, medical,... 241 [(the, 26), (and, 21), (of, 11), (a, 10), (is,... [(nookey, 9), (hawtrey, 5), (carry, 4), (dr, 4... {'one': 0.1111111111111111, 'funniest': 0.1111... {'neg': 0.056, 'neu': 0.799, 'pos': 0.145, 'co... 0.9913 0.056 0.799 0.145 {'one': 0.1111111111111111, 'funniest': 0.1111... {'one of the funniest carry on movies and the ... {'one of the funniest carry on movies and the ... james nookey ( jim dale ) who is disliked by h...
996 i remember making a pact , right after `patch ... P [i remember making a pact , right after `patch... 40 [i, remember, making, a, pact, right, after, p... 652 [remember, making, pact, right, patch, adams, ... 361 [(the, 44), (of, 29), (and, 19), (a, 15), (it,... [(music, 8), (heart, 7), (craven, 6), (movie, ... {'remember': 0.125, 'making': 0.125, 'pact': 0... {'neg': 0.072, 'neu': 0.734, 'pos': 0.195, 'co... 0.9985 0.072 0.734 0.195 {'remember': 0.125, 'making': 0.125, 'pact': 0... {'i remember making a pact , right after `patc... {'i remember making a pact , right after `patc... the uplifting true story of roberta guaspari c...
997 barely scrapping by playing at a nyc piano bar... P [barely scrapping by playing at a nyc piano ba... 23 [barely, scrapping, by, playing, at, a, nyc, p... 345 [barely, scrapping, playing, nyc, piano, bar, ... 177 [(a, 23), (is, 16), (the, 13), (and, 10), (of,... [(like, 4), (hutton, 3), (old, 3), (high, 2), ... {'barely': 0.25, 'scrapping': 0.25, 'playing':... {'neg': 0.069, 'neu': 0.719, 'pos': 0.212, 'co... 0.9964 0.069 0.719 0.212 {'barely': 0.25, 'scrapping': 0.25, 'playing':... {'barely scrapping by playing at a nyc piano b... {'barely scrapping by playing at a nyc piano b... hutton arrives at his past and is greeted by h...
998 if the current trends of hollywood filmmaking ... P [if the current trends of hollywood filmmaking... 34 [if, the, current, trends, of, hollywood, film... 730 [current, trends, hollywood, filmmaking, conti... 428 [(the, 49), (of, 31), (and, 19), (in, 18), (to... [(one, 7), (like, 5), (l, 5), (hollywood, 4), ... {'current': 0.14285714285714285, 'trends': 0.1... {'neg': 0.095, 'neu': 0.723, 'pos': 0.182, 'co... 0.9975 0.095 0.723 0.182 {'current': 0.14285714285714285, 'trends': 0.1... {'contemporary movies are good in illustrating... {'contemporary movies are good in illustrating... one of such movies , one that probably brings ...
999 capsule : the director of cure brings a weird ... P [capsule : the director of cure brings a weird... 45 [capsule, the, director, of, cure, brings, a, ... 641 [capsule, director, cure, brings, weird, compl... 340 [(the, 33), (to, 28), (and, 21), (a, 18), (of,... [(computer, 11), (kurosawa, 8), (one, 5), (see... {'capsule': 0.09090909090909091, 'director': 0... {'neg': 0.134, 'neu': 0.798, 'pos': 0.068, 'co... -0.9914 0.134 0.798 0.068 {'capsule': 0.09090909090909091, 'director': 0... {'capsule : the director of cure brings a weir... {'capsule : the director of cure brings a weir... taguchi's computer seems to have been infected...

2000 rows × 20 columns

In [191]:
summaries = all_df['summary_sentences'].tolist()
In [194]:
summaries[3]
Out[194]:
"cell-phones ring every five minutes , and everyone hurriedly rushes along , leaving marginal time for the frustrated viewer to relate to the sisters' issues and problems .i figured i needed to get in touch with my feminine side , and `hanging up' seemed like an ideal opportunity to do so .ryan's convincing performance and diverting cuteness are two of the more agreeable aspects of `hanging up' .it's certainly a far cry from what one would label as a rewarding experience , but `hanging up' should have at least been enjoyable .maddy ( kudrow ) , the soap opera actress , spends time either contemplating her possible path to stardom or nursing her dog ."

Doing VADER on the Summary Section

In [196]:
all_df['vader_sum_all'] = all_df.apply(lambda x: get_vader_score(x['summary_sentences']),axis=1)
In [197]:
all_df
Out[197]:
0 PoN sentences num_sentences tokens num_tokens no_sw num_no_sw topwords_unfil topwords_fil ... vader_all v_compound v_neg v_neu v_pos summary sentence_score sentence_scores summary_sentences vader_sum_all
0 bad . bad . \nbad . \nthat one word seems to p... N [bad ., bad ., bad ., that one word seems to p... 67 [bad, bad, bad, that, one, word, seems, to, pr... 1071 [bad, bad, bad, one, word, seems, pretty, much... 515 [(the, 60), (a, 35), (to, 34), (of, 24), (this... [(movie, 17), (bad, 8), (one, 7), (meyer, 6), ... ... {'neg': 0.134, 'neu': 0.715, 'pos': 0.151, 'co... 0.9695 0.134 0.715 0.151 {'bad': 0.47058823529411764, 'one': 0.41176470... {'bad .': 1.4117647058823528, 'that one word s... {'bad .': 1.4117647058823528, 'that one word s... but the wretched dialogue goes along well with... {'neg': 0.186, 'neu': 0.685, 'pos': 0.129, 'co...
1 isn't it the ultimate sign of a movie's cinema... N [isn't it the ultimate sign of a movie's cinem... 32 [is, it, the, ultimate, sign, of, a, movie, ci... 553 [ultimate, sign, movie, cinematic, ineptitude,... 297 [(the, 28), (a, 18), (of, 16), (to, 14), (i, 1... [(movie, 7), (one, 6), (first, 5), (much, 4), ... ... {'neg': 0.135, 'neu': 0.729, 'pos': 0.136, 'co... 0.1722 0.135 0.729 0.136 {'ultimate': 0.14285714285714285, 'sign': 0.14... {'isn't it the ultimate sign of a movie's cine... {'isn't it the ultimate sign of a movie's cine... the action scenes are just as bland , since th... {'neg': 0.127, 'neu': 0.744, 'pos': 0.129, 'co...
2 " gordy " is not a movie , it is a 90-minute-... N [ " gordy " is not a movie , it is a 90-minute... 23 [gordy, is, not, a, movie, it, is, a, sesame, ... 478 [gordy, movie, sesame, street, skit, bad, one,... 239 [(the, 25), (and, 21), (to, 18), (is, 17), (a,... [(gordy, 8), (movie, 5), (one, 4), (stupid, 4)... ... {'neg': 0.185, 'neu': 0.74, 'pos': 0.075, 'com... -0.9970 0.185 0.740 0.075 {'gordy': 1.0, 'movie': 0.625, 'sesame': 0.125... {' " gordy " is not a movie , it is a 90-minut... {' " gordy " is not a movie , it is a 90-minut... " gordy " is not a movie , it is a 90-minute-... {'neg': 0.193, 'neu': 0.74, 'pos': 0.068, 'com...
3 disconnect the phone line . \ndon't accept the... N [disconnect the phone line ., don't accept the... 37 [disconnect, the, phone, line, do, accept, the... 604 [disconnect, phone, line, accept, charges, any... 323 [(the, 41), (of, 17), (a, 17), (to, 16), (and,... [(hanging, 9), (sisters, 5), (ryan, 4), (time,... ... {'neg': 0.101, 'neu': 0.744, 'pos': 0.155, 'co... 0.9861 0.101 0.744 0.155 {'disconnect': 0.1111111111111111, 'phone': 0.... {'disconnect the phone line .': 0.444444444444... {'disconnect the phone line .': 0.444444444444... cell-phones ring every five minutes , and ever... {'neg': 0.052, 'neu': 0.779, 'pos': 0.169, 'co...
4 when robert forster found himself famous again... N [when robert forster found himself famous agai... 29 [when, robert, forster, found, himself, famous... 386 [robert, forster, found, famous, appearing, ja... 185 [(the, 21), (it, 11), (i, 10), (to, 10), (of, ... [(film, 5), (movie, 5), (american, 4), (perfek... ... {'neg': 0.073, 'neu': 0.843, 'pos': 0.083, 'co... 0.7445 0.073 0.843 0.083 {'robert': 0.4, 'forster': 0.6, 'found': 0.2, ... {'when robert forster found himself famous aga... {'when robert forster found himself famous aga... when robert forster found himself famous again... {'neg': 0.0, 'neu': 0.914, 'pos': 0.086, 'comp...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
995 one of the funniest carry on movies and the th... P [one of the funniest carry on movies and the t... 25 [one, of, the, funniest, carry, on, movies, an... 434 [one, funniest, carry, movies, third, medical,... 241 [(the, 26), (and, 21), (of, 11), (a, 10), (is,... [(nookey, 9), (hawtrey, 5), (carry, 4), (dr, 4... ... {'neg': 0.056, 'neu': 0.799, 'pos': 0.145, 'co... 0.9913 0.056 0.799 0.145 {'one': 0.1111111111111111, 'funniest': 0.1111... {'one of the funniest carry on movies and the ... {'one of the funniest carry on movies and the ... james nookey ( jim dale ) who is disliked by h... {'neg': 0.041, 'neu': 0.811, 'pos': 0.148, 'co...
996 i remember making a pact , right after `patch ... P [i remember making a pact , right after `patch... 40 [i, remember, making, a, pact, right, after, p... 652 [remember, making, pact, right, patch, adams, ... 361 [(the, 44), (of, 29), (and, 19), (a, 15), (it,... [(music, 8), (heart, 7), (craven, 6), (movie, ... ... {'neg': 0.072, 'neu': 0.734, 'pos': 0.195, 'co... 0.9985 0.072 0.734 0.195 {'remember': 0.125, 'making': 0.125, 'pact': 0... {'i remember making a pact , right after `patc... {'i remember making a pact , right after `patc... the uplifting true story of roberta guaspari c... {'neg': 0.113, 'neu': 0.682, 'pos': 0.206, 'co...
997 barely scrapping by playing at a nyc piano bar... P [barely scrapping by playing at a nyc piano ba... 23 [barely, scrapping, by, playing, at, a, nyc, p... 345 [barely, scrapping, playing, nyc, piano, bar, ... 177 [(a, 23), (is, 16), (the, 13), (and, 10), (of,... [(like, 4), (hutton, 3), (old, 3), (high, 2), ... ... {'neg': 0.069, 'neu': 0.719, 'pos': 0.212, 'co... 0.9964 0.069 0.719 0.212 {'barely': 0.25, 'scrapping': 0.25, 'playing':... {'barely scrapping by playing at a nyc piano b... {'barely scrapping by playing at a nyc piano b... hutton arrives at his past and is greeted by h... {'neg': 0.07, 'neu': 0.784, 'pos': 0.146, 'com...
998 if the current trends of hollywood filmmaking ... P [if the current trends of hollywood filmmaking... 34 [if, the, current, trends, of, hollywood, film... 730 [current, trends, hollywood, filmmaking, conti... 428 [(the, 49), (of, 31), (and, 19), (in, 18), (to... [(one, 7), (like, 5), (l, 5), (hollywood, 4), ... ... {'neg': 0.095, 'neu': 0.723, 'pos': 0.182, 'co... 0.9975 0.095 0.723 0.182 {'current': 0.14285714285714285, 'trends': 0.1... {'contemporary movies are good in illustrating... {'contemporary movies are good in illustrating... one of such movies , one that probably brings ... {'neg': 0.07, 'neu': 0.783, 'pos': 0.147, 'com...
999 capsule : the director of cure brings a weird ... P [capsule : the director of cure brings a weird... 45 [capsule, the, director, of, cure, brings, a, ... 641 [capsule, director, cure, brings, weird, compl... 340 [(the, 33), (to, 28), (and, 21), (a, 18), (of,... [(computer, 11), (kurosawa, 8), (one, 5), (see... ... {'neg': 0.134, 'neu': 0.798, 'pos': 0.068, 'co... -0.9914 0.134 0.798 0.068 {'capsule': 0.09090909090909091, 'director': 0... {'capsule : the director of cure brings a weir... {'capsule : the director of cure brings a weir... taguchi's computer seems to have been infected... {'neg': 0.141, 'neu': 0.831, 'pos': 0.028, 'co...

2000 rows × 21 columns

In [198]:
all_df['v_compound_sum'] = all_df.apply(lambda x: separate_vader_score(x['vader_sum_all'], 'compound'),axis=1)
all_df['v_neg_sum'] = all_df.apply(lambda x: separate_vader_score(x['vader_sum_all'], 'neg'),axis=1)
all_df['v_neu_sum'] = all_df.apply(lambda x: separate_vader_score(x['vader_sum_all'], 'neu'),axis=1)
all_df['v_pos_sum'] = all_df.apply(lambda x: separate_vader_score(x['vader_sum_all'], 'pos'),axis=1)
In [199]:
all_df
Out[199]:
0 PoN sentences num_sentences tokens num_tokens no_sw num_no_sw topwords_unfil topwords_fil ... v_pos summary sentence_score sentence_scores summary_sentences vader_sum_all v_compound_sum v_neg_sum v_neu_sum v_pos_sum
0 bad . bad . \nbad . \nthat one word seems to p... N [bad ., bad ., bad ., that one word seems to p... 67 [bad, bad, bad, that, one, word, seems, to, pr... 1071 [bad, bad, bad, one, word, seems, pretty, much... 515 [(the, 60), (a, 35), (to, 34), (of, 24), (this... [(movie, 17), (bad, 8), (one, 7), (meyer, 6), ... ... 0.151 {'bad': 0.47058823529411764, 'one': 0.41176470... {'bad .': 1.4117647058823528, 'that one word s... {'bad .': 1.4117647058823528, 'that one word s... but the wretched dialogue goes along well with... {'neg': 0.186, 'neu': 0.685, 'pos': 0.129, 'co... -0.8201 0.186 0.685 0.129
1 isn't it the ultimate sign of a movie's cinema... N [isn't it the ultimate sign of a movie's cinem... 32 [is, it, the, ultimate, sign, of, a, movie, ci... 553 [ultimate, sign, movie, cinematic, ineptitude,... 297 [(the, 28), (a, 18), (of, 16), (to, 14), (i, 1... [(movie, 7), (one, 6), (first, 5), (much, 4), ... ... 0.136 {'ultimate': 0.14285714285714285, 'sign': 0.14... {'isn't it the ultimate sign of a movie's cine... {'isn't it the ultimate sign of a movie's cine... the action scenes are just as bland , since th... {'neg': 0.127, 'neu': 0.744, 'pos': 0.129, 'co... 0.0395 0.127 0.744 0.129
2 " gordy " is not a movie , it is a 90-minute-... N [ " gordy " is not a movie , it is a 90-minute... 23 [gordy, is, not, a, movie, it, is, a, sesame, ... 478 [gordy, movie, sesame, street, skit, bad, one,... 239 [(the, 25), (and, 21), (to, 18), (is, 17), (a,... [(gordy, 8), (movie, 5), (one, 4), (stupid, 4)... ... 0.075 {'gordy': 1.0, 'movie': 0.625, 'sesame': 0.125... {' " gordy " is not a movie , it is a 90-minut... {' " gordy " is not a movie , it is a 90-minut... " gordy " is not a movie , it is a 90-minute-... {'neg': 0.193, 'neu': 0.74, 'pos': 0.068, 'com... -0.9614 0.193 0.740 0.068
3 disconnect the phone line . \ndon't accept the... N [disconnect the phone line ., don't accept the... 37 [disconnect, the, phone, line, do, accept, the... 604 [disconnect, phone, line, accept, charges, any... 323 [(the, 41), (of, 17), (a, 17), (to, 16), (and,... [(hanging, 9), (sisters, 5), (ryan, 4), (time,... ... 0.155 {'disconnect': 0.1111111111111111, 'phone': 0.... {'disconnect the phone line .': 0.444444444444... {'disconnect the phone line .': 0.444444444444... cell-phones ring every five minutes , and ever... {'neg': 0.052, 'neu': 0.779, 'pos': 0.169, 'co... 0.8930 0.052 0.779 0.169
4 when robert forster found himself famous again... N [when robert forster found himself famous agai... 29 [when, robert, forster, found, himself, famous... 386 [robert, forster, found, famous, appearing, ja... 185 [(the, 21), (it, 11), (i, 10), (to, 10), (of, ... [(film, 5), (movie, 5), (american, 4), (perfek... ... 0.083 {'robert': 0.4, 'forster': 0.6, 'found': 0.2, ... {'when robert forster found himself famous aga... {'when robert forster found himself famous aga... when robert forster found himself famous again... {'neg': 0.0, 'neu': 0.914, 'pos': 0.086, 'comp... 0.7605 0.000 0.914 0.086
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
995 one of the funniest carry on movies and the th... P [one of the funniest carry on movies and the t... 25 [one, of, the, funniest, carry, on, movies, an... 434 [one, funniest, carry, movies, third, medical,... 241 [(the, 26), (and, 21), (of, 11), (a, 10), (is,... [(nookey, 9), (hawtrey, 5), (carry, 4), (dr, 4... ... 0.145 {'one': 0.1111111111111111, 'funniest': 0.1111... {'one of the funniest carry on movies and the ... {'one of the funniest carry on movies and the ... james nookey ( jim dale ) who is disliked by h... {'neg': 0.041, 'neu': 0.811, 'pos': 0.148, 'co... 0.8910 0.041 0.811 0.148
996 i remember making a pact , right after `patch ... P [i remember making a pact , right after `patch... 40 [i, remember, making, a, pact, right, after, p... 652 [remember, making, pact, right, patch, adams, ... 361 [(the, 44), (of, 29), (and, 19), (a, 15), (it,... [(music, 8), (heart, 7), (craven, 6), (movie, ... ... 0.195 {'remember': 0.125, 'making': 0.125, 'pact': 0... {'i remember making a pact , right after `patc... {'i remember making a pact , right after `patc... the uplifting true story of roberta guaspari c... {'neg': 0.113, 'neu': 0.682, 'pos': 0.206, 'co... 0.8776 0.113 0.682 0.206
997 barely scrapping by playing at a nyc piano bar... P [barely scrapping by playing at a nyc piano ba... 23 [barely, scrapping, by, playing, at, a, nyc, p... 345 [barely, scrapping, playing, nyc, piano, bar, ... 177 [(a, 23), (is, 16), (the, 13), (and, 10), (of,... [(like, 4), (hutton, 3), (old, 3), (high, 2), ... ... 0.212 {'barely': 0.25, 'scrapping': 0.25, 'playing':... {'barely scrapping by playing at a nyc piano b... {'barely scrapping by playing at a nyc piano b... hutton arrives at his past and is greeted by h... {'neg': 0.07, 'neu': 0.784, 'pos': 0.146, 'com... 0.6758 0.070 0.784 0.146
998 if the current trends of hollywood filmmaking ... P [if the current trends of hollywood filmmaking... 34 [if, the, current, trends, of, hollywood, film... 730 [current, trends, hollywood, filmmaking, conti... 428 [(the, 49), (of, 31), (and, 19), (in, 18), (to... [(one, 7), (like, 5), (l, 5), (hollywood, 4), ... ... 0.182 {'current': 0.14285714285714285, 'trends': 0.1... {'contemporary movies are good in illustrating... {'contemporary movies are good in illustrating... one of such movies , one that probably brings ... {'neg': 0.07, 'neu': 0.783, 'pos': 0.147, 'com... 0.8074 0.070 0.783 0.147
999 capsule : the director of cure brings a weird ... P [capsule : the director of cure brings a weird... 45 [capsule, the, director, of, cure, brings, a, ... 641 [capsule, director, cure, brings, weird, compl... 340 [(the, 33), (to, 28), (and, 21), (a, 18), (of,... [(computer, 11), (kurosawa, 8), (one, 5), (see... ... 0.068 {'capsule': 0.09090909090909091, 'director': 0... {'capsule : the director of cure brings a weir... {'capsule : the director of cure brings a weir... taguchi's computer seems to have been infected... {'neg': 0.141, 'neu': 0.831, 'pos': 0.028, 'co... -0.8977 0.141 0.831 0.028

2000 rows × 25 columns

In [ ]: