{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# SENTIMENT ANALYSIS (PANDAS STYLE!)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### STEP 1: Import ALL the things!\n", "#### Libraries and paths and files\n", "I'm sure there is a cleaner way to do this, plz lmk [via email](mailto:danielcaraway42@gmail.com)" ] }, { "cell_type": "code", "execution_count": 181, "metadata": {}, "outputs": [], "source": [ "import os\n", "import pandas as pd\n", "# File names of the negative and positive reviews\n", "negative = os.listdir('NEG/')\n", "positive = os.listdir('POS/')" ] }, { "cell_type": "code", "execution_count": 189, "metadata": {}, "outputs": [], "source": [ "positive_alltext = []\n", "for file in positive:\n", "    # 'with' closes each file even if read() raises\n", "    with open('POS/' + file) as f:\n", "        positive_alltext.append(f.read())\n", "\n", "negative_alltext = []\n", "for file in negative:\n", "    with open('NEG/' + file) as f:\n", "        negative_alltext.append(f.read())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### STEP 2: Turn that fresh text into a pandas DF and add a column to mark it as either positive or negative" ] }, { "cell_type": "code", "execution_count": 183, "metadata": {}, "outputs": [], "source": [ "positive_df = pd.DataFrame(positive_alltext)\n", "negative_df = pd.DataFrame(negative_alltext)" ] }, { "cell_type": "code", "execution_count": 184, "metadata": {}, "outputs": [], "source": [ "positive_df['PoN'] = 'P'\n", "negative_df['PoN'] = 'N'" ] }, { "cell_type": "code", "execution_count": 185, "metadata": {}, "outputs": [], "source": [ "# Combine the pos and neg dfs\n", "# (DataFrame.append was removed in pandas 2.0; pd.concat stacks the frames the same way)\n", "all_df = pd.concat([positive_df, negative_df])" ] }, { "cell_type": "code", "execution_count": 186, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | 0 | PoN |
|---|---|---|
| 0 | films adapted from comic books have had plenty... | P |
| 1 | you've got mail works alot better than it dese... | P |
| 2 | " jaws " is a rare film that grabs your atten... | P |
| 3 | every now and then a movie comes along from a ... | P |
| 4 | moviemaking is a lot like being the general ma... | P |
| 0 | that's exactly how long the movie felt to me .... | N |
| 1 | " quest for camelot " is warner bros . ' firs... | N |
| 2 | so ask yourself what " 8mm " ( " eight millime... | N |
| 3 | synopsis : a mentally unstable man undergoing ... | N |
| 4 | capsule : in 2176 on the planet mars police ta... | N |

| | 0 | PoN | tokenized | tokenized_count | no_stopwords | no_stopwords_count |
|---|---|---|---|---|---|---|
| 0 | films adapted from comic books have had plenty... | P | [films, adapted, from, comic, books, have, had... | 673 | [films, adapted, comic, books, plenty, success... | 387 |
| 1 | you've got mail works alot better than it dese... | P | [you, got, mail, works, alot, better, than, it... | 412 | [got, mail, works, alot, better, deserves, ord... | 203 |
| 2 | " jaws " is a rare film that grabs your atten... | P | [jaws, is, a, rare, film, that, grabs, your, a... | 993 | [jaws, rare, film, grabs, attention, shows, si... | 552 |
| 3 | every now and then a movie comes along from a ... | P | [every, now, and, then, a, movie, comes, along... | 628 | [every, movie, comes, along, suspect, studio, ... | 326 |
| 4 | moviemaking is a lot like being the general ma... | P | [moviemaking, is, a, lot, like, being, the, ge... | 630 | [moviemaking, lot, like, general, manager, nfl... | 345 |
| 0 | that's exactly how long the movie felt to me .... | N | [that, exactly, how, long, the, movie, felt, t... | 550 | [exactly, long, movie, felt, even, nine, laugh... | 308 |
| 1 | " quest for camelot " is warner bros . ' firs... | N | [quest, for, camelot, is, warner, bros, first,... | 444 | [quest, camelot, warner, bros, first, attempt,... | 247 |
| 2 | so ask yourself what " 8mm " ( " eight millime... | N | [so, ask, yourself, what, eight, millimeter, i... | 527 | [ask, eight, millimeter, really, wholesome, su... | 283 |
| 3 | synopsis : a mentally unstable man undergoing ... | N | [synopsis, a, mentally, unstable, man, undergo... | 706 | [synopsis, mentally, unstable, man, undergoing... | 371 |
| 4 | capsule : in 2176 on the planet mars police ta... | N | [capsule, in, on, the, planet, mars, police, t... | 649 | [capsule, planet, mars, police, taking, custod... | 355 |
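
The cell that built the `tokenized` and `no_stopwords` columns is not shown here, so the following is only a rough sketch of one way to get columns of that shape. The visible rows suggest that contraction fragments and anything containing digits were dropped ("you've" becomes `you`, "2176" and "8mm" disappear), so this sketch keeps purely alphabetic tokens. The regex, the function names `tokenize` and `remove_stopwords`, and the short `STOPWORDS` set are all assumptions made for illustration; the original most likely used a proper tokenizer and a much fuller stopword list.

```python
import re

# Illustrative stopword list only; the list the notebook actually used is not
# shown and was almost certainly longer.
STOPWORDS = {"a", "an", "and", "from", "had", "have", "in", "is", "it",
             "of", "on", "than", "that", "the", "to", "you"}

# Match whole words, common contraction suffixes ('s, 've, ...),
# or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|'(?:s|ve|re|ll|d|m|t)\b|[^\w\s]")

def tokenize(text):
    """Lowercase, split into rough tokens, keep only purely alphabetic ones."""
    return [tok for tok in TOKEN_RE.findall(text.lower()) if tok.isalpha()]

def remove_stopwords(tokens):
    """Drop tokens that appear in STOPWORDS."""
    return [tok for tok in tokens if tok not in STOPWORDS]

# Opening words of one review, as far as they are visible in the table
sample = "you've got mail works alot better than it"
tokens = tokenize(sample)
filtered = remove_stopwords(tokens)

print(tokens, len(tokens))
print(filtered, len(filtered))
```

On the notebook's frame, something like `all_df['tokenized'] = all_df[0].apply(tokenize)` followed by `all_df['tokenized_count'] = all_df['tokenized'].apply(len)` would fill the columns row by row, and the same pattern would apply to the stopword-filtered pair.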

| | 0 | PoN | tokenized | tokenized_count | no_stopwords | no_stopwords_count | most_common_unfiltered_word | most_common_filtered_word |
|---|---|---|---|---|---|---|---|---|
| 0 | films adapted from comic books have had plenty... | P | [films, adapted, from, comic, books, have, had... | 673 | [films, adapted, comic, books, plenty, success... | 387 | [(the, 46)] | [(comic, 5), (hell, 5), (film, 5), (like, 4), ... |
| 1 | you've got mail works alot better than it dese... | P | [you, got, mail, works, alot, better, than, it... | 412 | [got, mail, works, alot, better, deserves, ord... | 203 | [(the, 33)] | [(two, 3), (shop, 3), (much, 3), (fox, 3), (go... |
| 2 | " jaws " is a rare film that grabs your atten... | P | [jaws, is, a, rare, film, that, grabs, your, a... | 993 | [jaws, rare, film, grabs, attention, shows, si... | 552 | [(the, 63)] | [(shark, 16), (jaws, 8), (film, 7), (spielberg... |
| 3 | every now and then a movie comes along from a ... | P | [every, now, and, then, a, movie, comes, along... | 628 | [every, movie, comes, along, suspect, studio, ... | 326 | [(the, 35)] | [(even, 6), (gets, 6), (film, 5), (school, 5),... |
| 4 | moviemaking is a lot like being the general ma... | P | [moviemaking, is, a, lot, like, being, the, ge... | 630 | [moviemaking, lot, like, general, manager, nfl... | 345 | [(the, 41)] | [(jackie, 10), (like, 9), (chan, 8), (got, 4),... |
| 0 | that's exactly how long the movie felt to me .... | N | [that, exactly, how, long, the, movie, felt, t... | 550 | [exactly, long, movie, felt, even, nine, laugh... | 308 | [(the, 31)] | [(grant, 12), (movie, 7), (nine, 5), (hugh, 5)... |
| 1 | " quest for camelot " is warner bros . ' firs... | N | [quest, for, camelot, is, warner, bros, first,... | 444 | [quest, camelot, warner, bros, first, attempt,... | 247 | [(the, 21)] | [(quest, 5), (camelot, 4), (kayley, 4), (disne... |
| 2 | so ask yourself what " 8mm " ( " eight millime... | N | [so, ask, yourself, what, eight, millimeter, i... | 527 | [ask, eight, millimeter, really, wholesome, su... | 283 | [(of, 21)] | [(like, 4), (schumacher, 4), (film, 4), (welle... |
| 3 | synopsis : a mentally unstable man undergoing ... | N | [synopsis, a, mentally, unstable, man, undergo... | 706 | [synopsis, mentally, unstable, man, undergoing... | 371 | [(the, 48)] | [(stalked, 12), (daryl, 7), (stalker, 6), (bro... |
| 4 | capsule : in 2176 on the planet mars police ta... | N | [capsule, in, on, the, planet, mars, police, t... | 649 | [capsule, planet, mars, police, taking, custod... | 355 | [(the, 30)] | [(mars, 14), (ghosts, 10), (carpenter, 8), (fi... |
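
The `(word, count)` pairs in the last two columns have the same shape as the output of `collections.Counter.most_common`. The cell that produced them is not shown, so treat this as a guess at the approach rather than the notebook's actual code. The helper name `most_common_words` and the token list are made up for illustration.

```python
from collections import Counter

def most_common_words(tokens, n=1):
    """Return the n most frequent tokens as (word, count) pairs."""
    return Counter(tokens).most_common(n)

# Made-up token list for illustration, not taken from the reviews
tokens = ["the", "shark", "is", "the", "star", "of", "the", "film", "shark"]

top_one = most_common_words(tokens)        # [('the', 3)]
top_two = most_common_words(tokens, n=2)   # [('the', 3), ('shark', 2)]

print(top_one)
print(top_two)
```

Running the same helper on the `tokenized` list gives the unfiltered winner (almost always `the` in the table above), while running it on `no_stopwords` surfaces words that say something about the review itself.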