NLPIA CH2

NOTES

2. Building Your Vocabulary (Word Tokenization)

2.1 Challenges (a preview of stemming)
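
A minimal sketch of why stemming is hard: a naive rule like "strip a trailing s" collapses plurals, but it mangles words that merely end in s. (My own toy example, not the book's regex stemmer.)

    import re

    def crude_stem(word):
        # naive rule: strip one trailing "s" to collapse plurals
        return re.sub(r"s$", "", word)

    print(crude_stem("houses"))  # house -- works
    print(crude_stem("bus"))     # bu    -- oops: "bus" is not a plural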

2.2 Building your vocabulary with a tokenizer
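
The simplest possible tokenizer is str.split(), which breaks on whitespace but leaves punctuation glued to the neighboring word:

    sentence = "Thomas Jefferson began building Monticello at the age of 26."
    print(sentence.split())
    # [..., 'age', 'of', '26.']  -- note "26." keeps its trailing period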

2.2.1 Dot Product
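
The dot product multiplies two vectors element by element and sums the results; in pure Python:

    v1 = [1, 2, 3]
    v2 = [2, 3, 4]
    print(sum(a * b for a, b in zip(v1, v2)))  # 1*2 + 2*3 + 3*4 = 20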

2.2.2 Measuring bag-of-words overlap
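
The dot product of two bag-of-words count vectors measures how much vocabulary two documents share. A sketch using Counter dicts in place of aligned vectors:

    from collections import Counter

    def bow(text):
        return Counter(text.lower().split())

    a = bow("the quick brown fox")
    b = bow("the lazy brown dog")
    # dot product over the combined vocabulary
    print(sum(a[w] * b[w] for w in set(a) | set(b)))  # 2 ("the" and "brown")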

2.2.3 A token improvement

How Regular Expressions Work
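
A character class like [-\s.,;!?] matches any one of the listed characters, and + means "one or more", so re.split can treat runs of whitespace and punctuation as delimiters:

    import re

    print(re.split(r"[-\s.,;!?]+", "Hi, how are you?"))
    # ['Hi', 'how', 'are', 'you', '']  -- the trailing '?' leaves an empty string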

Improved Regular Expression for Separating Words
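
re.findall with an alternation keeps punctuation as its own token instead of throwing it away. My own pattern here, a rough stand-in for the book's:

    import re

    sentence = "Monticello wasn't designated as UNESCO World Heritage Site until 1987."
    # \w+ grabs runs of word characters; [^\w\s] grabs each punctuation mark
    print(re.findall(r"\w+|[^\w\s]", sentence))
    # note the casualty: "wasn't" becomes 'wasn', "'", 't'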

Contractions
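
NLTK's Treebank tokenizer handles contractions more gracefully, splitting "wasn't" into 'was' and "n't" so the negation survives as its own token (assumes nltk is installed):

    from nltk.tokenize import TreebankWordTokenizer

    tokenizer = TreebankWordTokenizer()
    print(tokenizer.tokenize("Monticello wasn't designated until 1987."))
    # [..., 'was', "n't", ...]  -- "n't" can now be normalized to "not"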

2.2.4 Extending your vocabulary with n-grams

We all gram for n-grams
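
An n-gram is just a sliding window of n adjacent tokens; a sketch (nltk.util.ngrams does the same thing):

    def ngrams(tokens, n):
        # slide a window of width n across the token list
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    tokens = "Thomas Jefferson began building Monticello".split()
    print(ngrams(tokens, 2))
    # [('Thomas', 'Jefferson'), ('Jefferson', 'began'), ('began', 'building'),
    #  ('building', 'Monticello')]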

Stop Words
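
Stop words are high-frequency, low-information words; filtering them is a list comprehension against a stop list (toy list here; NLTK and spaCy ship much longer ones):

    stop_words = {"a", "an", "the", "on", "of", "is"}
    tokens = "the house is on fire".split()
    print([t for t in tokens if t not in stop_words])  # ['house', 'fire']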

2.2.5 Normalizing Your Vocabulary

Case Folding
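
Case folding collapses capitalization variants into one vocabulary entry, at the cost of erasing distinctions like "US" vs. "us":

    tokens = ["House", "Visitor", "Center"]
    print([t.lower() for t in tokens])  # ['house', 'visitor', 'center']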

Stemming
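
A stemmer chops suffixes by rule, so its output need not be a real word. The Porter stemmer via NLTK (assumes nltk is installed):

    from nltk.stem.porter import PorterStemmer

    stemmer = PorterStemmer()
    print([stemmer.stem(w) for w in ["dishes", "washing", "washed"]])
    # ['dish', 'wash', 'wash']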

Lemmatization
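
A lemmatizer maps a word to its dictionary headword using a real vocabulary (WordNet) and, ideally, a part-of-speech tag (assumes nltk plus a one-time wordnet data download):

    import nltk
    nltk.download("wordnet")  # one-time data download
    from nltk.stem import WordNetLemmatizer

    lemmatizer = WordNetLemmatizer()
    print(lemmatizer.lemmatize("better"))           # 'better' -- no POS, no help
    print(lemmatizer.lemmatize("better", pos="a"))  # 'good'   -- adjective tag unlocks it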

Use Cases

2.3 Sentiment

2.3.1 VADER – a rule-based sentiment analyzer
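
VADER scores text with a hand-built lexicon plus rules for intensifiers, negation, punctuation, and capitalization. With the vaderSentiment package installed (exact score values omitted, since they depend on the lexicon version):

    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    sa = SentimentIntensityAnalyzer()
    print(sa.polarity_scores("Python is very readable and it's great for NLP."))
    # dict with 'neg', 'neu', 'pos' proportions and a 'compound' score in [-1, 1]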

2.3.2 Naive Bayes
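
Unlike VADER's hand-tuned rules, Naive Bayes learns word-to-sentiment weights from labeled examples. A toy sketch with scikit-learn on a tiny made-up corpus (not the book's movie-review data):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    docs = ["I love this movie", "great plot and acting",
            "terrible waste of time", "I hate this boring film"]
    labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

    vectorizer = CountVectorizer()
    bow = vectorizer.fit_transform(docs)  # bag-of-words count matrix
    model = MultinomialNB().fit(bow, labels)
    print(model.predict(vectorizer.transform(["great movie"])))  # most likely [1]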
