NLPIA CH2

NOTES

2. Building Your Vocabulary (Word Tokenization)

2.1 Challenges (a preview of stemming)
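
A minimal sketch of why stemming is hard: a naive rule like "strip a trailing s" collapses plurals, but it mangles words that merely end in s. (My own toy example, not the book's regex stemmer.)

    import re

    def crude_stem(word):
        # naive rule: strip one trailing "s" to collapse plurals
        return re.sub(r"s$", "", word)

    print(crude_stem("houses"))  # house -- works
    print(crude_stem("bus"))     # bu    -- oops: "bus" is not a plural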

2.2 Building your vocabulary with a tokenizer
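
The simplest possible tokenizer is str.split(), which breaks on whitespace but leaves punctuation glued to the neighboring word:

    sentence = "Thomas Jefferson began building Monticello at the age of 26."
    print(sentence.split())
    # [..., 'age', 'of', '26.']  -- note "26." keeps its trailing period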

2.2.1 Dot Product
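
The dot product multiplies two vectors element by element and sums the results; in pure Python:

    v1 = [1, 2, 3]
    v2 = [2, 3, 4]
    print(sum(a * b for a, b in zip(v1, v2)))  # 1*2 + 2*3 + 3*4 = 20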

2.2.2 Measuring bag-of-words overlap
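
The dot product of two bag-of-words count vectors measures how much vocabulary two documents share. A sketch using Counter dicts in place of aligned vectors:

    from collections import Counter

    def bow(text):
        return Counter(text.lower().split())

    a = bow("the quick brown fox")
    b = bow("the lazy brown dog")
    # dot product over the combined vocabulary
    print(sum(a[w] * b[w] for w in set(a) | set(b)))  # 2 ("the" and "brown")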

2.2.3 A token improvement

How Regular Expressions Work
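
A character class like [-\s.,;!?] matches any one of the listed characters, and + means "one or more", so re.split can treat runs of whitespace and punctuation as delimiters:

    import re

    print(re.split(r"[-\s.,;!?]+", "Hi, how are you?"))
    # ['Hi', 'how', 'are', 'you', '']  -- the trailing '?' leaves an empty string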

Improved Regular Expression for Separating Words
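
re.findall with an alternation keeps punctuation as its own token instead of throwing it away. My own pattern here, a rough stand-in for the book's:

    import re

    sentence = "Monticello wasn't designated as UNESCO World Heritage Site until 1987."
    # \w+ grabs runs of word characters; [^\w\s] grabs each punctuation mark
    print(re.findall(r"\w+|[^\w\s]", sentence))
    # note the casualty: "wasn't" becomes 'wasn', "'", 't'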

Contractions
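
NLTK's Treebank tokenizer handles contractions more gracefully, splitting "wasn't" into 'was' and "n't" so the negation survives as its own token (assumes nltk is installed):

    from nltk.tokenize import TreebankWordTokenizer

    tokenizer = TreebankWordTokenizer()
    print(tokenizer.tokenize("Monticello wasn't designated until 1987."))
    # [..., 'was', "n't", ...]  -- "n't" can now be normalized to "not"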

2.2.4 Extending your vocabulary with n-grams

We all gram for n-grams
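
An n-gram is just a sliding window of n adjacent tokens; a sketch (nltk.util.ngrams does the same thing):

    def ngrams(tokens, n):
        # slide a window of width n across the token list
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    tokens = "Thomas Jefferson began building Monticello".split()
    print(ngrams(tokens, 2))
    # [('Thomas', 'Jefferson'), ('Jefferson', 'began'), ('began', 'building'),
    #  ('building', 'Monticello')]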

Stop Words
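
Stop words are high-frequency, low-information words; filtering them is a list comprehension against a stop list (toy list here; NLTK and spaCy ship much longer ones):

    stop_words = {"a", "an", "the", "on", "of", "is"}
    tokens = "the house is on fire".split()
    print([t for t in tokens if t not in stop_words])  # ['house', 'fire']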

2.2.5 Normalizing Your Vocabulary

Case Folding
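
Case folding collapses capitalization variants into one vocabulary entry, at the cost of erasing distinctions like "US" vs. "us":

    tokens = ["House", "Visitor", "Center"]
    print([t.lower() for t in tokens])  # ['house', 'visitor', 'center']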

Stemming
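
A stemmer chops suffixes by rule, so its output need not be a real word. The Porter stemmer via NLTK (assumes nltk is installed):

    from nltk.stem.porter import PorterStemmer

    stemmer = PorterStemmer()
    print([stemmer.stem(w) for w in ["dishes", "washing", "washed"]])
    # ['dish', 'wash', 'wash']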

Lemmatization
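
A lemmatizer maps a word to its dictionary headword using a real vocabulary (WordNet) and, ideally, a part-of-speech tag (assumes nltk plus a one-time wordnet data download):

    import nltk
    nltk.download("wordnet")  # one-time data download
    from nltk.stem import WordNetLemmatizer

    lemmatizer = WordNetLemmatizer()
    print(lemmatizer.lemmatize("better"))           # 'better' -- no POS, no help
    print(lemmatizer.lemmatize("better", pos="a"))  # 'good'   -- adjective tag unlocks it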

Use Cases

2.3 Sentiment

2.3.1 VADER – a rule-based sentiment analyzer
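
VADER scores text with a hand-built lexicon plus rules for intensifiers, negation, punctuation, and capitalization. With the vaderSentiment package installed (exact score values omitted, since they depend on the lexicon version):

    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    sa = SentimentIntensityAnalyzer()
    print(sa.polarity_scores("Python is very readable and it's great for NLP."))
    # dict with 'neg', 'neu', 'pos' proportions and a 'compound' score in [-1, 1]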

2.3.2 Naive Bayes
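
Unlike VADER's hand-tuned rules, Naive Bayes learns word-to-sentiment weights from labeled examples. A toy sketch with scikit-learn on a tiny made-up corpus (not the book's movie-review data):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    docs = ["I love this movie", "great plot and acting",
            "terrible waste of time", "I hate this boring film"]
    labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

    vectorizer = CountVectorizer()
    bow = vectorizer.fit_transform(docs)  # bag-of-words count matrix
    model = MultinomialNB().fit(bow, labels)
    print(model.predict(vectorizer.transform(["great movie"])))  # most likely [1]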
