NLPIA CH3


Math With Words (TF-IDF vectors)



  • Analyze meaning by counting words and term frequencies (a counting sketch follows this list)
  • Predict word occurrence probabilities with Zipf’s Law
  • Represent words as vectors
  • Find relevant documents using inverse document frequencies
  • Estimate similarity of pairs of documents using cosine similarity and Okapi BM25 (a BM25 sketch closes these notes)
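
A minimal sketch of the counting step, assuming a naive split-and-strip tokenizer and an invented example sentence (illustrative assumptions, not the book’s exact code):

```python
from collections import Counter

# Bag-of-words sketch: the sentence and the naive tokenizer below are
# illustrative assumptions, not a production pipeline.
sentence = "The faster Harry got to the store, the faster Harry would get home."
tokens = [tok.strip(".,").lower() for tok in sentence.split()]
bag = Counter(tokens)
print(bag.most_common(3))  # [('the', 3), ('faster', 2), ('harry', 2)]
```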

3.1 Bag of Words

3.2 Vectorizing

3.2.1 Vector Spaces

3.3 Zipf’s Law

3.4 Topic Modeling

3.4.1 Return of Zipf

3.4.2 Relevance ranking

3.4.3 Tools

3.4.4 Alternatives

3.4.5 Okapi BM25

3.4.6 What’s next

  • Web-scale search engines have a TF-IDF term-document matrix hidden under the hood
  • Term frequencies must be weighted by their inverse document frequency so that the most important, most meaningful words bubble to the top (sketched below)
  • Zipf’s law can help predict the frequencies of ALL the things! Words, characters, people, oh my! (sketched below)
  • A row of a TF-IDF matrix can be used as a vector representation of an individual word’s meaning
  • Those rows together form a vector space model of word semantics
  • Euclidean distance between pairs of high-dimensional vectors doesn’t adequately represent their similarity for most NLP applications
  • Cosine similarity (the amount of overlap between two vectors) can be calculated efficiently by multiplying the elements of normalized vectors together and summing those products (sketched below)
  • Cosine distance is the go-to similarity score for most natural language vector representations
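
A minimal TF-IDF sketch, assuming a two-document toy corpus, whitespace tokenization, and the plain log10(N/df) IDF variant (libraries such as scikit-learn use smoothed variants, so their numbers will differ):

```python
import math
from collections import Counter

# TF-IDF sketch over a toy two-document corpus (invented for illustration).
docs = [
    "the faster harry got to the store the faster harry would get home",
    "harry is hairy and faster than most",
]
bags = [Counter(doc.split()) for doc in docs]
vocab = sorted({tok for bag in bags for tok in bag})
N = len(docs)

def tfidf_vector(bag):
    vec = []
    for term in vocab:
        tf = bag[term] / sum(bag.values())      # normalized term frequency
        df = sum(1 for b in bags if term in b)  # document frequency
        idf = math.log10(N / df)                # inverse document frequency
        vec.append(tf * idf)
    return vec

vectors = [tfidf_vector(bag) for bag in bags]
print(dict(zip(vocab, vectors[0])))
```

Each row of `vectors` is one document’s position in the term vector space; a term that appears in every document gets an IDF of 0 and drops out.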
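Zipf’s law says the frequency of an item is roughly inversely proportional to its frequency rank, so the rank-2 word shows up about half as often as the rank-1 word. A tiny sketch with an invented top count:

```python
# Zipf's law sketch: predicted_count(rank) ~ top_count / rank.
# The 100,000 figure is invented for illustration.
top_count = 100_000
for rank in range(1, 6):
    print(rank, round(top_count / rank))  # 100000, 50000, 33333, 25000, 20000
```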
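A sketch of the cosine math from the bullets above; the vectors are made up, and the function is the generic formula rather than any particular library’s API:

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

a = [1.0, 2.0, 0.0]
b = [2.0, 3.0, 1.0]
print(cosine_similarity(a, b))      # similarity in [-1, 1]
print(1 - cosine_similarity(a, b))  # cosine distance
```

For unit-length (normalized) vectors both norms are 1, so the dot product alone gives the similarity, which is the efficiency win the bullet refers to.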
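Finally, a sketch of Okapi BM25 (section 3.4.5), assuming the common defaults k1 = 1.5 and b = 0.75 and the smoothed IDF variant log((N - df + 0.5) / (df + 0.5) + 1); real search engines tune these parameters:

```python
import math
from collections import Counter

# Okapi BM25 sketch over the same toy corpus; the corpus and parameter
# values are illustrative assumptions.
docs = [doc.split() for doc in (
    "the faster harry got to the store the faster harry would get home",
    "harry is hairy and faster than most",
)]
N = len(docs)
avgdl = sum(len(d) for d in docs) / N
bags = [Counter(d) for d in docs]

def bm25(query, doc_idx, k1=1.5, b=0.75):
    bag, dl = bags[doc_idx], len(docs[doc_idx])
    score = 0.0
    for term in query.split():
        df = sum(1 for g in bags if term in g)
        if df == 0:
            continue  # unseen query term contributes nothing
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)
        tf = bag[term]
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl))
    return score

for i in range(N):
    print(i, bm25("hairy harry", i))
```

BM25 saturates term frequency via k1 and normalizes for document length via b, which is why it often ranks search results better in practice than raw TF-IDF cosine scores.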
