daily log 11.20.19

less than 1 minute read

SO what do the scores mean?

Precision vs Recall

COUNT VEC

max_df

max_df is used for removing terms that appear too frequently, also known as “corpus-specific stop words”. For example:

max_df = 0.50 means “ignore terms that appear in more than 50% of the documents”. max_df = 25 means “ignore terms that appear in more than 25 documents”. The default max_df is 1.0, which means “ignore terms that appear in more than 100% of the documents”. Thus, the default setting does not ignore any terms.

min_df

min_df is used for removing terms that appear too infrequently. For example:

min_df = 0.01 means “ignore terms that appear in less than 1% of the documents”. min_df = 5 means “ignore terms that appear in less than 5 documents”.

SO on CountVec

Updated: