IST736 WK 6 – Async

1 minute read

WK 6 scikit-learn

6.1 Readings

NONE

6.2 Data Preparation

VIDEO: 13 m

NOTES:

  • Big X is training CONTENT
  • Little y is training LABEL

RESPONSE: In the sample script block Exercise A, follow the instructions to write code and test it. Post your code here.

6.3 Vectorization

VIDEO: 11 min

NOTES:

  • TFIDF
  • CountVectorizer – encoding can be latin-1 or utf-8 or anything, depends on dataset

QUESTION: Follow the instruction in the sample script block Exercise B. Write and test code for boolean vectorization. Post your code here.

RESPONSE:

This kept giving me this error

NotFittedError: CountVectorizer - Vocabulary wasn’t fitted.

test_bool = unigram_bool_vectorizer.transform(X_test)

train_bool = unigram_bool_vectorizer.transform(X_train)

So I did this

import sklearn vocabulary_to_load = X_test loaded_vectorizer = sklearn.feature_extraction.text.CountVectorizer(ngram_range=(1, 2), binary = True, min_df=1, vocabulary=vocabulary_to_load) loaded_vectorizer._validate_vocabulary() print(‘loaded_vectorizer.get_feature_names(): {0}’. format(loaded_vectorizer.get_feature_names()))

6.4 Training and Explaining MultinomialNB Model

VIDEO: 7 min

RESPONSE: Follow the instruction in the sample script block Exercise C. Write and test code. Post your code here.

6.5 Use Model for Prediction and Explain Prediction Result

VIDEO: 10 min

QUESTION: Follow instructions in the sample script block Exercise D. Write and test code. Post your code and your error analysis comments here.

RESPONSE:

err_cnt = 0
for i in range(0, len(y_test)):
    if(y_test[i]==1 and y_pred[i]==4):
        print(X_test[i])
        err_cnt = err_cnt+1
print("errors:", err_cnt)

Negation seems to be a good next step for this one, too.

6.6 Kaggle Submission

VIDEO: 5 min

RESPONSE: Follow the instruction in the sample script block Exercise E. Report your result here.

6.7 Compare BernoulliNB and MultinomialNB With Cross Validation

VIDEO: 5 min

RESPONSE: Follow the instructions in the sample script block Exercise F. Post your code and result here.

Updated: