QUESTION:
Doc1:(book, book, music, video, video)
Doc2:(music, music, video)
Doc3:(book, book, video)
Using a boolean representation, calculate the cosine similarity between each pair of documents. Then repeat the calculation using a word-frequency representation. You can calculate the values manually, or revise the sample code below to do the calculation in Python.
from numpy import dot
from numpy.linalg import norm
X = [1,2]
Y = [2,2]
cos_sim = dot(X,Y) / (norm(X)*norm(Y))
print(cos_sim)
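The two representations named in the question can be built directly from the token lists. This is a minimal sketch, assuming the dimension order book, music, video. The names `docs`, `vocab`, `freq_vectors`, and `bool_vectors` are illustrative and are not part of the sample code.

```python
from collections import Counter

# Token lists copied from the question.
docs = {
    "Doc1": ["book", "book", "music", "video", "video"],
    "Doc2": ["music", "music", "video"],
    "Doc3": ["book", "book", "video"],
}

# Assumed dimension order: book, music, video.
vocab = ["book", "music", "video"]

# Word frequency representation: how many times each term occurs.
# Counter returns 0 for terms that never appear in a document.
freq_vectors = {name: [Counter(tokens)[term] for term in vocab]
                for name, tokens in docs.items()}

# Boolean representation: 1 if the term occurs at all, else 0.
bool_vectors = {name: [1 if count > 0 else 0 for count in vec]
                for name, vec in freq_vectors.items()}

print(freq_vectors)
print(bool_vectors)
```

Each vector can then be passed to the cosine similarity formula two documents at a time.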
from numpy import dot
from numpy.linalg import norm
# BOOK is dimension 1
# MUSIC is dimension 2
# VIDEO is dimension 3
# Not sure why this version raises an error:
# D1 = [2,1,2]
# D2 = [0,2,1]
# D3 = [2,0,1]
# cos_sim = dot(D1,D2,D3) / (norm(D1)*norm(D2)*norm(D3))
# print(cos_sim)
# This works, though, for a single pair (D1 and D2):
D1 = [2,1,2]
D2 = [0,2,1]
# D3 = [2,0,1]
cos_sim = dot(D1,D2) / (norm(D1)*norm(D2))
print(cos_sim)
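Cosine similarity compares two vectors at a time, so three documents give three pairwise values rather than one three-way value. `numpy.dot` also treats a third positional argument as an output array (`out`), which is most likely why passing `D3` there fails. Below is a sketch that loops over the pairs for both representations, assuming the same dimension order (book, music, video). The helper `cosine` and the dictionary names are illustrative.

```python
from itertools import combinations

from numpy import dot
from numpy.linalg import norm


def cosine(x, y):
    """Cosine similarity between two vectors."""
    return dot(x, y) / (norm(x) * norm(y))


# Dimension order: book, music, video.
frequency = {"D1": [2, 1, 2], "D2": [0, 2, 1], "D3": [2, 0, 1]}
boolean = {"D1": [1, 1, 1], "D2": [0, 1, 1], "D3": [1, 0, 1]}

for label, vectors in (("boolean", boolean), ("frequency", frequency)):
    # combinations() yields each unordered pair of document names once.
    for a, b in combinations(vectors, 2):
        print(label, a, b, round(cosine(vectors[a], vectors[b]), 4))
```

With these vectors the frequency similarities come out to roughly 0.596 (D1, D2), 0.894 (D1, D3), and 0.2 (D2, D3). The boolean similarities come out to roughly 0.816, 0.816, and 0.5 for the same pairs.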