QUESTION:

Doc1:(book, book, music, video, video)

Doc2:(music, music, video)

Doc3:(book, book, video)

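First represent each document as a Boolean vector (1 if a term occurs in the document, 0 otherwise) and calculate the cosine similarity between each pair of documents. Then repeat the calculation using a word-frequency representation (the number of times each term occurs). You can calculate the values by hand, or revise the sample code to do the calculation in Python.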

In [2]:
from numpy import dot
from numpy.linalg import norm
X = [1,2]
Y = [2,2]
cos_sim = dot(X,Y) / (norm(X)*norm(Y))
print(cos_sim)
0.9486832980505138
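As a check on the sample output above, the same value can be worked out by hand from the definition of cosine similarity. This is only the arithmetic for the sample vectors X = [1,2] and Y = [2,2]:

```latex
\[
\cos(X, Y)
  = \frac{X \cdot Y}{\lVert X \rVert \, \lVert Y \rVert}
  = \frac{1 \cdot 2 + 2 \cdot 2}{\sqrt{1^2 + 2^2}\,\sqrt{2^2 + 2^2}}
  = \frac{6}{\sqrt{5}\,\sqrt{8}}
  = \frac{6}{\sqrt{40}}
  \approx 0.9487
\]
```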
In [7]:
from numpy import dot
from numpy.linalg import norm

# BOOK is dimension 1
# MUSIC is dimension 2
# VIDEO is dimension 3

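# The attempt below raises an error: numpy's dot() takes two input arrays,
# and a third positional argument is treated as the `out` array.
# Cosine similarity is also defined for a pair of vectors, so each pair of
# documents needs its own calculation.
# D1 = [2,1,2]
# D2 = [0,2,1]
# D3 = [2,0,1]
# cos_sim = dot(D1,D2,D3) / (norm(D1)*norm(D2)*norm(D3))
# print(cos_sim)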

# Comparing two documents at a time works (here Doc1 and Doc2, word frequencies)
D1 = [2,1,2]
D2 = [0,2,1]
# D3 = [2,0,1]
cos_sim = dot(D1,D2) / (norm(D1)*norm(D2))
print(cos_sim)
0.5962847939999439
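
The cell above covers only Doc1 and Doc2 under the word-frequency representation. A minimal sketch of one way to cover every pair under both representations follows. The helper names (`frequency_vector`, `boolean_vector`, `cosine`, `pairwise_cosine`) are made up for illustration, and the dimension order BOOK, MUSIC, VIDEO is assumed to match the comments in the cell above.

```python
from itertools import combinations

from numpy import dot
from numpy.linalg import norm

# Documents from the question, as lists of terms.
docs = {
    "Doc1": ["book", "book", "music", "video", "video"],
    "Doc2": ["music", "music", "video"],
    "Doc3": ["book", "book", "video"],
}

# Fixed dimension order (assumed): BOOK, MUSIC, VIDEO.
vocab = ["book", "music", "video"]


def frequency_vector(terms):
    """Count how often each vocabulary term occurs in the document."""
    return [terms.count(term) for term in vocab]


def boolean_vector(terms):
    """1 if the vocabulary term occurs in the document, else 0."""
    return [1 if term in terms else 0 for term in vocab]


def cosine(x, y):
    """Cosine similarity of two vectors (assumes neither is all zeros)."""
    return float(dot(x, y) / (norm(x) * norm(y)))


def pairwise_cosine(vectorize):
    """Cosine similarity for every pair of documents under one representation."""
    vectors = {name: vectorize(terms) for name, terms in docs.items()}
    return {
        (a, b): cosine(vectors[a], vectors[b])
        for a, b in combinations(vectors, 2)
    }


boolean_sims = pairwise_cosine(boolean_vector)
frequency_sims = pairwise_cosine(frequency_vector)

for label, sims in [("boolean", boolean_sims), ("frequency", frequency_sims)]:
    for (a, b), value in sims.items():
        print(f"{label:9s} {a}-{b}: {value:.4f}")
```

With these vectors the Boolean similarities should come out to roughly 0.8165 (Doc1-Doc2), 0.8165 (Doc1-Doc3), and 0.5 (Doc2-Doc3), and the word-frequency similarities to roughly 0.5963, 0.8944, and 0.2. The Doc1-Doc2 frequency value matches the output printed above.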