TF-IDF with cosine similarity for almost identical sentences

To measure document similarity, I am using TF-IDF with cosine similarity on product descriptions.

Input string:

3/4x1/2x3/4 blk mi tee

Below are the sentences among which I need to find the ones most similar to the input string:

    smith-cooper® 33rt1 reducing pipe tee 3/4 x 1/2 x 3/4 in npt 150 lb malleable iron black
    smith-cooper® 33rt1 reducing pipe tee 1 x 1/2 x 3/4 in npt 150 lb malleable iron black
    smith-cooper® 33rt1 reducing pipe tee 1-1/4 x 1 x 3/4 in npt 150 lb malleable iron black
    smith-cooper® 33rt1 reducing pipe tee 1-1/2 x 3/4 x 1-1/2 in npt 150 lb malleable iron black
    smith-cooper® 33rt1 reducing pipe tee 1-1/2 x 1-1/4 x 1 in npt 150 lb malleable iron black
    smith-cooper® 33rt1 reducing pipe tee 2 x 2 x 3/4 in npt 150 lb malleable iron black
    smith-cooper® 33rt1 reducing pipe tee 2 x 1-1/2 x 1-1/4 in npt 150 lb malleable iron black
    smith-cooper® 33rt1 reducing pipe tee 2-1/2 x 2 x 2 in npt 150 lb malleable iron black
    smith-cooper® 33rt1 reducing pipe tee 3 x 3 x 2 in npt 150 lb malleable iron black

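Since the sentences are almost identical, I went with TF-IDF: the IDF term gives a low weight to words that occur in every document and a higher weight to distinctive words, which makes it easier to pick out the similar documents.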
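For concreteness, here is a minimal pure-Python sketch of TF-IDF weighting followed by cosine similarity, run on the strings above. It is not the code from my project. The tokenizer regex, the smoothed IDF formula, and the helper names (`tokenize`, `tfidf_vector`, `cosine`, `rank`) are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: size tokens such as "3/4" or "1-1/4" are kept whole,
    # and everything else is split into runs of letters.
    return re.findall(r"\d+(?:-\d+)?(?:/\d+)?|[a-z]+", text.lower())


def tfidf_vector(tokens, idf, default_idf):
    # Raw term count times IDF; terms never seen in the corpus get default_idf.
    counts = Counter(tokens)
    return {t: c * idf.get(t, default_idf) for t, c in counts.items()}


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, candidates):
    docs = [tokenize(c) for c in candidates]
    n = len(docs)
    # Document frequency: in how many candidates each term appears.
    df = Counter(t for d in docs for t in set(d))
    # Assumption: a smoothed IDF, ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    default_idf = math.log(1 + n) + 1
    q = tfidf_vector(tokenize(query), idf, default_idf)
    scores = [(i, cosine(q, tfidf_vector(d, idf, default_idf)))
              for i, d in enumerate(docs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


sizes = [
    "3/4 x 1/2 x 3/4", "1 x 1/2 x 3/4", "1-1/4 x 1 x 3/4",
    "1-1/2 x 3/4 x 1-1/2", "1-1/2 x 1-1/4 x 1", "2 x 2 x 3/4",
    "2 x 1-1/2 x 1-1/4", "2-1/2 x 2 x 2", "3 x 3 x 2",
]
candidates = [
    f"smith-cooper® 33rt1 reducing pipe tee {s} in npt 150 lb malleable iron black"
    for s in sizes
]
query = "3/4x1/2x3/4 blk mi tee"

ranked = rank(query, candidates)
for index, score in ranked[:3]:
    print(f"{score:.3f}  {candidates[index]}")
```

With this tokenization, the abbreviations "blk" and "mi" share no token with "black" or "malleable iron", so only the size tokens, "x", and "tee" contribute to the score.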

Is there a better approach than this?

Source

2017-10-19 Ranjana Girish