For document similarity, I am using TF-IDF with cosine similarity, computed over descriptions that are nearly identical sentences.
Input string:
3/4x1/2x3/4 blk mi tee
Below are the sentences among which I need to find the ones similar to the input string:
smith-cooper® 33rt1 reducing pipe tee 3/4 x 1/2 x 3/4 in npt 150 lb malleable iron black
smith-cooper® 33rt1 reducing pipe tee 1 x 1/2 x 3/4 in npt 150 lb malleable iron black
smith-cooper® 33rt1 reducing pipe tee 1-1/4 x 1 x 3/4 in npt 150 lb malleable iron black
smith-cooper® 33rt1 reducing pipe tee 1-1/2 x 3/4 x 1-1/2 in npt 150 lb malleable iron black
smith-cooper® 33rt1 reducing pipe tee 1-1/2 x 1-1/4 x 1 in npt 150 lb malleable iron black
smith-cooper® 33rt1 reducing pipe tee 2 x 2 x 3/4 in npt 150 lb malleable iron black
smith-cooper® 33rt1 reducing pipe tee 2 x 1-1/2 x 1-1/4 in npt 150 lb malleable iron black
smith-cooper® 33rt1 reducing pipe tee 2-1/2 x 2 x 2 in npt 150 lb malleable iron black
smith-cooper® 33rt1 reducing pipe tee 3 x 3 x 2 in npt 150 lb malleable iron black
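Since the sentences are almost identical, I use the TF-IDF approach: it gives a low score to words that occur in every document (the IDF part) and more weight to distinctive words, which makes it easier to find similar documents.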
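As a rough illustration, the approach above can be sketched in plain Python. The tokenizer (which splits the `x` out of compact sizes such as `3/4x1/2`), the smoothed IDF formula, and all function names are assumptions made for this sketch, not details of the setup described above. Only four of the candidate sentences are used to keep it short.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Assumption: separate the "x" in compact sizes ("3/4x1/2" -> "3/4 x 1/2")
    # so that each fraction becomes its own token, then split on whitespace.
    text = re.sub(r"(?<=\d)x(?=\d)", " x ", text.lower())
    return text.split()

def tfidf_vectors(docs, query):
    doc_tokens = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each token occurs.
    df = Counter(t for toks in doc_tokens for t in set(toks))
    # Assumption: smoothed IDF, so a token present in every document gets weight 1.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}

    def vec(tokens):
        # Tokens never seen in the candidate documents are ignored.
        counts = Counter(t for t in tokens if t in idf)
        return {t: c * idf[t] for t, c in counts.items()}

    return [vec(toks) for toks in doc_tokens], vec(tokenize(query))

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

prefix = "smith-cooper® 33rt1 reducing pipe tee "
suffix = " in npt 150 lb malleable iron black"
sizes = ["3/4 x 1/2 x 3/4", "1 x 1/2 x 3/4", "1-1/4 x 1 x 3/4", "2 x 2 x 3/4"]
docs = [prefix + s + suffix for s in sizes]
query = "3/4x1/2x3/4 blk mi tee"

doc_vecs, query_vec = tfidf_vectors(docs, query)
scores = [cosine(query_vec, v) for v in doc_vecs]
best = max(range(len(docs)), key=lambda i: scores[i])
print(docs[best])
```

With this tokenization the first candidate scores highest, since it shares both occurrences of `3/4` as well as `1/2` with the input. The abbreviations `blk` and `mi` contribute nothing, because they never appear in the candidate sentences.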
Is there a better approach than this?