How can I find similar sentences or phrases in R? For example, I have billions of phrases and want to cluster the similar ones together.
strings.to.cluster <- c(
  "Best Toyota dealer in bay area. Drive out with a new car today",
  "Largest Selection of Furniture. Stock updated everyday",
  "Unique selection of Handcrafted Jewelry",
  "Free Shipping for orders above $60. Offer Expires soon",
  "XXXX is where smart men buy anniversary gifts",
  "2012 Camrys on Sale. 0% APR for select customers",
  "Closing Sale on office desks. All Items must go"
)
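Suppose this vector has hundreds of thousands of elements. Is there an R package that groups these phrases by meaning? Alternatively, could someone suggest a way to rank phrases by how similar in meaning they are to a given phrase?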
How do you propose to define "meaning"? Which of the example phrases should be clustered together? – tripleee
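One possible way to make "meaning" concrete, offered only as a rough sketch, is to approximate it by word overlap: build TF-IDF vectors for the phrases, compare them by cosine similarity, and cluster hierarchically. This is a lexical measure, not a semantic one, so "Toyota" and "Camrys" would not be linked unless they share other words. The sketch uses base R only; the tokenizer, the choice of `k = 3`, and the average-linkage method are assumptions made for illustration, not something the question specifies.

```r
# Sketch: lexical (bag-of-words) similarity as a stand-in for "meaning".
strings.to.cluster <- c(
  "Best Toyota dealer in bay area. Drive out with a new car today",
  "Largest Selection of Furniture. Stock updated everyday",
  "Unique selection of Handcrafted Jewelry",
  "Free Shipping for orders above $60. Offer Expires soon",
  "XXXX is where smart men buy anniversary gifts",
  "2012 Camrys on Sale. 0% APR for select customers",
  "Closing Sale on office desks. All Items must go"
)

# Lowercase, split on anything that is not a letter or digit,
# and drop empty tokens.
tokens <- strsplit(tolower(strings.to.cluster), "[^a-z0-9]+")
tokens <- lapply(tokens, function(x) x[nzchar(x)])
vocab  <- sort(unique(unlist(tokens)))

# Term-frequency matrix: one row per phrase, one column per word.
tf <- t(vapply(
  tokens,
  function(x) as.numeric(table(factor(x, levels = vocab))),
  numeric(length(vocab))
))
colnames(tf) <- vocab

# Inverse document frequency down-weights words found in many phrases.
idf   <- log(nrow(tf) / colSums(tf > 0))
tfidf <- sweep(tf, 2, idf, `*`)

# Cosine similarity between every pair of phrases.
norms   <- sqrt(rowSums(tfidf^2))
cos_sim <- (tfidf %*% t(tfidf)) / (norms %o% norms)

# Hierarchical clustering on cosine distance (assumed: average linkage, k = 3).
hc       <- hclust(as.dist(1 - cos_sim), method = "average")
clusters <- cutree(hc, k = 3)

# Rank the other phrases by similarity to the first one.
ranking <- order(cos_sim[1, ], decreasing = TRUE)

print(split(strings.to.cluster, clusters))
```

Note that this dense approach needs a full pairwise similarity matrix, so it would not scale to hundreds of thousands of phrases as written. At that size a sparse term matrix and a clustering method that avoids all pairwise distances would likely be needed, and capturing actual meaning rather than shared words would require something like word embeddings or topic models.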