How to find similar values in a column in PostgreSQL

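I'm a complete newbie in SQL, so I'm not very familiar with its features.
So here is my problem.
I have a table (let's call it "comp") with more than 100,000 companies, like this: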

 
 id | title                | name
----+----------------------+---------------------
  1 | XYZ                  | xyz
  2 | Smarts               | smarts
  3 | XYZ LTD              | xyzltd
  4 | Outsmarts            | outsmarts
  5 | XYZ Entertainment    | xyzentertainment
  6 | Smarts Entertainment | smartsentertainment

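Here "title" is the company's name, and "name" is the same as the title but lowercased and without spaces. Is there a way to find all companies that have similar titles (using either "title" or "name")? So, basically, I would like to get: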

 
 id | title                | name
----+----------------------+---------------------
  1 | XYZ                  | xyz
  3 | XYZ LTD              | xyzltd
  5 | XYZ Entertainment    | xyzentertainment
  2 | Smarts               | smarts
  6 | Smarts Entertainment | smartsentertainment

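By "similar" I mean:
1) 'XYZ', 'XYZ LTD' and 'XYZ Entertainment'
2) 'Smarts' and 'Smarts Entertainment'
But 'XYZ Entertainment' is not similar to 'Smarts Entertainment', and 'Smarts' is not similar to 'Outsmarts'.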

I tried this, but it didn't work:

SELECT set_limit(0.8); 

SELECT 
    similarity(c1.name, c2.name) AS sim, 
    c1.name, 
    c2.name 
FROM comp AS c1 
    JOIN comp AS c2 
    ON c1.name != c2.name 
     AND c1.name % c2.name 
ORDER BY sim DESC;

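By "didn't work" I mean that after 7 minutes it still hadn't returned any results. I think I've completely messed it up.
Is it even possible to retrieve this kind of similarity?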

Source

2016-11-23 L.Viek

In your example, the similar values are in the same row. Do you also need to find two similar values in different rows? – FDavidov

This is effectively a cross join on a 100k-row table. Expect it to be very slow. But please post the EXPLAIN output. – e4c5

@FDavidov I have updated the question. –
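To make the cross-join point concrete: the query in the question compares every row with every other row, which is on the order of 10^10 pairs for 100,000 rows, and without a trigram index each comparison has to be computed from scratch. The following is a minimal sketch, not a tested solution. It assumes the pg_trgm extension can be installed and that the table has the columns id and name shown in the question; the index name comp_name_trgm_idx is made up for illustration, and 0.3 is simply pg_trgm's default threshold, not a value from this thread.

```sql
-- Assumption: pg_trgm is available; it provides similarity() and the % operator.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Hypothetical index name. A GIN trigram index supports the % operator.
CREATE INDEX comp_name_trgm_idx ON comp USING gin (name gin_trgm_ops);

-- Threshold used by % (configuration parameter in PostgreSQL 9.6 and later;
-- set_limit() is the older, deprecated way to set it).
SET pg_trgm.similarity_threshold = 0.3;

-- Look at the plan first, as requested in the comment above.
EXPLAIN
SELECT similarity(c1.name, c2.name) AS sim, c1.name, c2.name
FROM comp AS c1
JOIN comp AS c2
  ON c1.id < c2.id        -- each pair once, and no row paired with itself
 AND c1.name % c2.name
ORDER BY sim DESC;
```

Whether the planner actually uses the index depends on the data and the PostgreSQL version, so the EXPLAIN output is worth checking before running the query itself. A threshold of 0.8 is quite strict for this data: a short name such as xyz shares only a few trigrams with xyzentertainment, so such pairs would likely be filtered out at that setting.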

You could try the Levenshtein distance function, which gives you the number of edits needed to get from the first argument to the second:

SELECT levenshtein(c1.name, c2.name) AS sim, c1.name, c2.name 
FROM comp AS c1 JOIN comp AS c2 ON c1.name != c2.name ORDER BY sim DESC;
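Two caveats on the query above, offered as a hedged sketch rather than a fix: levenshtein() is provided by the fuzzystrmatch extension, which has to be installed first, and the join still compares every row with every other row, so it is unlikely to finish quickly on 100,000 rows. One way to try the function out is to compare a single company against the rest. The literal 'xyz' and the LIMIT of 20 are only illustrative values taken from or chosen for the sample data.

```sql
-- Assumption: the fuzzystrmatch extension is available; it provides levenshtein().
CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;

SELECT levenshtein(c1.name, c2.name) AS dist, c1.name, c2.name
FROM comp AS c1
JOIN comp AS c2
  ON c1.id <> c2.id        -- do not compare a row with itself
WHERE c1.name = 'xyz'      -- illustrative: one company against all others
ORDER BY dist              -- smallest distance (most similar) first
LIMIT 20;
```

Because Levenshtein returns a distance, ascending order lists the closest names first. Note also that edit distance grows with length differences, so 'xyz' and 'xyzentertainment' are far apart by this measure even though they share a prefix.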

Source

2016-11-23 07:26:05 clemens

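Thanks for your reply, but it failed as well. I ran this and it just put load on my database without returning any results: 'SELECT levenshtein(c1.name, c2.name) AS sim, c1.name, c2.name FROM comp AS c1 JOIN comp AS c2 ON c1.name != c2.name ORDER BY sim DESC;' I might be blind, but where is the error? Maybe I should check 'title' instead of 'name'? –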

Yes, 'title' should be better. I just copied your statement from the first post and modified it. – clemens
