2013-04-04 58 views
1

SQL query for a large table

I need help finding similar values in a SQL database. The table structure is:

    id | item_id_nm  | height | width | length | weight
    ---------------------------------------------------
    1  | 00000000001 | 1.0    | 1.0   | 1.0    | 1.0
    2  | 00000000001 | 1.1    | 1.0   | 0.9    | 1.1
    3  | 00000000001 | 2.0    | 1.0   | 1.0    | 1.0
    4  | 00000000002 | 1.0    | 1.0   | 1.0    | 1.0
    5  | 00000000002 | 1.0    | 1.1   | 1.1    | 1.0
    6  | 00000000002 | 1.0    | 1.0   | 1.0    | 2.0

The id column obviously cannot contain duplicates. item_id_nm can contain duplicates, and a value can in fact appear more than twice.

How do I build a SQL query that finds duplicate item_id_nm values, but only when the height, width, length, or weight values differ by more than 30%?

I know it has to go through the whole table, but how do I do the check? Thanks for your help.

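Edit: adding an example of the 30% difference. The row with id = 3 has a height of 2.0, roughly 200% of the 1.0 (or 1.1) height of ids 1 and 2. Sorry for not being clear: any of height, width, length, or weight may differ by 30%. If any one of them differs by 30%, the row is treated as a duplicate of the others.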
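The 30% rule described above can be written as a small check. This is only a sketch: the helper name `differs_by_more_than` is hypothetical, and measuring the difference relative to the smaller of the two values is an assumption, since the question does not say which value is the baseline.

```python
def differs_by_more_than(a: float, b: float, threshold: float = 0.30) -> bool:
    """Return True when a and b differ by more than `threshold`.

    Assumption: the difference is measured relative to the smaller value,
    and both values are positive (a zero value would divide by zero).
    """
    smaller = min(a, b)
    return abs(a - b) / smaller > threshold


# Heights from the sample table for item 00000000001.
print(differs_by_more_than(1.0, 2.0))  # id 1 vs id 3 -> True
print(differs_by_more_than(1.0, 1.1))  # id 1 vs id 2 -> False
```

With this definition the result does not depend on the order of the two arguments, which is not true of every formulation of the check.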

+0

You can do this with a simple join. If you want the duplicate count, you should use GROUP BY item_id_nm – bksi 2013-04-04 21:26:48

+1

Different from what? The average of each column? – Wolf 2013-04-04 21:32:50

+0

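Please give some sample rows that have a 30% difference so it is clear what you want. You need to provide more detail in your question to get an accurate answer. – 2013-04-04 22:30:10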

Answers

3

This should give you the rows that differ from the average by more than 30%:

SELECT t1.* 
FROM tbl t1 
INNER JOIN (
    SELECT 
     item_id_nm, 
     AVG(width) awidth, AVG(height) aheight, 
     AVG(length) alength, AVG(weight) aweight 
    FROM tbl 
    GROUP BY item_id_nm) t2 
USING (item_id_nm) 
WHERE 
    width > awidth * 1.3 OR width < awidth * 0.7 
    OR height > aheight * 1.3 OR height < aheight * 0.7 
    OR length > alength * 1.3 OR length < alength * 0.7 
    OR weight > aweight * 1.3 OR weight < aweight * 0.7 
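To see what the average-based query returns on the sample rows from the question, here is a minimal sketch that runs it in an in-memory SQLite database through Python's `sqlite3` module. Using SQLite and the column types in the `CREATE TABLE` statement are assumptions; the question does not name a database engine.

```python
import sqlite3

# Sample rows from the question: id, item_id_nm, height, width, length, weight.
rows = [
    (1, "00000000001", 1.0, 1.0, 1.0, 1.0),
    (2, "00000000001", 1.1, 1.0, 0.9, 1.1),
    (3, "00000000001", 2.0, 1.0, 1.0, 1.0),
    (4, "00000000002", 1.0, 1.0, 1.0, 1.0),
    (5, "00000000002", 1.0, 1.1, 1.1, 1.0),
    (6, "00000000002", 1.0, 1.0, 1.0, 2.0),
]

conn = sqlite3.connect(":memory:")
# Column types are assumed; item_id_nm is TEXT to keep the leading zeros.
conn.execute(
    "CREATE TABLE tbl (id INTEGER PRIMARY KEY, item_id_nm TEXT, "
    "height REAL, width REAL, length REAL, weight REAL)"
)
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?, ?, ?)", rows)

# Compare every row with the per-item averages.
query = """
SELECT t1.*
FROM tbl t1
INNER JOIN (
    SELECT item_id_nm,
           AVG(width) awidth, AVG(height) aheight,
           AVG(length) alength, AVG(weight) aweight
    FROM tbl
    GROUP BY item_id_nm) t2
USING (item_id_nm)
WHERE width > awidth * 1.3 OR width < awidth * 0.7
   OR height > aheight * 1.3 OR height < aheight * 0.7
   OR length > alength * 1.3 OR length < alength * 0.7
   OR weight > aweight * 1.3 OR weight < aweight * 0.7
"""
ids = sorted(row[0] for row in conn.execute(query))
print(ids)  # [3, 6]
```

Only the outlier rows (ids 3 and 6) come back. The other rows in each group stay within 30% of the group average, even though the outlier itself pulls that average up.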

This one should give you the pairs of rows that differ by more than 30%:

SELECT t1.*,t2.* 
FROM tbl t1 
INNER JOIN tbl t2 
USING (item_id_nm) 
WHERE 
    (t1.width > t2.width * 1.3 OR t1.width < t2.width * 0.7) 
    OR (t1.height > t2.height * 1.3 OR t1.height < t2.height * 0.7) 
    OR (t1.length > t2.length * 1.3 OR t1.length < t2.length * 0.7) 
    OR (t1.weight > t2.weight * 1.3 OR t1.weight < t2.weight * 0.7) 
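The pairwise self-join can be tried the same way. The sketch below again assumes SQLite via Python's `sqlite3` and an assumed table definition. It writes the column as `t2.width` throughout and selects only the two ids instead of `t1.*, t2.*` to keep the output short.

```python
import sqlite3

# Sample rows from the question: id, item_id_nm, height, width, length, weight.
rows = [
    (1, "00000000001", 1.0, 1.0, 1.0, 1.0),
    (2, "00000000001", 1.1, 1.0, 0.9, 1.1),
    (3, "00000000001", 2.0, 1.0, 1.0, 1.0),
    (4, "00000000002", 1.0, 1.0, 1.0, 1.0),
    (5, "00000000002", 1.0, 1.1, 1.1, 1.0),
    (6, "00000000002", 1.0, 1.0, 1.0, 2.0),
]

conn = sqlite3.connect(":memory:")
# Column types are assumed; the question does not give the schema.
conn.execute(
    "CREATE TABLE tbl (id INTEGER PRIMARY KEY, item_id_nm TEXT, "
    "height REAL, width REAL, length REAL, weight REAL)"
)
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?, ?, ?)", rows)

# Join the table to itself on item_id_nm and compare each pair of rows.
query = """
SELECT t1.id, t2.id
FROM tbl t1
INNER JOIN tbl t2
USING (item_id_nm)
WHERE (t1.width > t2.width * 1.3 OR t1.width < t2.width * 0.7)
   OR (t1.height > t2.height * 1.3 OR t1.height < t2.height * 0.7)
   OR (t1.length > t2.length * 1.3 OR t1.length < t2.length * 0.7)
   OR (t1.weight > t2.weight * 1.3 OR t1.weight < t2.weight * 0.7)
"""
pairs = sorted(conn.execute(query).fetchall())
print(pairs)
# [(1, 3), (2, 3), (3, 1), (3, 2), (4, 6), (5, 6), (6, 4), (6, 5)]
```

Each matching pair appears twice, once in each order, because the join has no condition such as `t1.id < t2.id`. Adding one would halve the output.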
2

I think you could use something like this:

SELECT item_id_nm 
FROM yourtable 
GROUP BY item_id_nm 
HAVING 
    MIN(height)*1.3 < MAX(height) OR 
    MIN(width)*1.3 < MAX(width) OR 
    MIN(length)*1.3 < MAX(length) OR 
    MIN(weight)*1.3 < MAX(weight) 
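A quick way to check the MIN/MAX approach is to run it on the sample rows plus a few extra ones. In this sketch, SQLite via Python's `sqlite3` is an assumption, and rows 7 to 9 are hypothetical additions (a single-row item and an item whose rows differ by only 20%) that are not part of the question's data.

```python
import sqlite3

rows = [
    # Sample rows from the question.
    (1, "00000000001", 1.0, 1.0, 1.0, 1.0),
    (2, "00000000001", 1.1, 1.0, 0.9, 1.1),
    (3, "00000000001", 2.0, 1.0, 1.0, 1.0),
    (4, "00000000002", 1.0, 1.0, 1.0, 1.0),
    (5, "00000000002", 1.0, 1.1, 1.1, 1.0),
    (6, "00000000002", 1.0, 1.0, 1.0, 2.0),
    # Hypothetical rows added for illustration only.
    (7, "00000000003", 1.0, 1.0, 1.0, 1.0),  # item with a single row
    (8, "00000000004", 1.0, 1.0, 1.0, 1.0),
    (9, "00000000004", 1.2, 1.0, 1.0, 1.0),  # only 20% taller than row 8
]

conn = sqlite3.connect(":memory:")
# Column types are assumed; the question does not give the schema.
conn.execute(
    "CREATE TABLE yourtable (id INTEGER PRIMARY KEY, item_id_nm TEXT, "
    "height REAL, width REAL, length REAL, weight REAL)"
)
conn.executemany("INSERT INTO yourtable VALUES (?, ?, ?, ?, ?, ?)", rows)

# Flag an item when the largest value of any column exceeds
# the smallest value of that column by more than 30%.
query = """
SELECT item_id_nm
FROM yourtable
GROUP BY item_id_nm
HAVING MIN(height) * 1.3 < MAX(height)
    OR MIN(width) * 1.3 < MAX(width)
    OR MIN(length) * 1.3 < MAX(length)
    OR MIN(weight) * 1.3 < MAX(weight)
"""
items = sorted(row[0] for row in conn.execute(query))
print(items)  # ['00000000001', '00000000002']
```

The single-row item is never reported because its MIN and MAX are equal, and the 20% item stays below the threshold. The query returns item ids only, not the individual rows that differ.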
+1

Don't forget HAVING COUNT(item_id_nm) > 1 – 2013-04-04 21:36:52

+0

@DanLing If COUNT() = 1, then MIN() = MAX(), so MIN() * 1.3 will never be less than MAX() – fthiella 2013-04-04 21:45:48

+0

Edited the question. – user1799107 2013-04-05 01:43:40

2
SELECT 
    * 
FROM 
    TableName 
WHERE 
    (height > 1.3 * width OR height < 0.7 * width) OR 
    (length > 1.3 * width OR length < 0.7 * width) 
GROUP BY 
    item_id_nm 
HAVING 
    COUNT(item_id_nm) > 1 
+0

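It is not clear from the question whether the 30% difference should be between width and height or width and length within the same row, or between the corresponding width, height, or length columns of two duplicate rows. It would be better if the question included examples. – 2013-04-04 22:26:55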

0

I would use:

SELECT s1.id AS id1, s2.id AS id2 
, s1.height AS h1, s2.height as h2 
, s1.width as width1, s2.width as width2 
, s1.length as l1, s2.length as l2 
, s1.weight as weight1, s2.weight as weight2 
FROM stack s1 
INNER JOIN stack s2 
ON s1.item_id_nm = s2.item_id_nm 
WHERE s1.id != s2.id 
AND s1.id < s2.id 
AND (abs(100-((s2.height*100)/s1.height)) > 30 
OR abs(100-((s2.width*100)/s1.width)) > 30 
OR abs(100-((s2.length*100)/s1.length)) > 30 
OR abs(100-((s2.weight*100)/s1.weight)) > 30) 

Tested in PostgreSQL (http://sqlfiddle.com/#!12/e5f25/15). This query does not return the same pair of rows twice.
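The percentage-based join can also be reproduced locally. This sketch runs a trimmed version of the query (ids only) against the question's sample rows in SQLite through Python's `sqlite3`. That is an assumption, since the answer was tested in PostgreSQL, and the table definition is assumed as well.

```python
import sqlite3

# Sample rows from the question: id, item_id_nm, height, width, length, weight.
rows = [
    (1, "00000000001", 1.0, 1.0, 1.0, 1.0),
    (2, "00000000001", 1.1, 1.0, 0.9, 1.1),
    (3, "00000000001", 2.0, 1.0, 1.0, 1.0),
    (4, "00000000002", 1.0, 1.0, 1.0, 1.0),
    (5, "00000000002", 1.0, 1.1, 1.1, 1.0),
    (6, "00000000002", 1.0, 1.0, 1.0, 2.0),
]

conn = sqlite3.connect(":memory:")
# Column types are assumed; REAL columns keep the divisions in floating point.
conn.execute(
    "CREATE TABLE stack (id INTEGER PRIMARY KEY, item_id_nm TEXT, "
    "height REAL, width REAL, length REAL, weight REAL)"
)
conn.executemany("INSERT INTO stack VALUES (?, ?, ?, ?, ?, ?)", rows)

# s1.id < s2.id keeps each pair once; every column of s2 is expressed
# as a percentage of the matching column of s1.
query = """
SELECT s1.id AS id1, s2.id AS id2
FROM stack s1
INNER JOIN stack s2
ON s1.item_id_nm = s2.item_id_nm
WHERE s1.id < s2.id
AND (abs(100 - ((s2.height * 100) / s1.height)) > 30
  OR abs(100 - ((s2.width * 100) / s1.width)) > 30
  OR abs(100 - ((s2.length * 100) / s1.length)) > 30
  OR abs(100 - ((s2.weight * 100) / s1.weight)) > 30)
"""
pairs = sorted(conn.execute(query).fetchall())
print(pairs)  # [(1, 3), (2, 3), (4, 6), (5, 6)]
```

The percentage is always computed relative to the row with the lower id, so the outcome can depend on row order for values near the threshold: 1.0 against 1.4 is a 40% difference, while 1.4 against 1.0 is about 28.6%. A zero in any `s1` column would also make the division fail or yield NULL, depending on the database.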