2012-09-12 33 views
0

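MySQL: getting duplicates from SELECT DISTINCT + UNION

When I run the following query in MySQL, I get a lot of duplicates. I believe I have made it clear enough that I only want distinct records, so I don't understand why the rows come back doubled. The duplicates seem to appear as soon as I include the last union (the importorders table), because most customers have the same address in both customers and orders. Can anyone help me understand why this happens?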

SELECT DISTINCT PostalCode, City, Region, Country 
FROM 
(select distinct postalcode, city, region, country 
from importemployees 
UNION 
select distinct postalcode, city, region, country 
from importcustomers 
UNION 
select distinct postalcode, city, region, country 
from importproducts 
UNION 
select distinct shippostalcode as postalcode, shipcity as city, shipregion as region, shipcountry as country 
from importorders) T 

Query and result

As you can see, some rows are duplicated.

If I use INSERT IGNORE to insert importcustomers first and then importorders, it does manage to identify the records as duplicates. Why doesn't the SELECT query work?

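What do you mean by duplicates? Columns or whole rows?

Case sensitivity?

I mean duplicate rows. The values in the rows look exactly the same. Could, for example, different character sets/collations in the tables (or from the CSV import) trigger this?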

Answer

2

A very curious problem. Dropping Country from the select list seems to resolve it.

SELECT DISTINCT PostalCode, City, Region 

128 total, query took 0.0066 sec

SELECT DISTINCT PostalCode, City, Region, Country 

209 total, query took 0.0002 sec

Also, the behavior seems to affect only ImportCustomers and ImportOrders:

SELECT postalcode, city, region, country 
FROM 
    (SELECT postalcode, city, region, country FROM importcustomers 
    UNION 
    SELECT shippostalcode, shipcity, shipregion, shipcountry FROM importorders) t 

172 total, query took 0.0053 sec

SELECT postalcode 
FROM 
    (SELECT postalcode FROM importcustomers 
    UNION 
    SELECT shippostalcode FROM importorders) t 

91 total, query took 0.0050 sec

I then narrowed it down to the country column on importcustomers and importorders:

SELECT TRIM(country) AS country FROM importcustomers 
UNION 
SELECT TRIM(shipcountry) AS country FROM importorders 
Argentina 
Argentina 
Austria 
Austria 
Belgium 
Belgium 
...

Something interesting happens when I cast the column to BINARY:

SELECT BINARY country AS country FROM importcustomers 
UNION 
SELECT BINARY shipcountry AS country FROM importorders 
Argentina 
417267656e74696e610d 
Austria 
417573747269610d 
Belgium 
42656c6769756d0d 
...

ImportOrders is causing the duplicates:

SELECT BINARY shipcountry AS country FROM importorders 
4765726d616e790d 
5553410d 
5553410d 
4765726d616e790d 
...

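Looking at the dump you provided, there is an extra \r (represented by the trailing 0d in the hex values) appended to the end of each country value.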

-- 
-- Dumping data for table `importorders` 
-- 
INSERT INTO `importorders` VALUES 
...'Germany\r'), 
...'USA\r'), 
...'USA\r'), 
...'Germany\r'), 
...'Mexico\r'), 

The country values in importcustomers look fine:

-- 
-- Dumping data for table `importcustomers` 
-- 
INSERT INTO `importcustomers` VALUES 
...'Germany', ... , 
...'Mexico', ... , 
...'Mexico', ... , 
...'UK', ... , 
...'Sweden', ... ,
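
Before changing any data, it can help to confirm which values carry the stray byte. The following is a minimal sketch, assuming the column uses a single-byte or UTF-8 character set (so a final 0D byte can only be a carriage return). The aliases hex_value and row_count are illustrative names, not part of the original schema.

```sql
-- List ShipCountry values whose last byte is 0x0D (carriage return).
-- HEX() returns uppercase hex, two digits per byte, so a trailing
-- carriage return shows up as a final '0D'.
SELECT shipcountry,
       HEX(shipcountry) AS hex_value,
       COUNT(*)         AS row_count
FROM importorders
WHERE HEX(shipcountry) LIKE '%0D'
GROUP BY shipcountry, HEX(shipcountry);
```

If this returns no rows after the cleanup, the column should no longer contain trailing carriage returns.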

You can remove these \r characters (carriage returns) by running this query:

UPDATE importorders SET ShipCountry = REPLACE(ShipCountry, '\r', '') 

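After that, running your original query returns the result set you want. FYI, you don't need DISTINCT when you use UNION, because UNION already removes duplicate rows: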

SELECT PostalCode, City, Region, Country 
FROM 
    (SELECT postalcode, city, region, country FROM importemployees 
    UNION 
    SELECT postalcode, city, region, country FROM importcustomers 
    UNION 
    SELECT postalcode, city, region, country FROM importproducts 
    UNION 
    SELECT shippostalcode as postalcode, shipcity as city, 
     shipregion as region, shipcountry as country FROM importorders) T 
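
If modifying the table is not an option, the carriage return can instead be stripped at query time. This is a hedged sketch rather than part of the original answer; it assumes the default SQL mode, in which '\r' inside a string literal is read as a carriage return (that is, NO_BACKSLASH_ESCAPES is not enabled).

```sql
-- Read-only alternative: strip trailing carriage returns inside the query.
-- TRIM(TRAILING '\r' FROM col) removes carriage returns at the end of the
-- value; plain TRIM(col) removes spaces only, which is why the TRIM(country)
-- attempt earlier still returned duplicates.
SELECT postalcode, city, region, country FROM importemployees
UNION
SELECT postalcode, city, region, country FROM importcustomers
UNION
SELECT postalcode, city, region, country FROM importproducts
UNION
SELECT shippostalcode, shipcity, shipregion,
       TRIM(TRAILING '\r' FROM shipcountry)
FROM importorders;
```

The column names of a UNION result come from the first SELECT, so the last branch needs no aliases.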
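
Wow. Nice catch! I noticed the \r in the SQL dump, but since I'm new to SQL it didn't raise any alarms. What character is \r, given that it doesn't show up in the GUI? Thanks a lot for the "tutorial". Another trick learned :) Edit: testing it now (removing all \r from the SQL dump).

@Graimer '\r' is a carriage return, which creates a new line. You can read more about it [here](http://blog.sqlauthority.com/2009/07/01/sql-server-difference-between-line-feed-n-and-carriage-return-rt-sql-new-line-char/). You can also run a query to remove the '\r'. See my updated answer. – Kermit

You, sir, are a genius! :D I was stuck on this all day. Now I can delete my long INSERT IGNORE (and LEFT JOIN ... WHERE ... IS NULL) scripts :D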