2011-11-29
1

Simulating a FULL JOIN in MySQL with large data sets

I have three tables whose data I need to join on common fields.

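Sample pseudo table definitions:

barometer_log (device int, pressure float, sampleTime timestamp)

temperature_log (device int, temperature float, sampleTime timestamp)

magnitude_log (device int, magnitude float, utcTime timestamp)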

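These tables will eventually contain billions of rows, but currently each holds about 500,000 rows.

I need to combine the data from the tables (FULL JOIN) so that sampleTime is merged into a single column (COALESCE), giving me rows of the form: device, sampleTime, pressure, temperature, magnitude

I need to be able to query the data by specifying a device and a start and end date, e.g. SELECT .... WHERE device = 1000 AND sampleTime BETWEEN '2011-10-11' AND '2011-10-17'

I tried different UNION ALL techniques with RIGHT and LEFT joins, as suggested in "MySql full join (union) and ordering on multiple date columns", but the queries take so long that I either have to stop them or they fail after running for several hours with an error about temporary file size. What is the best way to query the three tables and merge their output in an acceptable time frame?

Below are the full table definitions, as requested. Note: the device table is not included yet.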

magnitude_log

CREATE TABLE magnitude_log (
    device int(11) NOT NULL, 
    magnitude float not NULL, 
    sampleTime timestamp NOT NULL, 
    PRIMARY KEY (device,sampleTime), 
    CONSTRAINT magnitudeLog_device 
    FOREIGN KEY (device) 
     REFERENCES device (id) 
     ON DELETE CASCADE 
) ENGINE=InnoDB DEFAULT CHARSET=utf8; 

barometer_log

CREATE TABLE barometer_log (
    device int(11) NOT NULL, 
    pressure float not NULL, 
    sampleTime timestamp NOT NULL, 
    PRIMARY KEY (device,sampleTime), 
    CONSTRAINT barometerLog_device 
    FOREIGN KEY (device) 
     REFERENCES device (id) 
     ON DELETE CASCADE 
) ENGINE=InnoDB DEFAULT CHARSET=utf8; 

temperature_log

CREATE TABLE temperature_log (
    device int(11) NOT NULL, 
    sampleTime timestamp NOT NULL, 
    temperature float default NULL, 
    PRIMARY KEY (device,sampleTime), 
    CONSTRAINT temperatureLog_device 
    FOREIGN KEY (device) 
     REFERENCES device (id) 
     ON DELETE CASCADE 
) ENGINE=InnoDB DEFAULT CHARSET=utf8; 
+0

Do you have an index on the `device` column (which I guess you use for the join)? –

+0

I have a composite index on device and sampleTime on all three tables – anzaan

+0

Please add the table definitions. Is `device` the primary key or a unique key? Or is `(device, sampleTime)` the PK in each table? –

Answers

1

First, get all (device, sampleTime) combinations for the required time period from all 3 tables:

-------- Q -------- 
    SELECT device, sampleTime 
    FROM magnitude_log 
    WHERE device = 1000 
     AND sampleTime >= '2011-10-11' 
     AND sampleTime < '2011-10-18' 
UNION 
    SELECT device, sampleTime 
    FROM barometer_log 
    WHERE device = 1000 
     AND sampleTime >= '2011-10-11' 
     AND sampleTime < '2011-10-18' 
UNION 
    SELECT device, sampleTime 
    FROM temperature_log 
    WHERE device = 1000 
     AND sampleTime >= '2011-10-11' 
     AND sampleTime < '2011-10-18' 

Then use it to LEFT JOIN the 3 tables:

SELECT 
    q.device 
    , q.sampleTime 
    , b.pressure 
    , t.temperature 
    , m.magnitude 
FROM 
    (Q) AS q 
    LEFT JOIN 
    (SELECT * 
     FROM magnitude_log 
     WHERE device = 1000 
     AND sampleTime >= '2011-10-11' 
     AND sampleTime < '2011-10-18' 
    ) AS m 
     ON (m.device, m.sampleTime) = (q.device, q.sampleTime) 
    LEFT JOIN 
    (SELECT * 
     FROM barometer_log 
     WHERE device = 1000 
     AND sampleTime >= '2011-10-11' 
     AND sampleTime < '2011-10-18' 
    ) AS b 
     ON (b.device, b.sampleTime) = (q.device, q.sampleTime) 
    LEFT JOIN 
    (SELECT * 
     FROM temperature_log 
     WHERE device = 1000 
     AND sampleTime >= '2011-10-11' 
     AND sampleTime < '2011-10-18' 
    ) AS t 
     ON (t.device, t.sampleTime) = (q.device, q.sampleTime) 
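
The UNION-then-LEFT-JOIN pattern above can be tried on a few rows before running it against the real tables. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the sample rows are made up, and the joins go straight to the base tables (the keys coming out of Q are already limited to the date range), so treat it as an illustration of the shape of the query rather than a tuned MySQL statement.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE magnitude_log   (device INT, sampleTime TEXT, magnitude REAL,
                              PRIMARY KEY (device, sampleTime));
CREATE TABLE barometer_log   (device INT, sampleTime TEXT, pressure REAL,
                              PRIMARY KEY (device, sampleTime));
CREATE TABLE temperature_log (device INT, sampleTime TEXT, temperature REAL,
                              PRIMARY KEY (device, sampleTime));

-- Made-up sample rows; the last temperature row lies outside the queried range.
INSERT INTO magnitude_log   VALUES (1000, '2011-10-11 00:00:00', 1.5),
                                   (1000, '2011-10-12 00:00:00', 2.5);
INSERT INTO barometer_log   VALUES (1000, '2011-10-12 00:00:00', 1013.0),
                                   (1000, '2011-10-13 00:00:00', 1009.0);
INSERT INTO temperature_log VALUES (1000, '2011-10-13 00:00:00', 21.0),
                                   (1000, '2011-10-20 00:00:00', 19.0);
""")

# Same half-open range filter as in Q: start <= sampleTime < end.
RANGE = "device = :dev AND sampleTime >= :start AND sampleTime < :end"

sql = f"""
SELECT q.device, q.sampleTime, b.pressure, t.temperature, m.magnitude
FROM (
    SELECT device, sampleTime FROM magnitude_log   WHERE {RANGE}
    UNION
    SELECT device, sampleTime FROM barometer_log   WHERE {RANGE}
    UNION
    SELECT device, sampleTime FROM temperature_log WHERE {RANGE}
) AS q
LEFT JOIN magnitude_log   AS m ON m.device = q.device AND m.sampleTime = q.sampleTime
LEFT JOIN barometer_log   AS b ON b.device = q.device AND b.sampleTime = q.sampleTime
LEFT JOIN temperature_log AS t ON t.device = q.device AND t.sampleTime = q.sampleTime
ORDER BY q.sampleTime
"""

params = {"dev": 1000, "start": "2011-10-11", "end": "2011-10-18"}
rows = conn.execute(sql, params).fetchall()

for row in rows:
    print(row)  # one row per distinct (device, sampleTime); missing readings are None
```

Each distinct timestamp appears exactly once, and a reading that is absent from one of the tables comes back as NULL (None in Python), which is the behaviour a FULL JOIN with COALESCE on sampleTime would give.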

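The longer the period, the more the query will struggle with the UNION subquery. You could consider making Q a separate table that holds the distinct (device, sampleTime) combinations of the three other tables, possibly populated via triggers on them.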
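
A rough sketch of the trigger idea, again using Python's sqlite3 module as a stand-in for MySQL. The table name `sample_keys` and the trigger names are hypothetical, and the trigger syntax differs in MySQL (where `INSERT IGNORE` would play the role of SQLite's `INSERT OR IGNORE`). Deletes from the log tables are not handled here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE magnitude_log   (device INT, sampleTime TEXT, magnitude REAL,
                              PRIMARY KEY (device, sampleTime));
CREATE TABLE barometer_log   (device INT, sampleTime TEXT, pressure REAL,
                              PRIMARY KEY (device, sampleTime));
CREATE TABLE temperature_log (device INT, sampleTime TEXT, temperature REAL,
                              PRIMARY KEY (device, sampleTime));

-- Hypothetical key table that plays the role of Q.
CREATE TABLE sample_keys (device INT, sampleTime TEXT,
                          PRIMARY KEY (device, sampleTime));
""")

# One AFTER INSERT trigger per log table; duplicates are silently skipped
# thanks to the primary key on sample_keys plus INSERT OR IGNORE.
for table in ("magnitude_log", "barometer_log", "temperature_log"):
    conn.execute(f"""
        CREATE TRIGGER {table}_ai AFTER INSERT ON {table}
        BEGIN
            INSERT OR IGNORE INTO sample_keys (device, sampleTime)
            VALUES (NEW.device, NEW.sampleTime);
        END
    """)

# Two readings share a timestamp, the third has its own.
conn.execute("INSERT INTO magnitude_log   VALUES (1000, '2011-10-11 00:00:00', 1.5)")
conn.execute("INSERT INTO barometer_log   VALUES (1000, '2011-10-11 00:00:00', 1013.0)")
conn.execute("INSERT INTO temperature_log VALUES (1000, '2011-10-12 00:00:00', 21.0)")

keys = conn.execute(
    "SELECT device, sampleTime FROM sample_keys ORDER BY sampleTime"
).fetchall()
print(keys)
```

With such a table in place, the UNION subquery could be replaced by a range scan on `sample_keys`, at the cost of extra work on every insert.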

+0

Thanks for the answer. I'll test it and let you know how it goes – anzaan

+0

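This query performed the best in the tests I ran. There is one odd issue, though. I also tested @mikn's answer, and his query returned 73 records while yours returned 72. When I ran a separate query on one of the tables that has the complete data set, it also returned 72 records, which seems to be the correct count. Any idea what might be happening? – anzaan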

0

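Assuming the table device contains all devices, you don't really need a proper full join; you just have to LEFT JOIN the other tables on device and group on the sample time, like this: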

SELECT 
    d.id AS device, 
    COALESCE(m.sampleTime, b.sampleTime, t.sampleTime) AS sampleTime, 
    m.magnitude, 
    b.pressure, 
    t.temperature 
FROM device AS d 
    LEFT JOIN magnitude_log AS m ON d.id = m.device 
    LEFT JOIN barometer_log AS b ON d.id = b.device 
    LEFT JOIN temperature_log AS t ON d.id = t.device 
WHERE d.id = 1000 
GROUP BY device, sampleTime 
HAVING sampleTime BETWEEN '2011-10-11' AND '2011-10-17' 

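However, this may be slow, because it groups before actually filtering on the time span. If a single device won't have millions of rows by itself, that shouldn't be a problem. If it does, I suggest putting the sampleTime condition on each join: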

SELECT 
    d.id AS device, 
    COALESCE(m.sampleTime, b.sampleTime, t.sampleTime) AS sampleTime, 
    m.magnitude, 
    b.pressure, 
    t.temperature 
FROM device AS d 
    LEFT JOIN magnitude_log AS m ON d.id = m.device AND m.sampleTime BETWEEN '2011-10-11' AND '2011-10-17' 
    LEFT JOIN barometer_log AS b ON d.id = b.device AND b.sampleTime BETWEEN '2011-10-11' AND '2011-10-17' 
    LEFT JOIN temperature_log AS t ON d.id = t.device AND t.sampleTime BETWEEN '2011-10-11' AND '2011-10-17' 
WHERE d.id = 1000 
GROUP BY device, sampleTime 
HAVING sampleTime IS NOT NULL 
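
To get a feel for how much work the grouping step has to do, the sketch below counts the rows produced when the log tables are joined on device alone, before any GROUP BY. It uses Python's sqlite3 module as a stand-in for MySQL with made-up data (three readings per table for one device), so the numbers only illustrate the growth pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE device (id INT PRIMARY KEY);
CREATE TABLE magnitude_log   (device INT, sampleTime TEXT, magnitude REAL,
                              PRIMARY KEY (device, sampleTime));
CREATE TABLE barometer_log   (device INT, sampleTime TEXT, pressure REAL,
                              PRIMARY KEY (device, sampleTime));
CREATE TABLE temperature_log (device INT, sampleTime TEXT, temperature REAL,
                              PRIMARY KEY (device, sampleTime));
INSERT INTO device VALUES (1000);
""")

# Made-up data: the same three timestamps in every log table.
times = ["2011-10-11 00:00:00", "2011-10-12 00:00:00", "2011-10-13 00:00:00"]
for ts in times:
    conn.execute("INSERT INTO magnitude_log   VALUES (1000, ?, 1.0)", (ts,))
    conn.execute("INSERT INTO barometer_log   VALUES (1000, ?, 1013.0)", (ts,))
    conn.execute("INSERT INTO temperature_log VALUES (1000, ?, 21.0)", (ts,))

# Rows produced by joining on device only, before any grouping.
joined = conn.execute("""
    SELECT COUNT(*)
    FROM device AS d
        LEFT JOIN magnitude_log   AS m ON d.id = m.device
        LEFT JOIN barometer_log   AS b ON d.id = b.device
        LEFT JOIN temperature_log AS t ON d.id = t.device
    WHERE d.id = 1000
""").fetchone()[0]

print(len(times), joined)  # distinct timestamps vs. intermediate join rows
```

With three rows per table the join yields 3 × 3 × 3 = 27 intermediate rows for only 3 distinct timestamps, so the intermediate result grows with the product of the per-device row counts in the joined range.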

Hope that helps!

+0

Why `IFNULL(x, NULL)`? (How is that different from x?) –

+0

There should be no difference, you're right! I was being a bit paranoid. – mikn

+0

Thanks, I'll give it a try – anzaan

0

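If you are querying a small time range and many devices, you may want to consider reversing the PK index to (timeRange, device).

Then you would probably need a secondary index on device or on (device, timeRange).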

+0

I query the data for one device at a time and extract it by time range – anzaan