Slow MySQL query is filling up my disk space

This is the query I am currently running (28 hours and counting!):

drop table if exists temp_codes; 
create temporary table temp_codes 
    select distinct CODE from Table1; 
alter table temp_codes 
    add primary key (CODE); 

drop table if exists temp_ids; 
create temporary table temp_ids 
    select distinct ID from Table1; 
alter table temp_ids 
    add primary key (ID); 

drop table if exists temp_ids_codes; 
create temporary table temp_ids_codes 
    select ID, CODE 
    from temp_ids, temp_codes; 

alter table temp_ids_codes 
    add index idx_id(ID), 
    add index idx_code(CODE); 

insert into Table2(ID,CODE,cnt) 
select 
    a.ID, a.CODE, coalesce(count(t1.ID), 0) 
from 
    temp_ids_codes as a 
    left join Table1 as t1 on (a.ID = t1.ID and a.CODE=t1.CODE) 
group by 
    a.ID, a.CODE;

My table looks like this (Table1):

ID   CODE 
----------------- 
0001  345 
0001  345 
0001  120 
0002  567 
0002  034 
0002  567 
0003  567 
0004  533 
0004  008 
...... 
(millions of rows)

I am running the query above to produce this (Table2):

ID CODE CNT 
1 008  0 
1 034  0 
1 120  1 
1 345  2 
1 533  0 
1 567  0 
2 008  0 
2 034  1 
...

CNT is the count of each code for each ID. What is the best way to do this so that it performs better and does not use up disk space? Thanks

Source

2013-08-06 user2578185

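Are you sure there are only 6 codes? I suspect the cross join produces far more data than you think. –

No, I have thousands of codes... this is just a sample – user2578185

Start by running the query with LIMIT 1000 and see what is wrong with the result – jaczes

You are multiplying millions of IDs by thousands of codes and wondering why you are running out of disk space. You are generating billions of rows. That is going to take a long time.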
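
To get a feel for the scale before running the cross join, a quick estimate along these lines may help. This is only a sketch: it assumes the Table1 columns from the question, and the column aliases are made up for illustration.

```sql
-- Rough size of the ID x CODE cross product, using Table1 from the question.
-- The aliases (n_ids, n_codes, n_pairs) are illustrative names only.
select
    count(distinct ID)                        as n_ids,
    count(distinct CODE)                      as n_codes,
    count(distinct ID) * count(distinct CODE) as n_pairs
from Table1;
```

With purely hypothetical figures of 2 million IDs and 3,000 codes, n_pairs would be 6 billion rows, and both temp_ids_codes and Table2 would have to hold that many.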

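I can offer a few suggestions (for when you restart the process, or if you have the resources to run a second attempt in parallel).

First, save the intermediate results in real tables, perhaps in another database ("myTmp"), so that you can monitor progress.

Second, do the aggregation before the joins in the final query. In fact, since you are using temporary tables anyway, put this into a table first: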

select t1.ID, t1.CODE, count(*) as cnt 
from Table1 as t1 
group by t1.ID, t1.CODE;

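As it stands, you are multiplying the original data by including all the extra codes, and only then grouping.

Then join from the full table of ID/CODE combinations to this aggregated table; a left join keeps the combinations whose count is zero.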
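
Put together, the two steps might look roughly like the following. This is a sketch rather than a tested solution: the table name temp_counts is hypothetical, and it assumes temp_ids_codes has already been built as in the question and that ID and CODE contain no NULLs.

```sql
-- Step 1: aggregate the original data once.
-- temp_counts is a hypothetical name, not something from the question.
drop table if exists temp_counts;
create temporary table temp_counts
    select t1.ID, t1.CODE, count(*) as cnt
    from Table1 as t1
    group by t1.ID, t1.CODE;
-- Assumes no NULLs in ID or CODE, as the question's own primary keys do.
alter table temp_counts
    add primary key (ID, CODE);

-- Step 2: left join from every ID/CODE combination to the aggregated counts.
-- No GROUP BY is needed here: each combination matches at most one row.
insert into Table2(ID, CODE, cnt)
select a.ID, a.CODE, coalesce(c.cnt, 0)
from temp_ids_codes as a
    left join temp_counts as c on (a.ID = c.ID and a.CODE = c.CODE);
```

The final statement still writes one row per ID/CODE combination, so Table2 itself stays as large as the cross product. What this saves is the grouping over billions of joined rows.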

Another approach is to put an index on the original table and try this:

insert into Table2(ID,CODE,cnt) 
select a.ID, a.CODE, 
     (select count(*) from Table1 t1 where a.ID = t1.ID and a.CODE=t1.CODE) as cnt 
from temp_ids_codes a 
group by a.ID, a.CODE;

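This may look a bit counterintuitive, but it will use the index on Table1 for the correlated subquery. I don't like playing games like this with SQL, but it might let the query finish in our lifetimes.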
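
The answer does not say which columns to index. Since the correlated subquery filters on both ID and CODE, a composite index over those two columns is one reasonable reading. The sketch below assumes that, and the index name idx_id_code is made up.

```sql
-- Composite index so the correlated subquery can look up (ID, CODE) directly.
-- The index name idx_id_code is hypothetical.
alter table Table1
    add index idx_id_code (ID, CODE);
```

Building the index on a table with millions of rows takes time and disk space of its own, so it is worth creating it before the long-running insert starts rather than while it is running.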

Source

2013-08-06 12:08:52

Where is the WHERE clause here?

create temporary table temp_ids_codes 
select ID, CODE 
from temp_ids, temp_codes;

The table should have a primary key on the columns ID, CODE.
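
In the style of the question's own statements, that suggestion could be sketched as below. It is an assumption that the composite key would replace the two single-column indexes idx_id and idx_code rather than sit alongside them.

```sql
-- Composite primary key on the combinations table from the question.
-- Each (ID, CODE) pair is unique there, because it is a cross join of
-- distinct IDs and distinct CODEs.
alter table temp_ids_codes
    add primary key (ID, CODE);
```

Because ID is the leftmost column, this key can also serve lookups on ID alone.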

Source

2013-08-06 12:03:42 jaczes

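I don't have one... I just want the count of every code for every ID (including the zero counts) – user2578185

Would my query get faster with a PK on those columns? – user2578185

Yes, but @Gordon Linoff gave a better solution - you can add the PK with his solution as well – jaczes

You could try something along the following lines (untested query):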

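select a.ID,
       a.CODE,
       coalesce(b.countvalue, 0) as cnt
from temp_ids_codes as a
-- aggregate Table1 on its own first, then match each combination to its count
left join (select t1.ID, t1.CODE, count(*) as countvalue
           from Table1 as t1
           group by t1.ID, t1.CODE
          ) as b on a.ID = b.ID and a.CODE = b.CODE;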

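Now your GROUP BY only runs over the records that actually need grouping (rather than over all the zero-count records). The right indexes can also make a huge difference.

Source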

2013-08-06 12:11:20 Sam
