2014-04-26 45 views
1

A more efficient alternative to the SQL IN clause

I have this SQL query, and it takes a very long time to finish. How can I speed it up?

t_intra_species_interaction has 60k rows and t_pathway has 100k. uniprot_id_1, uniprot_id_2 and uniprot_id are of type varchar.

In this query, I want to select the rows whose uniprot_id_1 and uniprot_id_2 both exist in t_pathway:

select distinct uniprot_id_1,uniprot_id_2 from t_intra_species_interaction 
where uniprot_id_1 in (select uniprot_id from t_pathway) and 
    uniprot_id_2 in (select uniprot_id from t_pathway) 

In this one, I want to select every uniprot_id that exists in the set of uniprot_ids returned by the first query above.

select distinct uniprot_id,id from t_pathway as t 
where uniprot_id in 
(
    select distinct uniprot_id_2 from t_intra_species_interaction 
    where uniprot_id_1 in (select uniprot_id from t_pathway) and 
     uniprot_id_2 in (select uniprot_id from t_pathway) 
    union 
    select distinct uniprot_id_1 from t_intra_species_interaction 
    where uniprot_id_1 in (select uniprot_id from t_pathway) and 
     uniprot_id_2 in (select uniprot_id from t_pathway) 
) 

Thanks.

+3

Consider providing proper DDL (and/or an sqlfiddle) together with the desired result set – Strawberry

+1

Use EXISTS instead –

+0

This is slow because the subqueries in those 'IN()' clauses are run for every matching row in 't_intra_species_interaction'. – DanMan

Answers

2

The subqueries are identical, so they can be merged into one and moved into a join:

SELECT DISTINCT i.uniprot_id_1, i.uniprot_id_2 
FROM t_intra_species_interaction i 
     INNER JOIN t_pathway p ON p.uniprot_id IN (i.uniprot_id_1, i.uniprot_id_2) 
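
When moving the subqueries into a join, it is worth checking on a small data set whether the result still matches the question's requirement that both ids exist in t_pathway. The sketch below compares a join on `IN (i.uniprot_id_1, i.uniprot_id_2)` with two separate joins. It assumes an in-memory SQLite database driven from Python; the sample rows are made up, and only the table and column names come from the question.

```python
import sqlite3

# Hypothetical sample data; only table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_pathway (id INTEGER, uniprot_id VARCHAR(20));
    CREATE TABLE t_intra_species_interaction (
        uniprot_id_1 VARCHAR(20), uniprot_id_2 VARCHAR(20));
    INSERT INTO t_pathway VALUES (1, 'P1'), (2, 'P2');
    INSERT INTO t_intra_species_interaction VALUES
        ('P1', 'P2'),   -- both ids are in t_pathway
        ('P1', 'X9'),   -- only the first id is in t_pathway
        ('X8', 'X9');   -- neither id is in t_pathway
""")

# One join with IN: a row qualifies as soon as EITHER id is found.
either_match = conn.execute("""
    SELECT DISTINCT i.uniprot_id_1, i.uniprot_id_2
    FROM t_intra_species_interaction i
         INNER JOIN t_pathway p ON p.uniprot_id IN (i.uniprot_id_1, i.uniprot_id_2)
    ORDER BY 1, 2
""").fetchall()

# Two joins: a row qualifies only if BOTH ids are found.
both_match = conn.execute("""
    SELECT DISTINCT i.uniprot_id_1, i.uniprot_id_2
    FROM t_intra_species_interaction i
         INNER JOIN t_pathway p1 ON p1.uniprot_id = i.uniprot_id_1
         INNER JOIN t_pathway p2 ON p2.uniprot_id = i.uniprot_id_2
    ORDER BY 1, 2
""").fetchall()

print(either_match)  # [('P1', 'P2'), ('P1', 'X9')]
print(both_match)    # [('P1', 'P2')]
```

On this sample, the single `IN` join also returns the row where only one id is present, so it behaves like an OR between the two conditions. If the AND semantics of the original query matter, the two-join form is the one that matches it.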

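Second query
It would have been better to open a new question that refers to this one. That said, looking at my previous query, it should be easy to see that to get your second answer you only need to take the columns from t_pathway instead of t_intra_species_interaction: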

SELECT DISTINCT p.uniprot_id, p.id 
FROM t_intra_species_interaction i 
     INNER JOIN t_pathway p ON p.uniprot_id IN (i.uniprot_id_1, i.uniprot_id_2) 
+0

Thanks, this solution is much faster :) – Viplime

2

You may want to use INNER JOIN:

select distinct i.uniprot_id_1, i.uniprot_id_2 from t_intra_species_interaction i 
inner join t_pathway p1 
    on p1.uniprot_id = i.uniprot_id_1 
inner join t_pathway p2 
    on p2.uniprot_id = i.uniprot_id_2 
+1

You have to join 't_pathway' twice, once for each 'uniprot_id'. –

+0

@JoachimIsaksson You're right, sorry, my bad. –

+0

Why does this perform better than IN? –

1

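There is a general guideline:

Create three indexes: one on t_pathway.uniprot_id, one on t_intra_species_interaction.uniprot_id_1, and another on t_intra_species_interaction.uniprot_id_2.

That way all the data you need is in your indexes, and the query should be fast.

Also, rewrite your IN clauses as joins, as Tomas mentions in his answer.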
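
The three indexes suggested here can be sketched as follows. This is a minimal example that assumes SQLite driven from Python so it can run as-is; the index names are hypothetical, and `CREATE INDEX` options and query-plan output differ between database engines and versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_pathway (id INTEGER, uniprot_id VARCHAR(20));
    CREATE TABLE t_intra_species_interaction (
        uniprot_id_1 VARCHAR(20), uniprot_id_2 VARCHAR(20));

    -- Index names are hypothetical; pick whatever fits your naming scheme.
    CREATE INDEX idx_pathway_uniprot ON t_pathway (uniprot_id);
    CREATE INDEX idx_interaction_id_1 ON t_intra_species_interaction (uniprot_id_1);
    CREATE INDEX idx_interaction_id_2 ON t_intra_species_interaction (uniprot_id_2);
""")

# List the indexes that now exist in the schema.
index_names = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
)
print(index_names)

# Ask the engine how it intends to run the query; the last column of each
# row holds the human-readable plan step.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT DISTINCT i.uniprot_id_1, i.uniprot_id_2
    FROM t_intra_species_interaction i
    WHERE EXISTS (SELECT 1 FROM t_pathway p WHERE p.uniprot_id = i.uniprot_id_1)
      AND EXISTS (SELECT 1 FROM t_pathway p WHERE p.uniprot_id = i.uniprot_id_2)
""").fetchall()
plan_text = "\n".join(row[-1] for row in plan)
print(plan_text)
```

Reading the plan before and after creating the indexes is a quick way to see whether the engine actually uses them. In MySQL the equivalent check would be `EXPLAIN` in front of the query.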

2

EXISTS or JOIN would be more efficient.

1

How about:

select distinct uniprot_id_1, uniprot_id_2 
from t_intra_species_interaction 
where exists (select uniprot_id from t_pathway 
       where uniprot_id_1 = uniprot_id) and 
     exists (select uniprot_id from t_pathway 
       where uniprot_id_2 = uniprot_id) 
+1

You need two 'exists'. –

2

Try this:

select distinct uniprot_id_1,uniprot_id_2 
from t_intra_species_interaction I 
    join t_pathway P1 
    on I.uniprot_id_1 = P1.uniprot_id 
    join t_pathway P2 
    on I.uniprot_id_2 = P2.uniprot_id 

select distinct uniprot_id_1,uniprot_id_2 
from t_intra_species_interaction I 
where exists (select 1 from t_pathway where uniprot_id = I.uniprot_id_1) 
    and exists (select 1 from t_pathway where uniprot_id = I.uniprot_id_2) 
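
To check that the rewrites return the same rows as the original IN query, the three forms can be run side by side on a small data set. The sketch below assumes in-memory SQLite driven from Python and made-up sample rows; t_pathway deliberately contains a repeated uniprot_id to show why the join form needs DISTINCT.

```python
import sqlite3

# Hypothetical sample data; only table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_pathway (id INTEGER, uniprot_id VARCHAR(20));
    CREATE TABLE t_intra_species_interaction (
        uniprot_id_1 VARCHAR(20), uniprot_id_2 VARCHAR(20));
    INSERT INTO t_pathway VALUES (1, 'P1'), (2, 'P1'), (3, 'P2');
    INSERT INTO t_intra_species_interaction VALUES
        ('P1', 'P2'), ('P2', 'P1'), ('P1', 'X9'), ('X8', 'P2');
""")

queries = {
    "in": """
        SELECT DISTINCT uniprot_id_1, uniprot_id_2
        FROM t_intra_species_interaction
        WHERE uniprot_id_1 IN (SELECT uniprot_id FROM t_pathway)
          AND uniprot_id_2 IN (SELECT uniprot_id FROM t_pathway)""",
    "exists": """
        SELECT DISTINCT uniprot_id_1, uniprot_id_2
        FROM t_intra_species_interaction I
        WHERE EXISTS (SELECT 1 FROM t_pathway WHERE uniprot_id = I.uniprot_id_1)
          AND EXISTS (SELECT 1 FROM t_pathway WHERE uniprot_id = I.uniprot_id_2)""",
    "join": """
        SELECT DISTINCT uniprot_id_1, uniprot_id_2
        FROM t_intra_species_interaction I
             JOIN t_pathway P1 ON I.uniprot_id_1 = P1.uniprot_id
             JOIN t_pathway P2 ON I.uniprot_id_2 = P2.uniprot_id""",
}

# Sort each result so the comparison does not depend on row order.
results = {name: sorted(conn.execute(sql).fetchall()) for name, sql in queries.items()}
print(results["in"])  # [('P1', 'P2'), ('P2', 'P1')]
print(results["in"] == results["exists"] == results["join"])  # True

# Without DISTINCT the join repeats a row once per matching t_pathway row.
raw_join_rows = conn.execute("""
    SELECT COUNT(*)
    FROM t_intra_species_interaction I
         JOIN t_pathway P1 ON I.uniprot_id_1 = P1.uniprot_id
         JOIN t_pathway P2 ON I.uniprot_id_2 = P2.uniprot_id
""").fetchone()[0]
print(raw_join_rows)  # 4
```

All three forms agree on this sample. Which one is fastest depends on the engine, its version, and the available indexes, so it is worth timing them on the real tables.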
+0

EXISTS would be faster if you used OR instead of 2 subqueries –

+0

@MhalidJunaid But that corresponds to an "OR" between the conditions rather than an "AND". –

+0

@JoachimIsaksson, what do you mean? Can you explain? –