mysql - Finding unique string matches within subgroups of a group
I have a MySQL "problem" that I can't wrap my head around.
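I have a table of strings in a database (they are actually genotypes, but that shouldn't matter here). Each string can occur in anywhere from one to three samples. I'd like to count the number of unique alleles for each sample (sample_id) within each catalog (catalog_id). For example, given the table below: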
id batch_id catalog_id sample_id tag_id allele depth
309 1 324 1 323 TCGC 244
1449616 1 324 2 7961 TCGC 192
2738325 1 324 2 1168472 CCGG 31
3521555 1 324 3 221716 TAAC 29
So far, I've been able to put together the following code:
CREATE TABLE danumbers2
SELECT catalog_id,
count(case when sample_id = '1' and allele != 'consensus' then sample_id end) as SAMPLE1,
count(case when sample_id = '2' and allele != 'consensus' then sample_id end) as SAMPLE2,
count(case when sample_id = '3' and allele != 'consensus' then sample_id end) as SAMPLE3,
sum(case when sample_id = '1' and allele != 'consensus' then depth end) as DEPTH1,
sum(case when sample_id = '2' and allele != 'consensus' then depth end) as DEPTH2,
sum(case when sample_id = '3' and allele != 'consensus' then depth end) as DEPTH3,
count(distinct allele) AS ALLELECOUNT
from matches
group by catalog_id;
-- keep only catalogs where every sample has more than one allele and enough depth
CREATE TABLE thehitlist_all
SELECT catalog_id, SAMPLE1, SAMPLE2, SAMPLE3, DEPTH1, DEPTH2, DEPTH3, ALLELECOUNT
FROM danumbers2
WHERE (SAMPLE1 > 1 AND SAMPLE2 > 1 AND SAMPLE3 > 1 AND ALLELECOUNT > 1 AND DEPTH2 > 10 AND DEPTH3 > 10);
The first query (danumbers2) gives results like this:
catalog_id SAMPLE1 SAMPLE2 SAMPLE3 DEPTH1 DEPTH2 DEPTH3 ALLELECOUNT
324 1 2 1 244 223 29 4
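To see how the CASE-inside-COUNT/SUM pattern produces those per-sample columns, here is a minimal sketch that runs the same aggregation on only the four example rows. It assumes SQLite as a stand-in for MySQL so it runs without a server; `ROWS`, `make_db` and `per_sample_totals` are hypothetical helper names, not part of the original setup. On this excerpt ALLELECOUNT comes out as 3 rather than 4, so the full table presumably contains rows beyond the ones shown.

```python
import sqlite3

# The four example rows from the question:
# (id, batch_id, catalog_id, sample_id, tag_id, allele, depth)
ROWS = [
    (309, 1, 324, 1, 323, "TCGC", 244),
    (1449616, 1, 324, 2, 7961, "TCGC", 192),
    (2738325, 1, 324, 2, 1168472, "CCGG", 31),
    (3521555, 1, 324, 3, 221716, "TAAC", 29),
]


def make_db():
    """Load the example rows into an in-memory SQLite stand-in for `matches`."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE matches (id INTEGER, batch_id INTEGER, catalog_id INTEGER, "
        "sample_id INTEGER, tag_id INTEGER, allele TEXT, depth INTEGER)"
    )
    con.executemany("INSERT INTO matches VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)
    return con


# Same conditional aggregation as the danumbers2 query: the CASE yields NULL
# for rows belonging to other samples, and COUNT/SUM skip NULLs.
TOTALS_SQL = """
SELECT catalog_id,
       COUNT(CASE WHEN sample_id = 1 AND allele != 'consensus' THEN sample_id END) AS SAMPLE1,
       COUNT(CASE WHEN sample_id = 2 AND allele != 'consensus' THEN sample_id END) AS SAMPLE2,
       COUNT(CASE WHEN sample_id = 3 AND allele != 'consensus' THEN sample_id END) AS SAMPLE3,
       SUM(CASE WHEN sample_id = 1 AND allele != 'consensus' THEN depth END) AS DEPTH1,
       SUM(CASE WHEN sample_id = 2 AND allele != 'consensus' THEN depth END) AS DEPTH2,
       SUM(CASE WHEN sample_id = 3 AND allele != 'consensus' THEN depth END) AS DEPTH3,
       COUNT(DISTINCT allele) AS ALLELECOUNT
FROM matches
GROUP BY catalog_id
"""


def per_sample_totals(con):
    """Return one row of per-sample totals for each catalog_id."""
    return con.execute(TOTALS_SQL).fetchall()


if __name__ == "__main__":
    print(per_sample_totals(make_db()))  # [(324, 1, 2, 1, 244, 223, 29, 3)]
```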
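The result is basically a per-catalog_id count of the total number of alleles in each sample, along with a count of the total distinct alleles for each catalog number. What I'm interested in calculating (but can't seem to figure out!) is the "unique" alleles, i.e. those not shared between samples. In other words, I want to find the diagnostic alleles for each sample within each catalog_id.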
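One way to make "not shared between samples" concrete is to group by (catalog_id, allele) and keep only the groups seen in exactly one sample. The sketch below assumes that definition and uses SQLite as a stand-in for MySQL; the query uses only standard SQL that MySQL also accepts, while `ROWS`, `make_db` and `unique_alleles` are hypothetical helper names.

```python
import sqlite3

# The four example rows from the question:
# (id, batch_id, catalog_id, sample_id, tag_id, allele, depth)
ROWS = [
    (309, 1, 324, 1, 323, "TCGC", 244),
    (1449616, 1, 324, 2, 7961, "TCGC", 192),
    (2738325, 1, 324, 2, 1168472, "CCGG", 31),
    (3521555, 1, 324, 3, 221716, "TAAC", 29),
]


def make_db():
    """Load the example rows into an in-memory SQLite stand-in for `matches`."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE matches (id INTEGER, batch_id INTEGER, catalog_id INTEGER, "
        "sample_id INTEGER, tag_id INTEGER, allele TEXT, depth INTEGER)"
    )
    con.executemany("INSERT INTO matches VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)
    return con


# An allele counts as unique (diagnostic) within a catalog if it occurs in
# exactly one sample there; MIN(sample_id) then recovers that single sample.
UNIQUE_SQL = """
SELECT catalog_id, allele, MIN(sample_id) AS sample_id
FROM matches
WHERE allele != 'consensus'
GROUP BY catalog_id, allele
HAVING COUNT(DISTINCT sample_id) = 1
ORDER BY catalog_id, allele
"""


def unique_alleles(con):
    """Return (catalog_id, allele, sample_id) for every unshared allele."""
    return con.execute(UNIQUE_SQL).fetchall()


if __name__ == "__main__":
    print(unique_alleles(make_db()))  # [(324, 'CCGG', 2), (324, 'TAAC', 3)]
```

TCGC drops out because it appears in both sample 1 and sample 2, leaving CCGG for sample 2 and TAAC for sample 3.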
So, for the example data above, I'd like the table to look like this:
catalog_id SAMPLE1 SAMPLE2 SAMPLE3 ALLELECOUNT
324 0 1 1 2
Any ideas would be greatly appreciated! Please let me know if I can provide more information, etc.
Maybe a conditional statement nested inside COUNT(DISTINCT ...)? – jasongallant 2012-07-30 18:33:21
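A conditional nested inside COUNT(DISTINCT ...) can work, provided each row first knows whether its allele is unshared. The sketch below is one possible approach, not a confirmed answer: it LEFT JOINs each row to a derived table of alleles seen in exactly one sample, then pivots with COUNT(DISTINCT CASE ...). It assumes SQLite as a stand-in for MySQL (the SQL itself is standard and should also run on MySQL), and `ROWS`, `make_db` and `unique_allele_counts` are hypothetical helper names.

```python
import sqlite3

# The four example rows from the question:
# (id, batch_id, catalog_id, sample_id, tag_id, allele, depth)
ROWS = [
    (309, 1, 324, 1, 323, "TCGC", 244),
    (1449616, 1, 324, 2, 7961, "TCGC", 192),
    (2738325, 1, 324, 2, 1168472, "CCGG", 31),
    (3521555, 1, 324, 3, 221716, "TAAC", 29),
]


def make_db():
    """Load the example rows into an in-memory SQLite stand-in for `matches`."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE matches (id INTEGER, batch_id INTEGER, catalog_id INTEGER, "
        "sample_id INTEGER, tag_id INTEGER, allele TEXT, depth INTEGER)"
    )
    con.executemany("INSERT INTO matches VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)
    return con


# u.allele is non-NULL only when the row's allele occurs in a single sample of
# its catalog, so the CASE expressions count unshared alleles per sample.
UNIQUE_PIVOT_SQL = """
SELECT m.catalog_id,
       COUNT(DISTINCT CASE WHEN m.sample_id = 1 AND u.allele IS NOT NULL THEN m.allele END) AS SAMPLE1,
       COUNT(DISTINCT CASE WHEN m.sample_id = 2 AND u.allele IS NOT NULL THEN m.allele END) AS SAMPLE2,
       COUNT(DISTINCT CASE WHEN m.sample_id = 3 AND u.allele IS NOT NULL THEN m.allele END) AS SAMPLE3,
       COUNT(DISTINCT u.allele) AS ALLELECOUNT
FROM matches AS m
LEFT JOIN (
    SELECT catalog_id, allele
    FROM matches
    WHERE allele != 'consensus'
    GROUP BY catalog_id, allele
    HAVING COUNT(DISTINCT sample_id) = 1
) AS u
  ON u.catalog_id = m.catalog_id AND u.allele = m.allele
WHERE m.allele != 'consensus'
GROUP BY m.catalog_id
ORDER BY m.catalog_id
"""


def unique_allele_counts(con):
    """Return (catalog_id, SAMPLE1, SAMPLE2, SAMPLE3, ALLELECOUNT) per catalog."""
    return con.execute(UNIQUE_PIVOT_SQL).fetchall()


if __name__ == "__main__":
    print(unique_allele_counts(make_db()))  # [(324, 0, 1, 1, 2)]
```

Using a LEFT JOIN rather than an inner join keeps catalogs that have no unshared alleles in the output, with zeros in every column, instead of silently dropping them.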