BigQuery - How to use row values to create columns for a new table

I have the following gene table in BigQuery (more than 12K rows). Each sample_id (column 1) appears with a long list of PIK3CA_features (column 2):

Row sample_id PIK3CA_features 
1 hu011C57 chr3_3930069__TGT  
2 hu011C57 chr3_3929921_TC 
3 hu011C57 chr3_3929739_TC 
4 hu011C57 chr3_3929813__T 
5 hu011C57 chr3_3929897_GA 
6 hu011C57 chr3_3929977_TC 
7 hu011C57 chr3_3929783_TC

I would like to generate a table like this:

Row sample_id chr3_3930069__TGT chr3_3929921_TC chr3_3929739_TC
1   hu011C57  1                 1               1
2   hu011C58  0

That is, one row per sample ID, with 1 if the PIK3CA_feature is present in that sample and 0 if it is not.
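The requested transformation is a pivot of (sample_id, feature) pairs into a presence matrix. A minimal Python sketch of the intended semantics, assuming the rows have already been fetched as a list of tuples (`pivot_presence` is a hypothetical helper, not part of BigQuery):

```python
from typing import Dict, Iterable, List, Tuple


def pivot_presence(
    rows: Iterable[Tuple[str, str]],
) -> Tuple[List[str], Dict[str, Dict[str, int]]]:
    """Pivot (sample_id, feature) pairs into one row per sample with 1/0 flags."""
    pairs = list(rows)
    # Every distinct feature becomes a column, in a stable sorted order.
    features = sorted({feature for _, feature in pairs})
    # Start every sample with all features absent (0).
    table: Dict[str, Dict[str, int]] = {}
    for sample_id, _ in pairs:
        table.setdefault(sample_id, {f: 0 for f in features})
    # Mark each observed (sample, feature) pair as present (1);
    # a duplicated pair still yields 1, not a count.
    for sample_id, feature in pairs:
        table[sample_id][feature] = 1
    return features, table


rows = [
    ("hu011C57", "chr3_3930069__TGT"),
    ("hu011C57", "chr3_3929921_TC"),
    ("hu011C58", "chr3_3929921_TC"),
]
features, table = pivot_presence(rows)
print(features)
print(table["hu011C58"])
```

Each sample gets every feature as a key, so absent features show up explicitly as 0, which is what the wide table above needs.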

Any ideas on how to generate this table easily?

Many thanks for any ideas!


2017-06-15 eilalan

You can do this by grouping on the sample ID.

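-- Standard SQL (COUNTIF is not available in legacy SQL, so the table
-- is referenced with backticks rather than [brackets]).
SELECT 
    sample_id, 
    COUNTIF(PIK3CA_features = 'chr3_3930069__TGT') AS chr3_3930069__TGT, 
    COUNTIF(PIK3CA_features = 'chr3_3929921_TC') AS chr3_3929921_TC, 
    COUNTIF(PIK3CA_features = 'chr3_3929739_TC') AS chr3_3929739_TC 
FROM `your_project.your_dataset.your_table` 
GROUP BY sample_id;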

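Assuming you don't have duplicate PIK3CA_features per sample ID, this should give you what you need. If duplicates are possible, COUNTIF returns counts greater than 1; wrap each expression as IF(COUNTIF(...) > 0, 1, 0) to get strict 1/0 flags.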
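With more than 12K rows the feature list is probably too long to type by hand. One option, sketched here, is to generate the query text in a client script, assuming the distinct feature values have already been fetched (for example with `SELECT DISTINCT PIK3CA_features FROM ...`). `build_pivot_query` and the table name are hypothetical placeholders. The sketch also assumes every feature value is a plain identifier usable as a column name, which holds for values like `chr3_3930069__TGT`, and it emits `IF(COUNTIF(...) > 0, 1, 0)` so duplicate rows still give 1.

```python
import re
from typing import Iterable


def build_pivot_query(table: str, features: Iterable[str]) -> str:
    """Build a BigQuery Standard SQL query with one 1/0 column per feature."""
    columns = []
    for feature in sorted(set(features)):
        # Column aliases must be plain identifiers; reject anything else
        # instead of trying to quote or escape it.
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", feature):
            raise ValueError(f"not a safe column name: {feature!r}")
        columns.append(
            f"  IF(COUNTIF(PIK3CA_features = '{feature}') > 0, 1, 0) AS {feature}"
        )
    select_list = ",\n".join(["  sample_id"] + columns)
    return (
        f"SELECT\n{select_list}\n"
        f"FROM `{table}`\n"
        "GROUP BY sample_id"
    )


print(build_pivot_query(
    "your_project.your_dataset.your_table",  # hypothetical table name
    ["chr3_3930069__TGT", "chr3_3929921_TC"],
))
```

The generated text can then be pasted into the console or submitted through a client library. Sorting and de-duplicating the features keeps the column order stable between runs.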


2017-06-15 01:44:20

The only idea that comes to mind is to use ARRAYs and STRUCTs to get something fairly close to what you need, like this:

WITH data AS(
SELECT 'hu011C57' sample_id, 'chr3_3930069__TGT' PIK3CA_features union all 
SELECT 'hu011C57', 'chr3_3929921_TC' union all 
SELECT 'hu011C57', 'chr3_3929739_TC' union all 
SELECT 'hu011C57', 'chr3_3929813__T' union all 
SELECT 'hu011C57', 'chr3_3929897_GA' union all 
SELECT 'hu011C57', 'chr3_3929977_TC' union all 
SELECT 'hu011C57', 'chr3_3929783_TC' union all 
SELECT 'hu011C58', 'chr3_3929783_TC' union all 
SELECT 'hu011C58', 'chr3_3929921_TC' 
), 

all_features AS (
    SELECT DISTINCT PIK3CA_features FROM data 
), 

aggregated_samples AS (
    SELECT
        sample_id,
        ARRAY_AGG(DISTINCT PIK3CA_features) AS features
    FROM data
    GROUP BY sample_id
)

SELECT
    sample_id,
    -- One STRUCT per known feature: its name plus whether this sample has it.
    ARRAY(
        SELECT AS STRUCT
            PIK3CA_features,
            PIK3CA_features IN UNNEST(features) AS present
        FROM all_features
        ORDER BY PIK3CA_features
    ) AS features
FROM aggregated_samples

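This returns one row per sample_id, holding an array of structs, one per feature, each recording whether that feature is present in that sample_id.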
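To make the shape of that result concrete, here is a small Python sketch that builds the same nested structure in memory: one entry per sample, holding a list of (feature, present) records ordered by feature name. It mirrors the query's logic step by step but does not call BigQuery; `nest_presence` and `FeatureFlag` are hypothetical names.

```python
from typing import Dict, List, NamedTuple, Set, Tuple


class FeatureFlag(NamedTuple):
    # Mirrors one STRUCT<PIK3CA_features STRING, present BOOL> element.
    PIK3CA_features: str
    present: bool


def nest_presence(rows: List[Tuple[str, str]]) -> Dict[str, List[FeatureFlag]]:
    # all_features: every distinct feature across all samples.
    all_features = sorted({feature for _, feature in rows})
    # aggregated_samples: the distinct features seen for each sample.
    per_sample: Dict[str, Set[str]] = {}
    for sample_id, feature in rows:
        per_sample.setdefault(sample_id, set()).add(feature)
    # Final SELECT: one array per sample, one struct per feature.
    return {
        sample_id: [FeatureFlag(f, f in seen) for f in all_features]
        for sample_id, seen in per_sample.items()
    }


rows = [
    ("hu011C57", "chr3_3929783_TC"),
    ("hu011C57", "chr3_3929921_TC"),
    ("hu011C58", "chr3_3929783_TC"),
]
for sample_id, flags in nest_presence(rows).items():
    print(sample_id, [(f.PIK3CA_features, f.present) for f in flags])
```

Every array has the same length and order, so position i always refers to the same feature across samples, much like a column in the wide table.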

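Because BigQuery supports this kind of nested data structure natively, you can keep this representation without losing the ability to run advanced analysis on it (analytic functions, subqueries, and so on).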


2017-06-15 04:44:06
