2011-10-19

Extracting duplicate rows from a data frame

I have a large data frame that I am working with. The first few rows are as follows:

  Assay Genotype Sample Result
1   001        G      1      0
2   001        A      2      1
3   001        G      3      0
4   001       NA      1     NA
5   002        T      1      0
6   002        G      2      1
7   002        T      2      0
8   002        T      4      0
9   003       NA      1     NA

In total I will have 2000 samples, with 168 assays for each sample.

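I would like to extract the rows where I have multiple entries with the same Assay and Sample. The result should be a data frame containing all of the duplicated entries, sorted so that duplicates sit next to each other. For the example above, the result would look like this: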

  Assay Genotype Sample Result
1   001        G      1      0
4   001       NA      1     NA
6   002        G      2      1
7   002        T      2      0

Answer

Demo data, for easy loading:

df <- structure(list(
  Assay = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L),
  Genotype = structure(c(2L, 1L, 2L, NA, 3L, 2L, 3L, 3L, NA),
                       .Label = c("A", "G", "T"), class = "factor"),
  Sample = c(1L, 2L, 3L, 1L, 1L, 2L, 2L, 4L, 1L),
  Result = c(0L, 1L, 0L, NA, 0L, 1L, 0L, 0L, NA)),
  .Names = c("Assay", "Genotype", "Sample", "Result"),
  class = "data.frame",
  row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9"))

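You can easily get the duplicated Assay/Sample pairs with duplicated(). Note that it flags only the second and later occurrences of each pair, not the first: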

vars <- c('Assay', 'Sample')
# Keep only the key columns of rows whose Assay/Sample pair was seen earlier
dup <- df[duplicated(df[, vars]), vars]

which gives:

> dup
  Assay Sample
4     1      1
7     2      2

A simple merge() back onto the full data then gives the desired result:

> merge(dup, df)
  Assay Sample Genotype Result
1     1      1     <NA>     NA
2     1      1        G      0
3     2      2        G      1
4     2      2        T      0
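
One caveat with the merge approach: if an Assay/Sample pair occurs three or more times, dup will list that pair more than once and merge() will repeat the matching rows. As a possible alternative (a sketch, not part of the original answer), the rows can be flagged in a single step by calling duplicated() from both directions. The names is_dup and dups are illustrative, and the demo data is rebuilt with data.frame() so that the block runs on its own:

```r
# Demo data, matching the example in the question
df <- data.frame(
  Assay    = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L),
  Genotype = factor(c("G", "A", "G", NA, "T", "G", "T", "T", NA)),
  Sample   = c(1L, 2L, 3L, 1L, 1L, 2L, 2L, 4L, 1L),
  Result   = c(0L, 1L, 0L, NA, 0L, 1L, 0L, 0L, NA)
)

vars <- c("Assay", "Sample")

# Flag a row if its Assay/Sample pair also occurs earlier OR later in the data
is_dup <- duplicated(df[, vars]) | duplicated(df[, vars], fromLast = TRUE)

# Keep the flagged rows and sort them so that duplicates sit next to each other
dups <- df[is_dup, ]
dups <- dups[order(dups$Assay, dups$Sample), ]
```

On the demo data this keeps rows 1, 4, 6 and 7 with their original row names and column order, which matches the layout requested in the question.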