R data.table: calculation conditional on other columns across multiple rows

I have data that looks similar to this:

   Sample.Name Marker Height Size
1:    Sample01      A    450  100
2:    Sample01      A    420  120
3:    Sample01      B    700  140
4:    Sample01      C    750  160
5:    Sample01      D    300  180
6:    Sample01      D    340  200

It can be reproduced with the following code:

# Some example data.
library(data.table)
DT <- data.table(Sample.Name = rep("Sample01", 6),
                 Marker = c("A", "A", "B", "C", "D", "D"),
                 Height = c(450, 420, 700, 750, 300, 340),
                 Size = seq(from = 100, to = 200, length.out = 6))

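There are one or two rows per marker, each with a Height and a Size (which can be NA). In reality there are additional columns with allele values and other information that is not needed for this example. The data is not necessarily sorted by Size.

I want to calculate the ratio (Hb) between the peak heights for each marker (NA if there is only one peak). Hb can be calculated in several ways:

1) The smaller (i.e. lower) peak height divided by the larger (i.e. higher) peak height

2) The peak height of the shorter fragment divided by the peak height of the longer fragment

3) The opposite of 2), but it can be solved with the same strategy as 2), so we don't need to consider it here.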
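To make the three definitions concrete, here is a minimal sketch for a single marker, using the two marker A peaks from the example data. The variable names (`height`, `size`, `hb1` to `hb3`) are made up for illustration and are not part of the original code.

```r
# Marker A from the example data: two peaks.
height <- c(450, 420)
size   <- c(100, 120)

# 1) Smaller peak height divided by larger peak height.
hb1 <- min(height) / max(height)                          # 420 / 450

# 2) Height of the shorter fragment divided by height of the longer fragment.
hb2 <- height[which.min(size)] / height[which.max(size)]  # 450 / 420

# 3) The inverse of 2): longer fragment over shorter fragment.
hb3 <- height[which.max(size)] / height[which.min(size)]  # 420 / 450

round(c(hb1, hb2, hb3), 4)
# [1] 0.9333 1.0714 0.9333
```

For this marker, 1) and 3) happen to coincide because the shorter fragment is also the higher peak; that is not true in general.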

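I am writing a function that should be able to perform all three calculations using data.table. The code I have written so far uses a two-step approach to calculate 1):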

# Identify the smaller and larger peak height and count number of peaks.
DT2 <- DT[, list(Small = min(Height), Large = max(Height), Peaks = .N),
          by = list(Sample.Name, Marker)]

# Divide only where there are two observed peaks.
DT2[Peaks == 2, Hb := Small / Large, by = list(Sample.Name, Marker)]

This produces the desired output:

> DT2
   Sample.Name Marker Small Large Peaks        Hb
1:    Sample01      A   420   450     2 0.9333333
2:    Sample01      B   700   700     1        NA
3:    Sample01      C   750   750     1        NA
4:    Sample01      D   300   340     2 0.8823529
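As an aside, the two steps for 1) can probably be collapsed into a single grouped call. The following is only a sketch under that assumption; `DT1` is a name introduced here for illustration, and the example data is repeated so the block runs on its own.

```r
library(data.table)

DT <- data.table(Sample.Name = rep("Sample01", 6),
                 Marker = c("A", "A", "B", "C", "D", "D"),
                 Height = c(450, 420, 700, 750, 300, 340),
                 Size = seq(from = 100, to = 200, length.out = 6))

# One-step variant of calculation 1): the 'if' is evaluated once per group,
# and NA_real_ keeps the Hb column numeric for single-peak markers.
DT1 <- DT[, .(Hb = if (.N == 2L) min(Height) / max(Height) else NA_real_),
          by = .(Sample.Name, Marker)]
DT1
```

This drops the intermediate Small, Large and Peaks columns, so the two-step version is still the better fit if those are needed later.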

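However, I am stuck on how to calculate 2). I need to look at Size to determine which of the two Height values should be assigned to "short" and which to "long". I have consulted the data.table help pages and searched Stack Overflow. Being far from an expert in data.table syntax, I have not been able to find or recognize a solution to this specific problem. The desired output for 2) is the same as for 1), except for the first row, where Hb would be 450/420 = 1.071429.

Source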

2016-08-21 Oskar Hansson

For the second calculation, analogous to 1), you can do this:

# Pick the Height at the smaller Size and divide it by the Height at the
# larger Size. Note that you have to explicitly specify the NA type to be
# real (NA_real_) here, since data.table requires the column type to be
# consistent across groups.
DT[, .(Hb = ifelse(.N == 2, Height[Size == min(Size)] / Height[Size == max(Size)], NA_real_)),
   by = .(Sample.Name, Marker)]

#    Sample.Name Marker        Hb
# 1:    Sample01      A 1.0714286
# 2:    Sample01      B        NA
# 3:    Sample01      C        NA
# 4:    Sample01      D 0.8823529
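Since the question mentions wrapping all three calculations in one function, here is one possible sketch of how that might look. The function name `hb_by_marker` and its `method` values are hypothetical, chosen only for this illustration. It swaps the `Size == min(Size)` lookup for `order(Size)`, and it assumes Size is not NA when a marker has two peaks.

```r
library(data.table)

DT <- data.table(Sample.Name = rep("Sample01", 6),
                 Marker = c("A", "A", "B", "C", "D", "D"),
                 Height = c(450, 420, 700, 750, 300, 340),
                 Size = seq(from = 100, to = 200, length.out = 6))

# Hypothetical helper: one Hb value per Sample.Name/Marker group.
hb_by_marker <- function(dt, method = c("small_large", "short_long", "long_short")) {
  method <- match.arg(method)
  dt[, {
    if (.N == 2L) {
      h <- Height[order(Size)]  # heights with the shorter fragment first
      hb <- switch(method,
                   small_large = min(h) / max(h),  # calculation 1)
                   short_long  = h[1] / h[2],      # calculation 2)
                   long_short  = h[2] / h[1])      # calculation 3)
    } else {
      hb <- NA_real_  # single peak (or any other count): no ratio
    }
    .(Hb = hb)
  }, by = .(Sample.Name, Marker)]
}

hb_by_marker(DT, "short_long")
```

For the example data, "short_long" gives 450/420 for marker A and 300/340 for marker D, matching the output above, and "long_short" gives the reciprocals.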

Source

2016-08-21 14:01:14 Psidom

This solution works great! I also learned how to use 'ifelse' to do the calculation in one go. Thanks!
