2014-10-02 30 views

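I'm not sure whether all of this can be done in R. I know how to drop entire rows based on a sum value, but what I want here is to work on the columns within a row: keep the values that occur at least a given number of times.

I want to take the taxonomic information recorded at individual sites, but keep only those taxonomic levels that are represented at least three times in the overall sample.

For example, in the table below, Diptera was identified only once at river mile (RM) 15, but the order Diptera as a whole appears 38 times in the sample, so I want to keep that row. The same goes for Chaetocladius: it appears once at RM 0.7 but 5 times in the sample, so I would keep it.

In addition, where one level occurs often enough to be kept, the levels to its right may still be rare and need to be removed and replaced with NA. For example, the order Blattoidea at RM 15 and the species Chironomus atroviridis at RM 80 are too rare, but the class Insecta and the genus Chironomus occur often enough to be kept. So I want to keep those levels and replace the remaining levels with NA.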

RM   phylum      class    order       family        genus          species                 Sum
0.5  Arthropoda  Insecta  Diptera     Chironomidae  Chaetocladius  Chaetocladius mel       1
15   Arthropoda  Insecta  Diptera     NA            NA             NA                      1
15   Arthropoda  Insecta  Blattoidea  NA            NA             NA                      1
0.7  Arthropoda  Insecta  Diptera     Chironomidae  Chaetocladius  NA                      1
54   Arthropoda  Insecta  Diptera     Chironomidae  Chaetocladius  NA                      2
35   Arthropoda  Insecta  Diptera     Chironomidae  Chaetocladius  NA                      2
80   Arthropoda  Insecta  Diptera     Chironomidae  Chironomus     Chironomus atroviridis  2
80   Arthropoda  Insecta  Diptera     Chironomidae  Chironomus     Chironomus bifurcatus   1
0.5  Arthropoda  Insecta  Diptera     Chironomidae  Chironomus     Chironomus bifurcatus   29

The new output should look like this:

RM   phylum      class    order       family        genus          species                 Sum
0.5  Arthropoda  Insecta  Diptera     Chironomidae  Chaetocladius  NA                      1
15   Arthropoda  Insecta  Diptera     NA            NA             NA                      1
15   Arthropoda  Insecta  NA          NA            NA             NA                      1
0.7  Arthropoda  Insecta  Diptera     Chironomidae  Chaetocladius  NA                      1
54   Arthropoda  Insecta  Diptera     Chironomidae  Chaetocladius  NA                      2
35   Arthropoda  Insecta  Diptera     Chironomidae  Chaetocladius  NA                      2
80   Arthropoda  Insecta  Diptera     Chironomidae  Chironomus     NA                      2
80   Arthropoda  Insecta  Diptera     Chironomidae  Chironomus     Chironomus bifurcatus   1
0.5  Arthropoda  Insecta  Diptera     Chironomidae  Chironomus     Chironomus bifurcatus   29

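I have already compiled a summary of which names at each taxonomic level have a total of 3 or more. I thought I could work my way through the levels one at a time (from phylum to species), but I can't figure out how to do it.

Please help.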
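For reference, a per-level summary like the one described could be built roughly as follows. This is only a sketch: the small data frame is a stand-in made from four rows of the table above, the names `order_totals` and `keep_orders` are made up for illustration, and it assumes the Sum column holds the number of occurrences.

```r
# Stand-in data: four rows taken from the table above
dat <- data.frame(
  RM    = c(15, 15, 0.7, 0.5),
  order = c("Diptera", "Blattoidea", "Diptera", "Diptera"),
  genus = c(NA, NA, "Chaetocladius", "Chironomus"),
  Sum   = c(1, 1, 1, 29),
  stringsAsFactors = FALSE
)

# Total Sum for each order across all sites
order_totals <- aggregate(Sum ~ order, data = dat, FUN = sum)

# Orders whose overall total reaches the threshold of 3
keep_orders <- order_totals$order[order_totals$Sum >= 3]
```

The same call can be repeated for each of the other taxonomic columns by swapping the column name in the formula.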

Answer


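There may be a simpler way to do this, but this gives the output you want. It is wrapped in a function, clean_data, in which you specify how many times a value must be present to be kept. In this case, any value that does not appear at least twice in the provided data is replaced with NA. Does this fit your needs?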

dat <- read.table(header = TRUE, text = '
RM phylum class order family genus species Sum
0.5 Arthropoda Insecta Diptera Chironomidae Chaetocladius "Chaetocladius mel" 1
15 Arthropoda Insecta Diptera NA NA NA 1
15 Arthropoda Insecta Blattoidea NA NA NA 1
0.7 Arthropoda Insecta Diptera Chironomidae Chaetocladius NA 1
54 Arthropoda Insecta Diptera Chironomidae Chaetocladius NA 2
35 Arthropoda Insecta Diptera Chironomidae Chaetocladius NA 2
80 Arthropoda Insecta Diptera Chironomidae Chironomus "Chironomus atroviridis" 2
80 Arthropoda Insecta Diptera Chironomidae Chironomus "Chironomus bifurcatus" 1
0.5 Arthropoda Insecta Diptera Chironomidae Chironomus "Chironomus bifurcatus" 29
')

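clean_data <- function(dat, repeats) {
    # taxonomic columns: everything except the site (RM) and abundance (Sum) columns
    # (%in% is used because != against a length-2 vector recycles it position by position)
    taxa <- dat[, !(colnames(dat) %in% c("RM", "Sum")), drop = FALSE]

    # count how many rows each level occupies within each column
    # (lapply always returns a list, whereas sapply may simplify to a matrix)
    counts <- lapply(taxa, table)

    # levels that occur in fewer than 'repeats' rows, pooled across all columns
    rare <- unlist(lapply(counts, function(x) names(which(x < repeats))))

    # convert data to a character matrix for indexing
    dat <- as.matrix(dat)

    # set every cell that holds a rare level to NA
    dat[dat %in% rare] <- NA

    # note: every column, including RM and Sum, comes back as character
    # (factor in R versions before 4.0)
    return(as.data.frame(dat))
}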

clean_data(dat, 2) 

> clean_data(dat, 2)
    RM     phylum   class   order       family         genus               species Sum
1  0.5 Arthropoda Insecta Diptera Chironomidae Chaetocladius                  <NA>   1
2 15.0 Arthropoda Insecta Diptera         <NA>          <NA>                  <NA>   1
3 15.0 Arthropoda Insecta    <NA>         <NA>          <NA>                  <NA>   1
4  0.7 Arthropoda Insecta Diptera Chironomidae Chaetocladius                  <NA>   1
5 54.0 Arthropoda Insecta Diptera Chironomidae Chaetocladius                  <NA>   2
6 35.0 Arthropoda Insecta Diptera Chironomidae Chaetocladius                  <NA>   2
7 80.0 Arthropoda Insecta Diptera Chironomidae    Chironomus                  <NA>   2
8 80.0 Arthropoda Insecta Diptera Chironomidae    Chironomus Chironomus bifurcatus   1
9  0.5 Arthropoda Insecta Diptera Chironomidae    Chironomus Chironomus bifurcatus  29
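
Note that clean_data counts rows, so with a threshold of 3 it would also blank Chironomus bifurcatus, which occupies only 2 rows even though its Sum values total 30. If Sum is meant to be the number of occurrences, as the question suggests, a variant that weights each name by Sum might look like the sketch below. The function name clean_by_sum and its arguments are hypothetical, not part of the answer above, and the sketch assumes every taxonomic column contains at least one non-NA name.

```r
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = '
RM phylum class order family genus species Sum
0.5 Arthropoda Insecta Diptera Chironomidae Chaetocladius "Chaetocladius mel" 1
15 Arthropoda Insecta Diptera NA NA NA 1
15 Arthropoda Insecta Blattoidea NA NA NA 1
0.7 Arthropoda Insecta Diptera Chironomidae Chaetocladius NA 1
54 Arthropoda Insecta Diptera Chironomidae Chaetocladius NA 2
35 Arthropoda Insecta Diptera Chironomidae Chaetocladius NA 2
80 Arthropoda Insecta Diptera Chironomidae Chironomus "Chironomus atroviridis" 2
80 Arthropoda Insecta Diptera Chironomidae Chironomus "Chironomus bifurcatus" 1
0.5 Arthropoda Insecta Diptera Chironomidae Chironomus "Chironomus bifurcatus" 29
')

# Hypothetical helper: blank names whose summed weight is below min_total
clean_by_sum <- function(dat, min_total, taxa_cols, weight_col = "Sum") {
  for (col in taxa_cols) {
    values <- as.character(dat[[col]])
    # total weight per name within this column; NA names are skipped by tapply
    totals <- tapply(dat[[weight_col]], values, sum)
    rare <- names(totals)[totals < min_total]
    values[values %in% rare] <- NA
    dat[[col]] <- values
  }
  dat  # RM and Sum are left untouched, so they stay numeric
}

taxa_cols <- c("phylum", "class", "order", "family", "genus", "species")
cleaned <- clean_by_sum(dat, min_total = 3, taxa_cols = taxa_cols)
```

Because each column is handled separately, a name is only blanked in the column where it is rare, and the RM and Sum columns are never converted to text.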

Thanks for your help. This works. I agree there must be a simpler way, but I haven't found it yet. – 2014-10-16 14:41:37
