2017-06-29

Using dplyr to update blank factor levels given matching levels of other factors

I have a data frame like this:

df <- read.table(sep = ",", header = TRUE, stringsAsFactors = FALSE, 
text = " 
plantfam,lepfam,lepsp\n 
      Asteraceae,Geometridae,Eois sp\n 
      Asteraceae,Erebidae,\n 
      Poaceae,Erebidae,\n 
      Poaceae,Noctuidae,\n 
      Asteraceae,Saturnidae,Polyphemous sp\n 
      Melastomaceae,Noctuidae,\n 
      Asteraceae,,\n 
      Melastomaceae,,\n 
      ,Noctuidae,\n 
      ,Erebidae,\n 
      Poaceae, Erebidae,\n") 

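I want to create unique lepsp names conditional on the unique combinations of plantfam and lepfam. Each lepfam must first be subset, and each unique combination within that lepfam subset is then assigned a morphospecies name. Rows where plantfam or lepfam is blank get no morphospecies name. Repeated plantfam/lepfam combinations should receive the same morphospecies name. The output should look like this: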

output<- 
plantfam  lepfam      lepsp 
Asteraceae  Geometridae     Eois sp   
Asteraceae  Erebidae     Erebidae_morphosp1     
Poaceae   Erebidae     Erebidae_morphosp2 
Poaceae   Noctuidae     Noctuidae_morphosp1  
Asteraceae  Saturnidae     Polyphemous sp   
Melastomaceae Noctuidae     Noctuidae_morphosp2 
Asteraceae    
Melastomaceae 
       Noctuidae 
       Erebidae 
Poaceae   Erebidae     Erebidae_morphosp2 

I have tried:

condition <- quote(lepsp == "" & plantfam != "" & lepfam != "") 
subset1 <- df %>% filter(condition) %>% group_by(lepfam) %>% 
mutate(lepsp= 
paste0(lepfam,"_morphosp",match(plantfam,unique(plantfam)))) 
subset2 <- df %>% filter(condition) %>% setdiff(df, .) 
union(subset1, subset2) %>% arrange(lepsp) 

However, the two Poaceae/Erebidae rows return different morphosp numbers (Erebidae_morphosp1 and Erebidae_morphosp2) when they should be the same.

Source: local data frame [11 x 3] 
Groups: lepfam [6] 

        plantfam  lepfam    lepsp 
         <chr>  <chr>    <chr> 
1     Melastomaceae         
2      Asteraceae         
3       Poaceae Erebidae Erebidae_morphosp1 
4      Asteraceae Geometridae    Eois sp 
5      Asteraceae Erebidae Erebidae_morphosp1 
6       Poaceae Erebidae Erebidae_morphosp2 
7         Erebidae Erebidae_morphosp3 
8       Poaceae Noctuidae Noctuidae_morphosp1 
9     Melastomaceae Noctuidae Noctuidae_morphosp2 
10         Noctuidae Noctuidae_morphosp3 
11      Asteraceae Saturnidae  Polyphemous sp 

What is 'condition'? – Masoud

It selects the 'lepsp' entries that are blank but have both a 'plantfam' and a 'lepfam' name. – Danielle

Answer

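I think the problem may simply be that, in your df, the last row has a space before Erebidae, which makes R treat it as a different value from the other Erebidae entries.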

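I noticed that while I was finishing my answer. Here is how I would do what you want. Within each lepfam group, I first create a lepfam_number column, then paste it into the name in the same mutate.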

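library(dplyr)
df %>%
  group_by(lepfam) %>%
  mutate(
    # position of each plantfam among the distinct plantfam values of its lepfam group
    lepfam_number = match(plantfam, unique(plantfam)),
    # only name rows with a blank lepsp and a non-blank lepfam and plantfam
    lepsp = ifelse(lepsp == "" & lepfam != "" & trimws(plantfam) != "",
                   paste0(lepfam, "_morphosp", lepfam_number),
                   lepsp)
  )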

        plantfam  lepfam    lepsp lepfam_number 
         <chr>  <chr>    <chr>   <int> 
1     Asteraceae Geometridae    Eois sp    1 
2     Asteraceae Erebidae Erebidae_morphosp1    1 
3      Poaceae Erebidae Erebidae_morphosp2    2 
4      Poaceae Noctuidae Noctuidae_morphosp1    1 
5     Asteraceae Saturnidae  Polyphemous sp    1 
6    Melastomaceae Noctuidae Noctuidae_morphosp2    2 
7     Asteraceae            1 
8    Melastomaceae            2 
9        Noctuidae         3 
10        Erebidae         3 
11     Poaceae Erebidae Erebidae_morphosp2    2 
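
The stray leading space mentioned at the top of this answer can be checked, and removed, before grouping. This is a minimal sketch on a small made-up vector (`lepfam` and `lepfam_clean` here are illustrative names, not the question's full data frame):

```r
# Hypothetical vector reproducing the stray leading space from the question's last row
lepfam <- c("Erebidae", "Erebidae", " Erebidae")

# The padded value is a different string, so it would form its own group
unique(lepfam)
lepfam[3] == lepfam[1]

# trimws() strips leading and trailing whitespace, collapsing the values into one
lepfam_clean <- trimws(lepfam)
unique(lepfam_clean)
```

On the full data, and assuming every column is character as it is here, the same idea could be applied to all columns with something like `df[] <- lapply(df, trimws)` before calling `group_by()`.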

Data

df <- read.table(sep = ",", header = TRUE, stringsAsFactors = FALSE, 
       text = " 
plantfam,lepfam,lepsp\n 
      Asteraceae,Geometridae,Eois sp\n 
      Asteraceae,Erebidae,\n 
      Poaceae,Erebidae,\n 
      Poaceae,Noctuidae,\n 
      Asteraceae,Saturnidae,Polyphemous sp\n 
      Melastomaceae,Noctuidae,\n 
      Asteraceae,,\n 
      Melastomaceae,,\n 
      ,Noctuidae,\n 
      ,Erebidae,\n 
      Poaceae,Erebidae,\n") 

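Nice! If you have a moment, I am trying to understand how 'match' works here. As I understand it, *Poaceae* is at position 2 in 'unique(plantfam)'. In rows 3 and 4 it is evaluated as 2 and 1. Is that because of the preceding 'group_by(lepfam)'? Maybe I am misunderstanding 'group_by'? Thanks for your help. –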

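@LukeC Yes. Because I group by lepfam first, 'unique(plantfam)' is evaluated within each group, so Poaceae is always 2 within the Erebidae group (and 1 within the Noctuidae group). –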

@P Lapointe Got it, that makes sense. Thanks for clarifying! –
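
The grouping behaviour discussed in the comments above can be seen by running the same 'match' call with and without 'group_by'. This is only a sketch: the data frame `toy` and the column `idx` are made up for illustration and are not part of the original question.

```r
library(dplyr)

# Small made-up data frame (not the question's full data)
toy <- data.frame(
  plantfam = c("Asteraceae", "Poaceae", "Poaceae", "Melastomaceae", "Poaceae"),
  lepfam   = c("Erebidae", "Erebidae", "Noctuidae", "Noctuidae", "Erebidae"),
  stringsAsFactors = FALSE
)

# Ungrouped: positions come from unique(plantfam) over the whole column
ungrouped <- toy %>%
  mutate(idx = match(plantfam, unique(plantfam)))

# Grouped: unique(plantfam) is recomputed inside each lepfam group
grouped <- toy %>%
  group_by(lepfam) %>%
  mutate(idx = match(plantfam, unique(plantfam))) %>%
  ungroup()

ungrouped$idx  # 1 2 2 3 2
grouped$idx    # 1 2 1 2 2
```

Without grouping, Poaceae is always 2. With grouping, it is 2 in the Erebidae rows and 1 in the Noctuidae row, because it is the first plantfam seen in that group.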