Using dplyr to fill blank factor levels based on matching levels of other factors

I have a data frame like this:

df <- read.table(sep = ",", header = TRUE, stringsAsFactors = FALSE, 
text = " 
plantfam,lepfam,lepsp\n 
      Asteraceae,Geometridae,Eois sp\n 
      Asteraceae,Erebidae,\n 
      Poaceae,Erebidae,\n 
      Poaceae,Noctuidae,\n 
      Asteraceae,Saturnidae,Polyphemous sp\n 
      Melastomaceae,Noctuidae,\n 
      Asteraceae,,\n 
      Melastomaceae,,\n 
      ,Noctuidae,\n 
      ,Erebidae,\n 
      Poaceae, Erebidae,\n")

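I want to create unique lepsp names based on the unique combinations of plantfam and lepfam. Each lepfam must first be subsetted, and each unique combination within that lepfam subset is then assigned a morphospecies name. Rows where plantfam or lepfam is blank get no morphospecies name. Repeated plantfam/lepfam combinations should get the same morphospecies name. The output should look like this: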

output <-
plantfam       lepfam       lepsp
Asteraceae     Geometridae  Eois sp
Asteraceae     Erebidae     Erebidae_morphosp1
Poaceae        Erebidae     Erebidae_morphosp2
Poaceae        Noctuidae    Noctuidae_morphosp1
Asteraceae     Saturnidae   Polyphemous sp
Melastomaceae  Noctuidae    Noctuidae_morphosp2
Asteraceae
Melastomaceae
               Noctuidae
               Erebidae
Poaceae        Erebidae     Erebidae_morphosp2

I have tried:

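library(dplyr)

# Rows that need a name: blank lepsp, non-blank plantfam and lepfam
condition <- quote(lepsp == "" & plantfam != "" & lepfam != "")
# `!!` unquotes the stored expression so filter() evaluates it on the data
subset1 <- df %>% filter(!!condition) %>% group_by(lepfam) %>%
  mutate(lepsp =
    paste0(lepfam, "_morphosp", match(plantfam, unique(plantfam))))
# All remaining rows, left unchanged
subset2 <- df %>% filter(!!condition) %>% setdiff(df, .)
union(subset1, subset2) %>% arrange(lepsp)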

However, the two rows with Poaceae and Erebidae come back with different morphospecies numbers, Erebidae_morphosp1 and Erebidae_morphosp2, when they should be the same.

Source: local data frame [11 x 3] 
Groups: lepfam [6] 

        plantfam  lepfam    lepsp 
         <chr>  <chr>    <chr> 
1     Melastomaceae         
2      Asteraceae         
3       Poaceae Erebidae Erebidae_morphosp1 
4      Asteraceae Geometridae    Eois sp 
5      Asteraceae Erebidae Erebidae_morphosp1 
6       Poaceae Erebidae Erebidae_morphosp2 
7         Erebidae Erebidae_morphosp3 
8       Poaceae Noctuidae Noctuidae_morphosp1 
9     Melastomaceae Noctuidae Noctuidae_morphosp2 
10         Noctuidae Noctuidae_morphosp3 
11      Asteraceae Saturnidae  Polyphemous sp

Source

2017-06-29 Danielle

What is 'condition'? – Masoud

It selects the 'lepsp' values that are blank and that have both a 'plantfam' and a 'lepfam' name – Danielle

I think the problem may simply be in your df: the last row has a space before Erebidae, which makes R treat it as a different value.
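A minimal sketch of that whitespace issue, using a hypothetical vector 'fam' and a hypothetical two-row data frame 'df_clean' that are not part of the original question. It assumes the stray space is the only difference between the two values and shows two ways to remove it: 'trimws()' after reading, or 'strip.white = TRUE' while reading.

```r
# Two strings that differ only by a leading space are distinct values in R
fam <- c("Erebidae", " Erebidae")
fam[1] == fam[2]        # FALSE
length(unique(fam))     # 2

# trimws() removes leading and trailing whitespace
unique(trimws(fam))     # "Erebidae"

# Alternatively, strip.white = TRUE trims every field while reading
# (it only takes effect when sep is specified)
df_clean <- read.table(sep = ",", header = TRUE, stringsAsFactors = FALSE,
                       strip.white = TRUE,
                       text = "plantfam,lepfam
Poaceae,Erebidae
Poaceae, Erebidae")
unique(df_clean$lepfam) # "Erebidae"
```

Trimming at read time keeps later grouping and comparisons free of per-column 'trimws()' calls, at the cost of also trimming any whitespace that was meant to be kept.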

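I noticed that while I was finishing my answer. Here is how I would do what you want. I first create a lepfam_number column in the mutate, then use it in the paste.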

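library(dplyr)

df %>%
  group_by(lepfam) %>%
  # Index each plantfam by its first appearance within the lepfam group
  mutate(lepfam_number = match(plantfam, unique(plantfam)),
         # Fill only blank lepsp where both family names are present
         lepsp = ifelse(lepsp == "" & lepfam != "" & trimws(plantfam) != "",
                        paste0(lepfam, "_morphosp", lepfam_number),
                        lepsp))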

        plantfam  lepfam    lepsp lepfam_number 
         <chr>  <chr>    <chr>   <int> 
1     Asteraceae Geometridae    Eois sp    1 
2     Asteraceae Erebidae Erebidae_morphosp1    1 
3      Poaceae Erebidae Erebidae_morphosp2    2 
4      Poaceae Noctuidae Noctuidae_morphosp1    1 
5     Asteraceae Saturnidae  Polyphemous sp    1 
6    Melastomaceae Noctuidae Noctuidae_morphosp2    2 
7     Asteraceae            1 
8    Melastomaceae            2 
9        Noctuidae         3 
10        Erebidae         3 
11     Poaceae Erebidae Erebidae_morphosp2    2

Data

df <- read.table(sep = ",", header = TRUE, stringsAsFactors = FALSE, 
       text = " 
plantfam,lepfam,lepsp\n 
      Asteraceae,Geometridae,Eois sp\n 
      Asteraceae,Erebidae,\n 
      Poaceae,Erebidae,\n 
      Poaceae,Noctuidae,\n 
      Asteraceae,Saturnidae,Polyphemous sp\n 
      Melastomaceae,Noctuidae,\n 
      Asteraceae,,\n 
      Melastomaceae,,\n 
      ,Noctuidae,\n 
      ,Erebidae,\n 
      Poaceae,Erebidae,\n")

Source

2017-06-29 18:04:55

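Nice! If you have a moment, I am trying to understand how 'match' works here. As I understand it, *Poaceae* is at position 2 in 'unique(plantfam)'. In rows 3 and 4 it is numbered 2 and 1. Is that because of the preceding 'group_by(lepfam)'? Maybe I am misunderstanding 'group_by'? Thanks for your help. –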

@LukeC Yes. Because I group by lepfam first, within that particular group the position of Poaceae in unique(plantfam) is always 2. –

@P Lapointe Got it, that makes sense. Thanks for clarifying! –
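The grouping behaviour discussed in these comments can be sketched on a small example. The data frame 'toy' and the column 'idx' are hypothetical names made up for illustration; the sketch only assumes dplyr is installed. It compares 'match(plantfam, unique(plantfam))' evaluated over the whole column with the same expression evaluated after 'group_by(lepfam)'.

```r
library(dplyr)

# Hypothetical toy data: Poaceae appears under two different lepfam values
toy <- data.frame(
  plantfam = c("Asteraceae", "Poaceae", "Poaceae", "Melastomaceae"),
  lepfam   = c("Erebidae", "Erebidae", "Noctuidae", "Noctuidae"),
  stringsAsFactors = FALSE
)

# Ungrouped: unique(plantfam) is computed over the whole column
ungrouped <- toy %>%
  mutate(idx = match(plantfam, unique(plantfam)))
ungrouped$idx   # 1 2 2 3

# Grouped: unique(plantfam) is computed separately inside each lepfam,
# so the numbering restarts at 1 in every group
grouped <- toy %>%
  group_by(lepfam) %>%
  mutate(idx = match(plantfam, unique(plantfam))) %>%
  ungroup()
grouped$idx     # 1 2 1 2
```

In the grouped version Poaceae is second in the Erebidae group but first in the Noctuidae group, which is why the same plant family can carry different numbers under different lepfam values.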
