Using dplyr to update factor levels where other factors have matching blank levels
I have a data frame like this:
df <- read.table(sep = ",", header = TRUE, stringsAsFactors = FALSE,
text = "
plantfam,lepfam,lepsp\n
Asteraceae,Geometridae,Eois sp\n
Asteraceae,Erebidae,\n
Poaceae,Erebidae,\n
Poaceae,Noctuidae,\n
Asteraceae,Saturnidae,Polyphemous sp\n
Melastomaceae,Noctuidae,\n
Asteraceae,,\n
Melastomaceae,,\n
,Noctuidae,\n
,Erebidae,\n
Poaceae, Erebidae,\n")
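I want to create unique lepsp names based on the unique combinations of plantfam and lepfam. Each lepfam must be subsetted first, and each unique combination within that lepfam subset is then assigned a morphospecies name. Rows where plantfam or lepfam is blank are not assigned a morphospecies. Repeated plantfam and lepfam combinations should be given the same morphospecies name. The output should look like this: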
output<-
plantfam       lepfam       lepsp
Asteraceae     Geometridae  Eois sp
Asteraceae     Erebidae     Erebidae_morphosp1
Poaceae        Erebidae     Erebidae_morphosp2
Poaceae        Noctuidae    Noctuidae_morphosp1
Asteraceae     Saturnidae   Polyphemous sp
Melastomaceae  Noctuidae    Noctuidae_morphosp2
Asteraceae
Melastomaceae
               Noctuidae
               Erebidae
Poaceae        Erebidae     Erebidae_morphosp2
I have tried:
condition <- quote(lepsp == "" & plantfam != "" & lepfam != "")
subset1 <- df %>% filter(condition) %>% group_by(lepfam) %>%
mutate(lepsp=
paste0(lepfam,"_morphosp",match(plantfam,unique(plantfam))))
subset2 <- df %>% filter(condition) %>% setdiff(df, .)
union(subset1, subset2) %>% arrange(lepsp)
However, the two rows with Poaceae and Erebidae come back with different morphosp numbers, Erebidae_morphosp1 and Erebidae_morphosp2, when they should be the same.
Source: local data frame [11 x 3]
Groups: lepfam [6]
   plantfam       lepfam       lepsp
   <chr>          <chr>        <chr>
1  Melastomaceae
2  Asteraceae
3  Poaceae        Erebidae     Erebidae_morphosp1
4  Asteraceae     Geometridae  Eois sp
5  Asteraceae     Erebidae     Erebidae_morphosp1
6  Poaceae        Erebidae     Erebidae_morphosp2
7                 Erebidae     Erebidae_morphosp3
8  Poaceae        Noctuidae    Noctuidae_morphosp1
9  Melastomaceae  Noctuidae    Noctuidae_morphosp2
10                Noctuidae    Noctuidae_morphosp3
11 Asteraceae     Saturnidae   Polyphemous sp
What is 'condition'? – Masoud
It selects the rows where 'lepsp' is blank and both 'plantfam' and 'lepfam' have names. – Danielle
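One possible reading of the mismatch, offered as a sketch rather than a confirmed diagnosis: the last data row is written as "Poaceae, Erebidae," with a space after the comma, and read.table keeps that space by default (strip.white = FALSE). That would make " Erebidae" its own group, which fits the "Groups: lepfam [6]" line in the printed output. The code below assumes the space is unintended and trims it before numbering. The names needs_name and result are illustrative and do not come from the question.

```r
library(dplyr)

# Same data as in the question, including the leading space in the last row.
df <- read.table(sep = ",", header = TRUE, stringsAsFactors = FALSE,
                 text = "plantfam,lepfam,lepsp
Asteraceae,Geometridae,Eois sp
Asteraceae,Erebidae,
Poaceae,Erebidae,
Poaceae,Noctuidae,
Asteraceae,Saturnidae,Polyphemous sp
Melastomaceae,Noctuidae,
Asteraceae,,
Melastomaceae,,
,Noctuidae,
,Erebidae,
Poaceae, Erebidae,")

# Inspect the distinct families: " Erebidae" and "Erebidae" are separate values.
unique(df$lepfam)

result <- df %>%
  # Assumption: the stray whitespace is a typo, so strip it from every column.
  mutate(across(everything(), trimws)) %>%
  group_by(lepfam) %>%
  mutate(
    # Rows that should receive a morphospecies name.
    needs_name = lepsp == "" & plantfam != "" & lepfam != "",
    # Number plant families by first appearance among those rows only,
    # so repeated plantfam/lepfam combinations share one number.
    lepsp = ifelse(needs_name,
                   paste0(lepfam, "_morphosp",
                          match(plantfam, unique(plantfam[needs_name]))),
                   lepsp)
  ) %>%
  ungroup() %>%
  select(-needs_name)

result
```

Keeping every row in a single pipeline, instead of splitting into two subsets and recombining them with union(), preserves the original row order and leaves existing lepsp names and blank rows untouched. If the whitespace should be removed at read time instead, passing strip.white = TRUE to read.table is an alternative to the trimws step.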