2017-06-12

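Rename factor levels based on matching values within a subset of a data frame

I am trying to assign names to the blank levels of the factor lepsp where values within a subset match a condition. Example data: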

df
    plantfam       lepfam       lepsp        lepcn
    Asteraceae     Geometridae  Eois sp      green/spikes
    Asteraceae     Erebidae     Anoba sp     green/nospikes
    Asteraceae     Erebidae                  green/nospikes
    Melastomaceae  Noctuidae    Balsinae sp
    Poaceae        Erebidae     Deinopa sp   black/orangespots
    Poaceae        Erebidae                  black/orangespots
    Poaceae        Erebidae     Cocytia sp   black/yellowspots
    Poaceae                                  black/yellowspots

Here is the code for the data frame above:

df <- data.frame(
  plantfam = c("Asteraceae", "Asteraceae", "Asteraceae", "Melastomaceae",
               "Poaceae", "Poaceae", "Poaceae", "Poaceae"),
  lepfam   = c("Geometridae", "Erebidae", "Erebidae", "Noctuidae",
               "Erebidae", "Erebidae", "Erebidae", ""),
  lepsp    = c("Eois sp", "Anoba sp", "", "Balsinae sp",
               "Deinopa sp", "", "Cocytia sp", ""),
  lepcn    = c("green/spikes", "green/nospikes", "green/nospikes", "",
               "black/orangespots", "black/orangespots",
               "black/yellowspots", "black/yellowspots"),
  stringsAsFactors = TRUE  # R >= 4.0 no longer creates factors by default
)

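If lepsp is blank, but its lepcn matches the lepcn of another lepsp feeding on the same plantfam, the blank lepsp should be given the name of that matching lepsp. In other words, every lepfam subset that feeds on the same plantfam and has the same lepcn should be assigned the same name.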

output
    plantfam       lepfam       lepsp        lepcn
    Asteraceae     Geometridae  Eois sp      green/spikes
    Asteraceae     Erebidae     Anoba sp     green/nospikes
    Asteraceae     Erebidae     Anoba sp     green/nospikes
    Melastomaceae  Noctuidae    Balsinae sp
    Poaceae        Erebidae     Deinopa sp   black/orangespots
    Poaceae        Erebidae     Deinopa sp   black/orangespots
    Poaceae        Erebidae     Cocytia sp   black/yellowspots
    Poaceae                     Cocytia sp   black/yellowspots
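A minimal sketch of the rule as stated, assuming character columns (stringsAsFactors = FALSE) and blanks stored as empty strings. It keys on plantfam and lepcn only, which is what the expected output implies for the last Poaceae row, whose lepfam is blank. If several named lepsp share one plantfam/lepcn pair, match() just takes the first one; that tie-breaking is an assumption, not something the question specifies.

```r
# Example data from the question, kept as character columns
df <- data.frame(
  plantfam = c("Asteraceae", "Asteraceae", "Asteraceae", "Melastomaceae",
               "Poaceae", "Poaceae", "Poaceae", "Poaceae"),
  lepfam   = c("Geometridae", "Erebidae", "Erebidae", "Noctuidae",
               "Erebidae", "Erebidae", "Erebidae", ""),
  lepsp    = c("Eois sp", "Anoba sp", "", "Balsinae sp",
               "Deinopa sp", "", "Cocytia sp", ""),
  lepcn    = c("green/spikes", "green/nospikes", "green/nospikes", "",
               "black/orangespots", "black/orangespots",
               "black/yellowspots", "black/yellowspots"),
  stringsAsFactors = FALSE
)

# Rows that can donate a name: both lepsp and lepcn are filled in
known <- df[df$lepsp != "" & df$lepcn != "", ]

# plantfam + lepcn key for every row and for the donor rows
key_all   <- paste(df$plantfam, df$lepcn, sep = "|")
key_known <- paste(known$plantfam, known$lepcn, sep = "|")

# Index of the first donor with the same key (NA when there is none)
idx <- match(key_all, key_known)

# Overwrite only blank lepsp values that have a donor
fill <- df$lepsp == "" & !is.na(idx)
df$lepsp[fill] <- known$lepsp[idx[fill]]

df$lepsp
```

For the example data this yields the lepsp column of the expected output, including Cocytia sp for the row with a blank lepfam.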

I have tried variations of the following without success, building on the combination check in https://stackoverflow.com/a/44479195/8061255


Could you provide a sample dataset so that we can produce a reproducible solution? –


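I was under the impression that the above is an example of the dataset. What else could I provide that would help further? Thank you for your time. – Danielle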


I have added code for the example data frame, which may be what you were asking for. Thanks again for your help. – Danielle

Answer


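A straightforward base R rename. In essence, you build a unique lookup of the plantfam/lepfam/lepcn combinations that have a known lepsp and merge it back into the original dataset: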

Read the data and make sure it is in the expected format:

df <- read.csv(text =
'plantfam,lepfam,lepsp,lepcn
Asteraceae,Geometridae,Eois sp,green/spikes
Asteraceae,Erebidae,Anoba sp,green/nospikes
Asteraceae,Erebidae,NA,green/nospikes
Melastomaceae,Noctuidae,Balsinae sp,NA
Poaceae,Erebidae,Deinopa sp,black/orangespots
Poaceae,Erebidae,NA,black/orangespots
Poaceae,Erebidae,Cocytia sp,black/yellowspots
Poaceae,Erebidae,NA,black/yellowspots')

# assumes blanks are NA
# if blanks are actually empty strings "", turn those into NAs first

# make sure every column is character, not factor
df[] <- lapply(df, as.character)
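If the data come from the question's data.frame() call, where blanks are empty strings rather than NA, the conversion mentioned in the comment can be done as sketched below. This assumes character columns; re-reading a file with read.csv(..., na.strings = c("", "NA")) would be an alternative.

```r
# The question's example data, with blanks stored as empty strings
df <- data.frame(
  plantfam = c("Asteraceae", "Asteraceae", "Asteraceae", "Melastomaceae",
               "Poaceae", "Poaceae", "Poaceae", "Poaceae"),
  lepfam   = c("Geometridae", "Erebidae", "Erebidae", "Noctuidae",
               "Erebidae", "Erebidae", "Erebidae", ""),
  lepsp    = c("Eois sp", "Anoba sp", "", "Balsinae sp",
               "Deinopa sp", "", "Cocytia sp", ""),
  lepcn    = c("green/spikes", "green/nospikes", "green/nospikes", "",
               "black/orangespots", "black/orangespots",
               "black/yellowspots", "black/yellowspots"),
  stringsAsFactors = FALSE
)

# Replace every empty string with a real NA so na.omit() can drop incomplete rows
df[df == ""] <- NA

# Count of missing values per column
colSums(is.na(df))
```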

Solution:

# get a unique list of all combinations that don't have missing data 
dflookup <- unique(na.omit(df)) 

# inspect combinations to be renamed, there should be no duplicate plantfam/lepfam/lepcn combinations 
dflookup 

# use the lookup to merge in all known names 
newdf <- merge(df, dflookup, by = c('plantfam', 'lepfam', 'lepcn'), all.x = TRUE, suffixes = c('old', 'new'))

# use original lepsp when new lepsp is NA 
newdf$lepsp <- ifelse(is.na(newdf$lepspnew),newdf$lepspold,newdf$lepspnew) 

# remove unneeded columns 
newdf$lepspold <- newdf$lepspnew <- NULL 

# turn back into factors if desired
newdf[] <- lapply(newdf, factor)
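One caveat when applying this to the question's own data: the merge key includes lepfam, so the last Poaceae row, whose lepfam is blank, finds no match and keeps an NA lepsp, while the expected output fills it with Cocytia sp. Below is a sketch of the same merge with lepfam left out of the key, assuming empty strings have already been converted to NA. The lookup still has to be unique per plantfam/lepcn pair, otherwise merge() duplicates rows.

```r
# The question's data, with empty strings converted to NA
df <- data.frame(
  plantfam = c("Asteraceae", "Asteraceae", "Asteraceae", "Melastomaceae",
               "Poaceae", "Poaceae", "Poaceae", "Poaceae"),
  lepfam   = c("Geometridae", "Erebidae", "Erebidae", "Noctuidae",
               "Erebidae", "Erebidae", "Erebidae", ""),
  lepsp    = c("Eois sp", "Anoba sp", "", "Balsinae sp",
               "Deinopa sp", "", "Cocytia sp", ""),
  lepcn    = c("green/spikes", "green/nospikes", "green/nospikes", "",
               "black/orangespots", "black/orangespots",
               "black/yellowspots", "black/yellowspots"),
  stringsAsFactors = FALSE
)
df[df == ""] <- NA

# Known names keyed on plantfam + lepcn only (lepfam deliberately excluded)
lookup <- unique(na.omit(df[, c("plantfam", "lepcn", "lepsp")]))

# Left join; the two lepsp columns become lepspold and lepspnew
newdf <- merge(df, lookup, by = c("plantfam", "lepcn"),
               all.x = TRUE, suffixes = c("old", "new"))

# Keep an existing name, otherwise take the one from the lookup
newdf$lepsp <- ifelse(is.na(newdf$lepspold), newdf$lepspnew, newdf$lepspold)
newdf$lepspold <- newdf$lepspnew <- NULL

newdf
```

Note that merge() returns the rows sorted by the key columns, not in the original order.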