2017-05-27

Create a loop to paste or delete elements based on different scenarios

Say I have the following dataset:

mydf <- data.frame("MemberID"=c("111","0111A","0111B","112","0112A","113","0113B"), 
        "resign.date"=c("2013/01/01",NA,NA,"2014/03/01",NA,NA,NA))            

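Note: 111, 112, and 113 are the IDs of the family representatives.

I want to do two things:

a) If a family representative has a resignation date, as 111 does, I want to paste the same resignation date onto 0111A and 0111B (in case you are wondering, these are the spouse and children of 111).
b) If a family representative has no resignation date, as with 113, I simply want to delete the rows for 113 and 0113B.

The resulting data frame should look like this: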

mydf <- data.frame("MemberID"=c("111","0111A","0111B","112","0112A"), 
        "resign.date"=c("2013/01/01","2013/01/01","2013/01/01","2014/03/01","2014/03/01")) 

Thanks in advance.


Do you have 'resign.date' only for MemberIDs without a trailing letter? – simone


@simone Yes, resign.date is present only for MemberIDs without a suffix letter. –


In that case, see whether the solution below is what you are looking for. – simone

Answers

1

If resign.date exists only for (some of the) MemberIDs without a trailing letter, here is a solution using data.table:

library(data.table) 

df <- data.table("MemberID"=c("0111","0111A","0111B","0112","0112A","0113","0113B"), 
       "resign.date"=c("2013/01/01",NA,NA,"2014/03/01",NA,NA,NA)) 

df <- df[order(MemberID)] ## order data : MemberIDs w/out trailing letters first by ID 
df[, myID := gsub("\\D+", "", MemberID)] ## create myID col : MemberID w/out trailing letters 

df[ , my.resign.date := resign.date[1L], by = myID] ##assign first occurrence of resign date by myID 
df <- df[!is.na(my.resign.date)] ##drop rows if my.resign.date is missing 
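
To make the two central steps easier to follow (deriving a family key, then copying the first date of each family onto all of its rows), here is a rough base R sketch of the same idea. It is only an illustration: `ave()` stands in for data.table's `by =` grouping, and it assumes, like the code above, that the representative's row comes first within each family.

```r
# Data from the answer above (IDs carry a consistent leading zero).
MemberID <- c("0111", "0111A", "0111B", "0112", "0112A", "0113", "0113B")
resign.date <- c("2013/01/01", NA, NA, "2014/03/01", NA, NA, NA)

# Strip every run of non-digit characters, leaving the shared family key.
myID <- gsub("\\D+", "", MemberID)
print(myID)

# Within each key, repeat the group's first resign.date across all its rows.
# This relies on the representative's row being first in its group.
my.resign.date <- ave(resign.date, myID,
                      FUN = function(x) rep(x[1L], length(x)))

# Families whose first date is NA stay NA and are dropped here.
keep <- !is.na(my.resign.date)
print(data.frame(MemberID, my.resign.date)[keep, ])
```

The 0113 family has no date on its first row, so both of its rows are removed by the final filter.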

EDIT

If the MemberIDs are inconsistent (some have a leading 0, some do not), you can try a workaround like the following:

df <- data.table("MemberID"=c("111","0111A","0111B","112","0112A","113","0113B"), 
       "resign.date"=c("2013/01/01",NA,NA,"2014/03/01",NA,NA,NA)) 

df[, myID := gsub("(?<![0-9])0+", "", gsub("\\D+", "", MemberID), perl = TRUE)] 
df <- df[order(myID, -MemberID)] 

df[ , my.resign.date := resign.date[1L], by = myID] 
df <- df[!is.na(my.resign.date)] 
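
The workaround above still depends on the representative's row sorting first within each family. A minimal base R sketch that avoids that dependency is shown here, assuming the family key is the digits of MemberID with leading zeros stripped and that each family carries at most one distinct resignation date. The names `key` and `first_non_na` are made up for this illustration.

```r
mydf <- data.frame(
  MemberID = c("111", "0111A", "0111B", "112", "0112A", "113", "0113B"),
  resign.date = c("2013/01/01", NA, NA, "2014/03/01", NA, NA, NA),
  stringsAsFactors = FALSE
)
# Reverse the rows to show that the result does not depend on row order.
mydf <- mydf[rev(seq_len(nrow(mydf))), ]

# Family key: keep digits only, then drop any leading zeros.
key <- sub("^0+", "", gsub("\\D+", "", mydf$MemberID))

# Hypothetical helper: first non-missing value, or NA if there is none.
first_non_na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) > 0L) x[1L] else NA_character_
}

# Spread each family's date over all of its rows, then drop dateless families.
mydf$resign.date <- ave(mydf$resign.date, key,
                        FUN = function(x) rep(first_non_na(x), length(x)))
result <- mydf[!is.na(mydf$resign.date), ]
print(result)
```

Because the date is picked by being non-missing rather than by position, it does not matter whether the representative appears as '113' or '0113', or where its row sits in the file.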

Yes, it works. I have never used data.table before. Could you explain lines 3 and 4? –


See the comments in the edited answer. – simone


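Also, the actual file has some inconsistencies: for example, sometimes the ID is '113' while the spouse/child IDs are '0113A' and '0113B'. A better approach might be to search for "113" and apply the paste and/or delete operations to the IDs with a suffix letter. I have edited my question. –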

1

We can also use tidyverse:

library(tidyverse) 
mydf %>% 
    group_by(grp = parse_number(MemberID)) %>% 
    mutate(resign.date = first(resign.date)) %>% 
    na.omit() %>% 
    ungroup() %>% 
    select(-grp) 
# A tibble: 5 x 2 
# MemberID resign.date 
# <fctr>  <fctr> 
#1  0111 2013/01/01 
#2 0111A 2013/01/01 
#3 0111B 2013/01/01 
#4  0112 2014/03/01 
#5 0112A 2014/03/01