2017-10-18

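Recoding comma-separated entries in R

I have a data frame (df2) with two variables, Mood and PartOfTown (the column is called Region in the code below). Mood is a multiple-choice question (i.e., any combination of the allowed options can be selected) in which respondents rate their happiness, and PartOfTown describes the geographic location.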

The problem is that the centers code mood differently: centers in the north of the city use NorthCode, and centers in the south use SouthCode (df1).

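I want to recode all entries in the dataset (df2) into SouthCode, so that I end up with a dataset like df3. I'd like a general solution, because new combinations that are not currently in the dataset may show up later. Any ideas would be much appreciated.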

Center codes and mood definitions:

df1 <- data.frame(NorthCode=c(4,5,6,7,99),NorthDef=c("happy","sad","tired","energetic","other"),SouthCode=c(7,8,9,5,99),SouthDef=c("happy","sad","tired","energetic","other")) 
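Because each row of df1 pairs the North code and the South code for the same mood (NorthDef and SouthDef list the moods in the same order), df1 can be turned into a North-to-South translation table. A minimal sketch, assuming that row pairing; the name `north_to_south` is hypothetical. Indexing a named vector by character codes returns the matching SouthCode:

```r
df1 <- data.frame(NorthCode = c(4, 5, 6, 7, 99),
                  NorthDef  = c("happy", "sad", "tired", "energetic", "other"),
                  SouthCode = c(7, 8, 9, 5, 99),
                  SouthDef  = c("happy", "sad", "tired", "energetic", "other"),
                  stringsAsFactors = FALSE)

# Hypothetical lookup table: names are North codes, values are South codes
north_to_south <- setNames(as.character(df1$SouthCode), df1$NorthCode)

# Translate a few North codes (given as strings, as they come out of strsplit)
unname(north_to_south[c("4", "6", "99")])  # "7" "9" "99"
```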

Starting point:

df2 <- data.frame(Mood=c("4","5","6","7","4,5","5,6,99","99","7","8","9","5","7,8","8,5,99","99"),Region=c("north","north","north","north","north","north","north","south","south","south","south","south","south","south")) 

Desired result:

df3 <- data.frame(Mood=c("7","8","9","5","7,8","8,9,99","99","7","8","9","5","7,8","8,5,99","99"),PartofTown=c("north","north","north","north","north","north","north","south","south","south","south","south","south","south")) 

Current attempt: I tried to start by splitting the entries, but couldn't get it to work:

unlist(strsplit(df2$Mood, ",")) 
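The likely reason this fails: in R versions before 4.0.0, data.frame() converted character columns to factors by default, and strsplit() rejects a factor with the error "non-character argument". A minimal sketch of the problem and the usual workaround, converting with as.character() first (the factor is built explicitly here to mimic the old default):

```r
# Mood stored as a factor, as data.frame() did by default before R 4.0.0
mood <- factor(c("4", "4,5", "5,6,99"))

# strsplit(mood, ",") would stop with "non-character argument";
# as.character() returns the factor's labels, which strsplit() accepts
split_mood <- strsplit(as.character(mood), ",", fixed = TRUE)

split_mood[[3]]  # "5" "6" "99"
```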

Answer


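You were on the right track with strsplit, but you need to add stringsAsFactors = FALSE to data.frame() to make sure that Mood is a character vector and not a factor (this is the default from R 4.0.0 onward, but not in earlier versions). After that, you can store the split elements as a list and use lapply() to match the old codes to the new ones.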

df1 <- 
    data.frame(NorthCode=c(4,5,6,7,99), 
      NorthDef=c("happy","sad","tired","energetic","other"), 
      SouthCode=c(7,8,9,5,99), 
      SouthDef=c("happy","sad","tired","energetic","other"), 
      stringsAsFactors = FALSE) 

df2 <- 
    data.frame(Mood=c("4","5","6","7","4,5","5,6,99","99","7","8","9","5","7,8","8,5,99","99"), 
      Region=c("north","north","north","north","north","north","north","south","south","south","south","south","south","south"), 
      stringsAsFactors = FALSE) 

df3 <- 
    data.frame(Mood=c("7","8","9","5","7,8","8,9,99","99","7","8","9","5","7,8","8,5,99","99"), 
      PartofTown=c("north","north","north","north","north","north","north","south","south","south","south","south","south","south"), 
      stringsAsFactors = FALSE) 

# Split the Moods into separate values 
splitCodes <- strsplit(df2$Mood, ",") 
# Add the Region as the name of each element in the new list 
names(splitCodes) <- df2$Region 

# Recode the values: north entries are matched against NorthCode and
# replaced by the corresponding SouthCode; south entries are kept as-is
recoded <- 
    lapply(
      seq_along(splitCodes), 
      function(x) { 
        if (names(splitCodes)[x] == "north") {
          df1$SouthCode[match(splitCodes[[x]], df1$NorthCode)]
        } else {
          splitCodes[[x]]
        }
      } 
    ) 

# Add the recoded values back to df2 
df2$recoded <- 
    sapply(recoded, 
     paste, 
     collapse = ",") 

# Check if the recoded values match your desired values  
identical(df2$recoded, df3$Mood)
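Since the question asks for a general solution that also covers combinations not yet present in the data, the same logic can be wrapped in a small function. A minimal sketch, assuming the column names used in df1 and df2 above; `recode_to_south()` is a hypothetical name, and it swaps the ifelse()/lapply() pairing for a named lookup vector and vapply():

```r
df1 <- data.frame(NorthCode = c(4, 5, 6, 7, 99),
                  SouthCode = c(7, 8, 9, 5, 99),
                  stringsAsFactors = FALSE)  # definition columns omitted here

df2 <- data.frame(Mood = c("4", "5", "6", "7", "4,5", "5,6,99", "99",
                           "7", "8", "9", "5", "7,8", "8,5,99", "99"),
                  Region = rep(c("north", "south"), each = 7),
                  stringsAsFactors = FALSE)

df3 <- data.frame(Mood = c("7", "8", "9", "5", "7,8", "8,9,99", "99",
                           "7", "8", "9", "5", "7,8", "8,5,99", "99"),
                  PartofTown = rep(c("north", "south"), each = 7),
                  stringsAsFactors = FALSE)

# Hypothetical helper: recode every Mood entry into SouthCode.
# Assumes each row of `codebook` pairs the North and South code of one mood.
recode_to_south <- function(mood, region, codebook) {
  lookup <- setNames(as.character(codebook$SouthCode), codebook$NorthCode)
  vapply(seq_along(mood), function(i) {
    # as.character() also copes with factor input
    codes <- trimws(strsplit(as.character(mood[i]), ",", fixed = TRUE)[[1]])
    if (region[i] == "north") {
      # Codes missing from the codebook come back as NA and print as "NA"
      codes <- unname(lookup[codes])
    }
    paste(codes, collapse = ",")
  }, character(1))
}

df2$recoded <- recode_to_south(df2$Mood, df2$Region, df1)
identical(df2$recoded, df3$Mood)  # TRUE
```

Using vapply() with character(1) guarantees exactly one string per row, and an unknown North code shows up as "NA" in the result instead of being silently dropped, which makes new or invalid codes easy to spot.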