Recoding comma-separated entries in R

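I have a data frame (df2) with two variables, Mood and PartOfTown (the column is named Region in the example data below). Mood is a multiple-choice question rating a person's happiness, so any combination of the allowed options can occur, and PartOfTown describes the geographic location.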

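The problem is that the centres code the moods differently: centres in the north of the city use NorthCode, and centres in the south use SouthCode (see df1).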

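I want to recode all entries in the dataset (df2) into SouthCode, so that I end up with a dataset like df3. I'm looking for a general solution, because new combinations that are not yet in the dataset may appear later. Any ideas would be greatly appreciated.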

Centre codes and mood definitions:

df1 <- data.frame(NorthCode=c(4,5,6,7,99),NorthDef=c("happy","sad","tired","energetic","other"),SouthCode=c(7,8,9,5,99),SouthDef=c("happy","sad","tired","energetic","other"))

Starting point:

df2 <- data.frame(Mood=c("4","5","6","7","4,5","5,6,99","99","7","8","9","5","7,8","8,5,99","99"),Region=c("north","north","north","north","north","north","north","south","south","south","south","south","south","south"))

Desired result:

df3 <- data.frame(Mood=c("7","8","9","5","7,8","8,9,99","99","7","8","9","5","7,8","8,5,99","99"),PartofTown=c("north","north","north","north","north","north","north","south","south","south","south","south","south","south"))

Current attempt: I tried to start by splitting the entries, but couldn't get it to work.

unlist(strsplit(df2$Mood, ","))

Source

2017-10-18 LLL

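You're on the right track with strsplit, but you need to add stringsAsFactors = FALSE to data.frame() so that Mood is a character vector rather than a factor (since R 4.0.0 this is the default). After that, you can keep the split elements as a list and use lapply() to match the old codes to the new codes.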
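To illustrate the factor issue in isolation, here is a minimal sketch. The vector `mood_factor` is made up for this example and is not part of the question's data; it only mimics what a factor column looks like to strsplit().

```r
# A factor column, as data.frame() produced by default before R 4.0.0
mood_factor <- factor(c("4,5", "5,6,99"))

# strsplit() only accepts character input, so a factor raises an error;
# tryCatch() captures the error message instead of stopping the script
err <- tryCatch(
  strsplit(mood_factor, ","),
  error = function(e) conditionMessage(e)
)

# Converting to character first makes the split work
mood_list <- strsplit(as.character(mood_factor), ",")

# The list keeps one element per row, so row membership is preserved
lengths(mood_list)         # 2 3

# unlist() would merge all rows into one vector and lose that grouping
length(unlist(mood_list))  # 5
```

Keeping the result as a list is what makes it possible to recode each row separately and paste the pieces back together afterwards.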

df1 <-
    data.frame(NorthCode = c(4, 5, 6, 7, 99),
      NorthDef = c("happy", "sad", "tired", "energetic", "other"),
      SouthCode = c(7, 8, 9, 5, 99),
      SouthDef = c("happy", "sad", "tired", "energetic", "other"),
      stringsAsFactors = FALSE)

df2 <-
    data.frame(Mood = c("4", "5", "6", "7", "4,5", "5,6,99", "99", "7", "8", "9", "5", "7,8", "8,5,99", "99"),
      Region = c("north", "north", "north", "north", "north", "north", "north", "south", "south", "south", "south", "south", "south", "south"),
      stringsAsFactors = FALSE)

df3 <-
    data.frame(Mood = c("7", "8", "9", "5", "7,8", "8,9,99", "99", "7", "8", "9", "5", "7,8", "8,5,99", "99"),
      PartofTown = c("north", "north", "north", "north", "north", "north", "north", "south", "south", "south", "south", "south", "south", "south"),
      stringsAsFactors = FALSE)

# Split the Moods into separate values 
splitCodes <- strsplit(df2$Mood, ",") 
# Add the Region as the name of each element in the new list 
names(splitCodes) <- df2$Region 

# Recode the values by matching the north values to the south values;
# rows from the south are returned unchanged
recoded <-
    lapply(
        seq_along(splitCodes),
        function(x) {
            codes <- splitCodes[[x]]
            if (names(splitCodes)[x] == "north") {
                df1$SouthCode[match(codes, df1$NorthCode)]
            } else {
                codes
            }
        }
    )

# Add the recoded values back to df2 
df2$recoded <- 
    sapply(recoded, 
     paste, 
     collapse = ",") 

# Check if the recoded values match your desired values  
identical(df2$recoded, df3$Mood)
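Since the question asks for a general solution, the same idea can be wrapped in a reusable function. The following is only a sketch and not part of the original answer: the name `recode_moods`, its arguments, the vectors `north_codes` and `south_codes`, and the choice to keep codes that are missing from the lookup unchanged are all assumptions made for illustration.

```r
# Hypothetical helper: recode comma-separated codes in `mood` for rows whose
# `region` equals `from_region`; all other rows are returned unchanged.
recode_moods <- function(mood, region, from, to, from_region = "north") {
  # Named lookup vector: names are the old codes, values are the new codes
  lookup <- setNames(as.character(to), as.character(from))

  # One list element per row, each holding that row's individual codes
  pieces <- strsplit(as.character(mood), ",", fixed = TRUE)

  recode_one <- function(codes, reg) {
    codes <- trimws(codes)
    if (reg == from_region) {
      new <- unname(lookup[codes])
      # Assumption: codes missing from the lookup are kept as they are
      new[is.na(new)] <- codes[is.na(new)]
      codes <- new
    }
    paste(codes, collapse = ",")
  }

  unname(mapply(recode_one, pieces, as.character(region)))
}

# Example codes taken from the question's df1
north_codes <- c(4, 5, 6, 7, 99)
south_codes <- c(7, 8, 9, 5, 99)

# "42" is a made-up code that does not exist in the lookup
mood   <- c("4,5", "5,6,99", "8,5,99", "4,42")
region <- c("north", "north", "south", "north")

result <- recode_moods(mood, region, north_codes, south_codes)
result  # "7,8" "8,9,99" "8,5,99" "7,42"
```

Because every row is split and recoded code by code, new combinations need no special handling. Whether unknown codes should be kept, turned into NA, or raise an error depends on the data, so that line is the one to adjust.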

Source

2017-10-18 19:35:48 jpshanno
