2017-08-13

Reshaping repeated rows into column headers

I am trying to reshape a data frame using tidyR. Here is the data frame:

data <- data.frame(class_name=c("date","date","educational","qualif","date","date","educational","qualif"),
     text_val=c("2000","2003","ILLINOIS INSTITUTE OF TECHNOLOGY",
      "Master of Science, Computer Science","1996","2000",
      "MAHARASHTRA INSTITUTE OF TECHNOLOGY",
      "Bachelor of Science, Mechanical Engineering"))

I want the data to look like the image below:

[Image: desired output table]

Answers

Here is an idea using the tidyverse. We basically group every 4 rows and spread. However, we first need to make the names in class_name unique, i.e.

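library(tidyverse)

data %>%
    # every 4 consecutive rows form one record: grp = 1,1,1,1,2,2,2,2
    group_by(grp = rep(seq(n()/4), each = 4)) %>%
    # within each record, "date","date" becomes "date","date.1"
    mutate(class_name = make.unique(as.character(class_name))) %>%
    spread(class_name, text_val) %>%
    ungroup() %>%
    select(educational, qualif, date, date.1)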

which gives:

# A tibble: 2 x 4 
          educational          qualif date date.1 
*        <fctr>          <fctr> <fctr> <fctr> 
1 ILLINOIS INSTITUTE OF TECHNOLOGY   Master of Science, Computer Science 2000 2003 
2 MAHARASHTRA INSTITUTE OF TECHNOLOGY Bachelor of Science, Mechanical Engineering 1996 2000 
This is brilliant! Answer accepted. I am new to the tidyverse and it looks great. Thanks for suggesting it. – Vishnu
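Since tidyr 1.0.0, `spread()` has been superseded by `pivot_wider()`. Below is a minimal sketch of the same idea (group every 4 rows, make the keys unique, widen) written with `pivot_wider()`. It assumes tidyr >= 1.0.0 and dplyr are installed; the `grp` column and building the data with `stringsAsFactors = FALSE` are choices made for this sketch, not part of the answer above.

```r
library(dplyr)
library(tidyr)

data <- data.frame(
  class_name = c("date", "date", "educational", "qualif",
                 "date", "date", "educational", "qualif"),
  text_val = c("2000", "2003", "ILLINOIS INSTITUTE OF TECHNOLOGY",
               "Master of Science, Computer Science", "1996", "2000",
               "MAHARASHTRA INSTITUTE OF TECHNOLOGY",
               "Bachelor of Science, Mechanical Engineering"),
  stringsAsFactors = FALSE
)

wide <- data %>%
  # every block of 4 consecutive rows is one record: grp = 1,1,1,1,2,2,2,2
  mutate(grp = (row_number() - 1) %/% 4 + 1) %>%
  group_by(grp) %>%
  # make the repeated key unique within each record: "date" -> "date", "date.1"
  mutate(class_name = make.unique(class_name)) %>%
  ungroup() %>%
  # grp identifies the output rows; class_name supplies the new column names
  pivot_wider(names_from = class_name, values_from = text_val) %>%
  select(educational, qualif, date, date.1)

wide
```

As with the original answer, this assumes every record spans exactly four rows, since `grp` is computed from row position only.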

Another solution using reshape (not as elegant as Sotos's solution):

data <- data.frame(class_name=c("date","date","educational","qualif","date","date","educational","qualif"),
     text_val=c("2000","2003","ILLINOIS INSTITUTE OF TECHNOLOGY",
      "Master of Science, Computer Science","1996","2000",
      "MAHARASHTRA INSTITUTE OF TECHNOLOGY",
      "Bachelor of Science, Mechanical Engineering"))
nrec <- 4                                            # rows per record
data$id <- rep(seq_len(nrow(data)/nrec), each=nrec)  # record number: 1,1,1,1,2,2,2,2
data$time <- rep(1:nrec, nrow(data)/nrec)            # position within the record: 1..4

# drop class_name: it varies within each id and would otherwise be kept as a
# "constant" column (with a "really varying" warning)
df <- reshape(data, v.names="text_val", idvar="id", direction="wide", drop="class_name")
names(df) <- c("id","date1","date2","educational","qualif")
df

# id date1 date2       educational          qualif 
# 1 1 2000 2003 ILLINOIS INSTITUTE OF TECHNOLOGY   Master of Science, Computer Science 
# 5 2 1996 2000 MAHARASHTRA INSTITUTE OF TECHNOLOGY Bachelor of Science, Mechanical Engineering 
Note that the base R 'reshape' function works fine for your code, so you don't need to load any libraries. – lmo

@lmo Right! Thanks! –

@MarcoSandri: Thanks for sharing the answer. – Vishnu
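A base-R variation on the reshape() answer, sketched under the same assumption of four rows per record: instead of a numeric `time` index and hand-written column names, a helper column `key` (a name introduced here for illustration) is built per record with `ave()` and `make.unique()`, so reshape() derives the wide column names from class_name itself.

```r
data <- data.frame(
  class_name = c("date", "date", "educational", "qualif",
                 "date", "date", "educational", "qualif"),
  text_val = c("2000", "2003", "ILLINOIS INSTITUTE OF TECHNOLOGY",
               "Master of Science, Computer Science", "1996", "2000",
               "MAHARASHTRA INSTITUTE OF TECHNOLOGY",
               "Bachelor of Science, Mechanical Engineering"),
  stringsAsFactors = FALSE
)
nrec <- 4  # rows per record

# record number: 1,1,1,1,2,2,2,2
data$id <- rep(seq_len(nrow(data) / nrec), each = nrec)
# per-record key: "date", "date.1", "educational", "qualif"
data$key <- ave(data$class_name, data$id, FUN = make.unique)

df <- reshape(data, direction = "wide", idvar = "id", timevar = "key",
              v.names = "text_val", drop = "class_name", sep = "_")
# reshape() names the new columns "text_val_<key>"; strip that prefix
names(df) <- sub("^text_val_", "", names(df))
df
```

Because the column names come from the data, this keeps working if the fields appear in a different order within each record, as long as each record still has four rows.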

For completeness, here is also a solution using dcast() from the data.table package:

library(data.table)
# rn = 4..11, so rn %/% 4L maps the 8 rows to records 1,1,1,1,2,2,2,2
setDT(data)[, rn := .I + 3L][
    , dcast(.SD , rn %/% 4L ~ class_name, toString, value.var = "text_val")]
rn  date       educational          qualif 
1: 1 2000, 2003 ILLINOIS INSTITUTE OF TECHNOLOGY   Master of Science, Computer Science 
2: 2 1996, 2000 MAHARASHTRA INSTITUTE OF TECHNOLOGY Bachelor of Science, Mechanical Engineering 

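Note that toString() is used as the aggregation function so that the duplicate dates are concatenated into one column. This is because the two date columns in the OP's expected output share the same name, which may indicate that the expected output is for display only and that no further processing of the date values is required.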


If column order matters and rn is not needed, the output can be polished to better match the OP's desired result:

# the factor levels set the order of the cast columns
lvl <- c("educational", "qualif", "date")
setDT(data)[, rn := .I + 3L][, class_name := factor(class_name, levels = lvl)][
    , dcast(.SD , rn %/% 4L ~ class_name, toString, value.var = "text_val")][, rn := NULL][]
      educational          qualif  date 
1: ILLINOIS INSTITUTE OF TECHNOLOGY   Master of Science, Computer Science 2000, 2003 
2: MAHARASHTRA INSTITUTE OF TECHNOLOGY Bachelor of Science, Mechanical Engineering 1996, 2000 
Thanks for posting. – Vishnu
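The toString() aggregation in the data.table answer merges both dates into a single date column. If the dates should instead stay in separate columns, one option (a sketch, not part of the answer above) is to make class_name unique within each record before casting, so dcast() needs no aggregation function. The helper columns `rec` and `field` are introduced here for illustration; the sketch assumes data.table is installed and that each record spans exactly four rows.

```r
library(data.table)

dt <- data.table(
  class_name = c("date", "date", "educational", "qualif",
                 "date", "date", "educational", "qualif"),
  text_val = c("2000", "2003", "ILLINOIS INSTITUTE OF TECHNOLOGY",
               "Master of Science, Computer Science", "1996", "2000",
               "MAHARASHTRA INSTITUTE OF TECHNOLOGY",
               "Bachelor of Science, Mechanical Engineering")
)

# record number for each block of 4 rows: 1,1,1,1,2,2,2,2
dt[, rec := (seq_len(.N) - 1L) %/% 4L + 1L]
# unique field name within each record: "date", "date.1", "educational", "qualif"
dt[, field := make.unique(class_name), by = rec]

# one row per record, one column per field; each cell has exactly one value
wide <- dcast(dt, rec ~ field, value.var = "text_val")
# reorder columns to match the layout asked for in the question
setcolorder(wide, c("rec", "educational", "qualif", "date", "date.1"))
wide
```

Keeping the dates apart leaves them available for later processing (for example, converting them to numbers), which the concatenated "2000, 2003" strings would not allow directly.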