2015-05-19
3

Gathering columns into rows without specifying column names

I have a data frame with the following structure:

bad_df <- data.frame(
id = c("id001", "id002", "id003"), 
participant.1 = c("Jana", "Marina", "Vasilei"), 
participant.2 = c("Niko", "Micha", "Niko"), 
role.1 = c("writer", "writer", "speaker"), 
role.2 = c("observer", "observer", "observer"), 
stringsAsFactors = FALSE
) 
bad_df 

I need to gather it into something like this. Each row should contain one id, one participant, and one role.

good_df <- data.frame(
id = c("id001", "id001", "id002", "id002", "id003", "id003"), 
participant = c("Jana", "Niko", "Marina", "Micha", "Vasilei", "Niko"), 
role = c("writer", "observer", "writer", "observer", "speaker", "observer"), 
stringsAsFactors = FALSE
) 
good_df 

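I have seen countless questions very similar to this one, but I find it hard to understand how to apply tidyr or reshape2 to this case. I understand that it must somehow be done with gather().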

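However, the data frame may contain a large number of participants and corresponding roles, so ideally the method should not require specifying column names. I came up with a solution, but I don't think it is the most elegant approach. I would still need to handle data frames that contain participant.3, role.3, and so on.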

library(dplyr)

# stack the ".1" columns on top of the ".2" columns, renaming both to common names
good_df2 <- rbind(bad_df %>% select(id, participant.1, role.1) %>%
                    rename(participant = participant.1, role = role.1),
                  bad_df %>% select(id, participant.2, role.2) %>%
                    rename(participant = participant.2, role = role.2))
good_df2

Thanks!

Answers

4

You could try the development version of data.table, v1.9.5. Installation instructions are here.

library(data.table) 
melt(setDT(bad_df), measure=list(grep('participant', names(bad_df)), 
    grep('role', names(bad_df))))[order(id)][, variable:= NULL] 
#  id value1 value2 
#1: id001 Jana writer 
#2: id001 Niko observer 
#3: id002 Marina writer 
#4: id002 Micha observer 
#5: id003 Vasilei speaker 
#6: id003 Niko observer 
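
The columns above come out as value1 and value2. As a minimal sketch, assuming a data.table release that provides patterns() and accepts several names in value.name (1.9.6 or later, which is an assumption about your installation), the same melt can name the output columns directly. The name long_dt is only illustrative.

```r
library(data.table)

bad_df <- data.frame(
  id = c("id001", "id002", "id003"),
  participant.1 = c("Jana", "Marina", "Vasilei"),
  participant.2 = c("Niko", "Micha", "Niko"),
  role.1 = c("writer", "writer", "speaker"),
  role.2 = c("observer", "observer", "observer"),
  stringsAsFactors = FALSE
)

# as.data.table() makes a copy, so bad_df itself stays a plain data.frame
long_dt <- melt(
  as.data.table(bad_df),
  id.vars = "id",
  # one regular expression per output column; no column is listed by name
  measure.vars = patterns("^participant", "^role"),
  value.name = c("participant", "role")
)[order(id, variable)][, variable := NULL]

long_dt
```

Because the columns are selected by pattern, participant.3, role.3 and so on should be picked up without changing the call.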

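Or we can use merged.stack, where we only need to supply the unique column prefixes. Based on those prefixes, it groups together the columns that share the same prefix.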

library(splitstackshape) 
merged.stack(bad_df, var.stubs=c('participant', 'role'), 
         sep='var.stubs')[, 2:= NULL] 
#  id participant  role 
#1: id001  Jana writer 
#2: id001  Niko observer 
#3: id002  Marina writer 
#4: id002  Micha observer 
#5: id003  Vasilei speaker 
#6: id003  Niko observer 

Or with dplyr/tidyr:

library(dplyr) 
library(tidyr) 
gather(bad_df, Var, Val, -id) %>% 
     separate(Var, into=c('Var1', 'Var2')) %>% 
     spread(Var1, Val) %>% 
     select(-Var2) 
# id participant  role 
#1 id001  Jana writer 
#2 id001  Niko observer 
#3 id002  Marina writer 
#4 id002  Micha observer 
#5 id003  Vasilei speaker 
#6 id003  Niko observer 
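
For readers on a newer tidyr, the gather/separate/spread chain can probably be collapsed into one step. This is a sketch that assumes tidyr 1.0.0 or later, where pivot_longer() and its ".value" placeholder exist; the helper column name n and the result name long_df are made up for illustration.

```r
library(dplyr)
library(tidyr)

bad_df <- data.frame(
  id = c("id001", "id002", "id003"),
  participant.1 = c("Jana", "Marina", "Vasilei"),
  participant.2 = c("Niko", "Micha", "Niko"),
  role.1 = c("writer", "writer", "speaker"),
  role.2 = c("observer", "observer", "observer"),
  stringsAsFactors = FALSE
)

long_df <- bad_df %>%
  pivot_longer(
    -id,
    # the part before the dot becomes the output column name (".value"),
    # the part after it goes into the helper column n
    names_to = c(".value", "n"),
    names_sep = "\\."
  ) %>%
  select(-n)

long_df
```

Only id is named here, so additional participant.N / role.N pairs need no change to the code.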
+0

@Frank I am not sure how to interpret that comment. But we do need to specify at least the measure columns. With 'merged.stack' that can be done. – akrun

+1

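What I meant is that you wrote out "participant.1", "participant.2", ... Doing so seems to violate the OP's premise of avoiding enumerating the columns (which the splitstackshape solution manages to do), since the participants won't be known ahead of time (or something like that). – Frank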

+1

Use 'grep' ..... – Arun

3

I would go about it like this in base R:

# find the participant columns
partCol <- grep("part", colnames(bad_df))
# ... and the role columns
roleCol <- grep("role", colnames(bad_df))
# t() gives a matrix with one column per id, so as.vector() reads the values id by id
data.frame(id = rep(bad_df$id, each = length(partCol)),
           participant = as.vector(t(bad_df[, partCol])),
           role = as.vector(t(bad_df[, roleCol])),
           stringsAsFactors = FALSE)
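
Another base R route, offered only as a sketch, is stats::reshape(), which can guess the participant/role grouping from the ".1", ".2" suffixes. It assumes every column other than id follows the name.number pattern; the helper column n and the name long_df are illustrative.

```r
bad_df <- data.frame(
  id = c("id001", "id002", "id003"),
  participant.1 = c("Jana", "Marina", "Vasilei"),
  participant.2 = c("Niko", "Micha", "Niko"),
  role.1 = c("writer", "writer", "speaker"),
  role.2 = c("observer", "observer", "observer"),
  stringsAsFactors = FALSE
)

long_df <- reshape(
  bad_df,
  direction = "long",
  varying = setdiff(names(bad_df), "id"),  # everything except the id column
  sep = ".",                               # split "participant.1" into name and number
  idvar = "id",
  timevar = "n"
)

# reshape() stacks all ".1" rows before the ".2" rows, so sort by id and drop n
long_df <- long_df[order(long_df$id, long_df$n), c("id", "participant", "role")]
rownames(long_df) <- NULL
long_df
```

This needs no extra packages, at the cost of the post-processing step to restore the row order.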