2016-05-26

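R: split and order columns of a data frame according to another column

The title may be a bit confusing, so let me try to clarify it with a small example. I have a data frame with 3 columns that looks like this: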

col1  col2  col3 
1 A,D,C sd,dg,ds 5,26,1 
2 D,F fh,we 85,41 
3  H  hr  27 
4 C,A,D ds,sd,dg 235,65,3 
5 Q,G,J rt,gh,we 34,98,65 

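I would like to sort the elements within each entry of col1 alphabetically, and then reorder the elements of col2 and col3 to follow the new order of col1, to get this: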

col1  col2  col3 
1 A,C,D sd,ds,dg 5,1,26 
2 D,F fh,we 85,41 
3  H  hr  27 
4 A,C,D sd,ds,dg 65,235,3 
5 G,J,Q gh,we,rt 98,65,34 

This matters because I later want to aggregate by col1, and I need the col1 entries of rows 1 and 4, for example, to be equal (A,C,D).

So far, I am stuck here:

MWE

my.df <- data.frame(col1=c('A,D,C','D,F','H','C,A,D','Q,G,J'), col2=c('sd,dg,ds','fh,we','hr','ds,sd,dg','rt,gh,we'), col3=c('5,26,1','85,41','27','235,65,3','34,98,65')) 
my.df 
my.df$col1 <- sapply(sapply(strsplit(as.character(my.df$col1), ','), sort), paste, collapse=',') 
my.df 

Any help appreciated! Thanks!!

Answers


You can turn each row into a data frame, reorder each of those data frames by column 1, and then paste everything back together:

# split the entries by commas and 
# turn each row of my.df into a data frame 
# storing each data frame in a list element 
dfList <- lapply(
    apply(my.df, 1, strsplit, ","), 
    function(x) data.frame(x)) 

# sort each data frame by col1 
dfSortedList <- lapply(dfList, function(x) x[with(x, order(col1)), ]) 

# paste columns back together and arrange as desired 
t(sapply(dfSortedList, function(x) apply(x, 2, paste, collapse = ","))) 

#  col1 col2  col3  
#[1,] "A,C,D" "sd,ds,dg" "5,1,26" 
#[2,] "D,F" "fh,we" "85,41" 
#[3,] "H"  "hr"  "27"  
#[4,] "A,C,D" "sd,ds,dg" "65,235,3" 
#[5,] "G,J,Q" "gh,we,rt" "98,65,34" 

You can convert the result back to a data frame if needed.
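A minimal sketch of that conversion, assuming the character matrix is stored in a variable (called `res` here, a name that is not part of the answer). It repeats the steps above so that it runs on its own, then calls `as.data.frame()` with `stringsAsFactors = FALSE` so the columns stay character vectors:

```r
# Rebuild the example data from the question
my.df <- data.frame(
  col1 = c("A,D,C", "D,F", "H", "C,A,D", "Q,G,J"),
  col2 = c("sd,dg,ds", "fh,we", "hr", "ds,sd,dg", "rt,gh,we"),
  col3 = c("5,26,1", "85,41", "27", "235,65,3", "34,98,65"),
  stringsAsFactors = FALSE
)

# Same steps as above: split each row, sort the pieces by col1, paste back
dfList <- lapply(apply(my.df, 1, strsplit, ","), function(x) data.frame(x))
dfSortedList <- lapply(dfList, function(x) x[order(x$col1), ])
res <- t(sapply(dfSortedList, function(x) apply(x, 2, paste, collapse = ",")))

# `res` is a character matrix; turn it back into a data frame
sorted.df <- as.data.frame(res, stringsAsFactors = FALSE)
```

The column names col1, col2 and col3 should carry over from the matrix, so `sorted.df` can be used in place of the original `my.df`.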


Really elegant, and nice that it avoids loops! – DaniCee


Here you go:

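my.df <- data.frame(col1=c('A,D,C','D,F','H','C,A,D','Q,G,J'), col2=c('sd,dg,ds','fh,we','hr','ds,sd,dg','rt,gh,we'), col3=c('5,26,1','85,41','27','235,65,3','34,98,65'), stringsAsFactors = FALSE) 

for (k in seq_len(nrow(my.df))) { 
    # split row k on commas into a temporary data frame (one column per original column) 
    tempdf <- data.frame(strsplit(my.df[k,1],","), strsplit(my.df[k,2],","), strsplit(my.df[k,3],","), stringsAsFactors = FALSE) 
    # reorder the rows of the temporary data frame by its first column (the col1 pieces) 
    tempdf <- tempdf[order(tempdf[,1]),] 
    # collapse each column back into one comma-separated string and overwrite row k 
    my.df[k,] <- sapply(tempdf, paste, collapse=",") 
} 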

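As you can see, I convert each row of comma-separated strings into a temporary data frame. Then you only need to sort that temporary data frame by its first column. From there, you collapse each column of tempdf back into a single string and use it to replace the original row of my.df.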

Result:

> my.df 
    col1  col2  col3 
1 A,C,D sd,ds,dg 5,1,26 
2 D,F fh,we 85,41 
3  H  hr  27 
4 A,C,D sd,ds,dg 65,235,3 
5 G,J,Q gh,we,rt 98,65,34 
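
The same split, order, paste idea can also be wrapped in a small function. The sketch below is only an illustration, not the answer's own code: the name `reorder_by_key` and its arguments `key` and `sep` are hypothetical, and it assumes every column holds the same number of comma-separated pieces in a given row.

```r
# Example data from the question
my.df <- data.frame(
  col1 = c("A,D,C", "D,F", "H", "C,A,D", "Q,G,J"),
  col2 = c("sd,dg,ds", "fh,we", "hr", "ds,sd,dg", "rt,gh,we"),
  col3 = c("5,26,1", "85,41", "27", "235,65,3", "34,98,65"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: reorder the pieces of every column, row by row,
# following the sorted order of the pieces in the `key` column.
reorder_by_key <- function(df, key = "col1", sep = ",") {
  # split every column into a list of character vectors (one per row)
  parts <- lapply(df, function(col) strsplit(as.character(col), sep, fixed = TRUE))
  # for each row, the permutation that sorts the key column's pieces
  ord <- lapply(parts[[key]], order)
  # apply that permutation to every column and paste the pieces back together
  df[] <- lapply(parts, function(p) {
    mapply(function(v, o) paste(v[o], collapse = sep), p, ord, USE.NAMES = FALSE)
  })
  df
}

sorted.df <- reorder_by_key(my.df)
```

Computing the ordering once per row and reusing it for every column keeps col2 and col3 aligned with col1 without building a temporary data frame for each row.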

We can do this with cSplit from splitstackshape, together with data.table.

library(splitstackshape) 
library(data.table) # setDT() and := come from data.table 
na.omit(cSplit(setDT(my.df, keep.rownames=TRUE), 2:4, ",","long"))[ 
     , {i1 <- order(col1) 
     lapply(.SD, function(x) paste(x[i1], collapse=",")) 
    }, rn][, rn:= NULL][] 
# col1  col2  col3 
#1: A,C,D sd,ds,dg 5,1,26 
#2: D,F fh,we 85,41 
#3:  H  hr  27 
#4: A,C,D sd,ds,dg 65,235,3 
#5: G,J,Q gh,we,rt 98,65,34 

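Or, as a slightly longer option, split 'col1' and convert the dataset to 'long' format with cSplit. Then, grouped by 'col2' and 'col3', we create an order column ('i1') and the sorted 'col1'. Next, specify .SDcols as 'col2' and 'col3', loop over those columns with lapply, split each column on ',', change the order of the output based on the 'i1' column with Map, paste it together, and assign (:=) it back to the original columns. If needed, assign 'i1' to NULL.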

d1 <- cSplit(my.df, "col1", ",", "long")[, 
.(i1 = list(order(col1)), col1 = toString(sort(col1))) ,.(col2, col3)] 
d1[, c('col2', 'col3') := lapply(.SD, function(x) 
    Map(function(x, y) x[y], strsplit(as.character(x), ","), d1$i1)), .SDcols = col2:col3] 
d1[, i1:= NULL] 
d1[, names(my.df), with = FALSE] 
#  col1  col2  col3 
#1: A, C, D sd,ds,dg 5,1,26 
#2: D, F fh,we 85,41 
#3:  H  hr  27 
#4: A, C, D sd,ds,dg 65,235,3 
#5: G, J, Q gh,we,rt 98,65,34 