2015-12-08
1

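Reordering variables

After merging two data sets, I get a data frame with 300 variables (some end in .x, some end in .y, and some have neither suffix). How can I move all the variables that do not end in .x or .y into the first 100 columns of the data set? In addition, I want the columns from 101 onward to be arranged like (day.x, day.y, city.x, city.y, number.x, number.y, and so on). That is, variables with the same name, such as city, but with different suffixes should sit next to each other. For example: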

city.y<- c(1,2,3,5,5,7,7,NA,NA,3,4,5) 
B<-c(3,4,5,6,1,2,7,6,7,NA,NA,6) 
number.x<-c(1,2,3,4,5,6,7,NA,NA,5,5,6) 
day.x<-c(1,3,4,5,6,7,8,1,NA,3,5,3) 
Z<-c(1,2,3,4,5,6,7,NA,NA,5,5,6) 
day.y<-c(4,5,6,7,8,7,8,1,2,3,5,NA) 
number.y<-c(3,4,5,6,1,2,7,6,7,NA,NA,6) 
school.x<-c("a","b","b","c","n","f","h","NA","F","G","z","h") 
S<-c(5,2,3,4,5,6,5,NA,NA,5,6,6) 
school.y<-c("a","b","b","c","m","g","h","NA","NA","G","H","T") 
city.x<- c(1,2,3,7,5,8,7,5,6,7,5,1) 
df<- data.frame(city.y,B,number.x,day.x,Z,day.y,number.y,school.x,S,school.y,city.x) 

I want to reorder the variables into this format: B, S, Z, city.x, city.y, number.x, number.y, day.x, day.y, and so on.

Answers

3

Add one more column to create a more general use case:

df$ZZZZZ = 1:6 

Then load the dplyr package (for the chaining operator %>% and the select function):

library(dplyr) 

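Sorting the column names and indexing the data frame by them puts each subgroup of columns in the correct relative order:

df = df[, sort(names(df))] 

Now use the regular expression in -matches("\\.[xy]$") to select every column that does not end in ".x" or ".y" and put those columns first. Then everything() puts all the other columns after them.

df = df %>% select(-matches("\\.[xy]$"), everything()) 

df 

    B  S  Z ZZZZZ city.x city.y day.x day.y number.x number.y school.x school.y 
1   3  5  1     1      1      1     1     4        1        3        a        a 
2   4  2  2     2      2      2     3     5        2        4        b        b 
... 
11 NA  6  5     5      5      4     5     5        5       NA        z        H 
12  6  6  6     6      1      5     3    NA        6        6        h        T 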
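
As a quick check of what the pattern `\\.[xy]$` matches, here is a minimal sketch using base R's grepl on a handful of column names. The names `tax.year` and `x` are made up for illustration and are not part of the question's data.

```r
# Column names to test; "tax.year" and "x" are hypothetical edge cases.
cols <- c("B", "city.x", "day.y", "tax.year", "x")

# "\\." is a literal dot, "[xy]" is a single x or y, "$" anchors at the end.
has_suffix <- grepl("\\.[xy]$", cols)

cols[has_suffix]   # columns carrying a merge suffix
cols[!has_suffix]  # columns that would be moved to the front
```

Because the pattern is anchored at the end of the name, a dot elsewhere in a name does not count as a suffix.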

If you like, you can also set your own suffixes in the merge function (instead of the default ".x" and ".y"):

merge(df1, df2, by="col", suffixes=c("_df1", "_df2")) 

If you do that, you will of course also need to adjust the regular expression used to reorder the columns.
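
A minimal sketch of that adjustment, assuming the suffixes "_df1" and "_df2" from the merge call above. The data frames `df1` and `df2` and their contents are hypothetical, and the reordering is done with base R's grepl so the example runs without extra packages.

```r
# Two small hypothetical data sets sharing the key "col" and a column "city".
df1 <- data.frame(col = 1:3, city = c(1, 2, 3), A = c(7, 8, 9))
df2 <- data.frame(col = 1:3, city = c(4, 5, 6))

merged <- merge(df1, df2, by = "col", suffixes = c("_df1", "_df2"))

# The pattern now has to match the custom suffixes instead of ".x" / ".y".
suffixed <- grepl("_df[12]$", names(merged))

# Unsuffixed columns first, then the suffixed ones sorted by name.
merged <- merged[, c(names(merged)[!suffixed], sort(names(merged)[suffixed]))]
names(merged)
```

The same pattern can be passed to matches() in the dplyr version, as in select(-matches("_df[12]$"), everything()).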

2

This should do it:

extCols <- grepl("\\.", colnames(df)) 
df[, c(colnames(df)[(!extCols)], 
    sort(colnames(df)[extCols]))]
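
Note that the pattern "\\." treats any column name containing a dot as suffixed. If that is too broad for your data, one possible variant is to anchor the pattern at the end of the name and wrap the indexing in a function. The helper name `reorder_suffixed` and the toy data below are assumptions for illustration, not part of the answer above.

```r
# Hypothetical helper: move columns whose names match `pattern` to the end,
# sorted by name, and keep the remaining columns in their original order.
reorder_suffixed <- function(data, pattern = "\\.[xy]$") {
  suffixed <- grepl(pattern, colnames(data))
  data[, c(colnames(data)[!suffixed], sort(colnames(data)[suffixed])),
       drop = FALSE]
}

# Toy data; "tax.year" contains a dot but is not a merge suffix.
toy <- data.frame(day.y = 1:2, tax.year = 3:4, B = 5:6,
                  day.x = 7:8, city.x = 9:10)
result <- reorder_suffixed(toy)
names(result)
```

Unlike the sorted dplyr version, this keeps the unsuffixed columns in the order they already had.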