2016-02-01 100 views
1

Convert columns into rows in R

I have a data frame created with the following code:

test <- data.frame(dis = c(10,20,30,40),dur=c(30,40,60,90),method=c("car","car","Bicycle","Bicycle"),to_lon=c(-1.980,-1.5678,-1.324,-1.456),to_lat=c(55.3009,55.3416,55.1123,55.2234),from_lon=c(-1.4565,-1.3424,-1.4566,-1.1111),from_lat=c(76.8888,65.8999,76.9088,25.3344)) 

dis dur method to_lon to_lat from_lon from_lat 
1 10 30  car -1.9800 55.3009 -1.4565 76.8888 
2 20 40  car -1.5678 55.3416 -1.3424 65.8999 
3 30 60 Bicycle -1.3240 55.1123 -1.4566 76.9088 
4 40 90 Bicycle -1.4560 55.2234 -1.1111 25.3344 

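I want to transform this data frame so that it has one row for to_lat and to_lon, and in the next row it has from_lat and from_lon. The remaining columns do not need to change and can simply be repeated. The desired result should look like this: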

dis dur method longitude latitude 
from 10 30 car -1.98 55.3009 
to 10 30 car -1.4565 76.8888 
from 20 40 car -1.5678 55.3416 
to 20 40 car -1.3424 65.8999 
from 30 60 Bicycle -1.324 55.1123 
to 30 60 Bicycle -1.4566 76.9088 
from 40 90 Bicycle -1.456 55.2234 
to 40 90 Bicycle -1.1111 25.3344 

Any help would be greatly appreciated.

Thanks.

+1

In addition to @akrun's answer, take a look at this page for 'reshape2' and 'tidyr' solutions (given your tags): http://www.cookbook-r.com/Manipulating_data/Converting_data_between_wide_and_long_format/ – Laterow

+0

I did not find any function there that helps with transforming the data described in the post. Any ideas? – syebill

Answers

2

We can use melt from data.table, which can take multiple measure columns.

library(data.table) 
dM <- melt(setDT(test), measure=patterns('lon', 'lat'), 
      value.name=c('longitude', 'latitude')) 
# change the 'variable' column from a numeric index to 'from'/'to'
dM[, variable := c('from', 'to')[variable]]
# create a sequence column grouped by 'variable'
dM[, i1 := seq_len(.N), by = variable]
# order by 'i1', then drop the helper column
res <- dM[order(i1)][, i1 := NULL]
res 
# dis dur method variable longitude latitude 
#1: 10 30  car  from -1.9800 55.3009 
#2: 10 30  car  to -1.4565 76.8888 
#3: 20 40  car  from -1.5678 55.3416 
#4: 20 40  car  to -1.3424 65.8999 
#5: 30 60 Bicycle  from -1.3240 55.1123 
#6: 30 60 Bicycle  to -1.4566 76.9088 
#7: 40 90 Bicycle  from -1.4560 55.2234 
#8: 40 90 Bicycle  to -1.1111 25.3344 
0

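This may not be the most elegant solution, but it should work and is hopefully easy to follow:

We split the data into two data frames: one with the 'from' longitude and latitude data (call it testF), and another with the 'to' data (call it test). Then we use rbind to insert the rows of 'testF' into the appropriate positions in 'test'. The loop runs from the last row to the first so that each insertion does not shift the positions still to be processed.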

test <- data.frame(dis = c(10,20,30,40),dur=c(30,40,60,90),method=c("car","car","Bicycle","Bicycle"),to_lon=c(-1.980,-1.5678,-1.324,-1.456),to_lat=c(55.3009,55.3416,55.1123,55.2234),from_lon=c(-1.4565,-1.3424,-1.4566,-1.1111),from_lat=c(76.8888,65.8999,76.9088,25.3344)) 

# shared columns plus the 'from' coordinates
testF <- test[, c(1:3, 6, 7)]
names(testF)[4:5] <- c("longitude", "latitude")
# shared columns plus the 'to' coordinates
test <- test[, 1:5]
names(test)[4:5] <- c("longitude", "latitude")

for(i in dim(test)[1]:1) { 
    test <- rbind(test[1:i,], testF[i,], test[-(1:i),]) 
} 
+0

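I agree with your solution, but if you have more than a million rows to process, the loop can quickly increase your run time. Thanks for the solution. – syebill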
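
On the run-time concern: the same interleaving can be done without a loop by stacking the two pieces and reordering once. This is only a minimal base R sketch, assuming the `test` data frame from the question; `new_names`, `to_rows`, `from_rows`, `stacked`, and `res` are illustrative names that do not appear in the answer above.

```r
test <- data.frame(dis = c(10, 20, 30, 40), dur = c(30, 40, 60, 90),
                   method = c("car", "car", "Bicycle", "Bicycle"),
                   to_lon = c(-1.980, -1.5678, -1.324, -1.456),
                   to_lat = c(55.3009, 55.3416, 55.1123, 55.2234),
                   from_lon = c(-1.4565, -1.3424, -1.4566, -1.1111),
                   from_lat = c(76.8888, 65.8999, 76.9088, 25.3344))

# Illustrative names (assumptions, not part of the original answer)
new_names <- c("dis", "dur", "method", "longitude", "latitude")
to_rows   <- setNames(test[, c("dis", "dur", "method", "to_lon", "to_lat")], new_names)
from_rows <- setNames(test[, c("dis", "dur", "method", "from_lon", "from_lat")], new_names)

# Stack all 'to' rows on top of all 'from' rows
stacked <- rbind(to_rows, from_rows)

# Reorder by original row index so each 'to' row is followed by its 'from' row
n <- nrow(test)
res <- stacked[order(rep(seq_len(n), times = 2)), ]
rownames(res) <- NULL
res
```

The single call to order replaces the repeated rbind calls, so the data frame is copied once rather than once per row.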

0

Here is another approach using the tidyr package (a popular package for data munging), which avoids the for loop.

library(tidyr) 

test <- data.frame(dis = c(10,20,30,40),dur=c(30,40,60,90),method=c("car","car","Bicycle","Bicycle"),to_lon=c(-1.980,-1.5678,-1.324,-1.456),to_lat=c(55.3009,55.3416,55.1123,55.2234),from_lon=c(-1.4565,-1.3424,-1.4566,-1.1111),from_lat=c(76.8888,65.8999,76.9088,25.3344)) 
test$id <- 1:dim(test)[1] 

# gather latitude columns 
d1 <- gather(data = test, 
      key = direction, 
      value = latitude, 
      to_lat, from_lat) 

# gather longitude columns 
d2 <- gather(data = test, 
      key = direction, 
      value = longitude, 
      to_lon, from_lon) 

d3 <- cbind(d1[,c("direction","dis","dur","method","latitude")],d2[,c("longitude","id"),drop=FALSE]) 

# Create names 
dir <- unlist(strsplit(d3$direction,"_")) 
dir <- dir[seq(from = 1, to = length(dir), by = 2)] 

# Factor and sort 
d3$direction <- factor(dir) 
d3[order(d3$id),]
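
The two gather calls and the cbind can likely be collapsed into one step with pivot_longer. The following is a sketch rather than part of the answer above, and it assumes tidyr 1.0.0 or later, where pivot_longer and the special ".value" name are available. The object name `long` is made up for illustration. Note that `direction` is taken from the column prefix, so rows built from to_lon/to_lat are labelled "to"; the desired output in the question labels them the other way around, so relabel if that was intended.

```r
library(tidyr)

test <- data.frame(dis = c(10, 20, 30, 40), dur = c(30, 40, 60, 90),
                   method = c("car", "car", "Bicycle", "Bicycle"),
                   to_lon = c(-1.980, -1.5678, -1.324, -1.456),
                   to_lat = c(55.3009, 55.3416, 55.1123, 55.2234),
                   from_lon = c(-1.4565, -1.3424, -1.4566, -1.1111),
                   from_lat = c(76.8888, 65.8999, 76.9088, 25.3344))

# Split each column name at "_": the first piece becomes 'direction',
# the second piece (".value") names the output columns 'lon' and 'lat'
long <- pivot_longer(test,
                     cols = c(to_lon, to_lat, from_lon, from_lat),
                     names_to = c("direction", ".value"),
                     names_sep = "_")

# Rename to the column names used in the question
names(long)[names(long) == "lon"] <- "longitude"
names(long)[names(long) == "lat"] <- "latitude"
long
```

Each input row yields two consecutive output rows, so no id column or final sort is needed here.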