How to vectorize this in R

I have two data tables about traffic flows. I am trying (eventually) to combine them into a linear traffic plot by milepost. For example:

mileposts <- structure(list(city = c("city1", "city2", "city3", "city4"), 
milepost = c(0L, 50L, 120L, 250L)), .Names = c("city", "milepost" 
), class = "data.frame", row.names = c("1", "2", "3", "4")) 

    city milepost 
1 city1  0 
2 city2  50 
3 city3  120 
4 city4  250 


traffic <- structure(list(citypair = c("city1-city2", "city2-city4", "city1-city3", 
"city1-city4", "city3-city4"), traffic = c(610L, 23L, 139L, 88L, 
17L), origmp = c(0L, 50L, 0L, 0L, 120L), destmp = c(50L, 250L, 
120L, 250L, 250L)), .Names = c("citypair", "traffic", "origmp", 
"destmp"), class = "data.frame", row.names = c("1", "2", "3", 
"4", "5")) 

    citypair  traffic origmp destmp 
1 city1-city2  610  0  50 
2 city2-city4  23  50  250 
3 city1-city3  139  0  120 
4 city1-city4  88  0  250 
5 city3-city4  17  120  250

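What I want is to add a column "vol" to the "mileposts" table that gives, for each city, the total of all traffic that starts at or passes through that city (the cities are visited in the order 1-2-3-4). For example, the volume for city3 would be the sum of the values from traffic[c(2,4,5),2].

How should I do this? I know it must be some kind of loop. I tried a loop that adds the values of traffic$traffic to mileposts$vol under the conditions traffic$origmp[i] >= mileposts$milepost and traffic$destmp[i] <= mileposts$milepost, but I get the error "the condition has length > 1 and only the first element will be used". If I instead wrap the whole thing in a second loop over the [j] dimension of mileposts$milepost, the run becomes very slow. Any suggestions on how to speed this up or code it efficiently? More generally, what I am asking is how to perform conditional operations using data from two data frames in an efficient way (i.e. without looping over every row of both data frames). Thanks!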

Source

2014-02-27 user3358547

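How exactly are we supposed to know which traffic "starts at or passes through that city"? –

@IShouldBuyABoat - I made a big assumption that the order is city1-city2-city3-city4: ideally this should be specified so that it becomes a general solution. – thelatemail

@thelatemail Yes, they go in order from city 1 to city 4. Sorry for not being clearer. – user3358547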

With your two tables, mileposts and traffic, already in memory, I can get the result you want with the following code:

library(data.table) 

# building index of which route traffic is to be associated with which city 
uniquecities <- unique(mileposts$milepost) 
uniqueCityCombns <- data.table(expand.grid(uniquecities,uniquecities,uniquecities)) 
setnames(uniqueCityCombns, c('origmp','destmp','milepost')) 
uniqueCityCombns <- uniqueCityCombns[origmp < destmp & milepost < destmp] 
uniqueCityCombns <- uniqueCityCombns[origmp <= milepost] 

# calculating traffic passing through the city 
uniqueCityCombnsTrf <- merge(uniqueCityCombns,traffic, by = c('origmp','destmp')) 
uniqueCityCombnsTrf <- uniqueCityCombnsTrf [,list(traffic = sum(traffic)), by = 'milepost'] 
uniqueCityCombnsTrf <- merge(uniqueCityCombnsTrf , mileposts, by = 'milepost')

Output:

> uniqueCityCombnsTrf 
    milepost traffic city 
1:  0  837 city1 
2:  50  250 city2 
3:  120  128 city3
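
Note that the output above has no row for city4: no route starts at or passes through it, and the final merge keeps only mileposts that have a match. A minimal sketch of one way to keep such a city with a volume of zero, written in base R rather than data.table. The `totals` table here is a hypothetical stand-in that simply copies the per-milepost sums printed above.

```r
# The mileposts table from the question.
mileposts <- data.frame(
  city = c("city1", "city2", "city3", "city4"),
  milepost = c(0L, 50L, 120L, 250L),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the per-milepost sums shown in the output above.
totals <- data.frame(
  milepost = c(0L, 50L, 120L),
  traffic = c(837L, 250L, 128L)
)

# all.x = TRUE keeps every milepost, filling unmatched rows with NA.
result <- merge(mileposts, totals, by = "milepost", all.x = TRUE)

# Treat "no matching route" as zero traffic.
result$traffic[is.na(result$traffic)] <- 0L
result
```

Whether a zero row is wanted depends on how the result is plotted afterwards, so this is a design choice rather than a correction.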

Source

2014-02-27 03:09:43 TheComeOnMan

This is a bit convoluted, but it works:

cityorder <- c("city1","city2","city3","city4") 
through <- lapply(strsplit(traffic$citypair,"-"),match,cityorder) 
through <- lapply(through,function(x) seq(x[1],x[2]-1)) 

citymatch <- sapply(mileposts$city, grep, cityorder) 
sum.ids <- lapply(citymatch, function(x) sapply(through, function(y) x %in% y)) 
mileposts$traffic <- sapply(sum.ids, function(x) sum(traffic$traffic[x])) 

# city milepost traffic 
#1 city1  0  837 
#2 city2  50  250 
#3 city3  120  128 
#4 city4  250  0

The result matches the expected value from the question, "the volume for city3 would be the sum of the values from traffic[c(2,4,5),2]":

sum(traffic[c(2, 4, 5),2]) 
#[1] 128
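
The hard-coded `cityorder` is the assumption discussed in the comments. A possible way around it is to compare the mileposts directly, since `traffic` already carries `origmp` and `destmp`. The sketch below assumes every route runs in the direction of increasing milepost (`origmp < destmp`), which holds for the sample data but is not stated as a general rule in the question. The column name `vol` follows the question's wording.

```r
# Rebuild the question's two tables so the sketch runs on its own.
mileposts <- data.frame(
  city = c("city1", "city2", "city3", "city4"),
  milepost = c(0L, 50L, 120L, 250L),
  stringsAsFactors = FALSE
)
traffic <- data.frame(
  citypair = c("city1-city2", "city2-city4", "city1-city3",
               "city1-city4", "city3-city4"),
  traffic = c(610L, 23L, 139L, 88L, 17L),
  origmp = c(0L, 50L, 0L, 0L, 120L),
  destmp = c(50L, 250L, 120L, 250L, 250L),
  stringsAsFactors = FALSE
)

# A route counts for a city when it starts at or before that city's milepost
# and ends strictly after it (assumes origmp < destmp for every route).
mileposts$vol <- sapply(mileposts$milepost, function(mp) {
  sum(traffic$traffic[traffic$origmp <= mp & traffic$destmp > mp])
})
mileposts
```

Each comparison inside the function is vectorized over all rows of `traffic`, so only the (usually short) list of mileposts is iterated.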

Source

2014-02-27 03:09:57 thelatemail

traffic$start <- as.numeric(gsub("city|-city.+$", "", traffic$citypair)) 
traffic$end <- as.numeric(gsub("city[[:digit:]]*|-city", "", traffic$citypair)) 
sapply(mileposts$city, function(cit) {n=as.numeric(sub("city","",cit)) 
        sum(traffic$traffic*((n >= traffic$start) & n < traffic$end))}) 
#--------- 
city1 city2 city3 city4 
    837 250 128  0
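
The code above still calls a function once per city. If that matters, the same test can be expressed with no explicit iteration at all by building a cities-by-routes logical matrix with `outer()`. This is only a sketch: it compares mileposts instead of parsed city numbers, it assumes `origmp < destmp` for every route, and the names `passes` and `vol` are illustrative.

```r
# Rebuild the question's two tables so the sketch runs on its own.
mileposts <- data.frame(
  city = c("city1", "city2", "city3", "city4"),
  milepost = c(0L, 50L, 120L, 250L),
  stringsAsFactors = FALSE
)
traffic <- data.frame(
  traffic = c(610L, 23L, 139L, 88L, 17L),
  origmp = c(0L, 50L, 0L, 0L, 120L),
  destmp = c(50L, 250L, 120L, 250L, 250L)
)

# passes[i, j] is TRUE when route j starts at or before city i
# and ends strictly after it.
passes <- outer(mileposts$milepost, traffic$origmp, ">=") &
  outer(mileposts$milepost, traffic$destmp, "<")

# Adding 0 turns the logical matrix into a numeric one; the matrix product
# then sums the traffic of the matching routes for each city.
vol <- as.vector((passes + 0) %*% traffic$traffic)
names(vol) <- mileposts$city
vol
```

The matrix has one cell per city/route pair, so memory grows with the product of the two table sizes. For very large tables the per-city version may be the more practical trade-off.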

Source

2014-02-27 03:14:33
