which() function in R: problem matching duplicate values after sorting in descending order

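Hi all: I posted earlier and got about 80% of the way there with help, but I wanted to open a similar question about the which() function in R to others: which() function in R: problem matching duplicate values after sorting in descending order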

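I am trying to find the next closest store for each zip code from a matrix of store IDs, zip codes, and long/lat coordinates. The problem occurs when there is more than one store per zip code: the script doesn't know how to order two identical values (store x is 10 miles away, store y is 10 miles away, it hits both x and y, and it returns c(x, y) instead of x, y or y, x). I need a way for my code to decide how to list them (an arbitrary order is fine, since they are the same distance from the store based on zip code).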

I was thinking there might be a way to modify the which() call, but I haven't had any luck.

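Note that all the other stores run fine. Only the 100 or so stores that share a zip code with another store get tripped up, and I don't want to go through and edit the CSV by hand.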

library(data.table) 
library(zipcode) 
library(geosphere) 
source<-read.csv("C:\\Users\\mcan\\Desktop\\Projects\\Closest Store\\Site and Zip.csv",header=TRUE, sep=",") #open the source CSV (every backslash in a Windows path must be doubled)
zip<-source[,2] #break apart the source zip codes 
ID<-source[,1] #break apart the IDs 
zip<-clean.zipcodes(zip) #clean up the zipcodes 
CleanedData<-data.frame(ID,zip) #combine the IDs and cleaned Zip codes 
CleanedData<-merge(x=CleanedData,y=zipcode,by="zip",all.x=TRUE) #dataset of store IDs, zipcodes, and their long/lat positions 
setDT(CleanedData) #set data frame to data table 
storeDistances <- distm(CleanedData[,.(longitude,latitude)],CleanedData[,.(longitude,latitude)]) #matrix between long/lat points of all stores in list 
colnames(storeDistances) <- rownames(storeDistances) <- CleanedData[,ID] 
whatsClosest <- function(number=1){ 
    apply(storeDistances,1,function(x) (colnames(storeDistances)[which(x==sort(x)[number+1])])) #sort(x) is ascending; [number+1] skips the zero self-distance; which() returns every store tied at that distance
} 
CleanedData[,firstClosestSite:=whatsClosest(1)] #looks for 1st closest store 
CleanedData[,secondClosestSite:=whatsClosest(2)] #looks for 2nd closest store 
CleanedData[,thirdClosestSite:=whatsClosest(3)] #looks for 3rd closest store

Dataset format:

Classes ‘data.table’ and 'data.frame': 1206 obs. of 9 variables: 
    $ zip    : Factor w/ 1182 levels "","02345",..: 1 2 3 4 5 6 7 8 9 10 ... 
    $ ID    : int 11111 12222 13333 10528 ... 
    $ city    : chr "Boston" "Somerville" "Cambridge" "Weston" ... 
    $ state   : chr "MA" "MA" "MA" "MA" ... 
    $ latitude   : num 40.0 41.0 42.0 43.0 ... 
    $ longitude  : num -70.0 -70.1 -70.2 -70.3 -70.4 ... 
    $ firstClosestSite :List of 1206 
     ..$ : chr "12345" 
    $ secondClosestSite :List of 1206 
     ..$ : chr "12344" 
    $ thirdClosestSite :List of 1206 
     ..$ : chr "12343"

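The problem shows up in firstClosestSite and secondClosestSite. They are ordered by distance, but when the distances are identical because two stores exist in the same zip code, the which() function (I think) doesn't know how to break the tie, so you get this awkward concatenation in the CSV: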

StoreID  Zip    City    State  Longitude  Latitude  FirstClosestSite    SecondClosestSite   ThirdClosestSite
11222    11000  Boston  MA     40.0       -70.0     c("11111""12222")   c("11111" "12222")  13333

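Example of how the distance matrix is laid out (store IDs in the first row and first column, with the matrix values being the distances between the stores):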

      11111 22222 33333 44444 55555 66666
11111     0  6000 32000 36000 28000 28000
22222  6000     0 37500 40500 32000 32000
33333 32000 37500     0 11000  6900  6900
44444 36000 40500 11000     0  8900  8900
55555 28000 32000  6900  8900     0     0
66666 28000 32000  6900  8900     0     0

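The problem is the duplicates within each row: which() has no way to decide which of the tied stores (55555 or 66666) should be ranked closer to 11111.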
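A minimal reproduction of the tie, assuming the six-store example matrix above is the whole data set. It rebuilds the matrix by hand, so it does not need geosphere or the real CSV, and the variable names `x` and `tied` are only illustrative.

```r
# Toy distance matrix copied from the example in the question.
ids <- c("11111", "22222", "33333", "44444", "55555", "66666")
storeDistances <- matrix(c(
      0,  6000, 32000, 36000, 28000, 28000,
   6000,     0, 37500, 40500, 32000, 32000,
  32000, 37500,     0, 11000,  6900,  6900,
  36000, 40500, 11000,     0,  8900,  8900,
  28000, 32000,  6900,  8900,     0,     0,
  28000, 32000,  6900,  8900,     0,     0
), nrow = 6, byrow = TRUE, dimnames = list(ids, ids))

# Distances from store 11111 to every store (including itself).
x <- storeDistances["11111", ]

# sort(x)[1] is the zero self-distance, so sort(x)[3] is the 2nd closest distance.
# Two stores share that distance, so which() matches both of them.
tied <- names(x)[which(x == sort(x)[3])]
print(tied)
```

Because the comparison is done on the distance value rather than on a position, every store at that distance is returned, and the same pair comes back again when asking for the third closest store.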

2017-07-31 mcando

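Can you provide a sample of your dataset and the final result you are looking for?

@OriolMirosa Just edited! For example, it needs to decide between site 11111 and site 12222. Going through the CSV and fixing those rows by hand is too much trouble. Any ideas? :P – mcando

@OriolMirosa I'm looking for an alternative to the which() function – mcando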

Here is my solution. Everything up to the line colnames(storeDistances) <- ... stays the same. After that line, replace the rest with the following code:

# Build one data frame per store: its distance to every store, labelled by store ID 
whatsClosestList <- sapply(as.data.frame(storeDistances), function(x) list(data.frame(distance = x, store = rownames(storeDistances), stringsAsFactors = FALSE))) 

# Get the names of the stores 
# this step is necessary because lapply doesn't allow us 
# to access the list names 
storeNames = names(whatsClosestList) 

# Iterate through each store's data frame using storeNames 
# and delete the distance to itself 
whatsClosestListRemoveSelf <- lapply(storeNames, function(name) { 
    df <- whatsClosestList[[name]] 
    df[df$store != name, ] # drop the row holding the store's distance to itself 
}) 

# The previous step got rid of the store names in the list, 
# so we add them again here 
names(whatsClosestListRemoveSelf) <- storeNames 

whatsClosestOrderedList <- lapply(whatsClosestListRemoveSelf, function(df) { df[order(df$distance),] }) 

whatsClosestTopThree <- lapply(whatsClosestOrderedList, function(df) { df$store[1:3] }) 

firstClosestSite <- lapply(whatsClosestTopThree, function(x) { x[1]}) 
secondClosestSite <- lapply(whatsClosestTopThree, function(x) { x[2]}) 
thirdClosestSite <- lapply(whatsClosestTopThree, function(x) { x[3]}) 

CleanedData[,firstClosestSite:=firstClosestSite] #looks for 1st closest store in list 
CleanedData[,secondClosestSite:=secondClosestSite] #looks for 2nd closest store in list 
CleanedData[,thirdClosestSite:=thirdClosestSite] #looks for 3rd closest store in list

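Basically, instead of searching only for the (first, second, third) closest site, I create a list of data frames holding the distances from each store to all the other stores. I then order those data frames and extract the three closest stores, which sometimes include ties (order() is a stable sort, so tied stores stay in the order in which they appear in the distance matrix). After that you just need, for each store, the list of first closest sites, second closest sites, and so on, which is what gets assigned to the columns of CleanedData. Hope it works!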
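The same idea can be sketched more compactly by ranking positions with order() instead of matching distance values. This is only an illustration on the six-store example matrix from the question. The helper name `closestStores` and the column names `closest1`, `closest2`, ... are hypothetical, and it assumes the matrix has identical row and column ordering with at least n + 1 stores.

```r
# Toy distance matrix copied from the example in the question.
ids <- c("11111", "22222", "33333", "44444", "55555", "66666")
storeDistances <- matrix(c(
      0,  6000, 32000, 36000, 28000, 28000,
   6000,     0, 37500, 40500, 32000, 32000,
  32000, 37500,     0, 11000,  6900,  6900,
  36000, 40500, 11000,     0,  8900,  8900,
  28000, 32000,  6900,  8900,     0,     0,
  28000, 32000,  6900,  8900,     0,     0
), nrow = 6, byrow = TRUE, dimnames = list(ids, ids))

# Hypothetical helper: returns a character matrix with one row per store
# and one column per rank (closest1, closest2, ...).
closestStores <- function(distances, n = 3) {
  storeIds <- colnames(distances)
  rows <- lapply(seq_len(nrow(distances)), function(i) {
    d <- distances[i, ]
    d[i] <- Inf                         # exclude the store itself by position
    storeIds[order(d)[seq_len(n)]]      # order() leaves ties in column order
  })
  result <- do.call(rbind, rows)
  dimnames(result) <- list(rownames(distances), paste0("closest", seq_len(n)))
  result
}

top <- closestStores(storeDistances, n = 3)
print(top)
```

Each cell holds exactly one store ID, so tied stores occupy consecutive ranks instead of being concatenated. Excluding the store by position rather than by a zero distance also keeps a second store at distance 0 (same zip code) in the results.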

2017-08-02 02:34:18

Yes, I forgot the equals sign when I copied this. I'm glad it works now!
