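I am trying to convert the Census Bureau's "adjacency list" of FIPS codes (unique county-level identifiers) into a real adjacency list or edge list, and ultimately into an adjacency matrix. The Census FIPS data is here: http://www2.census.gov/geo/docs/reference/county_adjacency.txt
Question: how do I convert a messy R list into multiple logical adjacency lists or edge lists, and then ultimately into a matrix?
The problem is that the file is not an "adjacency list" in any conventional sense of the phrase. I am very new to R, so please forgive any mistakes or lack of best practices...
My intuition is to loop through the list, split the data into unique adjacency lists, convert each list into a matrix, and then bind the matrices into one big binary matrix. I searched online for how to do this, but all the examples use very simple, clean data. :(
The Census Bureau presents the FIPS codes like this: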
"Bullock County, AL" 01011 "Barbour County, AL" 01005
"Bullock County, AL" 01011
"Macon County, AL" 01087
"Montgomery County, AL" 01101
"Pike County, AL" 01109
"Russell County, AL" 01113
"Butler County, AL" 01013 "Butler County, AL" 01013
"Conecuh County, AL" 01035
"Covington County, AL" 01039
"Crenshaw County, AL" 01041
"Lowndes County, AL" 01085
"Monroe County, AL" 01099
"Wilcox County, AL" 01131
When I read the linked text file into R, the data looks like this:
[1] "\"Autauga County, AL\"\t01001\t\"Autauga County, AL\"\t01001" "\t\t\"Chilton County, AL\"\t01021" "\t\t\"Dallas County, AL\"\t01047"
[4] "\t\t\"Elmore County, AL\"\t01051" "\t\t\"Lowndes County, AL\"\t01085" "\t\t\"Montgomery County, AL\"\t01101"
[7] "\"Baldwin County, AL\"\t01003\t\"Baldwin County, AL\"\t01003" "\t\t\"Clarke County, AL\"\t01025" "\t\t\"Escambia County, AL\"\t01053"
[10] "\t\t\"Mobile County, AL\"\t01097"
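For reference, one possible way to work directly from lines in this shape is sketched below. It assumes every line has four tab-separated fields (county name, FIPS, neighbor name, neighbor FIPS), with the first two left empty on continuation lines, and that the first line starts a new county. The names `raw`, `fields`, `src`, `nbr` and `edges` are made up for illustration; `raw` holds a few lines copied from the output above rather than the full file.

```r
# A few lines copied from the readLines() output above (assumed representative)
raw <- c(
  "\"Autauga County, AL\"\t01001\t\"Autauga County, AL\"\t01001",
  "\t\t\"Chilton County, AL\"\t01021",
  "\t\t\"Dallas County, AL\"\t01047",
  "\"Baldwin County, AL\"\t01003\t\"Baldwin County, AL\"\t01003",
  "\t\t\"Clarke County, AL\"\t01025"
)

# Split each line on tabs; leading empty fields are kept by strsplit()
fields <- strsplit(raw, "\t", fixed = TRUE)

src <- vapply(fields, `[`, character(1), 2)  # source FIPS, "" on continuation lines
nbr <- vapply(fields, `[`, character(1), 4)  # neighbor FIPS, present on every line

# Forward-fill: each continuation line inherits the most recent non-empty source
group <- cumsum(src != "")
edges <- data.frame(
  from = src[src != ""][group],
  to   = nbr,
  stringsAsFactors = FALSE
)
edges
```

Keeping the codes as character strings preserves the leading zeros (for example "01001").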
After applying regular expressions with the stringr package, the data now looks like this:
> str(cleaner)
List of 100
$ : chr [1:2] "01001" "01001"
$ : chr "01021"
$ : chr "01047"
$ : chr "01051"
$ : chr "01085"
$ : chr "01101"
$ : chr [1:2] "01003" "01003"
$ : chr "01025"
$ : chr "01053"
$ : chr "01097"
$ : chr "01099"
$ : chr "01129"
$ : chr "12033"
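A list in this shape could in principle be grouped without a loop. The sketch below assumes that every element of length 2 marks the start of a new county (source code first, first neighbor second) and that every element of length 1 is a further neighbor of the most recent start. The sample `cleaner` here is a small hand-built stand-in using codes from the Census excerpt above, and `is_start`, `edges` and `adj_list` are illustrative names.

```r
# Small stand-in for the real `cleaner` list (codes taken from the excerpts above)
cleaner <- list(
  c("01001", "01001"), "01021", "01047",
  c("01011", "01005"), "01011", "01087"
)

# Elements with two codes start a new adjacency group
is_start <- lengths(cleaner) == 2
group <- cumsum(is_start)

# Source code: first entry of each start element, repeated over its group
from <- vapply(cleaner[is_start], `[`, character(1), 1)[group]
# Neighbor code: last entry of every element
to <- vapply(cleaner, function(x) x[length(x)], character(1))

edges <- data.frame(from, to, stringsAsFactors = FALSE)

# Named adjacency list: one character vector of neighbors per source county
adj_list <- split(edges$to, edges$from)
adj_list
```

Taking the first code as the source and the last as the neighbor also covers start elements whose two codes differ, such as "01011" followed by "01005".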
I can group the elements that follow the "first" item of each adjacency list, like so:
#function that groups FIPS codes, displays them by index value
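reduce_fips <- function(locations, vect) {
  out <- list()
  # seq_along() avoids the 1:0 trap when locations is empty
  for (i in seq_along(locations)) {
    if (i == length(locations)) {
      # last group runs to the end of vect
      out[[i]] <- locations[i]:length(vect)
    } else {
      # every other group stops just before the next start index
      out[[i]] <- locations[i]:(locations[i + 1] - 1)
    }
  }
  out
}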
out <- reduce_fips(adj_list_start, fips_codes) #produces adj list values
#problem: some adj list start points contain 2 different values of fips codes
fips_adj_df <- data.frame(cleaner = sapply(out, function(x) x[1]))
fips_adj_df
fips_adj_df$adjacent <- out
#problem: how to transform this into a matrix or connected nodes
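This produces output like the following. However, it is not logically correct (it holds index positions rather than FIPS codes), and searching through it would be costly in terms of memory.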
cleaner adjacent
1 1 1, 2, 3, 4, 5, 6
2 7 7, 8, 9, 10, 11, 12, 13
3 14 14, 15, 16, 17, 18, 19, 20, 21, 22
4 23 23, 24, 25, 26, 27, 28, 29
5 30 30, 31, 32, 33, 34, 35, 36
6 37 37, 38, 39, 40, 41, 42
7 43 43, 44, 45, 46, 47, 48, 49
8 50 50, 51, 52, 53, 54, 55
9 56 56, 57, 58, 59, 60, 61
10 62 62, 63, 64, 65, 66, 67, 68, 69
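Ultimately, I want a binary matrix like the one below that shows whether FIPS codes are geographically adjacent to each other. For example, suppose 100, 101 and 102 are all adjacent to one another, while 103 is adjacent only to 102; I would like the matrix to show that information.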
     FIPS
FIPS  100 101 102 103
  102   1   1   1   1
  101   1   1   1   0
  100   1   1   1   0
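To make the target concrete, here is a rough sketch of how such a matrix might be built once an edge list exists. It uses the toy codes from the example above, assumes adjacency is symmetric, and treats every county as adjacent to itself (as the Census file does). The names `edges`, `codes` and `adj` are illustrative only.

```r
# Toy edge list for the example: 100, 101, 102 mutually adjacent; 103 touches only 102
edges <- data.frame(
  from = c("100", "100", "101", "102"),
  to   = c("101", "102", "102", "103"),
  stringsAsFactors = FALSE
)

# All codes that appear anywhere, used for both rows and columns
codes <- sort(unique(c(edges$from, edges$to)))
adj <- matrix(
  0L,
  nrow = length(codes), ncol = length(codes),
  dimnames = list(FIPS = codes, FIPS = codes)
)

# A two-column character matrix indexes cells by row and column name
adj[cbind(edges$from, edges$to)] <- 1L
adj[cbind(edges$to, edges$from)] <- 1L  # mirror, since adjacency is symmetric
diag(adj) <- 1L                         # each county counts as adjacent to itself
adj
```

Unlike the hand-written table above, this version also includes a row for 103.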