2016-07-14 97 views

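How do I convert a (messy) list in R into multiple adjacency lists or edge lists?

I am trying to convert the Census Bureau's "adjacency list" of FIPS codes (unique county-level identifiers) into an actual adjacency list or edge list, and eventually into an adjacency matrix. Here is the Census FIPS code data: http://www2.census.gov/geo/docs/reference/county_adjacency.txt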

Question: how do I convert an unwieldy list into multiple logical adjacency lists, and eventually into a matrix?

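The problem is that the file is not an "adjacency list" in any conventional sense of the term. I am very new to R, so please forgive any mistakes or lack of best practices...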

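My intuition is to loop through the list, split the data into separate adjacency lists, convert each list into a matrix, and then bind the matrices into one large binary matrix. I searched online for how to do this, but all the examples use very simple, clean data. :(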

The Census Bureau presents the FIPS codes like this:

"Bullock County, AL" 01011 "Barbour County, AL" 01005 
     "Bullock County, AL" 01011 
     "Macon County, AL" 01087 
     "Montgomery County, AL" 01101 
     "Pike County, AL" 01109 
     "Russell County, AL" 01113 
"Butler County, AL" 01013 "Butler County, AL" 01013 
     "Conecuh County, AL" 01035 
     "Covington County, AL" 01039 
     "Crenshaw County, AL" 01041 
     "Lowndes County, AL" 01085 
     "Monroe County, AL" 01099 
     "Wilcox County, AL" 01131 

When I read the linked text file into R, the data looks like this:

[1] "\"Autauga County, AL\"\t01001\t\"Autauga County, AL\"\t01001" "\t\t\"Chilton County, AL\"\t01021"       "\t\t\"Dallas County, AL\"\t01047"        
[4] "\t\t\"Elmore County, AL\"\t01051"        "\t\t\"Lowndes County, AL\"\t01085"       "\t\t\"Montgomery County, AL\"\t01101"       
[7] "\"Baldwin County, AL\"\t01003\t\"Baldwin County, AL\"\t01003" "\t\t\"Clarke County, AL\"\t01025"        "\t\t\"Escambia County, AL\"\t01053"       
[10] "\t\t\"Mobile County, AL\"\t01097" 

I used regular expressions from the stringr package, and the data now looks like this:

> str(cleaner) 
List of 100 
$ : chr [1:2] "01001" "01001" 
$ : chr "01021" 
$ : chr "01047" 
$ : chr "01051" 
$ : chr "01085" 
$ : chr "01101" 
$ : chr [1:2] "01003" "01003" 
$ : chr "01025" 
$ : chr "01053" 
$ : chr "01097" 
$ : chr "01099" 
$ : chr "01129" 
$ : chr "12033" 
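For reference, an extraction along these lines can be sketched in base R. This is only an illustration, not necessarily the stringr call used above; the names `raw_lines` and `codes` are hypothetical, and the sample lines are copied from the output shown earlier.

```r
# Three raw lines in the layout shown above (tab-separated, names quoted).
raw_lines <- c(
  "\"Autauga County, AL\"\t01001\t\"Autauga County, AL\"\t01001",
  "\t\t\"Chilton County, AL\"\t01021",
  "\t\t\"Dallas County, AL\"\t01047"
)

# Pull each five-digit sequence out of every line.
# The result is a list with one character vector per input line.
codes <- regmatches(raw_lines, gregexpr("[0-9]{5}", raw_lines))
str(codes)
```

Lines that start a new county block yield two codes (the county and itself as its first neighbour), while continuation lines yield one.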

I can group the elements that follow the "first" item of each adjacency list, like so:

# Groups FIPS codes into blocks and returns each block's index values
reduce_fips <- function(locations, vect) {
  out <- list()
  for (i in seq_along(locations)) {
    if (i == length(locations)) {
      # The last block runs to the end of the vector
      out[[i]] <- locations[i]:length(vect)
    } else {
      # Every other block ends just before the next start point
      out[[i]] <- locations[i]:(locations[i + 1] - 1)
    }
  }
  out
}

out <- reduce_fips(adj_list_start, fips_codes) #produces adj list values 
#problem: some adj list start points contain 2 different values of fips codes 

fips_adj_df <- data.frame(cleaner = sapply(out, function(x) x[1])) 
fips_adj_df 
fips_adj_df$adjacent <- out 
#problem: how to transform this into a matrix or connected nodes 

This produces the output shown below. However, it is not logically correct, and it would be expensive memory-wise to search through.

cleaner       adjacent 
1  1     1, 2, 3, 4, 5, 6 
2  7   7, 8, 9, 10, 11, 12, 13 
3  14 14, 15, 16, 17, 18, 19, 20, 21, 22 
4  23   23, 24, 25, 26, 27, 28, 29 
5  30   30, 31, 32, 33, 34, 35, 36 
6  37    37, 38, 39, 40, 41, 42 
7  43   43, 44, 45, 46, 47, 48, 49 
8  50    50, 51, 52, 53, 54, 55 
9  56    56, 57, 58, 59, 60, 61 
10  62  62, 63, 64, 65, 66, 67, 68, 69 

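Ultimately, I want a binary matrix like the one below that shows whether FIPS codes are geographically adjacent to each other. For example, suppose 100, 101, and 102 are adjacent to one another, while 103 is adjacent only to 102; I would like the matrix to show that information.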

   FIPS 
FIPS  100 101 102 103 
    102  1 1 1 1 
    101  1 1 1 0 
    100  1 1 1 0 

Answer


There is a lot going on in this question, so I'll try to break it down.

First, you can use read.csv to get the information from the text file.

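df <- read.csv("county_adjacency.txt", sep = "\t", header = FALSE,
               stringsAsFactors = FALSE,
               colClasses = "character",  # keep leading zeros in FIPS codes
               na.strings = "")           # blank fields become NA

# Drop the county names; only the FIPS codes are needed
df <- df[, c("V2", "V4")]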
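To see what this read step produces without downloading anything, here is a minimal sketch on an inline sample. The sample text is an assumption based on the excerpt in the question, and `sample_txt` is a hypothetical name. Reading the columns as character is one way to keep the leading zeros in codes such as 01001.

```r
# Toy sample in the same tab-separated layout as county_adjacency.txt
# (an assumption based on the excerpt in the question).
sample_txt <- paste(
  "\"Autauga County, AL\"\t01001\t\"Autauga County, AL\"\t01001",
  "\t\t\"Chilton County, AL\"\t01021",
  "\"Baldwin County, AL\"\t01003\t\"Baldwin County, AL\"\t01003",
  "\t\t\"Clarke County, AL\"\t01025",
  sep = "\n"
)

df <- read.csv(text = sample_txt, sep = "\t", header = FALSE,
               stringsAsFactors = FALSE,
               colClasses = "character",  # keep "01001" instead of 1001
               na.strings = "")           # blank fields become NA

# Keep only the two FIPS columns.
df <- df[, c("V2", "V4")]
print(df)
```

Continuation rows come back with NA in V2, which is the gap the next step fills.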

Use na.locf from the zoo library to fill in the NA values.

library(zoo)
df$V2 <- na.locf(df$V2)
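If zoo is not available, the same fill can be sketched in base R. The helper `fill_forward` below is a hypothetical stand-in, not zoo's implementation, and it assumes the first value is never NA. Once V2 is filled, every row is one edge, so `split()` can regroup the rows into the adjacency list the question asks about.

```r
# Toy edge data: NA marks rows that continue the previous county's block.
df <- data.frame(
  V2 = c("01001", NA, NA, "01003", NA),
  V4 = c("01001", "01021", "01047", "01003", "01025"),
  stringsAsFactors = FALSE
)

# Base-R last-observation-carried-forward (not zoo's implementation).
# Assumes the first value is not NA, which holds for this file layout.
fill_forward <- function(x) {
  filled <- !is.na(x)
  x[filled][cumsum(filled)]
}

df$V2 <- fill_forward(df$V2)

# After filling, each row is one edge (county, neighbour).
# split() regroups the edges into a named adjacency list.
adj_list <- split(df$V4, df$V2)
print(adj_list)
```

The filled data frame doubles as an edge list, so no separate conversion step is needed for that format.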

Make a list of your FIPS codes and use it to build your matrix.

fips <- unique(df$V2)

fips.matrix <- matrix(data = 0, nrow = length(fips), ncol = length(fips), dimnames = list(fips, fips))

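Fill the matrix with 1s at the coordinates given by the txt file.

# Convert each column to character so the pairs index the matrix by dimnames
df[] <- lapply(df, as.character)

fips.matrix[as.matrix(df)] <- 1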
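As a check on the indexing step, here is a small sketch using the example from the question (100, 101 and 102 mutually adjacent, 103 adjacent only to 102). The `edges` data frame is hypothetical toy data, and taking the codes from both columns is an extra precaution I am assuming is harmless, so that a neighbour missing from V2 cannot cause an out-of-bounds subscript.

```r
# Hypothetical edge list for the question's example: 100, 101 and 102 are
# mutually adjacent, and 103 touches only 102. Self-pairs are included,
# as in the Census file.
edges <- data.frame(
  V2 = c("100", "100", "100", "101", "101", "101",
         "102", "102", "102", "102", "103", "103"),
  V4 = c("100", "101", "102", "100", "101", "102",
         "100", "101", "102", "103", "102", "103"),
  stringsAsFactors = FALSE
)

# Use codes from both columns so every neighbour has a row and a column.
fips <- sort(unique(c(edges$V2, edges$V4)))
fips.matrix <- matrix(0, nrow = length(fips), ncol = length(fips),
                      dimnames = list(fips, fips))

# A two-column character matrix indexes by (row name, column name).
fips.matrix[as.matrix(edges)] <- 1
print(fips.matrix)

# Adjacency should be mutual, so the matrix should be symmetric.
stopifnot(isSymmetric(fips.matrix))
```

Because the index is a character matrix, each row is looked up by name rather than by position, which is why the columns need to be character and not integer.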