2016-04-06
library(RCurl) 
library(rjson) 
json <- getURL('https://extraction.import.io/query/runtime/17d882b5-c118-4f27-8ce1-90085ec0b116?_apikey=d5a8a01e20174e95887dc0f385e4e3f6d7ef5ca1428d5a029f2aa352509948ade8e5d7fb0dc941f4769a32b541ca6b38a7cd6578dfd81b357fbc4f2e008f5154f1dbfcff31878798fa887b70b1ff59dd&url=http%3A%2F%2Fwww.numbeo.com%2Fcost-of-living%2Fcompare_cities.jsp%3Fcountry1%3DSingapore%26country2%3DAustralia%26city1%3DSingapore%26city2%3DMelbourne') 
obj <- fromJSON(json) 

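I would like to convert this data into a tidy data frame, but many levels of the list are unnamed. Any ideas on how to organize the data?

Extracting data from a list in R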


Are you the owner of this dataset? The JSON has unnecessary arrays and keys. May I suggest some improvements? – pauljeba

Answer

1

Take a look at the difference below and let me know what you think. This is what your object looks like:

library(RCurl) 
library(rjson) 
json <- getURL('https://extraction.import.io/query/runtime/17d882b5-c118-4f27-8ce1-90085ec0b116?_apikey=d5a8a01e20174e95887dc0f385e4e3f6d7ef5ca1428d5a029f2aa352509948ade8e5d7fb0dc941f4769a32b541ca6b38a7cd6578dfd81b357fbc4f2e008f5154f1dbfcff31878798fa887b70b1ff59dd&url=http%3A%2F%2Fwww.numbeo.com%2Fcost-of-living%2Fcompare_cities.jsp%3Fcountry1%3DSingapore%26country2%3DAustralia%26city1%3DSingapore%26city2%3DMelbourne') 
obj <- rjson::fromJSON(json) 
str(obj) 

List of 2 
$ extractorData:List of 3 
    ..$ url  : chr "http://www.numbeo.com/cost-of-living/compare_cities.jsp?country1=Singapore&country2=Australia&city1=Singapore&city2=Melbourne" 
    ..$ resourceId: chr "b1250747011ee774e7c881617c86a5a9" 
    ..$ data  :List of 1 
    .. ..$ :List of 1 
    .. .. ..$ group:List of 52 
    .. .. .. ..$ :List of 6 
    .. .. .. .. ..$ COL VALUE  :List of 1 
    .. .. .. .. .. ..$ :List of 1 
    .. .. .. .. .. .. ..$ text: chr "Meal, Inexpensive Restaurant" 

There are indeed many intermediate lists that you don't need. Now try the fromJSON function from the jsonlite package:

library(jsonlite) 
obj2 <- jsonlite::fromJSON(json) 
str(obj2) 

List of 2 
$ extractorData:List of 3 
    ..$ url  : chr "http://www.numbeo.com/cost-of-living/compare_cities.jsp?country1=Singapore&country2=Australia&city1=Singapore&city2=Melbourne" 
    ..$ resourceId: chr "b1250747011ee774e7c881617c86a5a9" 
    ..$ data  :'data.frame': 1 obs. of 1 variable: 
    .. ..$ group:List of 1 
    .. .. ..$ :'data.frame': 52 obs. of 6 variables: 
    .. .. .. ..$ COL VALUE  :List of 52 
    .. .. .. .. ..$ :'data.frame': 1 obs. of 1 variable: 
    .. .. .. .. .. ..$ text: chr "Meal, Inexpensive Restaurant" 
    .. .. .. .. ..$ :'data.frame': 1 obs. of 1 variable: 
    .. .. .. .. .. ..$ text: chr "Meal for 2 People, Mid-range Restaurant, Three-course" 
    .. .. .. .. ..$ :'data.frame': 1 obs. of 1 variable: 

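Still, the JSON itself just isn't pretty, and we need to work around that. I think what you want is the data frame nested in there. So, starting from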

df <- obj2$extractorData$data$group[[1]] 

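you have your data frame. The problem is that every cell is wrapped in a list, including the NULL values. You can't simply unlist the columns: the NULLs would vanish, and the columns they were in would end up shorter than the others...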
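To see why a plain unlist() is not enough, here is a minimal sketch on a made-up list column. The values are invented for illustration and are not taken from the Numbeo data.

```r
# Toy list column: the second cell is NULL (hypothetical values)
cells <- list("Meal", NULL, "Coffee")

n_before  <- length(cells)          # number of cells in the column
n_dropped <- length(unlist(cells))  # unlist() silently drops the NULL cell

# Replacing NULL with NA first keeps one value per cell
cells[sapply(cells, is.null)] <- NA
flat <- unlist(cells)
```

After the replacement, `flat` has the same length as the original column, with NA where the NULL used to be.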

Edit: here is how to handle the columns that contain list(NULL) values.

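# Replace NULL cells with NA in columns 2 to 5 so that unlist() keeps every row
df[sapply(df[,2],is.null),2] <- NA 
df[sapply(df[,3],is.null),3] <- NA 
df[sapply(df[,4],is.null),4] <- NA 
df[sapply(df[,5],is.null),5] <- NA 
# Flatten each list column into a plain vector (no pipe, so no extra package is needed)
df2 <- as.data.frame(sapply(df, unlist), stringsAsFactors = FALSE) 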

It could certainly be written more elegantly, but this will get you going, and it is easy to follow.
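One possible way to generalize the step above is a small helper that walks over every list column instead of hard-coding column numbers. This is only a sketch: `flatten_list_columns` is a hypothetical name, the toy data frame is invented, and the helper assumes each cell holds at most one value (it keeps only the first one and converts it to character).

```r
# Hypothetical helper: flatten every list column of a data frame,
# turning NULL or empty cells into NA so that no rows are lost.
flatten_list_columns <- function(df) {
  out <- lapply(df, function(col) {
    if (!is.list(col)) return(col)  # leave ordinary (atomic) columns alone
    vapply(col, function(cell) {
      value <- unlist(cell, use.names = FALSE)
      # Assumption: one value per cell; anything beyond the first is ignored
      if (length(value) == 0) NA_character_ else as.character(value[1])
    }, character(1), USE.NAMES = FALSE)
  })
  data.frame(out, check.names = FALSE, stringsAsFactors = FALSE)
}

# Toy data shaped like the jsonlite result: cells are one-cell data frames or NULL
toy <- data.frame(id = 1:3)
toy$item  <- list(data.frame(text = "Meal"), data.frame(text = "Coffee"),
                  data.frame(text = "Milk"))
toy$price <- list(data.frame(text = "10.00"), NULL, data.frame(text = "1.50"))

flat <- flatten_list_columns(toy)
```

On the real object, the equivalent call would be `df2 <- flatten_list_columns(df)`. The flattened columns come back as character, so numeric columns would still need an explicit conversion afterwards.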