2016-01-06
2

What does "records are values not objects" mean in tidyjson?

The following string is valid JSON according to http://jsonlint.com/, but tidyjson rejects it:

library(dplyr) 
library(tidyjson) 

json <- ' 
    [{"country":"us","city":"Portland","topics":[{"urlkey":"videogame","name":"Video Games","id":4471},{"urlkey":"board-games","name":"Board Games","id":19585},{"urlkey":"computer-programming","name":"Computer programming","id":48471},{"urlkey":"opensource","name":"Open Source","id":563}],"joined":1416349237000,"link":"http://www.meetup.com/members/156440062","bio":"Analytics engineer. Primarily work in the Hadoop space.","lon":-122.65,"other_services":{},"name":"Aaron Wirick","visited":1443078098000,"self":{"common":{}},"id":156440062,"state":"OR","lat":45.56,"status":"active"}] 
'

json %>% as.tbl_json %>% gather_keys

I get:

Error in gather_keys(.) : 1 records are values not objects 
+0

Where does '%>%' come from? –

+0

@pascal - dplyr, of course – thelatemail

+0

@Pascal - it's in the title - the 'tidyjson' package – thelatemail

Answers

1

As one of the comments noted, gather_keys expects objects, and what you have at the top level is an array. What you should use here is gather_array.
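The difference between the two functions can be seen on two tiny inputs. This is only a sketch: the strings `arr` and `obj` are made up for illustration, and it uses the same tidyjson functions as the question.

```r
library(dplyr)
library(tidyjson)

# Hypothetical inputs, made up for illustration
arr <- '[{"a": 1}, {"a": 2}]'   # top level is an array
obj <- '{"a": 1, "b": 2}'       # top level is an object

# gather_array: one row per array element
arr_rows <- arr %>% as.tbl_json %>% gather_array

# gather_keys: one row per key of the object
obj_rows <- obj %>% as.tbl_json %>% gather_keys

# Calling gather_keys on the array should fail the same way as in the
# question; tryCatch captures the message instead of stopping the script
err <- tryCatch(
  arr %>% as.tbl_json %>% gather_keys,
  error = function(e) conditionMessage(e)
)
```

Printing `err` should show a message of the same kind as the one in the question, since the single record in `arr` is an array rather than an object.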

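In addition, the other answer takes a more brute-force approach, parsing by hand the JSON attribute that the tidyjson package creates. tidyjson provides functions for handling this in a somewhat cleaner pipeline, if you want one: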

library(dplyr) 
library(tidyjson) 

json <- ' 
[{"country":"us","city":"Portland" 
,"topics":[ 
{"urlkey":"videogame","name":"Video Games","id":4471} 
,{"urlkey":"board-games","name":"Board Games","id":19585} 
,{"urlkey":"computer-programming","name":"Computer programming","id":48471} 
,{"urlkey":"opensource","name":"Open Source","id":563} 
] 
,"joined":1416349237000 
,"link":"http://www.meetup.com/members/156440062" 
,"bio":"Analytics engineer. Primarily work in the Hadoop space." 
,"lon":-122.65,"other_services":{} 
,"name":"Aaron Wirick","visited":1443078098000 
,"self":{"common":{}} 
,"id":156440062,"state":"OR","lat":45.56,"status":"active" 
}] 
' 

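# One row per element of the top-level array, then pull scalar fields into columns
mydf <- json %>% as.tbl_json %>% gather_array %>%
  spread_values(
    country = jstring('country'),
    city = jstring('city'),
    joined = jnumber('joined'),
    bio = jstring('bio')
  ) %>%
  # Descend into the nested "topics" array: one row per topic
  enter_object('topics') %>%
  gather_array %>%
  spread_values(urlkey = jstring('urlkey'))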

This pipeline really shines when the array contains several such objects. I hope this is helpful, even this long after the fact!
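As a rough illustration of the multi-object case, here is the same kind of pipeline applied to an array with two members. The `members` string and the column names `member.index` and `topic.index` are invented for this sketch. The index columns are named explicitly so that the two gather_array calls do not produce columns with the same name.

```r
library(dplyr)
library(tidyjson)

# Hypothetical two-member array, made up for illustration
members <- '[
  {"city": "Portland", "topics": [{"urlkey": "videogame"}, {"urlkey": "opensource"}]},
  {"city": "Seattle",  "topics": [{"urlkey": "board-games"}]}
]'

result <- members %>% as.tbl_json %>%
  gather_array('member.index') %>%           # one row per member
  spread_values(city = jstring('city')) %>%  # member-level field
  enter_object('topics') %>%                 # descend into each member's topics
  gather_array('topic.index') %>%            # one row per topic
  spread_values(urlkey = jstring('urlkey'))  # topic-level field
```

Each topic ends up on its own row, with the member's city repeated alongside it, so members with different numbers of topics need no special handling.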

+0

Awesome, that helped! – user1836270

0

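The object produced by as.tbl_json is a bit odd to my way of thinking: it has a single column named document.id, with the value 1, and one of its attributes is called JSON: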

json <- ' 
    [{"country":"us","city":"Portland","topics":[{"urlkey":"videogame","name":"Video Games","id":4471},{"urlkey":"board-games","name":"Board Games","id":19585},{"urlkey":"computer-programming","name":"Computer programming","id":48471},{"urlkey":"opensource","name":"Open Source","id":563}],"joined":1416349237000,"link":"http://www.meetup.com/members/156440062","bio":"Analytics engineer. Primarily work in the Hadoop space.","lon":-122.65,"other_services":{},"name":"Aaron Wirick","visited":1443078098000,"self":{"common":{}},"id":156440062,"state":"OR","lat":45.56,"status":"active"}] 
'

obj <- json %>% as.tbl_json

> dput(obj) 
structure(list(document.id = 1L), .Names = "document.id", row.names = 1L, class = c("tbl_json", 
"tbl", "data.frame"), JSON = list(list(structure(list(country = "us", 
    city = "Portland", topics = list(structure(list(urlkey = "videogame", 
     name = "Video Games", id = 4471L), .Names = c("urlkey", 
    "name", "id")), structure(list(urlkey = "board-games", name = "Board Games", 
     id = 19585L), .Names = c("urlkey", "name", "id")), structure(list(
     urlkey = "computer-programming", name = "Computer programming", 
     id = 48471L), .Names = c("urlkey", "name", "id")), structure(list(
     urlkey = "opensource", name = "Open Source", id = 563L), .Names = c("urlkey", 
    "name", "id"))), joined = 1416349237000, link = "http://www.meetup.com/members/156440062", 
    bio = "Analytics engineer. Primarily work in the Hadoop space.", 
    lon = -122.65, other_services = structure(list(), .Names = character(0)), 
    name = "Aaron Wirick", visited = 1443078098000, self = structure(list(
     common = structure(list(), .Names = character(0))), .Names = "common"), 
    id = 156440062L, state = "OR", lat = 45.56, status = "active"), .Names = c("country", 
"city", "topics", "joined", "link", "bio", "lon", "other_services", 
"name", "visited", "self", "id", "state", "lat", "status"))))) 

Looking at that, you can see that to get the names in the embedded list that is the value of that attribute, you need to do this:

names(attr(obj, "JSON")[[1]][[1]]) 
#------------ 
[1] "country"  "city"   "topics"   "joined"   "link"   
[6] "bio"   "lon"   "other_services" "name"   "visited"  
[11] "self"   "id"    "state"   "lat"   "status"  
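Why two levels of `[[1]]` are needed can be sketched in base R alone. The object `toy` below is a hand-built stand-in with a made-up two-field record. It is not created by tidyjson and only mimics the shape shown in the dput output.

```r
# Hand-built stand-in for the structure shown by dput (not a real tbl_json)
toy <- structure(
  data.frame(document.id = 1L),
  JSON = list(list(list(country = "us", city = "Portland")))
)

parsed <- attr(toy, "JSON")              # the parsed JSON lives in an attribute

outer_names <- names(parsed[[1]])        # NULL: the document is an unnamed list (the array)
inner_names <- names(parsed[[1]][[1]])   # keys of the first object inside the array
```

The first `[[1]]` selects the document and the second selects the first element of its top-level array. Only at that depth is there a named list, which fits the error message: the record itself is not an object.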

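I wish I could be of more help, but at least you now understand where the error comes from. (I also wish there were more examples on the package's help pages.)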

+0

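Thanks! This adds a trick to my diagnostic toolkit and lets me look under the hood, but it doesn't solve my problem! –

+0

'dput' gives you more information than just using 'str'. –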