2016-09-28 66 views
7

我不知道如何正確地將我的JSON數據轉換爲有用的數據框。這是顯示了我的數據結構的一些示例數據:如何將複雜的JSON數據轉換爲單個數據框?

{ 
"data":[ 
{"track":[ 
{"time":"2015","midpoint":{"x":6,"y":8},"realworld":{"x":1,"y":3},"coordinate":{"x":16,"y":38}}, 
{"time":"2015","midpoint":{"x":6,"y":8},"realworld":{"x":1,"y":3},"coordinate":{"x":16,"y":37}}, 
{"time":"2016","midpoint":{"x":6,"y":9},"realworld":{"x":2,"y":3},"coordinate":{"x":16,"y":38}} 
]}, 
{"track":[ 
{"time":"2015","midpoint":{"x":5,"y":9},"realworld":{"x":-1,"y":3},"coordinate":{"x":16,"y":38}}, 
{"time":"2015","midpoint":{"x":5,"y":9},"realworld":{"x":-1,"y":3},"coordinate":{"x":16,"y":38}}, 
{"time":"2016","midpoint":{"x":5,"y":9},"realworld":{"x":-1,"y":3},"coordinate":{"x":16,"y":38}}, 
{"time":"2015","midpoint":{"x":3,"y":15},"realworld":{"x":-9,"y":2},"coordinate":{"x":17,"y":38}} 
]}, 
{"track":[ 
{"time":"2015","midpoint":{"x":6,"y":7},"realworld":{"x":-2,"y":3},"coordinate":{"x":16,"y":39}} 
]}]} 

我有很多的曲目,我想數據集看起來像這樣:

track time midpoint realworld coordinate 
1 
1 
1 
2 
2 
2 
2 
3 

到目前爲止,我有這個:

json_file <- "testdata.json" 
data <- fromJSON(json_file) 
data2 <- list.stack(data, fill=TRUE) 

現在它出來是這樣的:

data output

我怎樣才能以適當的格式得到這個?

回答

4

當用fromJSON讀取時,添加flatten = TRUE參數。這將爲您提供一個嵌套列表,其中包含最深層次的三個數據框列表。使用:

library(jsonlite) 
# read the json 
jsondata <- fromJSON(txt, flatten = TRUE) 

# bind the dataframes in the nested 'track' list together  
dat <- do.call(rbind, jsondata$data$track) 

# add a track variable 
dat$track <- rep(1:length(jsondata$data$track), sapply(jsondata$data$track, nrow)) 

給出:

> dat 
    time midpoint.x midpoint.y realworld.x realworld.y coordinate.x coordinate.y track 
1 2015   6   8   1   3   16   38  1 
2 2015   6   8   1   3   16   37  1 
3 2016   6   9   2   3   16   38  1 
4 2015   5   9   -1   3   16   38  2 
5 2015   5   9   -1   3   16   38  2 
6 2016   5   9   -1   3   16   38  2 
7 2015   3   15   -9   2   17   38  2 
8 2015   6   7   -2   3   16   39  3 

另一個,更短的,方法是在組合使用jsonliterbindlistdata.table包:

library(jsonlite) 
library(data.table) 
# read the json 
jsondata <- fromJSON(txt, flatten = TRUE) 
# bind the dataframes in the nested 'track' list together 
# and include an id-column at the same time 
dat <- rbindlist(jsondata$data$track, idcol = 'track') 

,或者在bind_rowsdplyr包類似的方式:

library(dplyr) 
dat <- bind_rows(jsondata$data$track, .id = 'track') 

使用的數據

txt <- '{ 
"data":[ 
{"track":[ 
{"time":"2015","midpoint":{"x":6,"y":8},"realworld":{"x":1,"y":3},"coordinate":{"x":16,"y":38}}, 
{"time":"2015","midpoint":{"x":6,"y":8},"realworld":{"x":1,"y":3},"coordinate":{"x":16,"y":37}}, 
{"time":"2016","midpoint":{"x":6,"y":9},"realworld":{"x":2,"y":3},"coordinate":{"x":16,"y":38}} 
]}, 
{"track":[ 
{"time":"2015","midpoint":{"x":5,"y":9},"realworld":{"x":-1,"y":3},"coordinate":{"x":16,"y":38}}, 
{"time":"2015","midpoint":{"x":5,"y":9},"realworld":{"x":-1,"y":3},"coordinate":{"x":16,"y":38}}, 
{"time":"2016","midpoint":{"x":5,"y":9},"realworld":{"x":-1,"y":3},"coordinate":{"x":16,"y":38}}, 
{"time":"2015","midpoint":{"x":3,"y":15},"realworld":{"x":-9,"y":2},"coordinate":{"x":17,"y":38}} 
]}, 
{"track":[ 
{"time":"2015","midpoint":{"x":6,"y":7},"realworld":{"x":-2,"y":3},"coordinate":{"x":16,"y":39}} 
]}]}' 
+1

IMO它本來的超級信息在你的迭代留到最後解決方案,但最終的結果是++優雅。 – hrbrmstr

+0

@hrbrmstr thx :-)但是你的意思是什麼?「在迭代中留下最後的解決方案」*? – Jaap

+0

啊,你說得對。我誤解了最終版本,並認爲你已經刪除了sapply(它只是格式不同)。我現在試圖弄清楚爲什麼'purrr :: flatten_df()' - 它只是真的調用'dplyr :: bind_rows()' - 不適用於此。我還設法通過一些'purrr'調用來對R進行分段,所以這對OP來說是一個很好的職位(需要提交一些GH問題)。 – hrbrmstr

2

薩赫勒的回答(如果它不是已經被刪除)是一種誤導,因爲stream_in是ndjson和你沒有ndjson。你只需要對嵌套列表進行一番討論。我認爲有以下可以變得更小,但它是一個快速,直接攻擊黑客:

library(jsonlite) 
library(purrr) 
library(readr) 

dat <- fromJSON(txt, simplifyVector=FALSE) # read in your JSON 
map(dat$data, "track") %>%     # move past the top-level "data" element and iterate over the "track"s 
    map_df(function(track) {     # iterate over each element of "track" 
    map_df(track, ~as.list(unlist(track))) # convert it to a data frame 
    }, .id="track") %>%      # add in the track "id" 
    type_convert()       # convert mangled types 
## # A tibble: 8 × 8 
## track time midpoint.x midpoint.y realworld.x realworld.y coordinate.x coordinate.y 
## <int> <int>  <int>  <int>  <int>  <int>  <int>  <int> 
## 1  1 2016   6   9   2   3   16   38 
## 2  1 2016   6   9   2   3   16   38 
## 3  1 2016   6   9   2   3   16   38 
## 4  2 2015   3   15   -9   2   17   38 
## 5  2 2015   3   15   -9   2   17   38 
## 6  2 2015   3   15   -9   2   17   38 
## 7  2 2015   3   15   -9   2   17   38 
## 8  3 2015   6   7   -2   3   16   39 

這也讓你體面的列類型,雖然你可能想使用col_types參數readr::type_converttime成一個字符向量。

或者:

library(jsonlite) 
library(purrr) 
library(tibble) 

dat <- fromJSON(txt, flatten=TRUE) # read in your JSON 
map_df(dat$data$track, as_tibble, .id="track")