2016-12-04

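How to flatten a deeply and irregularly nested list/JSON in R

I'm trying to flatten a deeply and irregularly nested list/JSON object into a data frame in R.

The key names are consistent, but the number of nested elements differs from one element to the next.

I tried flattening the list with jsonlite and the tidyr::unnest function, but tidyr::unnest cannot unnest a list column that contains multiple new columns. I also tried the map functions from the purrr package, but couldn't get anything to work.

A subset of the JSON data is below, and the R list object is included at the end of this post.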

[ 
    { 
    "name": ["Hillary Clinton"], 
    "type": ["PERSON"], 
    "metadata": { 
     "mid": ["/m/0d06m5"], 
     "wikipedia_url": ["http://en.wikipedia.org/wiki/Hillary_Clinton"] 
    }, 
    "salience": [0.2883], 
    "mentions": [ 
     { 
     "text": { 
      "content": ["Clinton"], 
      "beginOffset": [132] 
     }, 
     "type": ["PROPER"] 
     }, 
     { 
     "text": { 
      "content": ["Mrs."], 
      "beginOffset": [127] 
     }, 
     "type": ["COMMON"] 
     }, 
     { 
     "text": { 
      "content": ["Clinton"], 
      "beginOffset": [403] 
     }, 
     "type": ["PROPER"] 
     }, 
     { 
     "text": { 
      "content": ["Mrs."], 
      "beginOffset": [398] 
     }, 
     "type": ["COMMON"] 
     }, 
     { 
     "text": { 
      "content": ["Hillary Clinton"], 
      "beginOffset": [430] 
     }, 
     "type": ["PROPER"] 
     } 
    ] 
    }, 
    { 
    "name": ["Trump"], 
    "type": ["PERSON"], 
    "metadata": { 
     "mid": ["/m/0cqt90"], 
     "wikipedia_url": ["http://en.wikipedia.org/wiki/Donald_Trump"] 
    }, 
    "salience": [0.245], 
    "mentions": [ 
     { 
     "text": { 
      "content": ["Trump"], 
      "beginOffset": [24] 
     }, 
     "type": ["PROPER"] 
     }, 
     { 
     "text": { 
      "content": ["Mr."], 
      "beginOffset": [20] 
     }, 
     "type": ["COMMON"] 
     } 
    ] 
    } 
] 

The desired output is a data frame like the one below, where the outer elements are repeated and each innermost element gets its own row.

name             type    metadata.mid  metadata.wikipedia_url                        salience  mentions.text.content  mentions.text.beginOffset  mentions.type
Hillary Clinton  PERSON  /m/0d06m5     http://en.wikipedia.org/wiki/Hillary_Clinton  0.2883    Clinton                132                        PROPER
Hillary Clinton  PERSON  /m/0d06m5     http://en.wikipedia.org/wiki/Hillary_Clinton  0.2883    Mrs.                   127                        COMMON
Hillary Clinton  PERSON  /m/0d06m5     http://en.wikipedia.org/wiki/Hillary_Clinton  0.2883    Clinton                403                        PROPER
Hillary Clinton  PERSON  /m/0d06m5     http://en.wikipedia.org/wiki/Hillary_Clinton  0.2883    Mrs.                   398                        COMMON
Hillary Clinton  PERSON  /m/0d06m5     http://en.wikipedia.org/wiki/Hillary_Clinton  0.2883    Hillary Clinton        430                        PROPER
Trump            PERSON  /m/0cqt90     http://en.wikipedia.org/wiki/Donald_Trump     0.245     Trump                  24                         PROPER
Trump            PERSON  /m/0cqt90     http://en.wikipedia.org/wiki/Donald_Trump     0.245     Mr.                    20                         COMMON

Is there a general, scalable way to flatten this type of data?


The R list object:

nested_list <- list(structure(list(name = "Hillary Clinton", type = "PERSON", 
    metadata = structure(list(mid = "/m/0d06m5", wikipedia_url = "http://en.wikipedia.org/wiki/Hillary_Clinton"), .Names = c("mid", 
    "wikipedia_url")), salience = 0.28831193, mentions = list(
     structure(list(text = structure(list(content = "Clinton", 
      beginOffset = 132L), .Names = c("content", "beginOffset" 
     )), type = "PROPER"), .Names = c("text", "type")), structure(list(
      text = structure(list(content = "Mrs.", beginOffset = 127L), .Names = c("content", 
      "beginOffset")), type = "COMMON"), .Names = c("text", 
     "type")), structure(list(text = structure(list(content = "Clinton", 
      beginOffset = 403L), .Names = c("content", "beginOffset" 
     )), type = "PROPER"), .Names = c("text", "type")), structure(list(
      text = structure(list(content = "Mrs.", beginOffset = 398L), .Names = c("content", 
      "beginOffset")), type = "COMMON"), .Names = c("text", 
     "type")), structure(list(text = structure(list(content = "Hillary Clinton", 
      beginOffset = 430L), .Names = c("content", "beginOffset" 
     )), type = "PROPER"), .Names = c("text", "type")))), .Names = c("name", 
"type", "metadata", "salience", "mentions")), structure(list(
    name = "Trump", type = "PERSON", metadata = structure(list(
     mid = "/m/0cqt90", wikipedia_url = "http://en.wikipedia.org/wiki/Donald_Trump"), .Names = c("mid", 
    "wikipedia_url")), salience = 0.24501903, mentions = list(
     structure(list(text = structure(list(content = "Trump", 
      beginOffset = 24L), .Names = c("content", "beginOffset" 
     )), type = "PROPER"), .Names = c("text", "type")), structure(list(
      text = structure(list(content = "Mr.", beginOffset = 20L), .Names = c("content", 
      "beginOffset")), type = "COMMON"), .Names = c("text", 
     "type")))), .Names = c("name", "type", "metadata", "salience", 
"mentions"))) 

Answer


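One approach:

library(purrr)  # map_df(), flatten_df(); flatten_df() is deprecated as of purrr 1.0.0
library(dplyr)  # mutate(), glimpse(), %>%

map_df(nested_list, function(x) { 

    # Flatten the entity-level fields (including metadata) into a one-row data frame
    df <- flatten_df(x[c("name", "type", "metadata", "salience")]) 

    # Build one row per mention, then attach the entity-level fields to every row
    map_df(x$mentions, ~c(as.list(.$text), mentions_type=.$type)) %>% 
    mutate(name=df$name, type=df$type, mid=df$mid, 
      wikipedia_url=df$wikipedia_url, salience=df$salience) 

}) %>% glimpse() 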
## Observations: 7 
## Variables: 8 
## $ content  <chr> "Clinton", "Mrs.", "Clinton", "Mrs.", "Hillary Clinton", "Trump", "Mr." 
## $ beginOffset <int> 132, 127, 403, 398, 430, 24, 20 
## $ mentions_type <chr> "PROPER", "COMMON", "PROPER", "COMMON", "PROPER", "PROPER", "COMMON" 
## $ name   <chr> "Hillary Clinton", "Hillary Clinton", "Hillary Clinton", "Hillary Clinton", "Hillary Clinton", "Trump", "Trump" 
## $ type   <chr> "PERSON", "PERSON", "PERSON", "PERSON", "PERSON", "PERSON", "PERSON" 
## $ mid   <chr> "/m/0d06m5", "/m/0d06m5", "/m/0d06m5", "/m/0d06m5", "/m/0d06m5", "/m/0cqt90", "/m/0cqt90" 
## $ wikipedia_url <chr> "http://en.wikipedia.org/wiki/Hillary_Clinton", "http://en.wikipedia.org/wiki/Hillary_Clinton", "http://en.wikipedia.org/wiki/Hillary_Clinton", "http://en.wikiped... 
## $ salience  <dbl> 0.2883119, 0.2883119, 0.2883119, 0.2883119, 0.2883119, 0.2450190, 0.2450190 
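
The approach above hard-codes the field names. As a rough sketch of a more general alternative (not part of the original answer), the base R code below walks each record recursively and builds dotted column names like those in the desired output. The helper names `flatten_node` and `flatten_records` are made up for illustration, and the sketch assumes that named lists are nested objects, unnamed lists are arrays of objects, and every non-list leaf has length 1.

```r
# Sketch: recursively flatten one nested record into a list of flat rows.
# Assumptions (not from the original answer): named lists are nested objects,
# unnamed lists are arrays of objects, and every non-list leaf has length 1.
flatten_node <- function(x, prefix = NULL) {
  rows <- list(list())  # start with a single empty row
  for (key in names(x)) {
    value <- x[[key]]
    col <- paste(c(prefix, key), collapse = ".")  # e.g. "mentions.text.content"
    if (!is.list(value)) {
      # Scalar leaf: add it as a new column on every row built so far
      rows <- lapply(rows, function(r) c(r, setNames(list(value), col)))
    } else {
      # Treat a nested object as an array holding a single element
      children <- if (is.null(names(value))) value else list(value)
      child_rows <- unlist(lapply(children, flatten_node, prefix = col),
                           recursive = FALSE)
      # Cross join: repeat each existing row once per child row
      rows <- unlist(
        lapply(rows, function(r) lapply(child_rows, function(cr) c(r, cr))),
        recursive = FALSE
      )
    }
  }
  rows
}

# Flatten every record and bind the resulting rows into one data frame
flatten_records <- function(records) {
  rows <- unlist(lapply(records, flatten_node), recursive = FALSE)
  do.call(rbind, lapply(rows, as.data.frame, stringsAsFactors = FALSE))
}

# Reduced sample in the same shape as nested_list
sample_list <- list(
  list(name = "Hillary Clinton", type = "PERSON",
       metadata = list(mid = "/m/0d06m5",
                       wikipedia_url = "http://en.wikipedia.org/wiki/Hillary_Clinton"),
       salience = 0.2883,
       mentions = list(
         list(text = list(content = "Clinton", beginOffset = 132L), type = "PROPER"),
         list(text = list(content = "Mrs.", beginOffset = 127L), type = "COMMON"))),
  list(name = "Trump", type = "PERSON",
       metadata = list(mid = "/m/0cqt90",
                       wikipedia_url = "http://en.wikipedia.org/wiki/Donald_Trump"),
       salience = 0.245,
       mentions = list(
         list(text = list(content = "Trump", beginOffset = 24L), type = "PROPER")))
)

flat <- flatten_records(sample_list)
print(flat)
```

Calling `flatten_records(nested_list)` on the full object from the question should give one row per mention in the same way. Note that under this sketch a record whose array is empty contributes no rows, so it behaves like an inner join rather than keeping the record with missing values.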

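Thanks! To retrieve the 'mentions > type' field, would I do something similar to the 'map_df(x$mentions, ...' line? Alternatively, do you know of any good resources for understanding how to use the 'purrr::map*' functions? – Brian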


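Aye. The answer has been updated to add that field. I'm giving a talk on 'purrr' (and, more generally, pipes) at rstudioconf in January, so I'll try to add a link here, but you can also watch for it on their site or at http://rud.is/b after January. – hrbrmstr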