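How to flatten deeply and irregularly nested lists/JSON in R
I am trying to flatten a deeply and irregularly nested list/JSON object into a data frame in R.
The key names are consistent, but the number of nested elements differs from one element to the next.
I tried using jsonlite and tidyr::unnest to flatten the list, but tidyr::unnest cannot unnest a list column that contains multiple new columns. I also tried the map functions from the purrr package, but could not get anything to work.
A subset of the JSON data is below, and the R list object is included at the end of this post.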
[
{
"name": ["Hillary Clinton"],
"type": ["PERSON"],
"metadata": {
"mid": ["/m/0d06m5"],
"wikipedia_url": ["http://en.wikipedia.org/wiki/Hillary_Clinton"]
},
"salience": [0.2883],
"mentions": [
{
"text": {
"content": ["Clinton"],
"beginOffset": [132]
},
"type": ["PROPER"]
},
{
"text": {
"content": ["Mrs."],
"beginOffset": [127]
},
"type": ["COMMON"]
},
{
"text": {
"content": ["Clinton"],
"beginOffset": [403]
},
"type": ["PROPER"]
},
{
"text": {
"content": ["Mrs."],
"beginOffset": [398]
},
"type": ["COMMON"]
},
{
"text": {
"content": ["Hillary Clinton"],
"beginOffset": [430]
},
"type": ["PROPER"]
}
]
},
{
"name": ["Trump"],
"type": ["PERSON"],
"metadata": {
"mid": ["/m/0cqt90"],
"wikipedia_url": ["http://en.wikipedia.org/wiki/Donald_Trump"]
},
"salience": [0.245],
"mentions": [
{
"text": {
"content": ["Trump"],
"beginOffset": [24]
},
"type": ["PROPER"]
},
{
"text": {
"content": ["Mr."],
"beginOffset": [20]
},
"type": ["COMMON"]
}
]
}
]
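For reference, JSON of this shape can be parsed into a nested R list like the one at the end of this post. The following is only a sketch: `json_txt` is a trimmed, illustrative excerpt of the data above, and the choice of `fromJSON()` arguments is an assumption about how the list was produced, not something stated in the question.

```r
library(jsonlite)

# Illustrative excerpt of the JSON above: one entity with two mentions.
json_txt <- '[
  {
    "name": ["Trump"],
    "type": ["PERSON"],
    "metadata": {
      "mid": ["/m/0cqt90"],
      "wikipedia_url": ["http://en.wikipedia.org/wiki/Donald_Trump"]
    },
    "salience": [0.245],
    "mentions": [
      {"text": {"content": ["Trump"], "beginOffset": [24]}, "type": ["PROPER"]},
      {"text": {"content": ["Mr."], "beginOffset": [20]}, "type": ["COMMON"]}
    ]
  }
]'

# Arrays of primitives become atomic vectors, while arrays of objects are
# kept as plain lists instead of being simplified into data frames.
parsed <- fromJSON(json_txt,
                   simplifyVector = TRUE,
                   simplifyDataFrame = FALSE,
                   simplifyMatrix = FALSE)

# Inspect the nesting: entity -> mentions -> text -> content/beginOffset.
str(parsed, max.level = 3)
```

Keeping `simplifyDataFrame = FALSE` preserves the irregular nesting, so entities with different numbers of mentions remain separate list elements.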
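The desired output would be a data frame like the one below, where the outer elements are repeated and each innermost element gets its own row.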
name             type    metadata.mid  metadata.wikipedia_url                        salience  mentions.text.content  mentions.text.beginOffset  mentions.type
Hillary Clinton  PERSON  /m/0d06m5     http://en.wikipedia.org/wiki/Hillary_Clinton  0.2883    Clinton                132                        PROPER
Hillary Clinton  PERSON  /m/0d06m5     http://en.wikipedia.org/wiki/Hillary_Clinton  0.2883    Mrs.                   127                        COMMON
Hillary Clinton  PERSON  /m/0d06m5     http://en.wikipedia.org/wiki/Hillary_Clinton  0.2883    Clinton                403                        PROPER
Hillary Clinton  PERSON  /m/0d06m5     http://en.wikipedia.org/wiki/Hillary_Clinton  0.2883    Mrs.                   398                        COMMON
Hillary Clinton  PERSON  /m/0d06m5     http://en.wikipedia.org/wiki/Hillary_Clinton  0.2883    Hillary Clinton        430                        PROPER
Trump            PERSON  /m/0cqt90     http://en.wikipedia.org/wiki/Donald_Trump     0.245     Trump                  24                         PROPER
Trump            PERSON  /m/0cqt90     http://en.wikipedia.org/wiki/Donald_Trump     0.245     Mr.                    20                         COMMON
Is there a general/scalable way to flatten this type of data?
The R list object:
nested_list <- list(structure(list(name = "Hillary Clinton", type = "PERSON",
metadata = structure(list(mid = "/m/0d06m5", wikipedia_url = "http://en.wikipedia.org/wiki/Hillary_Clinton"), .Names = c("mid",
"wikipedia_url")), salience = 0.28831193, mentions = list(
structure(list(text = structure(list(content = "Clinton",
beginOffset = 132L), .Names = c("content", "beginOffset"
)), type = "PROPER"), .Names = c("text", "type")), structure(list(
text = structure(list(content = "Mrs.", beginOffset = 127L), .Names = c("content",
"beginOffset")), type = "COMMON"), .Names = c("text",
"type")), structure(list(text = structure(list(content = "Clinton",
beginOffset = 403L), .Names = c("content", "beginOffset"
)), type = "PROPER"), .Names = c("text", "type")), structure(list(
text = structure(list(content = "Mrs.", beginOffset = 398L), .Names = c("content",
"beginOffset")), type = "COMMON"), .Names = c("text",
"type")), structure(list(text = structure(list(content = "Hillary Clinton",
beginOffset = 430L), .Names = c("content", "beginOffset"
)), type = "PROPER"), .Names = c("text", "type")))), .Names = c("name",
"type", "metadata", "salience", "mentions")), structure(list(
name = "Trump", type = "PERSON", metadata = structure(list(
mid = "/m/0cqt90", wikipedia_url = "http://en.wikipedia.org/wiki/Donald_Trump"), .Names = c("mid",
"wikipedia_url")), salience = 0.24501903, mentions = list(
structure(list(text = structure(list(content = "Trump",
beginOffset = 24L), .Names = c("content", "beginOffset"
)), type = "PROPER"), .Names = c("text", "type")), structure(list(
text = structure(list(content = "Mr.", beginOffset = 20L), .Names = c("content",
"beginOffset")), type = "COMMON"), .Names = c("text",
"type")))), .Names = c("name", "type", "metadata", "salience",
"mentions")))
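One possible approach, sketched in base R, is to flatten the non-repeating fields of each entity recursively and then emit one row per mention. This is not necessarily what the answer referred to in the comments below did. The names `mention`, `nested_list_small`, `flatten_fields`, and `flatten_entity` are made up for illustration, and the sketch assumes that "mentions" is the only repeated field.

```r
# Trimmed copy of the post's data (two entities, two mentions each) so the
# sketch is self-contained; `mention()` is a hypothetical constructor.
mention <- function(content, offset, type) {
  list(text = list(content = content, beginOffset = offset), type = type)
}
nested_list_small <- list(
  list(name = "Hillary Clinton", type = "PERSON",
       metadata = list(mid = "/m/0d06m5",
                       wikipedia_url = "http://en.wikipedia.org/wiki/Hillary_Clinton"),
       salience = 0.28831193,
       mentions = list(mention("Clinton", 132L, "PROPER"),
                       mention("Mrs.", 127L, "COMMON"))),
  list(name = "Trump", type = "PERSON",
       metadata = list(mid = "/m/0cqt90",
                       wikipedia_url = "http://en.wikipedia.org/wiki/Donald_Trump"),
       salience = 0.24501903,
       mentions = list(mention("Trump", 24L, "PROPER"),
                       mention("Mr.", 20L, "COMMON")))
)

# Recursively collapse a named nested list into a flat named list, joining
# names with "." (e.g. metadata$mid -> "metadata.mid"). Value types are kept.
flatten_fields <- function(x, prefix = NULL) {
  out <- list()
  for (nm in names(x)) {
    key <- if (is.null(prefix)) nm else paste(prefix, nm, sep = ".")
    if (is.list(x[[nm]])) {
      out <- c(out, flatten_fields(x[[nm]], key))
    } else {
      out[[key]] <- x[[nm]]
    }
  }
  out
}

# Build one row per mention, repeating the entity's outer fields on each row.
# Assumption: "mentions" is the only field holding repeated elements.
flatten_entity <- function(entity) {
  outer <- flatten_fields(entity[setdiff(names(entity), "mentions")])
  rows <- lapply(entity$mentions, function(m) {
    as.data.frame(c(outer, flatten_fields(m, "mentions")),
                  stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

flat <- do.call(rbind, lapply(nested_list_small, flatten_entity))
rownames(flat) <- NULL
print(flat)
```

Applying the same two `do.call(rbind, ...)` steps to the full `nested_list` above should give one row per mention with the column names from the desired output. An entity without any mentions contributes no rows in this sketch.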
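Thanks! To retrieve the 'mentions > type' field, would I do something along the lines of 'map_df(x$mentions, ...'? Or do you have a good resource for understanding how to use the 'purrr::map*' functions? – Brian
Aye. Answer updated to add that field. I'm giving a talk on 'purrr' (and, more generally, pipes) at rstudioconf in January, so I'll try to add a link here, but you can watch for it on their site or at http://rud.is/b after January. – hrbrmstr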