2017-05-25

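R: Web scraping JSON, extracting info from nested

I would like to use tidyJSON to extract information from JSON, but I am open to any R package that accomplishes my goal. I looked at the documentation and vignettes and found the complex example helpful. However, the information I want is nested inside entries that are not simple key-value pairs, and I am not sure how to access it. I am interested in getting appid, name, developer, etc., but that information sits inside "570", "730", and so on: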

{"570":{"appid":570,"name":"Dota 2","developer":"Valve","publisher":"Valve","score_rank":71,"owners":102151578,"owners_variance":259003,"players_forever":102151578,"players_forever_variance":259003,"players_2weeks":9436299,"players_2weeks_variance":89979,"average_forever":11727,"average_2weeks":1229,"median_forever":277,"median_2weeks":662,"ccu":811259,"price":"0","tags":{"Free to Play":22678,"MOBA":7808,"Strategy":7415,"Multiplayer":6757,"Team-Based":4848,"Action":4602,"e-sports":4089,"Online Co-Op":3669,"Competitive":3553,"PvP":2655,"RTS":2267,"Difficult":2129,"RPG":2114,"Fantasy":2044,"Tower Defense":2024,"Co-op":1898,"Character Customization":1514,"Replay Value":1487,"Action RPG":1397,"Simulation":1024}}, 

"730":{"appid":730,"name":"Counter-Strike: Global Offensive","developer":"Valve","publisher":"Valve","score_rank":78,"owners":29225079,"owners_variance":154335,"players_forever":28552354,"players_forever_variance":152685,"players_2weeks":9102348,"players_2weeks_variance":88410,"average_forever":17648,"average_2weeks":791,"median_forever":5030,"median_2weeks":358,"ccu":543626,"price":"1499","tags":{"FPS":17082,"Multiplayer":13744,"Shooter":12833,"Action":10881,"Team-Based":10369,"Competitive":9664,"Tactical":8529,"First-Person":7329,"e-sports":6716,"PvP":6383,"Online Co-Op":5714,"Military":4621,"Co-op":4435,"Strategy":4424,"War":4361,"Realistic":3196,"Trading":3191,"Difficult":3158,"Fast-Paced":3100,"Moddable":2496}} 

There are thousands of entries like this. Is there a way to skip the "top level" and look inside the nest?
The JSON comes from http://steamspy.com/api.php?request=top100in2weeks

You could try listviewer::jsonedit to look at the data first. [jsonlite](https://cran.r-project.org/web/packages/jsonlite/vignettes/json-aaquickstart.html) may help you extract what you need. – RobertMc

Answer

This may be what you need:

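library(jsonlite)

# Parse the API response into a named list; each top-level name is an app id
data <- fromJSON("http://steamspy.com/api.php?request=top100in2weeks")

# Pull the same field out of every top-level entry
appid <- lapply(data, function(x) x$appid)
name <- lapply(data, function(x) x$name)

df <- data.frame(appid = unlist(appid),
                 name = unlist(name),
                 stringsAsFactors = FALSE)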

Result:

> head(df)
        appid                             name
570       570                           Dota 2
730       730 Counter-Strike: Global Offensive
578080 578080    PLAYERUNKNOWN'S BATTLEGROUNDS
440       440                  Team Fortress 2
271590 271590               Grand Theft Auto V
433850 433850           H1Z1: King of the Kill

I'll let you add the rest of the information.
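For the remaining scalar fields, the same idea can be generalized. This is only a sketch: it uses a trimmed-down inline sample in the shape of the API response instead of a live request, and the names json, games, fields, and games_df are made up for illustration. It also assumes every record contains every requested field.

```r
library(jsonlite)

# Inline sample shaped like the API response (assumption: the real records
# have the same layout, with app ids as top-level names)
json <- '{
  "570": {"appid": 570, "name": "Dota 2", "developer": "Valve", "publisher": "Valve", "price": "0"},
  "730": {"appid": 730, "name": "Counter-Strike: Global Offensive", "developer": "Valve", "publisher": "Valve", "price": "1499"}
}'
games <- fromJSON(json)

# Scalar fields to pull out of every record (picked for illustration)
fields <- c("appid", "name", "developer", "publisher", "price")

# Build a one-row data frame per game, then stack the rows
rows <- lapply(games, function(x) as.data.frame(x[fields], stringsAsFactors = FALSE))
games_df <- do.call(rbind, rows)
```

Nested fields such as tags are left out here on purpose, because they do not fit into a single cell without extra handling.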

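Edit: Adding arrays to the data frame

It is possible to add the tag information for each game to the data frame, along with the number of times each tag was applied. For each game, you store the array of tag names in one column and the tag counts in another column.

Add the following lines after the definition of df: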

# One list element per game: the tag names, and the named tag counts
df$tags <- lapply(data, function(x) names(x$tags))
df$tagsQ <- lapply(data, function(x) unlist(x$tags))

This will give you:

> df["570",]
    appid   name
570   570 Dota 2

tags
570 Free to Play, MOBA, Strategy, Multiplayer, Team-Based, Action, e-sports, Online Co-Op, Competitive, PvP, RTS, Difficult, RPG, Fantasy, Tower Defense, Co-op, Character Customization, Replay Value, Action RPG, Simulation

tagsQ
570 22686, 7810, 7420, 6759, 4850, 4603, 4092, 3672, 3555, 2657, 2267, 2130, 2116, 2045, 2024, 1898, 1514, 1487, 1397, 1023

In this case, the columns tags and tagsQ contain lists. To get the second tag and its count for appid 570, do:

> df["570","tags"][[1]][2]
[1] "MOBA"

> df["570","tagsQ"][[1]][2]
MOBA
7810

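Thanks. I was also struggling to convert the "tags" field into a data structure that can go into a data frame. I ended up with a named list that could not be inserted into the data frame. Is there an easy way to convert the tags into dummy Boolean columns in the data frame, or to concatenate them into comma-separated values in a single data frame field? I am not very comfortable with list structures. – user2205916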


I tried: http://www.r-tutor.com/r-introduction/list/named-list-members and https://stackoverflow.com/questions/32059798/list-of-named-lists-to-data-frame and Google – user2205916
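Both options asked about in the comments above (a comma-separated field, or one Boolean column per tag) could be sketched roughly as follows. This is not from the original answer: it runs on a small inline sample rather than the live API, and json, games, tag_names, tags_df, all_tags, and dummies are illustrative names. It assumes the sample contains at least two distinct tags.

```r
library(jsonlite)

# Inline sample shaped like the API response (assumption: real records look the same)
json <- '{
  "570": {"appid": 570, "name": "Dota 2", "tags": {"Free to Play": 22678, "MOBA": 7808, "Strategy": 7415}},
  "730": {"appid": 730, "name": "Counter-Strike: Global Offensive", "tags": {"FPS": 17082, "Multiplayer": 13744, "Strategy": 4424}}
}'
games <- fromJSON(json)

# Tag names per game, as a list of character vectors
tag_names <- lapply(games, function(x) names(x$tags))

tags_df <- data.frame(
  appid = vapply(games, function(x) as.integer(x$appid), integer(1)),
  name = vapply(games, function(x) x$name, character(1)),
  stringsAsFactors = FALSE
)

# Option 1: one comma-separated string per game
tags_df$tags <- vapply(tag_names, paste, character(1), collapse = ", ")

# Option 2: one logical (dummy) column per distinct tag
all_tags <- unique(unlist(tag_names))
dummies <- t(vapply(tag_names, function(tn) all_tags %in% tn, logical(length(all_tags))))
colnames(dummies) <- all_tags
tags_df <- cbind(tags_df, dummies)
```

Because several tag names contain spaces, the dummy columns are easiest to reach with double brackets, for example tags_df[["Free to Play"]].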