2016-04-13 120 views
3

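I want to read a file that contains JSON and convert it into tabular data based on certain fields. How can I convert the JSON entries in the file into a data frame?

The file contains content like this: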

{"senderDateTimeStamp":"2016/04/08 10:03:18","senderHost":null,"senderCode":"web_app","senderUsecase":"appinternalstats_prod","destinationTopic":"web_app_appinternalstats_realtimedata_topic","correlatedRecord":false,"needCorrelationCacheCleanup":false,"needCorrelation":false,"correlationAttributes":null,"correlationRecordCount":0,"correlateTimeWindowInMills":0,"lastCorrelationRecord":false,"realtimeESStorage":true,"receiverDateTimeStamp":1460124283554,"payloadData":{"timestamp":"2016-04-08T10:03:18.244","status":"get","source":"MSG1","ITEM":"TEST1","basis":"","pricingdate":"","content":"","msgname":"","idlreqno":"","host":"web01","Webservermember":"Web"},"payloadDataText":"","key":"web_app:appinternalstats_prod","destinationTopicName":"web_app_appinternalstats_realtimedata_topic","esindex":"web_app","estype":"appinternalstats_prod","useCase":"appinternalstats_prod","Code":"web_app"} 

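I need to convert the timestamp, source, host, and status fields within the payloadData section of each line into a data frame in R.

I have tried this:

library(rjson)
d <- fromJSON(file = "file.txt")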

dput(d) 
structure(list(senderDateTimeStamp = "2016/04/08 10:03:18", senderHost = NULL, 
        senderAppcode = "web", senderUsecase = "appinternalstats_prod", 
        destinationTopic = "web_appinternalstats_realtimedata_topic", 
        correlatedRecord = FALSE, needCorrelationCacheCleanup = FALSE, 
        needCorrelation = FALSE, correlationAttributes = NULL, correlationRecordCount = 0, 
        correlateTimeWindowInMills = 0, lastCorrelationRecord = FALSE, 
        realtimeESStorage = TRUE, receiverDateTimeStamp = 1460124283554, 
        payloadData = structure(list(timestamp = "2016-04-08T10:03:18.244", 
               status = "get", source = "MSG1", 
               region = "", evetid = "", osareqid = "", basis = "", 
               pricingdate = "", content = "", msgname = "", recipient = "", 
               objid = "", idlreqno = "", host = "web01", webservermember = "webSingleton"), 
              .Names = c("timestamp", 
              "status", "source", "region", "evetid", 
              "osareqid", "basis", "pricingdate", "content", "msgname", 
              "recipient", "objid", "idlreqno", "host", "webservermember" 
               )), payloadDataText = "", key = "web:appinternalstats_prod", 
        destinationTopicName = "web_appinternalstats_realtimedata_topic", 
        hdfsPath = "web/appinternalstats_prod", esindex = "web", 
        estype = "appinternalstats_prod", useCase = "appinternalstats_prod", 
        appCode = "web"), .Names = c("senderDateTimeStamp", "senderHost", 
               "senderAppcode", "senderUsecase", "destinationTopic", "correlatedRecord", 
               "needCorrelationCacheCleanup", "needCorrelation", "correlationAttributes", 
               "correlationRecordCount", "correlateTimeWindowInMills", "lastCorrelationRecord", 
               "realtimeESStorage", "receiverDateTimeStamp", "payloadData", 
               "payloadDataText", "key", "destinationTopicName", "hdfsPath", 
               "esindex", "estype", "useCase", "appCode")) 

Any ideas on how to convert the payloadData part of the JSON entries into a data frame?

+1

Running the code gives an error: Error in structure(list(timestamp = "2016-04-08T10:03:18.244", status = "get", : 'names' attribute [16] must be the same length as the vector [15] – user1357015

+0

@user1357015, I have updated the post with working dput output – user1471980

Answers

1

This might be what you want:

library(rjson)
d <- fromJSON(file = "file.txt")
# Assumes d is a list of records, each carrying its own payloadData element
myDf <- do.call("rbind", lapply(d, function(x) {
  data.frame(TimeStamp = x$payloadData$timestamp,
             Source = x$payloadData$source,
             Host = x$payloadData$host,
             Status = x$payloadData$status)}))
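The snippet above applies the function to every element of d, so it only works when d is a list of records. If the file instead holds one JSON object per line, as the sample in the question suggests, one option is to parse each line separately. The following is a minimal sketch under that assumption; payload_to_df is a hypothetical helper name, and every record is assumed to carry all four payloadData fields:

```r
library(rjson)

# Hypothetical helper: `lines` is a character vector with one JSON object
# per element. Assumes each object has payloadData with all four fields.
payload_to_df <- function(lines) {
  lines <- lines[nzchar(trimws(lines))]  # skip blank lines
  rows <- lapply(lines, function(line) {
    p <- fromJSON(json_str = line)$payloadData
    data.frame(TimeStamp = p$timestamp,
               Source = p$source,
               Host = p$host,
               Status = p$status,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)  # one row per input line
}

# Usage, assuming file.txt holds one JSON object per line:
# myDf <- payload_to_df(readLines("file.txt"))
```

Parsing line by line keeps each record a proper list, so the `$` lookups are never applied to an atomic value.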
+0

I got this error: Error: unexpected '}' in: "Source = d$payloadData$source, Status = d$payloadData$status}" >) Error: unexpected ')' in ")" >) Error: unexpected ')' in ")" – user1471980

+1

Sorry, I missed a ")". It should work now. – Psidom

+0

Error in d$payloadData : $ operator is invalid for atomic vectors – user1471980

1

Consider the tidyjson package:

library(tidyjson) 
library(magrittr) 

json <- '{"senderDateTimeStamp":"2016/04/08 10:03:18","senderHost":null,"senderCode":"web_app","senderUsecase":"appinternalstats_prod","destinationTopic":"web_app_appinternalstats_realtimedata_topic","correlatedRecord":false,"needCorrelationCacheCleanup":false,"needCorrelation":false,"correlationAttributes":null,"correlationRecordCount":0,"correlateTimeWindowInMills":0,"lastCorrelationRecord":false,"realtimeESStorage":true,"receiverDateTimeStamp":1460124283554,"payloadData":{"timestamp":"2016-04-08T10:03:18.244","status":"get","source":"MSG1","ITEM":"TEST1","basis":"","pricingdate":"","content":"","msgname":"","idlreqno":"","host":"web01","Webservermember":"Web"},"payloadDataText":"","key":"web_app:appinternalstats_prod","destinationTopicName":"web_app_appinternalstats_realtimedata_topic","esindex":"web_app","estype":"appinternalstats_prod","useCase":"appinternalstats_prod","Code":"web_app"}' 

json %>% 
    gather_keys() 

# head() of above
#   document.id                 key
# 1           1 senderDateTimeStamp
# 2           1          senderHost
# 3           1          senderCode
# 4           1       senderUsecase
# 5           1    destinationTopic
# 6           1    correlatedRecord

json %>% 
    enter_object("payloadData") %>% 
    gather_keys() %>% 
    append_values_string() 

# head() of above
#   document.id         key                  string
# 1           1   timestamp 2016-04-08T10:03:18.244
# 2           1      status                     get
# 3           1      source                    MSG1
# 4           1        ITEM                   TEST1
# 5           1       basis
# 6           1 pricingdate
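The calls above operate on JSON text, not on an R list that has already been parsed. When the JSON lives in a file, one option is to read the raw lines and hand them to tidyjson as a character vector, then pull the wanted fields into columns with spread_values(). This is a sketch, assuming one JSON object per line with no blank lines; payload_table is a hypothetical helper name:

```r
library(tidyjson)
library(magrittr)

# Hypothetical helper: `json_lines` is a character vector of JSON text,
# one document per element (not a list produced by fromJSON).
payload_table <- function(json_lines) {
  json_lines %>%
    as.tbl_json() %>%
    enter_object("payloadData") %>%
    spread_values(
      TimeStamp = jstring("timestamp"),
      Source    = jstring("source"),
      Host      = jstring("host"),
      Status    = jstring("status")
    )
}

# Usage, assuming file.txt holds one JSON object per line:
# payload_table(readLines("file.txt"))
```

The result has one row per document, with a document.id column followed by the four extracted fields.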
+0

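@JasonAiskalns, the JSON data is in a file. I first read it into an object: data <- fromJSON(file = "file.txt"), and when I run your code I get this error: Error in UseMethod("as.tbl_json") : no applicable method for 'as.tbl_json' applied to an object of class "list" – user1471980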
