2016-11-23 22 views

I want to scrape this website (link) and add the missing quotation marks to its JSON so that it can be read in R.

I used GET from httr and got back a compact, JSON-like object whose keys are not quoted, like this:

"hxbase_json1({sum:3003,list:[{Number:'1'... 

So jsonlite::fromJSON cannot read it.

My code is:

library(httr)

url <- 'http://stockdata.stock.hexun.com/zrbg/data/zrbList.aspx?'
date <- '2015-12-31'
page <- 1

# Request one page of results; the query list is appended to the URL
res <- GET(url, query = list(date = date,
                             count = 20,
                             pname = 20,
                             titType = 'null',
                             page = page))

resC <- content(res)
resC1 <- jsonlite::fromJSON(resC)  # fails: the response is not valid JSON

Is there a package that automatically adds the quotation marks to this kind of JSON? Or is there some other way to read it?
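
For reference, the failure can be reproduced without the network. The sketch below uses a shortened, made-up sample string shaped like the response above (the names `sample` and `read_or_error` are illustrative and not part of the original code). It suggests that `jsonlite::fromJSON` rejects both the callback wrapper and the bare object literal, while the same data with double-quoted keys and strings parses.

```r
library(jsonlite)

# Hypothetical, shortened stand-in for the server response
sample <- "hxbase_json1({sum:3003,list:[{Number:'1'}]})"

# Illustrative helper: return the parsed value, or the error message on failure
read_or_error <- function(txt) {
  tryCatch(fromJSON(txt), error = function(e) conditionMessage(e))
}

# Text that starts with a function call is not JSON
wrapped <- read_or_error(sample)
# Unquoted keys and single-quoted strings are not JSON either
bare <- read_or_error("{sum:3003,list:[{Number:'1'}]}")
# Strict JSON: double-quoted keys and strings
valid <- read_or_error('{"sum":3003,"list":[{"Number":"1"}]}')

is.character(wrapped)  # an error message came back instead of data
is.character(bare)
str(valid)
```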

Answer


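In future, please post your R code together with the exact URL. Technically this is not JSON data; it is a JavaScript construct (a callback wrapped around an object literal), and the two are not the same. You can do some surgery on the response and get help from the V8 package: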

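library(httr)
library(V8)
library(stringi)
library(magrittr)  # provides %>%

res <- GET("http://stockdata.stock.hexun.com/zrbg/data/zrbList.aspx?date=2015-12-31&count=20&pname=20&titType=null&page=1&callback=hxbase_json11479871629254")

# Create a JavaScript context to evaluate the response in
ctx <- v8()

# Turn the callback call into a variable assignment, then evaluate it
content(res) %>%
    stri_replace_first_fixed("hxbase_json1(", "var dat=") %>%
    stri_replace_last_fixed(")", "") %>%
    ctx$eval()

# Pull the JavaScript object back into R and inspect it
ctx$get("dat") %>%
    dplyr::glimpse()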
## List of 2 
## $ sum : int 3003 
## $ list:'data.frame': 20 obs. of 13 variables: 
## ..$ Number  : chr [1:20] "1" "2" "3" "4" ... 
## ..$ StockNameLink: chr [1:20] "stock_bg.aspx?code=000002&date=2015-12-31" "stock_bg.aspx?code=601601&date=2015-12-31" "stock_bg.aspx?code=000550&date=2015-12-31" "stock_bg.aspx?code=000001&date=2015-12-31" ... 
## ..$ industry  : chr [1:20] "萬科A(000002)" "中國太保(601601)" "江鈴汽車(000550)" "平安銀行(000001)" ... 
## ..$ stockNumber : chr [1:20] "24.36" "24.07" "23.01" "18.69" ... 
## ..$ industryrate : chr [1:20] "90.27" "86.41" "84.29" "84.14" ... 
## ..$ Pricelimit : chr [1:20] "A" "A" "A" "A" ... 
## ..$ lootingchips : chr [1:20] "15.00" "15.00" "9.03" "15.00" ... 
## ..$ Scramble  : chr [1:20] "15.00" "12.00" "20.00" "15.00" ... 
## ..$ rscramble : chr [1:20] "8.00" "6.00" "18.00" "8.00" ... 
## ..$ Strongstock : chr [1:20] "27.91" "29.34" "14.25" "27.45" ... 
## ..$ Hstock  : chr [1:20] " <a href =\"http://www.cninfo.com.cn/finalpage/2016-03-14/1202040307.PDF\" target=\"_blank\"><img alt=\"\" src=\"img/table_btn1"| __truncated__ " <a href =\"http://www.cninfo.com.cn/finalpage/2016-03-28/1202085787.PDF\" target=\"_blank\"><img alt=\"\" src=\"img/table_btn1"| __truncated__ " <a href =\"http://www.cninfo.com.cn/finalpage/2016-03-19/1202057166.PDF\" target=\"_blank\"><img alt=\"\" src=\"img/table_btn1"| __truncated__ " <a href =\"http://www.cninfo.com.cn/finalpage/2016-03-10/1202033377.PDF\" target=\"_blank\"><img alt=\"\" src=\"img/table_btn1"| __truncated__ ... 
## ..$ Wstock  : chr [1:20] "<a href =\"http://stockdata.stock.hexun.com/000002.shtml\" target=\"_blank\"><img alt=\"\" src=\"img/icon_02.gif\"></img ></a>" "<a href =\"http://stockdata.stock.hexun.com/601601.shtml\" target=\"_blank\"><img alt=\"\" src=\"img/icon_02.gif\"></img ></a>" "<a href =\"http://stockdata.stock.hexun.com/000550.shtml\" target=\"_blank\"><img alt=\"\" src=\"img/icon_02.gif\"></img ></a>" "<a href =\"http://stockdata.stock.hexun.com/000001.shtml\" target=\"_blank\"><img alt=\"\" src=\"img/icon_02.gif\"></img ></a>" ... 
## ..$ Tstock  : chr [1:20] "<img alt=\"\" onclick=\"addIStock('000002','1');\" code=\"\" codetype=\"\" \" src=\"img/icon_03.gif\"></img >" "<img alt=\"\" onclick=\"addIStock('601601','1');\" code=\"\" codetype=\"\" \" src=\"img/icon_03.gif\"></img >" "<img alt=\"\" onclick=\"addIStock('000550','1');\" code=\"\" codetype=\"\" \" src=\"img/icon_03.gif\"></img >" "<img alt=\"\" onclick=\"addIStock('000001','1');\" code=\"\" codetype=\"\" \" src=\"img/icon_03.gif\"></img >" ...
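
The same idea can be made a little more general: strip whatever callback name wraps the payload, let V8 evaluate the object literal, and have JavaScript's own `JSON.stringify` add the missing quotes so that `jsonlite::fromJSON` can read the result. This is a minimal sketch under assumptions: `jsonp_to_list` is a hypothetical helper, the sample string is a shortened, made-up stand-in for the real response, and the regular expressions assume a single `name(...)` wrapper around the payload. As with any `eval`, only feed it text from a source you trust.

```r
library(V8)
library(jsonlite)

# Hypothetical helper: convert "callback({...})" text into an R object
jsonp_to_list <- function(txt) {
  # Drop everything up to and including the first "(" (the callback name)
  body <- sub("^[^(]*\\(", "", txt)
  # Drop the last ")" and anything after it (for example a trailing ";")
  body <- sub("\\)[^)]*$", "", body)

  ctx <- v8()
  # Evaluate the object literal as JavaScript and store it in `dat`
  ctx$eval(paste0("var dat = ", body, ";"))
  # JSON.stringify emits strict JSON with double-quoted keys and strings
  json <- ctx$eval("JSON.stringify(dat)")
  fromJSON(json)
}

# Made-up sample shaped like the real response
sample <- "hxbase_json1({sum:3003,list:[{Number:'1'},{Number:'2'}]})"
dat <- jsonp_to_list(sample)
str(dat)
```

Because the helper does not hard-code the callback name, it should keep working if the server echoes a different `callback` value than the one in the request above.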