2017-10-19

Scraping a JavaScript object and converting it to JSON within R/rvest

I am scraping the following website: https://www.banorte.com/wps/portal/ixe/Home/indicadores/tipo-de-cambio

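I am trying to get the currency exchange rate table into an R data frame via the rvest package, but the table itself is set up in a JavaScript variable inside the HTML code.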

I located the relevant CSS selector, and now I have this:

library(rvest)  
banorte <- "https://www.banorte.com/wps/portal/ixe/Home/indicadores/tipo-de-cambio/" %>% 
     read_html() %>% 
     html_nodes('#indicadores_financieros_wrapper > script:nth-child(2)') 

My output is now the following JavaScript script, as an XML nodeset:

<script> 
$(document).ready(function(){ 
    var valor = '{"tablaDivisas":[{"nombreDivisas":"FRANCO SUIZO","compra":"18.60","venta":"19.45"}, {"nombreDivisas":"LIBRA ESTERLINA","compra":"24.20","venta":"25.15"}, {"nombreDivisas":"YEN JAPONES","compra":"0.1635","venta":"0.171"}, {"nombreDivisas":"CORONA SUECA","compra":"2.15","venta":"2.45"}, {"nombreDivisas":"DOLAR CANADA","compra":"14.50","venta":"15.35"}, {"nombreDivisas":"EURO","compra":"21.75","venta":"22.60"}], "tablaDolar":[{"nombreDolar":"VENTANILLA","compra":"17.73","venta":"19.15"}]}'; 
    if(valor != '{}'){ 
     var objJSON = eval("(" + valor + ")"); 
     var tabla="<tbody>"; 
     for (var i = 0; i < objJSON["tablaDolar"].length; i++) { 
      tabla+= "<tr>"; 
      tabla+= "<td>" + objJSON["tablaDolar"][i].nombreDolar + "</td>"; 
      tabla+= "<td>$" + objJSON["tablaDolar"][i].compra + "</td>"; 
      tabla+= "<td>$" + objJSON["tablaDolar"][i].venta + "</td>"; 
      tabla+= "</tr>"; 
     } 
     tabla+= "</tbody>"; 
     $("#tablaDolar").append(tabla); 
     var tabla2=""; 
     for (var i = 0; i < objJSON["tablaDivisas"].length; i++) { 
      tabla2+= "<tr>"; 
      tabla2+= "<td>" + objJSON["tablaDivisas"][i].nombreDivisas + "</td>"; 
      tabla2+= "<td>$" + objJSON["tablaDivisas"][i].compra + "</td>"; 
      tabla2+= "<td>$" + objJSON["tablaDivisas"][i].venta + "</td>"; 
      tabla2+= "</tr>"; 
     } 
     tabla2+= "</tbody>"; 
     $("#tablaDivisas").append(tabla2); 
    } 
    bmnIndicadoresResponsivoInstance.cloneResponsive(0); 
}); 
</script> 

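My question is: how do I strip away almost all of the JavaScript functions and operators so that I get only this data, and ultimately convert it to a JSON table, as follows: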

{"tablaDivisas":[{"nombreDivisas":"FRANCO SUIZO","compra":"18.60","venta":"19.45"}, 
{"nombreDivisas":"LIBRA ESTERLINA","compra":"24.20","venta":"25.15"}, 
{"nombreDivisas":"YEN JAPONES","compra":"0.1635","venta":"0.171"}, 
{"nombreDivisas":"CORONA SUECA","compra":"2.15","venta":"2.45"}, 
{"nombreDivisas":"DOLAR CANADA","compra":"14.50","venta":"15.35"}, 
{"nombreDivisas":"EURO","compra":"21.75","venta":"22.60"}], 
"tablaDolar":[{"nombreDolar":"VENTANILLA","compra":"17.73","venta":"19.15"}]} 

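In other words, I need to extract the "valor" variable from the JS script using R.

For some reason, I am having trouble getting this done entirely within R, without having to export the variable to an external .txt file and then apply a substring.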

Answers

You can do it like this:

library(rvest)  
banorte <- "https://www.banorte.com/wps/portal/ixe/Home/indicadores/tipo-de-cambio/" %>% 
    read_html() %>% 
    html_nodes('#indicadores_financieros_wrapper > script:nth-child(2)') %>% 
    as_list() 

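# Pull the script text out of the node list and split it into lines
banorte_vec <- strsplit(banorte[[c(1,1)]],"\r\n")[[1]]
# Keep the line that assigns `valor`, then strip the JavaScript around the JSON
valor <- grep("valor = ", banorte_vec, value = TRUE)
valor <- gsub("\tvar valor = ","",valor)
valor <- gsub("';$","",valor)
valor <- gsub("^'","",valor)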

library(jsonlite) 
result <- fromJSON(valor) 
result 

$tablaDivisas
    nombreDivisas compra venta
1    FRANCO SUIZO  18.60 19.45
2 LIBRA ESTERLINA  24.20 25.15
3     YEN JAPONES 0.1635 0.171
4    CORONA SUECA   2.15  2.45
5    DOLAR CANADA  14.50 15.35
6            EURO  21.75 22.60

$tablaDolar
  nombreDolar compra venta
1  VENTANILLA  17.73 19.15
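
The same stripping can also be done in one step with a single regular expression. The sketch below is only an illustration: it runs on a shortened inline copy of the script rather than the live page, and both the `script_txt` sample and the helper name `extract_valor` are assumptions made for the example, not part of the answer above.

```r
library(jsonlite)

# Shortened stand-in for the <script> text returned by the scrape
# (assumption: the real page keeps the same `var valor = '...';` shape).
script_txt <- paste(
  "$(document).ready(function(){",
  "\tvar valor = '{\"tablaDolar\":[{\"nombreDolar\":\"VENTANILLA\",\"compra\":\"17.73\",\"venta\":\"19.15\"}]}';",
  "\tif(valor != '{}'){",
  "\t}",
  "});",
  sep = "\r\n"
)

# Hypothetical helper: capture whatever sits between the single quotes that
# follow `var valor =`, regardless of how the line is indented.
extract_valor <- function(txt) {
  m <- regmatches(txt, regexec("var valor = '([^']*)';", txt))[[1]]
  if (length(m) < 2) stop("`valor` assignment not found")
  m[2]
}

result <- fromJSON(extract_valor(script_txt))
result$tablaDolar
```

Because the pattern is not anchored to the start of a line, it does not depend on whether the page indents with tabs or spaces, which is the part of the line-by-line approach most likely to break if the markup changes.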

This is definitely a more heavyweight answer, but it generalizes to other, gnarlier "JavaScript problems".

library(rvest) 
library(stringi) 
library(V8) 
library(tidyverse) 

banorte <- "https://www.banorte.com/wps/portal/ixe/Home/indicadores/tipo-de-cambio/" %>% 
     read_html() %>% 
     html_nodes('#indicadores_financieros_wrapper > script:nth-child(2)') 

We'll set up a JavaScript V8 context:

ctx <- v8() 

Then:

  • get the <script> content
  • split it into lines
  • turn it into a plain character vector
  • strip out the cruft
  • evaluate the JavaScript

It's not too bad:

html_text(banorte) %>% 
    stri_split_lines() %>% 
    flatten_chr() %>% 
    keep(stri_detect_regex, "^\tvar") %>% 
    ctx$eval() 

Since the JavaScript variable holds a JSON string, we parse it in R rather than evaluating it in V8:

jsonlite::fromJSON(ctx$get("valor")) 
## $tablaDivisas
##     nombreDivisas compra venta
## 1    FRANCO SUIZO  18.60 19.45
## 2 LIBRA ESTERLINA  24.20 25.15
## 3     YEN JAPONES 0.1635 0.171
## 4    CORONA SUECA   2.15  2.45
## 5    DOLAR CANADA  14.50 15.35
## 6            EURO  21.75 22.60
##
## $tablaDolar
##   nombreDolar compra venta
## 1  VENTANILLA  17.73 19.15

If the JavaScript does other useful processing, this approach generalizes better.
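
As an illustration of that generalization, here is a minimal sketch, assuming the V8 package is installed. The inline `js_src` snippet is hypothetical and not taken from the Banorte page: it builds the data as a JavaScript object literal with unquoted keys and a computed field, so the source text itself is not valid JSON, yet V8 can still evaluate it.

```r
library(V8)

ctx <- v8()

# Hypothetical script: unquoted keys and a field computed at run time, so
# stripping strings out of the source text would not yield parseable JSON.
js_src <- "
var rates = [{nombre: 'EURO', compra: 21.75, venta: 22.60}];
rates.forEach(function(r) { r.spread = r.venta - r.compra; });
"

invisible(ctx$eval(js_src))

# ctx$get() converts the JavaScript value to an R object via JSON
rates <- ctx$get("rates")
rates
```

Here the array of objects comes back as a data frame with one row, including the `spread` column that only exists after the script has run.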

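Note: Google Translate in my Chrome beta channel did not translate the site well, but I think you are very close to violating the spirit of item 6 on the "Términos Legales" page. Until I can translate it I cannot say for certain. When/if I can, and it looks like you are, I will delete this answer.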