Scraping data from a web page in R

I am using R and rvest to scrape data from www.nseindia.com. The first time I ran it I was able to download the data, but since then I get the following error message:

Error in UseMethod("xml_find_all") : no applicable method for 'xml_find_all' applied to an object of class "character"

I want to get the first row of the index futures table.

My code is as follows:

library("rvest") 

    website_nifty_future_live<- read_html("https://www.nseindia.com/live_market/dynaContent/live_watch/fomwatchsymbol.jsp?key=NIFTY&Fut_Opt=Futures") 

    nifty_spot<- website_nifty_future_live %>% 
     + html_nodes(".alt:nth-child(2) td:nth-child(13)") %>% 
     + html_text() 
    nifty_spot<-as.numeric(gsub(",","",nifty_spot))

Source

2017-09-13 Himadri

I have tested the code on macOS and Debian. It works fine and evaluates without errors. rvest version 0.3.2, R version 3.3.3. – Gonzo

I am using Windows, and the problem occurs when I re-run the code. Thanks for the feedback. Appreciated! – Himadri

The "+" signs in your code cause that error. Try again after removing them. – SBista

The error is most likely caused by the `+` signs at the start of the lines in your code. I do not get the error when I remove them.
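A minimal sketch of the pipeline with the `+` signs removed. To keep it runnable offline, it parses a small inline HTML table instead of the NSE page; the HTML, the `tr.alt td:nth-child(2)` selector, and the name `last_price` are assumptions made for illustration only. As far as I can tell, a leading `+` after `%>%` makes R evaluate `html_nodes()` with the selector string as its first argument, which would explain the 'applied to an object of class "character"' message.

```r
library("rvest")

# Hypothetical stand-in for the NSE page, so the example needs no network access.
page <- read_html('<table>
  <tr><th>Instrument</th><th>Last Price</th></tr>
  <tr class="alt"><td>Index Futures</td><td>10,096.90</td></tr>
</table>')

# Each continuation line starts with the function call itself, not with "+".
# The "+" is only the console's continuation prompt and is not part of the code.
last_price <- page %>%
  html_nodes("tr.alt td:nth-child(2)") %>%
  html_text()

# Strip the thousands separators before converting to a number.
last_price <- as.numeric(gsub(",", "", last_price))
print(last_price)
```

When copying code back out of the R console, dropping the prompt characters (`>` and `+`) at the start of each line should avoid this kind of problem.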

I suggest using the code below to read the complete table as a data.frame:

library("rvest") 

url_nifty <- "https://www.nseindia.com/live_market/dynaContent/live_watch/fomwatchsymbol.jsp?key=NIFTY&Fut_Opt=Futures" 
website_nifty_future_live<- read_html(url_nifty) 

nifty_spot<- website_nifty_future_live %>% 
    html_nodes("#tab26Content > table:nth-child(1)") %>% 
    html_table(header = NA, trim = TRUE, fill = FALSE, dec = ".") %>% 
    as.data.frame()

It is then of course easy to get the first row, including the header, for example with

nifty_spot[1, ]
     Instrument Underlying Expiry.Date Option.Type Strike.Price Open.Price High.Price Low.Price Prev..Close Last.Price Volume Turnover.lacs.
1 Index Futures      NIFTY   28SEP2017           -            -  10,105.00  10,144.70 10,078.00   10,107.90  10,096.90 94,799    7,18,943.53
  Underlying..Value
1           10079.3
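The price and volume columns in the output above still contain thousands separators, so they are presumably stored as text. A hedged sketch of converting them, reusing the `gsub()` idea from the question: the one-row `first_row` data.frame and the helper `to_number()` are hypothetical stand-ins, with values copied from the output above.

```r
# Hypothetical one-row data.frame mimicking a few columns of nifty_spot[1, ].
first_row <- data.frame(
  Last.Price     = "10,096.90",
  Volume         = "94,799",
  Turnover.lacs. = "7,18,943.53",
  stringsAsFactors = FALSE
)

# Remove every comma (this also handles Indian-style grouping such as 7,18,943.53),
# then convert the remaining text to numeric.
to_number <- function(x) as.numeric(gsub(",", "", x, fixed = TRUE))

# Apply the conversion to each column while keeping the data.frame structure.
first_row[] <- lapply(first_row, to_number)
str(first_row)
```

Columns such as `Option.Type`, which hold `-`, would become `NA` under this conversion, so it is probably best applied only to the numeric columns.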

Hope it helps!

Source

2017-09-13 14:03:38 TomS
