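Web scraping in R with rvest: accessing HTML nodes

A simple application of the rvest package: I am trying to scrape a class of HTML links from a website.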
This code gets me what looks like the right page of the website:
library(rvest)
library(magrittr)
foo <- "http://www.realclearpolitics.com/epolls/2010/house/2010_elections_house_map.html" %>%
read_html
Next, I use a CSS selector to pick out the appropriate nodes:
foo %>%
html_nodes("#states td") %>%
extract(2:4)
which returns
{xml_nodeset (3)}
[1] <td>\n <a class="dem" href="/epolls/2010/house/ar/arkansas_4th_district_rankin_vs_ross-1343.html">\n <span>AR4</span>\n </a>\n</td>
[2] <td>\n <a class="dem" href="/epolls/2010/house/ct/connecticut_1st_district_brickley_vs_larson-1713.html">\n <span>CT1</span>\n </a>\n</td>
[3] <td>\n <a class="dem" href="/epolls/2010/house/ct/connecticut_2nd_district_peckinpaugh_vs_courtney-1715.html">\n <span>CT2</span>\n </a>\n</td>
OK, so the href attribute is what I'm looking for. However, this
foo %>%
html_nodes("#states td") %>%
extract(2:4) %>%
html_attr("href")
returns
[1] NA NA NA
How do I access the underlying links?
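A likely reason for the NAs, sketched on a small hypothetical stand-in for the page. The HTML string below only mimics the three cells printed above and is an assumption, not the real RealClearPolitics markup. The selected td cells carry no attributes of their own, so html_attr() falls back to its default of NA. The href belongs to the a element nested inside each cell.

```r
library(rvest)

# Hypothetical stand-in for the real page: a first cell without a link,
# followed by three cells that each wrap a link, like the nodes printed above.
page <- read_html('
<table id="states"><tr>
  <td>House</td>
  <td><a class="dem" href="/epolls/2010/house/ar/arkansas_4th_district_rankin_vs_ross-1343.html"><span>AR4</span></a></td>
  <td><a class="dem" href="/epolls/2010/house/ct/connecticut_1st_district_brickley_vs_larson-1713.html"><span>CT1</span></a></td>
  <td><a class="dem" href="/epolls/2010/house/ct/connecticut_2nd_district_peckinpaugh_vs_courtney-1715.html"><span>CT2</span></a></td>
</tr></table>')

# Same selection as in the question: cells 2 to 4 of "#states td"
cells <- html_nodes(page, "#states td")[2:4]

# The <td> elements themselves have no attributes at all ...
td_attrs <- html_attrs(cells)
# ... so asking them for "href" returns the default, NA, for every cell
td_href <- html_attr(cells, "href")

# The href attribute sits on the <a> child of each cell instead
a_attrs <- html_attrs(html_nodes(cells, "a"))
```

Printing a_attrs shows class and href for each link, while every entry of td_attrs is empty.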
Try 'foo %>% html_nodes("#states td a") %>% extract(2:4) %>% html_attr("href")' – Jay
@Jay you should make that an answer. Tom: you weren't targeting the anchor (a) elements, and Jay's solution does. – hrbrmstr
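A sketch of the fix from the comments, again run on a hypothetical HTML snippet standing in for the real page (the markup is an assumption). Approach 1 selects the anchors directly with "#states td a" and reads their href. One caveat worth checking on the real page: extract(2:4) applied after the "a" step indexes anchors, not table cells, so the positions only match the original td indexing if every cell holds exactly one link. Approach 2 stays at the cell level by taking the first a inside each td with html_node(). A cell without a link then gives NA, which keeps results aligned with the cells.

```r
library(rvest)

# Hypothetical markup: the first cell has no link, the others each hold one.
page <- read_html('
<table id="states"><tr>
  <td>House</td>
  <td><a class="dem" href="/epolls/2010/house/ar/arkansas_4th_district_rankin_vs_ross-1343.html"><span>AR4</span></a></td>
  <td><a class="dem" href="/epolls/2010/house/ct/connecticut_1st_district_brickley_vs_larson-1713.html"><span>CT1</span></a></td>
  <td><a class="dem" href="/epolls/2010/house/ct/connecticut_2nd_district_peckinpaugh_vs_courtney-1715.html"><span>CT2</span></a></td>
</tr></table>')

# Approach 1 (from the comment): target the <a> elements, then read href
all_links <- page %>%
  html_nodes("#states td a") %>%
  html_attr("href")

# Approach 2: take the first <a> inside each <td>; a cell without a link
# yields a missing node, for which html_attr() returns NA
cell_links <- page %>%
  html_nodes("#states td") %>%
  html_node("a") %>%
  html_attr("href")

cell_links[2:4]
```

Either way the values are site-relative paths such as "/epolls/2010/house/ar/...". xml2::url_absolute() can resolve them against the site root before you follow them.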