One option is to target and replace the problem columns individually.
The margin column can be targeted with XPath:
# get the html
html <- URL %>%
  read_html()
# Example using the first margin column (column # 6)
html %>%
  html_nodes(xpath = '//table[2]') %>%        # get table 2
  html_nodes(xpath = './/td[6]/text()') %>%   # get column 6 using text(); the leading "." keeps the search inside table 2
  iconv("UTF-8", "UTF-8")                     # to convert "−" to "-"
# [1] "−10.44%" "−3.00%" "−0.83%" "−0.51%" "0.09%" "0.17%" "0.57%"
# [8] "0.70%" "1.45%" "2.06%" "2.46%" "3.01%" "3.12%" "3.86%"
#[15] "4.31%" "4.48%" "4.79%" "5.32%" "5.56%" "6.05%" "6.12%"
#[22] "6.95%" "7.27%" "7.50%" "7.72%" "8.51%" "8.53%" "9.74%"
#[29] "9.96%" "10.08%" "10.13%" "10.85%" "11.80%" "12.20%" "12.25%"
#[36] "14.20%" "14.44%" "15.40%" "17.41%" "17.76%" "17.81%" "18.21%"
#[43] "18.83%" "22.58%" "23.15%" "24.26%" "25.22%" "26.17%"
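The other margin column can be targeted in the same way. I used iconv to convert − to -, because this is an encoding issue, but you could use a substitution-based solution instead (for example, with sub).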
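For the substitution-based route, a minimal base-R sketch might look like the following. The vector margin_raw is a hypothetical stand-in that holds a few of the values printed above, and "\u2212" is assumed to be the character in the page (the Unicode minus sign, U+2212).

```r
# Unicode minus sign (U+2212), assumed to be what the scraped page contains
minus <- "\u2212"

# Hypothetical stand-in for the scraped margin column (values taken from the output above)
margin_raw <- c(paste0(minus, "10.44%"), paste0(minus, "3.00%"), "0.09%")

# Replace the Unicode minus with an ASCII hyphen-minus
margin_chr <- sub(minus, "-", margin_raw, fixed = TRUE)

# Drop the percent sign and convert to numeric
margin_num <- as.numeric(sub("%", "", margin_chr, fixed = TRUE))
margin_num
```

sub is enough here because each value contains at most one minus sign and one percent sign; gsub would be the safer choice if a cell could contain several.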
To target the column with the US presidents' names, you can again use XPath:
html %>%
  html_nodes(xpath = '//table[2]') %>%
  html_nodes(xpath = './/td[3]/a/text()') %>%   # the leading "." keeps the search inside table 2
  html_text()
# [1] "John Quincy Adams" "Rutherford Hayes" "Benjamin Harrison"
# [4] "George W. Bush" "James Garfield" "John Kennedy"
# [7] "Grover Cleveland" "Richard Nixon" "James Polk"
#[10] "Jimmy Carter" "George W. Bush" "Grover Cleveland"
#[13] "Woodrow Wilson" "Barack Obama" "William McKinley"
#[16] "Harry Truman" "Zachary Taylor" "Ulysses Grant"
#[19] "Bill Clinton" "William Henry Harrison" "William McKinley"
#[22] "Franklin Pierce" "Barack Obama" "Franklin Roosevelt"
#[25] "George H. W. Bush" "Bill Clinton" "William Taft"
#[28] "Ronald Reagan" "Franklin Roosevelt" "Abraham Lincoln"
#[31] "Abraham Lincoln" "Dwight Eisenhower" "Ulysses Grant"
#[34] "James Buchanan" "Andrew Jackson" "Martin Van Buren"
#[37] "Woodrow Wilson" "Dwight Eisenhower" "Herbert Hoover"
#[40] "Franklin Roosevelt" "Andrew Jackson" "Ronald Reagan"
#[43] "Theodore Roosevelt" "Lyndon Johnson" "Richard Nixon"
#[46] "Franklin Roosevelt" "Calvin Coolidge" "Warren Harding"
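Once both columns are extracted, they can be combined positionally into a data frame. The sketch below is only an illustration: the vectors president and margin_raw are hypothetical stand-ins holding the first three values from the outputs above, and it assumes both XPath queries return rows in the same order.

```r
# Hypothetical stand-ins: first three values of each extracted column
president  <- c("John Quincy Adams", "Rutherford Hayes", "Benjamin Harrison")
margin_raw <- paste0("\u2212", c("10.44%", "3.00%", "0.83%"))

# Helper (illustrative name): turn strings like "−10.44%" into numbers
clean_margin <- function(x) {
  x <- sub("\u2212", "-", x, fixed = TRUE)  # Unicode minus -> ASCII hyphen-minus
  as.numeric(sub("%", "", x, fixed = TRUE)) # drop the percent sign
}

# Combine the columns row by row
results <- data.frame(
  president = president,
  margin    = clean_margin(margin_raw),
  stringsAsFactors = FALSE
)
results
```

Checking that the vectors have equal length before combining them is a cheap guard against a query that matched extra or missing cells.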
Maybe someone can build on this to get the whole table... `read_html('00.001 -10.44%') %>% html_nodes(xpath = '//td/text()[preceding-sibling::span]') %>% html_text()` – cory