R for web scraping - pulling prices and names

I would like to get the list of prices and game names from the Steam store at the URL below, but I can't figure out what xpathSApply should parse:

http://store.steampowered.com/search/?sort_by=Price&sort_order=ASC

Here is my code:

require(RCurl) 
require(XML) 
url <- "http://store.steampowered.com/search/results?sort_by=Name&sort_order=ASC&category1=1" 
SOURCE <- getURL(url,encoding="UTF-8") #Download the page 
substring (SOURCE,1,200) 
PARSED <- htmlParse(SOURCE) #Format the html code 
##My problem is in this line below 
(xpathSApply(PARSED, "//div[@class='col search_price']"))

Source

2014-08-27 Austin Trombley

Try this:

require(RCurl) 
require(XML) 
url <- "http://store.steampowered.com/search/?sort_by=Metascore&sort_order=DESC&" 
SOURCE <- getURL(url, encoding="UTF-8") #Download the page 
PARSED <- htmlParse(SOURCE, asText = TRUE, encoding = "utf-8") 
xpaths <- c(price="//a/div[@class='col search_price']", 
      title="//div[@class='col search_name ellipsis']/h4") 
res <- sapply(xpaths, function(x) xpathSApply(PARSED, x, xmlValue, trim = TRUE)) 
head(res) 
#      price    title
# [1,] "9,99€"  "Half-Life 2"
# [2,] "9,99€"  "Half-Life"
# [3,] "19,99€" "BioShock™"
# [4,] "18,99€" "The Orange Box"
# [5,] "19,99€" "Portal 2"
# [6,] "14,99€" "The Elder Scrolls V: Skyrim"
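
The main change from the question's code is the third argument to xpathSApply. Called without a function, it hands back the matched nodes; called with xmlValue, it returns their text. Here is a minimal offline sketch of that difference. The inline HTML is a hypothetical stand-in that only imitates the class names used above (the live Steam markup may differ), and the prices are written with a decimal point so that as.numeric() works directly:

```r
library(XML)

# Hypothetical markup imitating two search-result rows (assumption, not the real page)
html <- paste0(
  "<html><body>",
  "<a><div class='col search_name ellipsis'><h4>Half-Life 2</h4></div>",
  "<div class='col search_price'> 9.99 </div></a>",
  "<a><div class='col search_name ellipsis'><h4>Portal 2</h4></div>",
  "<div class='col search_price'> 19.99 </div></a>",
  "</body></html>"
)
doc <- htmlParse(html, asText = TRUE)

# No function argument: the result holds node objects, not strings
nodes <- xpathSApply(doc, "//div[@class='col search_price']")

# With xmlValue: each matched node is reduced to its (trimmed) text
prices <- xpathSApply(doc, "//div[@class='col search_price']",
                      xmlValue, trim = TRUE)
titles <- xpathSApply(doc, "//div[@class='col search_name ellipsis']/h4",
                      xmlValue, trim = TRUE)

# Combine into a data frame; this assumes every row has both a title and a price
res <- data.frame(title = titles, price = as.numeric(prices),
                  stringsAsFactors = FALSE)
print(res)
```

On the real page, prices such as "9,99€" would need the currency symbol stripped and the decimal comma replaced before converting to numeric.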

Source

2014-08-27 20:34:28 lukeA

Great answer, especially the trick for creating the headers. I use this approach: PARSED <- htmlTreeParse(url, useInternal = TRUE) instead of RCurl plus getURL() plus htmlParse. What's the difference? – lawyeR 2014-08-28 01:59:43

Thanks @lawyeR. Afaik, 'htmlParse' is just a shortcut for 'htmlTreeParse(useInternalNodes = TRUE, ...)'. I kept the OP's 'RCurl' because it gives you more control over the download when you need it. – lukeA 2014-08-28 09:00:05
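
For anyone who wants to check the equivalence mentioned in the comments, here is a small sketch that parses the same text both ways and compares the XPath results. It uses a hypothetical inline snippet rather than the live page, so it says nothing about how the two calls behave when fetching a URL themselves:

```r
library(XML)

# Hypothetical inline page, used so the comparison needs no network access
html <- "<html><body><div class='col search_price'>9.99</div></body></html>"

# Two ways of building an internal (C-level) document tree
doc_a <- htmlParse(html, asText = TRUE)
doc_b <- htmlTreeParse(html, asText = TRUE, useInternalNodes = TRUE)

xp <- "//div[@class='col search_price']"
price_a <- xpathSApply(doc_a, xp, xmlValue)
price_b <- xpathSApply(doc_b, xp, xmlValue)

# Compare what the two trees return for the same XPath query
same <- identical(price_a, price_b)
print(same)
```

Note that useInternal = TRUE in the comment above relies on R's partial argument matching for useInternalNodes; spelling the argument out in full is a little more robust.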
