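Difference between the source code downloaded by R and the website's source code

I am scraping a website for information about certain products, but I have run into a problem with the prices. My code is as follows: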
> library(xml2)
> enlace<-"http://www.carulla.com/products/0000687608965009/Crema+Dental+Sensitive+Proalivio+Colgate"
> download.file(enlace, destfile = "scrapedpage.html", quiet=TRUE)
> doc<-read_html("scrapedpage.html")
> # description
> toString(xml_find_all(doc,xpath=paste0('//*[@id="pdpProduct"]/div[3]/h3')))
[1] "<h3 class=\"pdpInfoProductName\" itemprop=\"name\">Crema Dental Sensitive Proalivio Colgate</h3>"
> # reference
> toString(xml_find_all(doc,xpath=paste0('//*[@id="pdpProduct"]/div[3]/p')))
[1] "<p class=\"pdpInfoProductRef\">\r\n\t\t\t\t\t\t\t\t\tPresentación:C \r\n\t\t\t\t\t\t\t\t\tPLU:739983</p>"
> # prices
> toString(xml_find_all(doc,xpath=paste0('//*[@id="pdpProduct"]/div[3]/div[1]/div[2]/h4')))
[1] ""
On the original page, when I inspect the source code in the browser, I find this information:
<div class="pdpInfoProduct pull-left">
<h3 class="pdpInfoProductName" itemprop="name">Crema Dental Sensitive Proalivio Colgate</h3>
<h2 class="pdpInfoProductBrand" itemprop="brand">COLGATE</h2>
<p class="pdpInfoProductRef">
Presentación:C
PLU:739983</p>
<div class="pdpInfoProductPrices">
<div class="pull-right">
<div class="pro-big-Ovalo">
<p>25%</p>
</div>
</div>
<div class="pdpInfoProductPrice" itemprop="offers" itemscope itemtype="http://schema.org/Offer">
<meta itemprop="priceCurrency" content="COP" />
<meta itemprop="price" content="17213.0" />
<h4 class="priceOffer">
$17.213</h4>
<h6 class="before">Antes: <span class="strikeText">
$22.950</span>
</h6>
</div>
</div>
The information I am interested in is $17.213, but when I download the source code with R, I get the following:
> con2<-url(enlace,"r")
> x<-readLines(con2)
> close(con2)
> x[1270:1285]
[1] "\t\t\t\t\t\t\t\t\tPLU:739983</p>"
[2] "\t\t\t\t\t\t\t<div class=\"pdpInfoProductPrices\">\t"
[3] "\t\t\t\t\t<div class=\"pdpInfoProductPrice\" itemprop=\"offers\" itemscope itemtype=\"http://schema.org/Offer\">"
[4] "\t\t\t\t\t"
[5] "\t\t\t\t\t<meta itemprop=\"priceCurrency\" content=\"COP\" />"
[6] " <meta itemprop=\"price\" content=\"\" />"
[7] "\t\t\t\t\t\t<h4 class=\"price\">"
[8] "\t\t\t\t\t\t\t</h4>"
[9] "\t\t\t\t\t\t</div>"
[10] "\t\t\t\t</div>"
[11] "\t\t\t\t"
[12] "\t\t\t\t\t\t\t\t\t"
[13] "\t\t\t\t\t\t\t\t\t\t\t\t\t <div class=\"product-seller row-fluid\">"
[14] "\t\t\t\t <!-- +++++ Carulla Seller +++++ --> "
[15] " <p> Vendido por:   Carulla</p> "
[16] " </div>"
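That is, I get \t\t\t\t\t\t\t instead of $17.213: in the copy R receives, the `content` attribute of the price `<meta>` tag and the `<h4>` element are both empty.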
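To make the difference concrete, here is a minimal sketch that reads the price from the `<meta itemprop="price">` tag in both versions of the markup. The two HTML strings are shortened copies of the fragments shown above, and `get_price` is a hypothetical helper name introduced only for this illustration. It does not explain why the server returns an empty price to R; it only shows that the value is absent from the downloaded HTML, so no XPath expression can recover it from that copy.

```r
library(xml2)

# Fragment as shown in the browser's page source (shortened from above).
browser_html <- '<div class="pdpInfoProductPrice" itemprop="offers">
  <meta itemprop="priceCurrency" content="COP" />
  <meta itemprop="price" content="17213.0" />
  <h4 class="priceOffer">$17.213</h4>
</div>'

# Fragment as received by R via readLines() (shortened from above).
r_html <- '<div class="pdpInfoProductPrice" itemprop="offers">
  <meta itemprop="priceCurrency" content="COP" />
  <meta itemprop="price" content="" />
  <h4 class="price"></h4>
</div>'

# Hypothetical helper: read the machine-readable price from the <meta> tag.
get_price <- function(html) {
  doc  <- read_html(html)
  node <- xml_find_first(doc, '//meta[@itemprop="price"]')
  # An empty or missing content attribute becomes NA.
  as.numeric(xml_attr(node, "content"))
}

price_browser <- get_price(browser_html)  # price present in the browser copy
price_r       <- get_price(r_html)        # price missing in the R copy
```

Matching on the `itemprop` attribute rather than on a positional path such as `div[3]/div[1]/div[2]/h4` is a design choice for this sketch: the positional path also depends on the discount `<div class="pull-right">` block, which is not present in the copy R receives.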
I would greatly appreciate your help.