GET {httr} returns a Bad Request response

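I'm trying to scrape HTML elements from the URL stored in searchlink. The only approach that has worked for me is htmlTreeParse {XML}, but it doesn't return the elements I'm looking for, for example: img[@title='Add to compare']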

searchlink <- "http://www.realtor.ca/Map.aspx#CultureId=1&ApplicationId=1&RecordsPerPage=9&MaximumResults=9&PropertyTypeId=300&TransactionTypeId=2&SortOrder=A&SortBy=1&LongitudeMin=-114.52066040039104&LongitudeMax=-113.60536193847697&LatitudeMin=50.94776904194829&LatitudeMax=51.14246522072541&PriceMin=0&PriceMax=0&BedRange=0-0&BathRange=0-0&ParkingSpaceRange=0-0&viewState=m&Longitude=-114.063011169434&Latitude=51.0452194213867&ZoomLevel=11&CurrentPage=1" 

doc <- htmlTreeParse(searchlink, useInternalNodes = TRUE)

classes <- xpathSApply(doc, "//img[@title='Add to compare']", function(x) {xmlGetAttr(x, 'class')})

The value of classes after running the above:

list()

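I also tried readLines and GET {httr}, but both return an error when reading the URL. I guess this is because of the special characters in the URL, but I don't know how to fix it. The response is shown below: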

Response [http://www.realtor.ca/Map.aspx#CultureId=1&ApplicationId=1&RecordsPerPage=9&MaximumResults=9&PropertyTypeId=300&TransactionTypeId=2&SortOrder=A&SortBy=1&LongitudeMin=-114.52066040039104&LongitudeMax=-113.60536193847697&LatitudeMin=50.94776904194829&LatitudeMax=51.14246522072541&PriceMin=0&PriceMax=0&BedRange=0-0&BathRange=0-0&ParkingSpaceRange=0-0&viewState=m&Longitude=-114.063011169434&Latitude=51.0452194213867&ZoomLevel=11&CurrentPage=1] 
    Date: 2014-12-01 16:46 
    Status: 400 
    Content-type: text/html; charset=us-ascii 
    Size: 324 B 
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN""http://www.w3.org/TR/html4/strict.dtd"> 
<HTML><HEAD><TITLE>Bad Request</TITLE> 
<META HTTP-EQUIV="Content-Type" Content="text/html; charset=us-ascii"></HEAD> 
<BODY><h2>Bad Request - Invalid URL</h2> 
<hr><p>HTTP Error 400. The request URL is invalid.</p> 
</BODY></HTML>

Source

2014-12-01 Bahae Omid

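Try replacing the # in the URL with a ?. Everything after a # is a URL fragment rather than a query string, so the parameters only become a query once the ? is used.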

library("httr") 
url <- "http://www.realtor.ca/Map.aspx?CultureId=1&ApplicationId=1&RecordsPerPage=9&MaximumResults=9&PropertyTypeId=300&TransactionTypeId=2&SortOrder=A&SortBy=1&LongitudeMin=-114.52066040039104&LongitudeMax=-113.60536193847697&LatitudeMin=50.94776904194829&LatitudeMax=51.14246522072541&PriceMin=0&PriceMax=0&BedRange=0-0&BathRange=0-0&ParkingSpaceRange=0-0&viewState=m&Longitude=-114.063011169434&Latitude=51.0452194213867&ZoomLevel=11&CurrentPage=1" 
res <- GET(url) 
tt <- content(res)

Then parse the HTML content in tt.
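The substitution can also be done in code rather than by hand. The following is only a minimal sketch in base R: fragment_to_query is a hypothetical helper name (it is not part of httr), and the URL is a shortened version of the one in the question, used purely for illustration.

```r
# Hypothetical helper (not part of httr): turn the "#" that starts the
# fragment into a "?" so the key=value pairs become a query string.
fragment_to_query <- function(url) {
  # Leave the URL alone if it already has a query string or has no fragment.
  if (grepl("?", url, fixed = TRUE) || !grepl("#", url, fixed = TRUE)) {
    return(url)
  }
  # sub() replaces only the first match; fixed = TRUE treats "#" literally.
  sub("#", "?", url, fixed = TRUE)
}

# Shortened version of the URL from the question (illustration only).
searchlink <- "http://www.realtor.ca/Map.aspx#CultureId=1&ApplicationId=1&CurrentPage=1"
url <- fragment_to_query(searchlink)
print(url)
```

The resulting url can then be passed to GET() exactly as in the answer above.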

Source

2014-12-02 01:33:28 sckott

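Thanks Scott, it worked! But I still can't extract the HTML elements I need. I'm trying to parse the attributes of all the "Add to compare" buttons on that page. I tried 'xpathSApply(tt, "//img[@title='Add to compare']", function(x) {xmlGetAttr(x, 'class')})' but it returns NULL. I'm not sure where the problem is. Any ideas? – 2014-12-02 02:28:09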

If you run 'xpathSApply(tt, "//img[@title]", xmlGetAttr, name = "title")', I don't see any title attribute with the value 'Add to compare' – sckott 2014-12-02 03:33:24

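That's why it's strange, because if you inspect the element on the web page you see the following: ' Add to compare '. So for some reason, it seems that not all of the HTML is being scraped when the page is read. – 2014-12-02 05:33:15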
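One way to test the suspicion in the last comment is to check whether the string appears anywhere in the raw text the server returned. The sketch below uses base R only; has_marker is a hypothetical helper, and static_html is an invented stand-in for the response body, not the real page. If the check is FALSE on the real response, one possible explanation (an assumption, not confirmed in this thread) is that the buttons are added in the browser after the page loads, in which case they would not be present in what GET() receives.

```r
# Hypothetical helper: report whether a literal string occurs in raw HTML text.
has_marker <- function(html_text, marker) {
  # fixed = TRUE matches the marker literally instead of as a regex.
  grepl(marker, html_text, fixed = TRUE)
}

# Invented stand-in for the text returned by the server (illustration only).
static_html <- "<html><body><div id='results'></div></body></html>"
print(has_marker(static_html, "Add to compare"))

# With a real httr response object res, the raw text could be obtained with:
# raw_html <- httr::content(res, as = "text")
# has_marker(raw_html, "Add to compare")
```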
