JSoup: Java URL with spaces (double-encoding error)

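My first question is what encoding and decoding normally takes place during 'Document doc = Jsoup.connect(URL).get();'. For example, can I tell it to use UTF-8 or UTF-16? (I am using the latest JSoup library.)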

My second question concerns the following URL:

 String url = "http://www.chestertons.com/property-to-buy/search-results/properties-in-london-england-to-buy/b-t-llondon, england/?pagesize=60";

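If you scroll to the right, you will see a space before "england". I want to encode that space as UTF-8, because Jsoup does not accept spaces at all, but when I do, the Jsoup parser double-encodes the URL.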
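For illustration, here is a minimal sketch of encoding the raw URL exactly once using only the JDK. It assumes the URL is available as separate raw (unencoded) parts; the class name `EncodeOnce` and the helper `encodeOnce` are hypothetical names, not part of Jsoup. It relies on the documented behaviour of the multi-argument `java.net.URI` constructor, which percent-encodes illegal characters such as spaces and always encodes a literal `%`.

```java
import java.net.URI;
import java.net.URISyntaxException;

class EncodeOnce {
    // Builds an http URL from raw (unencoded) parts. The multi-argument URI
    // constructor percent-encodes illegal characters (such as spaces) once.
    // Hypothetical helper for illustration only.
    static String encodeOnce(String host, String rawPath, String rawQuery) {
        try {
            return new URI("http", host, rawPath, rawQuery, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build URL from parts", e);
        }
    }

    public static void main(String[] args) {
        String url = encodeOnce(
                "www.chestertons.com",
                "/property-to-buy/search-results/properties-in-london-england-to-buy/b-t-llondon, england/",
                "pagesize=60");
        // The space becomes %20; the comma is legal in a path and is left alone.
        System.out.println(url);
    }
}
```

Note that the same constructor turns an already-encoded `%20` into `%2520`, so the parts passed in must be raw. Whatever receives the resulting string afterwards must not encode it again.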

The complete code of my request is:

 Document doc = Jsoup.connect(URL)
         .userAgent("Chrome/41.0.2228.0 "
                 + "(Windows NT 6.1)"
                 + "AppleWebKit/537.36 (KHTML, like Gecko) Mozilla/5.0 "
                 + "Safari/537.36")
         .timeout(14000)
         .followRedirects(false)
         .ignoreContentType(true)
         .get();

And the error I get on the console is:

 SEVERE: IO exception from crawling 
    org.jsoup.HttpStatusException: HTTP error fetching URL. Status=404, URL=http://www.chestertons.com/property-to-buy/search-results/properties-in-london-england-to-buy/b-t-llondon%252C%2520england/?pagesize=60
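The `%25` sequences in the logged URL are the percent-encoding of a literal `%`, which suggests that an already-encoded string was encoded a second time. A small sketch to confirm this is to decode the offending path segment repeatedly and see how many passes it takes to get back to the raw text. The class and method names are hypothetical, and the `URLDecoder.decode(String, Charset)` overload assumes Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class DoubleEncodingCheck {
    // Percent-decodes the input the given number of times using UTF-8.
    // Caveat: URLDecoder also turns '+' into a space, which does not matter
    // for this segment because it contains no '+'.
    static String decodeTimes(String s, int times) {
        String out = s;
        for (int i = 0; i < times; i++) {
            out = URLDecoder.decode(out, StandardCharsets.UTF_8);
        }
        return out;
    }

    public static void main(String[] args) {
        String logged = "b-t-llondon%252C%2520england"; // segment from the error message
        System.out.println(decodeTimes(logged, 1)); // still percent-encoded
        System.out.println(decodeTimes(logged, 2)); // raw text with comma and space
    }
}
```

If two passes are needed to recover the raw segment, the string was encoded twice somewhere between the original `String url` and the request.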

Any help or insight into the problem would be greatly appreciated.

Source

2017-03-02 Kevster

I managed to find the right wording to search for, and judging by other posts the answer is not clear-cut ("eindeutig").

Here is my workaround:

 Document doc = Jsoup.parse(new URL(getUrl()).openStream(), "ISO-8859-1", getUrl());

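Now my only remaining question is how I can avoid redirects and (possibly) ignore the content type when going through parse instead. This is the only way I can see to solve the problem.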
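One possible direction, sketched here under assumptions rather than as a confirmed fix, is to open the stream through `HttpURLConnection` instead of `URL.openStream()`, since that class lets redirects be switched off per connection. The names `NoRedirectFetch` and `openWithoutRedirects` are hypothetical, the shortened user agent is a placeholder, and `encodedUrl` is assumed to be a valid URL that has already been encoded exactly once.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

class NoRedirectFetch {
    // Prepares (but does not yet open) an HTTP connection that will not
    // follow redirects. Hypothetical helper for illustration only.
    static HttpURLConnection openWithoutRedirects(String encodedUrl, int timeoutMillis) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(encodedUrl).toURL().openConnection();
            conn.setInstanceFollowRedirects(false); // a 3xx response is returned as-is
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            conn.setRequestProperty("User-Agent", "Mozilla/5.0"); // placeholder value
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection conn = openWithoutRedirects("http://example.com/a%20b/", 14000);
        // No network traffic happens until the response is actually requested.
        System.out.println(conn.getInstanceFollowRedirects());
    }
}
```

The stream from `conn.getInputStream()` could then be handed to `Jsoup.parse(conn.getInputStream(), "ISO-8859-1", encodedUrl)` as in the workaround. As far as I can tell, that overload simply parses the bytes it is given, so there would be no content-type check to ignore; checking `conn.getResponseCode()` first would show whether the server answered with a redirect.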

Source

2017-03-02 01:11:39 Kevster
