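Jsoup - parsing an HTML file with charset iso-8859-1
I'm having a problem with special characters and charset=iso-8859-1. The code I'm using here works fine with UTF-8, so I don't understand what I'm doing wrong.
Here is the code: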
File input = new File("https://stackoverflow.com/users/marcioapf/example.html");
Document doc = Jsoup.parse(input, "iso-8859-1", "");
Elements elements = doc.select("span.DEPUTADO") ;
System.out.println(elements.toString());
Here is the output:
<span style="margin-left: 8px; width: auto !important;" class="DEPUTADO">Joãozinho Pereira</span>
<span style="margin-left: 8px; width: auto !important;" class="DEPUTADO">Isnaldo Bulhões</span>
<span style="margin-left: 8px; width: auto !important;" class="DEPUTADO">Antonio Albuquerque</span>
<span style="margin-left: 8px; width: auto !important;" class="DEPUTADO">Jeferson Morais</span>
<span style="margin-left: 8px; width: auto !important;" class="DEPUTADO">Inácio Loiola</span>
This is what it should be:
<span style="margin-left: 8px; width: auto !important;" class="DEPUTADO">Joãozinho Pereira</span>
<span style="margin-left: 8px; width: auto !important;" class="DEPUTADO">Isnaldo Bulhões</span>
<span style="margin-left: 8px; width: auto !important;" class="DEPUTADO">Antonio Albuquerque</span>
<span style="margin-left: 8px; width: auto !important;" class="DEPUTADO">Jeferson Morais</span>
<span style="margin-left: 8px; width: auto !important;" class="DEPUTADO">Inácio Loiola</span>
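How can I fix this?
What happens if you first load the entire file into memory and then process it with the `Jsoup.parse(String)` method? Also, the output is technically correct.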
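A minimal sketch of the step suggested in the comment above, reading the whole file into memory as ISO-8859-1 before handing it to the parser. It uses only the JDK; the class name `Latin1Reader` and the helper `readLatin1` are hypothetical names chosen for illustration, and the temporary file stands in for the real HTML file. Whether this changes the printed output in the asker's setup is not confirmed here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Latin1Reader {

    // Hypothetical helper: read every byte of the file and decode it as ISO-8859-1.
    static String readLatin1(Path path) throws IOException {
        byte[] raw = Files.readAllBytes(path);
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the real file: a small ISO-8859-1 encoded snippet.
        Path tmp = Files.createTempFile("example", ".html");
        String sample = "<span class=\"DEPUTADO\">Jo\u00e3ozinho Pereira</span>";
        Files.write(tmp, sample.getBytes(StandardCharsets.ISO_8859_1));

        String html = readLatin1(tmp);

        // The decoded string keeps the accented character intact.
        System.out.println(html.contains("Jo\u00e3ozinho"));

        // With jsoup on the classpath, the string could then be parsed with
        // Jsoup.parse(html) instead of Jsoup.parse(File, String, String).
        Files.delete(tmp);
    }
}
```

Decoding explicitly with `StandardCharsets.ISO_8859_1` keeps the charset decision in one visible place, so any remaining difference in the printed result would come from how the document is serialized rather than from how the file was read.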