Parsing a UTF-8 encoded XML file

I have an XML file, retrieved from a URL, that contains some Arabic characters, so I have to treat it as UTF-8 for those characters to be handled correctly.

The XML file:

<Entry>
  <lstItems>
    <item>
      <id>1</id>
      <title>News Test 1</title>
      <subtitle>16/7/2012</subtitle>
      <img>joelle.mobi-mind.com/imgs/news1.jpg</img>
    </item>
    <item>
      <id>2</id>
      <title>كريم</title>
      <subtitle>16/7/2012</subtitle>
      <img>joelle.mobi-mind.com/imgs/news2.jpg</img>
    </item>
    <item>
      <id>3</id>
      <title>News Test 333</title>
      <subtitle>16/7/2012</subtitle>
      <img>joelle.mobi-mind.com/imgs/news3.jpg</img>
    </item>
    <item>
      <id>4</id>
      <title>ربيع</title>
      <subtitle>16/7/2012</subtitle>
      <img>joelle.mobi-mind.com/imgs/cont20.jpg</img>
    </item>
    <item>
      <id>5</id>
      <title>News Test 55555</title>
      <subtitle>16/7/2012</subtitle>
      <img>joelle.mobi-mind.com/imgs/cont21.jpg</img>
    </item>
    <item>
      <id>6</id>
      <title>News Test 666666</title>
      <subtitle>16/7/2012</subtitle>
      <img>joelle.mobi-mind.com/imgs/cont22.jpg</img>
    </item>
  </lstItems>
</Entry>

I retrieve the XML from the URL as a string, as shown below:

public String getXmlFromUrl(String url) {
    try {
        return new AsyncTask<String, Void, String>() {
            @Override
            protected String doInBackground(String... params) {
                //String xml = null;
                try {
                    DefaultHttpClient httpClient = new DefaultHttpClient();
                    HttpGet httpPost = new HttpGet(params[0]);
                    HttpResponse httpResponse = httpClient.execute(httpPost);
                    HttpEntity httpEntity = httpResponse.getEntity();
                    xml = new String(EntityUtils.toString(httpEntity).getBytes(),"UTF-8");
                } catch (Exception e) {
                    e.printStackTrace();
                }
                return xml;
            }
        }.execute(url).get();
    } catch (InterruptedException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (ExecutionException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    return xml;
}

The returned string is then passed to this method to obtain a Document for later use, as shown below:

This error message occurred:

09-18 07:51:40.441: E/Error:(1210): Unexpected token (position:TEXT ï»¿@1:4 in [email protected])

As a result, the code crashes with the following error:

09-18 07:51:40.451: E/AndroidRuntime(1210): java.lang.RuntimeException: Unable to start activity ComponentInfo{com.example.university1/com.example.university1.MainActivity}: java.lang.NullPointerException

Note that the code shown above works fine with ISO encoding.

Source

2012-09-18 Karim M. El Tel

Upvoted just because there were two downvotes without comments (and without any obvious reason; the question is reasonable). – bortzmeyer

I did the same. – Neta

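You have a BOM in your UTF-8 file, which is bad.

Maybe you edited the file with Notepad. In any case, check your editor to make sure it does not add a BOM.

Since the BOM seems to be inside the text rather than at the start, you will also need to remove it by using the delete key around its position (it is invisible in most editors). This can happen during a file concatenation operation.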
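If the file on the server cannot be fixed, the BOM can also be dropped in code before the string reaches the parser. The following is only a minimal sketch, not code from the question: the class BomStripper and its method stripLeadingBom are hypothetical names, and StandardCharsets assumes Java 7 or later. It removes a leading U+FEFF, and also the three characters "ï»¿" that the UTF-8 BOM bytes EF BB BF turn into when they are decoded as ISO-8859-1, which is the token shown in the parser error.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the question's code.
final class BomStripper {
    // U+FEFF is the byte order mark; UTF-8 encodes it as the bytes EF BB BF.
    private static final String BOM = "\uFEFF";

    // The same three bytes decoded as ISO-8859-1 yield the three-character
    // token that the parser error reports at position 1:4.
    private static final String MISDECODED_BOM =
            new String(new byte[] {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF},
                       StandardCharsets.ISO_8859_1);

    // Returns the input without a leading BOM (in either form), or unchanged.
    static String stripLeadingBom(String xml) {
        if (xml == null) {
            return null;
        }
        if (xml.startsWith(BOM)) {
            return xml.substring(BOM.length());
        }
        if (xml.startsWith(MISDECODED_BOM)) {
            return xml.substring(MISDECODED_BOM.length());
        }
        return xml;
    }

    public static void main(String[] args) {
        String withBom = "\uFEFF<Entry><lstItems/></Entry>";
        // The printed text starts directly with the root element.
        System.out.println(stripLeadingBom(withBom));
    }
}
```

Note that if the mis-decoded form is what shows up, the rest of the document was probably decoded with the wrong charset as well, so stripping the prefix alone would not restore the Arabic titles.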

Source

2012-09-18 08:21:38

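Sorry, but why is a BOM bad? Isn't it useful for detecting UTF-8 encoding? – RvdK

It was created to detect the byte order of other Unicode formats, but Microsoft extended it to UTF-8, where it has no meaning. It produces strange errors when, for example, you concatenate files. –

So what do you suggest I do, @dystroy? –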

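This may not be the problem, but EntityUtils.toString(httpEntity).getBytes() uses the platform default encoding. You should use the result of EntityUtils.toString(httpEntity) directly as a String; there is no need to convert it to bytes.

Also, read http://kunststube.net/encoding/ for useful background on what is going on.

Source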
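The effect of that round trip can be sketched without any HTTP code. The class and method names below (CharsetRoundTrip, reencode, decodeUtf8) are hypothetical, and the platform default charset is passed in explicitly so that the behaviour is reproducible; StandardCharsets assumes Java 7 or later. With Apache HttpClient 4.x, the equivalent of decodeUtf8 would likely be the EntityUtils.toString(HttpEntity, String) overload, which takes a default charset name.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only illustrates the charset round trip.
final class CharsetRoundTrip {
    // Mimics new String(s.getBytes(), "UTF-8"), with the platform default
    // charset supplied as a parameter instead of read from the JVM.
    static String reencode(String s, Charset platformDefault) {
        return new String(s.getBytes(platformDefault), StandardCharsets.UTF_8);
    }

    // Decodes the raw response bytes once, with the charset the file uses.
    static String decodeUtf8(byte[] body) {
        return new String(body, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The title of item 2 in the XML, written with Unicode escapes.
        String title = "\u0643\u0631\u064A\u0645";

        // When the default charset is UTF-8 the round trip changes nothing.
        System.out.println(reencode(title, StandardCharsets.UTF_8).equals(title));

        // When the default charset cannot represent Arabic, the letters are lost.
        System.out.println(reencode(title, StandardCharsets.ISO_8859_1).equals(title));

        // Decoding the original bytes as UTF-8 keeps the text intact.
        byte[] body = title.getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeUtf8(body).equals(title));
    }
}
```

The round trip is therefore harmless at best and destructive at worst, which is why decoding the bytes exactly once with an explicit charset is the safer choice.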

2012-09-18 08:24:35 artbristol

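Thanks for the help, but that is not the problem here. –

@Karim new String(EntityUtils.toString(httpEntity).getBytes(), "UTF-8"); will mangle your characters if your platform encoding is not UTF-8. – artbristol

The characters come out weird instead of the Arabic characters that are in the XML! –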
