2017-10-09

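Converting RSS feed XML to JSON using Java is displaying special characters

I created a Spring MVC RESTful controller that takes a hardcoded RSS HTTP URL, reads the XML, and converts it to JSON: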

RssFeedController:

import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

import org.apache.commons.io.IOUtils;
import org.apache.log4j.Logger;
import org.json.JSONObject;
import org.json.XML;
import org.springframework.http.HttpHeaders;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RestController;

import com.fasterxml.jackson.databind.ObjectMapper;

@RestController
public class RssFeedController {

    private HttpHeaders headers = null;

    public RssFeedController() {
        headers = new HttpHeaders();
        headers.add("Content-Type", "application/json");
    }

    @RequestMapping(value = "/v2/convertToJson", method = RequestMethod.GET, produces = "application/json")
    public String getRssFeedAsJson() throws IOException {
        // Fetch the raw feed bytes
        InputStream xml = getInputStreamForURLData("http://www.samplefeed.com/feed");
        // No charset is passed, so the bytes are decoded with the platform default charset
        String xmlString = IOUtils.toString(xml);
        // Convert the XML document into an org.json JSONObject
        JSONObject jsonObject = XML.toJSONObject(xmlString);
        // Re-serialize the JSON with Jackson
        ObjectMapper objectMapper = new ObjectMapper();
        Object json = objectMapper.readValue(jsonObject.toString(), Object.class);
        String response = objectMapper.writeValueAsString(json);
        return response;
    }

    public static InputStream getInputStreamForURLData(String targetUrl) {
        URL url = null;
        HttpURLConnection httpConnection = null;
        InputStream content = null;

        try {
            url = new URL(targetUrl);
            URLConnection conn = url.openConnection();
            conn.setRequestProperty("User-Agent", "Mozilla/5.0");
            httpConnection = (HttpURLConnection) conn;
            int responseCode = httpConnection.getResponseCode();
            content = httpConnection.getInputStream();
        }
        catch (MalformedURLException e) {
            e.printStackTrace();
        }
        catch (IOException e) {
            e.printStackTrace();
        }
        return content;
    }
}

pom.xml:

<dependency>
    <groupId>org.json</groupId>
    <artifactId>json</artifactId>
    <version>20170516</version>
</dependency>

<dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>2.5</version>
</dependency>
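The fetch helper in the question hands back raw bytes, and IOUtils.toString(xml) then decodes them with the JVM's default charset, ignoring whatever charset the server declares. A minimal sketch of reading the declared charset instead, assuming the server sends a charset parameter in its Content-Type header and that UTF-8 is an acceptable fallback when it does not (the encoding attribute of the XML declaration is not consulted here). ContentTypeCharset and charsetOf are hypothetical names, not part of any library.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class ContentTypeCharset {

    // Returns the charset named in a Content-Type header value such as
    // "application/rss+xml; charset=UTF-8". Falls back to the given charset
    // when the header is missing, has no charset parameter, or names a
    // charset this JVM does not support.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                // Strip optional quotes, e.g. charset="windows-1252"
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // IllegalCharsetNameException and UnsupportedCharsetException
                    // both extend IllegalArgumentException
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("application/rss+xml; charset=UTF-8", StandardCharsets.ISO_8859_1));
        System.out.println(charsetOf("text/xml", StandardCharsets.UTF_8));
    }
}
```

In the controller, the result could be passed to commons-io's IOUtils.toString(InputStream, Charset), for example IOUtils.toString(xml, ContentTypeCharset.charsetOf(conn.getContentType(), StandardCharsets.UTF_8)); that would require getInputStreamForURLData to expose the URLConnection as well as its stream.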

So, the original RSS feed has the following content:

<item> 
    <title>October Fest Weekend</title> 
    <link>http://www.samplefeed.com/feed/OctoberFestWeekend</link> 
    <comments>http://www.samplefeed.com/feed/OctoberFestWeekend/#comments</comments> 
    <pubDate>Wed, 04 Oct 2017 17:08:48 +0000</pubDate> 
    <dc:creator><![CDATA[John Doe]]></dc:creator> 
      <category><![CDATA[Uncategorized]]></category> 

    <guid isPermaLink="false">http://www.samplefeed.com/feed/?p=9227</guid> 
    <description><![CDATA[<p> 
</p> 
<p>Doors Open:6:30pm<br /> 
Show Begins: 7:30pm<br /> 
Show Ends (Estimated time): 11:00pm<br /> 
Location: Staples Center</p> 
<p>Directions</p> 
<p>Map of ...</p> 
<p>The post <a rel="nofollow" href="http://www.samplefeed.com/feed/OctoberFestWeekend/">OctoberFest Weekend</a> appeared first on <a rel="nofollow" href="http://www.samplefeed.com">SampleFeed</a>.</p> 
]]></description> 

which is rendered as JSON like this:

{ 
    "guid": { 
     "content": "http://www.samplefeed.com/feed/?p=9227", 
     "isPermaLink": false 
    }, 
    "pubDate": "Wed, 04 Oct 2017 17:08:48 +0000", 
    "category": "Uncategorized", 
    "title": "October Fest Weekend", 
    "description": "<p>\n??</p>\n<p>Doors Open:6:30pm<br />\nShow Begins:?? 7:30pm<br />\nShow Ends (Estimated time):??11:00pm<br />\nLocation: Staples Center</p>\n<p>Directions</p>\n<p>Map of ...</p>\n<p>The post <a rel=\"nofollow\" href=\"http://www.samplefeed.com/feed/OctoberFestWeekend/\">OctoberFest Weekend</a> appeared first on <a rel=\"nofollow\" href=\"http://www.samplefeed.com\">Sample Feed</a>.</p>\n", 
    "dc:creator": "John Doe", 
    "link": "http://www.samplefeed.com/feed/OctoberFestWeekend", 
    "comments": "http://www.samplefeed.com/feed/OctoberFestWeekend/#comments" 
} 

Notice that in the rendered JSON there are two question marks ("??") inside the value of the "description" key, like this:

"description": "<p>\n??</p>\n 

There are also two question marks here, after "Show Begins:":

<br />\nShow Begins:?? 

And also before 11:00pm:

Show Ends (Estimated time):??11:00pm<br /> 

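This isn't the only pattern of special characters being displayed: in some places three ? marks are generated, and in others there are runs like ?????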

For example,

<title>Today’s 20th Annual Karaoke</title> 

is rendered in the JSON like this:

"title": "Today???s 20th Annual Karaoke" 

Likewise,

<content:encoded><![CDATA[(Monte Vista High School, NY.). </span></p>]]></content:encoded>

is rendered in the JSON like this:

"content:encoded": "(Monte Vista High School, NY.).????</span></p>"

There are also places in the XML with a dash ("–"), like this:

<strong>Welcome</strong> – Welcome to the Party! 

which gets rendered in the JSON as:

<strong>Welcome</strong>????? Welcome to the Party! 

Does anyone know how to set the correct encoding in my code so that I can avoid these bad/special-character rendering problems?
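One plausible explanation for the exact counts of "?" (a hypothesis, not confirmed anywhere in the post): the feed is UTF-8, but its bytes are decoded with an ASCII default charset, so every byte of a multi-byte character becomes U+FFFD, and each of those is later written out as a single "?". The sketch below reproduces that chain. It assumes the raw feed contains a no-break space (U+00A0) wherever the JSON shows "??", which the XML excerpt in the question would not reveal. MojibakeDemo and simulate are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {

    // Hypothesis being illustrated (not confirmed by the post): UTF-8 feed
    // bytes are decoded with the wrong charset, then written out in a charset
    // that cannot represent the result.
    static String simulate(String feedText, Charset wrongDecoder) {
        byte[] utf8Bytes = feedText.getBytes(StandardCharsets.UTF_8);
        // With US-ASCII, every byte >= 0x80 is malformed and becomes U+FFFD
        String decoded = new String(utf8Bytes, wrongDecoder);
        // Encoding to US-ASCII replaces each unmappable character with '?'
        byte[] written = decoded.getBytes(StandardCharsets.US_ASCII);
        return new String(written, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // U+2019 RIGHT SINGLE QUOTATION MARK is 3 bytes in UTF-8
        // prints: Today???s 20th Annual Karaoke
        System.out.println(simulate("Today\u2019s 20th Annual Karaoke", StandardCharsets.US_ASCII));
        // U+00A0 NO-BREAK SPACE is 2 bytes in UTF-8
        // prints: Show Begins:?? 7:30pm
        System.out.println(simulate("Show Begins:\u00A0 7:30pm", StandardCharsets.US_ASCII));
        // U+00A0 followed by U+2013 EN DASH is 2 + 3 = 5 bytes
        // prints: </strong>????? Welcome
        System.out.println(simulate("</strong>\u00A0\u2013 Welcome", StandardCharsets.US_ASCII));
    }
}
```

If this hypothesis holds, the question-mark counts are simply UTF-8 byte counts, which points at the decoding step rather than at org.json or Jackson.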

Answers

Got rid of the unknown characters by doing this:

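import static java.nio.charset.StandardCharsets.ISO_8859_1;
import static java.nio.charset.StandardCharsets.UTF_8;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

import org.json.JSONObject;
import org.json.XML;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class RssFeedController {

    @RequestMapping(value = "/v2/convertToJson", method = RequestMethod.GET, produces = "application/json;charset=UTF-8")
    public String getRssFeedAsJson() throws IOException, IllegalArgumentException {
        String xmlString = readUrlToString("http://www.sample.com/feed");
        JSONObject xmlJSONObj = XML.toJSONObject(xmlString);
        // readUrlToString has already removed every non-ASCII character,
        // so this ISO-8859-1 to UTF-8 round trip leaves the string unchanged
        byte[] ptext = xmlJSONObj.toString().getBytes(ISO_8859_1);
        String jsonResponse = new String(ptext, UTF_8);
        return jsonResponse;
    }

    public static String readUrlToString(String url) {
        BufferedReader reader = null;
        String result = null;
        String retValue = null;
        try {
            URL u = new URL(url);
            HttpURLConnection conn = (HttpURLConnection) u.openConnection();
            conn.setRequestProperty("User-Agent", "Mozilla/5.0");
            conn.setRequestMethod("GET");
            conn.setReadTimeout(2 * 1000); // 2-second read timeout
            conn.connect();
            // No charset given: bytes are decoded with the platform default charset
            reader = new BufferedReader(new InputStreamReader(conn.getInputStream()));
            StringBuilder builder = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                builder.append(line).append("\n");
            }
            result = builder.toString();
            // Remove every character outside 0x00-0x7F; this also drops
            // legitimate characters such as ’ and –
            retValue = result.replaceAll("[^\\x00-\\x7F]", "");
        }
        catch (IOException e) {
            e.printStackTrace();
        }
        finally {
            if (reader != null) {
                try {
                    reader.close();
                }
                catch (IOException ignoreOnClose) {
                }
            }
        }
        return retValue;
    }
}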
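Stripping every character outside the range 0x00-0x7F hides the symptom but also deletes legitimate characters such as ’ and –, so "Today’s" becomes "Todays". A different technique, named plainly here: decode the response bytes explicitly as UTF-8 instead of relying on the platform default. This is a minimal sketch assuming the feed really is UTF-8; in the original controller the commons-io equivalent would be IOUtils.toString(xml, StandardCharsets.UTF_8). DecodeVsStrip, readUtf8 and stripNonAscii are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class DecodeVsStrip {

    // Reads the whole stream, decoding the bytes explicitly as UTF-8
    // instead of with the JVM's default charset.
    static String readUtf8(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    // The approach from the answer: delete every character outside 0x00-0x7F.
    static String stripNonAscii(String s) {
        return s.replaceAll("[^\\x00-\\x7F]", "");
    }

    public static void main(String[] args) throws IOException {
        byte[] feed = "<title>Today\u2019s 20th Annual Karaoke</title>".getBytes(StandardCharsets.UTF_8);
        String decoded = readUtf8(new ByteArrayInputStream(feed));
        // prints: true (the apostrophe survived decoding)
        System.out.println(decoded.contains("\u2019"));
        // prints: <title>Todays 20th Annual Karaoke</title>
        System.out.println(stripNonAscii(decoded));
    }
}
```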

Frustrating that no one other than SamDev tried to help...

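After reviewing your code line by line I found the solution, so I have updated my answer to your question about special characters when converting RSS feed XML to JSON: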

Change this line of code:

@RequestMapping(value = "/v2/convertToJson", method = RequestMethod.GET, produces = "application/json")

to:

@RequestMapping(value = "/v2/convertToJson", method = RequestMethod.GET, produces = "application/json;charset=UTF-8")

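You need to specify the UTF-8 charset in the produces parameter that generates the JSON. Sorry for the misunderstanding in my earlier answer; I have updated it now.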
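Why the charset in produces can matter on the output side: assuming the String return value is written by Spring's StringHttpMessageConverter, which at the time fell back to ISO-8859-1 when the content type carried no charset, any character that ISO-8859-1 cannot represent is replaced by "?". The sketch below imitates encoding a body with a given charset and decoding it with the same charset; ResponseCharsetDemo and asSeenByClient are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ResponseCharsetDemo {

    // Encodes a response body with the given charset and decodes it with the
    // same charset, as a client honouring the declared charset would.
    static String asSeenByClient(String json, Charset bodyCharset) {
        byte[] body = json.getBytes(bodyCharset);
        return new String(body, bodyCharset);
    }

    public static void main(String[] args) {
        String json = "{\"title\":\"Today\u2019s 20th Annual Karaoke\"}";
        // ISO-8859-1 has no U+2019, so the encoder substitutes '?'
        // prints: {"title":"Today?s 20th Annual Karaoke"}
        System.out.println(asSeenByClient(json, StandardCharsets.ISO_8859_1));
        // UTF-8 can encode every Unicode character; prints: true
        System.out.println(asSeenByClient(json, StandardCharsets.UTF_8).equals(json));
    }
}
```

Note that this path turns one unrepresentable character into a single "?", and a no-break space (U+00A0) is representable in ISO-8859-1, so the output charset on its own would not explain runs such as "???" for one apostrophe; the input decoding is worth checking as well.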

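I'm confused... I'm not using emojis. How would I use one of these libraries in my code? Could you give me an example? And what other encoding or charset would prevent this? Thanks for your response...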

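Unicode isn't only about emojis. Whenever the system can't render something in the stream, whatever it is, it shows a "?". If you want to check what is behind a "?", use one of these Java libraries and parse it at runtime. – 2017-10-10 00:13:44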

Could you please show me some code that uses one of them?
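The thread does not name the libraries the comment refers to, so this sketch swaps in the JDK's own Character.getName(int) (Java 7+) to report each non-ASCII code point in a string together with its Unicode name. It only helps while the original characters are still present, for example right after the feed has been decoded; once a character has been turned into "?", that information is gone. CodePointInspector and describeNonAscii are hypothetical names.

```java
import java.util.Locale;

class CodePointInspector {

    // Lists every non-ASCII code point in s with its char index and Unicode name.
    static String describeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (cp > 0x7F) {
                String name = Character.getName(cp); // null for unassigned code points
                sb.append(String.format(Locale.ROOT, "index %d: U+%04X %s%n", i, cp, name));
            }
            // Advance by one or two chars (supplementary characters use two)
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describeNonAscii("Show Begins:\u00A0 7:30pm \u2013 Today\u2019s"));
    }
}
```

Running it on a freshly decoded feed string would show, for instance, whether the gaps around "Show Begins:" are no-break spaces (U+00A0) and whether "Today’s" contains U+2019 rather than a plain apostrophe.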