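RSS feed description returns "<"
I want to parse RSS data from this feed: http://fulltextrssfeed.com/feeds.bbci.co.uk/news/rss.xml, which is generated on the fly by FullTextRssFeed. The only problem is that when I try to get the description, I receive '<'; everything else works fine! I have tried using JSoup for this, but I don't know how. Can you suggest how to do it? The code I'm using is the same as in this tutorial, except that I have replaced the RSS URL. Thanks again!
Answers
While looking online for ideas on how to do this, I found that doing it is actually illegal, because obtaining content this way violates the terms of use of many of the web resources I wanted to use. For now you will have to stick with the short RSS feeds.
Your problem is that the description in your RSS feed contains HTML rather than plain text. Here is the description content: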
<div><span class="story-date"><span class="date">3 April 2013</span> <span class="time-text">Last updated at</span> <span class="time">23:25 ET</span></span> <p><img src="http://news.bbcimg.co.uk/media/images/66739000/jpg/_66739180_philpotts.jpg" width="464" height="261" alt="Mick and Mairead Philpott, Paul Mosley"/><span class="c2">Mick and Mairead Philpott, and Paul Mosley, will be sentenced on Thursday</span></p> <p class="introduction" id="story_continues_1">A couple convicted of killing six of their children in a house fire in Derby are due to be sentenced later.</p> <p>Mick and Mairead Philpott will reappear at Nottingham Crown Court where they were found guilty of six counts of manslaughter, along with their friend Paul Mosley, on Tuesday.</p> <p>The maximum sentence for the crime is life imprisonment.</p> <p>Mrs Justice Thirlwall was due to pass sentence on Wednesday but needed more time to consider mitigation.</p> <p>The court was told that Philpott, 56, was jailed for seven years in 1978 for attempting to murder a previous girlfriend and given a concurrent five-year sentence for stabbing the woman's mother.</p> <p>In 1991 he received a conditional discharge for assault after he head-butted a colleague</p> <p>And in 2010 he was given a police caution after slapping Mairead and dragging her outside by her hair.</p> <p>When Philpott set fire to his house in Victory Road, Derby, he was also facing trial over a road rage incident in which he punched a motorist in the face.</p> <p>He had admitted common assault in relation to the incident but denied dangerous driving.</p> <span class="cross-head">Rape allegation</span> <p>Police have also confirmed that they intend to "thoroughly" investigate an allegation that Philpott raped a woman several years ago.</p> <p>She made the allegation after the death of Philpott's children, but police decided to wait until the end of the manslaughter trial before investigating the complaint further.</p> <p>On Tuesday the jury 
returned unanimous manslaughter verdicts on Philpott and Mosley, 46, while Mairead Philpott, 32, was convicted by a majority.</p> <p>Jade Philpott, 10, John, nine, Jack, eight, Jesse, six, and Jayden, five, died on the morning of the fire on 11 May 2012.</p> <p>Mairead Philpott's son from a previous relationship, 13-year-old Duwayne, died later in hospital.</p> </div><img src="http://pixel.quantserve.com/pixel/p-89EKCgBk8MZdE.gif" border="0" height="1" width="1" />
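You need to change the parser in some way so that it does not trip over the HTML content inside the description. Once you have the complete HTML snippet, you can render it in a WebView. I think CDATA is normally used when another kind of markup (HTML in this case) is embedded in XML data such as an RSS feed. Honestly, though, I'm not familiar with the ins and outs of it, so I may be wrong.
You are right about the [CDATA](http://www.w3schools.com/xml/xml_cdata.asp) part. – 2013-04-12 06:44:51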
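One common reason for ending up with only '<' is a SAX handler that keeps a single chunk from `characters()`, which the parser may call several times for one element. I can't confirm that this is what the tutorial code does, so treat the following as a minimal sketch of the general fix: append every chunk to a buffer. The class name, method name, and sample feed are made up for illustration. It uses the plain JDK SAX parser and compares `qName`; depending on the namespace settings of your parser you may need to compare `localName` instead.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the tutorial code.
final class DescriptionSaxSketch {

    /** Collects the text of the first <description> found inside an <item>. */
    static final class Handler extends DefaultHandler {
        private final StringBuilder buffer = new StringBuilder();
        private boolean inItem;
        private boolean inDescription;
        String description; // stays null until an item description has been read

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("item".equals(qName)) {
                inItem = true;
            } else if (inItem && "description".equals(qName) && description == null) {
                inDescription = true;
                buffer.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // The parser may deliver one element's text in several chunks,
            // so append each chunk instead of overwriting the previous one.
            if (inDescription) {
                buffer.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (inDescription && "description".equals(qName)) {
                description = buffer.toString();
                inDescription = false;
            } else if ("item".equals(qName)) {
                inItem = false;
            }
        }
    }

    /** Returns the HTML snippet stored in the first item's description. */
    static String firstItemDescription(String rssXml) {
        try {
            Handler handler = new Handler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(rssXml)), handler);
            return handler.description;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse feed", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample feed with entity-escaped HTML in the item description.
        String feed = "<rss><channel><description>Feed</description><item><description>"
                + "&lt;div&gt;&lt;p&gt;Hello&lt;/p&gt;&lt;/div&gt;"
                + "</description></item></channel></rss>";
        System.out.println(firstItemDescription(feed));
    }
}
```

The returned string is the HTML snippet itself, which can then be handed to a WebView or to Jsoup.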
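To illustrate the CDATA point: a CDATA section and entity-escaped markup are two ways of writing the same character data, so a conforming XML parser should hand back the same string for both. The sketch below uses the JDK's DOM parser; the class name and the sample XML are invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only.
final class CdataVersusEscapedSketch {

    /** Returns the text of the first <description> element in the given XML. */
    static String descriptionText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // getTextContent() concatenates text and CDATA children.
            return doc.getElementsByTagName("description").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String withCdata = "<item><description><![CDATA[<p>Hi <b>there</b></p>]]></description></item>";
        String escaped = "<item><description>&lt;p&gt;Hi &lt;b&gt;there&lt;/b&gt;&lt;/p&gt;</description></item>";
        System.out.println(descriptionText(withCdata));
        System.out.println(descriptionText(escaped));
    }
}
```

Both calls return the same HTML string, so whether the feed uses CDATA or escaping should not matter once the parser collects the whole text.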
The HTML you get from myRssFeed.getDescription() looks like this:
<div><span class="story-date"><span class="date">6 April 2013</span> <span class="time-text">Last updated at</span> <span class="time">08:57 ET</span></span> <p><img src="http://news.bbcimg.co.uk/media/images/51606000/jpg/_51606573_fa1d16c0-9c6c-4f82-b0b8-ab66ddd94f78.jpg" width="304" height="171" alt="Breaking news"/></p> <p class="introduction">Nelson Mandela has been discharged from hospital after treatment for pneumonia, South Africa's government has said.</p> <p>It said there had been "a sustained and gradual improvement in his condition".</p> <p>The 94-year-old was admitted on 27 March for a recurring lung infection and had fluid drained at the undisclosed hospital.</p> <p>Mr Mandela served as South Africa's first black president from 1994 to 1999 and is regarded by many as the father of the nation.</p> <p>The <a href="http://redirect.viglink.com?key=11fe087258b6fc0532a5ccfc924805c0&u=http%3A%2F%2Fwww.thepresidency.gov.za%2Fpebble.asp%3Frelid%3D15178">presidency statement read</a>: "Former President Nelson Mandela has been discharged from hospital today, 6 April, following a sustained and gradual improvement in his general condition.</p> <p>"The former president will now receive home-based high care. President [Jacob] Zuma thanks the hard working medical team and hospital staff for looking after Madiba so efficiently."</p> <p>Madiba is Mr Mandela's clan name.</p> <p>The statement continued: "[Mr Zuma] also extended his gratitude to all South Africans and friends of the Republic in Africa and around the world for support."</p> </div><img src="http://pixel.quantserve.com/pixel/p-89EKCgBk8MZdE.gif" border="0" height="1" width="1" />
With Jsoup you could try this (untested):
Instead of
feedDescribtion.setText(myRssFeed.getDescription());
use this:
feedDescribtion.setText(extractDescriptionText(myRssFeed.getDescription()));
with the following method:
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    private String extractDescriptionText(String description) {
        StringBuilder b = new StringBuilder();
        Document dom = Jsoup.parse(description);
        Elements paragraphs = dom.getElementsByTag("p");
        // Start at 1 to skip the first paragraph, which only holds the 'breaking news' image.
        for (int i = 1; i < paragraphs.size(); i++) {
            Element p = paragraphs.get(i);
            b.append(p.text());
            b.append("\n"); // line break after each paragraph
        }
        return b.toString();
    }
This should work. Some fine-tuning may be necessary, but that is easy to do with Jsoup.
Edit:
This is what extractDescriptionText() returns for the example above:
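Nelson Mandela has been discharged from hospital after treatment for pneumonia, South Africa's government has said.
It said there had been "a sustained and gradual improvement in his condition".
The 94-year-old was admitted on 27 March for a recurring lung infection and had fluid drained at the undisclosed hospital.
Mr Mandela served as South Africa's first black president from 1994 to 1999 and is regarded by many as the father of the nation.
The presidency statement read: "Former President Nelson Mandela has been discharged from hospital today, 6 April, following a sustained and gradual improvement in his general condition.
"The former president will now receive home-based high care. President [Jacob] Zuma thanks the hard working medical team and hospital staff for looking after Madiba so efficiently."
Madiba is Mr Mandela's clan name.
The statement continued: "[Mr Zuma] also extended his gratitude to all South Africans and friends of the Republic in Africa and around the world for support."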
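Have you tried this? I'm away from my machine on holiday, so I can't try it myself. – AndroidDev 2013-04-06 19:22:47
No, it hasn't been tested, but I have worked with Jsoup before and I'm fairly sure it will work. As mentioned, some fine-tuning may be needed; for example, there is an embedded link in the example above, and I'm not sure how the Element#text() method handles it. – Ridcully 2013-04-06 19:37:08
OK, it will be about 5 or 6 days until I can test this code. I hope to be able to accept and reward this answer then. Thanks! – AndroidDev 2013-04-06 21:10:23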
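I would post this as a comment, but I don't have enough reputation.
I would suggest using Yahoo Pipes to redirect your RSS feed. You can even have it output JSON instead of XML.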
If your parser already works on most sites, this would be the easiest way to solve your problem.
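"The code I'm using is the same as in this tutorial." That was mentioned in the second half of my question. – AndroidDev 2013-04-04 16:02:31
My mistake, I thought you said you *were using* jsoup, not that you had *tried* using jsoup. Anyway, if you point the URL at a different RSS feed instead of yours, does it work properly? – FoamyGuy 2013-04-04 16:04:11
Try this [link](http://www.ibm.com/developerworks/opensource/library/x-android/). I used this example to fetch RSS feeds and it works fine. – 2013-04-04 16:06:18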