Extracting text from <p></p> with BeautifulSoup
I want to get the news article from this link. My code is:
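import requests
from bs4 import BeautifulSoup

def get_news_details(news_url):
    # Download the page and parse it with Python's built-in HTML parser.
    source = requests.get(news_url)
    plain_text = source.text
    soup = BeautifulSoup(plain_text, "html.parser")
    # The article body sits inside <div class="big-img-box">.
    content = soup.find_all('div', {'class': 'big-img-box'})
    print(content[0].find_all('p'))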
The result is:
[<p></p>, <p></p>, <p></p>, <p></p>, <p></p>, <p></p>]
and the value of content is:
<div class="big-img-box">
<div class="left-imgs">
<figure>
<img alt="iOS developer hints possibility of 4K Apple TV" class="img-responsive" src="http://www.aninews.in/contentimages/detail/appletv.jpg"/>
<figcaption><span class="heading-inner-span"></span></figcaption>
</figure>
<div class="mb10"></div>
</div>
<p></p> New York [USA], August 6 <a class="highlights" href="http://aninews.in/" target="_blank">(ANI)</a>: The latest designs from Apple's HomePod firmware revealed that the tech giant is hinting the launch of a <span class="highlights"><a href="http://aninews.in/keysearch/keyword-search/4k-apple-tv.html"> 4K Apple TV</a></span> with high dynamic range (HDR) support for both <span class="highlights"><a href="http://aninews.in/keysearch/keyword-search/hdr10.html"> HDR10 </a></span> and <span class="highlights"><a href="http://aninews.in/keysearch/keyword-search/dolby-vision.html"> Dolby Vision</a></span>.<p></p> While the current range of Apple's TV set-top box is incompatible to 4K technology, <span class="highlights"><a href="http://aninews.in/keysearch/keyword-search/ios.html">iOS</a></span> developer <span class="highlights"><a href="http://aninews.in/keysearch/keyword-search/guilherme-rambo.html"> Guilherme Rambo</a></span> revealed that the company is hinting an adoption of the ultra high-definition format, reports <span class="highlights"><a href="http://aninews.in/keysearch/keyword-search/the-verge.html">The Verge</a></span>.<p></p> Reports of the new range of Apple TV have surfaced time and again over the past few months, starting February this year.<p></p> It is said that implementing the HDR and 4K content will prove to b beneficial for the company, rather than a simpler resolution, since popular online movie and television platforms like <span class="highlights"><a href="http://aninews.in/keysearch/keyword-search/netflix.html"> Netflix</a></span> and <span class="highlights"><a href="http://aninews.in/keysearch/keyword-search/amazon.html"> Amazon</a></span> support the two high-definition formats.<p></p> Last month, iTunes started listing movies as supporting 4K and <span class="highlights"><a href="http://aninews.in/keysearch/keyword-search/hdr.html"> HDR</a></span> in users' purchase histories, thus providing more thrust to the speculations of the 4K 
<span class="highlights"><a href="http://aninews.in/keysearch/keyword-search/apple.html"> Apple</a></span> TV. <a class="highlights" href="http://aninews.in/" target="_blank">(ANI)</a><p></p>
</div>
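I can use content[0].text, but that gives me a somewhat clumsy version of the article that I cannot format.
When I inspect the page in Chrome, the article text appears to be written inside the tags, as <p>article_text</p>. In content, however, it shows up after them, as <p></p>article_text. If the former version appeared in soup, I could get the output I want. What should I do?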
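This works for me (I meant 'tidy', thanks for the clarification). But I would like to know why Chrome's page inspection ('<p>text</p>') and BeautifulSoup's version ('<p></p>text') differ. – Aroonalok
I'm not sure. However, I would say that when browser software or BeautifulSoup encounters a page that has not been coded to conform to the standards, it has to do something with that code in order to display it. Chrome's designers may have gone in one direction when they faced this problem, and BeautifulSoup's in another. In this case the results are slightly different. –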
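@BillBell Hey Bill, I just wanted to acknowledge your great support for this StackOverflow tag and for the community. Thank you, you are a good person. All the best; I just wanted you to know how much we appreciate your help. –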
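One possible way to handle the <p></p>article_text layout from the question, sketched here under assumptions rather than taken from the thread: treat each empty <p> as a paragraph marker and collect the sibling nodes that follow it until the next <p>. The helper name paragraphs_after_empty_p and the trimmed SAMPLE_HTML are hypothetical, made up for illustration; the real page may differ.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Trimmed, hypothetical stand-in for the markup shown in the question.
SAMPLE_HTML = """
<div class="big-img-box">
<div class="left-imgs"><figure><img alt="x" src="x.jpg"/></figure></div>
<p></p> New York [USA], August 6 <a class="highlights" href="http://aninews.in/">(ANI)</a>: First paragraph with a <span class="highlights"><a href="#"> 4K Apple TV</a></span>.<p></p> Second paragraph.<p></p>
</div>
"""


def paragraphs_after_empty_p(container):
    """Group the nodes that follow each <p> marker into one paragraph string."""
    paragraphs = []
    for p in container.find_all("p"):
        parts = []
        for node in p.next_siblings:
            if isinstance(node, Tag) and node.name == "p":
                break  # the next <p> marker starts a new paragraph
            # Tags contribute their inner text; bare strings are used as-is.
            parts.append(node.get_text() if isinstance(node, Tag) else str(node))
        text = " ".join("".join(parts).split())  # collapse runs of whitespace
        if text:
            paragraphs.append(text)
    return paragraphs


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
box = soup.find("div", {"class": "big-img-box"})
for paragraph in paragraphs_after_empty_p(box):
    print(paragraph)
```

In get_news_details, the same helper could be called on content[0] instead of printing content[0].find_all('p'). It relies on the text being a sibling of the empty <p> tags, as in the content dump above; a different parser may build a different tree, in which case this walk would need adjusting.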