I can't scrape anything with BeautifulSoup

I'm using BeautifulSoup to scrape some web page content, but I can't get anything out of it.

I'm following the code from an example, but I always get None back.

Code:

import urllib2 
from BeautifulSoup import BeautifulSoup 

soup = BeautifulSoup(urllib2.urlopen('http://www.velocidadcuchara.com/2011/08/helado-platano-light.html').read()) 

post = soup.find('div', attrs={'id': 'topmenucontainer'}) 

print post

Any idea what I'm doing wrong?

Thanks!


2011-08-18 Mateo

I don't think you're doing anything wrong.

It's the second script tag that confuses BeautifulSoup. The tag looks like this:

<script type='text/javascript'> 
<!--//--><![CDATA[//><!-- 
var arVersion = navigator.appVersion.split("MSIE") 
var version = parseFloat(arVersion[1]) 

function fixPNG(myImage) 
{ 
    if ((version >= 5.5) && (version < 7) && (document.body.filters)) 
    { 
     var imgID = (myImage.id) ? "id='" + myImage.id + "' " : "" 
     var imgClass = (myImage.className) ? "class='" + myImage.className + "' " : "" 
     var imgTitle = (myImage.title) ? 
        "title='" + myImage.title + "' " : "title='" + myImage.alt + "' " 
     var imgStyle = "display:inline-block;" + myImage.style.cssText 
     var strNewHTML = "<span " + imgID + imgClass + imgTitle 
        + " style=\"" + "width:" + myImage.width 
        + "px; height:" + myImage.height 
        + "px;" + imgStyle + ";" 
        + "filter:progid:DXImageTransform.Microsoft.AlphaImageLoader" 
        + "(src=\'" + myImage.src + "\', sizingMethod='scale');\"></span>" 
     myImage.outerHTML = strNewHTML  
    } 
} 
//--><!]]> 
</script>

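But BeautifulSoup seems to think it is still inside a comment or something, and treats the rest of the document as part of the script tag's contents.

Try: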

print str(soup.findAll('script')[1])[:2000]

and you'll see what I mean.

If you strip out the CDATA markers, the page should parse correctly:

soup = BeautifulSoup(
    urllib2.urlopen('http://www.velocidadcuchara.com/2011/08/helado-platano-light.html') 
    .read() 
    .replace('<![CDATA[', '').replace('<!]]>', ''))
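For readers on Python 3, here is a minimal stdlib-only sketch of the same pre-processing step, run on a small hypothetical sample document instead of the live page. The names `SAMPLE`, `strip_cdata_markers` and `DivFinder` are illustrative, not from the answer, and the sketch uses the standard library's `html.parser` rather than BeautifulSoup, so it demonstrates the string cleanup and the lookup of `div#topmenucontainer`, not BeautifulSoup's own behavior.

```python
from html.parser import HTMLParser


def strip_cdata_markers(html: str) -> str:
    """Remove the CDATA open/close markers, mirroring the replace() calls above."""
    return html.replace('<![CDATA[', '').replace('<!]]>', '')


class DivFinder(HTMLParser):
    """Records whether a <div> with the given id attribute appears in the input."""

    def __init__(self, target_id: str) -> None:
        super().__init__()
        self.target_id = target_id
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; convert it to look up "id"
        if tag == 'div' and dict(attrs).get('id') == self.target_id:
            self.found = True


# Hypothetical sample that reproduces the script-tag pattern from the page
SAMPLE = """<html><head>
<script type='text/javascript'>
<!--//--><![CDATA[//><!--
var x = 1
//--><!]]>
</script>
</head><body><div id="topmenucontainer">menu</div></body></html>"""

cleaned = strip_cdata_markers(SAMPLE)
finder = DivFinder('topmenucontainer')
finder.feed(cleaned)
finder.close()
print(finder.found)
```

Python 3's `html.parser` reads `<script>` contents as raw text up to `</script>`, so it would probably find the div even without the cleanup; the replace calls matter for a parser that mishandles the CDATA markers, as the answer describes for BeautifulSoup.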


2011-08-18 12:17:34 Duncan

Your HTML is somewhat odd. BeautifulSoup does its best, but sometimes it just can't parse it.

Try moving the first <link> element into the <head>; that might help.


2011-08-18 11:31:29

You could try using the lxml library instead (see the lxml article):

from lxml.html import parse 
doc = parse('http://java.sun.com').getroot() 
post = doc.cssselect('div#topmenucontainer')
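Here is a minimal offline sketch of the lxml approach, assuming lxml is installed. It parses a hypothetical sample string with `lxml.html.fromstring` instead of fetching a URL, and uses an XPath expression equivalent to the CSS selector `div#topmenucontainer`, because `cssselect()` needs the separate cssselect package in current lxml releases. `SAMPLE` and `matches` are illustrative names.

```python
import lxml.html

# Hypothetical sample with the same CDATA-in-script pattern as the real page
SAMPLE = """<html><head>
<script type='text/javascript'>
<!--//--><![CDATA[//><!--
var x = 1
//--><!]]>
</script>
</head><body><div id="topmenucontainer">menu</div></body></html>"""

doc = lxml.html.fromstring(SAMPLE)

# XPath equivalent of doc.cssselect('div#topmenucontainer')
matches = doc.xpath('//div[@id="topmenucontainer"]')
for div in matches:
    print(div.text_content())
```

lxml's HTML parser (libxml2) stops reading script contents at the closing `</script>` tag, which is why it is suggested here as a more tolerant alternative.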


2011-12-21 15:20:10 jordiburgos
