BeautifulSoup (bs4) parsing wrong
Parsing this sample document with bs4, from Python 2.7.6:
<html>
<body>
<p>HTML allows omitting P end-tags.
<p>Like that and this.
<p>And this, too.
<p>What happened?</p>
<p>And can we <p>nest a paragraph, too?</p></p>
</body>
</html>
using:
from bs4 import BeautifulSoup as BS
...
tree = BS(fh)
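HTML has, for ages, allowed omitted end-tags for various element types, including P (check the schema, or a parser). However, bs4's prettify() on this document shows that it doesn't end any of those paragraphs until it sees </body>: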
<html>
 <body>
  <p>
   HTML allows omitting P end-tags.
   <p>
    Like that and this.
    <p>
     And this, too.
     <p>
      What happened?
     </p>
     <p>
      And can we
      <p>
       nest a paragraph, too?
      </p>
     </p>
    </p>
   </p>
  </p>
 </body>
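For anyone who wants to reproduce this, here is a minimal sketch. It assumes Python 3 and holds the markup in a string (SAMPLE is a hypothetical name) instead of reading it from the file handle fh. It also names "html.parser" explicitly, which is an assumption: the original call BS(fh) leaves the choice of parser to bs4, so it may not be the parser that produced the output above.

```python
from bs4 import BeautifulSoup

# The sample document from the question, held in a string
# (assumption: the original read it from a file handle instead).
SAMPLE = """<html>
<body>
<p>HTML allows omitting P end-tags.
<p>Like that and this.
<p>And this, too.
<p>What happened?</p>
<p>And can we <p>nest a paragraph, too?</p></p>
</body>
</html>
"""

# Assumption: the parser is named explicitly so the run is repeatable;
# BS(fh) with no second argument lets bs4 pick whichever parser is installed.
soup = BeautifulSoup(SAMPLE, "html.parser")
pretty = soup.prettify()
print(pretty)
```

Swapping the second argument for another installed parser name makes it easy to compare how each one treats the omitted end-tags.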
This isn't prettify()'s fault, because traversing the tree manually I get the same structure:
<[document]>
  <html>
    ␊
    <body>
      ␊
      <p>
        HTML allows omitting P end-tags.␊␊
        <p>
          Like that and this.␊␊
          <p>
            And this, too.␊␊
            <p>
              What happened?
            </p>
            ␊
            <p>
              And can we
              <p>
                nest a paragraph, too?
              </p>
            </p>
            ␊
          </p>
        </p>
      </p>
    </body>
    ␊
  </html>
  ␊
</[document]>
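A manual traversal of this kind can be sketched roughly as follows. The helper dump_tree is hypothetical and not the code that produced the listing above. It assumes Python 3, where bs4 text nodes are str subclasses, and it prints newlines as ␊ to match the listing. The short sample string and the explicit "html.parser" argument are also assumptions made for illustration.

```python
from bs4 import BeautifulSoup


def dump_tree(node, depth=0, lines=None):
    """Return an indented outline of a parsed bs4 tree as a list of lines."""
    if lines is None:
        lines = []
    indent = "  " * depth
    if isinstance(node, str):
        # Text nodes (NavigableString) subclass str; make newlines visible.
        lines.append(indent + node.replace("\n", "␊"))
        return lines
    # Everything else is a Tag (or the BeautifulSoup root, named "[document]").
    lines.append("%s<%s>" % (indent, node.name))
    for child in node.contents:
        dump_tree(child, depth + 1, lines)
    lines.append("%s</%s>" % (indent, node.name))
    return lines


# Hypothetical short input with an omitted P end-tag.
soup = BeautifulSoup("<p>first<p>second</p>", "html.parser")
print("\n".join(dump_tree(soup)))
```

Because the walk reads node.contents directly and never calls prettify(), whatever nesting it prints comes from the parsed tree itself.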
Now, this would be the right result for XML (at least up to </body>, at which point it should report a WF error). But this ain't XML. What gives?