How to remove the XML declaration using BeautifulSoup4

I have an XHTML file with a structure similar to this:

<?xml version="1.0" encoding="UTF-8"?> 
<!DOCTYPE html> 
<html lang="en"> 
<head> 
... 
</head> 
<body> 
... 
</body> 
</html>

I am using BeautifulSoup, and I want to remove the XML declaration from the file so that it looks like this:

<!DOCTYPE html> 
<html lang="en"> 
<head> 
... 
</head> 
<body> 
... 
</body> 
</html>

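I can't find a way to get hold of the XML declaration in order to remove it. As far as I can tell, it does not appear to be a Doctype, Declaration, Tag, or NavigableString. Is there a way to find it so that I can extract it?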

As a working example, I can remove the Doctype with code like this (assuming the text of the file is in the variable "html"):

from bs4 import BeautifulSoup, Doctype
soup = BeautifulSoup(html)
[item.extract() for item in soup.contents if isinstance(item, Doctype)]

2015-10-19 Jason Champion

You can do it like this:

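import bs4
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')

# With html.parser, the <?xml ...?> declaration is parsed as a ProcessingInstruction.
for e in soup:
    if isinstance(e, bs4.element.ProcessingInstruction):
        e.extract()  # detach the node from the tree
        break  # stop here, since the tree was just modified during iteration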
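For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. The sample markup and the helper name strip_xml_declaration are made up for illustration and are not part of the question. It assumes the html.parser backend. Other parsers may represent the declaration differently, so treat it as a starting point rather than a definitive solution.

```python
from bs4 import BeautifulSoup
from bs4.element import ProcessingInstruction

# Hypothetical sample input standing in for the contents of the XHTML file.
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE html>\n'
    '<html lang="en">\n'
    '<head><title>t</title></head>\n'
    '<body><p>hello</p></body>\n'
    '</html>\n'
)


def strip_xml_declaration(markup):
    """Re-serialize markup without its top-level processing instructions."""
    soup = BeautifulSoup(markup, 'html.parser')
    # Iterate over a copy, because extract() mutates soup.contents.
    for node in list(soup.contents):
        if isinstance(node, ProcessingInstruction):
            node.extract()
    return str(soup)


result = strip_xml_declaration(SAMPLE)
print(result)
```

Unlike the loop above, this version does not stop at the first match, so it removes every top-level processing instruction, not only the XML declaration. The Doctype is left in place because it is a separate node type.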

2015-10-19 06:25:32

Perfect, thank you. :)
