2015-07-21

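find_all with camelCase tag names with BeautifulSoup 4

I am trying to scrape an XML file with BeautifulSoup 4.4.0 whose tag names are in camelCase, and find_all does not seem to be able to find them. Example code: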

from bs4 import BeautifulSoup 

xml = """ 
<hello> 
    world 
</hello> 
""" 
soup = BeautifulSoup(xml, "lxml") 

for x in soup.find_all("hello"):
    print(x)

xml2 = """ 
<helloWorld> 
    :-) 
</helloWorld> 
""" 
soup = BeautifulSoup(xml2, "lxml") 

for x in soup.find_all("helloWorld"):
    print(x)

The output I get is:

$ python soup_test.py 
<hello> 
    world 
</hello> 

What is the correct way to look up camelCase or uppercase tag names?

Answer


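For any case-sensitive parsing with BeautifulSoup, you want to parse in "xml" mode. The HTML parsers (including "lxml") do not preserve case, because HTML tag names are case-insensitive: they convert tag names to lowercase, so <helloWorld> is stored as helloworld and find_all("helloWorld") matches nothing. In your case, switch from "lxml" to "xml" (the "xml" parser also requires lxml to be installed):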

from bs4 import BeautifulSoup 

xml2 = """ 
<helloWorld> 
    :-) 
</helloWorld> 
""" 
soup = BeautifulSoup(xml2, "xml") 

for x in soup.find_all("helloWorld"):
    print(x)
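
To see the difference side by side, here is a minimal sketch that parses the same string with both parsers. It assumes bs4 and lxml are installed; the variable names (html_soup, xml_soup and so on) are only illustrative. It also shows that if you have to stay with the HTML parser, searching for the lowercased name should still find the tag.

```python
from bs4 import BeautifulSoup

xml2 = "<helloWorld>:-)</helloWorld>"

# HTML parser ("lxml"): tag names are normalized to lowercase.
html_soup = BeautifulSoup(xml2, "lxml")
html_camel = html_soup.find_all("helloWorld")  # no match, the tag is now "helloworld"
html_lower = html_soup.find_all("helloworld")  # matches the lowercased tag

# XML parser ("xml", backed by lxml): tag names keep their original case.
xml_soup = BeautifulSoup(xml2, "xml")
xml_camel = xml_soup.find_all("helloWorld")    # matches
xml_lower = xml_soup.find_all("helloworld")    # no match, XML is case-sensitive

print(len(html_camel), len(html_lower))  # 0 1
print(len(xml_camel), len(xml_lower))    # 1 0
print(xml_camel[0].name)                 # helloWorld
```

For a real XML document the "xml" parser is the better choice, since it also avoids the <html><body> wrapper that the HTML parser adds around the content.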