2016-10-25

Getting untagged elements with BeautifulSoup

Consider the following HTML snippet:

<div class="mapCopy"> 
    <b> 
     <a href="someurl.com"> 
      URL Text 
     </a> 
    </b> 
    <br/> 
     Address Line 1 
    <br/> 
     Address Line 2 
    <br/> 
     City, State, Zip 
    <p> 
     Phone: (123) 456-7890 
    <br/> 
     Fax: (123) 456-7890 
    </p> 
</div> 

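How can I extract only Address Line 1, Address Line 2, and City, State, Zip? I believe I should be able to iterate over the div and exclude any element that has a tag such as <b>, but I'm not sure of the syntax.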

Answer


You can extract all the direct children of the <div> that are plain text rather than tags:

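>>> from bs4 import BeautifulSoup, NavigableString
>>> S = BeautifulSoup("<div...", "html.parser")
>>> [child.strip() for child in S.find('div').children
...  if isinstance(child, NavigableString)  # text nodes only, no tags
...  and child.strip()                       # skip indentation whitespace
... ]
['Address Line 1', 'Address Line 2', 'City, State, Zip']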
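For reference, here is a self-contained sketch of the same idea using the full snippet from the question. It assumes the bs4 package with the built-in "html.parser" backend. Instead of looping over .children by hand, it asks find_all() for text nodes only (string=True) among the div's direct children (recursive=False), so the link text inside <b> and the phone/fax lines inside <p> are never visited. The function name address_lines is only illustrative.

```python
from bs4 import BeautifulSoup

# The HTML snippet from the question.
HTML = """<div class="mapCopy">
    <b>
     <a href="someurl.com">
      URL Text
     </a>
    </b>
    <br/>
     Address Line 1
    <br/>
     Address Line 2
    <br/>
     City, State, Zip
    <p>
     Phone: (123) 456-7890
    <br/>
     Fax: (123) 456-7890
    </p>
</div>"""


def address_lines(html):
    """Return the non-empty text nodes that sit directly inside div.mapCopy."""
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", class_="mapCopy")
    # string=True matches text nodes only; recursive=False restricts the
    # search to the div's direct children, so nested <b>/<p> text is skipped.
    texts = div.find_all(string=True, recursive=False)
    # Drop the whitespace-only strings produced by the indentation.
    return [t.strip() for t in texts if t.strip()]


print(address_lines(HTML))
# ['Address Line 1', 'Address Line 2', 'City, State, Zip']
```

The recursive=False argument is what keeps "URL Text" and the phone/fax lines out of the result; dropping it would return every text node in the div, nested ones included.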