2017-05-26 81 views
0

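How do I find tags that are not enclosed by a specific tag and wrap them in that tag?

Using BeautifulSoup, I want to find the <a> tags that are not enclosed in <p> and wrap them in <p>, but I don't know how to do it.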

<p><a href="example1.com">example1.com</a></p> 
<p><a href="example2.com">example2.com</a></p> 
<a href="example3.com">example3.com</a> 
<p><a href="example3.com">example3.com</a></p> 

I want to change the HTML above into the following:

<p><a href="example1.com">example1.com</a></p> 
<p><a href="example2.com">example2.com</a></p> 
<p><a href="example3.com">example3.com</a></p> <-here 
<p><a href="example3.com">example3.com</a></p> 
+0

What have you tried? Can you show your code?

Answers

2

You need to use a CSS selector to select those anchors, then wrap each of them in a p tag.

In [2]: from bs4 import BeautifulSoup as BS 

In [3]: html = """<p><a href="example1.com">example1.com</a></p> 
    ...: <p><a href="example2.com">example2.com</a></p> 
    ...: <a href="example3.com">example3.com</a> 
    ...: <p><a href="example3.com">example3.com</a></p>""" 

In [4]: soup = BS(html, "html.parser") 

In [5]: for a in soup.select("p ~ a"): 
    ...:     a.wrap(soup.new_tag("p")) 
    ...: 

In [6]: soup 
Out[6]: 
<p><a href="example1.com">example1.com</a></p> 
<p><a href="example2.com">example2.com</a></p> 
<p><a href="example3.com">example3.com</a></p> 
<p><a href="example3.com">example3.com</a></p> 
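
One caveat worth checking: `p ~ a` is the general sibling combinator, so it only matches an `<a>` that comes after a sibling `<p>`. The sketch below uses a hypothetical reordered input (not from the question) in which the bare anchor comes first, and compares the selector with a plain parent check.

```python
from bs4 import BeautifulSoup

# Hypothetical input: the bare <a> comes first, so no <p> sibling precedes it.
html = (
    '<a href="example3.com">example3.com</a>\n'
    '<p><a href="example1.com">example1.com</a></p>'
)
soup = BeautifulSoup(html, "html.parser")

# Matches only <a> elements that follow a sibling <p>.
sibling_matches = soup.select("p ~ a")

# Matches every <a> whose direct parent is not a <p>.
parent_matches = [a for a in soup.find_all("a") if a.parent.name != "p"]

print(len(sibling_matches), len(parent_matches))
```

For the HTML in the question the selector is enough, because the bare anchor follows a `<p>`; the parent check is the safer choice when the order is not known in advance.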
1
from bs4 import BeautifulSoup

soup = BeautifulSoup(...)
items = soup.find_all('a')
for item in items:
    # Wrap only anchors whose direct parent is not a <p>
    if item.parent.name != 'p':
        item.wrap(soup.new_tag('p'))
0

Try this:

from bs4 import BeautifulSoup

data = """
<p><a href="example1.com">example1.com</a></p>
<p><a href="example2.com">example2.com</a></p>
<a href="example3.com">example3.com</a>
<p><a href="example3.com">example3.com</a></p>
"""

soup = BeautifulSoup(data, 'html.parser')
for a in soup('a'):  # shortcut for soup.find_all('a')
    # Wrap only anchors whose direct parent is not a <p>
    if a.parent.name != 'p':
        a.wrap(soup.new_tag("p"))
print(soup)
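
The parent check in the answers above looks only at the direct parent, so an anchor nested deeper inside a paragraph (for example `<p><b><a>…</a></b></p>`) would still get wrapped. A minimal sketch of a stricter variant follows, assuming that any `<p>` ancestor should count as "already wrapped"; the function name `wrap_bare_anchors` and the sample input are made up for illustration.

```python
from bs4 import BeautifulSoup


def wrap_bare_anchors(html):
    """Wrap every <a> that has no <p> ancestor in a new <p> (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns a list-like snapshot, so modifying the tree while looping is safe.
    for a in soup.find_all("a"):
        # find_parent walks up the tree and returns None if no <p> ancestor exists.
        if a.find_parent("p") is None:
            a.wrap(soup.new_tag("p"))
    return str(soup)


result = wrap_bare_anchors(
    '<a href="example3.com">example3.com</a>'
    '<p><b><a href="example1.com">example1.com</a></b></p>'
)
print(result)
```

Here only the first anchor gets a new `<p>`; the one nested inside `<p><b>` is left alone.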