2016-01-22
from bs4 import BeautifulSoup

content='<p>Hello, the web site is <a href="https://www.google.com">Google</a></p>. <p>The search engine is <a href="https://www.baidu.com">Baidu</a></p>.'
soup = BeautifulSoup(content, 'html.parser')

Now I want to replace each whole <a>...</a> element with the URL from its href attribute. The expected result is:

Hello, the web site is https://www.google.com. The search engine is https://www.baidu.com. 

Can anyone provide a solution?

And what is the problem? First use BS to find `<a>` and get its `href`. – furas

Answer

First find each `a` and get its `href`; then you can append the `href` to the previous sibling and remove the `a`.

from bs4 import BeautifulSoup 

content='<p>Hello, the web site is <a href="https://www.google.com">Google</a></p>. <p>The search engine is <a href="https://www.baidu.com">Baidu</a></p>.' 
soup = BeautifulSoup(content, 'html.parser') 

# find all `a` 
all_a = soup.find_all('a')

for a in all_a: 
    # find `href` in `a` 
    href = a['href'] 

    #print('--- before ---') 
    #print(soup) 

    # append `href` to the text node just before `a`
    # (this assumes `a` is preceded by a text node, as in this input)
    a.previous_sibling.replace_with(a.previous_sibling + href)

    # remove `a` 
    a.extract() 

    #print('--- after ---') 
    #print(soup) 

print(soup) 

Output:

<p>Hello, the web site is https://www.google.com</p>. <p>The search engine is https://www.baidu.com</p>.
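
The previous-sibling approach relies on every `a` having a text node directly before it. A shorter variant that avoids this, offered only as a sketch and not as part of the original answer, is to swap each tag for its `href` string with `replace_with`. The helper name `links_to_urls` is made up for illustration, and `href=True` is there on the assumption that anchors without an `href` should be left alone. Calling `get_text()` afterwards gives the tag-free sentence the question asked for.

```python
from bs4 import BeautifulSoup

content = ('<p>Hello, the web site is <a href="https://www.google.com">Google</a></p>. '
           '<p>The search engine is <a href="https://www.baidu.com">Baidu</a></p>.')


def links_to_urls(html):
    """Hypothetical helper: replace every <a href=...> element with its href text."""
    soup = BeautifulSoup(html, 'html.parser')
    # href=True matches only anchors that actually carry an href attribute
    for a in soup.find_all('a', href=True):
        # replace_with swaps the whole tag (and its link text) for a plain string
        a.replace_with(a['href'])
    return soup


soup = links_to_urls(content)
print(soup)             # markup with each link replaced by its URL
print(soup.get_text())  # the same content as plain text, tags stripped
```

With the sample input, `print(soup)` should match the result shown in the answer above, while `print(soup.get_text())` drops the `<p>` tags as well.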