How can I use BeautifulSoup to wrap a word in a span?

I need to use BeautifulSoup to wrap a list of words in spans. How can I wrap a single word in a span with BeautifulSoup?

For example,

<html><body>word-one word-two word-one</body></html>

should become

<html><body><span>word-one</span> word-two <span>word-one</span></body></html>

where word-one needs to be wrapped in a span.

So far I can find these elements using:

import re
for html_element in soup(string=re.compile('word-one')):  # soup is an existing BeautifulSoup object
    print(html_element)

But it is not clear how to replace this text with a span.
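One possible approach, sketched here under the assumption that bs4 4.4 or later is installed (for the `string=` argument), is to leave the tags alone and rewrite only the text nodes that contain the word: split each matching NavigableString around the word, insert the pieces (plain strings and new `<span>` tags) in front of it with `insert_before`, then remove the original node with `extract`. The sample HTML and the word are taken from the question.

```python
import re
from bs4 import BeautifulSoup

html = '<html><body>word-one word-two word-one</body></html>'
soup = BeautifulSoup(html, 'html.parser')
pattern = re.compile(r'(word-one)')  # capturing group: re.split keeps the matches

# list() takes a snapshot, because the loop below modifies the tree
for node in list(soup.find_all(string=pattern)):
    for piece in pattern.split(node):
        if not piece:
            continue  # split() yields '' when a match is at either end
        if piece == 'word-one':
            span = soup.new_tag('span')
            span.string = piece
            node.insert_before(span)
        else:
            node.insert_before(piece)  # a plain str is stored as a NavigableString
    node.extract()  # remove the original text node, now duplicated by the pieces

print(soup)
# <html><body><span>word-one</span> word-two <span>word-one</span></body></html>
```

Because only the matching text nodes are touched, the tags around them stay where they are.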


2017-05-13 Nishant

Are you also using lxml (http://lxml.de/)? See "python lxml append element after another element" (https://stackoverflow.com/questions/7474972/python-lxml-append-element-after-another-element) –

No, just trying BS because I find it easier – Nishant

I have done something like this, where the variable html holds your code <html><body>word-one word-two word-one</body></html>. I separated the text from the markup and then put them back together.

from bs4 import BeautifulSoup

html = '<html><body>word-one word-two word-one</body></html>'
soup = BeautifulSoup(html, 'html.parser')
text = soup.text  # Only the text from the soup

soup.body.clear()  # Clear everything between the body tags

new_text = text.split()  # Split on whitespace, which is much easier

# Note: this wraps every word and drops the spaces between them
for i in new_text:
    new_tag = soup.new_tag('span')  # Create a new tag
    new_tag.append(i)  # Append i (one word from the split list) to it
    # e.g. if 'word' is appended to new_tag('a'), it looks like <a>word</a>
    soup.body.append(new_tag)  # Append the whole tag, e.g. <span>word-one</span>
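The loop above puts every word in a span and loses the spaces that `split()` removed. A minimal variant of the same split-and-rebuild idea, assuming only the word held in a hypothetical `target` variable should be wrapped, puts a space back between words and wraps just the matches:

```python
from bs4 import BeautifulSoup

html = '<html><body>word-one word-two word-one</body></html>'
target = 'word-one'  # hypothetical: the one word that should get a span

soup = BeautifulSoup(html, 'html.parser')
words = soup.text.split()  # same whitespace split as the answer
soup.body.clear()

for n, word in enumerate(words):
    if n > 0:
        soup.body.append(' ')  # restore the space that split() removed
    if word == target:
        new_tag = soup.new_tag('span')
        new_tag.string = word
        soup.body.append(new_tag)
    else:
        soup.body.append(word)  # other words stay as plain text

print(soup)
```

Like the original, this rebuilds the body from `soup.text`, so runs of whitespace collapse to single spaces and any other tags inside the body are lost.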

We can also use a regular expression to match a particular word.

import re
from bs4 import BeautifulSoup

html = '<html><body>word-one word-two word-one</body></html>'
soup = BeautifulSoup(html, 'html.parser')
text = soup.text  # Only the text from the soup

soup.body.clear()  # Clear the text between the body tags

# Match the word itself (r'\w+' would stop at the hyphen and match only "word").
# re.search finds only the first occurrence, so only that one gets a span.
theword = re.search(r'word-one', text)
beginning, end = theword.start(), theword.end()

soup.body.append(text[:beginning])  # Add the text before the match

new_tag = soup.new_tag('span')  # Create a new tag
new_tag.append(text[beginning:end])  # Put the matched word inside the new tag
soup.body.append(new_tag)  # Append the tag with the word in it
soup.body.append(text[end:])  # Append everything that's left
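Since `re.search` returns only the first match, only the first `word-one` ends up in a span. A sketch that keeps the same clear-and-append structure but loops over `re.finditer` to wrap every occurrence:

```python
import re
from bs4 import BeautifulSoup

html = '<html><body>word-one word-two word-one</body></html>'
soup = BeautifulSoup(html, 'html.parser')
text = soup.text  # Only the text from the soup
soup.body.clear()  # Clear the text between the body tags

last = 0  # end of the previous match
for match in re.finditer(r'word-one', text):
    if match.start() > last:
        soup.body.append(text[last:match.start()])  # text between matches
    new_tag = soup.new_tag('span')
    new_tag.string = match.group()
    soup.body.append(new_tag)
    last = match.end()
if last < len(text):
    soup.body.append(text[last:])  # everything after the final match

print(soup)
```

The `if` guards only avoid appending empty strings when a match sits at the very start or end of the text.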

I'm sure .insert could be used in a similar way.
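A sketch of the `.insert` idea, assuming the text to split is a single text node: look up that node's position with `Tag.index`, extract it, and use `Tag.insert(position, ...)` to put the pieces back at the same spot instead of clearing the whole body. The `<b>keep me</b>` sibling is a made-up addition that shows other children are left alone.

```python
import re
from bs4 import BeautifulSoup

# <b>keep me</b> is a made-up sibling, only there to show it is not cleared
html = '<html><body><b>keep me</b>word-one word-two word-one</body></html>'
soup = BeautifulSoup(html, 'html.parser')

node = soup.body.find(string=re.compile('word-one'))  # the text node to split
parent = node.parent
position = parent.index(node)  # index of the text node among parent's children
node.extract()

for piece in re.split(r'(word-one)', str(node)):
    if not piece:
        continue  # skip '' produced when a match is at either end
    if piece == 'word-one':
        child = soup.new_tag('span')
        child.string = piece
    else:
        child = piece
    parent.insert(position, child)  # put each piece back where the text node was
    position += 1

print(soup)
```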


2017-05-13 18:08:00

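But if you do it this way, won't you lose important tags like div? I mean, I want to wrap the words in spans, but not disturb the divs, p's or tables – Nishant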

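I'm not quite sure what you mean. You gave me some HTML and I gave you a way to do it. You are probably doing this on a real website, so you have to narrow it down to the parent tag and do the same thing there. For example: a = soup.find('p'), then a.div.clear() clears everything inside that div. I've added comments so you can follow what is happening. If it helps, try working through the code alongside the documentation. –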

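To clarify: imagine the body also contains a div, so taking the text and clearing the body won't work, right? Anyway, this gives me a lot of ideas. – Nishant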
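The nested case can be handled by never clearing a parent at all. A sketch, assuming the goal is to wrap the word wherever it appears as text while leaving `div`, `p`, and `table` elements in place: it walks the text nodes under a chosen root and splits only those. The `wrap_word` helper and the sample HTML are hypothetical. Text already inside a `span` (or inside `script`/`style`) is skipped, so running it twice does not double-wrap.

```python
import re
from bs4 import BeautifulSoup

def wrap_word(soup, root, word):
    """Hypothetical helper: wrap each occurrence of word in text nodes under root."""
    pattern = re.compile('(' + re.escape(word) + ')')  # group keeps matches in split()
    for node in list(root.find_all(string=pattern)):  # copy: the tree is edited below
        if node.parent.name in ('span', 'script', 'style'):
            continue  # already wrapped, or not visible text
        for piece in pattern.split(node):
            if not piece:
                continue  # split() yields '' when a match is at either end
            if piece == word:
                child = soup.new_tag('span')
                child.string = piece
            else:
                child = piece
            node.insert_before(child)
        node.extract()  # drop the original text node

html = ('<html><body><div>word-one word-two</div>'
        '<p>word-one <b>word-one</b></p>'
        '<table><tr><td>word-one</td></tr></table></body></html>')
soup = BeautifulSoup(html, 'html.parser')
wrap_word(soup, soup.body, 'word-one')
print(soup)
```

Passing `soup.find('p')` instead of `soup.body` as the root would limit the change to that paragraph.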
