How to get text and certain tags

Given strings like

"<p> >this line starts with an arrow <br /> this line does not </p>"

or

"<p> >this line starts with an arrow </p> <p> this line does not </p>"

how can I find the lines that start with an arrow and wrap them in a div, so that the result becomes:

"<p> <div> >this line starts with an arrow </div> <br /> this line does not </p>"

Source

2014-06-24 madprops

How do you define a "line"?

Go with [@alexcxe's](http://stackoverflow.com/a/24391725/2461379) answer because, uh... [I'll just leave this here...](http://stackoverflow.com/a/1732454/2461379)

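Since it is HTML you are parsing, use a tool built for the job: an HTML parser, such as BeautifulSoup.

Use find_all() to find all text nodes that start with >, then wrap() each of them in a new div tag: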

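from bs4 import BeautifulSoup

data = "<p> >this line starts with an arrow <br /> this line does not </p>"

# Name the parser explicitly so the result does not depend on what is installed.
soup = BeautifulSoup(data, "html.parser")
# Match text nodes that start with '>' and wrap each one in a new <div>.
for item in soup.find_all(string=lambda x: x.strip().startswith('>')):
    item.wrap(soup.new_tag('div'))

# formatter=None prints the text unescaped, so '>' is not shown as '&gt;'.
print(soup.prettify(formatter=None))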

This prints:

<p>
 <div>
  >this line starts with an arrow
 </div>
 <br/>
 this line does not
</p>
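The same approach can be packaged as a small function and run against both input shapes from the question. This is only a sketch: the name wrap_arrow_lines is made up for illustration, and it assumes Python 3 with bs4 installed and the built-in "html.parser" backend.

```python
from bs4 import BeautifulSoup


def wrap_arrow_lines(html):
    """Wrap every text node that starts with '>' in a new <div> (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns a list, so wrapping nodes while looping over it is safe.
    arrow_nodes = soup.find_all(string=lambda s: bool(s) and s.strip().startswith(">"))
    for node in arrow_nodes:
        node.wrap(soup.new_tag("div"))
    # formatter=None writes the text back unescaped ('>' instead of '&gt;').
    return soup.decode(formatter=None)


single_p = "<p> >this line starts with an arrow <br /> this line does not </p>"
two_p = "<p> >this line starts with an arrow </p> <p> this line does not </p>"

print(wrap_arrow_lines(single_p))
print(wrap_arrow_lines(two_p))
```

In both cases only the text node that begins with an arrow should end up inside a div. The surrounding p and br tags are left where they were.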

Source

2014-06-24 16:25:18 alecxe

+1. When I clicked on this question, I was really worried there would be something about regular expressions...

You can try the regular expression pattern >\s+(>.*?)<

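import re

# '>' plus whitespace, then a captured run that starts with '>' and stops before the next '<'.
regex = re.compile(r">\s+(>.*?)<")
test_string = "<p> >this line starts with an arrow <br /> this line does not </p>"
matches = regex.findall(test_string)
# matches holds the captured groups: ['>this line starts with an arrow ']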

Then replace the matched group with <div> matched_group </div>. The pattern looks for anything between "> >" and "<".
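The replacement step might look roughly like the sketch below. It is not part of the original answer: the pattern is the same one with two extra capture groups so that the neighbouring > and < can be written back unchanged, and the names ARROW_TEXT and wrap_with_regex are made up for illustration.

```python
import re

# Group 1: the preceding '>' plus whitespace. Group 2: the arrow text.
# Group 3: the '<' that opens the next tag.
ARROW_TEXT = re.compile(r"(>\s+)(>.*?)(<)")


def wrap_with_regex(html):
    # Write the arrow text back between <div> and </div>, keeping its neighbours.
    return ARROW_TEXT.sub(r"\1<div> \2</div> \3", html)


sample = "<p> >this line starts with an arrow <br /> this line does not </p>"
print(wrap_with_regex(sample))
```

On the sample from the question this gives the requested string. Because . does not match a newline by default, arrow text that spans several lines would be missed unless re.DOTALL is added.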


Source

2014-06-24 16:19:19 Braj

You could try this expression:

>(\w[^<]*)


The Python code would look like this:

>>> import re
>>> s = '"<p> >this line starts with an arrow <br /> this line does not </p>"'
>>> result = re.sub(r'>(\w[^<]*)', r"<div> >\1</div> ", s)
>>> result
'"<p> <div> >this line starts with an arrow </div> <br /> this line does not </p>"'
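Before relying on this expression, it may be worth running it on the second input from the question and on markup where text directly follows a tag. The sketch below does that. The no_arrow string is a hypothetical input, not taken from the question, and wrap_arrows is a made-up name.

```python
import re

ARROW = re.compile(r">(\w[^<]*)")


def wrap_arrows(html):
    # Same substitution as in the session above, wrapped in a function.
    return ARROW.sub(r"<div> >\1</div> ", html)


two_p = "<p> >this line starts with an arrow </p> <p> this line does not </p>"
print(wrap_arrows(two_p))

# Hypothetical input: no arrow in the text, but a word character right after a tag.
no_arrow = "<p><b>bold</b> text</p>"
print(wrap_arrows(no_arrow))
```

The second call shows that the pattern cannot tell an arrow from the > that closes a tag. It treats the end of <b> as an arrow and produces broken markup, so this expression seems safe only when tags are always followed by whitespace.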

Source

2014-06-24 16:22:04
