Lookahead assertion with multiple values

I have the following text:

[red] 

aaa [bbb] hello 

[blue] 

aaa 

[green] 

ccc

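I want to extract the text that belongs to each section header. My idea was to match from one specific header, taken from a list of headers, up to the next header by using a lookahead assertion: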

keys = ('red', 'blue', 'green') 
for key in keys: 
    match = re.search(r'\[' + key + r'\](.*)(?=(?:' + '|'.join(keys) + r'|$))', 
         text, flags=re.DOTALL) 

    print(key, match.group(1))

I'm missing something, though, because it doesn't match anything. Any ideas?

Source

2017-04-07 mart1n

Does '.*?' instead of '.*' help? See https://regex101.com/r/1RZ2rF/1

In the end, I decided not to use a regular expression to match the section contents:
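Building on the comment above, here is a minimal sketch of how the lookahead from the question might be made to work. Two things stand out in the original pattern: the alternatives inside the lookahead have no surrounding brackets, so a bare 'red' or 'blue' could match anywhere, and the greedy '.*' combined with '|$' runs to the end of the text. The helper name `extract_sections` and the sample text (written without trailing spaces) are assumptions for illustration, not part of the original post.

```python
import re

# Sample text assumed from the question, without trailing spaces.
text = "[red]\n\naaa [bbb] hello\n\n[blue]\n\naaa\n\n[green]\n\nccc"
keys = ('red', 'blue', 'green')


def extract_sections(text, keys):
    """Return {key: section text} for each known header found in text."""
    # Escape the keys in case they contain regex metacharacters.
    alternation = '|'.join(re.escape(k) for k in keys)
    sections = {}
    for key in keys:
        # Lazy '.*?' stops at the first position where the lookahead holds:
        # either a bracketed known header or the very end of the text.
        pattern = (r'\[' + re.escape(key) + r'\](.*?)'
                   r'(?=\[(?:' + alternation + r')\]|\Z)')
        match = re.search(pattern, text, flags=re.DOTALL)
        if match:
            sections[key] = match.group(1).strip()
    return sections


for key, value in extract_sections(text, keys).items():
    print(key, repr(value))
```

Because only bracketed known keys end a section, an unknown header such as '[bbb]' stays inside the text of the section that precedes it.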

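import re

# Walk through the text line by line and collect the lines of each known section
keys = ('red', 'blue', 'green')
new_contents = {key: '' for key in keys}  # accumulated text per section
last_section = ''
for line in text.splitlines():
    if line.startswith('#'):  # skip comment lines
        continue

    # A line that starts with a known header opens a new section
    match = re.match(r'\[(' + '|'.join(keys) + r')\]', line)
    if match:
        last_section = match.group(1)
        continue

    # Any other line belongs to the most recently seen section
    if last_section:
        new_contents[last_section] += '\n' + line

for section in new_contents:
    new_contents[section] = new_contents[section].strip()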

Source

2017-04-07 11:58:13 mart1n

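Even though you say otherwise, you are still using regular expressions... re.match is part of Python's regular expression operations.

Sorry, I meant without a regular expression that parses out the section contents. – mart1n

You can use regex findall! It lets you capture each section together with its value, like this: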

>>> import re 
>>> print(re.findall(r'\[(\w*)\]([\w \n]*)', text))
[('red', '\n\naaa '), ('bbb', ' hello\n\n'), ('blue', '\n\naaa\n\n'), ('green', '')]

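Here \[(\w*)\] matches your section name and ([\w \n]*) matches your section content. With this result, you can strip or replace the redundant newlines!

Hope it helps!

Source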
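As a rough sketch of the cleanup step mentioned above, the pairs returned by findall can be stripped in a list comprehension. The sample text here is an assumption (the question's text without trailing spaces), so the exact pairs may differ from the output shown in the answer.

```python
import re

# Sample text assumed from the question, without trailing spaces.
text = "[red]\n\naaa [bbb] hello\n\n[blue]\n\naaa\n\n[green]\n\nccc"

# Each match is a (section name, raw content) tuple.
pairs = re.findall(r'\[(\w*)\]([\w \n]*)', text)

# Remove the surrounding whitespace and newlines from every content string.
cleaned = [(name, body.strip()) for name, body in pairs]

for name, body in cleaned:
    print(name, repr(body))
```

Note that this pattern treats every bracketed word as a header, so '[bbb]' ends up as its own entry.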

2017-04-07 10:26:36

This would also make '[bbb]' a separate group.

Maybe something like this approach could work:

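import re

keys = ('red', 'blue', 'green')

# The first alternative consumes a section header without capturing it;
# the second captures a run of word characters, brackets and spaces.
res = re.findall(r'\[\w+\].?|([\w\[\] ]+)', text)
# Headers leave empty strings in the result, so drop them.
res = [x for x in res if x]

# Pair each key with the captured text in the same position.
for n in range(len(keys)):
    print(keys[n], res[n])

Result: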

('red', 'aaa [bbb] hello') 
('blue', 'aaa') 
('green', 'ccc')

Example:

https://regex101.com/r/p55ckh/1

Source

2017-04-07 10:54:46

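I only want to find sections based on the defined keys. So if there is a section '[pink]', it should be treated as part of the text of the preceding valid section header. – mart1n

What difference does it make whether you do it this way or by groups? If '[red]' is section 1, then it gets the text from section 1... a lookahead really isn't needed. You should be able to adapt this example to what you described. The key point, I think, is how the regex captures the content between the sections.

It matters, because if I append '[pink]\n\nbbb' to the example text and run it through your regex, it will not include that string in the value of the 'green' key. – mart1n

Here is a string-processing approach that works regardless of the order of the keys in the text. If you don't want to use regular expressions, I hope it helps!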
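To illustrate the requirement in this comment, here is a hedged sketch that splits the text only on headers from the known key list, so an unknown header such as '[pink]' stays inside the preceding section. It uses re.split with a capturing group, which is a different technique from the answer above; the function name `split_by_known_headers` is hypothetical and the sample text is assumed.

```python
import re


def split_by_known_headers(text, keys):
    """Split text on known '[key]' headers only; return {key: content}."""
    header = r'\[(' + '|'.join(re.escape(k) for k in keys) + r')\]'
    # With one capturing group, re.split returns
    # [preamble, key1, body1, key2, body2, ...].
    parts = re.split(header, text)
    # If a key appears more than once, the last occurrence wins.
    return {name: body.strip() for name, body in zip(parts[1::2], parts[2::2])}


# Sample text assumed from the question, with '[pink]' appended.
text = ("[red]\n\naaa [bbb] hello\n\n[blue]\n\naaa\n\n"
        "[green]\n\nccc\n\n[pink]\n\nbbb")
result = split_by_known_headers(text, ('red', 'blue', 'green'))
print(repr(result['green']))
```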

text = '[red]\naaa [bbb] hello\n[blue]\naaa\n[green]\nccc' 

# keys = ('red', 'blue', 'green') 
# keys = ('blue', 'red', 'green') 
# keys = ('green', 'red', 'blue') 
keys = ('green', 'blue', 'red') 
# store key and index of key tuple 
index_key_tuples = [] 

for key in keys:
    index = text.find('[' + key + ']')
    if index != -1:
        index_key_tuples.append((index, key))
# sort the index key tuple 
index_key_tuples.sort() 

i = 0 
size = len(index_key_tuples) 
while i < size - 1: 
    # start index of content of key 
    item = index_key_tuples[i] 
    key = item[1] 
    start_index = item[0] + len(key) + 2 # 2 is for square bracket 
    # end index of content of key 
    next_item = index_key_tuples[i + 1] 
    end_index = next_item[0] 
    # content of key 
    key_content = text[start_index:end_index].strip() 
    print(key, key_content) 
    i += 1 

# handle the last key 
last_item = index_key_tuples[size-1] 
key = last_item[1] 
start_index = last_item[0] + len(key) + 2 
key_content = text[start_index:].strip() 
print(key, key_content)
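The same idea can be written more compactly. The sketch below is one possible condensation, not the answerer's code: it pairs each found header position with the start of the next one, which also avoids an IndexError when none of the keys occur in the text. The function name `sections_by_find` is an assumption.

```python
def sections_by_find(text, keys):
    """Locate each '[key]' with str.find and slice out the text up to the next one."""
    # (position, key) for every key that actually occurs, in text order.
    found = sorted((text.find('[' + k + ']'), k)
                   for k in keys if '[' + k + ']' in text)
    # Each section ends where the next known header starts; the last one
    # runs to the end of the text.
    ends = [pos for pos, _ in found[1:]] + [len(text)]
    # Skip over '[key]' itself: len(key) plus 2 for the square brackets.
    return {k: text[pos + len(k) + 2:end].strip()
            for (pos, k), end in zip(found, ends)}


text = '[red]\naaa [bbb] hello\n[blue]\naaa\n[green]\nccc'
for key, content in sections_by_find(text, ('green', 'blue', 'red')).items():
    print(key, repr(content))
```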

Source

2017-04-07 12:13:12 Fogmoon
