2017-07-09 57 views
3

Regex: only one match is returned when using (.|\s)*

Person1(has(1, 1) has(2, 2) 
    has(3, 3) 
    had(4, 4) had(5, 5)) 
Person2(has(6, 6) had(7, 7)) 

I want to select the contents of every has() that belongs to Person1, i.e. ['1, 1', '2, 2', '3, 3'].

I tried has\((\d, \d)\)(.|\s)*Person2 with the global flag, but it only returns 1, 1.
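
The behaviour can be reproduced with a short sketch. It assumes Python's re module stands in for the "global flag" (finditer walks over all non-overlapping matches); the variable names are only illustrative. The greedy (.|\s)* swallows everything between the first has(...) and the last Person2, so the other has(...) items sit inside that single match and can never be matched on their own.

```python
import re

s = '''
Person1(has(1, 1) has(2, 2)
    has(3, 3)
    had(4, 4) had(5, 5))
Person2(has(6, 6) had(7, 7))'''

# The pattern from the question, unchanged
pattern = re.compile(r'has\((\d, \d)\)(.|\s)*Person2')

matches = list(pattern.finditer(s))
first_groups = [m.group(1) for m in matches]

print(len(matches))   # 1
print(first_groups)   # ['1, 1']

# The one match starts at the first has(...) and ends with "Person2",
# so has(2, 2) and has(3, 3) are consumed by (.|\s)* instead of being captured.
whole = matches[0].group(0)
print(whole.startswith('has(1, 1)'), whole.endswith('Person2'))  # True True
```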

Answers

5

A solution using the re.findall() function:

import re 

s = ''' 
Person1(has(1, 1) has(2, 2) 
    has(3, 3) 
    had(4, 4) had(5, 5)) 
Person2(has(6, 6) had(7, 7))''' 

has_items = re.findall(r'(?<!Person2\()has\(([^()]+)\)', s) 
print(has_items) 

Output:

['1, 1', '2, 2', '3, 3'] 

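  • (?<!Person2\() - a negative lookbehind assertion; it ensures that the has substring is not immediately preceded by Person2(

  • ([^()]+) - the first capturing group, which holds the contents of the has item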
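
A small sketch of what the lookbehind does and does not exclude. The input below is hypothetical (Person2 is given a second has item); only a has that directly follows Person2( is skipped, so any later has inside Person2 is still returned.

```python
import re

# Hypothetical input: Person2 now contains two has(...) items
s = '''Person1(has(1, 1) has(2, 2))
Person2(has(6, 6) has(8, 8))'''

items = re.findall(r'(?<!Person2\()has\(([^()]+)\)', s)
print(items)  # ['1, 1', '2, 2', '8, 8']
```

Here '6, 6' is dropped because it is immediately preceded by Person2(, while '8, 8' slips through because the text right before it is not Person2(.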


To grep the has items of a particular Person, use the following unified approach, shown with an extended example:

def grepPersonItems(s, person):
    person_items = []
    # grab the whole "PersonN(...))" block; re.DOTALL lets it span several lines
    person_group = re.search(r'(' + re.escape(person) + r'\(.*?\)\))', s, re.DOTALL)

    if person_group:
        person_items = re.findall(r'has\(([^()]+)\)', person_group.group())
    return person_items

s = '''
Person1(has(1, 1) has(2, 2)
    has(3, 3)
    had(4, 4) had(5, 5))
Person2(has(6, 6) had(7, 7), has(8, 8)) Person3(has(2, 6) had(7, 7), has(9, 9))'''

person1_items = grepPersonItems(s, 'Person1')
person2_items = grepPersonItems(s, 'Person2')
person3_items = grepPersonItems(s, 'Person3')

print('Person1:', person1_items)
print('Person2:', person2_items)
print('Person3:', person3_items)

Output:

Person1: ['1, 1', '2, 2', '3, 3'] 
Person2: ['6, 6', '8, 8'] 
Person3: ['2, 6', '9, 9'] 
+0

Can I select only the has items of 'Person1' or of 'Person2'? If 'Person2' has more than one 'has', the ones after the first are selected as well. Thanks. – Harrison

+0

@Harrison, please elaborate on your question: do you want to grep all the 'has' items of 'Person1' and 'Person2', or of any possible person? – RomanPerekhrest

+0

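My original question was to get all 'has' items for 'Person1', but it would be nice if you could also provide another regex for 'Person2'. I simplified 'Person2' in the question; it can also have multiple 'has' items and span multiple lines. – Harrison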

1

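Why not parse the input fully, so that you can pick out whatever you need afterwards? You need two patterns: one to grab each person together with its contents, and another to grab the individual parts inside. You can also add more parsing to get the individual elements and convert them to native Python types. For example: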

import collections 
import re 

persons = re.compile(r"(Person\d+)\(((?:.*?\(.*?\)\s*)+)\)") 
contents = re.compile(r"(\w+)\((.*?)\)") 

def parse_input(data, parse_inner=True, map_inner=str):
    result = {}  # store for our parsed data
    for match in persons.finditer(data):  # loop through our `Persons`
        person = match.group(1)  # grab the first group to get our Person
        elements = collections.defaultdict(list)  # store for the parsed inner elements
        for element in contents.finditer(match.group(2)):  # loop through the has/had/etc.
            element_name = element.group(1)  # the first group holds the name
            element_data = element.group(2)  # this is the inner content of each has/had/etc.
            if parse_inner:  # if we want to parse the inner elements...
                element_data = [map_inner(x.strip()) for x in element_data.split(",")]
            elements[element_name].append(element_data)  # add our inner results
        result[person] = elements  # add persons to our result
    return result  # well, obvious...
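
To see what each of the two patterns captures on its own, here is a minimal sketch. The patterns are copied from above; the sample string and the names sample and captured are made up for illustration. Since `.` does not match a newline without re.DOTALL, the sketch keeps each person on its own line, as in the question's data.

```python
import re

persons = re.compile(r"(Person\d+)\(((?:.*?\(.*?\)\s*)+)\)")
contents = re.compile(r"(\w+)\((.*?)\)")

# Illustrative input: one person per line
sample = "Person1(has(1, 1) had(4, 4))\nPerson2(has(6, 6))"

captured = []
for match in persons.finditer(sample):
    name, body = match.groups()       # group 1: person name, group 2: everything inside
    inner = contents.findall(body)    # (element name, element content) pairs
    captured.append((name, body, inner))
    print(name, repr(body), inner)

# Person1 'has(1, 1) had(4, 4)' [('has', '1, 1'), ('had', '4, 4')]
# Person2 'has(6, 6)' [('has', '6, 6')]
```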

Then you can parse everything and access it to your heart's content. The simplest example is:

test = """Person1(has(1, 1) has(2, 2) 
    has(3, 3) 
    had(4, 4) had(5, 5)) 
Person2(has(6, 6) had(7, 7))""" 

parsed = parse_input(test, False) # basic string grab 

print(parsed["Person1"]["has"]) # ['1, 1', '2, 2', '3, 3'] 
print(parsed["Person2"]["has"]) # ['6, 6'] 
print(parsed["Person2"]["had"]) # ['7, 7'] 

But you can do so much more... you can add more persons and have the contents converted into actual Python structures:

test = """Person1(has(1, 1) has(2, 2) 
    has(3, 3) 
    had(4, 4) had(5, 5)) 
Person2(has(6, 6) had(7, 7)) 
Person3(has(1, 2) has(3, 4) has(4, 5) foo(6, 7))""" 

parsed = parse_input(test, True, int) # parses everything and auto-converts to int 

print(parsed["Person3"]["has"]) # [[1, 2], [3, 4], [4, 5]] 
print(parsed["Person3"]["has"][1]) # [3, 4] 
print(sum(parsed["Person3"]["foo"][0])) # 13 
print(parsed["Person1"]["has"][1] + parsed["Person2"]["has"][0]) # [2, 2, 6, 6] 
# etc. 
0

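I think you might try this approach, which I find dynamic and simple for any number of persons. It splits and parses the string, then stores the list of required items for each Person in a dictionary.

Sample source: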

import re

regex = r"has\(\s*(\d+)\s*,\s*(\d+)\s*\)"

result = {}  # renamed from `dict` so the built-in type is not shadowed
test_str = ("Person1(has(1, 1) has(2, 2)\n"
    " has(3, 3) \n"
    " had(4, 4) had(5, 5))\n"
    "Person2(had(6, 6) has(7, 7))\n"
    "Person3(had(6, 6) has(8, 8))")

# the capturing group keeps the "PersonN" delimiters in the split result
res = re.split(r"(Person\d+)", test_str)
current_key = ""
for rs in res:
    if "Person" in rs:
        current_key = rs
    elif current_key != "":
        matches = re.finditer(regex, rs, re.DOTALL)
        ar = []
        for match in matches:
            ar.append(match.group(1) + "," + match.group(2))
        result[current_key] = ar
print(result)

The output is:

{'Person1': ['1,1', '2,2', '3,3'], 'Person2': ['7,7'], 'Person3': ['8,8']} 
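
The step this approach depends on is that re.split keeps the delimiters in its result when the pattern contains a capturing group. A minimal sketch with a made-up two-person string (the names text, parts and pairs are illustrative only):

```python
import re

text = "Person1(has(1, 1))\nPerson2(has(7, 7))"

# With a capturing group, the matched "PersonN" names are kept in the list,
# alternating with the text that follows each of them.
parts = re.split(r"(Person\d+)", text)
print(parts)
# ['', 'Person1', '(has(1, 1))\n', 'Person2', '(has(7, 7))']

# Odd indexes hold the names, even indexes (from 2) hold their bodies.
pairs = dict(zip(parts[1::2], parts[2::2]))
print(pairs["Person2"])  # (has(7, 7))
```

Pairing the slices this way is an alternative to tracking the current key in a loop; either form relies on the same split result.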