2017-01-30 34 views
1

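Python: how to split lines while merging some of them

I want to process a string line by line, but with support for multi-line blocks. This is the example text: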

First line 
Second line 
{{{ 
these three lines 
I want to process 
together 
}}} 
Last Line 

A multi-line block starts with {{{ and ends with }}}. Until now I have processed the text line by line like this:

lines = text.splitlines()
print(lines)

Right now this code outputs:

['First line', 'Second line', '{{{', 'these three lines', 'I want to process', 'together', '}}}', 'Last Line'] 

I would like lines to contain the following instead:

['First line', 'Second line', 'these three lines I want to process together', 'Last Line'] 

Or, for a more advanced example:

First Line 
Second line 
Third{{{line 
fourth line 
fifth}}}line 
sixth line 

In this case, I want lines to contain:

['First Line', 'Second line', 'Third', 'line fourth line fifth', 'line', 'sixth line'] 
+1

Try iterating over the current output, checking for '{{{', and then joining all the lines until you reach '}}}'. – tburrows13

Answers

3

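Here is a generator that takes an input file object as its argument and yields one line at a time. It should accept any number of {{{ and }}} markers on the same line, but it does not test for unbalanced constructs: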

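def merge_lines(fd):
    concat = False  # True while we are inside a {{{ ... }}} block
    for line in fd:
        # Keep consuming the current line until nothing but whitespace is left
        while True:
            if len(line.strip()) == 0:
                break
            if not concat:
                if '{{{' in line:
                    # Yield the text before the opener and start a block
                    deb, line = line.split('{{{', 1)
                    yield deb
                    concat = True
                    old = None
                else:
                    yield line.strip('\r\n')
                    line = ""
            if concat:
                if '}}}' in line:
                    # Close the block and yield its merged content
                    deb, line = line.split('}}}', 1)
                    concat = False
                    if old:
                        yield old.strip() + ' ' + deb
                    else:
                        yield deb
                else:
                    # Still inside the block: accumulate this line
                    if old:
                        old += ' ' + line.strip('\r\n')
                    else:
                        old = line.strip('\r\n')
                    line = ""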

Example in Python 3:

>>> import io
>>> t = """First line
a{{{b}}}c{{{d 
e 
f}}}g{{{h 
i}}} 
j 
k 
""" 
>>> for line in merge_lines(io.StringIO(t)): print(line) 

First line 
a 
b 
c 
d e f 
g 
h i 
j 
k 
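Since the generator above does not test for unbalanced constructs, a separate pre-check can be run on the text first. The following is only a minimal sketch: check_balanced is a hypothetical helper that is not part of the answer, and it assumes that blocks are never nested, so a second {{{ before the closing }}} is reported as unbalanced.

```python
def check_balanced(text, start_sep='{{{', end_sep='}}}'):
    """Return True if every start_sep is closed by a later end_sep.

    Assumption (not from the original answer): nesting is not allowed.
    """
    inside = False
    i = 0
    while i < len(text):
        if text.startswith(start_sep, i):
            if inside:
                return False  # a second opener before the block was closed
            inside = True
            i += len(start_sep)
        elif text.startswith(end_sep, i):
            if not inside:
                return False  # a closer without a matching opener
            inside = False
            i += len(end_sep)
        else:
            i += 1
    return not inside  # False if the last block was never closed


print(check_balanced("a{{{b}}}c{{{d\ne}}}"))  # True
print(check_balanced("a{{{b"))                # False
```

Calling it before merge_lines lets the caller decide whether to reject the input or process it anyway.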
0
def split(text):
    lines = []
    while '{{{' in text:
        head, sep, tail = text.partition('{{{')
        lines.extend(head.splitlines())
        head, sep, tail = tail.partition('}}}')
        lines.append(head.replace('\n', ' ').strip())
        text = tail

    lines.extend(text.splitlines())
    return lines
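Running the function above on the question's first example shows one rough edge: the newline that follows }}} leaves an empty string in the result. The sketch below repeats the function so that it runs on its own and adds a small wrapper that filters empty entries out. split_nonempty is a hypothetical name, and it assumes that empty lines are never wanted in the output.

```python
def split(text):
    # Same logic as the answer above, repeated so this example is self-contained
    lines = []
    while '{{{' in text:
        head, sep, tail = text.partition('{{{')
        lines.extend(head.splitlines())
        head, sep, tail = tail.partition('}}}')
        lines.append(head.replace('\n', ' ').strip())
        text = tail
    lines.extend(text.splitlines())
    return lines


def split_nonempty(text):
    # Hypothetical wrapper: drop the empty strings left by newlines next to the markers
    return [line for line in split(text) if line]


text = ("First line\nSecond line\n{{{\nthese three lines\n"
        "I want to process\ntogether\n}}}\nLast Line")

print(split(text))
# ['First line', 'Second line', 'these three lines I want to process together', '', 'Last Line']
print(split_nonempty(text))
# ['First line', 'Second line', 'these three lines I want to process together', 'Last Line']
```

Note that the wrapper also removes blank lines that were in the input on purpose, which may or may not be acceptable.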
0

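Here is my solution. It is long but simple. I was hoping there would be a way to do this in just a few lines. Note that it does not handle the case where }}} and {{{ appear on the same line.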

def _split_with_merging(text):
    lines = [l for l in text.splitlines() if l != ""]
    nlines = []
    multiline = False  # True while we are inside a {{{ ... }}} block
    for l in lines:
        if multiline:
            if "}}}" in l:
                lparts = l.split("}}}")
                # Text before the closer still belongs to the merged line
                nlines[-1] += lparts[0]
                if lparts[1] != "":
                    nlines.append(lparts[1])
                multiline = False
            else:
                # Appended as-is: no separator is inserted between the pieces
                nlines[-1] += l
        else:
            if "{{{" in l:
                lparts = l.split("{{{")
                nlines.append(lparts[0])
                if lparts[1] != "":
                    nlines.append(lparts[1])
                multiline = True
            else:
                nlines.append(l)
    return nlines
0

You can use a regular expression, assuming you are only interested in the lines between {{{ and }}}:

text = """First line 
Second line 
THIS{{{ 
these three lines 
I want to process 
together 
}}} 
Last Line""" 

import re

# Capture everything between the first {{{ and the last }}}
match_obj = re.search(r'\{\{\{(.*)\}\}\}', text, re.DOTALL)
print(match_obj.group(1))

Or:

r = re.compile(r'\{\{\{(.*)\}\}\}', flags=re.DOTALL)
print(re.split(r, text))
# replace \n
split_list = re.split(r, text)
split_list = [l.replace('\n', '') for l in split_list]
print(split_list)

Or:

match_list = re.findall(r'\{\{\{(.*)\}\}\}', text, re.DOTALL)
match_list = [l.replace('\n', '') for l in match_list]
print(match_list)

If the given text contains more than one {{{ }}} pair, use a non-greedy match by adding '?', for example {{{(.*?)}}}
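The difference between the greedy and the non-greedy pattern is easiest to see on a text with two blocks. This is a small illustration only; the sample string is made up for the purpose and the braces are escaped for clarity.

```python
import re

text = "a {{{one\ntwo}}} b {{{three}}} c"

# Greedy: .* runs to the LAST }}} in the text, so both blocks are swallowed into one match
greedy = re.findall(r'\{\{\{(.*)\}\}\}', text, re.DOTALL)

# Non-greedy: .*? stops at the FIRST }}} after each {{{
lazy = re.findall(r'\{\{\{(.*?)\}\}\}', text, re.DOTALL)

print(greedy)  # ['one\ntwo}}} b {{{three']
print(lazy)    # ['one\ntwo', 'three']
```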

0

I think this works as a quick and simple solution for what you are trying to accomplish:

text = """First line 
Second line 
{{{ 
these three lines 
I want to process 
together 
}}} 
Last Line""" 

all_lines = text.splitlines()
final_list = []

nested = False

for line in all_lines:
    if line == "{{{":
        # Start collecting a multi-line block
        nested = True
        multiline = ""
        continue
    elif line == "}}}":
        # Block finished: store it without the leading space
        nested = False
        final_list.append(multiline.strip())
        continue

    if nested:
        multiline = multiline + " " + line
    else:
        final_list.append(line)

print(final_list)

It may not be the cleanest code ever, and I think multiline = multiline + " " + line should be replaced with a .format() call, but I hope you get the idea.

+0

Oh... I have just noticed that you also want to handle the case where "{{{" sits in the middle of other text. That will need some further work :) –

0

Keeping track of the opening {{{ and the closing }}} in a loop with an in_multi flag is straightforward:

def split_multi(s):
    lines = []
    tmp = []  # pieces of the multi-line block currently being collected
    in_multi = False
    for line in s.splitlines():
        if in_multi:
            if '}}}' in line:
                in_multi = False
                split = line.split('}}}')
                if split[0]:
                    tmp.append(split[0])
                lines.append(' '.join(tmp))
                if split[-1]:
                    lines.append(split[-1])
            else:
                tmp.append(line)
        else:
            if '{{{' in line:
                split = line.split('{{{')
                in_multi = True
                if split[0]:
                    lines.append(split[0])
                # Always reset tmp, even when there is text before the opener
                tmp = [split[-1]] if split[-1] else []
            else:
                lines.append(line)

    return lines


s1 = """First line 
Second line 
{{{ 
these three lines 
I want to process 
together 
}}} 
Last Line""" 

s2 = """First Line 
Second line 
Third{{{line 
fourth line 
fifth}}}line 
sixth line""" 

print(split_multi(s1)) 
print(split_multi(s2)) 
#['First Line', 'Second line', 'Third', 'line fourth line fifth', 'line', 'sixth line'] 

Output:

['First line', 'Second line', 'these three lines I want to process together', 'Last Line'] 
['First Line', 'Second line', 'Third', 'line fourth line fifth', 'line', 'sixth line'] 
2

Using a regular expression seems like a sensible solution, since it gives you the flexibility to handle both of your input variants:

import re 

only_line = '''First line 
Second line 
{{{ 
these three lines 
I want to process 
together 
}}} 
Last Line''' 

mixed_line = '''First Line 
Second line 
Third{{{line 
fourth line 
fifth}}}line 
sixth line''' 

def curly_brackets(input_string): 
    # regex - we want to match text before the brackets, text in the brackets, and text after the brackets as three groups
    separate = list(re.findall(r'(.*)\{{3}(.*)\}{3}(.*)', input_string, re.DOTALL)[0])

    # item at index 1 will be the value between brackets - replace newlines with spaces
    separate[1] = separate[1].replace('\n', ' ')

    # split according to new lines - there will be none in our bracketed section 
    separate = [x.strip().split('\n') for x in separate] 

    # flatten the lists down - each element of separate is currently a list 
    return [x for sublist in separate for x in sublist] 

print(curly_brackets(only_line))
print(curly_brackets(mixed_line))

This returns:

['First line', 'Second line', 'these three lines I want to process together', 'Last Line'] 
['First Line', 'Second line', 'Third', 'line fourth line fifth', 'line', 'sixth line'] 

This will not work if you have multiple sets of curly brackets, but it could be adapted to be applied iteratively.
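One possible adaptation for several sets of brackets is sketched below. It swaps the three-group findall for re.split with a non-greedy capturing group, which is a different technique from the one in the answer. curly_brackets_multi is a hypothetical name, and the sketch assumes that blocks are balanced and not nested.

```python
import re


def curly_brackets_multi(input_string):
    # re.split keeps the text of a capturing group in its result, so items at
    # odd indices are block bodies and items at even indices are ordinary text.
    parts = re.split(r'\{{3}(.*?)\}{3}', input_string, flags=re.DOTALL)
    result = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            # inside a block: merge its lines into a single entry
            result.append(part.replace('\n', ' ').strip())
        else:
            # outside a block: one entry per non-empty line
            result.extend(line.strip() for line in part.split('\n') if line.strip())
    return result


print(curly_brackets_multi("a{{{b}}}c{{{d\ne\nf}}}g\nh"))
# ['a', 'b', 'c', 'd e f', 'g', 'h']
```

Blank lines outside the blocks are dropped here, which keeps the output in line with the two examples from the question.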

0

My 2 cents (using join):

ex1 = """First line 
Second line 
{{{ 
these three lines 
I want to process 
together 
}}} 
Last Line""" 

ex2 = """First Line 
Second line 
Third{{{line 
fourth line 
fifth}}}line 
sixth line""" 

def parse_lines(txt, start_sep='{{{', end_sep='}}}'):
    depth = 0  # 1+ if we are inside a {{{ group
               # can be used to test unbalanced constructs
    lines = []
    current_line = ''
    n = len(txt)
    i = 0
    while i < n:
        c = txt[i]
        not_handled = True
        need_to_add = False
        if c == '\n':  # end of line
            if depth == 0:  # save line and empty buffer
                need_to_add = True
            elif current_line != '':  # add a space instead of the line break
                current_line = ''.join((current_line, ' '))
            not_handled = False
            i += 1
        elif c == start_sep[0] and txt[i:i+len(start_sep)] == start_sep:
            # ^ takes small advantage of lazy evaluation
            # (see questions/13960657)
            depth += 1
            need_to_add = True
            not_handled = False
            i += len(start_sep)
        elif c == end_sep[0] and txt[i:i+len(end_sep)] == end_sep:
            depth -= 1
            need_to_add = True
            not_handled = False
            i += len(end_sep)
        if not_handled:
            current_line = ''.join((current_line, c))
            i += 1
        elif need_to_add and current_line != '':
            lines.append(current_line)
            current_line = ''
    if current_line != '':  # add last line
        lines.append(current_line)
    return lines

Which returns:

>>> parse_lines(ex1) 
['First line', 'Second line', 'these three lines I want to process together ', 'Last Line'] 
>>> parse_lines(ex2) 
['First Line', 'Second line', 'Third', 'line fourth line fifth', 'line', 'sixth line'] 

Note the extra ' ' at the end of the merged entry in the first example, which comes from the block ending with '\n}}}'.