You're close :) Give this a shot:
import re

for each_line in txtFreeForm:
    # The negative lookahead rejects "add roth" when "in-plan" follows immediately.
    match = re.search(r'add roth (?!in[-]plan)', each_line.lower())
    if match is not None:
        print(each_line[match.end():])
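To see what the negative lookahead does on the two sample strings from the question, here is a minimal self-contained sketch. The helper name `remainder_after_add_roth` is hypothetical and only wraps the loop body above so the result can be inspected.

```python
import re

# Hypothetical helper name, used for illustration only.
def remainder_after_add_roth(line):
    """Return the text after 'add roth ' unless 'in-plan' follows immediately."""
    match = re.search(r'add roth (?!in[-]plan)', line.lower())
    return line[match.end():] if match else None

lines = ['Add roth Sweep non vested money after 5 years of termination',
         'Add roth in-plan to the 401k plan.']

for line in lines:
    # The first line yields its remainder; the second yields None.
    print(repr(remainder_after_add_roth(line)))
```

The search runs on the lowercased copy, but the slice is taken from the original line, so the printed remainder keeps its original capitalization.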
Edit: Ahh, I misunderstood... you have a LOT of these. This calls for some more aggressive magic.
import re
from functools import partial, reduce

txtFreeForm = ['Add roth Sweep non vested money after 5 years of termination',
               'Add roth in-plan to the 401k plan.']

def roths(rows):
    # Yield (row, text after "add roth") for every row that contains "add roth".
    for row in rows:
        match = re.search(r'add roth\s*', row.lower())
        if match:
            yield row, row[match.end():]

def filter_pattern(pattern):
    # Bind the pattern so the resulting stage only needs the rows.
    return partial(lazy_filter_out, pattern)

def lazy_filter_out(pattern, rows):
    # Drop rows whose remainder starts with the unwanted pattern.
    for row, rest in rows:
        if not re.match(pattern, rest, re.IGNORECASE):
            yield row, rest

def magical_transducer(bad_words, nice_rows):
    # Chain roths and one filter per bad word; each stage consumes the previous one.
    # In Python 3, map returns an iterator, so it is wrapped in list() before concatenation.
    magical_sentences = reduce(lambda x, y: y(x), [roths] + list(map(filter_pattern, bad_words)), nice_rows)
    for row, _ in magical_sentences:
        yield row

def main():
    magic = magical_transducer(['in[-]plan'], txtFreeForm)
    print(list(magic))

if __name__ == '__main__':
    main()
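To explain a little about what is happening here: you mentioned that you have a LOT of these words to process. The traditional way to compare two groups of items is with nested for loops. So,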
results = []
for word in words:
    for pattern in patterns:
        data = do_something(word, pattern)
        results.append(data)
        for item in data:
            for thing in item:
                ...  # and so on...
                     # and so forth...
I used a few different techniques to try to achieve a "flatter" implementation and avoid the nested loops. I'll do my best to describe them.
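As a rough preview of what "flat" can look like, the sketch below chains two generator stages instead of nesting loops, with `functools.partial` binding the extra argument of one stage. The names `keep_longer_than`, `upper`, and `longer_than_3` are hypothetical toy examples, not part of the answer's code.

```python
from functools import partial

def keep_longer_than(min_len, words):
    # Generator stage: lazily pass along only words longer than min_len.
    for w in words:
        if len(w) > min_len:
            yield w

def upper(words):
    # Generator stage: lazily upper-case each word.
    for w in words:
        yield w.upper()

# partial fixes min_len, leaving a stage that only needs the words.
longer_than_3 = partial(keep_longer_than, 3)

result = list(upper(longer_than_3(['add', 'roth', 'in', 'plan', 'sweep'])))
print(result)  # ['ROTH', 'PLAN', 'SWEEP']
```

Each stage is a single loop over its input, and nothing is computed until `list` pulls items through the chain.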
**Function compositions**
# You will often see patterns that look like this:
x = foo(a)
y = bar(x)
z = baz(y)
# You may also see patterns that look like this:
z = baz(bar(foo(a)))
# an alternative way to do this is to use functional composition
# (in Python 3, reduce must be imported first: from functools import reduce)
# the technique works like this:
z = reduce(lambda x, y: y(x), [foo, bar, baz], a)
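As a concrete sketch of the same idea, the snippet below wraps the `reduce` call in a small helper and checks it against the nested-call form. The names `compose_pipeline`, `add_one`, `double`, and `to_text` are hypothetical stand-ins for `foo`, `bar`, and `baz`.

```python
from functools import reduce

def compose_pipeline(functions, value):
    """Apply each function in order, feeding each result into the next."""
    return reduce(lambda acc, func: func(acc), functions, value)

def add_one(n):
    return n + 1

def double(n):
    return n * 2

def to_text(n):
    return 'result: {}'.format(n)

# Nested calls read inside-out; the pipeline reads left to right.
nested = to_text(double(add_one(3)))
composed = compose_pipeline([add_one, double, to_text], 3)

print(nested)    # result: 8
print(composed)  # result: 8
```

Because the stages live in a plain list, stages can be added or removed at runtime, which is what `magical_transducer` relies on when it builds one filter per pattern.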
Why shouldn't both strings be returned, if they both contain "Add roth"?
`if 'add roth' in each_line.lower(): ...` is a cheaper way to solve this problem. No need for `re`. – DyZ
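I agree that `in` is a cheaper approach. @AndreiSavin I know it will return both if they are found in the text. But I am looking for a way to distinguish sentences that contain only 'add roth' from those that contain 'add roth in plan'.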