我想說的是,這個地方幫助我比我還原得更多。我想對所有在過去幫助過我的人表示感謝:)。抓取一個關鍵字和Python中的關鍵字之間的文本
我想從特定的樣式消息中劃出一些文本。這是格式化這樣的:
DATA|1|TEXT1|STUFF: some random text|||||
DATA|2|TEXT1|THINGS: some random text and|||||
DATA|3|TEXT1|some more random text and stuff|||||
DATA|4|TEXT1|JUNK: crazy randomness|||||
DATA|5|TEXT1|CRAP: such random stuff I cant believe how random|||||
我有如下所示的代碼結合了文本添加單詞之間的空間,並將其添加到一個名爲「TEXT」字符串,它看起來像這樣:
STUFF: some random text THINGS: some random text and some more random text and stuff JUNK: crazy randomness CRAP: such random stuff I cant believe how random
我需要它這樣形成:
DATA|1|TEXT1|STUFF: |||||
DATA|2|TEXT1|some random text|||||
DATA|3|TEXT1|THINGS: |||||
DATA|4|TEXT1|some random text and|||||
DATA|5|TEXT1|some more random text and stuff|||||
DATA|6|TEXT1|JUNK: |||||
DATA|7|TEXT1|crazy randomness|||||
DATA|8|NEWTEXT|CRAP: |||||
DATA|9|NEWTEXT|such random stuff I cant believe how random|||||
行號很容易,我已經完成以及carraige返回。我需要的是抓住「CRAP」並將「TEXT1」部分改爲「NEWTEXT」。
我的代碼掃描尋找關鍵字字符串,然後將其添加到自己的線路,然後增加在它們下面的文本,然後在自己的行等下一個關鍵詞這裏是我的代碼,我到目前爲止有:
#this combines all text to one line and adds to a string
while current_segment.move_next('DATA')
TEXT = TEXT + " " + current_segment.field(4).value
KEYWORD_LIST = [STUFF:', THINGS:', JUNK:']
KEYWORD_LIST1 = [CRAP:']
#this splits the words up to search through
TEXT_list = TEXT.split(' ')
#this searches for the first few keywords then stops at the unwanted one
for word in TEXT_list:
if word in KEYWORD_LIST:
my_output = my_output + word
elif word in KEYWORD_LIST1:
break
else:
my_output = my_output + ' ' + word
#this searches for the unwanted keywords leaving the output blank until it reaches the wanted keyword
for word1 in TEXT_list:
if word1 in KEYWORD_LIST:
my_output1 = ''
elif word1 in KEYWORD_LIST1:
my_output1 = my_output1 + word1 + '\n'
else:
my_output1 = my_output1 + ' ' + word1
#my_output is formatted back the way I want deviding up the text into 65 or less character lines
MAX_LENGTH = 65
my_wrapped_output = wrap(my_output,MAX_LENGTH)
my_wrapped_output1 = wrap(my_output1,MAX_LENGTH)
my_output_list = my_wrapped_output.split('\n')
my_output_list1 = my_wrapped_output1.split('\n')
for phrase in my_output_list:
if phrase == "":
SetID +=1
output = output + "DATA|" + str(SetID) + "|TEXT| |||||"
else:
SetID +=1
output = output + "DATA|" + str(SetID) + "|TEXT|" + phrase + "|||||"
for phrase2 in my_output_list1:
if phrase2 == "":
SetID +=1
output = output + "DATA|" + str(SetID) + "|NEWTEXT| |||||"
else:
SetID +=1
output = output + "DATA|" + str(SetID) + "|NEWTEXT|" + phrase + "|||||"
#this populates the fields I need
value = output
然後,我在「my_output」和「my_output1」格式中添加「NEWTEXT」字樣。此代碼貫穿每行查找關鍵字,然後將該關鍵字和carraige返回。一旦它獲得另一個「KEYWORD_LIST1」它將停止並刪除其餘文本,然後開始下一個循環。我的問題是,上面的代碼給我這樣的:
DATA|1|TEXT1|STUFF: |||||
DATA|2|TEXT1|some random text|||||
DATA|3|TEXT1|THINGS: |||||
DATA|4|TEXT1|some random text and|||||
DATA|5|TEXT1|some more random text and stuff|||||
DATA|6|TEXT1|JUNK: |||||
DATA|7|TEXT1|crazy randomness|||||
DATA|8|NEWTEXT|crazy randomness|||||
DATA|9|NEWTEXT|CRAP: |||||
DATA|10|NEWTEXT|such random stuff I cant believe how random|||||
它是從「KEYWORD_LIST1」之前抓住了文本並將其添加到使用newText部分。我知道有一種方法可以根據關鍵字和文本創建組,但我不清楚如何實施它。任何幫助將非常感激。
謝謝。
這是我必須做的就是它爲我工作:
KEYWORD_LIST = ['STUFF:', 'THINGS:', 'JUNK:']
KEYWORD_LIST1 = ['CRAP:']
def text_to_message(text):
result=[]
for word in text.split():
if word in KEYWORD_LIST or word in KEYWORD_LIST1:
if result:
yield ' '.join(result)
result=[]
yield word
else:
result.append(word)
if result:
yield ' '.join(result)
def format_messages(messages):
title='TEXT1'
for message in messages:
if message in KEYWORD_LIST:
title='TEXT1'
elif message in KEYWORD_LIST1:
title='NEWTEXT'
my_wrapped_output = wrap(message,MAX_LENGTH)
my_output_list = my_wrapped_output.split('\n')
for line in my_output_list:
if line = '':
yield title + '|'
else:
yield title + '|' + line
for line in format_messages(text_to_message(TEXT)):
if line = '':
SetID +=1
output = "DATA|" + str(SetID) + "|"
else:
SetID +=1
output = "DATA|" + str(SetID) + "|" + line
#this is needed instead of print(line)
value = output
由於習慣問題,'ALL_CAPS'和正常情況下不會在變量名混合。你的'TEXT_list'可能更適合名爲'text_list'。只是一個小問題。 – brc
我可能會嘗試'csv'模塊,而不是自己做。 – Dave
brc,必須通過我的java編碼哈哈。 我更新了上述內容以提供更多詳細信息。 – Opy