2016-10-13 89 views

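Python: remove complete sentences of words from a long string

I have pasted a novel into a text file. I would like to remove all lines containing the following sentences, since they keep appearing at the top of each page (just removing their occurrences within those lines would also do):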

"Thermal Molecular Movement in , Order and Probability"

"Molecular and Ionic Interactions as the Basis for the Formation"

"Interfacial Phenomena and Membranes"

My first attempt was as follows:

mystring = file.read() 
mystring=mystring.strip("Molecular Structure of Biological Systems") 
mystring=mystring.strip("Thermal Molecular Movement in , Order and Probability") 
mystring=mystring.strip("Molecular and Ionic Interactions as the Basis for the Formation") 
mystring=mystring.strip("Interfacial Phenomena and Membranes") 

new_file=open("no_refs.txt", "w") 

new_file.write(mystring) 

file.close() 

But this had no effect on the output text file: the content is completely unchanged. I find this strange, because the following toy example works fine:

>>> "Hello this is a sentence. Please read it".strip("Please read it") 
'Hello this is a sentence.' 

Since the above did not work, I tried the following instead:

file=open("novel.txt", "r") 
mystring = file.readlines() 
for lines in mystring: 
    if "Thermal Molecular Movement in , Order and Probability" in lines: 
     mystring.replace(lines, "") 
    elif "Molecular and Ionic Interactions as the Basis for the Formation" in lines: 
     mystring.replace(lines, "") 
    elif "Interfacial Phenomena and Membranes" in lines: 
     mystring.replace(lines, "") 
    else: 
     continue 

new_file=open("no_refs.txt", "w") 

new_file.write(mystring) 
new_file.close() 
file.close() 

But with this attempt I get this error:

TypeError: expected a string or other character buffer object

Answers

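  • First, str.strip() does not search for a substring. It treats its argument as a set of characters and removes any of them from the start and the end of the string only. That explains why it seemed to work in your test, but it is not what you want here.
  • Second, you are calling replace on the list instead of on the current line (and you do not assign the result of the replacement back).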
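To make the first point concrete, here is a minimal sketch contrasting str.strip() with str.replace(). The sample strings are invented for illustration and are not taken from the novel.

```python
header = "Interfacial Phenomena and Membranes"

# strip() only trims characters from both ends, in any order.
trimmed = "abcXcba".strip("abc")
print(trimmed)

# The text below starts with "C" and ends with ".", and neither character
# occurs in the header, so strip() removes nothing at all.
text = "Chapter one. " + header + " The story begins."
unchanged = text.strip(header)
print(unchanged == text)

# replace() removes the substring wherever it occurs.
cleaned = text.replace(header, "")
print(cleaned)
```

Note that replace() leaves the surrounding whitespace in place, so two spaces remain where the header used to be.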

Here is a fixed version that successfully removes the patterns from the lines:

patterns = [
    "Thermal Molecular Movement in , Order and Probability",
    "Molecular and Ionic Interactions as the Basis for the Formation",
    "Interfacial Phenomena and Membranes",
]

with open("novel.txt", "r") as file:
    mystring = file.readlines()
    for i, line in enumerate(mystring):
        for pattern in patterns:
            if pattern in mystring[i]:
                # assign the result back into the list
                mystring[i] = mystring[i].replace(pattern, "")

    # print the processed lines
    print("".join(mystring))

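Note the enumerate construct, which lets you iterate over both the values and their indices. Iterating over the values alone would let you find the patterns, but not modify them in the original list.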
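A small sketch of that difference, using a made-up two-line list rather than the real file:

```python
lines = ["keep me\n", "HEADER text\n"]

# Rebinding the loop variable only changes the local name,
# so the list itself is left untouched.
for line in lines:
    line = line.replace("HEADER", "")
snapshot = list(lines)
print(snapshot)

# Assigning through the index replaces the element in the list.
for i, line in enumerate(lines):
    lines[i] = line.replace("HEADER", "")
print(lines)
```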

Also note the with open construct, which closes the file as soon as you leave the block.
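The closing behaviour can be checked with the file object's closed attribute. This is only a sketch: it writes a hypothetical scratch file to a temporary directory so that it does not depend on novel.txt.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as out:
    out.write("line one\nline two\n")

with open(path, "r") as file:
    lines = file.readlines()
    open_inside = file.closed   # still open inside the block

closed_after = file.closed      # closed once the block has been left
print(open_inside, closed_after)
```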

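Here is a version that removes the lines containing the patterns entirely (hang on, there is some one-liner functional programming in there):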

with open("novel.txt", "r") as file:
    mystring = file.readlines()
    pattern_list = [
        "Thermal Molecular Movement in , Order and Probability",
        "Molecular and Ionic Interactions as the Basis for the Formation",
        "Interfacial Phenomena and Membranes",
    ]
    # keep only the lines that contain none of the patterns
    mystring = "".join(filter(lambda line: all(pattern not in line for pattern in pattern_list), mystring))
    # print the processed lines
    print(mystring)

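Explanation: the list of lines is filtered according to a condition: a line is kept only if none of the unwanted patterns occurs in it.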
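The same filter can also be written as a list comprehension, which some readers find easier to follow. The following is a sketch only: drop_lines is a hypothetical helper name, and the sample lines are invented for illustration.

```python
def drop_lines(lines, patterns):
    """Return only the lines that contain none of the patterns."""
    return [line for line in lines if not any(p in line for p in patterns)]


pattern_list = [
    "Thermal Molecular Movement in , Order and Probability",
    "Molecular and Ionic Interactions as the Basis for the Formation",
    "Interfacial Phenomena and Membranes",
]

# Invented sample lines standing in for file.readlines() output.
sample = [
    "It was a dark night.\n",
    "12 Interfacial Phenomena and Membranes\n",
    "The rain fell.\n",
]

kept = drop_lines(sample, pattern_list)
print("".join(kept))
```

Because the result is still a list of lines, it could be passed directly to writelines() on a file opened for writing, for example the no_refs.txt file from the question.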


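This is great, thanks. Do you know how I would remove the whole line rather than just the pattern, e.g. "Section 3.1 "Energetics and Dynamics of Biological Systems", page 337"? I tried "mystring.pop(i)", but it gives: AttributeError: 'str' object has no attribute 'pop'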