Remove URLs from a text file

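I need to remove all URLs from a text file. I read the file, iterate over it line by line, and write a clean file. But the code below behaves strangely: it deletes the first line of the original file and adds 3 new lines in total. Most importantly, it does not remove the URLs.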

import sys
import re

sys.stdout = open('text_clean.txt', 'w')

with open("text.txt", encoding="Latin-1") as f:
    rep = re.compile(r"""
        http[s]?://.*?\s
        |www.*?\s
        |(\n)
        """, re.X)
    non_asc = re.compile(r"[^\x00-\x7F]")
    for line in f:
        non = non_asc.search(line)
        if non:
            continue
        m = rep.search(line)
        if m:
            line = line.replace(m.group(), "")
            if line.strip():
                print(line.strip())

Source

2016-08-06 ganesa75

Why are you overriding stdout? You don't need that.
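A minimal sketch of what this comment points at: writing through an explicit file handle instead of reassigning sys.stdout. The helper names clean_lines and clean_file are hypothetical, and the simplified URL pattern is an assumption, not the asker's original regex.

```python
import re

# Simplified pattern (an assumption, not the asker's exact regex):
# a token that starts with http://, https:// or www. and runs to the next whitespace.
URL_TOKEN = re.compile(r"(?:https?://|www\.)\S+")

def clean_lines(lines):
    """Yield each line with URL tokens removed, skipping lines left empty."""
    for line in lines:
        stripped = URL_TOKEN.sub("", line).strip()
        if stripped:
            yield stripped

def clean_file(src_path, dst_path, encoding="latin-1"):
    # Write through an explicit file handle; sys.stdout is left untouched.
    with open(src_path, encoding=encoding) as src, \
         open(dst_path, "w", encoding=encoding) as dst:
        for line in clean_lines(src):
            print(line, file=dst)
```

Calling clean_file("text.txt", "text_clean.txt") would then produce the cleaned file, and both files are closed when the with block ends.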

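You can use a regular expression to replace every match with "" (an empty string), which is probably the most efficient way to do this. I used: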

import re

# Optional scheme, a dotted host name ending in a 2-4 letter TLD, then an optional path.
url_re = re.compile(r'(?:(?:http|https):\/\/)?([-a-zA-Z0-9.]{2,256}\.[a-z]{2,4})\b(?:\/[-a-zA-Z0-9@:%_\+.~#?&//=]*)?')

with open("text.txt", encoding="Latin-1") as f:
    text = url_re.sub("", f.read())

# Drop lines that are empty or whitespace-only once the URLs are gone.
text = '\n'.join(a for a in text.split("\n") if a.strip())

with open('text_clean.txt', 'w') as new_file:
    new_file.write(text)

Test input, for example:

asdas 
d 
asd 
asd 
https://www.google.com 
http://facebook.com 
facebook.com 
google.com 
dasd.asdasd.asd //this is url too ?

Output:

asdas 
d 
asd 
asd 
//this is url too ?
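To check the substitution against the sample above without touching the file system, it can be wrapped in a small function. This is only a sketch: strip_urls is a hypothetical name, and the path character class is assumed to be the commonly seen [-a-zA-Z0-9@:%_+.~#?&/=] form.

```python
import re

URL_RE = re.compile(
    r"(?:(?:http|https)://)?"               # optional scheme
    r"[-a-zA-Z0-9.]{2,256}\.[a-z]{2,4}\b"   # dotted host ending in a 2-4 letter TLD
    r"(?:/[-a-zA-Z0-9@:%_+.~#?&/=]*)?"      # optional path and query
)

def strip_urls(text):
    """Remove URL-like substrings, then drop lines that are left blank."""
    cleaned = URL_RE.sub("", text)
    return "\n".join(line for line in cleaned.split("\n") if line.strip())

sample = (
    "asdas\nd\nasd\nasd\n"
    "https://www.google.com\nhttp://facebook.com\n"
    "facebook.com\ngoogle.com\n"
    "dasd.asdasd.asd //this is url too ?"
)
print(strip_urls(sample))
```

Because the pattern treats anything shaped like name.tld as a URL, dasd.asdasd.asd is removed as well, and the space that followed it stays at the start of the last line.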

Source

2016-08-06 12:53:04

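It behaves exactly as described before: it deletes the first line, adds 2 lines to the total line count of the text file, and does not remove any http URLs. – ganesa75

Can you give us an example of the input file?

I've edited the answer, take a look now :)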
