Replace tweet URLs with a string

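I am trying to remove all URLs from a large CSV file and replace each one with the string "URL" (a so-called equivalence token). The code does what I want, but it merges some lines into a single line.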

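As a result, the original CSV has 63,000 rows but the output CSV has only 55,000, which is not what I want. How can I replace the links with this token and keep all the columns separate?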

```python
# Links are replaced with the token "URL"
import re

with open('data_feat1.csv', "r", encoding="utf-8") as oldfile2, open('data_feat2.csv', 'w', encoding="utf-8") as newfile2:
    for line in oldfile2:
        line = re.sub(r"http\S+", r"URL", line)  # replaces links with "URL"
        newfile2.write(line)
```
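To see why rows go missing, here is a small sketch on made-up data (the sample rows are hypothetical, not taken from the question's file). When a URL ends a quoted field, `\S+` also consumes the closing quote, and a CSV parser then reads the next line as part of the same, still-open field.

```python
import csv
import io
import re

# Hypothetical sample: each tweet is a quoted field that ends with a URL.
sample = (
    'id,text\n'
    '1,"nice day http://t.co/abc"\n'
    '2,"another one http://t.co/xyz"\n'
)

# Same per-line substitution as in the question.
cleaned = "".join(re.sub(r"http\S+", "URL", line)
                  for line in io.StringIO(sample))

# \S+ swallowed both closing quotes, so only the opening quotes remain.
print(cleaned)

# The unterminated quote pulls the next line into the previous row's text field.
rows = list(csv.reader(io.StringIO(cleaned)))
print(len(rows), "CSV rows from", len(sample.splitlines()), "input lines")
```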


2017-05-25 M. H.

Could you post some sample data?

The solution was to add a " after URL in the replacement string:

```python
line = re.sub(r"http\S+", r'URL"', line)  # replaces links with URL"
```

I don't know why it works, but it does!
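Why the extra `"` helps can be sketched on two made-up lines (both hypothetical). When the URL ends a quoted field, `\S+` has swallowed the closing quote and `URL"` writes it back; when the URL sits in the middle of a field, the same replacement inserts a stray quote. So, under this reading, the fix is only safe if every URL ends its field.

```python
import re

# Hypothetical CSV lines; the quoted second field holds the tweet text.
end_of_field = '1,"nice day http://t.co/abc"\n'
mid_field = '2,"see http://t.co/abc now"\n'

def patch(line):
    # The accepted fix: put the swallowed closing quote back after URL.
    return re.sub(r"http\S+", 'URL"', line)

print(patch(end_of_field))  # 1,"nice day URL"
print(patch(mid_field))     # 2,"see URL" now"   (stray quote)
```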


2017-05-25 16:38:41

It works because it uses a regular expression to search for http.

The re module handles regular expressions. re.sub replaces every match of the regular expression with its second argument (URL").
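A minimal illustration of `re.sub` with the same pattern (the input string is made up): every non-overlapping match is replaced, and a new string is returned while the original is left unchanged.

```python
import re

text = "read http://t.co/abc and https://example.com/x today"

# Each match of http\S+ (an https link included) becomes the token URL.
result = re.sub(r"http\S+", "URL", text)
print(result)  # read URL and URL today
```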

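The regular expression searches for http and everything after it. "Everything after it" is specified by \S+, which matches one or more non-whitespace characters, i.e. "everything up to the next whitespace". That includes a closing quote or comma directly after the URL.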
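Because `\S+` stops only at whitespace, it also takes any quote, comma, or following column that sits directly after the URL. The sketch below runs it on a hypothetical line and compares it with a narrower character class, `https?://[^\s",]+`. That narrower pattern is an illustrative assumption, not something proposed in the thread, and it would truncate real URLs that contain commas or quotes.

```python
import re

# Hypothetical CSV line: id, quoted tweet text ending in a URL, language column.
line = '1,"nice day http://t.co/abc",en\n'

greedy = re.findall(r"http\S+", line)
# Illustrative narrower pattern (an assumption): stop at whitespace, quotes or commas.
narrow = re.findall(r'https?://[^\s",]+', line)

print(greedy)  # ['http://t.co/abc",en']
print(narrow)  # ['http://t.co/abc']
```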

Take a look at Pythex; it is a good place to learn how to use regular expressions in Python.
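A sturdier alternative to patching the quote back, sketched here as an assumption rather than as either poster's method, is to let Python's `csv` module split each row first and run the substitution on the individual fields, so `\S+` never sees the CSV quoting. The function name `replace_urls` and the sample rows are hypothetical.

```python
import csv
import io
import re

URL_PATTERN = re.compile(r"http\S+")

def replace_urls(src, dst):
    """Copy CSV rows from src to dst, replacing URLs inside each field with URL."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        # Fields arrive already unquoted, so \S+ cannot swallow a CSV quote.
        writer.writerow([URL_PATTERN.sub("URL", field) for field in row])

# Hypothetical sample data standing in for data_feat1.csv.
sample = (
    'id,text\n'
    '1,"nice day http://t.co/abc"\n'
    '2,"see http://t.co/xyz now, ok"\n'
)
out = io.StringIO()
replace_urls(io.StringIO(sample), out)
# csv.writer re-quotes only the fields that need it (here, the one with a comma).
print(out.getvalue())
```

For the question's files, open both with `newline=""`, as the `csv` module documentation recommends (e.g. `open('data_feat1.csv', newline='', encoding='utf-8')`), and pass the two file objects to `replace_urls`.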


2017-05-26 20:48:55 MattR
