Correctly formatting output to a file

I am parsing URLs and saving them to a file. My code works fine on Windows, but on Ubuntu it adds a small "u" in front of every line.

import re 

reports = "C:\Users/_____/Desktop/Reports/" 
string = "Here is a string to test. http://www.blah.com & http://2nd.com" 
url_match = re.findall(r'(https?://[^\s]+)', string) 
print url_match 

if url_match != []: 
    with open(reports + "_URLs.txt", "a") as text_file: 
        text_file.write('{}'.format(url_match).replace(',', "\n").replace('[', '').replace(']', '').replace("'", '').replace(' ', '').__add__("\n"))

Does anyone have an idea how to fix this? Thanks.

Source

2015-11-19 BeMy Friend

How about `text_file.write('{}'.format(url_match).replace(',', "\n").replace('[', '').replace(']', '').replace("'", '').replace(' ', '').__add__("\n")[1:])` (note the `[1:]` at the end) – inspectorG4dget

`'{}'.format(url_match)` is just `url_match`. – TigerhawkT3

Also, you should use `+` rather than `.__add__()`. – TigerhawkT3

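`'{}'.format(url_match)` turns the `url_match` list into its human-readable string form, and you then use a chain of complicated string replacements to get back to the lines you want to write. Somewhere along the way you ended up with a unicode string, hence the `u`. I'm not going to guess why that happens, because the real fix is to work with the list directly: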

import re 

# reports = "C:\Users/_____/Desktop/Reports/" 
reports = "" # for test 
string = "Here is a string to test. http://www.blah.com & http://2nd.com" 
url_match = re.findall(r'(https?://[^\s]+)', string) 
print url_match 
if url_match: 
    with open(reports + "_URLs.txt", "a") as text_file: 
        for url in url_match:
            text_file.write(url + '\n')
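As an illustration only (not part of the original answer), here is a small sketch of where the `u` most likely comes from. It assumes the Ubuntu run received a unicode input string, which the thread does not confirm. Formatting a list embeds the `repr` of each item, and in Python 2 the `repr` of a unicode string carries a `u` prefix; joining the items never touches `repr`. The helper name `urls_to_text` is hypothetical, and the snippet is written to run on both Python 2 and Python 3.

```python
import re
import sys


def urls_to_text(urls):
    """Return the URLs one per line, each followed by a newline (hypothetical helper)."""
    # join() uses the strings themselves, never their repr(), so no
    # quotes, brackets or Python 2 u'' prefixes can leak into the output.
    return "".join(url + "\n" for url in urls)


# Assumption: the input arrives as a unicode string.
string = u"Here is a string to test. http://www.blah.com & http://2nd.com"
url_match = re.findall(r"(https?://[^\s]+)", string)

# Formatting the list itself embeds repr() of every item.
# On Python 2 the items appear as u'...'; on Python 3 there is no prefix.
print("{}".format(url_match))

# Joining the items gives exactly the text that should go into the file.
sys.stdout.write(urls_to_text(url_match))
```

The same string could be passed to `text_file.write(...)` in a single call, which is equivalent to the loop above.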

Source

2015-11-20 00:17:26 tdelaney

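Yes, this works... thank you! >> "and then the OP tried to hack it back with a bunch of replaces." :-) I'll get there. Thanks again –

I didn't mean to be so harsh! Sometimes, if something looks like an outright hack, it's a good idea to go back to the original data and look for a cleaner way. – tdelaney

Also, if you find you still have unicode data coming in, perhaps because the input file contains it or because you pasted from the clipboard, you may want to insert `url_match = [item.encode('ascii', 'ignore') for item in url_match]` – jeedo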
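A quick sketch of what that last suggestion does, with a made-up second URL that contains a non-ASCII character (it is not from the question). Note that `'ignore'` silently drops anything that is not ASCII, which may or may not be what you want.

```python
# Hypothetical input: the second URL contains U+00E9 (e with acute accent).
url_match = [u"http://www.blah.com", u"http://2nd.com/caf\u00e9"]

# encode() turns each unicode string into a byte string; with errors="ignore",
# characters that have no ASCII encoding are dropped rather than raising.
ascii_urls = [item.encode("ascii", "ignore") for item in url_match]

# On Python 2 the results are plain str objects; on Python 3 they are bytes.
for url in ascii_urls:
    print(repr(url))
```

If losing characters is a concern, an alternative (an assumption on my part, not something proposed in the thread) is to keep the unicode strings and open the file with an explicit encoding, for example `io.open(path, "a", encoding="utf-8")`, which is available on both Python 2 and 3.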
