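Using Python, I have to write a script that basically "cleans" a text file of data. So far I have removed all the unwanted characters or replaced them with acceptable ones (for example, a dash - can be replaced with a space). Now I have reached the point where I have to separate words that are joined together. Here is a snippet of the first 15 lines of the text file, where the joined words need to be split at their capital letters: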
AccessibleComputing Computer accessibility
AfghanistanHistory History of Afghanistan
AfghanistanGeography Geography of Afghanistan
AfghanistanPeople Demographics of Afghanistan
AfghanistanCommunications Communications in Afghanistan
AfghanistanMilitary Afghan Armed Forces
AfghanistanTransportations Transport in Afghanistan
AfghanistanTransnationalIssues Foreign relations of Afghanistan
AssistiveTechnology Assistive technology
AmoeboidTaxa Amoeba
AsWeMayThink As We May Think
AlbaniaHistory History of Albania
AlbaniaPeople Demographics of Albania
AlbaniaEconomy Economy of Albania
AlbaniaGovernment Politics of Albania
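What I want to do is separate the joined words at the point where a capital letter occurs. For example, I want the first line to look like this: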
Accessible Computing Computer accessibility
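One way to express that transformation is a regular expression substitution. The following is only a minimal sketch: the function name `split_joined_words` is made up for illustration, and the pattern assumes plain ASCII letters and that a split is wanted wherever a lowercase letter is directly followed by an uppercase one.

```python
import re

# Hypothetical helper name, chosen for illustration only.
def split_joined_words(text):
    # The pattern matches the empty position between a lowercase letter
    # and the uppercase letter that follows it, and puts a space there.
    # Assumption: ASCII letters only; runs of capitals are left untouched.
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)

print(split_joined_words("AccessibleComputing Computer accessibility"))
# Accessible Computing Computer accessibility
```

Words that are already separated are not affected, because a space is never preceded by a lowercase letter and followed directly by an uppercase one at the same position.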
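The script must accept a file as input and write the results to an output file. This is what I have at the moment, and it doesn't work at all! (I'm not sure whether I'm on the right track or not.)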
import re
input_file = open("C:\\Users\\Lucas\\Documents\\Python\\pagelinkSample_10K_cleaned2.txt",'r')
output_file = open("C:\\Users\\Lucas\\Documents\\Python\\pagelinkSample_10K_cleaned3.txt",'w')
for line in input_file:
    if line.contains('A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z'):
        newline = line.
    output_file.write(newline)
input_file.close()
output_file.close()
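What I want to do is insert a space before any capital letter that is attached to the preceding word. I saw this topic earlier, but I couldn't figure out the file input :( – lsch91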
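For the file-handling part, a possible approach is sketched below. Note that Python strings have no `contains` method, so the check in the attempt above raises an `AttributeError`; a regular expression substitution applied to every line avoids the need for a check entirely. The function name `split_file` and the `utf-8` encoding are assumptions made for this sketch, not something given in the question.

```python
import re

# Empty position between a lowercase letter and the uppercase letter after it.
BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])")

# Hypothetical helper; the default encoding is an assumption about the data.
def split_file(input_path, output_path, encoding="utf-8"):
    """Copy input_path to output_path, splitting joined words on each line."""
    # 'with' closes both files even if an error occurs part-way through.
    with open(input_path, "r", encoding=encoding) as src, \
         open(output_path, "w", encoding=encoding) as dst:
        for line in src:
            # Each line keeps its trailing newline, so none is added here.
            dst.write(BOUNDARY.sub(" ", line))
```

It could then be called with the two paths from the question, for example `split_file("C:\\Users\\Lucas\\Documents\\Python\\pagelinkSample_10K_cleaned2.txt", "C:\\Users\\Lucas\\Documents\\Python\\pagelinkSample_10K_cleaned3.txt")`.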