2017-09-16 14 views
1

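Regex to get 2 strings and put them into a CSV

I have the following in a text file, and I need to get the DataSourceName and FileName values out into a simple CSV format.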

Data structure

<DataSourceDefinitionSet>
  <TABFileDataSourceDefinition id="id1" readOnly="false">
    <DataSourceName>AirportLayout</DataSourceName>
    <FileName>\\GIS\GIS\Corporate Services\Information Services\AirportLayout.TAB</FileName>
  </TABFileDataSourceDefinition>
  <TABFileDataSourceDefinition id="id2" readOnly="false">
    <DataSourceName>Asset_Toilets</DataSourceName>
    <FileName>\\gis\gis\CITY WORKS\Infrastructure Management\Asset_Toilets.TAB</FileName>
  </TABFileDataSourceDefinition>
  <TABFileDataSourceDefinition id="id3" readOnly="false">
    <DataSourceName>BaseLayer_Text</DataSourceName>
    <FileName>\\GIS\GIS\Corporate Services\Information Services\BaseLayer_Text.TAB</FileName>
  </TABFileDataSourceDefinition>

Code

import re 
filename='CRC_Public_Features.mws' 
input_file = open(filename) 
count=0 
for line in input_file: 
    line = line.rstrip() 
    if re.search('<FileName>', line) : 
     line=line.replace('<Filename>','') 
     count+=1 
     print str(count)+','+line 

Output

>>> 
*** Remote Interpreter Reinitialized *** 
>>> 

1,  <FileName>\\GIS\GIS\Corporate Services\Information Services\AirportLayout.TAB</FileName> 
2,  <FileName>\\gis\gis\CITY WORKS\Infrastructure Management\Asset_Toilets.TAB</FileName> 
3, 

What I want

1,AirportLayout,\GIS\GIS\Corporate Services\Information Services\AirportLayout.TAB

I tried the following regex, but got no results.

'([^] *)'

How should I go about this? I need both lines: the DataSourceName and the FileName.

===== FINAL code used, based on the accepted answer

import re 
filename='CRC_Public_Features.mws' 
data = open(filename).read() 
count=0 
#for line in infile: 
#data=line 
values = [re.findall(first+"(.*?)"+second, data) for first, second in [("<{}>".format(b), "</{}>".format(b)) for b in ["DataSourceName","FileName"]]] 
ids = [re.search(r"\d+", i).group(0) for i in re.findall('id="(.*?)"', data)] 
final_values = [ids[0]] + [i[0] for i in values] 
DataSourceName=values[0] 
FileName=values[1] 
total=len(FileName) 
with open("Output.csv", "w") as text_file: 
     text_file.write("ID,DataSourceName,FileName,MWS\n") 
for item in FileName: 
    print str(count+1)+","+str(DataSourceName[count])+","+str(FileName[count]) 
    with open("Output.csv", "a") as text_file: 
     text_file.write(str(count+1)+","+str(DataSourceName[count])+","+str(FileName[count])+","+str(filename)+"\n") 
    count+=1 
+2

Any reason you're not using an XML parser here? –

+0

Mainly because I am trying to get more – GeorgeC

Answers

1

You can try this:

import re 
filename='CRC_Public_Features.mws' 
data = open(filename).read() 
values = [re.findall(first+"(.*?)"+second, data) for first, second in [("<{}>".format(b), "</{}>".format(b)) for b in ["DataSourceName","FileName"]]] 
ids = [re.search(r"\d+", i).group(0) for i in re.findall('id="(.*?)"', data)] 
final_values = [ids[0]] + [i[0] for i in values] 

Output:

['1', 'AirportLayout', '\\GIS\\GIS\\Corporate Services\\Information Services\\AirportLayout.TAB'] 
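
The list above holds only the first record. As a rough sketch, the same non-greedy patterns could be paired up for every record, assuming each TABFileDataSourceDefinition contains exactly one DataSourceName and one FileName. The names `sample` and `extract_rows` are hypothetical and used only for illustration:

```python
import re

# Hypothetical sample mirroring the structure shown in the question.
sample = r"""
<TABFileDataSourceDefinition id="id1" readOnly="false">
<DataSourceName>AirportLayout</DataSourceName>
<FileName>\\GIS\GIS\Corporate Services\Information Services\AirportLayout.TAB</FileName>
</TABFileDataSourceDefinition>
<TABFileDataSourceDefinition id="id2" readOnly="false">
<DataSourceName>Asset_Toilets</DataSourceName>
<FileName>\\gis\gis\CITY WORKS\Infrastructure Management\Asset_Toilets.TAB</FileName>
</TABFileDataSourceDefinition>
"""

def extract_rows(data):
    """Return (number, DataSourceName, FileName) tuples, numbered from 1."""
    names = re.findall(r"<DataSourceName>(.*?)</DataSourceName>", data)
    files = re.findall(r"<FileName>(.*?)</FileName>", data)
    # zip pairs the n-th name with the n-th file name (assumes equal counts).
    return [(i, n, f) for i, (n, f) in enumerate(zip(names, files), 1)]

rows = extract_rows(sample)
for row in rows:
    print(",".join(str(field) for field in row))
```

If a record could be missing one of the two tags, the two lists would drift out of step, which is one reason an XML parser is the safer choice.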
+0

Thanks, but where do I specify the name of the file to open? Is it 'data', i.e. should it be data = open(filename)? – GeorgeC

+0

@GeorgeC Apologies for not including that in the first place. Please see my recent edit. – Ajax1234

2

With the xml.etree.ElementTree and csv modules:

import xml.etree.ElementTree as ET, csv 

tree = ET.parse('CRC_Public_Features.mws') 
root = tree.getroot() 

with open('result.csv', 'w', newline='') as f: 
    writer = csv.writer(f, delimiter=',') 
    for i,ds in enumerate(root.findall('TABFileDataSourceDefinition'), 1): 
        writer.writerow([i, ds.find('DataSourceName').text, ds.find('FileName').text]) 

Final result.csv contents:

1,AirportLayout,\\GIS\GIS\Corporate Services\Information Services\AirportLayout.TAB 
2,Asset_Toilets,\\gis\gis\CITY WORKS\Infrastructure Management\Asset_Toilets.TAB 
3,BaseLayer_Text,\\GIS\GIS\Corporate Services\Information Services\BaseLayer_Text.TAB 
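
The question shows only an excerpt of the .mws file, so the definitions may sit deeper in the tree than directly under the root. A hedged variant of the same idea, assuming a hypothetical outer `Workspace` element and a record with a missing `FileName` (both invented here for illustration), might use `iter()` and `findtext()`:

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the definition set is wrapped in an outer element,
# and the second record has no FileName child.
sample = r"""<Workspace>
  <DataSourceDefinitionSet>
    <TABFileDataSourceDefinition id="id1" readOnly="false">
      <DataSourceName>AirportLayout</DataSourceName>
      <FileName>\\GIS\GIS\Corporate Services\Information Services\AirportLayout.TAB</FileName>
    </TABFileDataSourceDefinition>
    <TABFileDataSourceDefinition id="id2" readOnly="false">
      <DataSourceName>Asset_Toilets</DataSourceName>
    </TABFileDataSourceDefinition>
  </DataSourceDefinitionSet>
</Workspace>"""

root = ET.fromstring(sample)
rows = []
# iter() walks the whole tree, so the nesting depth does not matter.
for i, ds in enumerate(root.iter("TABFileDataSourceDefinition"), 1):
    # findtext() returns the default instead of failing on a missing child.
    name = ds.findtext("DataSourceName", default="")
    path = ds.findtext("FileName", default="")
    rows.append((i, name, path))
```

Each tuple in `rows` can then be passed to `writer.writerow` exactly as in the code above.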
+0

Good idea, but I get >>> Traceback (most recent call last): File "", line 6, in TypeError: 'newline' is an invalid keyword argument for this function >>> – GeorgeC
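
That TypeError is what Python 2's built-in open() raises, since it has no newline keyword; on Python 2 the csv module expects the file to be opened in binary mode instead. A small sketch that picks the right call for the running interpreter (the helper names `open_csv_for_writing` and `write_rows` are hypothetical):

```python
import csv
import sys

def open_csv_for_writing(path):
    """Open path for csv.writer on either Python 2 or Python 3."""
    if sys.version_info[0] >= 3:
        # Python 3: text mode; newline="" lets csv control line endings.
        return open(path, "w", newline="")
    # Python 2: open() has no newline keyword; csv expects binary mode.
    return open(path, "wb")

def write_rows(path, rows):
    """Write an iterable of row sequences to path as CSV."""
    with open_csv_for_writing(path) as f:
        csv.writer(f).writerows(rows)
```

For example, write_rows('result.csv', [[1, 'AirportLayout', 'a.TAB']]) would produce a one-line CSV file on either version.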

+1

Might as well start enumerate at 1 instead of adding one... –

+0

@JonClements, sure, edited – RomanPerekhrest