2015-10-09 202 views

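Convert an XML file to a CSV file using Python

I want to convert an XML file to a CSV file. I tried bash scripts with awk and xmlstarlet without luck, and now I am trying this in Python, still without luck. Below is my sample XML file.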

<items><item> 
<Name>demo title 1</Name> 
<FileType>image</FileType> 
<ReleaseDate>15 May 2015</ReleaseDate> 
<Quality> 
HDRiP</Quality> 
<size>2848292</size> 
<Rating>6.6</Rating> 
<Genre>Comedy, 
Music</Genre> 
<Cast>rules bank demo, 
anademo demo 2, 
Hai demo 3, 
Ale Demo 4</Cast> 
<Languages>English</Languages> 
<Subtitles> 
hindi</Subtitles> 
<FileName>demo title 1 fname</FileName> 
<FileSize>1.4GB</FileSize> 
<NoOfFiles>5</NoOfFiles> 
<UploadTime>4 months</UploadTime> 
<DateOfDataCapture>May 29, 2015</DateOfDataCapture> 
<TimesDownloaded>2,339</TimesDownloaded> 
<UpVotes>+742</UpVotes> 
<DownVotes>-37</DownVotes> 
<MediaType>[1080p, 720p, Blu-Ray, BDRip, HDRiP, DVD, DVDRip, x264, WEB-DL, Cam]</MediaType> 
<Summary>this is demo pics 
collected for wallpapers only it is free available on many app and urls. 

Written by 

demo1.Cdemo324.78K 

report summary</Summary> 
</item><item> 
<Name>demo title 2</Name> 
<FileType>image</FileType> 
<ReleaseDate>16 May 2015</ReleaseDate> 
<Quality> 
HDRiP</Quality> 
<size>2855292</size> 
<Rating>6.9</Rating> 
<Genre>Comedy, 
Music</Genre> 
<Cast>rules bank demo, 
anademo demo 12, 
Hai demo 13, 
Ale Demo 14</Cast> 
<Languages>English</Languages> 
<Subtitles> 
hindi</Subtitles> 
<FileName>demo title 2 fname</FileName> 
<FileSize>1.3GB</FileSize> 
<NoOfFiles>5</NoOfFiles> 
<UploadTime>4 months</UploadTime> 
<DateOfDataCapture>May 29, 2015</DateOfDataCapture> 
<TimesDownloaded>2,339</TimesDownloaded> 
<UpVotes>+742</UpVotes> 
<DownVotes>-37</DownVotes> 
<MediaType>[1080p, 720p, Blu-Ray, BDRip, HDRiP, DVD, DVDRip, x264, WEB-DL, Cam]</MediaType> 
<Summary>this is demo pics 2 
collected for wallpapers only it is free available on many app and urls. 

Written by 

demo2.C2demo324.78K 

report summary</Summary> 
</item> 
</items> 

I want to convert this into a CSV file, with each <item> record on a single line.

When I use an XML parser, the records are written to the CSV file, but my tag values span multiple lines and contain newline characters, so the CSV output is broken across lines in the same way. Below is a sample of the converted CSV file:
demo title 1,image,15 May 2015, 
HDRiP, 
2848292,6.6,Comedy, 
Music,rules bank demo, 
anademo demo 2, 
Hai demo 3, 
Ale Demo 4,English 

I want each newline character to be replaced by a space, so that all the fields of a single item are saved in one row of the CSV file.

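I tried the Python XML parser xml2csv too, but had no luck. Please suggest how I can read the XML file and replace these unnecessary newline characters with spaces.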


Please take a look at the [editing help](http://stackoverflow.com/editing-help). – Cyrus

Answers

Try something like this:

    import csv
    from lxml import etree

    # in: xml with trader joe's locations
    # out: csv with trader joe's locations

    out = input("Name for output file: ")
    # compare by value; "is" tests identity and is unreliable for strings
    if out.strip() == "":
        out = "trader-joes-all-locations.csv"

    out_data = []

    # use recover=True to ignore errors in the XML
    # examples of errors in this XML:
    #   missing "<" in opening tag:
    #     fax></fax>
    #   missing "</" in closing tag:
    #     <uid>1429860810uid>
    #
    # also ignore blank text
    parser = etree.XMLParser(recover=True, remove_blank_text=True)

    # xml on disk...could also pass etree.parse a URL
    file_name = "trader-joes-all-locations.xml"

    # use lxml to read and parse xml
    root = etree.parse(file_name, parser)

    # element names with data to keep
    tag_list = ["name", "address1", "address2", "beer", "city", "comingsoon",
                "hours", "latitude", "longitude", "phone", "postalcode",
                "spirits", "state", "wine"]

    # add field names by copying tag_list
    out_data.append(tag_list[:])

    def missing_location(p):
        # a poi is unusable when either coordinate element is absent
        return p.find("latitude") is None or p.find("longitude") is None

    # pull info out of each poi node
    def get_poi_info(p):
        # if latitude or longitude doesn't exist, skip
        if missing_location(p):
            print("\tMissing location for %s" % p.findtext("name"))
            return None
        info = []
        for tag in tag_list:
            node = p.find(tag)
            if node is not None and node.text:
                if tag == "latitude" or tag == "longitude":
                    info.append(round(float(node.text), 5))
                else:
                    info.append(node.text)
            else:
                info.append("")
        return info

    print("\nreading xml...")

    # get all <poi> elements
    pois = root.findall(".//poi")
    for p in pois:
        poi_info = get_poi_info(p)
        if poi_info:
            out_data.append(poi_info)

    print("finished xml, writing file...")

    # newline="" lets the csv module control line endings (Python 3)
    with open(out, "w", newline="", encoding="utf-8") as out_file:
        csv_writer = csv.writer(out_file, quoting=csv.QUOTE_MINIMAL)
        for row in out_data:
            csv_writer.writerow(row)

    print("wrote %s\n" % out)
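
The script above was written for a different data set, so here is a minimal sketch of the same approach for the `<items>`/`<item>` XML in the question, using only the standard library. The helper names `collapse_whitespace`, `items_to_rows` and `rows_to_csv` are made up for this illustration, and the sample is a shortened version of the question's data. It assumes the XML is well formed and that every field is a direct child of `<item>`. The key step is `" ".join(text.split())`, which replaces every run of whitespace, including newlines, with a single space.

```python
import csv
import io
import xml.etree.ElementTree as ET


def collapse_whitespace(text):
    """Replace every run of whitespace (including newlines) with one space."""
    return " ".join((text or "").split())


def items_to_rows(xml_text, item_tag="item"):
    """Return a header row followed by one row per <item> element."""
    root = ET.fromstring(xml_text)
    items = root.findall(".//" + item_tag)
    # Column order: child tags in order of first appearance across all items.
    header = []
    for item in items:
        for child in item:
            if child.tag not in header:
                header.append(child.tag)
    rows = [header]
    for item in items:
        # findtext returns None for a missing tag; collapse_whitespace maps that to "".
        rows.append([collapse_whitespace(item.findtext(tag)) for tag in header])
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text; fields containing commas are quoted."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Shortened sample based on the XML in the question (assumption: well-formed input).
sample = """<items><item>
<Name>demo title 1</Name>
<Quality>
HDRiP</Quality>
<Genre>Comedy,
Music</Genre>
</item><item>
<Name>demo title 2</Name>
<Quality>
HDRiP</Quality>
<Genre>Comedy,
Music</Genre>
</item>
</items>"""

rows = items_to_rows(sample)
csv_text = rows_to_csv(rows)
print(csv_text)
```

To write a file instead of a string, the same rows can be passed to `csv.writer` on a file opened with `open("items.csv", "w", newline="")`, where `items.csv` is a placeholder name. Because the `csv` module quotes any field that contains a comma, a value such as the Genre list stays in a single column.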