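I have patent data in XML that I want to store in a CSV file. My code runs correctly through every iteration of invention name, date, country, and patent number, but something goes wrong when I try to write the results to the CSV file: the CSV writer only writes the first row.
The XML data looks like this (one section of many):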
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE us-patent-grant SYSTEM "us-patent-grant-v42-2006-08-23.dtd" [ ]>
<us-patent-grant lang="EN" dtd-version="v4.2 2006-08-23" file="USD0584026-20090106.XML" status="PRODUCTION" id="us-patent-grant" country="US" date-produced="20081222" date-publ="20090106">
<us-bibliographic-data-grant>
<publication-reference>
<document-id>
<country>US</country>
<doc-number>D0584026</doc-number>
<kind>S1</kind>
<date>20090106</date>
</document-id>
</publication-reference>
The code I use to run through the data in a single pass and write these rows is:
for xml_string in separated_xml(infile):  # Calls the output of the separated and read file to parse the data
    soup = BeautifulSoup(xml_string, "lxml")  # BeautifulSoup parses the data strings where the XML is converted to Unicode
    pub_ref = soup.findAll("publication-reference")  # Beginning parsing at every instance of a publication
    lst = []  # Creating empty list to append into
    for info in pub_ref:  # Looping over all instances of publication
        # The final loop finds every instance of invention name, patent number, date, and country to print and append into
        with open('./output.csv', 'wb') as f:
            writer = csv.writer(f, dialect='excel')
            for inv_name, pat_num, date_num, country in zip(soup.findAll("invention-title"), soup.findAll("doc-number"), soup.findAll("date"), soup.findAll("country")):
                #print(inv_name.text, pat_num.text, date_num.text, country.text)
                #lst.append((inv_name.text, pat_num.text, date_num.text, country.text))
                writer.writerow([inv_name.text, pat_num.text, date_num.text, country.text])
In the end, the output in my .csv file is this:
"Content addressable information encapsulation, representation, and transfer",07475432,20090106,US
I'm not sure where the problem lies. I know I'm still new to Python, but can anyone spot the issue?
open('./output.csv', 'ab+')
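The single row is most likely caused by open('./output.csv', 'wb') running inside the loops: write mode truncates the file each time it is opened, so only the output of the last pass survives. Append mode avoids the truncation, but it also keeps adding rows every time the script is rerun. Opening the file once, before the loop, is usually the simpler option. Below is a minimal sketch of that pattern, under several assumptions: it is written for Python 3 (text mode with newline='' instead of 'wb'), it uses the standard library's xml.etree.ElementTree rather than BeautifulSoup so that it runs on its own, SAMPLE_DOCS is a hypothetical stand-in for what separated_xml(infile) yields, and extract_row and write_rows are illustrative names. The title in the first sample document is a placeholder, not real data.

```python
import csv
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the strings yielded by separated_xml(infile).
# The first invention title is a placeholder; the second comes from the
# output row shown in the question.
SAMPLE_DOCS = [
    "<us-patent-grant><us-bibliographic-data-grant>"
    "<publication-reference><document-id>"
    "<country>US</country><doc-number>D0584026</doc-number>"
    "<kind>S1</kind><date>20090106</date>"
    "</document-id></publication-reference>"
    "<invention-title>Placeholder title</invention-title>"
    "</us-bibliographic-data-grant></us-patent-grant>",
    "<us-patent-grant><us-bibliographic-data-grant>"
    "<publication-reference><document-id>"
    "<country>US</country><doc-number>07475432</doc-number>"
    "<date>20090106</date>"
    "</document-id></publication-reference>"
    "<invention-title>Content addressable information encapsulation, "
    "representation, and transfer</invention-title>"
    "</us-bibliographic-data-grant></us-patent-grant>",
]


def extract_row(xml_string):
    """Return [title, patent number, date, country] for one patent document.

    Assumes each document contains a publication-reference/document-id
    element, as in the sample in the question.
    """
    root = ET.fromstring(xml_string)
    doc_id = root.find(".//publication-reference/document-id")
    return [
        root.findtext(".//invention-title", default=""),
        doc_id.findtext("doc-number", default=""),
        doc_id.findtext("date", default=""),
        doc_id.findtext("country", default=""),
    ]


def write_rows(xml_strings, path):
    """Write one CSV row per document, opening the file only once."""
    # The file is opened BEFORE the loop, so it is truncated a single time
    # and every writerow call adds to the same open file.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, dialect="excel")
        for xml_string in xml_strings:
            writer.writerow(extract_row(xml_string))


if __name__ == "__main__":
    write_rows(SAMPLE_DOCS, "./output.csv")
```

Reading the fields from the document-id under publication-reference, instead of zipping every doc-number, date, and country tag in the whole document, also keeps each row tied to the publication itself, since a patent file can contain other elements with the same tag names.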