2013-04-15

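Reading a web-based XML file and parsing it in Python

I'm very new to Python and GAE, but I'm trying to download an XML file from the eventful.com API, parse it, and then store the information in a database on Google Cloud SQL.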

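My code so far is below. I managed to write it after looking at various online tutorials, but I keep getting lots of errors and the code doesn't work for me at all. If anyone has any pointers on where I'm going wrong, please let me know. Karen.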

Here is my attempt at requesting the Eventful XML file and parsing it:

import webapp2 
from google.appengine.ext.webapp import template 
import os 
import datetime 
from google.appengine.ext import db 
from google.appengine.api import urlfetch 
import urllib #import python library which does http requests 
from xml.dom import parseString #imports xml parser called minidom 

class XMLParser(webapp2.RequestHandler): 
    def get(self): 
     base_url = fetch('http://api.eventful.com/rest/events/search?app_key=zGtDX6cwQ=dublin&?q=music') 
     #downloads data from xml file 
     response = urllib.urlopen(base_url) 
     #converts data to string: 
     data = response.read() 
     #closes file 
     response.close() 
     #parses xml downloaded 
     dom = parseString(data) 
     #retrieves the first xml tag that the parser finds with name tag 
     xmlTag = dom.getElementsByTagName('title')[0].toxml() 
     #strip off the tag to just reveal event name 
     xmlData = xmlTag.replace('<title>', '').replace('</title>', '') 
     #print out the xml tag and data in this format: 
     print xmlTag 
     #just print the data 
     print xmlData 
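Independently of the import error shown in the traceback, two other things in this first version would fail: `fetch` is never defined (only `urlfetch` is imported), and the query string is assembled by hand, which is how a stray `&?q=music` (a parameter literally named `?q`) can slip in. Building the query from a list of pairs avoids that. A minimal sketch, where `YOUR_APP_KEY` is a placeholder and `search_url` is a hypothetical helper; the parameter names `app_key`, `l` and `q` are taken from the URL in the question:

```python
# Build the Eventful search URL from (name, value) pairs instead of by hand.
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2.7, as on the 2013 App Engine runtime

EVENTFUL_SEARCH = 'http://api.eventful.com/rest/events/search'


def search_url(app_key, location, keywords):
    """Return the search URL; urlencode adds the '&' separators and escapes values."""
    params = [('app_key', app_key), ('l', location), ('q', keywords)]  # list keeps order stable
    return EVENTFUL_SEARCH + '?' + urlencode(params)


print(search_url('YOUR_APP_KEY', 'dublin', 'music'))
```

On App Engine the resulting string can be passed to `urlfetch.fetch(url)` and the body read from the result's `.content`, or to `urllib.urlopen(url)` as in the question.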

However, I get the following errors when I run this code using the Google App Engine Launcher:

2013-04-15 16:52:05 Running command: "['C:\\Python27\\python.exe', 'C:\\Program Files  (x86)\\Google\\google_appengine\\dev_appserver.py', '--skip_sdk_update_check=yes', '--port=8080', '--admin_port=8002', u'C:\\Users\\Karen\\Desktop\\Development\\own_tada']" 
INFO  2013-04-15 16:52:17,944 devappserver2.py:498] Skipping SDK update check. 
WARNING 2013-04-15 16:52:18,005 api_server.py:328] Could not initialize images API;  you are likely missing the Python "PIL" module. 
INFO  2013-04-15 16:52:18,065 api_server.py:152] Starting API server at:  http://localhost:54619 
INFO  2013-04-15 16:52:18,085 dispatcher.py:150] Starting server "default" running  at: http://localhost:8080 
INFO  2013-04-15 16:52:18,095 admin_server.py:117] Starting admin server at: http://localhost:8002 
ERROR 2013-04-15 15:52:35,767 wsgi.py:219] 
Traceback (most recent call last): 
    File "C:\Program Files (x86)\Google\google_appengine\google\appengine\runtime\wsgi.py", line 196, in Handle 
    handler = _config_handle.add_wsgi_middleware(self._LoadHandler()) 
    File "C:\Program Files (x86)\Google\google_appengine\google\appengine\runtime\wsgi.py", line 255, in _LoadHandler 
    handler = __import__(path[0]) 
    File "C:\Users\Karen\Desktop\Development\own_tada\own.py", line 8, in <module> 
    from xml.dom import parseString #imports xml parser called minidom 
ImportError: cannot import name parseString 
INFO  2013-04-15 16:52:35,822 server.py:561] default: "GET/HTTP/1.1" 500 - 
ERROR 2013-04-15 15:52:37,586 wsgi.py:219] 
Traceback (most recent call last): 
    File "C:\Program Files (x86)\Google\google_appengine\google\appengine\runtime\wsgi.py", line 196, in Handle 
    handler = _config_handle.add_wsgi_middleware(self._LoadHandler()) 
    File "C:\Program Files (x86)\Google\google_appengine\google\appengine\runtime\wsgi.py", line 255, in _LoadHandler 
    handler = __import__(path[0]) 
    File "C:\Users\Karen\Desktop\Development\own_tada\own.py", line 8, in <module> 
    from xml.dom import parseString #imports xml parser called minidom 
ImportError: cannot import name parseString 
INFO  2013-04-15 16:52:37,617 server.py:561] default: "GET /favicon.ico HTTP/1.1" 500 - 
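The traceback points at the import line itself: the `xml.dom` package does not define `parseString`; the function lives in the `xml.dom.minidom` submodule. A small sketch that reproduces the failing import and then uses the working one, with a hand-written one-element document standing in for the Eventful response:

```python
# Reproduce the ImportError from the traceback, then use the working import.
try:
    from xml.dom import parseString  # the original line: xml.dom has no such name
    package_import_ok = True
except ImportError:
    package_import_ok = False

from xml.dom.minidom import parseString  # parseString is defined in the minidom submodule

# Hypothetical stand-in for the downloaded XML
dom = parseString('<title>Live Music Night</title>')
print(dom.documentElement.toxml())
```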

One of the tutorials I used for the code above is at the following URL: http://www.travisglines.com/web-coding/python-xml-parser-tutorial

Edit:

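Thanks to Josh's help, I no longer get any errors when I launch my code, but all I see is a blank screen, whereas I want it to print out the parsed information (or however far it has got). I know this may seem like a really silly question, but I really am a beginner, so I apologise! The fixed code (minus the errors) is: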

import webapp2 
from google.appengine.ext.webapp import template 
import os 
import datetime 
from google.appengine.ext import db 
from google.appengine.api import urlfetch 
import urllib #import python library which does http requests 
import xml.dom.minidom as mdom #imports xml parser called minidom 

class XMLParser(webapp2.RequestHandler):
    def get(self):
        base_url = 'http://api.eventful.com/rest/events/search?app_key=zGtDX6cwQjCRdkf6&l=dublin&q=music'
        #downloads data from xml file
        response = urllib.urlopen(base_url)
        #reads the response body into a string:
        data = response.read()
        #closes the connection
        response.close()
        #parses the downloaded xml
        dom = mdom.parseString(data)
        #retrieves the first <title> element that the parser finds
        xmlTag = dom.getElementsByTagName('title')[0].toxml()
        #strip off the tags to reveal just the event name
        xmlData = xmlTag.replace('<title>', '').replace('</title>', '')
        #print out the xml tag and data in this format:
        print xmlTag
        #just print the data
        print xmlData

app = webapp2.WSGIApplication([('/', XMLParser)],
                              debug=True)
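A likely reason for the blank page: in a webapp2 handler, `print` does not write to the page the browser receives; the page body comes from the response object (in webapp2, `self.response.write(...)`), so printed output at best shows up in the launcher's log console. A framework-free sketch of the same idea as a plain WSGI callable, assuming a hard-coded sample document instead of the live Eventful request so it runs offline; `titles_app` and `SAMPLE_XML` are hypothetical names:

```python
from xml.dom.minidom import parseString

# Hypothetical offline stand-in for the Eventful search response
SAMPLE_XML = (
    '<search><events>'
    '<event><title>Jazz at the Cobblestone</title></event>'
    '<event><title>Trad Session</title></event>'
    '</events></search>'
)


def titles_app(environ, start_response):
    """Minimal WSGI app: parse the XML and send the titles back in the HTTP response."""
    dom = parseString(SAMPLE_XML)
    titles = [node.firstChild.data for node in dom.getElementsByTagName('title')]
    body = '\n'.join(titles).encode('utf-8')  # WSGI response bodies must be bytes
    start_response('200 OK', [('Content-Type', 'text/plain; charset=utf-8'),
                              ('Content-Length', str(len(body)))])
    return [body]
```

To try it locally, `wsgiref.simple_server.make_server('localhost', 8080, titles_app).serve_forever()` serves it on port 8080. In the webapp2 handler from the question, the equivalent change is `self.response.write(xmlData)` in place of `print xmlData`.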

Any guidance on what to do next would be greatly appreciated, as would any pointers to things that are wrong in my Python code. Thanks!
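The stated end goal is storing the parsed events in Google Cloud SQL. A minimal sketch of that step, using the standard library's in-memory `sqlite3` purely as a local stand-in for Cloud SQL (which is MySQL); the `events` table, its columns and the sample rows are hypothetical:

```python
import sqlite3

# In-memory sqlite3 stands in for the Cloud SQL (MySQL) database. MySQL DB-API
# drivers follow the same connect/cursor/execute pattern but use %s placeholders.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE events (event_id TEXT PRIMARY KEY, title TEXT)')

# Hypothetical (id, title) pairs, as the parsing step might produce them
parsed = [('E1', 'Jazz Brunch'), ('E2', "Rory's Open Mic")]

# Parameterised inserts: the driver escapes values, so quotes in titles are safe
conn.executemany('INSERT INTO events (event_id, title) VALUES (?, ?)', parsed)
conn.commit()

rows = conn.execute('SELECT event_id, title FROM events ORDER BY event_id').fetchall()
print(rows)
```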

Answers


This should fix your import:

import xml.dom.minidom as mdom 

and parse the raw string with:

dom = mdom.parseString(data) 
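`parseString` accepts the raw, undecoded bytes that `response.read()` returns and gives back a `Document`; its `documentElement` is the outermost element. A short sketch with a hypothetical two-element response in place of the real Eventful data (the `total_items` element name is assumed for illustration):

```python
import xml.dom.minidom as mdom

# Stand-in for the raw bytes returned by response.read() (hypothetical sample)
data = b'<?xml version="1.0" encoding="UTF-8"?><search><total_items>2</total_items></search>'

dom = mdom.parseString(data)   # parse the bytes directly; the XML declaration gives the encoding
root = dom.documentElement     # the outermost element, <search>
total = root.getElementsByTagName('total_items')[0].firstChild.data

print(root.tagName)
print(total)
```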

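As for what comes next, you'll want to look at the childNodes and data of what parseString returns. For example, this shows the structure once the document has been parsed:

for element in dom.getElementsByTagName('title')[0].childNodes: 
    print element.data 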
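Building on the loop above, a sketch that reads every event's title through its `childNodes` (note the capital N) instead of stripping tags from `toxml()` with `replace`. The sample XML and the `text_of` helper are hypothetical, and the element names (`event`, `title`, the `id` attribute) are assumed to match the Eventful response:

```python
import xml.dom.minidom as mdom

# Hypothetical sample; a real Eventful response carries many more fields per <event>
data = ('<search><events>'
        '<event id="E1"><title>Rock &amp; Roll Night</title></event>'
        '<event id="E2"><title>Folk Club</title></event>'
        '</events></search>')

dom = mdom.parseString(data)


def text_of(element):
    """Join the data of the element's text/CDATA children. Unlike toxml() plus
    str.replace, this yields decoded text, e.g. '&amp;' becomes '&'."""
    return ''.join(node.data for node in element.childNodes
                   if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE))


for event in dom.getElementsByTagName('event'):
    title = event.getElementsByTagName('title')[0]
    print(event.getAttribute('id') + ': ' + text_of(title))
```

With the `replace` approach from the question, the first title would come out as `Rock &amp; Roll Night`, because `toxml()` re-escapes the text.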


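Thanks @Josh, that seems to have fixed the errors in my code, but now I'm having trouble seeing the result when I run it in GAE (all I see is a blank screen). Sorry if this seems like a really silly question; I should say again that I'm a complete beginner! I just want to see how it parses the XML file. I'll update my code in the main question above (it's now minus the errors)! – Karen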


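Hi @Josh, I realised that the reason I was seeing a blank screen was that I had to check the error console for my results (sorry again for the silly question). I'm now getting new errors about the encoding of text returned in the XML (it contains star characters). I've opened a new Stack Overflow question for that, but I've accepted your answer as the solution to the question I asked here! Thanks. New question: http://stackoverflow.com/questions/16026594/unicode-encoding-errors-python-parsing-xml-cant-encode-a-character-star?noredirect=1#comment22863206_16026594 – Karen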


App Engine supports lxml, which makes this very simple: include it and parse your document.

In your app.yaml file:

libraries: 
- name: lxml 
  version: latest 

Then import lxml and follow its parsing instructions.
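A sketch of that parsing step with `lxml.etree`. Because lxml may not be installed outside App Engine, it falls back to the standard library's `xml.etree.ElementTree`, which supports the same basic calls used here (`fromstring`, `findall`, `findtext`, `get`); the sample bytes and the `event`/`title` element names are assumptions for illustration:

```python
# lxml is what the app.yaml entry enables on App Engine; the stdlib
# ElementTree fallback covers the same basic calls used below.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Hypothetical sample standing in for the Eventful response bytes
data = (b'<search><events>'
        b'<event id="E1"><title>Jazz Brunch</title></event>'
        b'<event id="E2"><title>Open Mic</title></event>'
        b'</events></search>')

root = etree.fromstring(data)  # parse the raw bytes into the root <search> element

# './/event' is an ElementPath expression: every <event> at any depth
events = [{'id': ev.get('id'), 'title': ev.findtext('title')}
          for ev in root.findall('.//event')]

for event in events:
    print(event['id'] + ': ' + event['title'])
```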