2011-11-15 37 views
6

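Can't use lxml.etree on Google App Engine with Python 2.7

I have been trying to use html5lib with lxml on Python 2.7 in Google App Engine. However, when I run the code below, it fails with "NameError: global name 'etree' is not defined". Is it not possible to use lxml.etree on Google App Engine, or am I missing something?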

app.yaml

application: testsite
version: 1
runtime: python27
api_version: 1
threadsafe: false

handlers:
- url: /.*
  script: index.py

libraries:
- name: lxml
  version: "2.3" # I thought this would allow me to use lxml.etree

index.py

from google.appengine.ext import webapp

from testhandler import TestHandler

application = webapp.WSGIApplication([('/', TestHandler)], debug=True)

testhandler.py

import urllib2 
import html5lib 
from html5lib import treebuilders 
try:
    from lxml import etree
    print("running with lxml.etree")
except ImportError:
    try:
        # Python 2.5
        import xml.etree.cElementTree as etree
        print("running with cElementTree on Python 2.5+")
    except ImportError:
        try:
            # Python 2.5
            import xml.etree.ElementTree as etree
            print("running with ElementTree on Python 2.5+")
        except ImportError:
            try:
                # normal cElementTree install
                import cElementTree as etree
                print("running with cElementTree")
            except ImportError:
                try:
                    # normal ElementTree install
                    import elementtree.ElementTree as etree
                    print("running with ElementTree")
                except ImportError:
                    print("Failed to import ElementTree from any known place")

from google.appengine.ext import webapp 

class TestHandler(webapp.RequestHandler):
    def get(self):
        f = urllib2.urlopen("http://www.yahoo.com/").read()
        doc = html5lib.parse(f, treebuilder='lxml')
        elems = doc.xpath("//*[local-name() = 'a']")
        self.response.out.write(len(elems))

Error

running with cElementTree on Python 2.5+ 
Status: 500 Internal Server Error 
Content-Type: text/html; charset=utf-8 
Cache-Control: no-cache 
Expires: Fri, 01 Jan 1990 00:00:00 GMT 
Content-Length: 769 

<pre>Traceback (most recent call last): 
    File &quot;/usr/local/bin/google_appengine/google/appengine/ext/webapp/_webapp25.py&quot;,  line 701, in __call__ 
handler.get(*groups) 
    File &quot;/home/test/testhandler.py&quot;, line 38, in get 
    parser = html5lib.HTMLParser(tree= treebuilders.getTreeBuilder('lxml')) 
    File &quot;/home/test/html5lib/html5parser.py&quot;, line 68, in __init__ 
    self.tree = tree(namespaceHTMLElements) 
    File &quot;/home/test/html5lib/treebuilders/etree_lxml.py&quot;, line 176, in __init__ 
    builder = etree_builders.getETreeModule(etree, fullTree=fullTree) 
NameError: global name 'etree' is not defined 
</pre> 

ADD

I tried several other ways to create a doc object, with no luck. In one attempt I used from lxml.html import document_fromstring, which gave me this error:

Traceback (most recent call last): 
    File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 4143, in _HandleRequest 
    self._Dispatch(dispatcher, self.rfile, outfile, env_dict) 
    File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 4049, in _Dispatch 
    base_env_dict=env_dict) 
    File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 616, in Dispatch 
    base_env_dict=base_env_dict) 
    File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 3120, in Dispatch 
    self._module_dict) 
    File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 3024, in ExecuteCGI 
    reset_modules = exec_script(handler_path, cgi_path, hook) 
    File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 2887, in ExecuteOrImportScript 
    exec module_code in script_module.__dict__ 
    File "/home/yoo/eclipse_workspace/website_checker/src/index.py", line 5, in <module> 
    from handlers.updatecheck import UpdateCheckHandler 
    File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 1538, in Decorate 
    return func(self, *args, **kwargs) 
    File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 2503, in load_module 
    return self.FindAndLoadModule(submodule, fullname, search_path) 
    File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 1538, in Decorate 
    return func(self, *args, **kwargs) 
    File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 2375, in FindAndLoadModule 
    description) 
    File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 1538, in Decorate 
    return func(self, *args, **kwargs) 
    File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 2318, in LoadModuleRestricted 
    description) 
    File "/home/test/updatecheck.py", line 4, in <module> 
    from lxml.html import document_fromstring 
    File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 1538, in Decorate 
    return func(self, *args, **kwargs) 
    File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 2503, in load_module 
    return self.FindAndLoadModule(submodule, fullname, search_path) 
    File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 1538, in Decorate 
    return func(self, *args, **kwargs) 
    File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 2375, in FindAndLoadModule 
    description) 
    File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 1538, in Decorate 
    return func(self, *args, **kwargs) 
    File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 2318, in LoadModuleRestricted 
    description) 
    File "/usr/lib/python2.7/dist-packages/lxml/html/__init__.py", line 12, in <module> 
    from lxml import etree 
ImportError: cannot import name etree 

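Judging from the error, App Engine does not let me load the etree module for some reason. I would like to use XPath through lxml, but I can't spend much more time figuring out what is going on here, and I don't know enough Python. So I will try to make do with the 'simpletree' version instead.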

f = urllib2.urlopen("http://www.yahoo.com/").read() 
p = html5lib.HTMLParser() 
doc = p.parse(f) 
# do something with doc.childNodes 
self.response.out.write(len(doc.childNodes)) 

It is not a great approach, but at least it worked when I tested it on the live Google App Engine.
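When XPath is unavailable, the question's query //*[local-name() = 'a'] can be approximated by walking the tree and comparing each tag's local name. The sketch below uses the standard library's xml.etree.ElementTree on a small well-formed XHTML string so that it stays self-contained. The names local_name and count_by_local_name and the sample markup are illustrative assumptions, not part of html5lib or lxml.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    if isinstance(tag, str) and tag.startswith("{"):
        return tag.split("}", 1)[1]
    return tag


def count_by_local_name(root, name):
    """Count elements under root (root included) whose local name equals name."""
    return sum(1 for elem in root.iter() if local_name(elem.tag) == name)


# Hypothetical sample document; a real page would first go through an HTML parser.
SAMPLE = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<a href="/one">one</a><p><a href="/two">two</a></p>'
    "</body></html>"
)

root = ET.fromstring(SAMPLE)
print(count_by_local_name(root, "a"))  # prints 2
```

The same helper could presumably be applied to a tree produced by html5lib's ElementTree-based tree builder, though that combination has not been tested here.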

+0

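Which version of html5lib is this? In the repo, the line containing the error is no longer line 176, and I can't see any way the error could occur in the current version: either the name would be defined, or the whole thing would fail with an ImportError. – geoffspear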

+0

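Sorry for not getting back to you sooner. Judging by line 13 of html5lib/__init__.py, which reads __version__ = "0.90", I think the version is 0.90. I just installed the library via pip; maybe it is an old version? –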

+0

I got this error when I forgot to put the correct entry in app.yaml; instead of using "2.3" as the version I used "latest". – semisided1

Answers

1

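Did you install lxml locally? I had the same error before: the import failed. You can download lxml here: http://pypi.python.org/pypi/lxml/

lxml works with GAE, which is great, but for now there is really no documentation or examples for it.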
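To check whether lxml is actually importable in a given environment (the local SDK or the deployed app), one option is to attempt the import and report the ImportError message instead of swallowing it, as the fallback chain in the question does. The question's output begins with "running with cElementTree on Python 2.5+", which suggests that from lxml import etree had already failed there. The sketch below is illustrative only; probe_import is a hypothetical helper name, not part of lxml, html5lib, or App Engine.

```python
import importlib


def probe_import(module_name):
    """Try to import module_name and return (ok, detail).

    On success, detail is the module's file path (or its name for built-ins).
    On failure, detail is the ImportError message.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        return False, str(exc)
    return True, getattr(module, "__file__", None) or module_name


if __name__ == "__main__":
    # Report each candidate separately so a failing lxml import is visible.
    for name in ("lxml", "lxml.etree", "xml.etree.ElementTree"):
        ok, detail = probe_import(name)
        print("%-22s %s  %s" % (name, "OK" if ok else "FAIL", detail))
```

Running this both locally and on App Engine shows where lxml is being loaded from, or why it cannot be loaded, which the nested try/except chain hides.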

+0

Yes. I tried the original code on my local machine and it worked perfectly, but when I uploaded it to Google App Engine it gave me the error above. –

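0

Try

import lxml

at the top of your testhandler.

1

On Windows I had this problem because the python27 distribution does not include lxml. You can use the easy_install script, but then you have to compile lxml from source, which gave me trouble.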

See this post, which I found on Google Groups:

https://groups.google.com/forum/?fromgroups=#!topic/comp.lang.python/Q8YeOIbn5Ds

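But if you want to save yourself the painful effort of compiling it from source, just install a precompiled binary, such as the one available from: http://www.lfd.uci.edu/~gohlke/pythonlibs/#lxml

Download the executable from that site and run the *.exe; it installs everything that is needed.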

0

Install it with pip: pip install lxml
