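Python ImportError on App Engine using BeautifulSoup: No module named bs4
Edit 2: Solved! See the answer below about the correct import, from lib.bs4 import BeautifulSoup
rather than just from bs4 import BeautifulSoup
Edit: Putting bs4 in the root of the project seems to solve the problem; however, that is not an ideal structure. So I'm leaving this question open to try to get a more robust solution.
A variant of this question has been asked in the past, but the solutions there don't seem to work. Honestly, I'm not sure whether that's because of changes in BeautifulSoup or in App Engine.
See: Python 2.7 : How to use BeautifulSoup in Google App Engine?, How to include third party Python libraries in Google App Engine?, and Which version of BeautifulSoup works with GAE (python 2.5)?
The solution proposed by Lipis seems to be to put third-party libraries in a libs folder in the project root and then add the following to the main application: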
import sys
sys.path.insert(0, 'libs')
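For reference, here is a slightly more defensive variant of that snippet, as a minimal sketch. It resolves the folder against an explicit base directory instead of relying on the current working directory, and it avoids adding the same entry twice. The helper name add_vendor_dir is made up for illustration; it is not part of App Engine or BeautifulSoup.

```python
import os
import sys


def add_vendor_dir(dirname, base_dir):
    """Put base_dir/dirname at the front of sys.path (once) and return it.

    Hypothetical helper, for illustration only.
    """
    path = os.path.join(os.path.abspath(base_dir), dirname)
    if path not in sys.path:
        sys.path.insert(0, path)
    return path


# In main.py this would be called with the app's own directory, before
# importing any module that depends on the vendored package, e.g.:
#   add_vendor_dir('lib', os.path.dirname(os.path.abspath(__file__)))
```

Whichever form is used, the call only helps imports that run after it.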
Currently, my structure looks like this:
ntj-test
├── lib
│ └── bs4
├── templates
├── main.py
├── get_data.py
└── app.yaml
Here is my app.yaml:
application: ntj-test
version: 1
runtime: python27
api_version: 1
threadsafe: yes

handlers:
- url: /favicon\.ico
  static_files: favicon.ico
  upload: favicon\.ico

- url: .*
  script: main.app

libraries:
- name: webapp2
  version: latest
- name: jinja2
  version: latest
Here is my main.py:
import webapp2
import jinja2
import get_data
import sys
sys.path.insert(0, 'lib')

JINJA_ENVIRONMENT = jinja2.Environment(
    loader=jinja2.FileSystemLoader('templates'),
    extensions=['jinja2.ext.autoescape'],
    autoescape=True,
)


class MainHandler(webapp2.RequestHandler):
    def get(self):
        teamName = get_data.all_coach_data()[1]
        coachName = get_data.all_coach_data()[2]
        teamKey = get_data.all_coach_data()[0]
        values = {
            'coachName': coachName,
            'teamName': teamName,
            'teamKey': teamKey,
        }
        template = JINJA_ENVIRONMENT.get_template('index.html')
        self.response.write(template.render(values))


app = webapp2.WSGIApplication([
    ('/', MainHandler)
], debug=True)
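get_data.py returns the correct data for the variables I use to populate values, which I have verified in the debugger.
The problem appears when I launch main.py in my development environment (I haven't uploaded to gcloud yet). Without fail, no matter which of the tricks from the links above or from my Google searches I try, the terminal always returns:
ImportError: No module named bs4
In one of the SO links above, a commenter said that "GAE only supports pure Python modules. bs4 is not pure, because some parts are written in C." I'm not sure whether this is true, and I'm not sure how to verify it. I don't have enough reputation to comment. :(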
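One rough way to check that claim, as a sketch only: walk the vendored package folder and list any files that look compiled or like C sources. The function name find_compiled_files and the suffix list are my own choices for illustration, and an empty result is a hint rather than proof.

```python
import os

# File suffixes that usually indicate compiled extensions or C/Cython sources.
# This list is an assumption for illustration, not an official definition.
COMPILED_SUFFIXES = ('.so', '.pyd', '.dll', '.dylib', '.c', '.pyx')


def find_compiled_files(package_dir):
    """Return sorted paths (relative to package_dir) that look non-pure-Python."""
    hits = []
    for root, _dirs, files in os.walk(package_dir):
        for name in files:
            if name.lower().endswith(COMPILED_SUFFIXES):
                full = os.path.join(root, name)
                hits.append(os.path.relpath(full, package_dir))
    return sorted(hits)
```

Running find_compiled_files('lib/bs4') against the layout above would list any such files. As far as I can tell, bs4 itself ships only .py files, while optional parsers such as lxml are C extensions, which may be where the comment came from.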
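I've gone through the bs4 documentation on Crummy's site, I've read all the related SO questions and answers, and I've tried to glean hints from the App Engine documentation. However, I have not been able to find a solution that doesn't involve using a deprecated version of BeautifulSoup, which doesn't have the functionality I need.
I'm a beginner at both programming and using StackOverflow, so if I've left out some important information, or if you have any questions, please let me know and I'll edit and add more information as necessary.
Thanks!
EDITS: I don't know if the get_data code would be overkill, but here it is: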
from bs4 import BeautifulSoup
import urllib2, re

# Map of team abbreviation to full team name.
teamKeys = {
    'ATL': 'Atlanta Falcons',
    'HOU': 'Houston Texans',
}


def get_all_coaches():
    for key in teamKeys:
        page = urllib2.urlopen("http://www.nfl.com/teams/coaches?coaType=head&team=" + key)
        soup = BeautifulSoup(page)
        return(head_coach(soup))


def head_coach(soup):
    # Expects text of the form "<position>: <name>".
    head = soup.select('.coachprofiletext p')[0].text
    position, name = re.split(': ', head)
    return name


def export_coach_data():
    # Build one [teamKey, teamName, headCoach] row per team.
    testList = []
    for key in teamKeys:
        page = urllib2.urlopen("http://www.nfl.com/teams/coaches?coaType=head&team=" + key)
        soup = BeautifulSoup(page)
        teamKey = key
        teamName = teamKeys[key]
        headCoach = head_coach(soup)
        t = [
            teamKey,
            teamName,
            str(headCoach),
        ]
        testList.append(t)
    return(testList)


def all_coach_data():
    # Call export_coach_data() directly; no name `data` is defined in this module.
    results = export_coach_data()
    ATL = results[0]
    HOU = results[1]
    return ATL
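I'd like to point out that this is probably littered with poor implementation (I've only been seriously developing for a few months in my spare time), but it does return the correct values to my variables in main.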
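One small cleanup that seems safe regardless of the import problem: main.py calls get_data.all_coach_data() three times, and each call fetches every team page again. A minimal sketch of fetching once and unpacking, using a stand-in function with placeholder data instead of the real scraper (build_template_values is a hypothetical name):

```python
def all_coach_data():
    """Stand-in for get_data.all_coach_data(); the row below is placeholder data."""
    return ['KEY', 'Team Name', 'Coach Name']


def build_template_values(fetch=all_coach_data):
    """Fetch the [teamKey, teamName, coachName] row once and unpack it."""
    team_key, team_name, coach_name = fetch()
    return {
        'coachName': coach_name,
        'teamName': team_name,
        'teamKey': team_key,
    }
```

In the handler, the values dict would then come from a single call such as build_template_values(get_data.all_coach_data).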
Here is the App Engine launch log:
2014-11-05 15:36:53 Running command: "['C:\\Python27\\pythonw.exe', 'C:\\Program Files\\Google\\Cloud SDK\\google-cloud-sdk\\platform\\google_appengine\\dev_appserver.py', '--skip_sdk_update_check=yes', '--port=11080', '--admin_port=8003', u'G:\\projects\\coaches']"
INFO 2014-11-05 15:37:00,119 devappserver2.py:725] Skipping SDK update check.
WARNING 2014-11-05 15:37:00,157 api_server.py:383] Could not initialize images API; you are likely missing the Python "PIL" module.
INFO 2014-11-05 15:37:00,190 api_server.py:171] Starting API server at: http://localhost:19713
INFO 2014-11-05 15:37:00,210 dispatcher.py:183] Starting module "default" running at: http://localhost:11080
INFO 2014-11-05 15:37:00,216 admin_server.py:117] Starting admin server at: http://localhost:8003
ERROR 2014-11-05 20:37:48,726 wsgi.py:262]
Traceback (most recent call last):
  File "C:\Program Files\Google\Cloud SDK\google-cloud-sdk\platform\google_appengine\google\appengine\runtime\wsgi.py", line 239, in Handle
    handler = _config_handle.add_wsgi_middleware(self._LoadHandler())
  File "C:\Program Files\Google\Cloud SDK\google-cloud-sdk\platform\google_appengine\google\appengine\runtime\wsgi.py", line 298, in _LoadHandler
    handler, path, err = LoadObject(self._handler)
  File "C:\Program Files\Google\Cloud SDK\google-cloud-sdk\platform\google_appengine\google\appengine\runtime\wsgi.py", line 84, in LoadObject
    obj = __import__(path[0])
  File "G:\projects\coaches\main.py", line 3, in <module>
    import get_data
  File "G:\projects\coaches\get_data.py", line 1, in <module>
    from bs4 import BeautifulSoup
ImportError: No module named bs4
INFO 2014-11-05 15:37:48,762 module.py:652] default: "GET / HTTP/1.1" 500 -
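Reading the traceback, the error is raised from main.py line 3 (import get_data), which runs before the sys.path.insert(0, 'lib') call that follows it in main.py. If that ordering is the culprit, the effect can be reproduced in isolation. The sketch below uses a throwaway folder and a stand-in module named fake_vendored_mod (a hypothetical name playing the role of bs4); it is not App Engine specific.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway "lib" folder containing a stand-in module.
lib_dir = os.path.join(tempfile.mkdtemp(), 'lib')
os.mkdir(lib_dir)
with open(os.path.join(lib_dir, 'fake_vendored_mod.py'), 'w') as f:
    f.write('VALUE = 42\n')

# 1) Importing before the folder is on sys.path raises ImportError.
try:
    importlib.import_module('fake_vendored_mod')
    imported_too_early = True
except ImportError:
    imported_too_early = False

# 2) After the folder is added to sys.path, the same import succeeds.
sys.path.insert(0, lib_dir)
mod = importlib.import_module('fake_vendored_mod')
```

Applied to main.py, that would mean moving import sys and the sys.path.insert line above import get_data, assuming the path really resolves to the lib folder at runtime.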
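How are you currently importing? "from bs4 import BeautifulSoup"? – 2014-11-05 20:07:22
You can parse the HTML manually, or extract what you're looking for by finding patterns. – Ryan 2014-11-05 20:08:23
Where is the code that performs the import? – 2014-11-05 20:10:05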