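I am currently working on a POC using Python 3.4.3 and MongoDB. How can I read data from random URLs using Python and MongoDB?
I need to search www.socialmention.com for any string, such as "finance" or "Apple quarterly results". The result will be multiple URLs, and they will be random. I then need to parse each link and read the article, comments, likes, user details, and so on.
So far I have successfully captured the random link URLs from socialmention. My idea is then to create a blog dictionary in MongoDB and maintain the information as shown below: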
> db.blogs_dictionary.find().pretty()
{
"_id" : ObjectId("55401455a1ce265d58f21049"),
"blog_name" : "www.networkcomputing.com",
"article" : "yes",
"article_tag" : "div",
"article_tag_type" : "id",
"article_string" : "article-main",
"article_multipage" : "yes",
"article_multipage_tag" : "span",
"article_multipage_tag_type" : "class",
"article_multipage_tag_string" : "blue strong allcaps",
"article_multipage_query_variable" : "page_number",
"comments" : "yes",
"comments_multipage" : "no",
"comments_multipage_tag" : "",
"comments_multipage_tag_type" : "",
"comments_multipage_tag_string" : "",
"comments_threaded" : "yes",
"comments_threaded_query_variable" : "piddl_msgorder",
"comments_threaded_query_value" : "thrd#msgs",
"comments_main" : "yes",
"comments_main_tag" : "div",
"comments_main_tag_type" : "class",
"comments_main_tag_string" : "comments-main",
"user_name" : "yes",
"user_name_tag" : "span",
"user_name_tag_type" : "class",
"user_name_tag_string" : "smaller strong black",
"user_rank" : "yes",
"user_rank_tag" : "span",
"user_rank_tag_type" : "class",
"user_rank_tag_string" : "smaller black",
"comments_body" : "yes",
"comments_body_tag" : "div",
"comments_body_tag_type" : "class",
"comments_body_tag_string" : "comment-body"
}
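To make the layout above concrete, here is a minimal sketch of how one rule (tag, attribute name, attribute value) might be read out of such a document once it has been loaded as a plain Python dict. The helper name `selector_for` is hypothetical, and the sample `rule` is only a subset of the document shown above. Note that the article entry stores its value under `article_string`, while the other sections use `<prefix>_tag_string`, so the sketch checks both.

```python
def selector_for(rule, prefix):
    """Return (tag, attribute_name, attribute_value) for one section, or None.

    `rule` is one blogs_dictionary document as a dict; `prefix` is a section
    name such as "article", "comments_main" or "user_name".
    """
    # A section is only usable when its flag is "yes".
    if rule.get(prefix) != "yes":
        return None
    tag = rule.get(prefix + "_tag")
    attr_name = rule.get(prefix + "_tag_type")
    # The article entry uses "article_string"; other sections use "<prefix>_tag_string".
    attr_value = rule.get(prefix + "_tag_string", rule.get(prefix + "_string"))
    # Empty strings or missing fields mean there is nothing to search for.
    if not (tag and attr_name and attr_value):
        return None
    return tag, attr_name, attr_value


# Subset of the document shown above, used only for illustration.
rule = {
    "article": "yes",
    "article_tag": "div",
    "article_tag_type": "id",
    "article_string": "article-main",
    "comments_multipage": "no",
    "comments_multipage_tag": "",
    "comments_body": "yes",
    "comments_body_tag": "div",
    "comments_body_tag_type": "class",
    "comments_body_tag_string": "comment-body",
}

print(selector_for(rule, "article"))
print(selector_for(rule, "comments_body"))
print(selector_for(rule, "comments_multipage"))
```

Keeping this lookup in one place means the scraping code does not need to know the field naming of each section.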
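Then in the Python code I do something like this: if a link from socialmention exists in my blog dictionary, check whether the article and comments are present; if they are, open the URL and read the required content. To do all of this, though, I need to pass the tag and the search string dynamically.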
import urllib.request
from bs4 import BeautifulSoup

# db is assumed to be an already-connected pymongo database handle
for i in db.social_mention.find({}, {"blog_name": 1, "_id": 0}):
    for j in db.blogs_dictionary.find({}, {"blog_name": 1, "_id": 0}):
        if i['blog_name'] == j['blog_name']:
            link = db.social_mention.find_one({"blog_name": i['blog_name']}, {"link": 1, "_id": 0})
            url = link['link']
            print(url)
            # find_one returns a single document (a dict); find returns a cursor,
            # which never compares equal to "yes" and has no pretty() method in pymongo
            article_variables = db.blogs_dictionary.find_one({"blog_name": j['blog_name']}, {"_id": 0})
            if article_variables['article'] == "yes":
                soup = BeautifulSoup(urllib.request.urlopen(url), "html.parser")
                # this is the line that is rejected: a keyword argument name cannot be an expression
                data = soup.find(article_variables['article_tag'], article_variables['article_tag_type']=article_variables['article_string'])
                print(data.text)
But I get an error like "keyword can't be an expression". Is there another way to do this, or should I change my design?
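For context on that error: in a call such as `f(name=value)`, the part to the left of `=` has to be a plain identifier, so a subscript like `article_variables['article_tag_type']` is not allowed there. One possible way around it, sketched below, is to build a dict whose key is computed at runtime and hand it to BeautifulSoup's `find()` through its `attrs` parameter. The function names `build_find_args` and `extract_article` are hypothetical, and the sketch assumes the rule has already been loaded as a dict and that the `beautifulsoup4` package is installed.

```python
def build_find_args(rule):
    """Turn the stored article fields into arguments for BeautifulSoup's find()."""
    tag = rule["article_tag"]
    # The attribute name is data, not code, so it becomes a dict key
    # instead of a keyword argument name.
    attrs = {rule["article_tag_type"]: rule["article_string"]}
    return tag, attrs


def extract_article(html, rule):
    """Return the text of the element described by `rule`, or None if absent."""
    # Imported here so the helper above stays usable without the package.
    from bs4 import BeautifulSoup  # third-party: beautifulsoup4

    tag, attrs = build_find_args(rule)
    soup = BeautifulSoup(html, "html.parser")
    node = soup.find(tag, attrs=attrs)
    return node.get_text(strip=True) if node is not None else None


# Values copied from the blog dictionary entry in the question.
rule = {
    "article_tag": "div",
    "article_tag_type": "id",
    "article_string": "article-main",
}

print(build_find_args(rule))
```

Calling `extract_article(html, rule)` on a page's HTML would then look up `<div id="article-main">` without the tag or attribute name being hard-coded anywhere.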
What is the exact error? – skyline75489