Clearing a list on every loop iteration in Python?

I am trying to loop through multiple articles on Reddit, go through each article and extract the top relevant entity (done by filtering for the highest relevance score), and then add it to the master_locations list:
from __future__ import print_function
from alchemyapi import AlchemyAPI
import json
import urllib2
from bs4 import BeautifulSoup

alchemyapi = AlchemyAPI()

reddit_url = 'http://www.reddit.com/r/worldnews'
urls = []
locations = []
relevance = []
master_locations = []

def get_all_links(page):
    html = urllib2.urlopen(page).read()
    soup = BeautifulSoup(html)
    for a in soup.find_all('a', 'title may-blank ', href=True):
        urls.append(a['href'])
        run_alchemy_entity_per_link(a['href'])

def run_alchemy_entity_per_link(articleurl):
    response = alchemyapi.entities('url', articleurl)
    if response['status'] == 'OK':
        for entity in response['entities']:
            if entity['type'] == 'Country' or entity['type'] == 'Region' or entity['type'] == 'City' or entity['type'] == 'StateOrCountry' or entity['type'] == 'Continent':
                if entity.get('disambiguated'):
                    locations.append(entity['disambiguated']['name'])
                    relevance.append(entity['relevance'])
                else:
                    locations.append(entity['text'])
                    relevance.append(entity['relevance'])
            else:
                locations.append('No Location')
                relevance.append('0')
        max_pos = relevance.index(max(relevance))  # get nth position of the highest relevancy score
        master_locations.append(locations[max_pos])  # use n to get nth position of location and store that location name to master_locations
        del locations[0]  # RESET LIST
        del relevance[0]  # RESET LIST
    else:
        print('Error in entity extraction call: ', response['statusInfo'])

get_all_links('http://www.reddit.com/r/worldnews')  # gets all URLs per article, then analyzes entities

for item in master_locations:
    print(item)
But I think that, for some reason, the lists locations and relevance are not being reset. Am I doing something wrong?

The result of the print is:
Holland
Holland
Beirut
Beirut
Beirut
Beirut
Beirut
Beirut
Beirut
Beirut
Beirut
Beirut
Beirut
Beirut
Mogadishu
Mogadishu
Mogadishu
Mogadishu
Mogadishu
Mogadishu
Mogadishu
Mogadishu
Johor Bahru
(probably because the entries are not being cleared from the lists)
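The repetition above can be reproduced with a minimal sketch (list names here are illustrative, not taken from the code): `del some_list[0]` removes only the first element, so entries from earlier iterations survive, whereas slice deletion (or `list.clear()` on Python 3.3+) empties the list in place:

```python
# `del locations[0]` removes ONE element, not all of them,
# so data from earlier iterations accumulates.
locations = []

for batch in (['Holland', 'Beirut'], ['Mogadishu']):
    locations.extend(batch)
    del locations[0]          # drops only the first element

print(locations)              # stale entries remain: ['Mogadishu'] left over
                              # after ['Beirut'] survived the first iteration

# To actually empty a list in place (Python 2 and 3):
fresh = [1, 2, 3]
del fresh[:]                  # slice deletion clears every element
print(fresh)                  # []
```

Alternatively, declaring `locations = []` and `relevance = []` locally at the top of the function gives each call its own fresh lists, avoiding module-level state altogether.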
I have downvoted, because this is a long piece of code, mostly irrelevant, that could have been reduced a lot. http://sscce.org/ – Davidmh 2014-09-06 10:05:46