2011-08-19

Python: "Unicode equal comparison failed" when screen-scraping a Twitter page

I am using the following code to get a list of a Twitter user's followers:

import urllib 
from BeautifulSoup import BeautifulSoup 

#code only looks at one page of followers instead of continuing to all of a user's followers 
#decided to only use a small sample 

site = "http://mobile.twitter.com/NYTimesKrugman/following" 
friends = set() 
response = urllib.urlopen(site) 
html = response.read() 
soup = BeautifulSoup(html) 
names = soup.findAll('a', {'href': True}) 
for name in names: 
    a = name.renderContents() 
    b = a.lower() 
    if ("http://mobile.twitter.com/" + b) == name['href']: 
        c = str(b)
        friends.add(c)

for friend in friends: 
    print friend 
print ("Done!") 

However, I get the following output:

NYTimeskrugman 
nytimesphoto 
rasermus 

Warning (from warnings module): 
    File "C:\Users\Public\Documents\Columbia Job\Python Crawler\Twitter  Crawler\crawlerversion14.py", line 42 
    if ("http://mobile.twitter.com/" + b) == name['href']: 
UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal 
amnesty_norge 
zynne_ 
fredssenteret 
oljestudentene 
solistkoret 

... (and so it continues)

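It looks like I am able to get most of the names, but I get this warning at seemingly random points. It does not stop the code from completing, but I am hoping someone can tell me what is happening.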


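This warning appears because you are comparing a non-ASCII byte string with a unicode string, and Python cannot decode the byte string as ASCII in order to compare them. In any case, you should really just use a library to query Twitter. See https://dev.twitter.com/docs/twitter-libraries#python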


`u"http://mobile.twitter.com/"` – leoluk
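The two comments above point at the same fix: make both sides of the comparison Unicode before comparing. Here is a minimal sketch of that idea, assuming the link text arrives as a UTF-8 encoded byte string (which is what `renderContents()` returns in BeautifulSoup 3) and that the `href` value is already Unicode. The helper name `is_profile_link` and the sample values are hypothetical, not taken from the real page.

```python
# -*- coding: utf-8 -*-
# Sketch only: decode bytes to text first, so both operands are Unicode.

PREFIX = u"http://mobile.twitter.com/"


def is_profile_link(link_text, href):
    """Return True if href equals PREFIX plus the lower-cased link text.

    link_text may be a byte string (assumed to be UTF-8) or decoded text.
    """
    if isinstance(link_text, bytes):
        link_text = link_text.decode("utf-8")
    return PREFIX + link_text.lower() == href


# Hypothetical sample values for illustration.
print(is_profile_link(b"zynne_", u"http://mobile.twitter.com/zynne_"))
print(is_profile_link(u"Bj\u00f8rn".encode("utf-8"),
                      u"http://mobile.twitter.com/bj\u00f8rn"))
print(is_profile_link(b"About", u"http://mobile.twitter.com/about-us"))
```

With the explicit decode, a non-ASCII name no longer triggers the implicit ASCII conversion that produced the warning. If the page were not UTF-8, the encoding passed to `decode` would need to change accordingly.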

Answers


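I am not sure my answer will still be useful several years later, but I rewrote your code using requests instead of urllib.

I think it is better to select the elements with the class "username", so that only the followers' names are considered.

Here it is: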

import requests 
from bs4 import BeautifulSoup 

site = "http://mobile.twitter.com/paulkrugman/followers" 
friends = set() 
response = requests.get(site) 
soup = BeautifulSoup(response.text, "html.parser")  # name the parser explicitly
names = soup.findAll('a', {'href': True}) 
for name in names: 
    pseudo = name.find("span", {"class": "username"}) 
    if pseudo: 
        pseudo = pseudo.get_text()
        friends.add(pseudo)

for friend in friends: 
    print (friend) 
print("Done !") 

@paulkrugman itself appears in every result set, so do not forget to remove it!
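One way to drop the account owner's own handle is `set.discard`. This is a small sketch that assumes the scraped text is exactly `"@paulkrugman"` (no extra whitespace or different casing); the sample set below is hypothetical and stands in for the `friends` set built by the loop above.

```python
# Hypothetical sample of scraped handles; the real set comes from the scraping loop.
friends = {"@paulkrugman", "@nytimesphoto", "@amnesty_norge"}

# discard() removes the entry if it is present and does nothing otherwise,
# so this is safe even when the owner's handle was not scraped.
friends.discard("@paulkrugman")

for friend in sorted(friends):
    print(friend)
```

`discard` is used rather than `remove` because `remove` raises `KeyError` when the element is missing.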