Screen scraping a Twitter page in Python: "Unicode equal comparison failed"

I am using the following code to get a list of a user's Twitter followers:

import urllib 
from BeautifulSoup import BeautifulSoup 

#code only looks at one page of followers instead of continuing to all of a user's followers 
#decided to only use a small sample 

site = "http://mobile.twitter.com/NYTimesKrugman/following" 
friends = set() 
response = urllib.urlopen(site) 
html = response.read() 
soup = BeautifulSoup(html) 
names = soup.findAll('a', {'href': True}) 
for name in names: 
    a = name.renderContents() 
    b = a.lower() 
    if ("http://mobile.twitter.com/" + b) == name['href']: 
        c = str(b)
        friends.add(c)

for friend in friends: 
    print friend 
print ("Done!")

However, I get the following results:

NYTimeskrugman 
nytimesphoto 
rasermus 

Warning (from warnings module): 
    File "C:\Users\Public\Documents\Columbia Job\Python Crawler\Twitter  Crawler\crawlerversion14.py", line 42 
    if ("http://mobile.twitter.com/" + b) == name['href']: 
UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal 
amnesty_norge 
zynne_ 
fredssenteret 
oljestudentene 
solistkoret

.... (and so it continues)

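It seems I am able to get most of the names of the accounts being followed, but I get a warning that appears somewhat at random. It does not stop the code from completing, but... I was hoping someone could tell me what is happening?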

Source

2011-08-19 snehoozle

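This warning appears because you are comparing a byte string (str) that contains non-ASCII characters with a unicode string. Python 2 tries to decode the str as ASCII for the comparison, fails, and treats the two values as unequal. Really, though, you should just use a library to query Twitter. See https://dev.twitter.com/docs/twitter-libraries#python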
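One way to avoid the implicit ASCII decode is to turn both sides into text before comparing. This is a minimal sketch, not the asker's code: it assumes the link text arrives as UTF-8 bytes (an assumption about the page encoding), and the helper names `to_text` and `is_profile_link` are made up for illustration.

```python
PREFIX = u"http://mobile.twitter.com/"


def to_text(value, encoding="utf-8"):
    """Return value as text, decoding it first if it is a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def is_profile_link(link_text, href):
    """True if href is PREFIX followed by the lowercased link text."""
    return PREFIX + to_text(link_text).lower() == to_text(href)


# Link text as UTF-8 bytes, href as text: the mix that triggered the warning.
raw_text = u"Bj\u00f8rn".encode("utf-8")
href = u"http://mobile.twitter.com/bj\u00f8rn"
print(is_profile_link(raw_text, href))
```

In the loop from the question, `is_profile_link(name.renderContents(), name['href'])` would take the place of the comparison on the line the warning points at. Since the text is decoded first, `.lower()` also handles non-ASCII letters.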

u"http://mobile.twitter.com/" – leoluk

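I am not sure my answer will still be useful several years later, but I rewrote your code using requests instead of urllib.

I think it is better to select on the class "username" so that only the followers' names are considered!

Here it is: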

import requests 
from bs4 import BeautifulSoup 

site = "http://mobile.twitter.com/paulkrugman/followers" 
friends = set() 
response = requests.get(site) 
soup = BeautifulSoup(response.text, "html.parser")  # name the parser explicitly
names = soup.find_all('a', {'href': True})
for name in names:
    # Only links that wrap a <span class="username"> are account names
    pseudo = name.find("span", {"class": "username"})
    if pseudo:
        friends.add(pseudo.get_text())

for friend in friends:
    print(friend)
print("Done!")

@paulkrugman shows up in every result set, so don't forget to remove it!
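Removing the account's own handle can be done after the loop. This is a minimal sketch, assuming the scraped names may or may not carry a leading "@". The helper name `drop_account` is hypothetical and not part of the original answer, and the set below is sample data, not real output.

```python
def drop_account(usernames, account):
    """Return a new set without the given account's own handle."""
    own = account.lstrip("@").lower()
    return {name for name in usernames if name.lstrip("@").lower() != own}


# Sample data for illustration only
friends = {"@paulkrugman", "@amnesty_norge", "@zynne_"}
friends = drop_account(friends, "paulkrugman")
print(sorted(friends))
```

Comparing case-insensitively and ignoring the "@" keeps the filter working whichever form the page uses for the handle.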

Source

2016-11-02 18:12:04 Raphadasilva
