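Issue with non-UTF-8 and ASCII characters in the twitteR package in R
In a previous question I asked about downloading a large number of Twitter followers (and their location, creation date, number of followers, etc.) from the Haaretz Twitter feed (@haaretzcom) using the twitteR package in R (see Work around rate limit for extracting large list of user information using twitteR package in R). The Twitter feed has over 90,000 followers, and with the code below I can download the full list of followers.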
require(twitteR)
require(ROAuth)

# Loading the Twitter OAuthorization
load("~/Dropbox/Twitter/my_oauth")

# Confirming the OAuth
registerTwitterOAuth(my_oauth)

# opening list to download
haaretz_followers <- getUser("haaretzcom")$getFollowerIDs(retryOnRateLimit = 9999999)

for (follower in haaretz_followers) {
  Sys.sleep(5)
  haaretz_followers_info <- lookupUsers(haaretz_followers)
  haaretz_followers_full <- twListToDF(haaretz_followers_info)

  # Export data to csv
  write.table(haaretz_followers_full, file = "haaretz_twitter_followers.csv", sep = ",")
}
The code works when extracting many of the users. However, whenever I hit one particular user, I get the following error:
Error in twFromJSON(out) :
RMate stopped at line 51
Error: Malformed response from server, was not JSON.
RMate stopped at line 51
The most likely cause of this error is Twitter returning a character which
can't be properly parsed by R. Generally the only remedy is to wait long
enough for the offending character to disappear from searches (e.g. if
using searchTwitter()).
Calls: twListToDF ... lookupUsers -> lapply -> FUN -> <Anonymous> -> twFromJSON
Execution halted
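I run into this problem even when I load the RJSONIO package after the twitteR package. From some research, it looks like the twitteR and RJSONIO packages have trouble parsing non-UTF-8 or ASCII characters (Arabic, etc.): http://lists.hexdump.org/pipermail/twitter-users-hexdump.org/2013-May/000335.html. Is there a way to simply ignore the non-UTF-8 or ASCII characters in my code and still extract all of the follower information? Any help would be greatly appreciated.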
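To make the "simply ignore" idea concrete, here is a minimal sketch of one possible workaround: look users up in batches, catch the parse error with `tryCatch`, and halve a failing batch until the offending ID is isolated and skipped. The helper `lookup_skipping_failures` and the stand-in `fake_lookup` are hypothetical names introduced for illustration, not part of twitteR. With real credentials one would presumably pass a wrapper such as `function(ids) twListToDF(lookupUsers(ids))`, but that has not been tested here.

```r
# Hypothetical helper (not part of twitteR). `lookup_fun` stands in for a
# wrapper such as function(ids) twListToDF(lookupUsers(ids)); it is passed in
# so that the sketch runs without Twitter credentials.
lookup_skipping_failures <- function(ids, lookup_fun, batch_size = 100) {
  results <- list()
  failed  <- character(0)

  # Try a batch; on error, halve it until the offending ID is isolated.
  fetch <- function(batch) {
    out <- tryCatch(lookup_fun(batch), error = function(e) NULL)
    if (!is.null(out)) {
      results[[length(results) + 1]] <<- out
    } else if (length(batch) == 1) {
      failed <<- c(failed, batch)
    } else {
      mid <- length(batch) %/% 2
      fetch(batch[seq_len(mid)])
      fetch(batch[(mid + 1):length(batch)])
    }
  }

  batches <- split(ids, ceiling(seq_along(ids) / batch_size))
  for (batch in batches) fetch(batch)

  list(data = do.call(rbind, results), failed = failed)
}

# Stand-in lookup that fails whenever the batch contains the ID "13",
# imitating the "Malformed response" error from the question.
fake_lookup <- function(ids) {
  if ("13" %in% ids) stop("Malformed response from server, was not JSON.")
  data.frame(id = ids, stringsAsFactors = FALSE)
}

res <- lookup_skipping_failures(as.character(1:25), fake_lookup, batch_size = 10)
res$failed      # IDs that could not be retrieved
nrow(res$data)  # number of users that were retrieved
```

This skips the offending users rather than repairing their data, and the extra retries cost additional requests, so a real run would still need a pause between calls to respect the rate limit.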
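Are you able to grab the tweets and it is only the parsing that fails, or can you not even download the tweets? If the former, you could use 'readLines' and then strip out the offending characters –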
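A minimal sketch of that suggestion, assuming the raw response can be saved to a file before parsing (which twitteR does not obviously expose): read it with `readLines` and let base R's `iconv` drop whatever cannot be converted. The helper name `strip_offending` and the simulated file contents are hypothetical, and `iconv` behaviour can vary between platforms.

```r
# Hypothetical helper (not part of twitteR): drop bytes that are not valid
# UTF-8, or, with to = "ASCII", drop every non-ASCII character as well.
strip_offending <- function(x, to = "UTF-8") {
  iconv(x, from = "UTF-8", to = to, sub = "")
}

# Simulate a saved raw response that contains one invalid byte (0xff).
tmp <- tempfile(fileext = ".json")
writeBin(
  c(charToRaw('{"name":"ab'), as.raw(0xff), charToRaw('cd"}'), as.raw(0x0a)),
  tmp
)

raw_lines   <- readLines(tmp, warn = FALSE)
clean_lines <- strip_offending(raw_lines)

# Check which lines are well-formed UTF-8 before and after cleaning.
validUTF8(raw_lines)
validUTF8(clean_lines)
```

Passing `to = "ASCII"` is the more aggressive option: it also removes legitimate non-ASCII text such as Arabic or Hebrew names, so it only makes sense if that information is expendable.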
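@RicardoSaporta Unfortunately, it won't even let me download the tweets. The loop breaks when it reaches the offending user's information. – Thomas
@Thomas: Still no answer? I run into this whenever I try to do anything with twitteR... – Heisenberg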