Problem with non-UTF-8 and non-ASCII characters in the twitteR package in R

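In a previous question I asked about downloading a large number of Twitter followers (along with their location, creation date, number of followers, etc.) from the Haaretz Twitter feed (@haaretzcom) using the twitteR package in R (see Work around rate limit for extracting large list of user information using twitteR package in R). The feed has over 90,000 followers, and I can download the full list of followers using the code below.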

    require(twitteR)
    require(ROAuth)
    # Loading the Twitter OAuthorization
    load("~/Dropbox/Twitter/my_oauth")

    # Confirming the OAuth
    registerTwitterOAuth(my_oauth)

    # opening list to download
    haaretz_followers <- getUser("haaretzcom")$getFollowerIDs(retryOnRateLimit = 9999999)

    for (follower in haaretz_followers) {
        Sys.sleep(5)
        haaretz_followers_info <- lookupUsers(haaretz_followers)

        haaretz_followers_full <- twListToDF(haaretz_followers_info)

        # Export data to csv
        write.table(haaretz_followers_full, file = "haaretz_twitter_followers.csv", sep = ",")
    }

The code works when extracting many users. However, whenever I hit one particular user, I get the following error:

Error in twFromJSON(out) : 
RMate stopped at line 51 
Error: Malformed response from server, was not JSON. 
RMate stopped at line 51 
The most likely cause of this error is Twitter returning a character which 
can't be properly parsed by R. Generally the only remedy is to wait long 
enough for the offending character to disappear from searches (e.g. if 
using searchTwitter()). 
Calls: twListToDF ... lookupUsers -> lapply -> FUN -> <Anonymous> -> twFromJSON 
Execution halted

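I run into this problem even when I load the RJSONIO package after loading the twitteR package. From some research, it looks like the twitteR and RJSONIO packages have trouble parsing non-UTF-8 or non-ASCII characters (Arabic, etc.): http://lists.hexdump.org/pipermail/twitter-users-hexdump.org/2013-May/000335.html. Is there a way to simply ignore the non-UTF-8 or non-ASCII characters in my code and still extract all the follower information? Any help would be greatly appreciated.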

Source

2013-05-15 Thomas

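Are you able to fetch the tweets and it is only the parsing that fails, or can you not even download the tweets? If the former, you could use 'readLines' and then strip out the offending characters. –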
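A minimal sketch of what stripping the offending characters might look like, assuming the raw response text is already available as a character vector (for example from 'readLines'). The helper names `strip_invalid_utf8` and `strip_non_ascii` and the sample strings are hypothetical and are not part of twitteR; only base R's `iconv()` and `validUTF8()` are used, and `iconv()` substitution behaviour can vary between platforms.

```r
# Hypothetical helper: delete bytes that do not form valid UTF-8.
strip_invalid_utf8 <- function(x) {
  # sub = "" asks iconv() to drop non-convertible bytes instead of returning NA
  iconv(x, from = "UTF-8", to = "UTF-8", sub = "")
}

# Hypothetical helper: keep only ASCII, dropping every other byte.
strip_non_ascii <- function(x) {
  iconv(x, from = "UTF-8", to = "ASCII", sub = "")
}

# Made-up sample input: plain ASCII, a stray Latin-1 byte, and valid Hebrew text
raw_lines <- c("plain ascii",
               "caf\xe9 latin-1 byte",
               "\u05e9\u05dc\u05d5\u05dd shalom")

clean_utf8  <- strip_invalid_utf8(raw_lines)  # valid non-ASCII text is kept
clean_ascii <- strip_non_ascii(raw_lines)     # all non-ASCII text is removed
```

The first helper is the gentler option, since it keeps legitimate Hebrew or Arabic text and only removes bytes that cannot be decoded; the second discards every non-ASCII character.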

@RicardoSaporta Unfortunately, it won't even let me download the tweets. The loop breaks as soon as it reaches the offending user's information. – Thomas
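One possible way to keep the loop from breaking is to look users up in small batches and wrap each lookup in `tryCatch()`, so that a batch containing an unparseable profile is skipped and recorded rather than aborting the run. This is only a sketch: `lookup_in_batches`, `fake_lookup`, and the sample IDs are hypothetical, and the batch size of 100 is an assumption. With twitteR, the `lookup_fn` argument could be something like `function(ids) twListToDF(lookupUsers(ids))`.

```r
# Hypothetical helper: apply lookup_fn to IDs in batches, skipping failed batches.
# lookup_fn must take a vector of IDs and return a data frame.
lookup_in_batches <- function(ids, lookup_fn, batch_size = 100, pause = 0) {
  batches <- split(ids, ceiling(seq_along(ids) / batch_size))
  results <- list()
  failed  <- list()
  for (batch in batches) {
    # An error in one batch yields NULL instead of stopping the whole loop
    out <- tryCatch(lookup_fn(batch), error = function(e) NULL)
    if (is.null(out)) {
      failed[[length(failed) + 1]] <- batch    # keep the IDs for a later retry
    } else {
      results[[length(results) + 1]] <- out
    }
    Sys.sleep(pause)                            # optional delay between requests
  }
  list(data = do.call(rbind, results), failed = unlist(failed))
}

# Stand-in for the real lookup, so the sketch runs without network access
fake_lookup <- function(ids) {
  if ("bad" %in% ids) stop("Malformed response from server, was not JSON.")
  data.frame(id = ids, stringsAsFactors = FALSE)
}

res <- lookup_in_batches(c("a", "b", "bad", "c", "d"), fake_lookup, batch_size = 2)
```

Smaller batches lose fewer good users when one profile fails, at the cost of more requests; the IDs in `res$failed` can be retried one at a time to isolate the offending account.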

@Thomas: Still no answer? I run into this whenever I try to do anything with twitteR... – Heisenberg

There is a package update (1.1.7) that fixes this problem. See: https://github.com/geoffjentry/twitteR/blob/master/NEWS
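To see whether an installed copy predates that release, a quick version check along these lines may help. The helper `needs_update` is hypothetical; `compareVersion()` and `packageVersion()` come from R's utils package, and the check is skipped if twitteR is not installed.

```r
# Hypothetical helper: TRUE when `installed` is older than the fixed release.
needs_update <- function(installed, fixed = "1.1.7") {
  utils::compareVersion(as.character(installed), fixed) < 0
}

# Only query the installed version when twitteR is actually present
if (requireNamespace("twitteR", quietly = TRUE)) {
  if (needs_update(utils::packageVersion("twitteR"))) {
    message("Installed twitteR is older than 1.1.7; consider updating it.")
  }
}
```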

Source

2013-08-17 15:48:38 SPi
