Looking up Twitter followers in R

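I want to look at the profiles of the Twitter followers of users with large followings (followers > 100000), using R. twitteR is a great package, but it runs into trouble with follower lists this large, because you need to implement a sleep routine to avoid exceeding the rate limit. I am a relative novice here and would like to know how to loop through the ID object, feeding it in batches of 100 (since that is the maximum the Twitter API can handle at once).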

Edit: code added

library(twitteR)
library(plyr)
maxTwitterIds = 100
sleeptime = 500 # sec

user<-getUser("[username]") 
followers<-zz$getFollowerIDs() 
ids_matrix = matrix(zz, nrow = maxTwitterIds, ncol = length(zz)/maxTwitterIds) 
followers<-zz$getFollowerIDs() 
#note: for smaller lists of followers it is possible to use the command "lookupUsers(zz)" at this point
foll<-getTwitterInfoForListIds = function(id_list) { 
    return(lapply(id_list, 

names <- sapply(foll,name) 
sn<-sapply(foll,screenName)
id<-sapply(foll,id)
verified<-sapply(foll,verified)
created<-sapply(foll,created) 
statuses<-sapply(foll,statusesCount) 
follower<-sapply(foll,followersCount) 
friends<-sapply(foll,friendsCount) 
favorites<-sapply(foll,favoritesCount) 
location<-sapply(foll,location) 
url<-sapply(foll,url) 
description<-sapply(foll,description) 
last_status<-sapply(foll,lastStatus))) 
} 
alldata = alply(, 2, function(id_set) { 
    info = getTwitterInfoForListIds(id_set) 
    Sys.sleep(sleeptime) 
    return(info) 
})


2012-02-08 Mike Jensen

I think you can find good information on page 6 of http://cran.r-project.org/web/packages/twitteR/twitteR.pdf. – aatrujillob 2012-02-08 15:04:07

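Yes, the problem is that with large follower lists you quickly exceed the rate limit, so I am looking for a way to split the IDs into batches of 100 and run each batch after a Sys.sleep. – 2012-02-08 19:32:29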

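Let me start by saying that I have not used the twitteR package. So I can only offer some pseudocode that shows the structure of how to do this. It should get you started.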

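library(plyr)

# Some constants
maxTwitterIds = 100   # maximum number of IDs per batch
sleeptime = 1         # seconds to pause between batches

# Get the IDs of the Twitter followers of person X.
# getTwitterFollowers() is a placeholder, so ids = 1:1000 stands in for it here.
ids = 1:1000          # ids = getTwitterFollowers("x")

# One batch per column; this layout assumes length(ids) is a multiple of maxTwitterIds
ids_matrix = matrix(ids, nrow = maxTwitterIds,
                    ncol = length(ids) / maxTwitterIds)

# getTwitterInfo() is a placeholder for whatever per-ID lookup you need
getTwitterInfoForListIds = function(id_list) {
    return(lapply(id_list, getTwitterInfo))
}

# Find the information you need from each batch of IDs, pausing after each batch
alldata = alply(ids_matrix, 2, function(id_set) {
    info = getTwitterInfoForListIds(id_set)
    Sys.sleep(sleeptime)
    return(info)
})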

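The data structure you get out of this may need some polishing (it is a nested list), but without knowing what information you want to extract from the Twitter accounts, it is hard to say more.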
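As a rough illustration of the same batch-and-sleep structure in base R only, here is a minimal sketch that also copes with a follower count that is not a multiple of 100 and flattens the nested result into a data frame. The names lookup_stub, split_into_batches and lookup_in_batches are hypothetical helpers made up for this example; lookup_stub merely stands in for a real per-ID lookup and does not call twitteR or the Twitter API.

```r
# Hypothetical stand-in for a real per-ID lookup; returns one record per ID.
lookup_stub <- function(id) {
  list(id = id, screen_name = paste0("user", id))
}

# Split a vector of IDs into consecutive batches of at most batch_size.
# Unlike the matrix layout, the last batch may be shorter than the others.
split_into_batches <- function(ids, batch_size = 100) {
  unname(split(ids, ceiling(seq_along(ids) / batch_size)))
}

# Look up every batch, pausing after each one to stay under a rate limit.
lookup_in_batches <- function(ids, lookup, batch_size = 100, sleeptime = 1) {
  batches <- split_into_batches(ids, batch_size)
  lapply(batches, function(id_set) {
    info <- lapply(id_set, lookup)
    Sys.sleep(sleeptime)
    info
  })
}

ids <- 1:250  # placeholder IDs, deliberately not a multiple of 100
nested <- lookup_in_batches(ids, lookup_stub, sleeptime = 0)

# Flatten the nested list (batches of records) into a single data frame.
records <- unlist(nested, recursive = FALSE)
followers_df <- do.call(rbind, lapply(records, as.data.frame,
                                      stringsAsFactors = FALSE))
```

Swapping lookup_stub for a real lookup function and raising sleeptime would be the obvious next step. Which fields end up as columns depends entirely on what that lookup returns.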


2012-02-08 12:45:22

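Thanks for the quick reply. In place of "getTwitterInfo" I put the various accessors defined by the twitteR package (getName, getLocation ...). However, it returns the error "Error in splitter_a(.data, .margins, .expand): Invalid margin". Is there a read function that takes batches of 100 IDs as input? – 2012-02-08 19:08:59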

Please provide a reproducible example; it is hard to give advice this way. – 2012-02-08 19:25:27

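I don't know whether the above is clear enough (I suspect I was a bit vague in parts). Another idea might be to access the API directly with RCurl: apply the readLines function to the matrix created above, pasting the IDs into the URL one batch of 100 at a time. So far my attempts at this have been unsuccessful. Does anyone have any ideas? – 2012-02-10 08:18:06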
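For the URL idea in the last comment, the string-building part could look something like the sketch below. The base_url value is a made-up placeholder, not a real Twitter endpoint, and the actual request is left commented out because it needs network access and, for the real API, probably authentication.

```r
# Hypothetical endpoint; substitute the lookup URL of the API version you use.
base_url <- "https://api.example.com/users/lookup.json?user_id="

ids <- 1:250  # placeholder follower IDs
batches <- split(ids, ceiling(seq_along(ids) / 100))

# One URL per batch, with that batch's IDs joined by commas.
urls <- vapply(batches, function(id_set) {
  paste0(base_url, paste(id_set, collapse = ","))
}, character(1))

# Fetching is not run here; one possible shape, pausing between requests:
# responses <- lapply(urls, function(u) {
#   Sys.sleep(1)
#   readLines(u, warn = FALSE)
# })
```

Real follower IDs are large numbers, so it is probably safer to keep them as character strings, since R can format large numeric values in scientific notation when pasting them into a URL.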
