2012-05-03

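I am using the 'twitteR' package in R to retrieve information about tweets. Although the Twitter API provides 'retweet_count' (https://dev.twitter.com/docs/faq#6899) for the number of times a specific tweet has been retweeted, I can't figure out how to use it from within R (perhaps with the 'getURL' function from the 'RCurl' package?).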

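Basically, I am looking for ways to get:

  1. the number of times a specific tweet has been retweeted

  2. real-time information in R using the Streaming API, such as

    a. new followers joining those users,

    b. when they post tweets or retweets, and

    c. when the tweets they posted are retweeted by others

I would appreciate it if anyone could point me to leads for getting any of this information.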

Answer


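I can't help with the Streaming API question, but how about this approach to working with retweets, based on this helpful tutorial? You could probably adapt it to focus on specific tweets rather than on the number of retweets per user. Some of the posts here may be more useful.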

# get package with functions for interacting with Twitter.com 
require(twitteR) 
# get 1500 tweets with #BBC tag, note that 1500 is the max, and it's subject to mysterious filtering and other restrictions by Twitter 
s <- searchTwitter('#BBC', n=1500) 
# 
# convert to data frame 
df <- do.call("rbind", lapply(s, as.data.frame)) 
# 
# Clean text of tweets 
df$text <- sapply(df$text,function(row) iconv(row,to='UTF-8')) #remove odd characters 
trim <- function (x) sub('@','',x) # remove @ symbol from user names 
# 
# Extract retweets 
library(stringr) 
# pull out who msg is to (assumes df has a 'to' column; some twitteR versions name it replyToSN instead)
df$to <- sapply(df$to,function(name) trim(name))
df$rt <- sapply(df$text,function(tweet) trim(str_match(tweet,"^RT (@[[:alnum:]_]*)")[2]))  
# 
# basic analysis and visualisation of RT'd messages 
sum(!is.na(df$rt))    # see how many tweets are retweets 
sum(!is.na(df$rt))/length(df$rt) # the ratio of retweets to tweets 
countRT <- table(df$rt) 
countRT <- sort(countRT) 
countRT.subset <- subset(countRT,countRT >2) # subset those RTd more than twice
barplot(countRT.subset,las=2,cex.names = 0.75) # plot them 
# 
# basic social network analysis using RT 
# (not requested by OP, but may be of interest...) 
rt <- data.frame(user=df$screenName, rt=df$rt) # tweeter-retweeted pairs 
rt.u <- na.omit(unique(rt)) # omit pairs with NA, get only unique pairs 
# 
# begin sna 
library(igraph) 
g <- graph.data.frame(rt.u, directed = TRUE) # TRUE rather than T, which can be reassigned
ecount(g) # edges (connections) 
vcount(g) # vertices (nodes) 
diameter(g) # network diameter 
farthest.nodes(g) # show the farthest nodes 
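
To focus on specific tweets rather than on who gets retweeted, one option is to strip the "RT @user:" prefix and tabulate the remaining text. The following is only a minimal sketch in base R under some assumptions: the `tweets` vector is made-up sample data standing in for `df$text`, and it only catches manual retweets that follow the "RT @user: text" convention. Some versions of twitteR may also expose a retweet count per status directly; checking `names(df)` would confirm whether yours does.

```r
# Hypothetical sample tweets (invented for illustration; use df$text in practice)
tweets <- c(
  "RT @alice: BBC launches new service",
  "BBC launches new service",
  "RT @alice: BBC launches new service",
  "RT @bob: Watching #BBC tonight",
  "Nothing to see here"
)

# TRUE for tweets that start with the manual "RT @user" convention
is_rt <- grepl("^RT @[[:alnum:]_]+", tweets)

# Strip the "RT @user: " prefix to recover the text of the original tweet
original <- sub("^RT @[[:alnum:]_]+:? *", "", tweets[is_rt])

# Count how many times each original tweet text was retweeted in the sample
rt_per_tweet <- sort(table(original), decreasing = TRUE)
print(rt_per_tweet)
```

Because this matches on text, edited or truncated retweets are counted as separate tweets, so treat the counts as a rough lower bound within whatever sample the search returned.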

Thank you very much! I'll look into it. – user1371835