2017-01-15

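I'm curious how to access other attributes of the graph that are associated with its edges. Following along here is a small example: Text analysis of a subgraph in R (igraph)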

library("igraph")
library("SocialMediaLab")
library("magrittr")  # provides the %>% pipe

myapikey <- ''
myapisecret <- ''
myaccesstoken <- ''
myaccesstokensecret <- ''

tweets <- Authenticate("twitter",
                       apiKey = myapikey,
                       apiSecret = myapisecret,
                       accessToken = myaccesstoken,
                       accessTokenSecret = myaccesstokensecret) %>%
  Collect(searchTerm = "#trump", numTweets = 100, writeToFile = FALSE, verbose = TRUE)
g_twitter_actor <- tweets %>% Create("Actor", writeToFile = FALSE)
# Weakly connected components; keep the largest one
c <- igraph::components(g_twitter_actor, mode = 'weak')
subCluster <- induced.subgraph(g_twitter_actor, V(g_twitter_actor)[which(c$membership == which.max(c$csize))])

The original tweets contain the following columns:

colnames(tweets) 
[1] "text"   "favorited"  "favoriteCount" "replyToSN"  "created_at"  "truncated"  "replyToSID"  "id"    
[9] "replyToUID"  "statusSource" "screen_name"  "retweetCount" "isRetweet"  "retweeted"  "longitude"  "latitude"  
[17] "from_user"  "reply_to"  "users_mentioned" "retweet_from" "hashtags_used" 

How can I access the text attribute for the subgraph so that I can do text analysis? E(subCluster)$text doesn't work.

Answer


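E(subCluster)$text doesn't work because the values of tweets$text were not added to the graph when it was created, so you have to add them manually. It's a bit of a pain, but doable: it takes some subsetting of the data frame and matching on user names.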
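To see the behaviour on a graph small enough to inspect, here is a minimal sketch with a hypothetical toy graph standing in for g_twitter_actor (the user names and texts are made up). It shows that an edge attribute that was never set comes back as NULL, and that once an attribute is set on the parent graph, induced_subgraph() carries it over to the subgraph.

```r
library(igraph)

# Hypothetical toy directed graph standing in for g_twitter_actor
edges <- data.frame(from = c("ann", "bob", "cat", "dan"),
                    to   = c("bob", "cat", "ann", "eve"))
g <- graph_from_data_frame(edges, directed = TRUE)

# No "text" edge attribute exists yet, so this prints NULL
print(E(g)$text)

# Attach one (hypothetical) string per edge, in edge order
E(g)$text <- c("t1", "t2", "t3", "t4")

# Keep only the largest weakly connected component, as in the question
comp <- components(g, mode = "weak")
sub <- induced_subgraph(g, V(g)[comp$membership == which.max(comp$csize)])

# Edge attributes present on the parent carry over to the subgraph
print(E(sub)$text)
```

The subgraph keeps the ann/bob/cat triangle, so its edges carry t1, t2 and t3 (the edge order inside the subgraph is not something to rely on).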

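First, note that the edge types are arranged in a specific order: retweets, mentions, replies. The same text from a given user can apply to all three, so I think it makes sense to add the text one edge type at a time.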

> unique(E(g_twitter_actor)$edgeType) 
[1] "Retweet" "Mention" "Reply" 
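unique() only lists the types present, not how the edges are laid out. To confirm on your own graph that each edge type forms a single contiguous run in the Retweet, Mention, Reply order, a minimal sketch follows. `edge_types_in_blocks()` is a hypothetical helper, not part of igraph or SocialMediaLab. It checks grouping only. The sequential assignment below additionally assumes that the k-th edge of each type corresponds to the k-th matching tweet, which this does not verify.

```r
# Hypothetical helper: TRUE if each edge type occupies exactly one
# contiguous run and the runs follow the `expected` order
edge_types_in_blocks <- function(types,
                                 expected = c("Retweet", "Mention", "Reply")) {
  runs <- rle(types)$values
  # no type may appear in two separate runs, and runs must follow `expected`
  !anyDuplicated(runs) && identical(runs, expected[expected %in% runs])
}

edge_types_in_blocks(c("Retweet", "Retweet", "Mention", "Reply"))  # TRUE
edge_types_in_blocks(c("Retweet", "Mention", "Retweet"))           # FALSE
```

On the real data this would be called as `edge_types_in_blocks(E(g_twitter_actor)$edgeType)`.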

Using dplyr and reshape2 makes this easier.

library(reshape2); library(dplyr)
# Make data frames for retweets, mentions and replies
rts  <- tweets %>% filter(!is.na(retweet_from))
ms   <- tweets %>% filter(lengths(users_mentioned) > 0)  # tweets that mention at least one user
rpls <- tweets %>% filter(!is.na(reply_to))

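Because users_mentioned can hold a list of several users, we have to flatten it. But we also want to keep each mentioned user associated with the user who mentioned them.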

# Name each element of the users_mentioned list after the user who did the mentioning
names(ms$users_mentioned) <- ms$screen_name
# melt() turns the named list into a long data frame: column L1 holds the
# mentioning user, column value the user who was mentioned
ms <- melt(ms$users_mentioned)

# Add the text (match() takes the first tweet posted by each screen name)
ms$text <- tweets$text[match(ms$L1, tweets$screen_name)]
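Because match() returns only the first tweet by each screen name, a user who posted several tweets gets the text of their first tweet on every mention edge. A base-R sketch that keeps each mention paired with the tweet it came from, assuming users_mentioned is a list column of character vectors as in the filtered `ms` data frame before melting (the toy data below is hypothetical):

```r
# Hypothetical stand-in for the filtered `ms` data frame
ms_toy <- data.frame(screen_name = c("ann", "bob", "ann"),
                     text = c("hi @bob @cat", "yo @ann", "again @dan"),
                     stringsAsFactors = FALSE)
ms_toy$users_mentioned <- list(c("bob", "cat"), "ann", "dan")

# Repeat each tweet once per user it mentions, then flatten the list column,
# giving one row per (mentioning user, mentioned user, tweet text)
n <- lengths(ms_toy$users_mentioned)
mentions <- data.frame(from = rep(ms_toy$screen_name, n),
                       to   = unlist(ms_toy$users_mentioned, use.names = FALSE),
                       text = rep(ms_toy$text, n),
                       stringsAsFactors = FALSE)
print(mentions)
```

Here ann's second tweet keeps its own text ("again @dan") on the ann-to-dan row, whereas matching on screen name alone would reuse "hi @bob @cat".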

Now add each of these to the network as an edge attribute, matching on edge type.

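# Start with an NA text on every edge so the attribute has one value per edge
E(g_twitter_actor)$text <- NA_character_
# Each line assumes the k-th edge of that type belongs to the k-th row of the data frame
E(g_twitter_actor)$text[E(g_twitter_actor)$edgeType %in% "Retweet"] <- rts$text
E(g_twitter_actor)$text[E(g_twitter_actor)$edgeType %in% "Mention"] <- ms$text
E(g_twitter_actor)$text[E(g_twitter_actor)$edgeType %in% "Reply"] <- rpls$text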
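These assignments assume that the number of edges of each type equals the number of rows in the corresponding data frame; if it does not, R recycles or truncates the replacement with at most a warning. A sketch of a hypothetical helper, `set_text_by_type()`, that stops instead, using only igraph's edge_attr(), ecount() and set_edge_attr() (the toy graph is made up):

```r
library(igraph)

# Hypothetical helper: assign texts to edges type by type and stop
# if a type's edge count differs from the number of texts supplied
set_text_by_type <- function(g, texts_by_type, type_attr = "edgeType") {
  types <- edge_attr(g, type_attr)
  text <- rep(NA_character_, ecount(g))   # one slot per edge
  for (type in names(texts_by_type)) {
    idx <- which(types == type)
    vals <- texts_by_type[[type]]
    if (length(idx) != length(vals))
      stop(sprintf("%s: %d edges but %d texts", type, length(idx), length(vals)))
    text[idx] <- vals
  }
  set_edge_attr(g, "text", value = text)  # returns the modified graph
}

# Toy graph; the extra edgeType column becomes an edge attribute
g <- graph_from_data_frame(
  data.frame(from = c("a", "b", "c"), to = c("b", "c", "a"),
             edgeType = c("Retweet", "Mention", "Reply")))
g <- set_text_by_type(g, list(Retweet = "rt text",
                              Mention = "m text",
                              Reply = "r text"))
print(E(g)$text)
```

On the real data the call would be `set_text_by_type(g_twitter_actor, list(Retweet = rts$text, Mention = ms$text, Reply = rpls$text))`.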

Now you can take the subgraph and get the text values of its edges.

subCluster <- induced.subgraph(g_twitter_actor,
                               V(g_twitter_actor)[which(c$membership == which.max(c$csize))])
E(subCluster)$text  # tweet text attached to each edge of the largest component
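Once E(subCluster)$text is populated, a very small base-R word count is one possible starting point for the text analysis. The texts below are hypothetical stand-ins for E(subCluster)$text, and the tokenizer (split on anything other than lower-case letters, `#` and `@`) is a deliberately crude assumption.

```r
# Hypothetical stand-ins for E(subCluster)$text; edges without text are NA
texts <- c("Make America great", "great debate tonight", NA)

# Lower-case, then split on runs of characters other than a-z, # and @
tokens <- unlist(strsplit(tolower(texts[!is.na(texts)]), "[^a-z#@]+"))
tokens <- tokens[nzchar(tokens)]   # drop empty strings from leading separators

# Word frequencies, most common first
freq <- sort(table(tokens), decreasing = TRUE)
print(head(freq))
```

Note that a retweeted text can appear on several edges, so you may want unique() on the texts first, depending on what you are counting.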

Thanks a lot! Do you know how I could solve http://stackoverflow.com/questions/41664769/igraph-get-ids-of-connected-components as well?