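Creating a tm corpus with text (tweet) attributes from a data frame
I have a data frame that contains tweets along with their creation date, tweet ID, favorite count, and retweet count. I want to create a corpus that carries the favorite and retweet counts of each document as variables. I would also like each document to be identified by its tweet id rather than by a generic id such as doc 0001.
I start with the data below; the code follows after it.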
id
1: 737243856144629760
2: 737242308261842945
3: 737242189055594496
4: 737242018687164416
5: 737241411465170944
6: 737239685295181824
text
1: Have a great Memorial Day and remember that we will soon MAKE AMERICA GREAT AGAIN!
2: "@NBCDFW: Trump rallies veterans at annual Rolling Thunder Gathering https://twitter.com/b08FcMlgkr https://twitter.com/RCDeLvHQqD"
3: "@FrankyLamouche: how many of donald's rolling thunder brigade will sign up and go to war for him in the middle east."
4: "@MariaErnandez3b: Trump Supports Rolling Thunder Rally #TRUMP STRONG https://twitter.com/pfVXQ8NdZu" So true, and remember the M.I.A.'s!
5: "@ScottWRasmussen: Donald Trump and Bikers Share Affection at Rolling Thunder Rally https://twitter.com/ZZl2sc29dn" A great day in D.C.!
6: "@TeaPartyNevada: #Trump2016 "Illegals are taken care of better than our veterans." https://twitter.com/KKIgM4rNma https://twitter.com/1cEZ8wG7Cy"
favorited favoriteCount replyToSN created truncated replyToSID replyToUID
1: FALSE 25944 NA 2016-05-30 11:26:47 FALSE NA NA
2: FALSE 9268 NA 2016-05-30 11:20:38 FALSE NA NA
3: FALSE 6739 NA 2016-05-30 11:20:09 FALSE NA NA
4: FALSE 15417 NA 2016-05-30 11:19:29 FALSE NA NA
5: FALSE 7192 NA 2016-05-30 11:17:04 FALSE NA NA
6: FALSE 9834 NA 2016-05-30 11:10:12 FALSE NA NA
statusSource screenName retweetCount
1: <a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a> realDonaldTrump 9455
2: <a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a> realDonaldTrump 2744
3: <a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a> realDonaldTrump 1604
4: <a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a> realDonaldTrump 4237
5: <a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a> realDonaldTrump 2148
6: <a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a> realDonaldTrump 3545
isRetweet retweeted longitude latitude
1: FALSE FALSE NA NA
2: FALSE FALSE NA NA
3: FALSE FALSE NA NA
4: FALSE FALSE NA NA
5: FALSE FALSE NA NA
6: FALSE FALSE NA NA
cleantxt
1: have a great memorial day and remember that we will soon make america great again!
2: "@nbcdfw: trump rallies veterans at annual rolling thunder gathering https://twitter.com/b08fcmlgkr https://twitter.com/rcdelvhqqd"
3: "@frankylamouche: how many of donald's rolling thunder brigade will sign up and go to war for him in the middle east."
4: "@mariaernandez3b: trump supports rolling thunder rally #trump strong https://twitter.com/pfvxq8ndzu" so true, and remember the m.i.a.'s!
5: "@scottwrasmussen: donald trump and bikers share affection at rolling thunder rally https://twitter.com/zzl2sc29dn" a great day in d.c.!
6: "@teapartynevada: #trump2016 "illegals are taken care of better than our veterans." https://twitter.com/kkigm4rnma https://twitter.com/1cez8wg7cy"
I tried to build the corpus with:
myReader <- readTabular(mapping=list(content="cleantxt", id="id", created="created", retweet="retweetCount", fav="favoriteCount"))
trumptweetsenhanced <- VCorpus(DataframeSource(trumptweets.df), readerControl=list(reader=myReader))
However, when I convert the corpus back to a data frame, the variables have not been added:
> head(trumptweetsenhanced_dataframe.df)
docs text
1 doc 0001 great memori day rememb will soon make america great
2 doc 0002 nbcdfw trump ralli veteran annual roll thunder gather
3 doc 0003 frankylamouch mani donald roll thunder brigad will sign go war middl east
4 doc 0004 mariaernandezb trump support roll thunder ralli trump strong true rememb ms
5 doc 0005 scottwrasmussen donald trump biker share affect roll thunder ralli great day dc
6 doc 0006 teapartynevada trump illeg taken care better veteran
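For comparison, here is a minimal sketch of one way the ids and counts might be carried into the corpus. It assumes a recent tm release (0.7-2 or later, as far as I know), where `DataframeSource` expects columns named `doc_id` and `text` and keeps any remaining columns as document-level metadata, so no `readTabular` mapping is involved. The two-row `tweets` data frame is a hypothetical stand-in for `trumptweets.df`, and the names `src_df`, `corp`, and `out` are made up for illustration.

```r
library(tm)

# Hypothetical stand-in for trumptweets.df (values copied from the first two rows above).
# Tweet ids are kept as character: 18-digit ids cannot be stored exactly as doubles.
tweets <- data.frame(
  id            = c("737243856144629760", "737242308261842945"),
  cleantxt      = c("have a great memorial day", "trump rallies veterans"),
  retweetCount  = c(9455, 2744),
  favoriteCount = c(25944, 9268),
  stringsAsFactors = FALSE
)

# Assumption: tm >= 0.7-2, where DataframeSource requires "doc_id" and "text"
# and treats every other column as document-level metadata.
src_df <- data.frame(
  doc_id  = tweets$id,
  text    = tweets$cleantxt,
  retweet = tweets$retweetCount,
  fav     = tweets$favoriteCount,
  stringsAsFactors = FALSE
)

corp <- VCorpus(DataframeSource(src_df))

# Document id of the first document (the tweet id, not a generated one)
meta(corp[[1]], "id")

# Document-level metadata: a data frame with the retweet and fav columns
meta(corp)

# Converting back to a data frame while keeping ids and metadata
out <- data.frame(
  doc_id = vapply(content(corp), function(d) meta(d, "id"),
                  character(1), USE.NAMES = FALSE),
  text   = vapply(content(corp), function(d) paste(content(d), collapse = " "),
                  character(1), USE.NAMES = FALSE),
  meta(corp),
  stringsAsFactors = FALSE
)
```

Whether the metadata survives the round trip also depends on how the corpus is converted back; the sketch above reads the ids and `meta(corp)` explicitly rather than relying on a converter to pick them up.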
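So where are you stuck? I don't see a specific, answerable question here. Try asking a focused question. Show any code you have tried and describe exactly where you are stuck. Include sample data in a [reproducible format](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). That will make it easier to help you. – MrFlick
I have added more information and narrowed the question down to one specific problem. – idomeneus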