Calculating the frequency of n-grams in a text using R


I use R to read a text. The passage consists of 100 sentences, which I then put into a list that looks like this:

[[1]] 

[1] "WigWagCo: For #TBT here's a video of Travis McCollum (Co-Founder and COO of WigWag) at #SXSW2016 

[[2]] 

[1] "chrisreedfilm: RT @hammertonail: #SXSW2016 doc THE SEER: A PORTRAIT OF WENDELL BERRY gets reviewed by @chrisreedfilm 

[[3]] 

[1] "iamscottrandell: RT @therevue: Take a jaunt down #MemoriesofSXSW &amp; read the stories of @JRNelsonMusic @thegillsmusic &amp; @TheBlancosMusic 
... 
... 

[[99]] 

[1] "SunPowerTalent: SunPower #Clerical #Job: Supply Chain Intern (#Austin, TX) 

[[100]] 

[1] "SunPowerTalent: #Finance #Job alert: General Ledger Accountant | SunPower

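Each object in the list is a "sentence" from the same text. How can I compute the frequency of all 3-grams in this text and find out which sentence each 3-gram occurs in?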

Many thanks.


2016-04-12 Paul
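
The question does not say whether "3-gram" means three consecutive words or three consecutive characters. Taking the word reading, here is a minimal base-R sketch, assuming sentences are lower-cased and split on whitespace; the sample sentences and the helper word_ngrams are hypothetical. It records, for every 3-gram occurrence, the number of the sentence it came from, so both the overall frequencies and the sentence membership come out of one data frame.

```r
# Hypothetical sample data standing in for the list of 100 sentences
x <- list(
  "the cat sat on the mat",
  "the cat sat down",
  "a dog sat on the mat"
)

# Hypothetical helper: lower-case, split on whitespace, return word n-grams
word_ngrams <- function(sentence, n = 3) {
  words <- strsplit(tolower(sentence), "[[:space:]]+")[[1]]
  words <- words[words != ""]
  if (length(words) < n) return(character(0))
  vapply(seq_len(length(words) - n + 1),
         function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

# One row per 3-gram occurrence, tagged with the number of its sentence
grams <- lapply(x, word_ngrams)
occ <- data.frame(
  ngram    = unlist(grams),
  sentence = rep(seq_along(grams), lengths(grams)),
  stringsAsFactors = FALSE
)

# Frequency of each 3-gram over the whole text, most frequent first
freq <- sort(table(occ$ngram), decreasing = TRUE)

# For each 3-gram, the sentence numbers it occurs in
where <- lapply(split(occ$sentence, occ$ngram), function(s) sort(unique(s)))

print(head(freq))
print(where[["the cat sat"]])
```

With the sample data, "the cat sat" is counted twice and where[["the cat sat"]] lists sentences 1 and 2. Keeping one row per occurrence (rather than only a count table) is what makes the "which sentence" lookup possible.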

You can use the R package textcat (https://CRAN.R-project.org/package=textcat). If your list of 100 sentences is called x, you simply run:

library("textcat")
# Build one n-gram frequency profile per element (sentence) of x
n3gram <- textcat_profile_db(x, n = 3)

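This gives a list of 100 elements (one per original sentence), each holding that sentence's 3-gram frequencies in sorted order. See ?textcat_profile_db for more details and examples.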
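
As far as we understand, textcat builds character n-gram profiles (see ?textcat_profile_db for options such as how text is split and which n-gram lengths are kept), so its output may contain more than plain 3-grams. If you want only character 3-grams and no package dependency, the following base-R sketch produces the same shape of result, one sorted frequency table per sentence, and then answers "which sentences contain this 3-gram?" from those tables. The sample strings and the helpers char_ngrams and sentences_with are hypothetical.

```r
# Hypothetical stand-in for the list of sentences called x in the answer
x <- list("banana", "bandana", "cabana")

# Hypothetical helper: all overlapping character n-grams of one string
char_ngrams <- function(s, n = 3) {
  s <- tolower(s)
  k <- nchar(s)
  if (k < n) return(character(0))
  substring(s, 1:(k - n + 1), n:k)
}

# One element per sentence: that sentence's 3-gram counts, most frequent first
profiles <- lapply(x, function(s) sort(table(char_ngrams(s)), decreasing = TRUE))

# Hypothetical helper: indices of the sentences whose profile contains `gram`
sentences_with <- function(profiles, gram) {
  which(vapply(profiles, function(p) gram %in% names(p), logical(1)))
}

# Total count of each 3-gram over all sentences, most frequent first
totals <- sort(tapply(unlist(lapply(profiles, as.vector)),
                      unlist(lapply(profiles, names)), sum),
               decreasing = TRUE)

print(profiles[[1]])
print(sentences_with(profiles, "ban"))
print(totals)
```

Keeping the per-sentence tables (as textcat does) preserves the sentence information; the pooled totals are just a sum over those tables.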


2016-04-12 10:49:31
