R: Find all occurrences of names

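I have been working on this one question for class and finally got the answer needed for the quiz. I am pretty new to R, and this took a couple of hours to figure out. My task was to find all occurrences of the names Jurgis, Ona, and Chicago in The Jungle.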

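Problem: I wasted a lot of time using gsub to remove punctuation, but then realized that some elements were two words: "Jurgis read" would condense into "Jurgisread" and would not be picked up by the count. Then there was "Jurgis's" condensing into "Jurgiss", and the same for Ona and Chicago.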
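A minimal sketch of the condensing problem, assuming it came from punctuation being deleted rather than replaced with a space. The sample strings are made up for illustration (the "--" dash in particular is an assumption about how the two words were joined), and `sample_words`, `removed`, `spaced`, and `tokens` are hypothetical names:

```r
# Made-up elements standing in for the split book text (assumption)
sample_words <- c("Jurgis--read", "Jurgis's", "Ona,")

# Deleting punctuation glues the neighbouring pieces together
removed <- gsub("[[:punct:]]", "", sample_words)
removed
#> [1] "Jurgisread" "Jurgiss"    "Ona"

# Replacing punctuation with a space keeps the pieces apart ...
spaced <- gsub("[[:punct:]]", " ", sample_words)

# ... so a second split on runs of whitespace recovers the words
tokens <- unlist(strsplit(spaced, "[[:space:]]+"))
tokens
#> [1] "Jurgis" "read"   "Jurgis" "s"      "Ona"

# Both "Jurgis--read" and "Jurgis's" now count towards the name
sum(toupper(tokens) == "JURGIS")
#> [1] 2
```

Note that the possessive leaves a stray "s" token behind, which is harmless when only exact name matches are counted.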

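Want: Some tips on how to handle these types of files better in the future.

What I did: I was given the first two lines of code. I split the elements on the spaces they came with. Then I chose the punctuation I wanted to remove. Once I had replaced what I believed were all the common marks with a space, I split the elements again. Lastly, I forced all the words to upper case and used table().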

# Read the book and split the body text (lines 47 to 13872) on single spaces
theJungle <- readLines("http://www.gutenberg.org/files/140/140.txt")
theJungleList <- unlist(strsplit(theJungle[47:13872], " "))

# Split again on any whitespace character
splitJungle1 <- unlist(strsplit(theJungleList, "[[:space:]]"))

# Replace the chosen punctuation marks with a space
remPunctuation <- gsub("-|'|,|:|;|\\.|\\*|\\(|\"|!|\\?", " ", splitJungle1)

# Split once more so words separated by the new spaces become their own elements
splitJungle2 <- unlist(strsplit(remPunctuation, "[[:space:]]"))

# Count exact, case-insensitive matches for each name
table(toupper(splitJungle2) == "JURGIS")
table(toupper(splitJungle2) == "ONA")
table(toupper(splitJungle2) == "CHICAGO")

Thanks!

Source

2017-05-02 Melissa Perez

See: Why is "Can someone help me?" not an actual question? (http://meta.stackoverflow.com/q/284236) – EJoshuaS

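If this is for a class, there are probably certain techniques you are supposed to be using. If you are just interested in text analysis in R, you might consider using tidy data principles and the tidytext package. Finding word frequencies is a pretty quick thing to do in that workflow.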

library(dplyr)
library(tidytext)
library(stringr)

theJungle <- readLines("http://www.gutenberg.org/files/140/140.txt")
# One row per word; unnest_tokens() lowercases and strips punctuation by default
jungle_df <- tibble(text = theJungle) %>%
    unnest_tokens(word, text)

What are the most common words in the text?

jungle_df %>% 
    count(word, sort = TRUE) 

#> # A tibble: 10,349 × 2
#>     word     n
#>    <chr> <int>
#> 1    the  9114
#> 2    and  7350
#> 3     of  4484
#> 4     to  4270
#> 5      a  4217
#> 6     he  3312
#> 7    was  3056
#> 8     in  2570
#> 9     it  2318
#> 10   had  2234
#> # ... with 10,339 more rows

How often do you see the specific names you are looking for?

jungle_df %>% 
    count(word) %>% 
    filter(str_detect(word, "^jurgis|^ona|^chicago")) 

#> # A tibble: 6 × 2
#>        word     n
#>       <chr> <int>
#> 1   chicago    68
#> 2 chicago's     4
#> 3    jurgis  1098
#> 4  jurgis's    19
#> 5       ona   200
#> 6     ona's    25
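
For comparison, the same kind of count can be sketched in base R with regular expressions only. This is a minimal sketch, not the method used above: `sample_lines` is made-up text standing in for the book, and `count_name`, `names_wanted`, and `counts` are hypothetical names. Unlike the tidytext output, which lists possessives such as "jurgis's" on their own rows, this version folds them into the base name.

```r
# Made-up sample lines standing in for the book text (assumption)
sample_lines <- c(
  "Jurgis read the paper while Ona waited.",
  "Jurgis's wages in Chicago were low--Ona knew it.",
  "CHICAGO! cried Jurgis."
)

# Count case-insensitive, whole-word occurrences of `name` across `lines`
count_name <- function(lines, name) {
  # \\b is a word boundary, so "Ona" would not match inside a longer word
  pattern <- paste0("\\b", name, "\\b")
  hits <- gregexpr(pattern, lines, ignore.case = TRUE, perl = TRUE)
  # regmatches() returns one character vector of matches per line
  sum(lengths(regmatches(lines, hits)))
}

names_wanted <- c("Jurgis", "Ona", "Chicago")
counts <- sapply(names_wanted, count_name, lines = sample_lines)
counts
#>  Jurgis     Ona Chicago
#>       3       2       2
```

Matching against whole lines avoids splitting and cleaning the text first, so punctuation next to a name cannot glue it to a neighbouring word.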

Source

2017-05-03 00:07:18

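Wow, thank you. I am interested in that for down the road, but this is for a class. This week's main topic is string manipulation, so we have not used tidytext yet, but it is a handy package to know about.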

Great answer - very simple use of `count`! – griffmer
