2017-01-25

I'm using the VCorpus() function from the tm package in R. The problem is that the content gets lost when I use VCorpus().

example_text = data.frame(num=c(1,2,3),Author1 = c("Text mining is a great time.","Text analysis provides insights","qdap and tm are used in text mining"),Author2=c("R is a great language","R has many uses","DataCamp is cool!")) 

This looks like:

  num                             Author1               Author2
1   1        Text mining is a great time. R is a great language
2   2     Text analysis provides insights       R has many uses
3   3 qdap and tm are used in text mining     here is a problem

Then I ran df_source = DataframeSource(example_text[,2:3]) to extract only the last 2 columns.

df_source looks correct. After that, I ran df_corpus = VCorpus(df_source), and df_corpus[[1]] gives:

<<PlainTextDocument>> 
Metadata: 7 
Content: chars: 2 

and df_corpus[[1]][1] gives me:

$content 
[1] "3" "3" 

But df_corpus[[1]] should return:

<<PlainTextDocument>> 
Metadata: 7 
Content: chars: 49 

and df_corpus[[1]][1] should return:

$content 
[1] "Text mining is a great time." "R is a great language" 

I don't know what went wrong. Any suggestions would be greatly appreciated.


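From help(DataframeSource), a data frame source interprets each row of the data frame x as a document. I think you should treat each sentence as a document, which means converting the data frame into 6 rows and 1 column (the sentences) before using DataframeSource. – kitman0804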
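A minimal base-R sketch of the reshaping described in that comment, assuming you want one document per sentence. The object name `sentences` and the `doc_id` column are my own additions for illustration: as far as I know, newer tm releases expect a `DataframeSource` input with `doc_id` and `text` columns, so check `help(DataframeSource)` for your installed version.

```r
# Recreate the data with character (not factor) columns.
example_text <- data.frame(
  num = c(1, 2, 3),
  Author1 = c("Text mining is a great time.",
              "Text analysis provides insights",
              "qdap and tm are used in text mining"),
  Author2 = c("R is a great language", "R has many uses", "DataCamp is cool!"),
  stringsAsFactors = FALSE
)

# Stack Author1 and Author2 into one column: 6 rows, one sentence per row.
# `sentences` and its doc_id labels are hypothetical names for this sketch.
sentences <- data.frame(
  doc_id = as.character(seq_len(2 * nrow(example_text))),
  text = c(example_text$Author1, example_text$Author2),
  stringsAsFactors = FALSE
)

nrow(sentences)     # 6
sentences$text[4]   # "R is a great language"
```

With tm loaded, passing `sentences` to `DataframeSource()` and then `VCorpus()` should give one document per sentence; I have not checked that step against a specific tm release.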


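@kitman0804 This is a DataCamp interactive exercise. When I did it in the web browser, it produced the expected result. But when I run the same code in RStudio on my laptop, I get this problem. – ftxx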


When you create example_text, add the argument stringsAsFactors = FALSE and everything will work. – kitman0804

Answer


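The text columns inside example_text, which should all be character, have become factors, because the "factory-fresh" default value of stringsAsFactors is TRUE. From my point of view that default is weird and annoying.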

example_text <- data.frame(num=c(1,2,3),Author1 = c("Text mining is a great time.","Text analysis provides insights","qdap and tm are used in text mining"),Author2=c("R is a great language","R has many uses","DataCamp is cool!")) 
lapply(example_text, class) 

# $num 
# [1] "numeric" 
# 
# $Author1 
# [1] "factor" 
# 
# $Author2 
# [1] "factor" 
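To see where the "3" "3" in the question may come from, it helps to look at the integer codes stored behind those factors. This is my reading of the symptom rather than something confirmed against tm's source: the corpus appears to have picked up the factor codes instead of the labels. The sketch passes `stringsAsFactors = TRUE` explicitly so it behaves the same on R 4.0.0 and later, where the default became `FALSE`.

```r
example_text <- data.frame(
  num = c(1, 2, 3),
  Author1 = c("Text mining is a great time.",
              "Text analysis provides insights",
              "qdap and tm are used in text mining"),
  Author2 = c("R is a great language", "R has many uses", "DataCamp is cool!"),
  stringsAsFactors = TRUE  # reproduce the pre-R-4.0.0 default
)

# Factor levels are sorted, so "R is a great language" is the 3rd level of Author2.
levels(example_text$Author2)
as.integer(example_text$Author2[1])    # 3

# Author1's level order depends on the locale's collation: "qdap ..." sorts
# before "Text ..." in most locales (code 3) but after it in the C locale (code 2).
as.integer(example_text$Author1[1])

# as.character() recovers the text itself.
as.character(example_text$Author2[1])  # "R is a great language"
```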

To make sure the columns Author1 and Author2 are character columns, you can try any of these:

  1. Add options(stringsAsFactors = FALSE) at the beginning of your code.
  2. Add stringsAsFactors = FALSE to your data.frame(...) call.
  3. Run example_text[, 2:3] <- lapply(example_text[, 2:3], as.character)
  4. Run example_text[, 2:3] <- lapply(example_text[, 2:3], paste)

Then everything should work.
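A quick base-R check of options 2 to 4 (option 1 changes a global setting, so it is left out here). The names `author1`, `author2`, `fixed2`, `fixed3` and `fixed4` are illustrative only. Note that starting with R 4.0.0, `data.frame()` defaults to `stringsAsFactors = FALSE`, so on current R versions the columns stay character unless factors are requested explicitly.

```r
author1 <- c("Text mining is a great time.",
             "Text analysis provides insights",
             "qdap and tm are used in text mining")
author2 <- c("R is a great language", "R has many uses", "DataCamp is cool!")

# Option 2: ask data.frame() for character columns directly.
fixed2 <- data.frame(num = c(1, 2, 3), Author1 = author1, Author2 = author2,
                     stringsAsFactors = FALSE)

# Options 3 and 4 start from factor columns and convert them afterwards.
fixed3 <- data.frame(num = c(1, 2, 3), Author1 = author1, Author2 = author2,
                     stringsAsFactors = TRUE)
fixed3[, 2:3] <- lapply(fixed3[, 2:3], as.character)  # option 3

fixed4 <- data.frame(num = c(1, 2, 3), Author1 = author1, Author2 = author2,
                     stringsAsFactors = TRUE)
fixed4[, 2:3] <- lapply(fixed4[, 2:3], paste)         # option 4: paste() returns the labels

sapply(fixed3, class)  # num: "numeric", Author1 and Author2: "character"
```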


Thanks! It works! – ftxx