2017-07-02

I am working on text mining in R. Here are a few documents from my corpus after removing punctuation, numbers, URLs, and stop words.

myStopwords <- setdiff(myStopwords, c("r", "big")) 
myCorpus <- tm_map(myCorpus, removeWords, myStopwords) 
myCorpus <- tm_map(myCorpus, stripWhitespace) 
myCorpusCopy <- myCorpus 
for (i in c(1:2, 320)) 
{ 
    cat(paste0("[", i, "] ")) 
    writeLines(strwrap(as.character(myCorpus[[i]]), 60)) 
} 

[1] examples calling java code r 
[2] simulating mapreduce r big data analysis using flights data 
rbloggers 
[320] r reference card data mining now cran lists many useful r 
functions packages data mining applications 

After that, I wanted to stem the documents as follows:

myCorpus <- tm_map(myCorpus, stemDocument) 
myCorpus <- tm_map(myCorpus, stemCompletion, dictionary=myCorpusCopy) 

When I run the for loop again, it prints NA for every document, as shown here:

for (i in c(1:2, 320)) 
{ 
cat(paste0("[", i, "] ")) 
writeLines(strwrap(as.character(myCorpus[[i]]), 60)) 
} 

[1] NA 
[2] NA 
[320] NA 

Any idea what I am doing wrong here?

Answer


I reproduced your problem with a built-in dataset:

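library(tm)   # provides the crude corpus, tm_map, stemDocument and stemCompletion
data("crude")

myCorpus <- as.VCorpus(crude)
myCorpusCopy <- myCorpus   # unstemmed copy, used as the completion dictionary
myCorpus <- tm_map(myCorpus, stemDocument)
myCorpus <- tm_map(myCorpus, stemCompletion, dictionary=myCorpusCopy)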

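I found that after the last line, the elements of the myCorpus object no longer have fields such as meta and (in my case) content in their structure. Each element is now a named character vector.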

You can still access the elements:

myCorpus[[1]]

Diamond Shamrock Corp said that\neffect today it had cut it contract price for crude oil by\n1.50 dlrs a barrel.\n The reduct bring it post price for West Texas\nIntermedi to 16.00 dlrs a barrel, the copani said.\n "The price reduct today was made in the light of falling\noil product price and a weak crude oil market," a company\nspokeswoman said.\n Diamond is the latest in a line of U.S. oil compani that\nhav cut it contract, or posted, price over the last two days\ncit weak oil markets.\n Reuter 
                                                                                                                                 "content" 
                                                                                                                                  <NA> 
                                                                                                                                 "meta" 

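Given the new structure of the elements (check it with str()), the as.character() method picks up exactly the opposite part from the one you want. The body text is now apparently stored in the names.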
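To see why as.character() and names() return different parts, here is a minimal base-R sketch. It does not use tm. The vector doc is a made-up stand-in that mimics the printed element above (the values "content" and "meta", with the text held in the first name), so treat it as an illustration and not as the real object.

```r
# Made-up stand-in for one corpus element after the stemCompletion step:
# a named character vector whose first name carries the document text.
doc <- setNames(c("content", "meta"),
                c("examples calling java code r", NA))

str(doc)                      # a named chr vector, not a list with content/meta fields

values <- as.character(doc)   # keeps the values and drops the names
text   <- names(doc)[1]       # the first name is where the text sits

print(values)
print(text)
```

On this toy vector, as.character() never sees the text because it only returns the values, which is why going through names() recovers it.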

I was able to fix the loop like this:

for (i in c(1:2, length(myCorpus))) 
{ 
    cat(paste0("[", i, "] ")) 
    writeLines(strwrap(as.character(names(myCorpus[[i]])), 60)) 
} 
[1] Diamond Shamrock Corp said that effect today it had cut it 
contract price for crude oil by 1.50 dlrs a barrel. The 
reduct bring it post price for West Texas Intermedi to 
16.00 dlrs a barrel, the copani said. "The price reduct 
today was made in the light of falling oil product price 
and a weak crude oil market," a company spokeswoman said. 
Diamond is the latest in a line of U.S. oil compani that 
hav cut it contract, or posted, price over the last two 
days cit weak oil markets. Reuter 

[2] OPEC may be forc to meet befor a schedul June session to 
readdress it product cutting agr if the organ want to halt 
the current slide in oil prices, oil industri analyst said. 
"The movement to higher oil price was never to be as easy a 
OPEC thought. They may need an emerg meet to sort out th 
problems," said Daniel Yergin, director of Cambridg Energy 
Research Associates, CERA. Analyst and oil industri sourc 
said the problem OPEC face is excess oil suppli in world 
oil markets. "OPEC problem is not a price problem but a 
production issu and must be address in that way," said Paul 
Mlotok, oil analyst with Salomon Brother Inc. He said the 
market earlier optim about OPEC and its abl to keep product 
under control have given way to a pessimist outlook that 
the organ must address soon if it wish to regain the initi 
in oil prices. But some other analyst were uncertain that 
even an emerg meet would address the problem of OPEC 
production abov the 15.8 mln bpd quota set last December. 
"OPEC has to learn that in a buyer market you cannot have 
deem quotas, fix price and set differentials," said the 
region manag for one of the major oil compani who spoke on 
condit that he not be named. "The market is now tri to 
teach them that lesson again," he added. David T. Mizrahi, 
editor of Mideast reports, expect OPEC to meet befor June, 
although not immediately. However, he is not optimist that 
OPEC can address it princip problems. "They will not meet 
now as they tri to take advantag of the wint demand to sell 
their oil, but in late March and April when demand 
slackens," Mizrahi said. But Mizrahi said that OPEC is 
unlik to do anyth more than reiter it agreement to keep 
output at 15.8 mln bpd." Analyst said that the next two 
month will be critic for OPEC abil to hold togeth price and 
output. "OPEC must hold to it pact for the next six to 
eight weeks sinc buyer will come back into the market 
then," said Dillard Sprigg of Petroleum Analysi Ltd in New 
York. But Bijan Moussavar-Rahmani of Harvard Univers 
Energy and Environ Polici Center said that the demand for 
OPEC oil ha been rise through the first quarter and this 
may have prompt excess in it production. "Demand for their 
(OPEC) oil is clear abov 15.8 mln bpd and is probabl closer 
to 17 mln bpd or higher now so what we ar see character as 
cheat is OPEC meet this demand through current production," 
he told Reuter in a telephon interview. Reuter 
[20] Argentin crude oil product was down 10.8 pct in Januari 
1987 to 12.32 mln barrels, from 13.81 mln barrel in Januari 
1986, Yacimiento Petrolifero Fiscales said. Januari 1987 
natur gas output total 1.15 billion cubic metrers, 3.6 pct 
higher than 1.11 billion cubic metr produced in Januari 
1986, Yacimiento Petrolifero Fiscal added. Reuter
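Note that the text printed by this loop still contains stems such as reduct and compani, so reading the names only recovers the stemmed text; the stems were not completed. As far as I can tell, stemCompletion() expects a character vector of individual stems and not a whole document. The base-R sketch below only illustrates that per-word idea. The helper complete_stems, the toy dictionary, and the sample text are all hypothetical; this is not tm's implementation, and the "shortest match" rule is an assumption chosen to keep the example small.

```r
# Hypothetical helper (not part of tm): complete each stem to the shortest
# dictionary word that starts with it, keeping the stem when nothing matches.
complete_stems <- function(stems, dictionary) {
  vapply(stems, function(s) {
    hits <- dictionary[startsWith(dictionary, s)]
    if (length(hits) == 0) s else hits[which.min(nchar(hits))]
  }, character(1), USE.NAMES = FALSE)
}

# Toy dictionary and stemmed text, made up for illustration.
dictionary <- c("reduction", "companies", "company", "price", "prices")
stemmed <- "the compani said the price reduct"

# Split the document into single stems, complete each one, and re-join.
stems <- strsplit(stemmed, " ", fixed = TRUE)[[1]]
completed <- paste(complete_stems(stems, dictionary), collapse = " ")
print(completed)
```

The point of the design is the split and re-join around the completion step: the completion works on one stem at a time, and the document is rebuilt as a single string afterwards.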