2010-06-04 55 views

Answers

9

Here is an example (it extracts the words tagged /VB or /VBx, where x is any single character):

library("openNLP") 

acq <- "Gulf Applied Technologies Inc said it sold its subsidiaries engaged in pipeline and terminal operations for 12.2 mln dlrs. The company said the sale is subject to certain post closing adjustments, which it did not explain. Reuter." 

acqTag <- tagPOS(acq) 

sapply(strsplit(acqTag,"[[:punct:]]*/VB.?"),function(x) sub("(^.*\\s)(\\w+$)", "\\2", x)) 

    [,1]       
[1,] "said"       
[2,] "sold"       
[3,] "engaged"      
[4,] "said"       
[5,] "is"       
[6,] "did"       
[7,] " not/RB explain./NN Reuter./." 

Well, my regular expression needs some improvement to get rid of the last line in the result.
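To see where that leftover row comes from, the one-liner can be broken into its two steps. This is only a minimal sketch: the `tagged` string is a hand-written, hypothetical stand-in for tagger output in word/TAG form (it was not produced by running `tagPOS`), and the names `pieces` and `last_words` are introduced here for illustration.

```r
# Hypothetical tagger output in word/TAG form (an assumption, not real tagPOS output)
tagged <- "The/DT company/NN said/VBD the/DT sale/NN is/VBZ final/JJ ./."

# Step 1: split at every verb tag, so each piece ends with the verb that was tagged
pieces <- strsplit(tagged, "[[:punct:]]*/VB.?")[[1]]

# Step 2: keep only the last word of each piece; a piece that does not end in a
# word (the text after the final verb) does not match and is returned unchanged
last_words <- sub("(^.*\\s)(\\w+$)", "\\2", pieces)
print(last_words)
```

The final piece is whatever follows the last verb tag. It ends in a tag rather than a bare word, so `sub()` leaves it untouched, which is why it shows up as the extra row.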

Edit

Another possibility is to drop the entries that contain a whitespace character:

sapply(strsplit(acqTag, "[[:punct:]]*/VB.?"), function(x) {
  res <- sub("(^.*\\s)(\\w+$)", "\\2", x)  # last word of each piece
  res[!grepl("\\s", res)]                  # drop leftovers containing whitespace
})
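A different route, not the approach used in this answer, is to match the verbs directly instead of splitting and then filtering. The sketch below assumes the tagged text is in word/TAG form separated by spaces; the `tagged` string is hand-written and hypothetical, and `hits` and `verbs` are illustrative names. It uses base R's `gregexpr()` and `regmatches()` with a Perl-style lookahead.

```r
# Hypothetical tagged string in word/TAG form (assumed; no tagger is run here)
tagged <- "The/DT company/NN said/VBD the/DT sale/NN is/VBZ final/JJ ./."

# Match a run of word characters only when a verb tag follows (/VB plus at most
# one more word character); perl = TRUE is needed for the (?= ) lookahead
hits <- gregexpr("\\w+(?=[[:punct:]]*/VB\\w?(\\s|$))", tagged, perl = TRUE)
verbs <- regmatches(tagged, hits)[[1]]
print(verbs)
```

Because only the word before a verb tag is ever matched, there is no trailing remainder to filter out afterwards.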

Thanks, gd047! :) It works... I was on the verge of using sapply for the extraction but couldn't figure out how to do it. Thanks. – 2010-06-04 13:51:58