2013-07-16 58 views
1

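Apply a function to values from two data frames simultaneously to produce a third

Apologies if this turns out to be a very specific problem that does not generalise to others.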

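Background

I am hoping to do some sentiment analysis, starting with basic binary matching of words against a lexicon and then moving towards more sophisticated forms of sentiment analysis that use grammar rules and so on.

Problem

To do the binary matching, which will form the first stage of the sentiment analysis, I have two tables: one containing the words and the other containing the part-of-speech (POS) tags of those words.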

    V1     V2        V3          V4   V5
1    R     is fantastic    language <NA>
2 Java     is       far        from good
3 Data mining        is fascinating <NA>

   V1  V2  V3 V4   V5
1  NN VBZ  JJ NN <NA>
2 NNP VBZ  RB IN   JJ
3 NNP  NN VBZ JJ <NA>

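I would like to carry out some basic sentiment analysis as follows: apply a function that takes two arguments, a word (from the first data frame) and its corresponding POS tag (from the second), and uses the tag to decide which word list to check when determining the word's positive or negative orientation. For example, the word fantastic would be extracted together with the POS tag 'JJ', so only the adjective list would be checked for that word.

In the end, I would like a data frame showing the match results: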

  V1 V2 V3 V4   V5
1  0  0  1  0 <NA>
2  0  0 -1  0    1
3  0  0  0  1 <NA>

I tried to write my own code but keep getting an error; once that is resolved, I think it should work.

# test sentences
sentences <- as.list(c("R is fantastic language", "Java is far from good", "Data mining is fascinating"))

# using the openNLP package
require(openNLP)

# perform tagging
taggedSentences <- tagPOS(sentences)

# split to words
individualWords <- unname(sapply(taggedSentences, function(x){strsplit(x, split=" ")}))

# strip tags
individualWordsClean <- unname(sapply(individualWords, function(x){gsub("/.+", "", x)}))

# strip words
individualTags <- unname(sapply(individualWords, function(x){gsub(".+/", "", x)}))

# create a dataframe for words; courtesy @trinker
numberRow <- length(individualWords)
numberCol <- unname(sapply(individualWords, length))
df1 <- as.data.frame(matrix(nrow=numberRow, ncol=max(numberCol)))
for (i in 1:numberRow){
    df1[i, 1:numberCol[i]] <- individualWordsClean[[i]]
}

# create a dataframe for tags; courtesy @trinker
numberRow <- length(individualWords)
numberCol <- unname(sapply(individualTags, length))
df2 <- as.data.frame(matrix(nrow=numberRow, ncol=max(numberCol)))
for (i in 1:numberRow){
    df2[i, 1:numberCol[i]] <- individualTags[[i]]
}

# create negative/positive word lists
posAdj <- c("fantastic", "fascinating", "good")
negAdj <- c("bad", "poor")
posNoun <- "R"
negNoun <- "Java"

# function to match words and assign sentiment score
checkLexicon <- function(word, tag){
    if (grep("JJ|JJR|JJS", tag)){
        ifelse(word %in% posAdj, +1, ifelse(word %in% negAdj, -1, 0))
    }
    else if (grep("NN|NNP|NNPS|NNS", tag)){
        ifelse(word %in% posNoun, +1, ifelse(word %in% negNoun, -1, 0))
    }
    else if (grep("VBZ", tag)){
        ifelse(word %in% "is", "ok", "none")
    }
    else if (grep("RB", tag)){
        ifelse(word %in% "not", -1, 0)
    }
    else if (grep("IN", tag)){
        ifelse(word %in% "far", -1, 0)
    }
}

# method to output a single value when used in conjunction with apply
justShow <- function(x){
    x
}

# main call that intends to extract each word/POS tag pair and determine its sentiment score
mapply(FUN=checkLexicon, word=apply(df1, 2, justShow), tag=apply(df2, 2, justShow))

Unfortunately, I have had no success with this approach, and the error I receive is

Error in if (grep("JJ|JJR|JJS", tag)) { : argument is of length zero 

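I am fairly new to R, but it seems I cannot use the apply function here because it does not return arguments suitable for mapply. Also, I am not sure whether mapply would produce another data frame.

Criticism and suggestions are welcome. Thanks.

P.S. Link to TRinker's notes on R, for anyone interested.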

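+1

What do you think `grep` returns? – joran

+0

It returns indices, unless value = TRUE is specified. (I *think* I know what you are hinting at... oh dear, that would be a silly mistake.) –

+1

Yes, perhaps you want `grepl`? – joran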
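
For readers following the comments above, here is a minimal sketch of the difference joran is hinting at. The variable names (`tag`, `idx`, `hit`, `msg`) are illustrative only and do not appear in the question.

```r
tag <- "VBZ"

# grep() returns the indices of matching elements; with no match it
# returns integer(0), a vector of length zero
idx <- grep("JJ", tag)
length(idx)

# if () needs exactly one TRUE/FALSE value, so a zero-length condition fails
msg <- tryCatch(
  if (grep("JJ", tag)) "adjective",
  error = function(e) conditionMessage(e)
)
msg

# grepl() returns one logical per input element, which if () can test
hit <- grepl("JJ", tag)
hit

# NA inputs (the padding cells in the data frames) give FALSE, not an error
grepl("JJ", NA)
```

Because `grepl` always returns a logical of the same length as its input, it is the safer choice inside an `if` condition applied to one tag at a time.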

Answers

1

The error came from using grep where grepl was needed. I corrected it after joran pointed this out. The working function is below.

> df1

    V1     V2        V3          V4   V5
1    R     is fantastic    language <NA>
2 Java     is       far        from good
3 Data mining        is fascinating <NA>

> df2

   V1  V2  V3 V4   V5
1  NN VBZ  JJ NN <NA>
2 NNP VBZ  RB IN   JJ
3 NNP  NN VBZ JJ <NA>

# function to match words and assign sentiment score
checkLexicon <- function(word, tag){
    if (grepl("JJ|JJR|JJS", tag)){
        ifelse(word %in% posAdj, +1, ifelse(word %in% negAdj, -1, 0))
    }
    else if (grepl("NN|NNP|NNPS|NNS", tag)){
        ifelse(word %in% posNoun, +1, ifelse(word %in% negNoun, -1, 0))
    }
    else if (grepl("VBZ", tag)){
        ifelse(word %in% "is", "ok", "none")
    }
    else if (grepl("RB", tag)){
        ifelse(word %in% "not", -1, 0)
    }
    else if (grepl("IN", tag)){
        ifelse(word %in% "far", -1, 0)
    }
}

# method to output a single value when used in conjunction with apply
justShow <- function(x){
    x
}

# main call that extracts each word/POS tag pair and determines its sentiment score
myObject <- mapply(FUN=checkLexicon, word=apply(df1, 2, justShow), tag=apply(df2, 2, justShow))

# shaping the final dataframe
scoredDF <- as.data.frame(matrix(myObject, nrow=3))

  V1 V2 V3 V4   V5
1  1 ok  1  0 NULL
2 -1 ok  0  0    1
3  0  0 ok  1 NULL
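
The result above mixes numbers, the string "ok", and NULL for the padding cells, so the columns are not plain numeric. If a purely numeric data frame with NA in the empty cells is preferred, one possible variant is sketched below. This is only a sketch under assumptions: `scoreWord` is a hypothetical name, verbs score 0 instead of "ok", and the data frames are rebuilt by hand so the example runs without openNLP. The scores follow the toy lexicons from the question, so they will not necessarily match the hand-written target table there.

```r
# Rebuild the two example data frames by hand (no tagger required)
df1 <- data.frame(
  V1 = c("R", "Java", "Data"),
  V2 = c("is", "is", "mining"),
  V3 = c("fantastic", "far", "is"),
  V4 = c("language", "from", "fascinating"),
  V5 = c(NA, "good", NA),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  V1 = c("NN", "NNP", "NNP"),
  V2 = c("VBZ", "VBZ", "NN"),
  V3 = c("JJ", "RB", "VBZ"),
  V4 = c("NN", "IN", "JJ"),
  V5 = c(NA, "JJ", NA),
  stringsAsFactors = FALSE
)

# Lexicons copied from the question
posAdj  <- c("fantastic", "fascinating", "good")
negAdj  <- c("bad", "poor")
posNoun <- "R"
negNoun <- "Java"

# Hypothetical variant of checkLexicon: always returns exactly one number
scoreWord <- function(word, tag) {
  if (is.na(word) || is.na(tag)) return(NA_real_)   # padding cell
  if (grepl("^JJ", tag)) {
    if (word %in% posAdj) 1 else if (word %in% negAdj) -1 else 0
  } else if (grepl("^NN", tag)) {
    if (word %in% posNoun) 1 else if (word %in% negNoun) -1 else 0
  } else if (grepl("^RB", tag)) {
    if (word %in% "not") -1 else 0
  } else if (grepl("^IN", tag)) {
    if (word %in% "far") -1 else 0
  } else {
    0                                               # verbs and anything else
  }
}

# as.matrix() yields two character matrices of the same shape, so mapply
# walks them cell by cell in the same (column-major) order
scores <- mapply(scoreWord, as.matrix(df1), as.matrix(df2), USE.NAMES = FALSE)

# Reshape the flat vector back to the original layout
scoredDF <- as.data.frame(
  matrix(scores, nrow = nrow(df1), dimnames = list(NULL, names(df1)))
)
scoredDF
```

Since every call returns a single numeric value, `mapply` simplifies the result to a numeric vector, and `matrix` restores the rows and columns because the cells were visited in column-major order.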