Apply a function to values from two data frames simultaneously to produce a third

Apologies if this turns out to be a very specific problem that may not generalize to others.

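Background

I want to do some sentiment analysis, starting with basic binary matching of words against a lexicon and then moving toward more sophisticated sentiment analysis that uses grammar rules and so on.

Problem

To do the binary matching, which will form the first stage of the sentiment analysis, I have two tables: one contains the words, and the other contains the part-of-speech (POS) tags of those words.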

    V1     V2        V3          V4   V5
1    R     is fantastic    language <NA>
2 Java     is       far        from good
3 Data mining        is fascinating <NA>


   V1  V2  V3 V4   V5
1  NN VBZ  JJ NN <NA>
2 NNP VBZ  RB IN   JJ
3 NNP  NN VBZ JJ <NA>

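I want to carry out some basic sentiment analysis as follows: apply a function that takes two arguments, a word (from the first data frame) and its corresponding POS tag (from the second), and uses the tag to decide which word list to check when determining the positive/negative orientation of the word. For example, the word fantastic would be extracted together with the POS tag 'JJ', so only the adjective lists would be checked for that word.

In the end, I would like a data frame showing the match results: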

  V1 V2 V3 V4   V5
1  0  0  1  0 <NA>
2  0  0 -1  0    1
3  0  0  0  1 <NA>

I tried to write my own code but keep getting an error; once that is resolved, I feel this approach should work.

#test sentences 
sentences<- as.list(c("R is fantastic language", "Java is far from good", "Data mining is fascinating")) 

#using the OpenNLP package 
require(openNLP) 

#perform tagging 
taggedSentences<- tagPOS(sentences) 

#split to words 
individualWords<- unname(sapply(taggedSentences, function(x){strsplit(x,split=" ")})) 

#Strip Tags 
individualWordsClean<- unname(sapply(individualWords, function(x){gsub("/.+","",x)})) 

#Strip words 
individualTags<- unname(sapply(individualWords, function(x){gsub(".+/","",x)})) 

#create a dataframe for words; courtesy @trinker
numberRow <- length(individualWords)
numberCol <- unname(sapply(individualWords, length))
df1 <- as.data.frame(matrix(nrow=numberRow, ncol=max(numberCol)))
for (i in 1:numberRow){
  df1[i,1:numberCol[i]] <- individualWordsClean[[i]]
}


#create a dataframe for tags; courtesy @trinker
numberRow <- length(individualWords)
numberCol <- unname(sapply(individualTags, length))
df2 <- as.data.frame(matrix(nrow=numberRow, ncol=max(numberCol)))
for (i in 1:numberRow){
  df2[i,1:numberCol[i]] <- individualTags[[i]]
}

#Create negative/positive words' lists 
posAdj<- c("fantastic","fascinating","good") 
negAdj<- c("bad","poor") 
posNoun<- "R" 
negNoun<- "Java" 

#Function to match words and assign sentiment score
checkLexicon <- function(word, tag){
  if (grep("JJ|JJR|JJS", tag)){
    ifelse(word %in% posAdj, +1, ifelse(word %in% negAdj, -1, 0))
  }
  else if (grep("NN|NNP|NNPS|NNS", tag)){
    ifelse(word %in% posNoun, +1, ifelse(word %in% negNoun, -1, 0))
  }
  else if (grep("VBZ", tag)){
    ifelse(word %in% "is", "ok", "none")
  }
  else if (grep("RB", tag)){
    ifelse(word %in% "not", -1, 0)
  }
  else if (grep("IN", tag)){
    ifelse(word %in% "far", -1, 0)
  }
}

#Method to output a single value when used in conjunction with apply
justShow <- function(x){
  x
}

#Main method that intends to extract word/POS tag pair, and determine sentiment score
mapply(FUN=checkLexicon, word=apply(df1,2,justShow), tag=apply(df2,2,justShow))

Unfortunately, I have had no success with this approach, and the error I receive is

Error in if (grep("JJ|JJR|JJS", tag)) { : argument is of length zero

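I am relatively new to R, but it seems I cannot use the apply function here because it does not return the arguments for the mapply function. Also, I am not sure whether mapply would produce another data frame.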

Criticism and suggestions are welcome. Thanks.

P.S. A link to TRinker's notes on R, for anyone interested.

Source

2013-07-16 info_seekeR

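What do you think 'grep' returns? – joran

It returns indices, unless value = TRUE is specified. (I *think* I know what you are hinting at... oh dear, that would be a silly mistake.) –

Yes, perhaps you want 'grepl'? – joran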
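The difference the comments point at can be seen in a small sketch. It only uses base R; the variable names (`tag`, `no_match_idx`, `err`, `is_adj`, `is_noun`) are made up for illustration and do not appear in the original post.

```r
tag <- "NN"

# grep() returns the indices of the matching elements;
# when nothing matches, that is integer(0), a vector of length zero
no_match_idx <- grep("JJ|JJR|JJS", tag)
length(no_match_idx)

# if() needs a condition of length one, so a zero-length result raises
# the "argument is of length zero" error; tryCatch captures the message
err <- tryCatch(
  if (grep("JJ|JJR|JJS", tag)) "adjective" else "other",
  error = function(e) conditionMessage(e)
)

# grepl() returns one TRUE/FALSE per input element, which if() can test
is_adj  <- grepl("JJ|JJR|JJS", tag)
is_noun <- grepl("NN|NNP|NNPS|NNS", tag)
```

Because the first branch is tested with a tag that is not an adjective, `grep` hands `if` an empty vector, which is why the error appears as soon as a non-adjective cell is reached.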

The error came from using grep where grepl was needed. It was corrected after joran pointed this out. The working function is below.

> df1

    V1     V2        V3          V4   V5
1    R     is fantastic    language <NA>
2 Java     is       far        from good
3 Data mining        is fascinating <NA>

> df2

   V1  V2  V3 V4   V5
1  NN VBZ  JJ NN <NA>
2 NNP VBZ  RB IN   JJ
3 NNP  NN VBZ JJ <NA>

#Function to match words and assign sentiment score
checkLexicon <- function(word, tag){
  if (grepl("JJ|JJR|JJS", tag)){
    ifelse(word %in% posAdj, +1, ifelse(word %in% negAdj, -1, 0))
  }
  else if (grepl("NN|NNP|NNPS|NNS", tag)){
    ifelse(word %in% posNoun, +1, ifelse(word %in% negNoun, -1, 0))
  }
  else if (grepl("VBZ", tag)){
    ifelse(word %in% "is", "ok", "none")
  }
  else if (grepl("RB", tag)){
    ifelse(word %in% "not", -1, 0)
  }
  else if (grepl("IN", tag)){
    ifelse(word %in% "far", -1, 0)
  }
}

#Method to output a single value when used in conjunction with apply
justShow <- function(x){
  x
}

#Main method that intends to extract word/POS tag pair, and determine sentiment score
myObject <- mapply(FUN=checkLexicon, word=apply(df1,2,justShow), tag=apply(df2,2,justShow))

#Shaping the final dataframe
scoredDF <- as.data.frame(matrix(myObject, nrow=3))

  V1 V2 V3 V4   V5
1  1 ok  1  0 NULL
2 -1 ok  0  0    1
3  0  0 ok  1 NULL
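The result above mixes numbers, the string "ok", and NULL for the empty cells, so it is stored as a list rather than as numeric columns. A possible variant, offered only as a sketch, returns exactly one number (or NA) per cell so the output can be reshaped into a plain numeric data frame. The names `words`, `tags`, `scoreWord`, `scores`, and `scoredNumeric` are hypothetical, the two small data frames are stand-ins for the first two rows of df1 and df2, and the verb, adverb, and preposition rules of checkLexicon are deliberately left out and scored as 0.

```r
# Small stand-ins for df1 (words) and df2 (tags) from the post
words <- data.frame(V1 = c("R", "Java"), V2 = c("is", "is"),
                    V3 = c("fantastic", "far"), V4 = c("language", "from"),
                    V5 = c(NA, "good"), stringsAsFactors = FALSE)
tags  <- data.frame(V1 = c("NN", "NNP"), V2 = c("VBZ", "VBZ"),
                    V3 = c("JJ", "RB"), V4 = c("NN", "IN"),
                    V5 = c(NA, "JJ"), stringsAsFactors = FALSE)

# Lexicons as defined in the question
posAdj  <- c("fantastic", "fascinating", "good")
negAdj  <- c("bad", "poor")
posNoun <- "R"
negNoun <- "Java"

# Hypothetical variant of checkLexicon: always returns one number or NA
scoreWord <- function(word, tag) {
  if (is.na(word) || is.na(tag)) return(NA_real_)   # padding cells stay NA
  if (grepl("^JJ", tag)) {
    if (word %in% posAdj) 1 else if (word %in% negAdj) -1 else 0
  } else if (grepl("^NN", tag)) {
    if (word %in% posNoun) 1 else if (word %in% negNoun) -1 else 0
  } else {
    0                                               # other tags: no rule here
  }
}

# as.matrix() is traversed column by column, so the two inputs line up
# cell by cell and mapply() sees each word with its own tag
scores <- mapply(scoreWord, as.matrix(words), as.matrix(tags),
                 USE.NAMES = FALSE)

# mapply() returns a flat vector; restore the original shape and column names
scoredNumeric <- as.data.frame(matrix(scores, nrow = nrow(words),
                                      dimnames = list(NULL, names(words))))
```

This also speaks to the doubt raised in the question: mapply does not return a data frame by itself, but because it walks both inputs in the same order, rebuilding the matrix with the original number of rows recovers the layout.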

Source

2013-07-20 09:30:12
