2017-10-16 47 views
1

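Although head() can be used to extract the top n rules, some RHS items may appear more than once among them. I want to find the top n unique RHS items, along with the top rule for each such item. How can I identify the top n recommended items and their rules using arules?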

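I have written code that does this, but it runs very slowly, presumably because it relies on the 'subset' function, which is very inefficient. My code loops over the unique RHS items, finds the subset of rules associated with each one, and returns the single top rule for that item. Is this a valid approach? Is there a better way?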

library(arules) 
data("Groceries") 
rules = apriori(Groceries, 
       parameter = list(supp = 0.01, conf = 0.1, target = "rules"), 
       appearance = list(lhs=c("whole milk", "root vegetables"), default="rhs")) 

rules = sort(rules, by=c("confidence", "lift", "support")) 
rhs.unique = unique(rules@rhs@itemInfo$labels[rules@rhs@data@i+1]) #Already sorted by top items. 

#Function that returns the top rule for a particular RHS item in a set of rules. 
top_item_rule = function(item, rules=NULL) { 
    rules = subset(rules, rhs %in% item) 
    rules = sort(rules, by=c("confidence", "lift", "support")) 
    head(rules, n=1) 
} 

n = 3 
toprules = lapply(rhs.unique[1:n], top_item_rule, rules) 
toprules = do.call(c, args=toprules) 

Answer

2

How about this?

rules <- sort(rules, by=c("confidence", "lift", "support")) 
rules[!duplicated(rhs(rules))] 

It returns the top rule (the first one after sorting) for each rhs.
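To get only the top n unique RHS items that the question asks for, the de-duplicated rule set can be truncated with head(). The following is a minimal sketch, assuming the same Groceries data and apriori() parameters as in the question; the names top_per_rhs and top_n are introduced here for illustration only.

```r
library(arules)
data("Groceries")

# Same mining setup as in the question.
rules <- apriori(Groceries,
                 parameter = list(supp = 0.01, conf = 0.1, target = "rules"),
                 appearance = list(lhs = c("whole milk", "root vegetables"),
                                   default = "rhs"))

# Sort once, so the best rule for each RHS item comes first.
rules <- sort(rules, by = c("confidence", "lift", "support"))

# Keep only the first (best) rule for each distinct RHS.
top_per_rhs <- rules[!duplicated(rhs(rules))]

# The result keeps the sort order, so head() gives the top n unique RHS items.
n <- 3
top_n <- head(top_per_rhs, n = n)
inspect(top_n)
```

Because the rules are sorted only once and duplicated() makes a single pass over the RHS item sets, this avoids calling subset() and sort() once per item as the loop in the question does.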

+0

But of course! Thank you. – MCornejo
