2012-01-13 21 views
4

Question: How do I get the frequencies of frequent itemsets from an apriori call in R?

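The apriori function of the arules package infers association rules from input transactions and reports the support, confidence, and lift of each rule. Association rules are derived from frequent itemsets. I'd like to get the most frequent itemsets in the input transactions. Specifically, I'd like to get all itemsets that have a given minimum support. The support of an itemset is the ratio of the number of transactions that contain the itemset to the total number of transactions.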
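To make the definition of support concrete, here is a minimal base-R sketch that computes it by hand for the two sample transactions used in this question. The names `baskets` and `itemset_support` are only illustrative and are not part of arules.

```r
# Sample transactions from the question: "a,b" and "a,b,c"
baskets <- list(c("a", "b"), c("a", "b", "c"))

# Support = (transactions containing every item of the itemset) / (all transactions)
itemset_support <- function(itemset, baskets) {
  contains <- vapply(baskets, function(b) all(itemset %in% b), logical(1))
  mean(contains)  # fraction of TRUE values
}

support_ab  <- itemset_support(c("a", "b"), baskets)
support_abc <- itemset_support(c("a", "b", "c"), baskets)
print(c(ab = support_ab, abc = support_abc))
```

Both transactions contain {a,b}, so its support is 1; only one of the two contains {a,b,c}, so its support is 0.5.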

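Requirements:

  1. I strongly prefer to obtain the most frequent itemsets from the intermediate results of the apriori function. That is, I'd rather not write a program from scratch to compute the most frequent itemsets, because the apriori function already computes them as an intermediate step. Still, if there is really no reasonable way to access the intermediate results of the apriori function, I can accept other solutions.
  2. I'd rather not do string processing on the results of the apriori function, because that approach would depend too heavily on the string representation of those results. Again, if it turns out there is no better option, I can go with this approach.
  3. I'm aware of the itemFrequency function provided by the arules package. Unfortunately, that function only reports itemsets that contain a single item. I'm interested in all itemsets of any length that have a given minimum support.
  4. I'd like the output sorted numerically by support, then lexicographically by itemset.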
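The ordering in requirement 4 can be sketched in base R with `order`, independent of how the itemsets are obtained. The data frame below is hand-written illustrative data, not output from arules. Sorting by `-support` puts the highest support first, as in the desired output further down. `method = "radix"` is used so the string comparison follows the C locale and does not vary between machines; a locale-aware sort may order labels that differ only in punctuation differently.

```r
# Hypothetical itemset table (hand-written, for illustration only)
df <- data.frame(
  itemset = c("{c}", "{a}", "{a,b,c}", "{b}"),
  support = c(0.5, 1, 0.5, 1),
  stringsAsFactors = FALSE
)

# Primary key: support, highest first. Secondary key: itemset label, ascending.
sorted <- df[order(-df$support, df$itemset, method = "radix"), ]
print(sorted, row.names = FALSE)
```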

Sample input:

a,b 
a,b,c 

Program:

# The following is how I'm using apriori to infer the association rules. 
library(package = "arules") 
transactions = read.transactions(file = file("stdin"), format = "basket", sep = ",") 
rules = apriori(transactions, parameter = list(minlen=1, sup = 0.001, conf = 0.001)) 
WRITE(rules, file = "", sep = ",", quote = TRUE, col.names = NA) 

Current output:

"","rules","support","confidence","lift" 
"1","{} => {c}",0.5,0.5,1 
"2","{} => {b}",1,1,1 
"3","{} => {a}",1,1,1 
"4","{c} => {b}",0.5,1,1 
"5","{b} => {c}",0.5,0.5,1 
"6","{c} => {a}",0.5,1,1 
"7","{a} => {c}",0.5,0.5,1 
"8","{b} => {a}",1,1,1 
"9","{a} => {b}",1,1,1 
"10","{b,c} => {a}",0.5,1,1 
"11","{a,c} => {b}",0.5,1,1 
"12","{a,b} => {c}",0.5,0.5,1 

Desired output:

"itemset","support" 
"{a}",1 
"{a,b}",1 
"{b}",1 
"{a,b,c}",0.5 
"{a,c}",0.5 
"{b,c}",0.5 
"{c}",0.5 

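(Replayed from R chat) I'm not sure, but it looks like you could use some regex to get the text part. The rest looks like the first element of a numeric vector. – Iterator 2012-01-13 23:31:27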


@Iterator I found a way to access the intermediate results of the 'apriori' function call: 'generatingItemsets'. See my partial solution. – reprogrammer 2012-01-14 16:36:38

Answer

7

I found the generatingItemsets function in the reference manual of the arules package.

library(package = "arules") 
transactions = read.transactions(file = file("stdin"), format = "basket", sep = ",") 
rules = apriori(transactions, parameter = list(minlen=1, sup = 0.001, conf = 0.001)) 
itemsets <- unique(generatingItemsets(rules)) 
itemsets.df <- as(itemsets, "data.frame") 
frequentItemsets <- itemsets.df[with(itemsets.df, order(-support,items)),] 
names(frequentItemsets)[1] <- "itemset" 
write.table(frequentItemsets, file = "", sep = ",", row.names = FALSE) 
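As a possible alternative to going through the rules, `apriori` also accepts `target = "frequent itemsets"` in its parameter list, in which case it returns the itemsets themselves. The following is a minimal sketch under that assumption, not a tested replacement for the code above. It builds the two sample transactions in memory instead of reading stdin, and the variable name `baskets` is only illustrative.

```r
library(arules)

# Sample transactions from the question, built in memory
baskets <- list(c("a", "b"), c("a", "b", "c"))
transactions <- as(baskets, "transactions")

# Mine itemsets directly instead of rules; no confidence threshold is needed
itemsets <- apriori(
  transactions,
  parameter = list(target = "frequent itemsets", support = 0.001, minlen = 1),
  control = list(verbose = FALSE)
)

# Convert to a data frame and keep only the label and support columns
itemsets.df <- as(itemsets, "data.frame")
itemsets.df$items <- as.character(itemsets.df$items)
frequentItemsets <- itemsets.df[order(-itemsets.df$support, itemsets.df$items),
                                c("items", "support")]
names(frequentItemsets)[1] <- "itemset"
write.table(frequentItemsets, file = "", sep = ",", row.names = FALSE)
```

The order among itemsets with equal support depends on the locale's string collation, so ties may not come out in exactly the order shown in the desired output.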