2016-06-23

R - apriori function error

I'm currently having a problem with the apriori function. I have a CSV like the following:

Desc,Cantidad,Valor,Fecha,Lugar,UUID 
DESCUENTO,1,-3405,2014-10-04T14:02:57,53100,7F74AFC0-FC28-4105-89A5-CD99416B50C7 
DESCUENTO,1,-3405,2014-10-04T14:02:57,53100,7F74AFC0-FC28-4105-89A5-CD99416B50C7 
DESCUENTO,1,-170,2014-09-05T15:10:24,83000,7F0C7F0B-BCFC-4FCA-8740-B36AE9932869 
Descuento de TYK Dia,1,-156,2014-06-19T16:52:27,86280,1E08E51E-213A-4EE0-8FE9-492E677FF0C9 
Descuento de TYK Dia,1,-139,2014-04-25T10:52:44,86280,AB802E63-2D0D-4B47-AB70-DDE007929F9F 
DESCUENTO,1,-63,2014-07-04T13:53:10,83000,5B1F12BB-71DE-4734-A774-8D377757A880 
REDONDEO,1,-1,2014-03-29T10:50:59,0,5B241EFA-6654-46EA-B47A-3CB76C5EA923 
DESCUENTO,1,-1,2014-10-04T14:02:57,53100,7F74AFC0-FC28-4105-89A5-CD99416B50C7 
DESCUENTO,1,-1,2014-10-04T14:02:57,53100,7F74AFC0-FC28-4105-89A5-CD99416B50C7 
LAVADO,1,0,2014-05-27T18:18:11,44500,e5d540d6-0f98-4993-ec09-56887cd4a27d 
TUA,1,0,2014-09-29T10:20:31,6500,1d8ada06-a8a1-4bd8-9356-851b5da28108 
Transportación Aerea,1,0,2014-10-03T10:41:09,6500,5fc3925a-d08a-4cdc-be7e-ca02bd488d5b 
OBSEQUIO LAVADO DE CARROCERIA,1,0,2014-04-07T13:45:55,91800,8148ab07-5804-4b2b-b37c-5323b394907a 
Arroz Al Azafran Combos A,1,0,2014-08-19T11:50:34,11520,f09c23e6-dc60-4aaf-a1b8-1506d38f3585 
Frijoles Charros A,1,0,2014-08-19T11:50:34,11520,f09c23e6-dc60-4aaf-a1b8-1506d38f3585 
Pepsi Ch A,1,0,2014-08-19T11:50:34,11520,f09c23e6-dc60-4aaf-a1b8-1506d38f3585 
FECHA DE CONSUMO 18/07/2014,1,0,2014-07-19T18:01:45,6060,0f0465aa-a75b-4f95-8e3b-43c13452cafb 
CAMBIO DE ACEITE DE MOTOR,1,0,2014-02-01T11:18:53,39890,5BDF0742-CDF5-4F6B-9937-DF1CB00274ED 
CAMBIO DE FILTRO DE ACEITE,1,0,2014-02-01T11:18:53,39890,5BDF0742-CDF5-4F6B-9937-DF1CB00274ED 

Full CSV (https://github.com/antonio1695/BaseX/blob/master/facturas1.csv). To download the file, just click "Find file" and you will see it. So what I did was:

> df1 <- read.csv("facturas1.csv") 
> rules <- apriori(df1,parameter=list(support=0.01,confidence=0.5)) 
Error in asMethod(object) : 
column(s) 3 not logical or a factor. Discretize the columns first. 

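However, the problem is that the column is already discrete. If I swap the columns so that the data from column 3 sits in column 2 and vice versa, the error still says column 3 is not logical or a factor, when it should be complaining about column 2. Thanks!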

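It seems `apriori` only takes factors or logicals as input – HubertL

Yes, when I swap the columns, the non-logical, non-factor input should be column 2. However, the error still says the problem is with column 3. And if the input in column 3 were not a logical or a factor, the next error would say the problem is in column 2... @HubertL –

I get `column(s) 2, 3, 5 not logical or a factor` – HubertL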

Answers

1

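After some research, I found that the apriori function needs the numeric values turned into intervals before it will work, so when you use discretize you have to add the "categories" argument to choose how many intervals you want. It will not work without intervals. I'll post the code here:

I decided on 20 intervals. With method="frequency", where the interval boundaries fall depends on how often the values repeat.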

df$Valor <- discretize(df$Valor, method="frequency",categories = 20) 
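As a rough illustration of what equal-frequency discretization does, here is a base-R sketch that places the cut points at quantiles so each interval holds about the same number of observations. The values in `valor` and the name `n_bins` are made up for the example and are not taken from facturas1.csv, and `discretize()` in arules may treat ties and labels differently. Also note that, as far as I know, newer arules releases deprecate `categories` in favor of `breaks`.

```r
# Toy values, made up for illustration (not taken from facturas1.csv)
valor <- c(-3405, -170, -156, -63, -1, 0, 5, 12, 35, 80, 150, 999)
n_bins <- 4  # hypothetical choice; the answer above used 20

# Cut points at evenly spaced quantiles, so bins hold similar counts
cuts <- unique(quantile(valor, probs = seq(0, 1, length.out = n_bins + 1)))

# Left-closed intervals, with the last one closed on both sides
valor_bins <- cut(valor, breaks = cuts, right = FALSE, include.lowest = TRUE)

# Number of observations that land in each interval
table(valor_bins)
```

With heavily repeated values (many zeros, for instance) several quantiles can coincide, which is why the sketch drops duplicate cut points with `unique()`.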

Hope it helps someone.

3
library(arules) 
df1 <- read.csv("https://raw.githubusercontent.com/antonio1695/BaseX/master/facturas1.csv") 
trans <- as(df1, "transactions") 
    Error in asMethod(object) : 
    column(s) 3 not logical or a factor. Discretize the columns first. 

Let's take a look at the data frame:

str(df1) 
'data.frame': 10510 obs. of 6 variables: 
$ Desc : Factor w/ 3927 levels "0","00000215R0 - LIQUIDO DE FRENOS",..: 1490 1490 1490 1491 1491 1490 3209 1490 1490 2238 ... 
$ Cantidad: Factor w/ 85 levels "","1","-1","10",..: 2 2 2 2 2 2 2 2 2 2 ... 
$ Valor : int -3405 -3405 -170 -156 -139 -63 -1 -1 -1 0 ... 
$ Fecha : Factor w/ 4054 levels "1294","2014-01-06T11:10:21",..: 4041 4041 3443 1794 596 2125 241 4041 4041 1215 ... 
$ Lugar : Factor w/ 982 levels "","0","1000",..: 487 487 802 848 848 802 2 487 487 373 ... 
$ UUID : Factor w/ 4056 levels "0019A60D-78F8-E341-8D3E-9786201FE017",..: 1988 1988 1979 456 2711 1423 1424 1988 1988 3658 ... 
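The same check can be done programmatically instead of reading the `str()` output by eye. This is a minimal sketch on a made-up data frame (`toy` is a hypothetical stand-in for `df1`); it flags every column that is neither a factor nor a logical, which is what the error message complains about. On R 4.0 and later, `read.csv()` no longer converts strings to factors by default, so text columns may be flagged as well.

```r
# Toy data frame standing in for df1 (made-up rows, not the real file)
toy <- data.frame(
  Desc  = c("DESCUENTO", "LAVADO", "TUA"),
  Valor = c(-3405L, 0L, 0L),
  stringsAsFactors = TRUE
)

# TRUE for columns that are already a factor or a logical
ok <- vapply(toy, function(col) is.factor(col) || is.logical(col), logical(1))

# Positions and names of the columns that still need converting
which(!ok)
```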

Valor is numeric (int) and needs to be discretized! For example with discretize():

df1$Valor <- discretize(df1$Valor) 
head(df1$Valor) 
[1] [-3405, 2400) [-3405, 2400) [-3405, 2400) [-3405, 2400) [-3405, 2400) 
[6] [-3405, 2400) 
Levels: [-3405, 2400) [ 2400, 8204) [ 8204,14009] 
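The three levels above look like equal-width intervals over the full range of Valor: (14009 - (-3405)) / 3 is roughly 5805, which matches the boundaries 2400 and 8204. The base-R sketch below, on made-up skewed values rather than the real column, shows what that implies: a few extreme values stretch the range, so nearly every observation ends up in the first interval.

```r
# Made-up, heavily skewed values (not the real Valor column)
valor <- c(-3405, -170, -1, 0, 0, 15, 40, 120, 300, 14009)

# Three equal-width intervals spanning the full range
cuts <- seq(min(valor), max(valor), length.out = 4)
valor_bins <- cut(valor, breaks = cuts, right = FALSE,
                  include.lowest = TRUE, dig.lab = 5)

# Counts per interval: the outliers stretch the range, so one bin dominates
table(valor_bins)
```

If the resulting items are too coarse to be useful, choosing more intervals or an equal-frequency method is one way to spread the values out.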

Now you can create the transactions and apply apriori:

trans <- as(df1, "transactions") 
rules <- apriori(trans,parameter=list(support=0.01,confidence=0.5)) 
rules 
set of 84 rules 
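Sorry, I just realized this doesn't solve it. It runs when I do it this way, but it changes my data and turns the values into intervals. When I look at the table, the value of every item is set to [-3405,2400), which makes my association rules useless. I don't know if you saw this. –

It gives me rules like this: `{Valor=[-3405,2400)} => {Cantidad=[-1,1627)} 0.9923882 0.9996166 0.9999972 2`, which is clearly not very helpful. –

See ?discretize –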