2010-12-14 23 views

Problem with rpy2 and rpart: passing data correctly from Python to R

I am trying to run rpart through rpy2, using Python 2.6.5 and R 10.0.

I create a data frame in Python and pass it along, but I get an error stating:

Error in function (x) : binary operation on non-conformable arrays 
Traceback (most recent call last): 
    File "partitioningSANDBOX.py", line 86, in <module> 
    model=r.rpart(**rpart_params) 
    File "build/bdist.macosx-10.3-fat/egg/rpy2/robjects/functions.py", line 83, in __call__ 
    File "build/bdist.macosx-10.3-fat/egg/rpy2/robjects/functions.py", line 35, in __call__ 
rpy2.rinterface.RRuntimeError: Error in function (x) : binary operation on non-conformable arrays 

Can anyone help me work out what I am doing wrong to trigger this error?

The relevant part of my code is:

import numpy as np 
import rpy2 
import rpy2.robjects as rob 
import rpy2.robjects.numpy2ri 


#Fire up the interface to R 
r = rob.r 
r.library("rpart") 

datadict = dict(zip(['responsev','predictorv'],[cLogEC,csplitData])) 
Rdata = r['data.frame'](**datadict) 
Rformula = r['as.formula']('responsev ~.') 
#Generate an RPART model in R. 
Rpcontrol = r['rpart.control'](minsplit=10, xval=10) 
rpart_params = {'formula': Rformula,
                'data': Rdata,
                'control': Rpcontrol}
model=r.rpart(**rpart_params) 

The two variables cLogEC and csplitData are NumPy arrays of float type.

Also, my data frame looks like this:

In [2]: print Rdata 
------> print(Rdata) 
    responsev predictorv 
1 0.6020600  312 
2 0.3010300  300 
3 0.4771213  303 
4 0.4771213  249 
5 0.9242793  239 
6 1.1986571  297 
7 0.7075702  287 
8 1.8115750  270 
9 0.6020600  296 
10 1.3856063  248 
11 0.6127839  295 
12 0.3010300  283 
13 1.1931246  345 
14 0.3010300  270 
15 0.3010300  251 
16 0.3010300  246 
17 0.3010300  273 
18 0.7075702  252 
19 0.4771213  252 
20 0.9294189  223 
21 0.6127839  252 
22 0.7075702  267 
23 0.9294189  252 
24 0.3010300  378 
25 0.3010300  282 

and the formula looks like this:

In [3]: print Rformula 
------> print(Rformula) 
responsev ~ . 

Data frames in R are lists. Maybe you should pass the arrays as arrays or as a matrix? – 2010-12-14 01:58:56


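I tried passing a matrix, but that threw an error too. Interestingly, if I substitute r.plsr for r.rpart it works fine, even though both rpart and plsr say they need the data as a data.frame.... – mishaF 2010-12-14 02:48:55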

Answer


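The problem comes from some idiosyncratic R code in the rpart package (to be precise, the following block, and in particular its last line: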

m <- match.call(expand.dots = FALSE) 
m$model <- m$method <- m$control <- NULL 
m$x <- m$y <- m$parms <- m$... <- NULL 
m$cost <- NULL 
m$na.action <- na.action 
m[[1L]] <- as.name("model.frame") 
m <- eval(m, parent.frame()) 

)。

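One way around this is to avoid entering that code block (see below); another might be to build a nested evaluation from Python (so that parent.frame() behaves). This is not as simple as one would hope, but I may find time to make it easier in the future.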

from rpy2.robjects import DataFrame, Formula
import rpy2.robjects.numpy2ri as npr  # NumPy-to-R conversion support
import numpy as np
from rpy2.robjects.packages import importr
rpart = importr('rpart')
stats = importr('stats')

# Stand-in data with the same column names as in the question
cLogEC = np.random.uniform(size=10)
csplitData = np.array(range(10), 'i')

dataf = DataFrame({'responsev': cLogEC,
                   'predictorv': csplitData})
formula = Formula('responsev ~.')
# Build the model frame up front and hand it to rpart through `model`,
# so that rpart can skip its own match.call()/eval() block shown above.
rpart.rpart(formula=formula, data=dataf,
            control=rpart.rpart_control(minsplit=10, xval=10),
            model=stats.model_frame(formula, data=dataf))
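
The other option mentioned above, a nested evaluation, could be approximated by binding the data frame to a name in R's global environment and letting R parse and evaluate the whole call, so that match.call() records a plain symbol that parent.frame() can resolve. The following is only a rough sketch of that idea, not something taken from rpy2 itself: the helper names rpart_call_source and fit_rpart_by_name and the variable name dataf are made up for illustration, and the second helper assumes R, rpart and rpy2 are installed.

```python
def rpart_call_source(formula, data_name, minsplit=10, xval=10):
    """Return R source for an rpart() call that refers to the data by name.

    Hypothetical helper: it only builds a string, so it needs neither R
    nor rpy2.
    """
    return ("rpart::rpart(%s, data = %s, "
            "control = rpart::rpart.control(minsplit = %d, xval = %d))"
            % (formula, data_name, minsplit, xval))


def fit_rpart_by_name(response, predictor, data_name="dataf"):
    """Fit rpart by evaluating the call as R source (hypothetical helper).

    The data frame is bound to `data_name` in R's global environment, so
    the symbol recorded by match.call() inside rpart can be looked up when
    rpart evaluates the model frame in its parent frame.
    """
    # Imported lazily so the string helper above works without R installed.
    import rpy2.robjects as ro

    frame = ro.DataFrame({
        "responsev": ro.FloatVector([float(v) for v in response]),
        "predictorv": ro.FloatVector([float(v) for v in predictor]),
    })
    ro.globalenv[data_name] = frame
    return ro.r(rpart_call_source("responsev ~ .", data_name))
```

With the arrays from the question this would be called as fit_rpart_by_name(cLogEC, csplitData). The cost of this approach is that it writes a variable into R's global environment, which the model-frame workaround above avoids.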

Your answer is perfect and the solution works flawlessly. Thank you so much! I was pulling my hair out. – mishaF 2010-12-14 17:18:08