Rpy2 & pandas: join output from predict to a pandas DataFrame

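I am using the randomForest library in R via RPy2. I would like to pass back the values calculated with the caret predict method and join them to the original pandas DataFrame. See the example below.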

import pandas as pd 
import numpy as np 
import rpy2.robjects as robjects 
from rpy2.robjects import pandas2ri 
pandas2ri.activate() 
r = robjects.r 
r.library("randomForest") 
r.library("caret") 

df = pd.DataFrame(data=np.random.rand(100, 10), columns=["a{}".format(i) for i in range(10)]) 
df["b"] = ['a' if x < 0.5 else 'b' for x in np.random.sample(size=100)] 
train = df.loc[df.a0 < .75]
withheld = df.loc[df.a0 >= .75]

rf = r.randomForest(robjects.Formula('b ~ .'), data=train) 
pr = r.predict(rf, withheld) 
print(pr.rx())

It returns:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 
a a b b b a a a a b a a a a a b a a a a 
Levels: a b

But how can I join this to the withheld DataFrame, or compare it with the original values?

I have tried this:

import pandas.rpy.common as com 
com.convert_robj(pr)

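But this returns a dictionary whose keys are strings. I guess there is a workaround: call withheld.reset_index(), convert the dictionary keys to integers, and then join the two. But there must be a simpler way!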

2015-02-05 kungphil

There is a pull request that adds R factor to pandas Categorical conversion to pandas. It has not yet been merged into the pandas master branch. Once it is,

import pandas.rpy.common as rcom 
rcom.convert_robj(pr)

will convert pr to a pandas Categorical. Until then, the following function can be used as a workaround:

def convert_factor(obj): 
    """ 
    Taken from jseabold's PR: https://github.com/pydata/pandas/pull/9187 
    """ 
    ordered = r["is.ordered"](obj)[0] 
    categories = list(obj.levels) 
    codes = np.asarray(obj) - 1 # zero-based indexing 
    values = pd.Categorical.from_codes(codes, categories=categories, 
             ordered=ordered) 
    return values
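The key step in convert_factor is the shift from R's 1-based factor codes to the 0-based codes that pandas expects. The sketch below shows that step in isolation, without R; r_codes and r_levels are made-up stand-ins for what a factor carries, not values taken from the example.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the contents of an R factor:
# 1-based integer codes plus the level labels.
r_codes = np.array([1, 1, 2, 2, 1])
r_levels = ["a", "b"]

# Shift to zero-based codes, as convert_factor does, then rebuild the labels.
cat = pd.Categorical.from_codes(r_codes - 1, categories=r_levels)
labels = list(cat)
print(labels)
```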

For example:

import pandas as pd 
import numpy as np 
import rpy2.robjects as robjects 
from rpy2.robjects import pandas2ri 
pandas2ri.activate() 
r = robjects.r 
r.library("randomForest") 
r.library("caret") 

def convert_factor(obj): 
    """ 
    Taken from jseabold's PR: https://github.com/pydata/pandas/pull/9187 
    """ 
    ordered = r["is.ordered"](obj)[0] 
    categories = list(obj.levels) 
    codes = np.asarray(obj) - 1 # zero-based indexing 
    values = pd.Categorical.from_codes(codes, categories=categories, 
             ordered=ordered) 
    return values 


df = pd.DataFrame(data=np.random.rand(100, 10), 
        columns=["a{}".format(i) for i in range(10)]) 
df["b"] = ['a' if x < 0.5 else 'b' for x in np.random.sample(size=100)] 
train = df.loc[df.a0 < .75]
# copy so that adding a column later does not write into a view of df
withheld = df.loc[df.a0 >= .75].copy()

rf = r.randomForest(robjects.Formula('b ~ .'), data=train) 
pr = convert_factor(r.predict(rf, withheld)) 

withheld['pr'] = pr 
print(withheld)
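Once the predictions sit in a column next to the true labels, comparing them needs only pandas. A minimal sketch, assuming a frame shaped like withheld with a true-label column 'b' and a prediction column 'pr'; withheld_demo and its values are invented for illustration.

```python
import pandas as pd

# Hypothetical miniature of `withheld` after the 'pr' column has been added.
withheld_demo = pd.DataFrame(
    {"b": ["a", "b", "b", "a"],
     "pr": pd.Categorical(["a", "b", "a", "a"], categories=["a", "b"])},
    index=[3, 7, 12, 20],
)

# Compare as plain strings so the object and categorical columns mix safely.
predicted = withheld_demo["pr"].astype(str)
correct = withheld_demo["b"] == predicted
accuracy = correct.mean()

# Confusion table: rows are true labels, columns are predictions.
confusion = pd.crosstab(withheld_demo["b"], predicted)
print(accuracy)
print(confusion)
```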

2015-02-05 21:37:10 unutbu

The R object pr returned by the function predict is a "vector", which can be thought of as a Python array.array or a one-dimensional numpy array.

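A "join" is unnecessary, because the order of the elements in pr corresponds to the order of the rows in the table withheld. One only needs to add pr as an additional column to withheld (see Adding new column to existing DataFrame in Python pandas):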

withheld['predictions'] = pd.Series(pr, 
            index=withheld.index)
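Passing index=withheld.index matters because pandas aligns a Series to a DataFrame by label, not by position, and withheld keeps the row labels of the original frame. A small sketch of the difference; df_demo, held, and preds are hypothetical names and values.

```python
import pandas as pd

df_demo = pd.DataFrame({"a0": [0.9, 0.1, 0.8, 0.3]})
held = df_demo.loc[df_demo.a0 >= 0.75].copy()  # keeps original labels 0 and 2
preds = ["a", "b"]                              # hypothetical predictions, in row order

# A Series with a default 0..n-1 index is aligned by label, so label 2 gets NaN.
held["misaligned"] = pd.Series(preds)
# Reusing the frame's own index keeps each prediction on the right row.
held["aligned"] = pd.Series(preds, index=held.index)
print(held)
```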

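By default this will add a column of integers (because R factors are encoded as integers). The rpy2 conversion can be customized rather simply (see http://rpy.sourceforge.net/rpy2/doc-2.5/html/robjects_convert.html):

Note: version 2.6.0 of rpy2 will include handling of pandas Categorical vectors, making the converter customization described below unnecessary.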

from rpy2.robjects import numpy2ri

@robjects.conversion.ri2py.register(robjects.rinterface.SexpVector)
def ri2py_vector(vector):
    # based on
    # https://bitbucket.org/rpy2/rpy2/src/a75413b09852991869332da615fa754923c32039/rpy/robjects/pandas2ri.py?at=default#cl-73

    # special case for factors
    if 'factor' in vector.rclass:
        res = pd.Categorical.from_codes(np.asarray(vector) - 1,
                                        categories=vector.do_slot('levels'),
                                        ordered='ordered' in vector.rclass)
    else:
        # use the numpy converter first
        res = numpy2ri.ri2py(vector)
        # record arrays (R data frames) become pandas DataFrames
        if isinstance(res, np.recarray):
            res = pd.DataFrame.from_records(res)
    return res

With this, converting any rpy2 object to a non-rpy2 object will return a pandas Categorical whenever the object is an R factor:

robjects.conversion.ri2py(pr)

You can then decide to add the result of this last conversion to your data table.
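The .register(...) decorator used for the converter follows the same pattern as Python's functools.singledispatch: a default function plus type-specific overrides. The stand-alone sketch below illustrates that pattern only; FakeFactor and to_python are hypothetical names and are not part of rpy2.

```python
from functools import singledispatch

import numpy as np
import pandas as pd


class FakeFactor:
    """Hypothetical stand-in for an R factor; not an rpy2 class."""

    def __init__(self, codes, levels, ordered=False):
        self.codes = codes      # 1-based integer codes, as in R
        self.levels = levels
        self.ordered = ordered


@singledispatch
def to_python(obj):
    # Default converter: hand the object back unchanged.
    return obj


@to_python.register(FakeFactor)
def _(obj):
    # Special case: factors become pandas Categoricals.
    return pd.Categorical.from_codes(
        np.asarray(obj.codes) - 1, categories=obj.levels, ordered=obj.ordered
    )


converted = to_python(FakeFactor([2, 1, 2], ["a", "b"]))
untouched = to_python([2, 1, 2])
print(list(converted), untouched)
```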

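Note that the conversion to non-rpy2 objects has to be explicit (one calls the converter). If you are using ipython, there is a way to make this implicit: https://gist.github.com/lgautier/e2e8709776e0e0e93b8d (and the originating thread: https://bitbucket.org/rpy2/rpy2/issue/230/rmagic-specific-conversion).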

2015-02-06 01:20:35 lgautier
