2017-09-11 45 views

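How do I format a Python script that uses rpy2 to build a model with R caret functions?

I have the following R script, which is designed to build a model from a data frame using the caret package: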

library(caret) 
library(broom) 

# read the CSV file into a data frame
data <- read.csv("mydata.csv") 

splitprob <- 0.8 

traintestindex <- createDataPartition(data$fluorescence, p=splitprob, list=F) 
testset <- data[-traintestindex,] 
trainingset <- data[traintestindex,] 

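# resampling settings (same values as in the Python script below)
cvCtrl <- trainControl(method = "repeatedcv", number = 20, repeats = 100) 

model <- train(fluorescence~., data = trainingset, method = "glmStepAIC", preProc = c("center","scale"), trControl = cvCtrl) 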

final_model<- tidy(model$finalModel) 

write.csv(final_model, "model_glm.csv") 

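I would like to express this code as a function inside a Python script. After a pandas data frame is generated, it is converted into an R data frame, and caret's train function is then run with the same parameters as in the R script above.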

import pandas as pd 
from rpy2.robjects import r 
import sys 
import rpy2.robjects.packages as rpackages 
from rpy2.robjects.vectors import StrVector 
from rpy2.robjects import r, pandas2ri 

pandas2ri.activate() 
caret = rpackages.importr('caret') 
broom= rpackages.importr('broom') 

my_data= pd.read_csv("my_data.csv") 
r_dataframe= pandas2ri.py2ri(my_data) 

preprocessing= ["center", "scale"] 

center_scale= StrVector(preprocessing) 

cvCtrl = caret.trainControl(method = "repeatedcv", number= 20, repeats = 100) 

model_R= caret.train("fluorescence~.", data= r_dataframe, method = "glmStepAIC", preProc = center_scale, trControl = cvCtrl) 

print(model_R.finalModel) 

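However, this script is clearly not set up correctly: running it with rpy2 fails with SyntaxError: invalid syntax at the line model_R= caret.train("fluorescence~., r_dataframe, method = "glmStepAIC", preProc = center_scale, trControl = cvCtrl"). I tried to follow the syntax given in the documentation (source: https://rpy2.github.io/doc/latest/html/introduction.html?highlight=linear%20model), but examples of how to set up this kind of code are sparse.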
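The SyntaxError itself can be reproduced without R or rpy2, because it comes from Python's parser rather than from caret. The sketch below feeds the quoted line and a re-quoted variant to `ast.parse`. The helper name `parses` and the corrected line are illustrative assumptions, not part of the original post. In the failing line the closing quote after `fluorescence~.` is missing, so the first string literal runs up to `method = ` and the bare name `glmStepAIC` follows it directly.

```python
import ast

# The line quoted in the error report: the first string literal is never
# closed after "fluorescence~.", so it swallows the following arguments.
broken = ('model_R = caret.train("fluorescence~., r_dataframe, method = '
          '"glmStepAIC", preProc = center_scale, trControl = cvCtrl")')

# Same call with every string literal closed where it was meant to end.
fixed = ('model_R = caret.train("fluorescence~.", data=r_dataframe, '
         'method="glmStepAIC", preProc=center_scale, trControl=cvCtrl)')


def parses(source):
    """Return True if `source` is syntactically valid Python (hypothetical helper)."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


print(parses(broken))
print(parses(fixed))
```

Parsing only checks the syntax, so the re-quoted line passing this check does not mean caret accepts the arguments.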

What do I have to fix in my Python code so that it builds a model from my data frame?

Answer

I worked out the format for calling caret functions via rpy2:

import pandas as pd 
import rpy2.robjects.packages as rpackages 
from rpy2.robjects.vectors import StrVector 
from rpy2.robjects import pandas2ri 

pandas2ri.activate() 
caret = rpackages.importr('caret') 
broom= rpackages.importr('broom') 

my_data= pd.read_csv("my_data.csv") 
r_dataframe= pandas2ri.py2ri(my_data) 

preprocessing= ["center", "scale"] 
center_scale= StrVector(preprocessing) 

#these are the columns in my data frame that will consist of my predictors in the model 
predictors= ['predictor1','predictor2','predictor3'] 
predictors_vector= StrVector(predictors) 

#this column from the dataframe consists of the outcome of the model 
outcome= ['fluorescence'] 
outcome_vector= StrVector(outcome) 

#this line extracts the columns of the predictors from the dataframe 
columns_predictors= r_dataframe.rx(True, predictors_vector) 

#this line extracts the column of the outcome from the dataframe 
columns_response= r_dataframe.rx(True, outcome_vector) 

cvCtrl = caret.trainControl(method = "repeatedcv", number= 20, repeats = 100) 

model_R= caret.train(columns_predictors, columns_response, method = "glmStepAIC", preProc = center_scale, trControl = cvCtrl) 

print(model_R.rx('finalModel'))
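The question also asked for this logic wrapped in a function while keeping the formula interface. A minimal sketch of that follows, under several assumptions: the function names `build_formula` and `train_glm_step_aic` are hypothetical, the rpy2 calls follow the 2.x API used above (`pandas2ri.py2ri` was renamed in rpy2 3.x), and the formula is wrapped in rpy2's `Formula` class so that R receives a formula object rather than a character string. Only the pure-Python helper can be checked without R installed; the wrapper has not been run against caret here.

```python
def build_formula(outcome, predictors=None):
    """Return an R formula string; '.' stands for all remaining columns."""
    rhs = " + ".join(predictors) if predictors else "."
    return "{} ~ {}".format(outcome, rhs)


def train_glm_step_aic(pandas_df, outcome="fluorescence", predictors=None):
    """Hypothetical wrapper: fit caret's glmStepAIC model on a pandas data frame."""
    # Imported lazily so build_formula stays usable without R or rpy2.
    from rpy2.robjects import Formula, pandas2ri
    from rpy2.robjects.packages import importr
    from rpy2.robjects.vectors import StrVector

    pandas2ri.activate()
    caret = importr("caret")

    # rpy2 2.x conversion call, as in the answer above (assumption for 3.x).
    r_dataframe = pandas2ri.py2ri(pandas_df)

    # Same resampling and preprocessing settings as in the post.
    cv_ctrl = caret.trainControl(method="repeatedcv", number=20, repeats=100)
    model = caret.train(
        Formula(build_formula(outcome, predictors)),
        data=r_dataframe,
        method="glmStepAIC",
        preProc=StrVector(["center", "scale"]),
        trControl=cv_ctrl,
    )
    # rx2 extracts a single list element, like model$finalModel in R.
    return model.rx2("finalModel")
```

With the column names from the answer, `build_formula("fluorescence", ["predictor1", "predictor2", "predictor3"])` gives `fluorescence ~ predictor1 + predictor2 + predictor3`, while omitting the predictor list gives `fluorescence ~ .` as in the original R script.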