2016-07-07

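Pushing R code into PL/R code in a PostgreSQL database

I want to convert working R code that uses RPostgreSQL into PL/R code, to avoid pushing data into and out of the PostgreSQL database.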

The code is a data.table dcast in R:

#libs 
library(RPostgreSQL); 
library(data.table); 

# connect 
drv <- dbDriver("PostgreSQL"); 
con <- dbConnect(drv, dbname="postgres", user="postgres"); 

# load 
cli_ranges <- dbGetQuery(con, "SELECT custid, prod_ranges, is_cli from cli_ranges;") 

# DT 
setDT(cli_ranges) 
setkeyv(cli_ranges , c("prod_ranges")) 

# pivot 
cli_ranges.pivoted <- dcast(cli_ranges, custid ~ paste0("is_cli_", prod_ranges), fun=sum, value.var = "is_cli") 

# send back to DB 
dbWriteTable(con, "cli_ranges_pivoted", cli_ranges.pivoted, row.names=F) 

The code works fine and fast.

Now I am trying to put the code into a PL/R function:

CREATE OR REPLACE FUNCTION public.pivot() 
    RETURNS void AS 
$BODY$ 

[copy/paste R code] 

$BODY$ 
    LANGUAGE plr; 

...but the last line of the R code (dbWriteTable) throws:

ERROR: R interpreter expression evaluation error 
DETAIL: Error in (function (classes, fdef, mtable) : 
    unable to find an inherited method for function 'dbWriteTable' for signature '"logical", "character", "data.frame"' 
CONTEXT: In PL/R function pivot 

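Converting the data.table to a data frame (as.data.frame(cli_ranges.pivoted)) does not work either.

One trick might be to return the data.table/data frame and run CREATE TABLE cli_ranges_pivoted AS SELECT pivot();, but I don't really know how to push a data.frame out as the function's output...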

The cli_ranges table:

custid  prod_ranges  is_cli
1       A            1
1       B            1
1       C            0
2       A            1
2       B            0
2       C            1
3       A            0
3       B            1
3       C            0
4       A            1
...     ...          ...

After the dcast (i.e. the pivot), the data frame looks like this:

custid  is_cli_A  is_cli_B  is_cli_C
1       1         1         0
2       1         0         1
3       0         1         0
4       1         ...
...
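To make the reshaping concrete, here is a base-R sketch that pivots the sample rows for customers 1 to 3 into the layout above. It deliberately swaps data.table's dcast for base R's tapply() (so it runs without extra packages); the helper name pivot_cli is hypothetical, and the column prefix follows the paste0("is_cli_", prod_ranges) in the question's dcast formula.

```r
# Rows for customers 1-3 from the cli_ranges sample above
# (customer 4 onward is elided in the question, so it is left out).
cli_ranges <- data.frame(
  custid      = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  prod_ranges = c("A", "B", "C", "A", "B", "C", "A", "B", "C"),
  is_cli      = c(1, 1, 0, 1, 0, 1, 0, 1, 0),
  stringsAsFactors = FALSE
)

# Hypothetical base-R stand-in for
#   dcast(custid ~ paste0("is_cli_", prod_ranges), fun = sum, value.var = "is_cli"):
# tapply() sums is_cli per (custid, prod_ranges) cell and returns a matrix.
pivot_cli <- function(df) {
  m <- tapply(df$is_cli, list(df$custid, df$prod_ranges), sum)
  m[is.na(m)] <- 0  # absent combinations count as 0, like sum() over nothing
  out <- data.frame(custid = as.numeric(rownames(m)), m,
                    row.names = NULL, check.names = FALSE)
  names(out)[-1] <- paste0("is_cli_", colnames(m))
  out
}

pivoted <- pivot_cli(cli_ranges)
print(pivoted)
```

Each distinct prod_ranges value becomes one column, so the width of the result depends on the data, not on the code.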

The number of distinct prod_ranges values usually changes, so I cannot predefine the number of columns after the pivot.
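Because the column set is only known once the data has been read, one option is to derive the table definition from the data itself. The sketch below assumes two hypothetical helpers, pivot_columns and build_create, and declares every column numeric as the answer's PL/R function does. Note that the identifiers are left unquoted, so PostgreSQL will fold them to lower case (is_cli_A becomes is_cli_a).

```r
# Hypothetical helper: the pivoted column names implied by the long-format data.
pivot_columns <- function(prod_ranges) {
  c("custid", paste0("is_cli_", sort(unique(prod_ranges))))
}

# Hypothetical helper: a CREATE TABLE statement for an arbitrary column list.
build_create <- function(table, columns, type = "numeric") {
  # e.g. "custid numeric, is_cli_A numeric, ..."
  col_defs <- paste(columns, type, collapse = ", ")
  paste0("CREATE TABLE ", table, " (", col_defs, ");")
}

# A new range "D" appearing in the data simply adds one more column.
cols <- pivot_columns(c("A", "B", "C", "A", "D"))
sql  <- build_create("cli_ranges_pivoted", cols)
cat(sql, "\n")
```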

Environment: PostgreSQL 9.5, R 3.3, PL/R 08.03.00.16, Windows 10 64-bit

Answer


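You can start with examples 2 and 3 at http://gpdb.docs.pivotal.io/4330/ref_guide/pl_r.html

Alternatively, you could adapt an unnest function like the ones Lukas Eklund and Erwin Brandstetter wrote in this post (I am using Lukas's solution): Unnest array by one level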

CREATE OR REPLACE FUNCTION unnest_multidim(anyarray) 
RETURNS SETOF anyarray AS 
$BODY$ 
    SELECT array_agg($1[series2.i][series2.x]) FROM 
    (SELECT generate_series(array_lower($1,2),array_upper($1,2)) as x, series1.i 
    FROM 
    (SELECT generate_series(array_lower($1,1),array_upper($1,1)) as i) series1 
    ) series2 
GROUP BY series2.i 
$BODY$ 
LANGUAGE sql IMMUTABLE; 

Then you can try returning an array and doing something like this:

CREATE OR REPLACE FUNCTION r_norm(n integer, mean float8, std_dev float8) 
RETURNS float8[] 
AS $$ 
x<-rnorm(n,mean,std_dev); 
y<-rnorm(n,mean,std_dev); 
final<-cbind(as.data.frame(x), as.data.frame(y)); 
return(final) 

$$ LANGUAGE 'plr'; 

CREATE TABLE test_norm_var AS SELECT R_output[1] as col1, R_output[2] as col2 FROM unnest_multidim(r_norm(10,0,1)) R_output; 

SELECT col1 FROM test_norm_var; 
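To see why this works, the sketch below reproduces in plain R what happens to the result. It assumes, as the answer relies on, that PL/R hands the returned n-by-2 data frame back as a two-dimensional float8 array. unnest_multidim() then yields one length-2 array per row (first dimension), which the SELECT splits into col1 and col2. The R function unnest_rows is a hypothetical analogue, not part of PL/R.

```r
# Same construction as the body of r_norm(n = 10, mean = 0, std_dev = 1)
set.seed(42)  # only to make the sketch reproducible
n <- 10
x <- rnorm(n, 0, 1)
y <- rnorm(n, 0, 1)
final <- cbind(as.data.frame(x), as.data.frame(y))  # n rows, columns x and y

# Hypothetical R analogue of unnest_multidim(): one length-2 vector per row
# of the n x 2 matrix, i.e. what the SQL exposes as R_output[1], R_output[2].
unnest_rows <- function(m) lapply(seq_len(nrow(m)), function(i) unname(m[i, ]))

rows <- unnest_rows(as.matrix(final))
col1 <- vapply(rows, function(r) r[1], numeric(1))  # like R_output[1] AS col1
```

So col1 ends up holding the x sample and col2 the y sample, one table row per observation.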

Edit

I couldn't get dbWriteTable to work the way it is intended from within a PL/R function... but you can also try this approach:

CREATE OR REPLACE FUNCTION pivot() 
    RETURNS VOID as $$ 

library(RPostgreSQL); 
library(data.table); 

drv <- dbDriver("PostgreSQL"); 
con <- dbConnect(drv, dbname ="postgres"); 

fields <- list(custid = "numeric",prod_ranges = "varchar(128)", is_cli = "numeric") 

# Hard-coded sample rows standing in for the cli_ranges table
custid <- c(1,1,1,2,2,2) 
prod_ranges <- c("A","B","C","A","B","C") 
is_cli <- c(1,1,0,1,0,1) 

cli_ranges <- data.frame(custid, prod_ranges, is_cli, stringsAsFactors = FALSE) 

setDT(cli_ranges) 
setkeyv(cli_ranges , c("prod_ranges")) 

cli_ranges.pivoted <- as.data.frame(dcast(cli_ranges, custid ~ paste0("is_cli_", prod_ranges), fun=sum, value.var = "is_cli")) 

# Build the DDL from whatever columns dcast produced. The target is
# cli_ranges_pivoted so the source table cli_ranges is not clobbered, and the
# Greenplum-only "DISTRIBUTED BY" clause is dropped for plain PostgreSQL.
create_query <- paste0("CREATE TABLE cli_ranges_pivoted (",
                       paste0(colnames(cli_ranges.pivoted), collapse = " numeric, "),
                       " numeric)")

dbGetQuery(con, create_query);

# Build the VALUES list "(v11,v12,...),(v21,v22,...)" row by row
values_string <- "(" 
for (i in 1:dim(cli_ranges.pivoted)[1]){ 
  for (j in 1:dim(cli_ranges.pivoted)[2]){ 
    if (j != dim(cli_ranges.pivoted)[2]) { 
      values_string <- paste0(values_string, cli_ranges.pivoted[i,j], ",") 
    } else { 
      values_string <- paste0(values_string, cli_ranges.pivoted[i,j]) 
    } 
  } 
  if (i != dim(cli_ranges.pivoted)[1]){ 
    values_string <- paste0(values_string, "),(") 
  } else { 
    values_string <- paste0(values_string, ")") 
  } 
} 

insert_query <- paste0("INSERT INTO cli_ranges_pivoted (",
                       paste0(colnames(cli_ranges.pivoted), collapse = ", "),
                       ") VALUES ", values_string)

dbGetQuery(con, insert_query);

$$ LANGUAGE plr; 
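The nested loops above can also be written with vectorised paste(). Below is a minimal sketch with a hypothetical helper build_insert, assuming every column is numeric and non-NA (which holds for a dcast(..., fun = sum) result). Inside the PL/R function, the string it returns would be passed to dbGetQuery(con, ...) exactly like insert_query.

```r
# Hypothetical helper: one multi-row INSERT built from a data frame of
# numeric, non-NA columns, replacing the nested for-loops with paste().
build_insert <- function(table, df) {
  cols <- paste(names(df), collapse = ", ")
  # paste() across the columns gives one "v1, v2, ..." string per row
  rows <- do.call(paste, c(unname(as.list(df)), sep = ", "))
  paste0("INSERT INTO ", table, " (", cols, ") VALUES (",
         paste(rows, collapse = "), ("), ");")
}

pivoted <- data.frame(custid = c(1, 2), is_cli_A = c(1, 1), is_cli_B = c(1, 0))
sql <- build_insert("cli_ranges_pivoted", pivoted)
cat(sql, "\n")
# INSERT INTO cli_ranges_pivoted (custid, is_cli_A, is_cli_B) VALUES (1, 1, 1), (2, 1, 0);
```

Because the column list comes from names(df), the same call works however many prod_ranges columns the pivot produced.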
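The problem is that the number of 'prod_ranges' values will change over time, i.e. I cannot define in advance how many columns 'dcast' will produce (my table is transposed by the 'prod_ranges' values). – ant1j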

Could you give an example of the input to dcast or of its output (filled with fake numbers etc.)? – DDrake

Illustration provided in the original question. – ant1j