2014-02-06

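I just discovered this bug, only to find that some people call it a "feature". It means that rbindlist, unlike do.call("rbind", l), does not respect column names. Furthermore, this completely unexpected behavior is not mentioned anywhere in the documentation. Is this really intentional? Why doesn't rbindlist respect column names?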

Example code:

> library(data.table) 
> DT1 <- data.table(a=1, b=2) 
> DT2 <- data.table(b=3, a=4) 
> DT1 
a b 
1: 1 2 
> DT2 
b a 
1: 3 4 

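I would expect rbind'ing these to produce a data.table with columns a = 1, 4 and b = 2, 3. That is what I get from both rbind.data.table and rbind.data.frame, although rbind.data.table produces a warning.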

> rbind(DT1, DT2) 
a b 
1: 1 2 
2: 4 3 
Warning message: 
In data.table::.rbind.data.table(...) : 
Argument 2 has names in a different order. Columns will be bound by name for consistency with base. You can drop names (by using an unnamed list) and the columns will then be joined by position, or set use.names=FALSE. Alternatively, explicitly setting use.names to TRUE will remove this warning. 
> rbind(as.data.frame(DT1), as.data.frame(DT2)) 
a b 
1 1 2 
2 4 3 
> do.call('rbind', list(DT1, DT2)) 
a b 
1: 1 2 
2: 4 3 
Warning message: 
In data.table::.rbind.data.table(...) : 
Argument 2 has names in a different order. Columns will be bound by name for consistency with base. You can drop names (by using an unnamed list) and the columns will then be joined by position, or set use.names=FALSE. Alternatively, explicitly setting use.names to TRUE will remove this warning. 

rbindlist, however, happily and silently corrupts the data:

> rbindlist(list(DT1, DT2)) 
a b 
1: 1 2 
2: 3 4 
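For data.table versions that predate the use.names argument, one workaround is to reorder each table's columns to match the first table before calling rbindlist, so that binding by position lines up by name. This is a minimal sketch, not part of data.table: the helper align_cols is hypothetical, and it assumes every table has exactly the same set of column names.

```r
library(data.table)

DT1 <- data.table(a = 1, b = 2)
DT2 <- data.table(b = 3, a = 4)

# Hypothetical helper (not part of data.table): reorder the columns of every
# table to follow the first table's column order. Assumes all tables have
# exactly the same column names; a missing name makes `[` raise an error.
align_cols <- function(lst) {
  ref <- names(lst[[1]])
  lapply(lst, function(d) d[, ref, with = FALSE])
}

# Columns now line up by name, so positional binding gives the expected result
res <- rbindlist(align_cols(list(DT1, DT2)))
res$a  # 1 4
res$b  # 2 3
```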

Take a look at this excellent answer: http://stackoverflow.com/a/15673654/1627235


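'rbindlist' is optimized for speed. Matching column names would be counterproductive, and I hope the default behavior won't change. However, you are free to file a feature request. – Roland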


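Sven, I linked to that answer in my post. It doesn't strike me as particularly authoritative. Roland, speed is useless if you are corrupting data, and silently at that. Besides, if names are not respected, what is the point of using a data structure with named columns? – James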

Answer


This feature has now been implemented in commit 1266 of v1.9.3. From NEWS:

o 'rbindlist' gains 'use.names' and 'fill' arguments and is now implemented 
    entirely in C. Closes #5249  
    -> use.names by default is FALSE for backwards compatibility (doesn't bind by 
    names by default) 
    -> rbind(...) now just calls rbindlist() internally, except that 'use.names' 
    is TRUE by default, for compatibility with base (and backwards compatibility). 
    -> fill by default is FALSE. If fill is TRUE, use.names has to be TRUE. 
    -> At least one item of the input list has to have non-null column names. 
    -> Duplicate columns are bound in the order of occurrence, like base. 
    -> Attributes that might exist in individual items would be lost in the bound result. 
    -> Columns are coerced to the highest SEXPTYPE, if they are different, if/when possible. 
    -> And incredibly fast ;). 
    -> Documentation updated in much detail. Closes DR #5158. 
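To make the coercion bullet above concrete, here is a small sketch: binding an integer column with a double column of the same name should give a double column. It assumes a data.table version that includes the C implementation described in this NEWS entry; the table names DTi and DTd are just illustrative.

```r
library(data.table)

DTi <- data.table(x = 1L)   # integer column
DTd <- data.table(x = 2.5)  # double column

# Differing column types are coerced to the "highest" type, so
# integer + double should yield a double (numeric) column
res <- rbindlist(list(DTi, DTd))
class(res$x)  # "numeric"
res$x         # 1.0 2.5
```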

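With this, you can set use.names=TRUE to bind by name. For backwards compatibility, the default is FALSE. Alternatively, you can use rbind(..), where use.names=TRUE is the default, also for backwards compatibility.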
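Applied to the data from the question, a minimal sketch of both options (assuming data.table v1.9.3 or later, where use.names exists) might look like this. Both should give a = 1, 4 and b = 2, 3.

```r
library(data.table)

DT1 <- data.table(a = 1, b = 2)
DT2 <- data.table(b = 3, a = 4)
l <- list(DT1, DT2)

# Option 1: ask rbindlist to match columns by name explicitly
by_name <- rbindlist(l, use.names = TRUE)

# Option 2: rbind() on data.tables defaults to use.names = TRUE
via_rbind <- do.call("rbind", l)

by_name$a    # 1 4
by_name$b    # 2 3
via_rbind$a  # 1 4
via_rbind$b  # 2 3
```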

For more examples, see this post; for benchmarks, see this post.

Examples:

1) Just set use.names=TRUE:

DT1 <- data.table(x=1, y=2) 
DT2 <- data.table(y=1, x=2) 

rbindlist(list(DT1,DT2), use.names=TRUE, fill=FALSE) 
# x y 
# 1: 1 2 
# 2: 2 1 

DT1 <- data.table(x=1, y=2) 
DT2 <- data.table(z=2, y=1) 

# returns error when fill=FALSE but can't be bound without fill=TRUE 
rbindlist(list(DT1, DT2), use.names=TRUE, fill=FALSE) 
# Error in rbindlist(list(DT1, DT2), use.names = TRUE, fill = FALSE) : 
    # Answer requires 3 columns whereas one or more item(s) in the input 
    # list has only 2 columns. ... 

2) Duplicate column names are also bound in the order in which they occur:

DT1 <- data.table(x=1, x=2, y=10, y=20, y=30) 
DT2 <- data.table(y=-10, x=-2, y=-20, x=-1, y=-30) 

rbindlist(list(DT1,DT2), use.names=TRUE) 

#  x x y y y 
# 1: 1 2 10 20 30 
# 2: -2 -1 -10 -20 -30 

3) Use fill=TRUE if you want to bind by name and fill in missing columns:

DT1 <- data.table(x=1, y=2) 
DT2 <- data.table(y=2, z=-1) 

rbindlist(list(DT1, DT2), fill=TRUE) 
#  x y z 
# 1: 1 2 NA 
# 2: NA 2 -1 

HTH
