2014-12-04

I am running a recipe in R and getting what looks like a bug: using R, the left_join method does not accept the data types.

> left_join(ann2012full,agglevel)
Joining by: "agglvl_code"
Error in data.table::setkeyv(y, by$x) : x is not a data.table

The two variables are ann2012full, with 3 million+ obs. of 15 variables, and agglevel, with 56 obs. of 2 variables, read from 2 .csv files.

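According to other posts, other people have run into similar problems with dplyr, but how the by argument is supposed to be used is not clear to me. Is anyone able to reproduce the left_join function as it behaved before the update?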

The two variables have one column in common, and the function acknowledges the column in question by reporting Joining by: "agglvl_code" before the error:

> intersect(names(ann2012full),names(agglevel)) 
[1] "agglvl_code" 

The first few rows...

head(ann2012full) 
    area_fips own_code industry_code agglvl_code size_code year qtr disclosure_code annual_avg_estabs_count annual_avg_emplvl 
1:  01000  0   10   50   0 2012 A         116233   1828248 
2:  01000  1   10   51   0 2012 A         1252    56031 
3:  01000  1   102   52   0 2012 A         1252    56031 
4:  01000  1   1021   53   0 2012 A          599    11734 
5:  01000  1   1022   53   0 2012 A          2    13 
6:  01000  1   1023   53   0 2012 A          17    161 
    total_annual_wages taxable_annual_wages annual_contributions annual_avg_wkly_wage avg_annual_pay 
1:  76768801894   13424728725   419383612     808   41990 
2:   4194319351     0     0     1440   74857 
3:   4194319351     0     0     1440   74857 
4:   719641114     0     0     1179   61330 
5:    436204     0     0     662   34437 
6:   12253089     0     0     1468   76343 

head(agglevel) 
    agglvl_code         agglvl_title 
1   10       National, Total Covered 
2   11   National, Total -- by ownership sector 
3   12  National, by Domain -- by ownership sector 
4   13 National, by Supersector -- by ownership sector 
5   14 National, NAICS Sector -- by ownership sector 
6   15 National, NAICS 3-digit -- by ownership sector 

What the objects in question look like with str()...

> str(ann2012) 
Classes ‘data.table’ and 'data.frame': 3556289 obs. of 15 variables: 
$ area_fips    : chr "01000" "01000" "01000" "01000" ... 
$ own_code    : int 0 1 1 1 1 1 1 1 1 1 ... 
$ industry_code   : chr "10" "10" "102" "1021" ... 
$ agglvl_code   : int 50 51 52 53 53 53 53 53 53 53 ... 
$ size_code    : int 0 0 0 0 0 0 0 0 0 0 ... 
$ year     : int 2012 2012 2012 2012 2012 2012 2012 2012 2012 2012 ... 
$ qtr     : chr "A" "A" "A" "A" ... 
$ disclosure_code  : chr "" "" "" "" ... 
$ annual_avg_estabs_count: int 116233 1252 1252 599 2 17 46 32 27 4 ... 
$ annual_avg_emplvl  : int 1828248 56031 56031 11734 13 161 1799 6131 903 632 ... 
$ total_annual_wages  :Class 'integer64' num [1:3556289] 3.79e-313 2.07e-314 2.07e-314 3.56e-315 2.16e-318 ... 
$ taxable_annual_wages :Class 'integer64' num [1:3556289] 6.63e-314 0.00 0.00 0.00 0.00 ... 
$ annual_contributions :Class 'integer64' num [1:3556289] 2.07e-315 0.00 0.00 0.00 0.00 ... 
$ annual_avg_wkly_wage : int 808 1440 1440 1179 662 1468 1581 1231 370 1716 ... 
$ avg_annual_pay   : int 41990 74857 74857 61330 34437 76343 82237 64031 19257 89240 ... 
- attr(*, ".internal.selfref")=<externalptr> 
> str(agglevel) 
'data.frame': 56 obs. of 2 variables: 
$ agglvl_code : int 10 11 12 13 14 15 16 17 18 21 ... 
$ agglvl_title: chr "National, Total Covered" "National, Total -- by ownership sector" "National, by Domain -- by ownership sector" "National, by Supersector -- by ownership sector" ... 

I have 10 libraries loaded for this recipe; the search path has 28 entries in total.

> search() 
[1] ".GlobalEnv"    "package:tcltk"   "package:microbenchmark" "package:rbenchmark"  "package:choroplethr" 
[6] "package:RColorBrewer" "package:maps"   "package:ggplot2"  "package:stringr"  "package:dplyr"   
[11] "package:plyr"   "package:sqldf"   "package:RSQLite"  "package:DBI"   "package:gsubfn"   
[16] "package:proto"   "package:data.table"  "package:bit64"   "package:bit"   "tools:rstudio"   
[21] "package:stats"   "package:graphics"  "package:grDevices"  "package:utils"   "package:datasets"  
[26] "package:methods"  "Autoloads"    "package:base" 

*********************************** SOLUTION FOUND *******************************

I got to the bottom of it: I used merge instead of left_join, and specified by as something other than NULL. So, what was...

codes <- c('agglevel','industry','ownership','size') 
ann2012full <- ann2012 
for(i in 1:length(codes)){ 
    eval(parse(text=paste('ann2012full <- left_join(ann2012full, ',codes[i],')', sep=''))) 
} 

is now...

codes <- c('agglevel','industry','ownership','size') 
ann2012full <- ann2012 
for(i in 1:length(codes)){ 
    barTitle <- intersect(names(ann2012full),names(eval(parse(text=codes[i])))) 
    eval(parse(text= paste('ann2012full <- merge(ann2012full, ',codes[i],',by="',barTitle,'")', sep=''))) 
} 
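
One thing worth checking with the merge version: merge() performs an inner join by default (all = FALSE), so rows without a match in the lookup table are dropped, whereas left_join keeps them. With base data frames, merge() also sorts the result on the by column unless sort = FALSE is given. A minimal base R sketch with made-up toy tables (wages and lookup are hypothetical stand-ins, not the recipe's data) that uses all.x = TRUE to get left-join behaviour:

```r
# Hypothetical toy tables; code 99 deliberately has no match in lookup.
wages <- data.frame(agglvl_code = c(50L, 51L, 99L),
                    avg_annual_pay = c(41990L, 74857L, 12345L))
lookup <- data.frame(agglvl_code = c(50L, 51L),
                     agglvl_title = c("toy title A", "toy title B"),
                     stringsAsFactors = FALSE)

# Default merge: inner join, the unmatched row (code 99) is dropped.
inner <- merge(wages, lookup, by = "agglvl_code")

# all.x = TRUE keeps every row of the left table; unmatched titles become NA.
left <- merge(wages, lookup, by = "agglvl_code", all.x = TRUE)

nrow(inner)
nrow(left)
```

If every code in the big table is present in each lookup table, the two calls return the same rows and the difference does not matter.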

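However, the *_join methods in dplyr still appear to be flawed, and the latest update has not fixed them. If anyone has other input I would be glad to hear it, since this only works with the code modified to use merge.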

Thanks,

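Could you provide a few rows of ann2012full and agglevel? Also, could you show what str() returns for both? Finally, it would help us if you showed the "number of libraries" you have loaded. – lawyeR 2014-12-05 03:36:54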

It looks like the behaviour of `dplyr` may have changed in the new version. – Arun 2014-12-05 19:26:07

It seems so: [rstudio](http://blog.rstudio.org/2014/10/13/dplyr-0-3-2/) has some details on the 10/14/14 improvements, but I don't see the joins mentioned, and the changes appear to be optional. – double0darbo 2014-12-05 20:39:45

Answer

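I think you are working through the same recipe as I was, and I had the same problem. There are two issues here (at least there were for me when I did it).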

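The first issue is that, if you follow the recipe, one of your data sets is a data.table and the other is a data.frame. Wrap the second one in as.data.table(code).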

The second issue is that the fields being joined on are integer in one data set and character in the other. So that needs fixing (just use as.numeric()).
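
Both issues can be checked before joining. A minimal sketch, assuming the data.table and dplyr packages are installed; x and y are hypothetical toy stand-ins for the recipe's tables, and whether a mismatch actually raises an error depends on the dplyr version in use:

```r
library(data.table)
library(dplyr)

# Hypothetical stand-ins: x is a data.table, y is a plain data.frame
# whose key column was read in as character.
x <- data.table(agglvl_code = c(50L, 51L),
                avg_annual_pay = c(41990L, 74857L))
y <- data.frame(agglvl_code = c("50", "51"),
                agglvl_title = c("toy title A", "toy title B"),
                stringsAsFactors = FALSE)

# Inspect the object classes and the type of the join column on each side.
class(x)
class(y)
class(x$agglvl_code)
class(y$agglvl_code)

# Issue 2: give the join column the same type on both sides.
y$agglvl_code <- as.integer(y$agglvl_code)
# Issue 1: give both tables the same class.
y <- as.data.table(y)

# Name the join column explicitly rather than relying on the default.
joined <- left_join(x, y, by = "agglvl_code")
```

Converting with as.integer() rather than as.numeric() keeps the column type identical to the integer column shown by str(); either should work as long as both sides agree.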

EDIT: This code is what you want, and it works fine on my machine (except that you will need to change 2013 to 2012 to match your data).

codes <- c('agglevel', 'industry', 'ownership', 'size') 
ann2013full <- ann2013 
ann2013full$agglvl_code <- as.numeric(ann2013full$agglvl_code) 
ann2013full$own_code <- as.numeric(ann2013full$own_code) 
ann2013full$size_code <- as.numeric(ann2013full$size_code) 
for(code in codes){ 
    eval(parse(text = paste('ann2013full <- left_join(ann2013full,as.data.table(', code,'))', sep = ''))) 
} 
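
The eval(parse(...)) construction is not strictly needed: the lookup tables can be fetched by name with get(). A sketch of that variant, assuming data.table and dplyr are installed; the miniature tables are hypothetical, only two of the four lookup tables are mocked up, and this is not the recipe's own code:

```r
library(data.table)
library(dplyr)

# Hypothetical miniature versions of the recipe's tables.
ann <- data.table(agglvl_code = c(50L, 51L), own_code = c(0L, 1L))
agglevel <- data.frame(agglvl_code = c(50L, 51L),
                       agglvl_title = c("toy level A", "toy level B"),
                       stringsAsFactors = FALSE)
ownership <- data.frame(own_code = c(0L, 1L),
                        own_title = c("toy owner A", "toy owner B"),
                        stringsAsFactors = FALSE)

codes <- c("agglevel", "ownership")
annfull <- ann
for (code in codes) {
  lookup <- as.data.table(get(code))               # fetch the table by its name
  key <- intersect(names(annfull), names(lookup))  # shared column(s) to join on
  annfull <- left_join(annfull, lookup, by = key)
}

names(annfull)
```

Passing by explicitly also silences the "Joining by:" message and makes it obvious which column each lookup table is matched on.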