2015-11-01 60 views
14

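Cannot remove column - select() with dplyr

I am using dplyr and I have a grouped data.frame. I am trying to drop a column from this grouped_df with select(), but I get an error message: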

> tbl %>% select(-names) 
Error: corrupt 'grouped_df', contains 42 rows, and 965 rows in groups 

My data look like this:

> print(tbl_df(tbl), n = 1000) 
Source: local data frame [42 x 15] 

        household          names x2003 x2004 x2005 x2006 x2007 x2008 x2009 x2012 last.avail last.avail.year absChange.last annChange.last   translation 
         (chr)          (fctr) (int) (int) (int) (int) (int) (int) (int) (int)  (int)   (dbl)   (int)   (dbl)    (fctr) 
1    all households          bostad66950 73340 72350 77750  77750   2012   18470 0.030594980   Accomodation 
2    all households       fritid och kultur 45140 46140 49260 48640 49720 55120 53970 61170  61170   2012   16030 0.034341864 Leisure and culture 
3    all households         transport 41930 40430 45870 48850 47280 50250 42650 49940  49940   2012   8010 0.019614408  Transportation 
4    all households        köpta livsmedel 28420 30000 29130 30420 30750 34130 34780 34570  34570   2012   6150 0.022004509  Bought Groceries 
5    all households hyra/avgift för hyres-/borätt (inkl garage) 27310 27720 28860 30000 28990 29660 30740  NA  30740   2009   3430 0.019914330 Rent for accomodation 
6    all households       hushållstjänster 11360 12030 13200 12390 8520 10250 13530 22900  22900   2012   11540 0.081007165 Household services 
7   cohabit with child          bostad 78240 83040 81390 79180 90490 95630 100060 100980  100980   2012   22740 0.028754709   Accomodation 
8   cohabit with child       fritid och kultur 67110 67640 67290 64600 74290 71890 77200 81180  81180   2012   14070 0.021373640 Leisure and culture 
9   cohabit with child         transport 58350 62440 70010 69560 68730 75290 65510 71340  71340   2012   12990 0.022584342  Transportation 
10   cohabit with child        köpta livsmedel 45190 45660 45720 44980 48250 52880 52770 52710  52710   2012   7520 0.017250361  Bought Groceries 
11   cohabit with child       hushållstjänster 19840 21380 25690 21430 17190 19060 24730 37440  37440   2012   17600 0.073108900 Household services 
12   cohabit with child        räntor (brutto) 27090 25230 24390 24500 28510 36030 33080  NA  33080   2009   5990 0.033854485   Rents (net) 
13  cohabit without child          bostad 60340 63230 63560 61760 67100 74160 70440 78510  78510   2012   18170 0.029679783   Accomodation 
14  cohabit without child       fritid och kultur 51120 48780 57700 57320 57620 67220 62460 68400  68400   2012   17280 0.032884345 Leisure and culture 
15  cohabit without child         transport 49740 46310 55580 57730 56770 54910 52720 59360  59360   2012   9620 0.019839931  Transportation 
16  cohabit without child        köpta livsmedel 31130 33700 31900 33000 33990 37330 37980 37090  37090   2012   5960 0.019654591  Bought Groceries 
17  cohabit without child        drift av bil 24370 21790 25170 27530 25140 28180 26650  NA  26650   2009   2280 0.015017696   Car expenses 
18  cohabit without child       hushållstjänster 11650 12400 12260 12310 8580 11920 13950 26370  26370   2012   14720 0.095016005 Household services 
19 other cohabit with child       fritid och kultur 67680 75550 78020 75800 88870 80070 84490 116020  116020   2012   48340 0.061715253 Leisure and culture 
20 other cohabit with child          bostad 73850 68740 84800 86510 89290 106540 89650 100580  100580   2012   26730 0.034920030   Accomodation 
21 other cohabit with child         transport 66950 79620 75730 77800 81010 93790 77960 98660  98660   2012   31710 0.044022982  Transportation 
22 other cohabit with child        köpta livsmedel 54070 53790 50680 51440 53720 64170 62050 63690  63690   2012   9620 0.018360752  Bought Groceries 
23 other cohabit with child        drift av bil 32690 34180 37530 36200 38280 38990 36390  NA  36390   2009   3700 0.018031437   Car expenses 
24 other cohabit with child       hushållstjänster 15690 21000 20810 20370 9990 11880 19710 32460  32460   2012   16770 0.084128145 Household services 
25   other households          bostad 62860 68680 69950 72840 70700 91510 84480 86020  86020   2012   23160 0.035466655   Accomodation 
26   other households       fritid och kultur 49940 48530 55280 57970 54470 61130 65280 67920  67920   2012   17980 0.034758001 Leisure and culture 
27   other households         transport 50590 41980 57370 64960 52780 61460 59770 59630  59630   2012   9040 0.018435074  Transportation 
28   other households        köpta livsmedel 35370 35210 35360 41560 35040 43770 45940 43270  43270   2012   7900 0.022652258  Bought Groceries 
29   other households        drift av bil 21440 21580 25640 30070 28260 30070 32010  NA  32010   2009   10570 0.069079862   Car expenses 
30   other households hyra/avgift för hyres-/borätt (inkl garage) 29550 32320 25170 24600 29480 35290 25920  NA  25920   2009   -3630 -0.021607942 Rent for accomodation 
31    single parent          bostad 67890 67250 71200 75210 71000 73490 74710 81820  81820   2012   13930 0.020953501   Accomodation 
32    single parent       fritid och kultur 34900 35860 43600 46770 43540 46160 45840 51000  51000   2012   16100 0.043049627 Leisure and culture 
33    single parent hyra/avgift för hyres-/borätt (inkl garage) 43360 44020 45160 49430 45370 44090 48740  NA  48740   2009   5380 0.019685026 Rent for accomodation 
34    single parent         transport 27230 30810 28810 28410 30500 30390 29360 34890  34890   2012   7660 0.027925124  Transportation 
35    single parent        köpta livsmedel 26420 27910 28160 29100 28310 33020 35910 33740  33740   2012   7320 0.027546212  Bought Groceries 
36    single parent       hushållstjänster 9490 11690 13770 8650 7250 10390 11490 17140  17140   2012   7650 0.067891620 Household services 
37 single parent without child          bostad 45660 47110 48750 50850 51610 55720 56020 61090  61090   2012   15430 0.032876143   Accomodation 
38 single parent without child       fritid och kultur 28270 31890 31140 30210 28480 35650 32840 41770  41770   2012   13500 0.044329701 Leisure and culture 
39 single parent without child hyra/avgift för hyres-/borätt (inkl garage) 31900 32160 33010 36300 34300 35330 37800  NA  37800   2009   5900 0.028687635 Rent for accomodation 
40 single parent without child         transport 26730 22980 24530 29310 28440 31680 20150 28800  28800   2012   2070 0.008322088  Transportation 
41 single parent without child        köpta livsmedel 15330 16930 16150 17630 17280 18390 19370 19580  19580   2012   4250 0.027561531  Bought Groceries 
42 single parent without child       hushållstjänster 6570 6590 6840 7080 3780 4300 7000 12310  12310   2012   5740 0.072257733 Household services 

What is the problem here, and how can I fix it?

+4

Try 'ungroup()', i.e. 'tbl %>% ungroup() %>% select(-names)' – akrun

+0

Works. What is the mechanism behind this behaviour? Do you know where I can read more about it? – uncool

+2

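My guess: the error tells you the problem. Your object is corrupt, probably in the attributes associated with 'grouped_df', and 'ungroup' removes those. This open bug may also be a clue: https://github.com/hadley/dplyr/issues/1385 If it does not match, maybe you can file a new bug. – Frank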

Answer

18

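If the variable to be removed is used as a grouping variable, we need to ungroup before dropping it in select. This is the case in the current dplyr version (dplyr_0.4.3), but it may or may not change in future dplyr versions.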

tbl %>% 
    ungroup() %>% 
    select(-names) 
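
A minimal, self-contained sketch of this fix, assuming a reasonably recent dplyr. The 'toy' data frame and its values are made up for illustration and only stand in for the asker's 'tbl'; 'group_vars()' may not be available in versions as old as 0.4.3, where 'groups()' plays the same role.

```r
library(dplyr)

# Hypothetical toy data, grouped by both 'household' and 'names'
toy <- data.frame(
  household = c("a", "a", "b", "b"),
  names     = c("x", "y", "x", "y"),
  x2003     = c(1L, 2L, 3L, 4L),
  stringsAsFactors = FALSE
) %>%
  group_by(household, names)

# Check which columns are currently grouping variables
group_vars(toy)

# Drop the grouping first, then remove the column
result <- toy %>%
  ungroup() %>%
  select(-names)

names(result)
```

Checking the grouping variables first shows whether the column to be dropped is part of the grouping.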

As an example of corrupted grouped data, suppose we try to remove the column 'y' from 'dat3', where 'dat3' was built by rbind-ing 'dat1' and 'dat2':

dat3 %>% 
    select(-y) 
#Error: corrupt 'grouped_df', contains 1100 rows, and 1000 rows in groups 

Checking str(dat3):

str(dat3) 
#Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 1100 obs. of 2 variables: 
# $ group: Factor w/ 3 levels "A","B","C": 2 3 2 2 2 2 1 2 2 1 ... 
# $ y : num 1.396 -0.892 1.065 0.801 -0.368 ... 
# - attr(*, "vars")=List of 1 
# ..$ : symbol group 
# - attr(*, "drop")= logi TRUE 
# - attr(*, "indices")=List of 3 
# ..$ : int 6 9 12 13 14 16 18 21 25 27 ... 
# ..$ : int 0 2 3 4 5 7 8 10 11 15 ... 
# ..$ : int 1 17 24 28 35 37 39 43 47 49 ... 
# - attr(*, "group_sizes")= int 323 365 312 
# - attr(*, "biggest_group_size")= int 365 
# - attr(*, "labels")='data.frame':  3 obs. of 1 variable: 
# ..$ group: Factor w/ 3 levels "A","B","C": 1 2 3 
# ..- attr(*, "vars")=List of 1 
# .. ..$ : symbol group 
# ..- attr(*, "drop")= logi TRUE 
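
The mismatch is visible in the output above: the group sizes (323 + 365 + 312) sum to 1000, while the data frame has 1100 rows. A rough sketch of a check for this kind of inconsistency follows. The helper name 'groups_consistent' is hypothetical, the example data are made up, and the sketch assumes a dplyr version that exports 'group_size()'. How a genuinely corrupt object behaves may differ between dplyr versions, so errors are treated as "inconsistent".

```r
library(dplyr)

# Hypothetical helper: TRUE if the recorded group sizes add up to the
# number of rows, or if the data frame is not grouped at all
groups_consistent <- function(df) {
  if (!inherits(df, "grouped_df")) {
    return(TRUE)
  }
  tryCatch(
    sum(group_size(df)) == nrow(df),
    error = function(e) FALSE  # a failing lookup is treated as inconsistent
  )
}

# A healthy grouped data frame (made-up values) passes the check
healthy <- data.frame(
  group = c("A", "B", "A", "C"),
  y     = c(0.1, 0.2, 0.3, 0.4),
  stringsAsFactors = FALSE
) %>%
  group_by(group)

groups_consistent(healthy)
```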

We see that the grouping attributes were carried over by rbind-ing. If we use bind_rows instead:

dat4 <- bind_rows(dat1, dat2) 
str(dat4) 
#Classes ‘tbl_df’, ‘tbl’ and 'data.frame':  1100 obs. of 2 variables: 
# $ group: chr "B" "C" "B" "B" ... 
# $ y : num 1.396 -0.892 1.065 0.801 -0.368 ... 

We can remove the 'y' column from 'dat4':

dat4 %>% 
    select(-y) 
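
The code that created 'dat1' and 'dat2' is not shown here, so the following is only a sketch under assumptions: a grouped 'dat1' with 1000 rows and an ungrouped 'dat2' with 100 rows, sizes guessed from the error message, with simulated values. 'dat1' is explicitly ungrouped before binding, because newer dplyr versions may otherwise keep the grouping of the first input.

```r
library(dplyr)

set.seed(1)  # arbitrary seed, values are simulated

# Assumed inputs: one grouped, one plain data frame
dat1 <- data.frame(
  group = sample(c("A", "B", "C"), 1000, replace = TRUE),
  y     = rnorm(1000),
  stringsAsFactors = FALSE
) %>%
  group_by(group)

dat2 <- data.frame(
  group = sample(c("A", "B", "C"), 100, replace = TRUE),
  y     = rnorm(100),
  stringsAsFactors = FALSE
)

# Combine without carrying grouping attributes along
dat4 <- bind_rows(ungroup(dat1), dat2)

# Dropping 'y' now works
res <- dat4 %>%
  select(-y)
```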

As the OP did not show how 'tbl' was created, we can only assume that it was created by some method that corrupted the dataset by adding attributes.

+0

The thing is, the grouping variable is only the 'household' column. – uncool

+1

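@uncool You did not show the code used to get 'tbl', nor a small dataset via 'dput' that gives the corrupted grouped df. That would help us explain it better. In general, if the grouping variable is going to be removed, you need to ungroup first. – akrun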

+1

@uncool I updated the answer with an example that reproduces the error. – akrun