Split a vector into chunks in R


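I have to split a vector into n chunks of equal size in R. I couldn't find any base function to do that, and Google didn't get me anywhere either. So here is what I came up with; hopefully it helps someone somewhere.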

x <- 1:10 
n <- 3 
chunk <- function(x,n) split(x, factor(sort(rank(x)%%n))) 
chunk(x,n) 
$`0` 
[1] 1 2 3 

$`1` 
[1] 4 5 6 7 

$`2` 
[1] 8 9 10

Any comments, suggestions or improvements are really welcome and appreciated.

Cheers, Sebastian


2010-07-23 Sebastian

Yes, it's very unclear that what you get is the solution to "n chunks of equal size". But maybe this gets you there too: x <- 1:10; n <- 3; split(x, cut(x, n, labels = FALSE)) – mdsumner 2010-07-23 14:08:03

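Both the solution in the question and the one in the preceding comment are incorrect, in that they might not work if the vector has repeated entries. Try this: > foo <- c(rep(1,12), rep(2,3), rep(3,3)) [1] 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 3 3 3 > chunk(foo, 2) (gives the wrong result) > chunk(foo, 3) (also wrong) – mathheadinclouds 2013-04-29 09:21:35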

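(continuing the previous comment) Why? rank(x) doesn't need to be an integer: > rank(c(1,1,2,3)) [1] 1.5 1.5 3.0 4.0 so that's why the method in the question fails. This one works: chunk2 <- function(x, n) split(x, cut(seq_along(x), n, labels = FALSE)) – mathheadinclouds 2013-04-29 09:33:14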
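A quick check of the failure described in the two comments above, using the foo vector from the first comment. Both function definitions are copied from this page; the lengths() calls are only there to make the group sizes visible.

```r
# Rank-based function from the question
chunk <- function(x, n) split(x, factor(sort(rank(x) %% n)))
# Position-based fix from the comment above
chunk2 <- function(x, n) split(x, cut(seq_along(x), n, labels = FALSE))

foo <- c(rep(1, 12), rep(2, 3), rep(3, 3))
rank(foo)                # tied values get averaged ranks: 6.5, 14 and 17
lengths(chunk(foo, 2))   # three groups (sizes 3, 12, 3) instead of two
lengths(chunk2(foo, 2))  # two groups of 9
```

Because rank() averages ties, rank(x) %% n can take values other than 0, ..., n - 1, so the number of groups is no longer n. Working on seq_along(x) ignores the values of x entirely.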

You could combine split/cut, as suggested by mdsumner, with quantiles to create even groups:

split(x,cut(x,quantile(x,(0:n)/n), include.lowest=TRUE, labels=FALSE))

This gives the same result for your example, but not for skewed variables.


2010-07-23 14:22:55 SiggyF
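One concrete way the quantile approach above can break, as an illustration of my own rather than something from the answer: when many values are tied (common in skewed count data), several quantiles coincide and cut() rejects the duplicated breaks. The toy vector x below is made up for the example.

```r
x <- c(rep(1, 8), 2, 3)   # made-up, heavily tied toy data
n <- 3
q <- quantile(x, (0:n)/n)
q                          # 1 1 1 3: three of the four breaks coincide
res <- tryCatch(
  split(x, cut(x, q, include.lowest = TRUE, labels = FALSE)),
  error = function(e) conditionMessage(e)
)
res                        # the error message: 'breaks' are not unique
```

Position-based grouping such as cut(seq_along(x), n, labels = FALSE), suggested in the comments on the question, does not depend on the values and so is not affected by ties.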

More variants to the pile...

> x <- 1:10 
> n <- 3

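Note that you don't need the factor function here, but you still need sort; otherwise the elements are interleaved across the chunks (the chunk labelled 1, for example, would be 1 4 7 10):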

> chunk <- function(x, n) split(x, sort(rank(x) %% n)) 
> chunk(x,n) 
$`0` 
[1] 1 2 3 
$`1` 
[1] 4 5 6 7 
$`2` 
[1] 8 9 10

Or you can assign character labels instead of the backtick-quoted numbers above:

> my.chunk <- function(x, n) split(x, sort(rep(letters[1:n], each=n, len=length(x)))) 
> my.chunk(x, n) 
$a 
[1] 1 2 3 4 
$b 
[1] 5 6 7 
$c 
[1] 8 9 10

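Or you can use plain-word names stored in a vector. Note that using sort to keep consecutive values of x together puts the labels in alphabetical order: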

> my.other.chunk <- function(x, n) split(x, sort(rep(c("tom", "dick", "harry"), each=n, len=length(x)))) 
> my.other.chunk(x, n) 
$dick 
[1] 1 2 3 
$harry 
[1] 4 5 6 
$tom 
[1] 7 8 9 10


2010-07-23 14:38:42
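The each = n argument ties the length of each label run to the number of chunks, so the chunk sizes are only balanced for some input lengths. A quick check on 1:20, together with a hypothetical variant that takes the letter for each position from cut(seq_along(x), n) (the position-based idea from the comments on the question) instead:

```r
x <- 1:20
n <- 3

# The answer's labelling: each label repeated n times, recycled to length(x), then sorted
lab_answer <- sort(rep(letters[1:n], each = n, length.out = length(x)))
table(lab_answer)  # a: 8, b: 6, c: 6

# Hypothetical variant: the letter for each position comes from cut(seq_along(x), n)
lab_cut <- letters[cut(seq_along(x), n, labels = FALSE)]
table(lab_cut)     # a: 7, b: 6, c: 7

chunks <- split(x, lab_cut)
chunks$b           # 8 9 10 11 12 13
```

Either way, letters only has 26 entries, so these letter labels limit n to at most 26.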

This splits it differently from what you have, but it's still quite a nice list structure, I think:

chunk.2 <- function(x, n, force.number.of.groups = TRUE, len = length(x), groups = trunc(len/n), overflow = len%%n) { 
    if(force.number.of.groups) { 
        f1 <- as.character(sort(rep(1:n, groups))) 
        f <- as.character(c(f1, rep(n, overflow))) 
    } else { 
        f1 <- as.character(sort(rep(1:groups, n))) 
        f <- as.character(c(f1, rep("overflow", overflow))) 
    } 

    g <- split(x, f) 

    if(force.number.of.groups) { 
        g.names <- names(g) 
        g.names.ordered <- as.character(sort(as.numeric(g.names))) 
    } else { 
        # only treat a group as "overflow" if there is one; dropping the last
        # group unconditionally loses a chunk when len %% n == 0
        g.names <- setdiff(names(g), "overflow") 
        g.names.ordered <- as.character(sort(as.numeric(g.names))) 
        if(overflow > 0) g.names.ordered <- c(g.names.ordered, "overflow") 
    } 

    return(g[g.names.ordered]) 
}

Which will give you the following, depending on the format you want:

> x <- 1:10; n <- 3 
> chunk.2(x, n, force.number.of.groups = FALSE) 
$`1` 
[1] 1 2 3 

$`2` 
[1] 4 5 6 

$`3` 
[1] 7 8 9 

$overflow 
[1] 10 

> chunk.2(x, n, force.number.of.groups = TRUE) 
$`1` 
[1] 1 2 3 

$`2` 
[1] 4 5 6 

$`3` 
[1] 7 8 9 10

Running a few timings with these settings:

set.seed(42) 
x <- rnorm(1e7) 
n <- 3

Then we have the following results:

> system.time(chunk(x, n)) # your function 
    user system elapsed 
29.500 0.620 30.125 

> system.time(chunk.2(x, n, force.number.of.groups = TRUE)) 
    user system elapsed 
    5.360 0.300 5.663

Edit: changing from as.factor() to as.character() in my function made it twice as fast.


2010-07-23 14:39:04

split(x,matrix(1:n,n,length(x))[1:length(x)])

Perhaps this is clearer, but it's the same idea:
split(x,rep(1:n, ceiling(length(x)/n),length.out = length(x)))

If you want the chunks ordered, wrap a sort() around the grouping vector.


2010-07-23 16:30:26 frankc
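What the sort changes, shown on the question's x <- 1:10 and n <- 3: without it the grouping vector cycles 1, 2, 3, so chunks are built round-robin; sorting it makes each chunk a contiguous run. The expression for g is copied from the answer above.

```r
x <- 1:10
n <- 3
g <- rep(1:n, ceiling(length(x)/n), length.out = length(x))  # 1 2 3 1 2 3 1 2 3 1

rr <- split(x, g)            # round-robin chunks
contig <- split(x, sort(g))  # contiguous chunks
rr[[1]]                      # 1 4 7 10
contig[[1]]                  # 1 2 3 4
lengths(contig)              # 4 3 3
```

The chunk sizes are the same either way; only which elements land in which chunk changes.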


A one-liner splitting d into chunks of size 20:

split(d, ceiling(seq_along(d)/20))

More details: I think all you need is seq_along(), split() and ceiling():

> d <- rpois(73,5) 
> d 
[1] 3 1 11 4 1 2 3 2 4 10 10 2 7 4 6 6 2 1 1 2 3 8 3 10 7 4 
[27] 3 4 4 1 1 7 2 4 6 0 5 7 4 6 8 4 7 12 4 6 8 4 2 7 6 5 
[53] 4 5 4 5 5 8 7 7 7 6 2 4 3 3 8 11 6 6 1 8 4 
> max <- 20 
> x <- seq_along(d) 
> d1 <- split(d, ceiling(x/max)) 
> d1 
$`1` 
[1] 3 1 11 4 1 2 3 2 4 10 10 2 7 4 6 6 2 1 1 2 

$`2` 
[1] 3 8 3 10 7 4 3 4 4 1 1 7 2 4 6 0 5 7 4 6 

$`3` 
[1] 8 4 7 12 4 6 8 4 2 7 6 5 4 5 4 5 5 8 7 7 

$`4` 
[1] 7 6 2 4 3 3 8 11 6 6 1 8 4


2010-07-23 19:22:21 Harlan


The question asks for n chunks of equal size. This gets you an unknown number of chunks of size n. I had the same problem and used the solutions from @mathheadinclouds. – rrs 2014-04-21 18:26:59

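As can be seen from the output of d1, this answer does not split d into groups of equal size (4 is obviously shorter). Thus it does not answer the question. – Calimo 2015-01-23 16:39:58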

@rrs: split(d, ceiling(seq_along(d)/(length(d)/n))) – gkcn 2015-06-05 11:45:13
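gkcn's formula divides the positions by the target chunk length, length(d)/n, so it yields n chunks instead of chunks of a fixed size. A quick check, using 1:73 as a deterministic stand-in for the rpois() sample in the answer (same length, different values) and n = 4 as an example:

```r
d <- 1:73   # stand-in for the 73-element rpois() sample above
n <- 4
res <- split(d, ceiling(seq_along(d) / (length(d) / n)))
length(res)    # 4
lengths(res)   # 18 18 18 19
```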

chunk2 <- function(x,n) split(x, cut(seq_along(x), n, labels = FALSE))


2013-04-29 09:37:48 mathheadinclouds

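I needed the same function and have read the previous solutions, but I also needed the unbalanced chunk to be at the end, i.e. if I have 10 elements to split into vectors of 3 each, my result should have vectors with 3, 3 and 4 elements respectively. So I used the following (I left the code unoptimized for readability; otherwise there is no need for so many variables):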

chunk <- function(x,n){ 
    numOfVectors <- floor(length(x)/n) 
    elementsPerVector <- c(rep(n,numOfVectors-1),n+length(x) %% n) 
    elemDistPerVector <- rep(1:numOfVectors,elementsPerVector) 
    split(x,factor(elemDistPerVector)) 
} 
set.seed(1) 
x <- rnorm(10) 
n <- 3 
chunk(x,n) 
$`1` 
[1] -0.6264538 0.1836433 -0.8356286 

$`2` 
[1] 1.5952808 0.3295078 -0.8204684 

$`3` 
[1] 0.4874291 0.7383247 0.5757814 -0.3053884


2013-06-23 07:41:00

Here's another variant.

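Note: with this sample you're specifying the chunk size in the second parameter.

All chunks are uniform except for the last; at worst the last one will be smaller, never bigger than the chunk size.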

chunk <- function(x,n) 
{ 
    f <- sort(rep(1:(trunc(length(x)/n)+1),n))[1:length(x)] 
    return(split(x,f)) 
} 

# Test (avoids naming a variable `c`, which masks base::c)
v <- 1:11
chunks <- chunk(v, 5)
cat(sapply(chunks, paste, collapse = ","), sep = "|")
# Output:
# 1,2,3,4,5|6,7,8,9,10|11


2013-09-14 16:41:11 eAndy
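The grouping vector built in this answer assigns position i to group ceiling(i / n), which is the same grouping as Harlan's one-liner split(d, ceiling(seq_along(d)/20)) further up. A brute-force comparison over small inputs; the function names chunk_sorted and chunk_ceiling are hypothetical, chosen only for this check:

```r
# eAndy's grouping, as in the answer
chunk_sorted <- function(x, n) {
  f <- sort(rep(1:(trunc(length(x)/n) + 1), n))[1:length(x)]
  split(x, f)
}
# Harlan's grouping, wrapped in a function for comparison
chunk_ceiling <- function(x, n) split(x, ceiling(seq_along(x) / n))

# Compare both for every non-empty length 1..30 and chunk size 1..7
same <- all(vapply(1:30, function(len) {
  all(vapply(1:7, function(n) {
    x <- seq_len(len)
    identical(unname(chunk_sorted(x, n)), unname(chunk_ceiling(x, n)))
  }, logical(1)))
}, logical(1)))
same  # TRUE
```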

Credit to @Sebastian for this function (here it ranks the row names, so it splits the rows of a data frame):

chunk <- function(x,y){ 
     split(x, factor(sort(rank(row.names(x))%%y))) 
     }


2014-12-05 15:24:25 WillJ
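A usage sketch on the built-in mtcars data frame (my choice of example data). Ranking the row names only serves to get distinct integers 1..nrow(x); after the sort, the rows keep their original order and are cut into contiguous blocks:

```r
# The function from the answer
chunk <- function(x, y) split(x, factor(sort(rank(row.names(x)) %% y)))

parts <- chunk(mtcars, 3)   # mtcars ships with R's datasets package
sapply(parts, nrow)         # 10 11 11
```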

If you don't like split() and you don't mind NAs padding out your short tail:

chunk <- function(x, n) { if((length(x)%%n)==0) {return(matrix(x, nrow=n))} else {return(matrix(append(x, rep(NA, n-(length(x)%%n))), nrow=n))} }

The columns of the returned matrix ([, 1:ncol]) are the droids you are looking for.


2014-12-23 17:42:01 verbamour
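A usage sketch of the matrix approach, with the one-liner spread over several lines but otherwise unchanged. Note that here n is the chunk size (the number of rows), not the number of chunks:

```r
chunk <- function(x, n) {
  if ((length(x) %% n) == 0) {
    return(matrix(x, nrow = n))
  } else {
    # pad the tail with NA so the length becomes a multiple of n
    return(matrix(append(x, rep(NA, n - (length(x) %% n))), nrow = n))
  }
}

m <- chunk(1:10, 3)
dim(m)   # 3 4: four chunks of (up to) 3 elements, one per column
m[, 4]   # 10 NA NA
```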

If you don't like split() and you don't like matrix() (with its dangling NAs), there's this:

chunk <- function(x, n) (mapply(function(a, b) (x[a:b]), seq.int(from=1, to=length(x), by=n), pmin(seq.int(from=1, to=length(x), by=n)+(n-1), length(x)), SIMPLIFY=FALSE))

Like split(), it returns a list, but it doesn't waste time or space on labels, so it may be more efficient.


2014-12-23 18:26:24 verbamour
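The same mapply() one-liner, unrolled with named intermediate vectors (starts and ends are my names) to make the index arithmetic visible, plus a check of its output. As with the matrix version, n is the chunk size:

```r
chunk <- function(x, n) {
  starts <- seq.int(from = 1, to = length(x), by = n)  # 1, 1+n, 1+2n, ...
  ends <- pmin(starts + (n - 1), length(x))            # clip the last chunk
  mapply(function(a, b) x[a:b], starts, ends, SIMPLIFY = FALSE)
}

res <- chunk(1:10, 3)
lengths(res)  # 3 3 3 1
```

Because starts is an unnamed numeric vector, mapply() returns an unnamed list, which is the "no labels" property mentioned above.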

Try the ggplot2 function cut_number:

library(ggplot2) 
x <- 1:10 
n <- 3 
cut_number(x, n) # labels = FALSE if you just want an integer result 
#> [1] [1,4] [1,4] [1,4] [1,4] (4,7] (4,7] (4,7] (7,10] (7,10] (7,10] 
#> Levels: [1,4] (4,7] (7,10] 

# if you want it split into a list: 
split(x, cut_number(x, n)) 
#> $`[1,4]` 
#> [1] 1 2 3 4 
#> 
#> $`(4,7]` 
#> [1] 5 6 7 
#> 
#> $`(7,10]` 
#> [1] 8 9 10


2015-01-09 13:41:45

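This doesn't work for splitting the x, y or z defined in [this comment](https://stackoverflow.com/questions/3318333/split-a-vector-into-chunks-in-r#comment84830680_3318333). In particular, it sorts the results, which may or may not be okay, depending on the application. – Kalin 2018-02-21 17:42:04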

Rather, [this comment](https://stackoverflow.com/questions/3318333/split-a-vector-into-chunks-in-r#comment84830878_3318333). – Kalin 2018-02-21 17:48:49

Simplified version:
n = 3 
split(x, sort(x%%n))


2016-04-20 21:03:46 zhan2383

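I like this one since it gives you chunks that are as equally sized as possible (good for dividing up a large task, e.g. to fit into limited RAM or to run a task across multiple threads). – alexvpickering 2016-07-21 22:13:20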

This is useful, but keep in mind that it only works on numeric vectors. – 2016-08-24 17:49:43
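Since x %% n is computed from the values of x rather than their positions, this only matches equal-size chunking when x looks like 1:length(x). A small counterexample of my own, next to the same trick applied to positions via seq_along(x):

```r
x <- c(3, 6, 9, 12)
n <- 3
by_value <- split(x, sort(x %% n))               # every value is divisible by 3
by_position <- split(x, sort(seq_along(x) %% n)) # positions 1..4 instead of values
length(by_value)      # 1: a single chunk holding all four values
lengths(by_position)  # 1 2 1
```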

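I needed a function that takes the name of a data.table (in quotes) as one argument and, as another, an upper limit on the number of rows in each subset of that original data.table. This function produces however many data.tables that upper limit allows: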

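library(data.table)

# x: the name of a data.table, as a string; y: the maximum number of rows per chunk.
# Creates data.tables df_1, df_(1+y), ... in the global environment, each named
# after the row of the original table it starts at.
split_dt <- function(x, y) {
    dt <- get(x)
    for (i in seq(from = 1, to = nrow(dt), by = y)) {
        # rows i .. i+y-1, clipped at the last row (i:(i + y) took y+1 rows,
        # so consecutive chunks overlapped by one row)
        rows <- i:min(i + y - 1, nrow(dt))
        assign(paste0("df_", i), dt[rows], envir = .GlobalEnv)
    }
}

This function gives me a series of data.tables named df_[number], where the number is the starting row in the original data.table. The last data.table can be shorter than the others. This kind of function is useful because some GIS software limits how many address pins you can import, for example. So splitting a data.table into smaller chunks may not be recommended, but it may not be avoidable.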


2017-03-26 21:24:53 rferrisx
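If writing df_* objects into the global environment is not a requirement, a list-returning variation is simpler. The sketch below (split_rows is a hypothetical name, and it uses a plain data.frame so it runs without data.table) caps each chunk at y rows using the ceiling(seq_along(...)/size) idea from earlier on this page:

```r
# Hypothetical helper: split the rows of a data frame into chunks of at most y rows
split_rows <- function(df, y) {
  split(df, ceiling(seq_len(nrow(df)) / y))
}

df <- data.frame(id = 1:7, v = letters[1:7])
parts <- split_rows(df, 3)
sapply(parts, nrow)  # 3 3 1
parts[[3]]$id        # 7
```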

A simple function for splitting a vector by simply using indices; there's no need to overcomplicate this:

vsplit <- function(v, n) { 
    l = length(v) 
    r = l/n 
    return(lapply(1:n, function(i) { 
     s = max(1, round(r*(i-1))+1) 
     e = min(l, round(r*i)) 
     # when n > length(v) some chunks are empty, and s:e would count backwards
     if (s > e) return(v[0]) 
     return(v[s:e]) 
    })) 
}


2018-02-08 14:30:34
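The start and end indices that vsplit computes one chunk at a time can also be computed for all chunks at once. A sketch with a hypothetical helper, chunk_bounds, that uses the same rounding as vsplit; indexing with seq_len() yields an empty chunk instead of a backwards range when a chunk has no elements:

```r
# Hypothetical helper: vsplit's start/end arithmetic, vectorised
chunk_bounds <- function(l, n) {
  ends <- round(l / n * seq_len(n))   # same as round(r*i) in vsplit
  starts <- c(0, head(ends, -1)) + 1  # same as round(r*(i-1)) + 1
  data.frame(start = starts, end = ends)
}

b <- chunk_bounds(10, 3)
b                  # start 1 4 8, end 3 7 10
v <- 1:10
chunks <- Map(function(s, e) v[seq_len(e - s + 1) + s - 1], b$start, b$end)
lengths(chunks)    # 3 4 3
```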
