2017-09-01

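Create new column that is a random subset of other columns

I want to create a new column in which each value is a random subset of the other values in the same row of my data.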

library(tidyverse)  # provides %>%, mutate() and str_c()

# Example data: 
df <- data.frame(matrix(nrow = 57, ncol = 6)) %>% 
    mutate(
    X1 = round(rnorm(n = 57, mean = 0, sd = 1), 1), 
    X2 = round(rnorm(n = 57, mean = 0, sd = 1), 1), 
    X3 = round(rnorm(n = 57, mean = 0, sd = 1), 1), 
    X4 = round(rnorm(n = 57, mean = 0, sd = 1), 1), 
    X5 = round(rnorm(n = 57, mean = 0, sd = 1), 1), 
    X6 = round(rnorm(n = 57, mean = 0, sd = 1), 1) 
) 

# my failed attempt at a new column 
df %>% 
    rowwise() %>% 
    mutate(X7 = str_c(df[, sample(1:6, 3, replace = F)]), sep = ", ") 

Forget `rowwise` and use `sample(1:6, 1, replace = F)`. Only one column, not 3. By the way, why `str_c`? Don't you want to fill `X7` with numbers? This way you will get characters. –


@RuiBarradas I want each value of X7 to be a vector of 3 random values from its own row. – Joe
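Since the goal is for X7 to hold three numeric values per row rather than text, one option not used in the answers below is a list column. A minimal base-R sketch, assuming a compact stand-in for the example data (57 rows, columns X1 to X6, N(0, 1) values rounded to one decimal):

```r
# Compact stand-in for the example data: 57 rows, columns X1..X6, N(0, 1) rounded to 1 decimal
set.seed(123)
df <- data.frame(matrix(round(rnorm(57 * 6), 1), nrow = 57, ncol = 6))

# For each row, sample 3 distinct column positions out of 1:6 and keep the
# values as a numeric vector; the result is a list with one vector per row
df$X7 <- lapply(seq_len(nrow(df)), function(i) {
  unlist(df[i, sample(1:6, 3)], use.names = FALSE)
})

str(df$X7[1:2])
```

Each `df$X7[[i]]` is then a plain numeric vector (so e.g. `mean(df$X7[[1]])` works directly), at the cost of a column that prints less compactly than a comma-separated string.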

Answers


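A solution using the tidyverse. The key is to split the data by row and apply a function that samples values from each single-row subset. `map_df` does this and combines all the outputs into one data frame. `df2` is the final output.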

# Load package 
library(tidyverse) 

# Set seed 
set.seed(123) 

# Create example data frame 
df <- data.frame(matrix(nrow = 57, ncol = 6)) %>% 
    mutate(
    X1 = round(rnorm(n = 57, mean = 0, sd = 1), 1), 
    X2 = round(rnorm(n = 57, mean = 0, sd = 1), 1), 
    X3 = round(rnorm(n = 57, mean = 0, sd = 1), 1), 
    X4 = round(rnorm(n = 57, mean = 0, sd = 1), 1), 
    X5 = round(rnorm(n = 57, mean = 0, sd = 1), 1), 
    X6 = round(rnorm(n = 57, mean = 0, sd = 1), 1) 
) 

# Process the data 
df2 <- df %>% 
    rowid_to_column() %>% 
    split(f = .$rowid) %>% 
    map_df(function(dt){ 
    dt_sub <- dt %>% 
     select(-rowid) %>% 
     select(sample(1:6, 3, replace = FALSE)) %>% 
     unite(X7, everything(), sep = ", ") 
    return(dt_sub) 
    }) %>% 
    bind_cols(df) %>% 
    select(paste0("X", 1:7)) 

df2 
    X1 X2 X3 X4 X5 X6    X7 
1 -0.6 0.6 0.5 0.1 0.9 0.1 0.1, 0.5, 0.9 
2 -0.2 0.1 0.3 0.0 -1.0 0.2 0.1, 0.3, 0.2 
3 1.6 0.2 0.1 2.1 2.0 1.6 1.6, 2.1, 0.1 
4 0.1 0.4 -0.6 -0.7 -0.1 -0.2 0.1, 0.4, -0.6 
5 0.1 -0.5 -0.8 -1.1 0.2 0.2 0.1, 0.2, -0.5 
6 1.7 -0.3 -1.0 0.0 -0.7 1.2 -1, -0.7, -0.3 
7 0.5 -1.0 0.1 0.3 -0.6 1.1 0.5, -0.6, -1 
... 

I think the best approach is to use the base R functions `replicate`, `sample` and `sapply`:

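# One row of 3 sampled column indices per data row (an nrow(df) x 3 matrix)
inx <- t(replicate(nrow(df), sample(1:6, 3, replace = FALSE))) 
# Paste the sampled values of each row into one string
df$X7 <- sapply(seq_len(nrow(df)), function(i) 
      paste(df[i, inx[i, ]], collapse = ", ")) 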

@ycw Done. Error corrected. –
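Building on the same `inx` index matrix, the per-row `df[i, ...]` lookups can also be replaced by a single matrix-indexing step. This is a sketch of an alternative, not the answerer's code; it assumes all six columns are numeric (so `as.matrix()` yields a numeric matrix) and uses a compact stand-in for the example data:

```r
set.seed(123)
df <- data.frame(matrix(round(rnorm(57 * 6), 1), nrow = 57, ncol = 6))

# 57 x 3 matrix: row i holds the 3 column indices sampled for data row i
inx <- t(replicate(nrow(df), sample(1:6, 3)))

# Pair every entry of inx with its row number, then extract all values at once
m <- as.matrix(df)
vals <- matrix(m[cbind(c(row(inx)), c(inx))], nrow = nrow(df))

# Join each row's 3 sampled values into one string
df$X7 <- apply(vals, 1, paste, collapse = ", ")
head(df$X7, 3)
```

Indexing a matrix with a two-column (row, column) matrix returns one element per index row, so all 57 × 3 values are pulled out in one call instead of 57 separate data-frame subsets.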


Here is a dplyr solution:

library(dplyr) 

df %>% 
    group_by(idx = seq(n())) %>% 
    do({ 
    res <- select(., -idx) 
    bind_cols(res, X7 = toString(sample(unlist(res), 
             3, replace = FALSE))) 
    }) %>% 
    ungroup() %>% 
    select(-idx) 

The result is:

# A tibble: 57 x 7 
     X1 X2 X3 X4 X5 X6    X7 
    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <chr> 
1 0.4 0.4 -0.1 3.4 0.9 -0.4 0.4, 0.9, 0.4 
2 1.5 0.9 -0.7 1.5 -1.1 -0.3 -0.7, 1.5, -1.1 
3 -0.1 -0.5 -0.6 -0.8 -0.3 2.3 -0.3, 2.3, -0.8 
4 0.7 -1.0 0.3 0.2 -0.5 -0.3 -1, 0.3, -0.3 
5 0.6 0.9 0.4 1.9 -0.7 -2.0 0.4, -2, 0.9 
6 0.3 0.7 1.3 0.6 1.3 -0.2 0.7, -0.2, 1.3 
7 0.5 0.3 1.1 -0.2 -0.4 -0.8 0.5, 1.1, 0.3 
8 0.4 -1.9 0.8 -0.6 -1.1 0.4 0.4, -1.9, -0.6 
9 0.2 -1.5 -1.9 1.0 0.0 0.6  0, 1, 0.6 
10 -0.2 0.7 -0.5 1.4 0.3 -0.1 -0.2, 0.3, -0.5 

@ycw Good idea, thanks for pointing that out! I have modified my answer accordingly. –
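`do()` is superseded as of dplyr 1.0.0. Assuming dplyr 1.0.0 or later, the same per-row sampling can be sketched with `rowwise()` and `c_across()`, which is close to what the question originally attempted. The difference is that the question's attempt indexed the whole `df` inside `mutate()`, while `c_across()` only sees the current row. The data below is a compact stand-in for the example data:

```r
library(dplyr)

set.seed(123)
df <- data.frame(matrix(round(rnorm(57 * 6), 1), nrow = 57, ncol = 6))

df2 <- df %>%
  rowwise() %>%                                         # each row is its own group
  mutate(X7 = toString(sample(c_across(X1:X6), 3))) %>% # 3 values from this row only
  ungroup()

df2
```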