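How do I create a table or data frame of group members (grouped from long-format data)?

I am working with some cluster analysis results. For each cluster analysis I run, I am trying to create a table of cluster membership. How do I create a table or data frame of group members (grouped from long-format data)?

For example: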

test_data <- data.frame(
     Cluster = sample(1:5, 100, replace = TRUE),
     Item = sample(LETTERS[1:20], 5, replace = FALSE))

head(test_data)
  Cluster Item
1       2    R
2       5    F
3       1    T
4       5    Q
5       3    B
6       3    J

I would like to produce something like this:

Cluster_1 Cluster_2 Cluster_3 Cluster_4 Cluster_5
        T         R         C         P         L
        K         O         J         M         Q
        I         H         B         N         F
        D                             G         E
        S                                       A

I first tried spread, but it did not work with this data:

spread(test_data, Item, Cluster)

Error: Duplicate identifiers for rows

spread(test_data, Cluster, Item)

Error: Duplicate identifiers for rows

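Then I tried:

test_frame <- split.data.frame(test_data, test_data$Cluster)

But this produces a list of data frames, one data frame per group. I have not managed to turn that into what I want.

I tried unnest and unlist, but because each group has a different number of members, these functions throw errors.

Introducing NAs is fine.

Is there a simple way to do this that I am overlooking?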

Source

2017-07-28 JLC

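test_data <- data.frame(
     Cluster = sample(1:5, 100, replace = TRUE),
     Item = sample(LETTERS[1:20], 5, replace = TRUE), stringsAsFactors = FALSE)

# Collect the items of each cluster into a list named Cluster_1, Cluster_2, ...
m <- with(test_data, tapply(Item, paste("Cluster", Cluster, sep = "_"), I))
# Pad every group with NA up to the size of the largest group, then bind as columns
e <- data.frame(sapply(m, `length<-`, max(lengths(m))))
print(e, na.print = "")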
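
The step that makes this work is padding each group to a common length. As a small sketch (the `groups` list below is made-up illustration data, not taken from the question), assigning a larger length to a vector fills the new positions with NA, after which the vectors combine into a rectangular data frame:

```r
# Hypothetical illustration data: three groups of unequal size
groups <- list(Cluster_1 = c("T", "K", "I"),
               Cluster_2 = c("R", "O"),
               Cluster_3 = "C")

# The largest group determines the number of rows
n_max <- max(lengths(groups))

# `length<-`(x, n) extends x to length n, filling new positions with NA
padded <- lapply(groups, `length<-`, n_max)

# Equal-length character vectors now combine into a data frame
wide <- data.frame(padded, stringsAsFactors = FALSE)
print(wide, na.print = "")
```

Because the padding happens before the vectors are combined, the items stay character and no recycling occurs.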

Source

2017-07-28 22:48:49 Onyambu

Concise and does the job. Thank you! – JLC

I reworked my answer. It is all in base R and reasonably concise:

test_data <- data.frame(
    Cluster = sample(1:5, 100, replace = TRUE),
    Item = sample(LETTERS[1:20], 5, replace = TRUE), stringsAsFactors = FALSE)

# Cluster labels, sorted so the columns come out in order
clusters <- sort(unique(test_data$Cluster))

# One character vector of items per cluster
items <- lapply(clusters, function(i) {
    test_data$Item[test_data$Cluster == i] })

# Size of the largest cluster
n_max <- max(lengths(items))

# Pad each vector with NA up to n_max
items <- lapply(items, function(x) {length(x) <- n_max; x})

# Bind the padded vectors as columns and name them
result <- as.data.frame(Reduce(f = cbind, x = items), stringsAsFactors = FALSE)
names(result) <- paste0('Cluster_', clusters)
result

Source

2017-07-28 21:32:02

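Thank you! Unfortunately, I get an error with the code above: "Error in Reduce(, cbind): argument "init" is missing, with no default" – JLC

Now it is losing data: str(test_data3) gives: Named int 1 - attr(*, "names")= chr "Cluster_" – JLC

The latest edit is close, but it coerces every element to an integer. My data are actually strings (item names), so I need them to stay as character. – JLC

Here is a solution using the tidyverse. test_final is the final output.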
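
One likely cause of the integer coercion described in this comment (an assumption, since the earlier revision of the answer is not shown here) is that data.frame() converted strings to factors by default before R 4.0, and cbind() on factors keeps only their underlying integer codes. A minimal sketch with made-up values:

```r
# Factors store integer codes plus a table of labels
f1 <- factor(c("T", "K"))
f2 <- factor(c("R", "O"))

# cbind() drops the labels and keeps only the integer codes
codes <- cbind(f1, f2)
is.integer(codes)

# Converting to character first preserves the item names
chars <- cbind(as.character(f1), as.character(f2))
is.character(chars)
```

Passing stringsAsFactors = FALSE when building the data frame, or calling as.character() on the column, avoids the problem.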

# Load package 
library(tidyverse) 

# Set seed for reproducibility 
set.seed(123) 

# Create example data frame 
test_data <- data.frame(
    Cluster = sample(1:5, 100, replace = TRUE),
    Item = sample(LETTERS[1:20], 5, replace = TRUE))

# Split the data frame into a list of data frames 
test_list <- test_data %>% 
    mutate(Item = as.character(Item)) %>% 
    arrange(Cluster) %>% 
    split(f = .$Cluster) 

# Find the largest number of rows among the data frames
max_row <- max(map_int(test_list, nrow)) 

# Design a function to process each data frame in test_list 
process_fun <- function(dt, max_row){ 

    # Append NA to the Item column 
    dt_vec <- dt$Item 
    dt_vec2 <- c(dt_vec, rep(NA, max_row - nrow(dt))) 
    # Get the cluster number 
    clusterNum <- unique(dt$Cluster) 
    # Create a new data frame 
    dt2 <- tibble(Item = dt_vec2)
    # Change column name 
    colnames(dt2) <- paste("Cluster", clusterNum, sep = "_") 
    return(dt2) 
} 

# Process the data 
test_final <- test_list %>% 
    map(process_fun, max_row = max_row) %>% 
    bind_cols()
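
A more compact variant is possible if tidyr 1.0 or later is available (an assumption; pivot_wider() did not exist when this thread was written). spread() failed in the question because nothing told it which row each item belongs to; adding a within-cluster position supplies that identifier. The `row` helper column and the 20-row example data below are introduced only for illustration:

```r
library(dplyr)
library(tidyr)

set.seed(123)

# Illustration data: 20 distinct items, each assigned to one of 5 clusters
test_data <- data.frame(
  Cluster = sample(1:5, 20, replace = TRUE),
  Item = sample(LETTERS[1:20]),
  stringsAsFactors = FALSE
)

test_wide <- test_data %>%
  arrange(Cluster, Item) %>%
  group_by(Cluster) %>%
  mutate(row = row_number()) %>%   # position of each item within its cluster
  ungroup() %>%
  pivot_wider(names_from = Cluster, values_from = Item,
              names_prefix = "Cluster_") %>%
  select(-row)                     # drop the helper column

print(test_wide)
```

Clusters with fewer items than the largest one are filled with NA automatically, and the Item column stays character throughout.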

Source

2017-07-28 22:04:21 www

This works. Thank you! – JLC

I am glad it works. If you find this answer useful, please accept it by clicking the green check mark at the top left of this post. – www
