Convert comma-separated entries to columns

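I have a dataset with multiple columns, one of which is a reaction-time column. The reaction times in it are separated by commas, one value per trial (all from the same participant).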

For example, row 1 (i.e. the data from participant 1) has the following under the "reaction_times" column:

reaction_times 
2000,1450,1800,2200

So these are participant 1's reaction times for trials 1, 2, 3 and 4.

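I now want to create a new dataset in which the reaction times for these trials each form their own column. That way I can calculate the mean reaction time for each trial.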

               trial 1  trial 2  trial 3  trial 4
participant 1:    2000     1450     1800     2200

I tried "colsplit" from the "reshape2" package, but that did not seem to split my data into new columns (perhaps because my data is all in one cell).

Any suggestions?

Source

2011-12-11 user1092247

I think you are looking for the strsplit() function:

a = "2000,1450,1800,2200" 
strsplit(a, ",") 
[[1]]                                      
[1] "2000" "1450" "1800" "2200"

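Note that strsplit returns a list, in this case with only one element. This is because strsplit takes a vector as input. So you can also pass it a long vector of single-cell character strings and get back a list with one split vector per element. A more relevant example looks like this: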

# Create some example data 
dat = data.frame(reaction_time = 
     apply(matrix(round(runif(100, 1, 2000)), 
        25, 4), 1, paste, collapse = ","), 
        stringsAsFactors=FALSE) 
splitdat = do.call("rbind", strsplit(dat$reaction_time, ",")) 
splitdat = data.frame(apply(splitdat, 2, as.numeric)) 
names(splitdat) = paste("trial", 1:4, sep = "") 
head(splitdat) 
  trial1 trial2 trial3 trial4
1    597   1071   1430    997
2    614    322   1242   1140
3   1522   1679     51   1120
4    225   1988   1938   1068
5    621    623   1174     55
6   1918   1828    136   1816

Finally, to calculate the mean per person:

apply(splitdat, 1, mean) 
[1] 1187.50 361.25 963.75 1017.00 916.25 1409.50 730.00 1310.75 1133.75 
[10] 851.25 914.75 881.25 889.00 1014.75 676.75 850.50 805.00 1460.00 
[19] 901.00 1443.50 507.25 691.50 1090.00 833.25 669.25
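
Since the question asks for the mean per trial rather than per person, here is a minimal sketch of both, using colMeans and rowMeans. It assumes the six rows printed by head(splitdat) above, re-entered by hand as comma-separated strings so the example is reproducible; the names pieces, trial_means and participant_means are only illustrative.

```r
# First six rows of the example output above, re-entered as comma-separated
# strings (an assumption made so the sketch does not depend on runif()).
dat <- data.frame(
  reaction_time = c("597,1071,1430,997", "614,322,1242,1140",
                    "1522,1679,51,1120", "225,1988,1938,1068",
                    "621,623,1174,55", "1918,1828,136,1816"),
  stringsAsFactors = FALSE
)

# strsplit() returns one character vector per participant;
# rbind stacks them into a character matrix with one row per participant.
pieces <- do.call(rbind, strsplit(dat$reaction_time, ","))
splitdat <- as.data.frame(apply(pieces, 2, as.numeric))
names(splitdat) <- paste0("trial", 1:4)

trial_means <- colMeans(splitdat)        # one mean per trial (column)
participant_means <- rowMeans(splitdat)  # one mean per participant (row)
```

colMeans(splitdat) gives the same result as apply(splitdat, 2, mean), and rowMeans(splitdat) the same as apply(splitdat, 1, mean).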

Source

2011-12-11 13:57:00

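Wow, great and fast response Paul, dankjewel! Works like a charm :) If I'm not mistaken, you could also use "colMeans" and "rowMeans" instead of 'apply(splitdat, 1, mean)'? PS: sorry I can't upvote you, apparently I need 15 reputation?! – user1092247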

You are right of course :). But I think using apply is fine too, because it is more flexible. Are you also from the Netherlands? –

Thanks! Yes, I'm from the Netherlands too :) – user1092247

A neat, if somewhat heavy-handed, approach is to use read.csv together with textConnection. Assuming your data is in a data frame, df:

# header = FALSE keeps the first participant's row as data instead of column names
x <- read.csv(textConnection(df[["reaction_times"]]), header = FALSE)
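
As a small self-contained sketch of this idea: read.csv also accepts the strings directly through its text argument, which opens the text connection for you. The example passes header = FALSE so that the first participant's row is not consumed as column names. The first row of df is taken from the question; the second row and the trial1 to trial4 column names are made up for illustration.

```r
# Row 1 is the example from the question; row 2 is hypothetical data.
df <- data.frame(
  reaction_times = c("2000,1450,1800,2200", "1900,1500,1750,2100"),
  stringsAsFactors = FALSE
)

# Each element of the character vector is read as one line of CSV.
x <- read.csv(text = df[["reaction_times"]],
              header = FALSE,
              col.names = paste0("trial", 1:4))

trial_means <- colMeans(x)  # mean reaction time per trial
```

Because read.csv does the type conversion itself, the columns of x come back numeric rather than as character strings.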

Source

2011-12-11 14:53:59

Doesn't look heavy-handed to me at all. Looks neat and to the point. –

Elegant solution! It would be interesting to see how our solutions compare in terms of speed on really large datasets. –

This also works perfectly (can I actually accept both solutions?) – user1092247

Old question, but I came across it via another recent question (which seems unrelated).

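Both existing answers are appropriate, but I wanted to share an answer involving a package I created called "splitstackshape", which is fast and has a simple syntax.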

Here is some sample data:

Here is the split:

library(splitstackshape) 
cSplit(dat, "reaction_time", ",") 
# reaction_time_1 reaction_time_2 reaction_time_3 reaction_time_4 
# 1:    532   1889   1374    761 
# 2:    745   1322    769   1555 
# 3:   1146   1259   1540   1869 
# 4:   1817    125    996    425 
# 5:    404    413   1436   1304 
# 6:   1797    354   1984    252

And optionally, if you need the row means:

rowMeans(cSplit(dat, "reaction_time", ",")) 
# [1] 1139.00 1097.75 1453.50 840.75 889.25 1096.75
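
To check these numbers without installing the package, here is a rough base R sketch. The six strings in dat are re-entered from the printed cSplit output above, so they are an assumption about what the sample data looked like, not the original code that generated it. The split itself uses strsplit as a stand-in for cSplit.

```r
# Sample data rebuilt by hand from the printed cSplit() output (an assumption).
dat <- data.frame(
  reaction_time = c("532,1889,1374,761", "745,1322,769,1555",
                    "1146,1259,1540,1869", "1817,125,996,425",
                    "404,413,1436,1304", "1797,354,1984,252"),
  stringsAsFactors = FALSE
)

# Split on the literal comma, then fill a numeric matrix row by row.
parts <- strsplit(dat$reaction_time, ",", fixed = TRUE)
wide <- as.data.frame(matrix(as.numeric(unlist(parts)),
                             ncol = 4, byrow = TRUE))
names(wide) <- paste0("reaction_time_", seq_len(ncol(wide)))

row_means <- rowMeans(wide)  # one mean per row, as in the rowMeans() call above
```

The matrix(..., byrow = TRUE) step assumes every row has exactly four values; rows with differing numbers of trials would need padding first.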

Source

2014-11-09 05:58:22 A5C1D2H2I1M1N2O1R2T1

Excellent package, thanks for sharing, it makes this even simpler and clearer! – user1092247

Another option, using dplyr and tidyr with Paul Hiemstra's example data:

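library(dplyr)
library(tidyr)
library(stringr)  # provides str_split()

# create example data 
data = data.frame(reaction_time = 
        apply(matrix(round(runif(100, 1, 2000)), 
            25, 4), 1, paste, collapse = ","), 
      stringsAsFactors=FALSE) 
head(data) 

# clean data: split each string, then give every value its own row 
data2 <- data %>% mutate(split_reaction_time = str_split(as.character(reaction_time), ",")) %>% unnest(split_reaction_time) 
# label the four values of each participant; rep() makes the recycling explicit 
data2$col_names <- rep(c("trial1", "trial2", "trial3", "trial4"), times = nrow(data)) 
# convert = TRUE turns the split strings into numbers in the new columns 
data2 <- data2 %>% spread(key = col_names, value = split_reaction_time, convert = TRUE) %>% select(-reaction_time) 
head(data2)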

Source

2017-08-11 18:07:46 sdevine188
