Your thinking is useful, and there are plenty of packages that can do this quickly.
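In the tidyverse packages, the operations you describe are wrapped in functions called gather and spread. There is a really cool cheatsheet produced by RStudio that covers these kinds of data-wrangling tasks.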
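As a minimal illustration of what the two functions do, here is a round trip on a hypothetical toy table. The names `wide`, `sample`, `plate`, and `value` are made up for this sketch and are not taken from your data:

```r
library(tidyr)
library(dplyr)

# Hypothetical toy data: one row per sample, one column per plate
wide <- tibble(
  sample = c("s1", "s2"),
  A = c(1, 2),
  B = c(3, 4)
)

# gather(): the column names A and B go into the key column "plate",
# and the cell contents go into the value column "value"
long <- gather(wide, plate, value, A:B)

# spread(): turn each distinct key back into its own column
back <- spread(long, plate, value)
```

`long` has four rows (one per sample/plate pair), and `back` has the same shape as `wide` again.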
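The catch with your data is that spread expects each combination of identifier and key to be unique, and nothing in your data distinguishes the several species that share a well. The good news is that you can work around this in two ways: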
1. Create a placeholder variable that numbers the species within each well, and spread using the placeholder as the key
library(tidyr)
library(dplyr)
output <-
  input %>%
  # bring all of the data into a long table
  gather(Plate, Well, V1:V5) %>%
  # remove the column with the old column names;
  # this column will cause problems in spread if not removed
  select(-Plate) %>%
  # create the placeholder variable: 1, 2, 3, ... within each well
  group_by(Well) %>%
  mutate(NewColumn = seq(1, n())) %>%
  # drop the grouping so the result is a plain tibble
  ungroup() %>%
  # spread the data out based on the new column header
  spread(NewColumn, Species)
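Depending on how you use the result, and whether you need to at all, you can rename the header columns (which come out as 1, 2, 3, ...) either before or after the spread call.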
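Both renaming options can be sketched on a small hypothetical long table. The `Species_` prefix is an arbitrary choice for illustration, and `rename_with()` assumes dplyr 1.0.0 or later:

```r
library(tidyr)
library(dplyr)

# Hypothetical long table: one row per (Well, Species) pair
long <- tibble(
  Well = c("1-B9", "1-B9", "2-C3"),
  Species = c("Aphidie", "Ericaphis", "Episyrphus balteatus")
)

# Rename before spreading: build the header text into the placeholder itself
before <- long %>%
  group_by(Well) %>%
  mutate(NewColumn = paste0("Species_", row_number())) %>%
  ungroup() %>%
  spread(NewColumn, Species)

# Rename after spreading: spread on the numeric placeholder,
# then prefix every column except Well (needs dplyr >= 1.0.0)
after <- long %>%
  group_by(Well) %>%
  mutate(NewColumn = row_number()) %>%
  ungroup() %>%
  spread(NewColumn, Species) %>%
  rename_with(~ paste0("Species_", .x), .cols = -Well)
```

Both give the columns Well, Species_1, Species_2, with NA where a well has fewer species. One reason to prefer renaming afterwards: spread orders character keys alphabetically, so with ten or more species per well "Species_10" would sort before "Species_2", whereas numeric keys keep their natural order.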
OR:
2. Change the desired output slightly, so that you get one column per species
library(tidyr)
library(dplyr)
output <-
  input %>%
  # bring all of the data into a long table
  gather(Plate, Well, V1:V5) %>%
  # remove the column with the old column names;
  # this column will cause problems in spread if not removed
  select(-Plate) %>%
  # count the number of unique combinations of well and species
  count(Well, Species) %>%
  # spread out the counts;
  # fill = 0 sets the values where no combinations exist to 0
  spread(Species, n, fill = 0)
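This gives you a different output, but I mention it because it makes it easier to see whether there are multiple instances of the same data (for example, the same species twice), and it sets the data up nicely for future analysis.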
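For instance, the counts from the second approach can be filtered before spreading to flag repeated entries. This is a minimal sketch on a hypothetical long table in which one well lists the same species twice:

```r
library(dplyr)

# Hypothetical long table: "Aphidie" is recorded twice in well 1-B9
long <- tibble(
  Well = c("1-B9", "1-B9", "1-B9", "2-C3"),
  Species = c("Aphidie", "Aphidie", "Ericaphis", "Diptera")
)

# count() returns one row per Well/Species combination with its frequency in n;
# keeping only n > 1 lists the species recorded more than once in a well
duplicates <- long %>%
  count(Well, Species) %>%
  filter(n > 1)
```

Here `duplicates` has a single row: well 1-B9, species Aphidie, n = 2.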
Data to reproduce the example:
library(tibble)
input <- tibble(
  Species = c(
    "Eupeodes corollae",
    "Diptera",
    "Episyrphus balteatus",
    "Aphidie",
    "Ericaphis",
    "Hemiptera"
  ),
  V1 = c("1-G3 ", "1-A10", "2-C3", "1-B9", "1-B9", "1-B9"),
  V2 = c("1-F1", "1-B2", "2-A10", "1-D7", "1-D7", "1-D7"),
  V3 = c("1-E11", "1-C1", "1-C11", "2-A3", "2-A3", "2-A3"),
  V4 = c("1-C10", "1-G7", "1-A10", "1-C8", "1-C8", "1-C8"),
  V5 = c("1-A3", "1-E11", "2-B4", "2-C11", "2-C11", "2-C11")
)
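In tidyr 1.0.0 and later, gather and spread are superseded by pivot_longer and pivot_wider. Assuming a recent tidyr, the first approach can be sketched with the newer functions as follows. This uses a trimmed copy of the input above (three species, two plate columns) so the block runs on its own, and the `Species_` prefix is an arbitrary choice:

```r
library(tidyr)
library(dplyr)

# Trimmed copy of the example input: three species, two plate columns
input <- tibble(
  Species = c("Aphidie", "Ericaphis", "Hemiptera"),
  V1 = c("1-B9", "1-B9", "1-B9"),
  V2 = c("1-D7", "1-D7", "1-D7")
)

output <- input %>%
  # pivot_longer() replaces gather(); old column names go into "Plate"
  pivot_longer(V1:V2, names_to = "Plate", values_to = "Well") %>%
  select(-Plate) %>%
  # number the species within each well to build the placeholder key
  group_by(Well) %>%
  mutate(NewColumn = row_number()) %>%
  ungroup() %>%
  # pivot_wider() replaces spread(); names_prefix labels the new columns
  pivot_wider(
    names_from = NewColumn,
    values_from = Species,
    names_prefix = "Species_"
  )
```

The result has one row per well and the columns Well, Species_1, Species_2, Species_3.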
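The cheatsheet is really useful for simpler problems, and your thorough answer was very helpful. I especially appreciated the explanation of the first approach. I have now been able to produce the table I was using before.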