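If you have only one row with NA, just find its row number and split your data into two data frames.
Otherwise, if multiple rows contain NA, you can use the dplyr package: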
library(dplyr)
index_of_na <- which(is.na(data$V2)) # find rows which contain na
number_in_each_block <- index_of_na - lag(index_of_na,1) # find number of rows in each block, including the terminating na
number_in_each_block[[1]] <- index_of_na[[1]] # set the size of first block to the first entry in index_of_na
number_in_each_block[[length(number_in_each_block) + 1]] <- nrow(data) - index_of_na[[length(index_of_na)]] # count the last block if it is not terminated by na
list_of_groups_in_data <- paste0("group_", seq_along(number_in_each_block)) # call the groups group_1, group_2, etc...
group_name <- rep(list_of_groups_in_data, number_in_each_block) # make a vector with the same number of rows as the data
data <- cbind(data, group_name) # now we have named each row with a group name.
#then use dplyr group_by to calculate the mean of each group
data <-
  data %>%
  group_by(group_name) %>%
  mutate(mean_of_groups = mean(V4, na.rm = TRUE))
With the following example data as input:
data <- read.table(text='V1 V2 V3 V4 \n chr1 3686375 3686400 6 \n chr1 3686400 3686425 8 \n NextbedGraphsection NA NA NA \n chr1 3840175 3840200 2 \n chr1 3840200 3840225 3 \n chr1 3840225 3840250 4', header = TRUE, sep = "")
we get:
> print(data)
Source: local data frame [6 x 6]
Groups: group_name
V1 V2 V3 V4 group_name mean_of_groups
1 chr1 3686375 3686400 6 group_1 7
2 chr1 3686400 3686425 8 group_1 7
3 NextbedGraphsection NA NA NA group_1 7
4 chr1 3840175 3840200 2 group_2 3
5 chr1 3840200 3840225 3 group_2 3
6 chr1 3840225 3840250 4 group_2 3
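For the simpler single-NA case mentioned at the start, the split can be sketched roughly as follows. This is a minimal base R illustration, not code from the original answer; the names `bed`, `na_row`, `first_part` and `second_part` are made up for the example, and it assumes exactly one separator row.

```r
# Example data from this answer, stored under a hypothetical name `bed`
bed <- read.table(text = "V1 V2 V3 V4
chr1 3686375 3686400 6
chr1 3686400 3686425 8
NextbedGraphsection NA NA NA
chr1 3840175 3840200 2
chr1 3840200 3840225 3
chr1 3840225 3840250 4", header = TRUE)

na_row <- which(is.na(bed$V2))   # row number of the NA row
stopifnot(length(na_row) == 1)   # this sketch only covers the single-NA case

# Rows before and after the NA row; the NA row itself is dropped
first_part  <- bed[seq_len(na_row - 1), ]
second_part <- bed[na_row + seq_len(nrow(bed) - na_row), ]

mean(first_part$V4)    # mean of V4 in the first block
mean(second_part$V4)   # mean of V4 in the second block
```

Using `seq_len()` rather than `1:(na_row - 1)` keeps the indexing safe if the NA row happens to be the first or last row.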
Can you tell me how to put the column 'mean_of_groups' into a new matrix, with the repeated elements excluded?
Use dplyr::summarise instead of mutate in the last block of code:
data <-
  data %>%
  group_by(group_name) %>%
  summarise(mean_of_groups = mean(V4, na.rm = TRUE))
This gives:
> data
Source: local data frame [2 x 2]
group_name mean_of_groups
1 group_1 7
2 group_2 3
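Since the question asks for a matrix, here is one possible base R sketch that builds the block labels with `cumsum()` and returns the per-block means as a one-column matrix. It is an alternative to the dplyr code above, not the method used in this answer; `bed`, `is_sep`, `block_id` and `means_matrix` are hypothetical names, and it assumes separator rows are exactly the rows where V2 is NA.

```r
# Example data from this answer, stored under a hypothetical name `bed`
bed <- read.table(text = "V1 V2 V3 V4
chr1 3686375 3686400 6
chr1 3686400 3686425 8
NextbedGraphsection NA NA NA
chr1 3840175 3840200 2
chr1 3840200 3840225 3
chr1 3840225 3840250 4", header = TRUE)

is_sep   <- is.na(bed$V2)        # TRUE on separator rows
block_id <- cumsum(is_sep) + 1   # counter that steps up at every separator row
keep     <- !is_sep              # separator rows carry no data, so drop them

# One mean per block, computed over the data rows only
block_means <- tapply(bed$V4[keep], block_id[keep], mean)

# One row per block, no repeated values
means_matrix <- matrix(
  block_means,
  ncol = 1,
  dimnames = list(paste0("group_", names(block_means)), "mean_of_groups")
)
means_matrix
```

Note that the block numbers are only consecutive when no two separator rows are adjacent.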
Please add a 'dput(head(...))' of your data. – Alex 2014-10-09 05:29:23