How do I split a column at the NA values it contains?

I think this should be simple. I have a data.frame with a single "Times" column. It looks like this:

------------------------- 
> head(Times,10) 
    Times 
1  NA 
2 0.448 
3 0.130 
4  NA 
5  NA 
6 0.462 
7 0.427 
8 0.946 
9 0.227 
10 NA 
> 
------------------------

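The idea is that the first NA marks the start of a sequence, so the times that follow it all belong to the same tag. The sequence ends when the next NA entry is reached.

I now want to create a new data.frame that turns the numbers between the NAs into columns, with one row per sequence: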

Time1 Time2 Time3 Time4 
1 0.448 0.130 0.123 
2 0.462 0.427 0.946 0.227 
> 
---------------------------------

Can you help?

Source

2015-09-22 Mitja Farias

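Where does '0.123' come from?

I'm confused by your sample target df. Each row is a sequence, but in your example the first row should only have two values. Is it correct that the new df[1,3] == ""?

Also, working on this makes me think you could simplify your bookkeeping by dropping the second "NA" between runs. A single "NA" is enough to mark both the end of one sequence and the start of the next.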
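The single-NA layout suggested in the last comment can be sketched in base R as follows. This is only an illustration under assumptions: the vector `x` is hypothetical data with one NA per boundary (not the asker's actual file), and the column names `Time1`, `Time2`, ... are chosen to match the desired output in the question.

```r
# Hypothetical input: a single NA marks each boundary (assumption, not the original data)
x <- c(NA, 0.448, 0.130, NA, 0.462, 0.427, 0.946, 0.227, NA)

# Every NA bumps the counter, so each run of values gets its own group id
grp  <- cumsum(is.na(x))
keep <- !is.na(x)

# One numeric vector per run
runs <- split(x[keep], grp[keep])

# Indexing past the end of a vector yields NA, which pads the shorter runs
width <- max(sapply(runs, length))
wide  <- t(sapply(runs, function(r) r[seq_len(width)]))
colnames(wide) <- paste0("Time", seq_len(width))

print(wide)
```

With one NA per boundary the group id is just the running NA count, so no integer division is needed to pair up the markers.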

Times <- read.table(text = "Times 
1  NA 
2 0.448 
3 0.130 
4  NA 
5  NA 
6 0.462 
7 0.427 
8 0.946 
9 0.227 
10 NA", header = TRUE) 

#identify values that belong together 
Times$ind <- cumsum(is.na(Times$Times)) %/% 2 + 1 

Times <- na.omit(Times) #remove NA values 

#identify columns 
Times$col <- unlist(tapply(Times$ind, factor(Times$ind), seq_along)) 

#reshape to wide format 
reshape(Times, timevar = "col", idvar = "ind", direction = "wide") 
#  ind Times.1 Times.2 Times.3 Times.4
#2   1   0.448   0.130      NA      NA
#6   2   0.462   0.427   0.946   0.227

I used base R here just for fun. If you need something more efficient, you should use the data.table package.

Source
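To see why the `cumsum(is.na(...)) %/% 2 + 1` step groups the values correctly, it may help to look at the intermediate vectors. The sketch below uses the same ten values as a plain vector; the names `x`, `na_count`, `ind` and `kept` are introduced only for illustration. It assumes, as in the question, that the data starts with an NA and that two NAs separate consecutive runs.

```r
# Same ten values as in the question, as a plain vector
x <- c(NA, 0.448, 0.130, NA, NA, 0.462, 0.427, 0.946, 0.227, NA)

# Running count of NAs seen so far
na_count <- cumsum(is.na(x))
print(na_count)   # 1 1 1 2 3 3 3 3 3 4

# Integer division by 2 folds each closing/opening NA pair into one step,
# so all values between the same pair of markers share an id
ind <- na_count %/% 2 + 1
print(ind)        # 1 1 1 2 2 2 2 2 2 3

# Group ids of the rows that survive na.omit()
kept <- ind[!is.na(x)]
print(kept)       # 1 1 2 2 2 2
```

If the data used a single NA between runs, the division by 2 would merge neighbouring runs, so the trick depends on the paired markers.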

2015-09-22 17:16:33 Roland

This "solution" works great. Thanks a lot!

Here is a solution using dplyr and tidyr:

library(dplyr) 
library(tidyr) 
Times %>% filter(!(is.na(Times) & is.na(lead(Times)))) %>% 
      mutate(series = cumsum(is.na(Times))) %>% 
      filter(!is.na(Times)) %>% 
      group_by(series) %>% 
      mutate(count = paste0("Times.", row_number())) %>% 
      spread(count, Times) 

Source: local data frame [2 x 5] 

  series Times.1 Times.2 Times.3 Times.4
   (int)   (dbl)   (dbl)   (dbl)   (dbl)
1      1   0.448   0.130      NA      NA
2      2   0.462   0.427   0.946   0.227

Source

2015-09-22 18:24:21 jeremycg

Using data.table v1.9.6 (with the data from @Roland's answer):

require(data.table) # v1.9.6+ 
setDT(Times)[, `:=`(grp = seq_len(.N), rle = rle), by = .(rle = rleid(is.na(Times)))] 
dcast(na.omit(Times, by="Times"), rle ~ grp, value.var="Times") 
#    rle     1     2     3     4
# 1:   2 0.448 0.130    NA    NA
# 2:   4 0.462 0.427 0.946 0.227

You can use paste0("Times", grp) to get the column names shown in your question.

Source

2015-09-22 20:41:43 Arun

So 1.9.6 is finally on CRAN? :) Off to update my packages... – Roland

Warning in install.packages: package 'data.table v1.9.6' is not available (for R version 3.0.0) .... d
