2016-08-30

How can I subset data frame rows in R based on indices computed from matching criteria?

# Create dataframe 

position <- c("START" , "MIDDLE", "END" ,"START" , "MIDDLE", 
      "MIDDLE", "MIDDLE", "MIDDLE" ,"MIDDLE" ,"MIDDLE", 
      "MIDDLE", "MIDDLE", "MIDDLE" ,"END", "START" , 
      "START" , "START" , "MIDDLE", "MIDDLE", "END", 
      "START" , "START", "MIDDLE", "MIDDLE", "MIDDLE", 
      "END" ,"START", "MIDDLE", "MIDDLE", "MIDDLE", 
      "END", "START" , "MIDDLE", "MIDDLE", "MIDDLE", 
      "MIDDLE" ,"MIDDLE" ,"MIDDLE", "MIDDLE" ,"MIDDLE" , 
      "MIDDLE" ,"MIDDLE", "MIDDLE", "MIDDLE", "MIDDLE", 
      "MIDDLE" ,"MIDDLE", "MIDDLE" ,"MIDDLE" ,"MIDDLE" , 
      "MIDDLE", "MIDDLE", "MIDDLE", "END") 

text <-c("First line", "Middle Line", "Last Line", "First line","Middle Line", 
    "Middle Line", "Middle Line", "Middle Line", "Middle Line", "Middle Line", 
    "Middle Line", "Middle Line", "Middle Line", "Last Line", "First line", 
    "First line", "First line", "Middle Line", "Middle Line", "Last Line", 
    "First line", "First line", "Middle Line", "Middle Line", "Middle Line", 
    "Last Line", "First line", "Middle Line", "Middle Line", "Middle Line", 
    "Last Line", "First line", "Middle Line", "Middle Line", "Middle Line", 
    "Middle Line", "Middle Line", "Middle Line", "Middle Line", "Middle Line", 
    "Middle Line", "Middle Line", "Middle Line", "Middle Line", "Middle Line", 
    "Middle Line", "Middle Line", "Middle Line", "Middle Line", "Middle Line", 
    "Middle Line", "Middle Line", "Middle Line", "Last Line") 

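# Combine the two vectors into the data frame used below
a_df <- data.frame(position, text)

The crux is to retrieve specific rows based on their indices and display them, like the following:

> head(a_df, 3)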
position  text 
1 START First line 
2 MIDDLE Middle Line 
3  END Last Line 

Basically, I want to be able to display subsets of the whole data frame, where each subset contains the START, MIDDLE and END rows.

After some reading online, I tried to generate the indices as follows:

# Generate indices 
index_start <- grep("START", a_df$position)
index_end <- grep("END", a_df$position)

This gives the desired output:

> index_start
[1] 1 4 15 16 17 21 22 27 32 
> index_end 
[1] 3 14 20 26 31 54 

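I realize the indices are unbalanced (I am working on eliminating these imbalances), but I would like to know how I can use the output above as seed values in the subset commands below: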

a_df[c(1:3),] 
a_df[c(4:14),] 
a_df[c(17:20),] 
a_df[c(22:26),] 
a_df[c(27:31),] 
a_df[c(32:54),] 

Thanks in advance, Jonathan

Answer


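The code in the OP's post does not make clear how the elements of 'index_start' should be selected, but it appears that for each element of 'index_end' we need the last element of 'index_start' that is smaller than it. To get that last element, we create a grouping variable with findInterval and use tapply with tail to pick the last element of 'index_start' in each group.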

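Then we take the sequence between the corresponding elements of 'index_start1' and 'index_end', and use it with Map to subset the rows of the dataset, which gives a list of data.frames.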

index_start1 <- unname(tapply(index_start, findInterval(index_start, index_end), 
          FUN = tail, 1))  
index_start1 
#[1] 1 4 17 22 27 32 
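
To see why this works, it can help to look at the grouping variable on its own. The following is only a sketch: it hard-codes the two index vectors from the question so that it runs without a_df, and 'grp' is a name introduced here for illustration.

```r
# Index vectors copied from the question's output
index_start <- c(1, 4, 15, 16, 17, 21, 22, 27, 32)
index_end   <- c(3, 14, 20, 26, 31, 54)

# For each start index, count how many end indices are <= it.
# Start indices that share a count all lie before the same END row.
grp <- findInterval(index_start, index_end)
grp
#[1] 0 1 2 2 2 3 3 4 5

# Keep only the last start index within each group
index_start1 <- unname(tapply(index_start, grp, FUN = tail, 1))
index_start1
#[1]  1  4 17 22 27 32
```

Rows 15, 16 and 17 all land in group 2, so only 17 is kept; the same happens with 21 and 22.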

lst <- Map(function(x, y) a_df[x:y,], index_start1, index_end) 
lst 
#[[1]] 
# position  text 
#1 START First line 
#2 MIDDLE Middle Line 
#3  END Last Line 

#[[2]] 
# position  text 
#4  START First line 
#5 MIDDLE Middle Line 
#6 MIDDLE Middle Line 
#7 MIDDLE Middle Line 
#8 MIDDLE Middle Line 
#9 MIDDLE Middle Line 
#10 MIDDLE Middle Line 
#11 MIDDLE Middle Line 
#12 MIDDLE Middle Line 
#13 MIDDLE Middle Line 
#14  END Last Line 

#[[3]] 
# position  text 
#17 START First line 
#18 MIDDLE Middle Line 
#19 MIDDLE Middle Line 
#20  END Last Line 

#[[4]] 
# position  text 
#22 START First line 
#23 MIDDLE Middle Line 
#24 MIDDLE Middle Line 
#25 MIDDLE Middle Line 
#26  END Last Line 

#[[5]] 
# position  text 
#27 START First line 
#28 MIDDLE Middle Line 
#29 MIDDLE Middle Line 
#30 MIDDLE Middle Line 
#31  END Last Line 

#[[6]] 
# position  text 
#32 START First line 
#33 MIDDLE Middle Line 
#34 MIDDLE Middle Line 
#35 MIDDLE Middle Line 
#36 MIDDLE Middle Line 
#37 MIDDLE Middle Line 
#38 MIDDLE Middle Line 
#39 MIDDLE Middle Line 
#40 MIDDLE Middle Line 
#41 MIDDLE Middle Line 
#42 MIDDLE Middle Line 
#43 MIDDLE Middle Line 
#44 MIDDLE Middle Line 
#45 MIDDLE Middle Line 
#46 MIDDLE Middle Line 
#47 MIDDLE Middle Line 
#48 MIDDLE Middle Line 
#49 MIDDLE Middle Line 
#50 MIDDLE Middle Line 
#51 MIDDLE Middle Line 
#52 MIDDLE Middle Line 
#53 MIDDLE Middle Line 
#54  END Last Line 

Note: It is better to keep the data.frames in a list, because most operations can be carried out within the list itself.
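
As a rough illustration of working inside the list, the sketch below uses a small made-up data frame ('toy_df', 'toy_lst', 'combined' and the 'block' column are hypothetical names, not part of the original post) and applies a few operations to every subset at once.

```r
# Toy stand-in for the question's data (hypothetical, much smaller than a_df)
toy_df <- data.frame(
  position = c("START", "MIDDLE", "END", "START", "MIDDLE", "MIDDLE", "END"),
  text     = c("First line", "Middle Line", "Last Line",
               "First line", "Middle Line", "Middle Line", "Last Line")
)

# Same Map pattern as in the answer, with hard-coded start and end indices
toy_lst <- Map(function(x, y) toy_df[x:y, ], c(1, 4), c(3, 7))

# Number of rows in each subset
sapply(toy_lst, nrow)
#[1] 3 4

# Check that every subset begins with START and finishes with END
sapply(toy_lst, function(d) d$position[1] == "START" &&
                            d$position[nrow(d)] == "END")
#[1] TRUE TRUE

# Stack the subsets back together, tagging each row with its subset number
combined <- do.call(rbind, Map(cbind, block = seq_along(toy_lst), toy_lst))
```

The same calls should work unchanged on 'lst' from the answer, since it is also a plain list of data.frames.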


Thank you very much for the prompt response, akrun. The Map function looks like a powerful tool that I will use in the future. Many thanks.