Broken R code: selecting specific lines and cells from a text file and putting them into a data frame

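This is an extension of this question; the approach needs to change to accommodate more Band lines in the text file. What I want is to select the "Basic Stats" lines from a text file similar to the one below and then organize them in a data frame like the one at the bottom of the question. Here's a link to the file if you want to work with it directly.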

Filename: /blah/blah/blah.txt 
ROI: red_2 [Red] 12 points 

Basic Stats  Min   Max  Mean  Stdev 
    Band 1 0.032262 0.124425 0.078073 0.028031 
    Band 2 0.021072 0.064156 0.037923 0.012178 
    Band 3 0.013404 0.066043 0.036316 0.014787 
    Band 4 0.005162 0.055781 0.015526 0.013255 

Histogram   DN  Npts Total Percent  Acc Pct 
Band 1  0.032262   1  1 8.3333  8.3333 
Bin=0.00036 0.032624   0  1 0.0000  8.3333 
      0.032985   0  1 0.0000  8.3333 
      0.033346   0  1 0.0000  8.3333

Here is the code I used:

dat <- readLines('/blah/blah/blah.txt') 
# create an index for the lines that are needed: Basic stats and Bands 
ti <- rep(which(grepl('ROI:', dat)), each = 8) + 1:8 
# create a grouping vector of the same length 
grp <- rep(1:203, each = 8) 

# filter the text with the index 'ti' 
# and split into a list with grouping variable 'grp' 
lst <- split(dat[ti], grp) 
# loop over the list and read the text parts in as data frames 
lst <- lapply(lst, function(x) read.table(text = x, sep = '\t', header = TRUE, blank.lines.skip = TRUE)) 

# bind the dataframes in the list together in one data.frame 
DF <- do.call(rbind, lst) 
# change the name of the first column 
names(DF)[1] <- 'ROI' 

# get the correct ROIs for the ROI column 
DF$ROI <- sub('.*: (\\w+).*$', '\\1', dat[grepl('ROI: ', dat)]) 
DF
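Why the code above breaks can be checked on a small inline copy of the sample file. This sketch assumes, as the printed `"Basic Stats\t  Min\t..."` lines suggest, that fields are tab-separated and that every ROI block has the layout shown in the sample (ROI line, blank line, Basic Stats header, four Band lines, blank line, Histogram header). With `+ 1:8`, each selected window starts at the blank line after `ROI:` and runs into the `Histogram` header line, and `rep(1:203, each = 8)` only matches `ti` when the file contains exactly 203 ROI blocks.

```r
# Small inline copy of the sample file (assumption: tab-separated fields,
# same block layout as in the question)
dat <- c(
  "Filename: /blah/blah/blah.txt",
  "ROI: red_2 [Red] 12 points",
  "",
  "Basic Stats\t  Min\t  Max\t Mean\t Stdev",
  "    Band 1\t0.032262\t0.124425\t0.078073\t0.028031",
  "    Band 2\t0.021072\t0.064156\t0.037923\t0.012178",
  "    Band 3\t0.013404\t0.066043\t0.036316\t0.014787",
  "    Band 4\t0.005162\t0.055781\t0.015526\t0.013255",
  "",
  "Histogram\t  DN\tNpts\tTotal\tPercent\tAcc Pct",
  "Band 1\t0.032262\t1\t1\t8.3333\t8.3333"
)

roi_lines <- which(grepl("ROI:", dat))

# Offsets used in the question: the 8 lines right after each ROI line
ti_question <- rep(roi_lines, each = 8) + 1:8
print(dat[ti_question])  # the last selected line is the Histogram header

# Offsets covering exactly one Basic Stats block:
# ROI line, blank line, header, four Band lines (7 lines in total)
ti_fixed <- rep(roi_lines, each = 7) + 0:6
print(dat[ti_fixed])

# Derive the number of blocks from the data instead of hard-coding 203
n_roi <- length(roi_lines)
grp <- rep(seq_len(n_roi), each = 7)
```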

The output looks like this:

$ROI 
[1] "red_2" "red_3" "red_4" "red_5" "red_6" "red_7" "red_8" "red_9" "red_10" "bcs_1" "bcs_2" 
[12] "bcs_3" "bcs_4" "bcs_5" "bcs_6" "bcs_7" "bcs_8" "bcs_9" "bcs_10" "red_11" "red_12" "red_12" 
[23] "red_13" "red_14" "red_15" "red_16" "red_17" "red_18" "red_19" "red_20" "red_21" "red_22" "red_23" 
[34] "red_24" "red_25" "red_24" "red_25" "red_26" "red_27" "red_28" "red_29" "red_30" "red_31" "red_33" 

$<NA> 
[1] "Basic Stats\t  Min\t  Max\t Mean\t Stdev" 

$<NA> 
[1] "Basic Stats\t  Min\t  Max\t Mean\t Stdev" 
etc...

But it should look like this:

ROI  Band   Min  Max   Mean Stdev 
red_2 Band 1 0.032262 0.124425 0.078073 0.028031 
red_2 Band 2 0.021072 0.064156 0.037923 0.012178 
red_2 Band 3 0.013404 0.066043 0.036316 0.014787 
red_2 Band 4 0.005162 0.055781 0.015526 0.013255 
red_3 Band 1 values... 
red_4 Band 2 
red_4 Band 3 
red_4 Band 4

I would appreciate some help.


2017-03-05 JAG2024

I'd suggest tidying the data first with bash (Unix tools) and loading it into R afterwards (e.g. 'bands.txt ...'). – liborm

Please provide dput(dat) or a subset of it so that we can copy and paste it. – Djork
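A minimal way to answer a `dput()` request is to read only the first few lines of the file and `dput()` them, which prints R code others can paste. In this sketch the temporary file is a hypothetical stand-in for the real `/blah/blah/blah.txt` so that it runs anywhere, and `n = 8` is an arbitrary choice that covers the first ROI's Basic Stats block in the sample layout.

```r
# Hypothetical stand-in for the real data file, so the example is runnable
path <- tempfile(fileext = ".txt")
writeLines(c(
  "Filename: /blah/blah/blah.txt",
  "ROI: red_2 [Red] 12 points",
  "",
  "Basic Stats\t  Min\t  Max\t Mean\t Stdev",
  "    Band 1\t0.032262\t0.124425\t0.078073\t0.028031",
  "    Band 2\t0.021072\t0.064156\t0.037923\t0.012178",
  "    Band 3\t0.013404\t0.066043\t0.036316\t0.014787",
  "    Band 4\t0.005162\t0.055781\t0.015526\t0.013255"
), path)

# Read only the first 8 lines and print them as copy-pasteable R code
dat_subset <- readLines(path, n = 8)
dput(dat_subset)
```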

For this file you have to modify the approach I proposed here. For the linked text file (test2.txt), I propose the following approach:

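dat <- readLines('test2.txt') 

# number of ROI blocks in the file
len <- sum(grepl('ROI:', dat)) 
# each ROI line plus the 6 lines after it (ROI line, blank line, header, 4 Bands)
ti <- rep(which(grepl('ROI:', dat)), each = 7) + 0:6 
# one group id per block of 7 lines
grp <- rep(1:len, each = 7) 

lst <- split(dat[ti], grp) 
# skip = 1 drops the ROI line; the blank line is skipped, so "Basic Stats" becomes the header
lst <- lapply(lst, function(x) read.table(text = x, sep = '\t', skip = 1, header = TRUE, blank.lines.skip = TRUE)) 

# name each table after its ROI (e.g. "red_1")
names(lst) <- sub('.*: (\\w+).*$', '\\1', dat[grepl('ROI: ', dat)]) 

library(data.table) 
# stack the tables, storing the list names in a new 'ROI' column
DT <- rbindlist(lst, idcol = 'ROI') 
# the former "Basic Stats" column holds the band labels
setnames(DT, 2, 'Band')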
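The same approach can be tried without `data.table` and without the linked `test2.txt`. This is a base-R variant under two assumptions: fields are tab-separated and every ROI block has the seven-line layout shown in the question. It swaps `rbindlist(lst, idcol = 'ROI')` for `do.call(rbind, ...)` plus an explicit ROI column, and it adds `strip.white = TRUE` (not in the answer) to trim the indentation in front of the Band labels. The `red_1` values are taken from the answer's output and the `red_2` values from the question; the point count of `red_1` is not shown anywhere, so its ROI line omits it.

```r
# Inline sample with two ROI blocks (assumption: tab-separated fields).
# red_1 values come from the answer's output, red_2 values from the question.
dat <- c(
  "ROI: red_1 [Red]",  # point count not shown in the source, omitted
  "",
  "Basic Stats\t  Min\t  Max\t Mean\t Stdev",
  "    Band 1\t0.013282\t0.133982\t0.061581\t0.034069",
  "    Band 2\t0.009866\t0.112935\t0.042688\t0.026618",
  "    Band 3\t0.008304\t0.037059\t0.018434\t0.007515",
  "    Band 4\t0.004726\t0.040089\t0.018490\t0.009605",
  "",
  "Histogram\t  DN\tNpts\tTotal\tPercent\tAcc Pct",
  "ROI: red_2 [Red] 12 points",
  "",
  "Basic Stats\t  Min\t  Max\t Mean\t Stdev",
  "    Band 1\t0.032262\t0.124425\t0.078073\t0.028031",
  "    Band 2\t0.021072\t0.064156\t0.037923\t0.012178",
  "    Band 3\t0.013404\t0.066043\t0.036316\t0.014787",
  "    Band 4\t0.005162\t0.055781\t0.015526\t0.013255",
  "",
  "Histogram\t  DN\tNpts\tTotal\tPercent\tAcc Pct"
)

roi_lines <- which(grepl("ROI:", dat))
n_roi <- length(roi_lines)

# 7 lines per block: ROI line, blank line, header, four Band lines
ti  <- rep(roi_lines, each = 7) + 0:6
grp <- rep(seq_len(n_roi), each = 7)

blocks <- split(dat[ti], grp)
tables <- lapply(blocks, function(x)
  read.table(text = x, sep = "\t", skip = 1, header = TRUE,
             blank.lines.skip = TRUE, strip.white = TRUE))

# Extract ROI names such as "red_1" from the "ROI: ..." lines
roi_names <- sub(".*: (\\w+).*$", "\\1", dat[roi_lines])

# Base-R replacement for rbindlist(lst, idcol = "ROI")
DF <- do.call(rbind, Map(function(tab, roi) cbind(ROI = roi, tab),
                         tables, roi_names))
names(DF)[2] <- "Band"
rownames(DF) <- NULL
print(DF)
```

If `data.table` is available, the answer's `rbindlist()` call does the stacking and the ROI column in one step; the base-R version only makes each step explicit.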

This gives the desired result:

> DT 
     ROI  Band  Min  Max  Mean Stdev 
    1: red_1  Band 1 0.013282 0.133982 0.061581 0.034069 
    2: red_1  Band 2 0.009866 0.112935 0.042688 0.026618 
    3: red_1  Band 3 0.008304 0.037059 0.018434 0.007515 
    4: red_1  Band 4 0.004726 0.040089 0.018490 0.009605 
    5: red_2  Band 1 0.032262 0.124425 0.078073 0.028031 
    ---              
1220: bcs_49  Band 4 0.002578 0.010578 0.006191 0.002285 
1221: bcs_50  Band 1 0.032775 0.072881 0.051152 0.012593 
1222: bcs_50  Band 2 0.020029 0.085993 0.042864 0.018628 
1223: bcs_50  Band 3 0.012770 0.034367 0.023056 0.006581 
1224: bcs_50  Band 4 0.005804 0.024798 0.014049 0.005744


2017-03-06 15:48:51 Jaap

Amazing! Very elegant. Could you explain what the '+ 0:6' in 'ti' means? Thanks! @Jaap – JAG2024

@JAG2024 What that line does is create a vector of the line numbers that need to be read. Check, for example, the result of '5 + 0:2'. – Jaap
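The vector arithmetic behind `ti` can be checked in isolation. `rep(x, each = 7)` repeats every ROI line number seven times, and adding `0:6`, which R recycles along the longer vector, turns each repetition into offsets 0 to 6. The starting line numbers below are hypothetical, and the sketch uses three offsets instead of seven to keep the output short.

```r
# A scalar plus a vector: 5 is added to each of 0, 1, 2
print(5 + 0:2)  # values: 5 6 7

# Two hypothetical ROI line numbers, each repeated 3 times, plus 0:2;
# the shorter vector 0:2 is recycled along the longer one
starts <- c(2, 20)
ti <- rep(starts, each = 3) + 0:2
print(ti)  # values: 2 3 4 20 21 22
```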
