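Broken R code for selecting specific rows and cells from a text file and putting them into a data frame

This is an extension of this question; the code from that question needs to be changed to accommodate more Band rows in the text file. What I want is to select the "Basic Stats" rows from a text file like the one below and then organize them in a data frame like the one at the bottom of this question. Here's a link to the file if you want to work with it directly.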
Filename: /blah/blah/blah.txt
ROI: red_2 [Red] 12 points
Basic Stats Min Max Mean Stdev
Band 1 0.032262 0.124425 0.078073 0.028031
Band 2 0.021072 0.064156 0.037923 0.012178
Band 3 0.013404 0.066043 0.036316 0.014787
Band 4 0.005162 0.055781 0.015526 0.013255
Histogram DN Npts Total Percent Acc Pct
Band 1 0.032262 1 1 8.3333 8.3333
Bin=0.00036 0.032624 0 1 0.0000 8.3333
0.032985 0 1 0.0000 8.3333
0.033346 0 1 0.0000 8.3333
This is the code I am using:
dat <- readLines('/blah/blah/blah.txt')
# create an index for the lines that are needed: Basic stats and Bands
ti <- rep(which(grepl('ROI:', dat)), each = 8) + 1:8
# create a grouping vector of the same length
grp <- rep(1:203, each = 8)
# filter the text with the index 'ti'
# and split into a list with grouping variable 'grp'
lst <- split(dat[ti], grp)
# loop over the list and read the text parts in as dataframes
lst <- lapply(lst, function(x) read.table(text = x, sep = '\t', header = TRUE, blank.lines.skip = TRUE))
# bind the dataframes in the list together in one data.frame
DF <- do.call(rbind, lst)
# change the name of the first column
names(DF)[1] <- 'ROI'
# get the correct ROI's for the ROI-column
DF$ROI <- sub('.*: (\\w+).*$', '\\1', dat[grepl('ROI: ', dat)])
DF
The output looks like this:
$ROI
[1] "red_2" "red_3" "red_4" "red_5" "red_6" "red_7" "red_8" "red_9" "red_10" "bcs_1" "bcs_2"
[12] "bcs_3" "bcs_4" "bcs_5" "bcs_6" "bcs_7" "bcs_8" "bcs_9" "bcs_10" "red_11" "red_12" "red_12"
[23] "red_13" "red_14" "red_15" "red_16" "red_17" "red_18" "red_19" "red_20" "red_21" "red_22" "red_23"
[34] "red_24" "red_25" "red_24" "red_25" "red_26" "red_27" "red_28" "red_29" "red_30" "red_31" "red_33"
$<NA>
[1] "Basic Stats\t Min\t Max\t Mean\t Stdev"
$<NA>
[1] "Basic Stats\t Min\t Max\t Mean\t Stdev"
etc...
When it should look like this:
ROI Band Min Max Mean Stdev
red_2 Band 1 0.032262 0.124425 0.078073 0.028031
red_2 Band 2 0.021072 0.064156 0.037923 0.012178
red_2 Band 3 0.013404 0.066043 0.036316 0.014787
red_2 Band 4 0.005162 0.055781 0.015526 0.013255
red_3 Band 1 values...
red_4 Band 2
red_4 Band 3
red_4 Band 4
I would appreciate some help.
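One way to avoid the hard-coded block length (8) and block count (203) is to let the section markers decide which lines to keep. The following is only a minimal sketch, not a tested fix for the real file. It assumes that the fields are tab-separated (as the printed output suggests) and that every ROI block contains a "Basic Stats" line followed later by a "Histogram" line. The sample `dat` vector is a stand-in for the file, and the red_3 values in it are made up for illustration.

```r
# Stand-in for readLines('/blah/blah/blah.txt'); tab-separated fields are an
# assumption, and the red_3 block contains made-up values for illustration.
dat <- c(
  "Filename: /blah/blah/blah.txt",
  "ROI: red_2 [Red] 12 points",
  "Basic Stats\tMin\tMax\tMean\tStdev",
  "Band 1\t0.032262\t0.124425\t0.078073\t0.028031",
  "Band 2\t0.021072\t0.064156\t0.037923\t0.012178",
  "Histogram\tDN\tNpts\tTotal\tPercent\tAcc Pct",
  "Band 1\t0.032262\t1\t1\t8.3333\t8.3333",
  "ROI: red_3 [Red] 9 points",
  "Basic Stats\tMin\tMax\tMean\tStdev",
  "Band 1\t0.040000\t0.110000\t0.070000\t0.020000",
  "Band 2\t0.030000\t0.090000\t0.050000\t0.015000",
  "Band 3\t0.020000\t0.080000\t0.040000\t0.010000",
  "Histogram\tDN\tNpts\tTotal\tPercent\tAcc Pct",
  "Band 1\t0.040000\t1\t1\t11.1111\t11.1111"
)

# Number of the ROI block each line belongs to (0 before the first ROI line)
is_roi      <- grepl("^ROI:", dat)
roi_of_line <- cumsum(is_roi)

# TRUE from each "Basic Stats" line up to (not including) the next "Histogram" line
in_stats <- cumsum(grepl("^[[:space:]]*Basic Stats", dat)) >
            cumsum(grepl("^[[:space:]]*Histogram", dat))

# Keep only the Band rows that sit inside a Basic Stats section
band_rows <- in_stats & grepl("^[[:space:]]*Band", dat)

# Parse the kept rows in one go; the column names are supplied by hand
vals <- read.table(text = dat[band_rows], sep = "\t", header = FALSE,
                   col.names = c("Band", "Min", "Max", "Mean", "Stdev"),
                   strip.white = TRUE, stringsAsFactors = FALSE)

# ROI names, extracted with the same pattern as in the question
roi_names <- sub('.*: (\\w+).*$', '\\1', dat[is_roi])

# Attach the ROI name of the enclosing block to every Band row
DF <- data.frame(ROI = roi_names[roi_of_line[band_rows]], vals,
                 stringsAsFactors = FALSE)
DF
```

With the real file, `dat` would come from `readLines()` as in the question. Because the Band rows are located by pattern rather than by position, the number of bands per ROI and the number of ROIs do not need to be known in advance.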
I would suggest tidying the data with bash (Unix tools) first and loading it into R afterwards (e.g. '< blah/blah awk '/ *Band /' > bands.txt ...'). – liborm
Please provide dput(dat), or a subset of it, so that we can copy and paste. – Djork
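For reference, the request for `dput()` output could be met along these lines. This is a small sketch under the assumption that the first 20 lines are enough to show the layout; the `dat` vector here is a shortened stand-in for the result of `readLines()`.

```r
# Shortened stand-in for the lines read from the file (tab-separated fields
# are an assumption based on the printed output in the question)
dat <- c("ROI: red_2 [Red] 12 points",
         "Basic Stats\tMin\tMax\tMean\tStdev",
         "Band 1\t0.032262\t0.124425\t0.078073\t0.028031")

# Print an R expression for the first lines that can be pasted into a post
dput(head(dat, 20))

# Round trip: rebuilding the object from the printed text gives the same vector
pasted   <- capture.output(dput(head(dat, 20)))
dat_copy <- eval(parse(text = pasted))
```

Pasting the printed expression lets others recreate the exact lines, including the tab characters that are otherwise lost when the file contents are copied into a post.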