2015-06-22 235 views
1

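Skip some rows in fread

I want to skip the lines that come before the header row when reading my data. How can I scan for the line that starts with ID_REF or, if ID_REF is not present, for the first line matching the ILMN_ pattern, and drop every line before it, so that the first line kept does not contain #?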

# GEOarchive matrix file.    
ID_REF 1688628068_A.AVG_Signal 1688628068_A.Avg_NBEADS 1688628068_A.BEAD_STDERR 1688628068_A.Detection Pval 
ILMN_1343291 62821.84   135        413.9399      0 
ILMN_1343292 3255.167   131        47.76587      0 
ILMN_1343293 42924.91   152        539.3026      0 
ILMN_1343294 55255.21   100        746.1457      0 
+1

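It looks like you have more column names than columns. Is '1688628068_A.Detection Pval' a single column? If the file has '#' lines that need to be skipped, read.table('yourfile.txt', header = TRUE, fill = TRUE) should read it. – akrun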

+0

@akrun Yes, it is a single column – Hashim

+0

One option is to change the column name in the file to '1688628068_A.Detection_Pval' and read it without using fill = TRUE – akrun

Answers

3

On Linux, you can use awk together with fread, or pipe its output into read.table. Here, I use awk to drop the lines before the header.

pth <- '/home/akrun/file.txt' #change it to your path 
# Print from the first line starting with ID_REF or ILMN; $1=$1 rebuilds each line with OFS="," 
v1 <- sprintf("awk '/^(ID_REF|ILMN)/{ matched = 1} matched {$1=$1; print}' OFS=\",\" %s", pth) 

Using fread:

library(data.table) 
fread(v1) 
#   ID_REF 1688628068_A.AVG_Signal 1688628068_A.Avg_NBEADS 
#1: ILMN_1343291    62821.840      135 
#2: ILMN_1343292    3255.167      131 
#3: ILMN_1343293    42924.910      152 
#4: ILMN_1343294    55255.210      100 
# 1688628068_A.BEAD_STDERR 1688628068_A.Detection_Pval 
#1:    413.93990       0 
#2:     47.76587       0 
#3:    539.30260       0 
#4:    746.14570       0 

Or, using read.table:

read.table(pipe(v1), header=TRUE, sep=',', check.names=FALSE) 
#  ID_REF 1688628068_A.AVG_Signal 1688628068_A.Avg_NBEADS 
#1 ILMN_1343291    62821.840      135 
#2 ILMN_1343292    3255.167      131 
#3 ILMN_1343293    42924.910      152 
#4 ILMN_1343294    55255.210      100 
# 1688628068_A.BEAD_STDERR 1688628068_A.Detection_Pval 
#1    413.93990       0 
#2     47.76587       0 
#3    539.30260       0 
#4    746.14570       0 

Note: the awk command changes the delimiter to a comma, and I changed the column name in the file from 1688628068_A.Detection Pval to 1688628068_A.Detection_Pval.
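The rename could also be done in R instead of editing the file by hand. The following is only a sketch and not part of the original answer: the sample lines are shortened from the question, the variable names (raw, fixed, df) are made up for illustration, and it uses base R only.

```r
# Illustrative lines mirroring the question's file (shortened; not the real data)
raw <- c(
  "# GEOarchive matrix file.",
  "ID_REF 1688628068_A.AVG_Signal 1688628068_A.Detection Pval",
  "ILMN_1343291 62821.84   0",
  "ILMN_1343292 3255.167   0"
)

# Join the two-word column name so the header has as many fields as the data rows
fixed <- sub("Detection Pval", "Detection_Pval", raw, fixed = TRUE)

# read.table skips the leading '#' line by default (comment.char = "#")
df <- read.table(text = fixed, header = TRUE, check.names = FALSE)
names(df)
```

With a real file, raw would come from readLines(pth) instead of the inline vector.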

For some reason, the extra spaces cause problems for fread. This is not an issue with read.table, so the following also works with read.table:

v2 <- sprintf("awk '/^(ID_REF|ILMN)/{ matched = 1} matched { print}' %s", pth) 

read.table(pipe(v2), header=TRUE, check.names=FALSE) 
#  ID_REF 1688628068_A.AVG_Signal 1688628068_A.Avg_NBEADS 
#1 ILMN_1343291    62821.840      135 
#2 ILMN_1343292    3255.167      131 
#3 ILMN_1343293    42924.910      152 
#4 ILMN_1343294    55255.210      100 
# 1688628068_A.BEAD_STDERR 1688628068_A.Detection_Pval 
#1    413.93990       0 
#2     47.76587       0 
#3    539.30260       0 
#4    746.14570       0 
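Where awk is not available (for example on Windows), the same skipping logic could be approximated in base R. This is a minimal sketch under stated assumptions, not the method used above: read_from_marker is a hypothetical helper name, the demo file is invented, and the column name is assumed to be already joined with an underscore.

```r
# Hypothetical helper (not from the original answer): keep everything from the
# first line that starts with ID_REF or ILMN_ and parse it with read.table.
read_from_marker <- function(path, pattern = "^(ID_REF|ILMN_)") {
  lines <- readLines(path)
  start <- grep(pattern, lines)[1]   # index of the first matching line, NA if none
  if (is.na(start)) stop("no line matches ", pattern)
  # If the first match is an ILMN_ row, there is no header row to read
  has_header <- startsWith(lines[start], "ID_REF")
  read.table(text = lines[start:length(lines)],
             header = has_header, check.names = FALSE)
}

# Small demo file standing in for the real one
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "# GEOarchive matrix file.",
  "some other preamble line",
  "ID_REF 1688628068_A.AVG_Signal 1688628068_A.Detection_Pval",
  "ILMN_1343291 62821.84   0",
  "ILMN_1343292 3255.167   0"
), tmp)

df <- read_from_marker(tmp)
str(df)
```

Because the whole file is read into memory with readLines, this suits files of moderate size; for very large files the awk pipe above avoids holding the raw lines in R.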
+1

Thanks, it works fine – Hashim