Concatenate with conditions

-5

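I want to concatenate some rows of a file into one column, but which rows are joined has to depend on the content, and that varies throughout the file.

A simplified version of my data file: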

>xy|number|Name 
ABCABCABC 
ABCABCABC 
ABCABCABC 
ABC 
>xy|number2|Name2 
ABCABCABC 
ABCABC 
>xy|number3|Name3 
ABCABCABC 
ABCABCABC 
ABCABCABC 
ABCAB

I would like it to end up like this (a space means a different column):

xy number Name ABCABCABCABCABCABCABCABCABCABC 
xy number2 Name2 ABCABCABCABCABC 
xy number3 Name3 ABCABCABCABCABCABCABCABCABCABCAB

Source

2013-01-02 user1941884

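I'm sure this can be done in R, but it is almost certainly the wrong language for the task (and what do you intend to do with these structures in R?). R might make sense if you want to do the post-processing in R and the file is not huge; otherwise consider an imperative language such as Perl or C. –

@MatthewLUndberg, I don't see why R is the wrong language to do this. – nograpes

@nograpes Just a guess. –

Here is a solution similar to @MatthewLundberg's, but using cumsum to split the vector.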

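# Read the file as whitespace-separated character tokens (one per line here)
file <- scan('~/Desktop/data.txt', what = 'character')
h <- grepl('^>', file)                            # TRUE for header lines
# Drop the leading '>' and append '|' so the sequence becomes a 4th field
file[h] <- gsub('^>', '', paste0(file[h], '|'))
l <- split(file, cumsum(h))                       # one group per record
do.call(rbind, strsplit(sapply(l, paste, collapse = ''), '[|]'))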

#   [,1] [,2]      [,3]    [,4]
# 1 "xy" "number"  "Name"  "ABCABCABCABCABCABCABCABCABCABC"
# 2 "xy" "number2" "Name2" "ABCABCABCABCABC"
# 3 "xy" "number3" "Name3" "ABCABCABCABCABCABCABCABCABCABCAB"
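For anyone who wants to try the cumsum idea without a file on disk, here is a minimal self-contained sketch. It assumes the sample data from the question is handed to scan() through its text argument instead of a path; the names txt, tokens, is_header, groups, and res are made up for illustration and are not part of the answer above.

```r
# Sample data from the question, inlined so the sketch runs without a file
txt <- ">xy|number|Name
ABCABCABC
ABCABCABC
ABCABCABC
ABC
>xy|number2|Name2
ABCABCABC
ABCABC
>xy|number3|Name3
ABCABCABC
ABCABCABC
ABCABCABC
ABCAB"

# scan() splits on whitespace, so each line becomes one character token
tokens <- scan(text = txt, what = character(), quiet = TRUE)

is_header <- grepl("^>", tokens)

# cumsum() over the logical vector goes up by 1 at every header line,
# so each line gets the running number of the record it belongs to
groups <- cumsum(is_header)
print(groups)

# Strip '>' and append '|' so the sequence ends up as a fourth field
tokens[is_header] <- paste0(sub("^>", "", tokens[is_header]), "|")

# Collapse each record into one string, then split that string on '|'
joined <- sapply(split(tokens, groups), paste, collapse = "")
res <- do.call(rbind, strsplit(joined, "[|]"))
print(res)
```

The grouping vector is the whole trick: because the header count only increases at a header, every sequence line inherits the number of the header above it, and split() needs nothing else.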

Source

2013-01-02 04:41:08 nograpes

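+1. Very nice. –

Make sure that 'file' is not a factor here. –

I had just written something similar, but less compact in the second and fourth lines... no point posting it now... Read the file with 'scan' and 'what = character()', and this would be a complete answer on its own. – John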

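# 'file' is the path to the data file; keep the lines as character, not factor
dat <- read.table(file, header=FALSE, stringsAsFactors=FALSE)

h <- grep('^>', dat$V1)                                # row index of each header line
m <- matrix(c(h, c(h[-1]-1, length(dat$V1))), ncol=2)  # first and last row of each record
gsub('[|]', ' ',
     sub('>', '',
         apply(m, 1, function(x)
           # header, a space, then the record's sequence lines joined together
           paste(dat$V1[x[1]], paste(dat$V1[(x[1]+1):x[2]], collapse=''))
         )
     )
)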
## [1] "xy number Name ABCABCABCABCABCABCABCABCABCABC"  
## [2] "xy number2 Name2 ABCABCABCABCABC"     
## [3] "xy number3 Name3 ABCABCABCABCABCABCABCABCABCABCAB"
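The index matrix m does most of the work in this answer: each row holds the first and last line number of one record. Below is a small sketch of how it comes out for the sample data. It assumes the lines are held in a plain character vector rather than in dat$V1; the names lines_vec and seqs, and the column labels start and end, are hypothetical and only there for readability.

```r
# The sample data from the question, one element per line
lines_vec <- c(">xy|number|Name", "ABCABCABC", "ABCABCABC", "ABCABCABC", "ABC",
               ">xy|number2|Name2", "ABCABCABC", "ABCABC",
               ">xy|number3|Name3", "ABCABCABC", "ABCABCABC", "ABCABCABC", "ABCAB")

h <- grep("^>", lines_vec)   # positions of the header lines

# A record ends one line before the next header;
# the last record ends at the last line of the input
m <- matrix(c(h, c(h[-1] - 1, length(lines_vec))), ncol = 2,
            dimnames = list(NULL, c("start", "end")))
print(m)

# Join the lines that follow each header into one sequence per record
seqs <- apply(m, 1, function(x) paste(lines_vec[(x[1] + 1):x[2]], collapse = ""))
print(seqs)
```

One thing to keep in mind with this indexing: a header that has no sequence lines under it would make (x[1]+1):x[2] count backwards and pick up the wrong lines, so the sketch assumes every record has at least one sequence line.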

Source

2013-01-02 03:54:05

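Somehow I lost the numbers and names here, but you gave me a lot of great information, thank you! – user1941884

Oops, my bad. It works perfectly, thank you! – user1941884

Something for you to consider in case you want a data.frame as the result: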

raw <- ">xy|number|Name
ABCABCABC
ABCABCABC
ABCABCABC
ABC
>xy|number2|Name2
ABCABCABC
ABCABC
>xy|number3|Name3
ABCABCABC
ABCABCABC
ABCABCABC
ABCAB"

s <- readLines(textConnection(raw))  # s is vector of strings 

first.line <- which(substr(s,1,1) == ">") # find first line of set 
N <- length(first.line) 
first.line <- c(first.line, length(s)+1) # add first line past end 

# Preallocate data.frame (good idea if large) 
d <- data.frame(X1=rep("",N), X2=rep("",N), X3=rep("",N), X4=rep("",N), 
       stringsAsFactors=FALSE) 

for (i in seq_len(N))   # seq_len() is also safe when N is 0
{ 
    w <- unlist(strsplit(s[first.line[i]],">|\\|")) # Parse 1st line 
    d$X1[i] <- w[2] 
    d$X2[i] <- w[3] 
    d$X3[i] <- w[4] 
    d$X4[i] <- paste(s[ (first.line[i]+1) : (first.line[i+1]-1) ], collapse="") 
} 


d
  X1      X2    X3                               X4
1 xy  number  Name   ABCABCABCABCABCABCABCABCABCABC
2 xy number2 Name2                  ABCABCABCABCABC
3 xy number3 Name3 ABCABCABCABCABCABCABCABCABCABCAB

I wish R left-justified strings by default when it displays them in a data.frame.

Source
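On the alignment point: as far as I know, the print method for data frames takes a right argument, so left-justified output can be requested per call without changing any defaults. A minimal sketch, with a hand-built data frame standing in for d (the values are copied from the output above):

```r
# Stand-in for the data frame built by the loop above
d <- data.frame(X1 = c("xy", "xy", "xy"),
                X2 = c("number", "number2", "number3"),
                X3 = c("Name", "Name2", "Name3"),
                X4 = c("ABCABCABCABCABCABCABCABCABCABC",
                       "ABCABCABCABCABC",
                       "ABCABCABCABCABCABCABCABCABCABCAB"),
                stringsAsFactors = FALSE)

print(d)                 # default: strings are right-justified
print(d, right = FALSE)  # strings are left-justified instead
```

This only changes how the data frame is displayed; the stored strings are the same in both calls.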

2013-01-02 07:12:55
