Looping a subset over several files in a subdirectory and writing the files to a new directory with a suffix

I have figured out part of the code, which I describe below, but I am finding it hard to loop over the list of files:

library(Hmisc)  # provides %nin%, the negation of %in%
filter_173 <- c("kp|917416", "kp|835898", "kp|829747", "kp|767311")
# This is a vector of values that I want to exclude from the files
setwd("full_path_of_directory_with_desired_files")
filepath <- "//full_path_of_directory_with_desired_files"
list.files(filepath)
predict_files <- list.files(filepath, pattern = "predict.txt")
# all files that I want to filter have _predict.txt in them
predict_full <- file.path(filepath, predict_files)
# generates full pathnames of all desired files I want to filter
sample_names <- sapply(strsplit(predict_files, "_"), `[`, 1)
# sample name = the part of each file name before the first "_"
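To make the sample-name step concrete, here is a minimal sketch of what the `strsplit()` / `sapply()` line produces. The second file name is made up for illustration; only the first one appears in the question. The `sub()` variant is an optional alternative, not something the question uses.

```r
# File names: the first is from the question, the second is hypothetical
predict_files <- c("a550673-4308980_A05_RepliG_rep2_predict.txt",
                   "b123456-0000001_B07_RepliG_rep1_predict.txt")

# Split each name on "_" and keep the first piece
sample_names <- sapply(strsplit(predict_files, "_"), `[`, 1)
print(sample_names)

# Alternative (assumption): keep everything before "_predict.txt"
# so that well and replicate information is preserved in the output name
full_prefix <- sub("_predict\\.txt$", "", predict_files)
print(full_prefix)
```

Note that `pattern` in `list.files()` is a regular expression, so `"predict.txt"` also matches names such as `predictXtxt`; `"_predict\\.txt$"` is a stricter choice if that matters.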

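Now, here is an example of the simple filtering I want to do, using one specific example file; this works great. How do I repeat it in a loop over the file names in predict_full?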

test_predict <- read.table("a550673-4308980_A05_RepliG_rep2_predict.txt", header = TRUE, sep = "\t")
# this is a file in my current working directory that I set with setwd above
test_predict_filt <- test_predict[test_predict$target_id %nin% filter_173, ]
# the trailing comma makes the condition select rows rather than columns
write.table(test_predict_filt, file = "test_predict")
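Subsetting the rows of a data frame needs a comma after the row condition. The sketch below shows this on a toy data frame; the `est_counts` column and the two extra IDs are invented for illustration, and base R's `!(x %in% y)` is used as a stand-in for `%nin%` from Hmisc so the example has no package dependency.

```r
# Toy data frame standing in for one predict file (columns are assumptions)
toy <- data.frame(target_id  = c("kp|917416", "kp|000001", "kp|835898", "kp|000002"),
                  est_counts = c(10, 20, 30, 40),
                  stringsAsFactors = FALSE)
filter_173 <- c("kp|917416", "kp|835898", "kp|829747", "kp|767311")

# TRUE for rows to keep; base-R equivalent of target_id %nin% filter_173
keep <- !(toy$target_id %in% filter_173)

# With the comma, the logical vector indexes rows
toy_filt <- toy[keep, ]
print(toy_filt)

# Without the comma, the same vector is interpreted as a column index;
# tryCatch() reports whether that attempt raised an error
no_comma <- tryCatch(toy[keep], error = function(e) "error")
print(no_comma)
```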

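Finally, to repeat this for all the file names: how do I place the filtered files in a folder, keeping the same names as the originals plus a "filt" suffix?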

predict_filt <- file.path(filepath, "filtered")
# Place filtered files in filtered/ subdirectory
filtPreds <- file.path(predict_filt, paste0(sample_names, "_filt_predict.txt"))
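One detail worth spelling out: `file.path()` only builds a string and does not create anything on disk, so the `filtered/` folder has to exist before files are written into it. A minimal sketch, assuming a temporary directory in place of the real path and two hypothetical sample names:

```r
# Temporary directory standing in for the real data directory (assumption)
filepath <- file.path(tempdir(), "predict_demo")
dir.create(filepath, showWarnings = FALSE)

# Build the path of the output folder, then actually create it
predict_filt <- file.path(filepath, "filtered")
dir.create(predict_filt, showWarnings = FALSE, recursive = TRUE)

# Hypothetical sample names; in the question these come from the file names
sample_names <- c("a550673-4308980", "b123456-0000001")

# One output path per sample; paste0() and file.path() are both vectorised
filtPreds <- file.path(predict_filt, paste0(sample_names, "_filt_predict.txt"))
print(basename(filtPreds))
```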

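I always get stuck on loops! It is hard to share a 100% reproducible example because everyone's working directory and file paths are unique, although all the code I have shared works if you adapt it to an appropriate path name on your computer.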

2017-05-13 Manasi Shah

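This should loop through each file and write it out to the new location, using the file naming convention you need. Just be sure to change the directory paths first.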

# Vector of target_id values to exclude from the files
filter_173 <- c("kp|917416", "kp|835898", "kp|829747", "kp|767311")

filepath <- "//full_path_of_directory_with_desired_files"
filteredpath <- "//full_path_of_directory_with_filtered_results/"

# Get vector of predict.txt files
predict_files <- list.files(filepath, pattern = "predict.txt")

# Get vector of full paths for predict.txt files
predict_full <- file.path(filepath, predict_files)

# Get vector of sample names
sample_names <- sapply(strsplit(predict_files, "_"), `[`, 1)

# Loop over the indices of the predict.txt files (seq_along is safe when there are none)
for (i in seq_along(predict_full))
{
    # Load the current file into a data frame
    df.predict <- read.table(predict_full[i], header = TRUE, sep = "\t")

    # Filter out the unwanted rows; the trailing comma selects rows, not columns
    df.predict <- df.predict[!(df.predict$target_id %in% filter_173), ]

    # Write the filtered data frame to the new directory
    write.table(df.predict, file = file.path(filteredpath, paste0(sample_names[i], "_filt_predict.txt")))
}
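The same idea can be wrapped in a small function and applied to paired input and output paths, which avoids index bookkeeping. This is only a sketch under stated assumptions: `filter_predict` is a hypothetical helper name, the demo files and the `est_counts` column are invented, and a temporary directory replaces the real paths so the example can run anywhere.

```r
filter_173 <- c("kp|917416", "kp|835898", "kp|829747", "kp|767311")

# Hypothetical helper: read one file, drop unwanted rows, write the result
filter_predict <- function(infile, outfile, exclude) {
  df <- read.table(infile, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
  df <- df[!(df$target_id %in% exclude), , drop = FALSE]
  write.table(df, file = outfile, sep = "\t", row.names = FALSE, quote = FALSE)
  nrow(df)  # number of rows kept
}

# Build two small demo inputs in a temporary directory (assumption)
filepath <- file.path(tempdir(), "loop_demo")
filteredpath <- file.path(filepath, "filtered")
dir.create(filteredpath, recursive = TRUE, showWarnings = FALSE)
for (s in c("s1", "s2")) {
  demo <- data.frame(target_id = c("kp|917416", "kp|000001", "kp|000002"),
                     est_counts = 1:3)
  write.table(demo, file.path(filepath, paste0(s, "_rep1_predict.txt")),
              sep = "\t", row.names = FALSE, quote = FALSE)
}

# Same path construction as in the answer, with a stricter pattern
predict_files <- list.files(filepath, pattern = "_predict\\.txt$")
predict_full  <- file.path(filepath, predict_files)
sample_names  <- sapply(strsplit(predict_files, "_"), `[`, 1)
out_files     <- file.path(filteredpath, paste0(sample_names, "_filt_predict.txt"))

# Apply the helper to each (input, output) pair
kept <- mapply(filter_predict, predict_full, out_files,
               MoreArgs = list(exclude = filter_173))
print(unname(kept))
```

Whether a `for` loop or `mapply()` reads better is a matter of taste; both process the files one at a time.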

2017-05-13 02:27:16

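Hi Matt, thanks for your answer. The loop works well for what I intended to do. For any future readers, just a few caveats: I want to filter out the rows whose 'target_id' matches the 'filter_173' vector, so the ',' at the end of the subset is important. Also, in your 'write.table' call, 'sep' should come after the 'file.path' function is closed with ')', and note that 'write.table' defaults to 'quote = TRUE' and 'row.names = TRUE'. That would break your downstream scripts if they expect plain tab-delimited files (which I am guessing they would).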

So, effectively, the working loop is below:

for (i in 1:length(predict_full))
{
    # Load the current file into a data frame
    df.predict <- read.table(predict_full[i], header = TRUE, sep = "\t")
    # Filter out the unwanted rows
    df.predict <- df.predict[!(df.predict$target_id %in% filter_173), ]
    # Write the filtered data frame to the new directory
    write.table(df.predict, file = file.path(filteredpath, paste0(sample_names[i], "_filt_predict.txt")), sep = "\t", row.names = FALSE, quote = FALSE)
}
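To see why the `write.table()` defaults matter for tab-delimited output, the following sketch captures what would be written in each case. The two-row data frame is invented for illustration.

```r
# Small made-up table standing in for a filtered predict file
df <- data.frame(target_id = c("kp|000001", "kp|000002"),
                 est_counts = c(20, 40))

# Defaults: space-separated, quoted strings, row names included
default_lines <- capture.output(write.table(df))

# Explicit settings: tab-separated, no quotes, no row names
tab_lines <- capture.output(
  write.table(df, sep = "\t", row.names = FALSE, quote = FALSE)
)

print(default_lines)
print(tab_lines)
```

With the defaults, the header has one field fewer than the data rows because of the row-name column, which is a common source of trouble when the file is read back by other tools.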
