2014-01-09

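Using a loop to subset a vector by date range

I have files with a date stamp in their names, and I'm trying to import only the files within a certain date range.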

First, I load the names of all the available files into R as a vector:

files <- c("FileName_2013_06_10_00_00_00.txt", "FileName_2013_06_11_00_00_00.txt", 
"FileName_2013_06_12_00_00_00.txt", "FileName_2013_06_13_00_00_00.txt", 
"FileName_2013_06_14_00_00_00.txt", "FileName_2013_06_15_00_00_00.txt", 
"FileName_2013_06_16_00_00_00.txt", "FileName_2013_06_17_00_00_00.txt", 
"FileName_2013_06_18_00_00_00.txt", "FileName_2013_06_19_00_00_00.txt", 
"FileName_2013_06_20_00_00_00.txt", "FileName_2013_06_21_00_00_00.txt", 
"FileName_2013_06_22_00_00_00.txt", "FileName_2013_06_23_00_00_00.txt", 
"FileName_2013_06_24_00_00_00.txt", "FileName_2013_06_25_00_00_00.txt", 
"FileName_2013_06_26_00_00_00.txt", "FileName_2013_06_27_00_00_00.txt", 
"FileName_2013_06_28_00_00_00.txt", "FileName_2013_06_29_00_00_00.txt", 
"FileName_2013_06_30_00_00_00.txt", "FileName_2013_07_01_00_00_00.txt", 
"FileName_2013_07_02_00_00_00.txt", "FileName_2013_07_03_00_00_00.txt", 
"FileName_2013_07_04_00_00_00.txt", "FileName_2013_07_05_00_00_00.txt", 
"FileName_2013_07_06_00_00_00.txt", "FileName_2013_07_07_00_00_00.txt", 
"FileName_2013_07_08_00_00_00.txt", "FileName_2013_07_09_00_00_00.txt", 
"FileName_2013_07_10_00_00_00.txt", "FileName_2013_07_11_00_00_00.txt", 
"FileName_2013_07_12_00_00_00.txt", "FileName_2013_07_13_00_00_00.txt", 
"FileName_2013_07_14_00_00_00.txt", "FileName_2013_07_15_00_00_00.txt") 

Each file name follows the pattern FileName_yyyy_mm_dd_HH_MM_SS.txt.
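Because every name carries a full timestamp, one option is to parse it into a real date-time object instead of treating it as text. The sketch below assumes every name starts with the literal prefix `FileName_`; `stamp_time()` is a hypothetical helper, and `tz = "UTC"` is an assumption since the names carry no time zone.

```r
# Hypothetical helper: turn the yyyy_mm_dd_HH_MM_SS part of each name into a
# POSIXct value. Anything after the seconds ("comb", ".txt") is ignored;
# names that do not fit the pattern come back as NA.
stamp_time <- function(x) {
  stamp <- sub("^FileName_([0-9]{4}(_[0-9]{2}){5}).*$", "\\1", x)
  as.POSIXct(stamp, format = "%Y_%m_%d_%H_%M_%S", tz = "UTC")  # tz assumed
}

stamp_time(c("FileName_2013_06_27_12_21_13.txt",
             "FileName_2013_06_28_00_00_00comb.txt"))
```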

Of these, I only want to import the following days (Year, Month, and Day are the only criteria I care about):

datesub <- c("FileName_2013_06_25_00_00_00.txt", "FileName_2013_06_26_00_00_00.txt", 
"FileName_2013_06_27_00_00_00.txt", "FileName_2013_06_28_00_00_00.txt", 
"FileName_2013_06_29_00_00_00.txt", "FileName_2013_06_30_00_00_00.txt", 
"FileName_2013_07_01_00_00_00.txt", "FileName_2013_07_02_00_00_00.txt", 
"FileName_2013_07_03_00_00_00.txt", "FileName_2013_07_04_00_00_00.txt", 
"FileName_2013_07_05_00_00_00.txt", "FileName_2013_07_06_00_00_00.txt", 
"FileName_2013_07_07_00_00_00.txt") 

It's easy enough to take a subset (files[files %in% datesub]); however, complications arise because the files sometimes have formats like these:

  • FileName_2013_06_27_12_21_13.txt
  • FileName_2013_06_28_00_00_00comb.txt
  • or any combination of the previous examples.
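An exact `%in%` match fails on those variants because it compares whole names. A minimal sketch of a different approach (not the one in the answer below): extract only the yyyy_mm_dd part, convert it to a `Date`, and compare against the range endpoints. The sample names are made up to mix the formats listed above, and the `FileName_` prefix is assumed.

```r
# Made-up sample mixing the plain, time-of-day and "comb" formats
example_files <- c("FileName_2013_06_24_00_00_00.txt",
                   "FileName_2013_06_25_00_00_00.txt",
                   "FileName_2013_06_27_12_21_13.txt",
                   "FileName_2013_06_28_00_00_00comb.txt",
                   "FileName_2013_07_07_00_00_00.txt",
                   "FileName_2013_07_08_00_00_00.txt")

# Keep only the yyyy_mm_dd part and convert it to a Date (NA if a name does not fit)
day <- as.Date(sub("^FileName_([0-9]{4}_[0-9]{2}_[0-9]{2}).*$", "\\1", example_files),
               format = "%Y_%m_%d")

from <- as.Date("2013-06-25")
to <- as.Date("2013-07-07")
# Date comparisons work across month boundaries, so no per-month regex is needed
selected <- example_files[!is.na(day) & day >= from & day <= to]
selected  # the 06_25, 06_27, 06_28 (comb) and 07_07 files
```

Whatever follows the day (time of day, `comb`) no longer matters, so no combination of formats needs its own pattern.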

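I tried subsetting the file names with regular expressions before importing the data into R, but as soon as the range spanned two months, it started getting messy.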

How can I subset the data? I thought a for loop might work, but I'm not sure.

I'm open to any and all suggestions. If my question isn't clear enough, let me know and I'll do my best to clarify.

Answer


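Use a regular expression to extract just the Y_m_d piece from datesub, then use regular expressions again to find the files whose names contain a matching Y_m_d piece: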

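# Extract the yyyy_mm_dd part captured by the parentheses in the pattern
datesubclean <- sapply(
    regmatches(datesub, regexec("^FileName_([0-9]{4}_[0-9]{2}_[0-9]{2})", datesub)),
    `[`, 2L  # element 1 is the full match, element 2 the captured date
)
# For each date, keep every file whose name contains it; unlist() flattens the
# result so days with zero or several matching files are handled too
files.sub <- unlist(lapply(datesubclean, grep, x = files, value = TRUE),
                    use.names = FALSE)
files.sub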
# [1] "FileName_2013_06_25_00_00_00.txt" "FileName_2013_06_26_00_00_00.txt" 
# [3] "FileName_2013_06_27_00_00_00.txt" "FileName_2013_06_28_00_00_00.txt" 
# [5] "FileName_2013_06_29_00_00_00.txt" "FileName_2013_06_30_00_00_00.txt" 
# [7] "FileName_2013_07_01_00_00_00.txt" "FileName_2013_07_02_00_00_00.txt" 
# [9] "FileName_2013_07_03_00_00_00.txt" "FileName_2013_07_04_00_00_00.txt" 
# [11] "FileName_2013_07_05_00_00_00.txt" "FileName_2013_07_06_00_00_00.txt" 
# [13] "FileName_2013_07_07_00_00_00.txt" 

Then all you need to do is loop over the file names and open them.
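A sketch of that last step, assuming the selected names sit in one directory. So that it runs anywhere, the demo writes two throwaway files to a temporary directory; `readLines()` stands in for whatever reader the real files need (`read.table()`, `read.csv()`, ...), which the question does not specify, and `selected_files` plays the role of `files.sub`.

```r
# Throwaway demo files in a temporary directory
data_dir <- tempfile("stamped_")
dir.create(data_dir)
selected_files <- c("FileName_2013_06_25_00_00_00.txt",
                    "FileName_2013_06_26_00_00_00.txt")
for (f in selected_files) writeLines(paste("contents of", f), file.path(data_dir, f))

# Explicit for loop, as the question suggests: one list element per file
contents <- vector("list", length(selected_files))
names(contents) <- selected_files
for (f in selected_files) {
  contents[[f]] <- readLines(file.path(data_dir, f))
}

# Same result without the preallocation bookkeeping
contents2 <- setNames(lapply(file.path(data_dir, selected_files), readLines),
                      selected_files)
```

Both forms give a named list keyed by file name; `lapply()` is the more common idiom, but the loop works just as well.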

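regexec is a regular expression function that lets us retrieve captured matches (the parts of the regex inside parentheses), and regmatches can read the special object that regexec produces. The first sapply just takes the second element of each regmatches result, because regmatches returns the full match as the first element, followed by the subpattern captures.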
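To make that structure concrete, here is what `regmatches()` returns for two names from the question plus one (`notes.txt`, made up) that does not fit the pattern:

```r
x <- c("FileName_2013_06_27_12_21_13.txt",
       "FileName_2013_06_28_00_00_00comb.txt",
       "notes.txt")
m <- regmatches(x, regexec("^FileName_([0-9]{4}_[0-9]{2}_[0-9]{2})", x))

m[[1]]   # full match first, then the captured group
# [1] "FileName_2013_06_27" "2013_06_27"
m[[3]]   # no match: an empty character vector
# character(0)
sapply(m, `[`, 2L)   # element 2 of each result, i.e. the captures
# [1] "2013_06_27" "2013_06_28" NA
```

A name that does not match yields `NA` rather than being silently dropped, which is worth checking for before using the result.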


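Thanks! I still don't fully understand what you did, so I have some reading to do tonight. Edit: I can use seq.Date to generate `datesubclean`, so now at least I understand what's happening :) – amzu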
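A minimal sketch of the `seq.Date` idea from this comment: generate one yyyy_mm_dd string per day, then match names against them. The three sample file names are made up, and combining the dates into one alternation pattern for `grep()` is one choice among several.

```r
# Every day from 25 June to 7 July 2013 as a "yyyy_mm_dd" string;
# seq() dispatches to seq.Date() for Date inputs
from <- as.Date("2013-06-25")
to <- as.Date("2013-07-07")
datesubclean <- format(seq(from, to, by = "day"), "%Y_%m_%d")

# Made-up sample names; grep() keeps any name containing one of the dates
example_files <- c("FileName_2013_06_24_00_00_00.txt",
                   "FileName_2013_06_30_00_00_00comb.txt",
                   "FileName_2013_07_01_08_15_00.txt")
matched <- grep(paste(datesubclean, collapse = "|"), example_files, value = TRUE)
```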


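Look at the output of running `regmatches(datesub, regexec("^FileName_([0-9]{4}_[0-9]{2}_[0-9]{2})", datesub))`. Trying to understand the output of `regexec` on its own is more challenging (and usually unnecessary). – BrodieG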