2011-04-21 60 views
0

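Parsing a file with bash to find the first unique value

I have a CSV file that I am trying to parse in bash. The first field of each line is a timestamp in the format yyyy-mm-dd hh:mm:ss. Six lines are generated every 10 minutes; I have included a small sample below.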

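What I want is to get the first 6 lines of each day. The first entry of each day can occur any time between 00:00:xx and 00:10:xx, so a grep for "00:0" does not work.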

2010-04-23 00:04:39,0.0,0,4666724,3217665,28866,28866,0.92,65,
2010-04-23 00:04:39,0.1,0,4666724,3217663,20832,20832,0.62,65,
2010-04-23 00:04:42,0.2,0,4666724,3217662,14702,14702,0.46,65,
2010-04-23 00:04:43,0.3,0,4666724,3217664,27739,27739,0.92,65,
2010-04-23 00:04:39,0.4,0,4666724,3217664,25105,25105,0.77,65,
2010-04-23 00:04:43,0.5,0,4666724,3217664,24546,24546,0.77,65,
2010-04-23 00:14:40,0.0,0,4666724,3217665,29226,29226,0.92,65,
2010-04-23 00:14:43,0.1,0,4666724,3217663,21552,21552,0.62,65,
2010-04-23 00:14:42,90,90,62,62,63,66,65,
2010-04-23 00:14:43,0.3,0,4666724,3217664,28459,28459,0.92,65,
2010-04-23 00:14:43,0.4,0,4666724,3217664,25825,25825,0.77,65,
2010-04-23 00:14:35,906,4666724,3217664,25266,25266,0.77,65,
2010-04-23 00:24:43,0.0,0,4666724,3217665,29586,29586,0.92,65,
2010-04-23 00:24:43,0.1,0,4666724,3217663,22272,22272,0.77,65,

2010-04-24 00:05:02,0.0,0,4666724,3217701,71388,71388,2.31,65,
2010-04-24 00:05:02,0.1,0,4666724,3217701,70264,70264,2.31,65,
2010-04-24 00:05:02,0.2,0,4666724,3217700,61254,61254,2.00,65,
2010-04-24 00:05:02,0.3,0,4666724,3217701,71011,71011,2.31,65,
2010-04-24 00:05:02,0.4,0,4666724,3217701,68111,68111,2.15,65,
2010-04-24 00:05:02,0.5,0,4666724,3217702,69904,69904,2.31,65,

Any ideas or comments? Bob

Answers

1

It can be as simple as using grep with 2 patterns:

grep -e " 00:0" -e " 00:10" myFIle.csv 

The first pattern matches times between 00:00 and 00:09, and the second pattern matches 00:10.
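One thing worth checking with this approach is how many lines the two patterns select per day. The sketch below is only an illustration: the sample data is made up (a hypothetical day with two-line batches at 00:00, 00:10 and 00:20), and `matches_per_day` is a name introduced here.

```shell
# Hypothetical sample: one day with batches at 00:00, 00:10 and 00:20 (two lines each).
sample='2010-04-25 00:00:12,0.0,0,
2010-04-25 00:00:12,0.1,0,
2010-04-25 00:10:12,0.0,0,
2010-04-25 00:10:12,0.1,0,
2010-04-25 00:20:12,0.0,0,
2010-04-25 00:20:12,0.1,0,'

# Select lines with the two patterns, then count the matches per date.
matches_per_day=$(printf '%s\n' "$sample" |
    grep -e " 00:0" -e " 00:10" |
    cut -d' ' -f1 | sort | uniq -c)

printf '%s\n' "$matches_per_day"
```

With this sample, both the 00:00 batch and the 00:10 batch are selected, so the count for the day is 4 rather than the 2 lines of the first batch alone.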

+0

This is fine, but on days when there is an entry at 00:00, it will also pick up the entries at 00:10. Thanks for reminding me about -e – Jay 2011-04-21 19:41:10

1

This should be easy with Perl:

perl -ane '$l = 0 if $F[0] ne $d; print if $l++ < 6; $d = $F[0]' file 
+0

Nice solution... I really need to learn some Perl one day. – Jay 2011-04-21 19:42:49

1

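The following uses read with a custom IFS (= input field separator) to split each input line into the date-time field and the rest, then uses bash's substring operator to extract the date from the ISO date-time, and then simply prints the first N lines for each date. In place of the echo, you may want to do whatever processing you need on the result, since read + echo does not preserve the input exactly.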

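function first_n_of_each_day() {
    local N="$1"
    local lastDateTime=""
    local I=0
    # Split each line at the first comma: DATETIME gets the timestamp, OTHER the rest.
    # -r stops read from interpreting backslashes.
    while IFS=',' read -r DATETIME OTHER ; do
        local DATE="${DATETIME:0:10}"   # yyyy-mm-dd part of the timestamp
        if [ "$DATE" != "$lastDateTime" ] ; then
            # a new day starts: reset the counter
            I=0
            lastDateTime="$DATE"
        fi
        if [ "$I" -lt "$N" ] ; then
            let ++I
            # line matches:
            echo "$DATETIME,$OTHER"
        fi
    done
}
first_n_of_each_day 6 < file.csv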
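If the lines need to come out exactly as they went in, one option is to read each line whole and print it unchanged. The following is a minimal sketch of that idea, not part of the original answer: the function name `first_n_of_each_day_verbatim` is hypothetical, and it swaps the bash substring operator for the POSIX `${line%% *}` expansion, which assumes the date and time are separated by a single space as in the sample.

```shell
# Hypothetical variant: print the first N lines of each day unchanged.
# The body is wrapped in ( ... ) so it runs in a subshell and its
# variables do not leak into the caller.
first_n_of_each_day_verbatim() (
    n="$1"
    last_day=""
    count=0
    # IFS= and -r keep whitespace and backslashes intact; the extra
    # [ -n "$line" ] test handles a final line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        day="${line%% *}"      # text before the first space: yyyy-mm-dd
        if [ "$day" != "$last_day" ]; then
            count=0            # new day: restart the counter
            last_day="$day"
        fi
        if [ "$count" -lt "$n" ]; then
            count=$((count + 1))
            printf '%s\n' "$line"
        fi
    done
)

# Usage with made-up sample lines (two days, three lines each), keeping 2 per day.
printf '%s\n' \
    '2010-04-23 00:04:39,0.0,a' \
    '2010-04-23 00:04:39,0.1,b' \
    '2010-04-23 00:14:43,0.0,c' \
    '2010-04-24 00:05:02,0.0,d' \
    '2010-04-24 00:05:02,0.1,e' \
    '2010-04-24 00:15:02,0.0,f' |
    first_n_of_each_day_verbatim 2
```

Like the function above, this relies on the file being sorted by timestamp, since it only compares each line's date with the previous one.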
+0

That's it! My solution started out like this, but my brain turned to tapioca along the way. Thanks! – Jay 2011-04-21 19:42:23

2

An awk version of eugene y's answer:

awk ' 
    $1 != date {count = 0; date = $1} 
    ++count <= 6 {print} 
' filename 
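The program works because awk splits on whitespace by default, so $1 is the date part of the timestamp; the first rule resets the counter whenever the date changes and the second prints while the counter is at most 6. As a sketch of how the limit could be made configurable, the wrapper below passes it in with `-v`. The function name `first_n_per_day_awk` and the variable `n` are hypothetical, and the sample lines are made up.

```shell
# Hypothetical wrapper: print the first n lines of each day from stdin (n defaults to 6).
first_n_per_day_awk() {
    awk -v n="${1:-6}" '
        $1 != date { count = 0; date = $1 }   # new date in field 1: restart the counter
        ++count <= n                          # a pattern with no action prints the line
    '
}

# Usage with made-up sample lines, keeping 2 per day.
printf '%s\n' \
    '2010-04-23 00:04:39,0.0,a' \
    '2010-04-23 00:04:39,0.1,b' \
    '2010-04-23 00:14:43,0.0,c' \
    '2010-04-24 00:05:02,0.0,d' |
    first_n_per_day_awk 2
```

For a real file the call would look like `first_n_per_day_awk 6 < filename`.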
+0

+1 This is a really simple and clean solution to the problem. – anubhava 2011-04-22 03:24:37