2012-02-13 69 views
2

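How to convert HHMMSS to HH:MM:SS in Unix?

I am trying to convert HHMMSS to HH:MM:SS. I can convert it successfully, but because of the file size my script takes 2 hours to finish. Is there a better (faster) way to accomplish this?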

Data File 
data.txt 

10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,,, 
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,,071600, 
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,072200,072200, 
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TAB,072600,072600, 
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,073200,073200, 
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,073500,073500, 
10,SRI,AA,20091210,8503,ABCXYZ,D,N,MRO,073700,073700, 
10,SRI,AA,20091210,8503,ABCXYZ,D,N,CPT,073900,073900, 
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,074400,, 
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,,, 
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,,090200, 
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,090900,090900, 
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,091500,091500, 
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TAB,091900,091900, 
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,092500,092500, 
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,092900,092900, 
10,SRI,AA,20091210,8505,ABCXYZ,D,N,MRO,093200,093200, 
10,SRI,AA,20091210,8505,ABCXYZ,D,N,CPT,093500,093500, 
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,094500,, 
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,, 
10,SRI,AA,20091210,8506,ABCXYZ,U,N,CPT,,, 
10,SRI,AA,20091210,8506,ABCXYZ,U,N,MRO,,, 
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,, 
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,, 
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TAB,,, 
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,, 
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,, 
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,, 
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,, 
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,,170100, 
10,SRI,AA,20091210,8510,ABCXYZ,U,N,CPT,170400,170400, 
10,SRI,AA,20091210,8510,ABCXYZ,U,N,MRO,170700,170700, 
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,171000,171000, 
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,171500,171500, 
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TAB,171900,171900, 
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,172500,172500, 
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,172900,172900, 
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,173500,173500, 
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,174100,, 

My code: script.sh

#!/bin/bash 
awk -F"," '{print $5}' Data.txt > tmp.txt # extract field 5 (the number) of every line into tmp.txt
sort tmp.txt | uniq -d > Uniqe_number.txt # store each number that occurs more than once in Uniqe_number.txt
rm tmp.txt # remove the temporary file
while read line; do
echo $line
cat Data.txt | grep ",$line," > Numbers/All/$line.txt # grep each number and create an individual file for it
awk -F"," '{print $5","$4","$7","$8","$9","$10","$11}' Numbers/All/$line.txt > Numbers/All/tmp_$line.txt 
mv Numbers/All/tmp_$line.txt Numbers/Final/Final_$line.txt 
done < Uniqe_number.txt 
ls Numbers/Final > files.txt 
dos2unix files.txt 
bash time_replace.sh  

When you run the script above, it calls the time_replace.sh script.

My code for time_replace.sh:

#!/bin/bash 
for i in `cat files.txt` 
do 
while read aline 
do 
TimeDep=`echo $aline | awk -F"," '{print $6}'` 
#echo $TimeDep 
finalTimeDep=`echo $TimeDep | awk '{for(i=1;i<=length($0);i+=2){printf("%s:",substr($0,i,2))}}'|awk '{sub(/:$/,"")};1'` 
#echo $finalTimeDep 
########## 
TimeAri=`echo $aline | awk -F"," '{print $7}'` 
#echo $TimeAri 
finalTimeAri=`echo $TimeAri | awk '{for(i=1;i<=length($0);i+=2){printf("%s:",substr($0,i,2))}}'|awk '{sub(/:$/,"")};1'` 
#echo $finalTimeAri 
sed -i 's/',$TimeDep'/',$finalTimeDep'/g' Numbers/Final/$i 
sed -i 's/',$TimeAri'/',$finalTimeAri'/g' Numbers/Final/$i 
############################ 
done < Numbers/Final/$i 
done 

Is there a better solution?

Any help is appreciated.

Thanks, Sri

+0

So, you are changing '10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,072200,072200,' to '10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,07:22:00,07:22:00,'? – 2012-02-13 22:14:41

+1

I'm shocked that it only takes 2 hours to run. – 2012-02-13 22:15:46

+0

Yes Mike, that is correct. – user790049 2012-02-13 22:19:16

Answers

0

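It's not clear what all your sorting and uniqing is for. I'm assuming that your data file has only one entry per line, and that you need to change the 10th and 11th comma-separated fields from HHMMSS to HH:MM:SS.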

while IFS=, read -r -a line ; do
    # Fields 1-9 (array indices 0-8) are copied through unchanged.
    echo -n "${line[0]},${line[1]},${line[2]},${line[3]},"
    echo -n "${line[4]},${line[5]},${line[6]},${line[7]},"
    echo -n "${line[8]},"
    # Bash arrays are zero-based, so the 10th field is line[9].
    if [ -n "${line[9]}" ]; then
     echo -n "${line[9]:0:2}:${line[9]:2:2}:${line[9]:4:2}"
    fi
    echo -n ,
    # The 11th field is line[10].
    if [ -n "${line[10]}" ]; then
     echo -n "${line[10]:0:2}:${line[10]:2:2}:${line[10]:4:2}"
    fi
    echo ","
done < data.txt

The working part is the ${variable:offset:length} construct, which lets you extract a substring from a variable.
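As a minimal sketch of that construct in isolation, the snippet below wraps it in a helper. The name format_time and the six-digit check are assumptions added for illustration; they are not part of the answer above. The expansion is a bash feature, so this needs bash rather than a plain POSIX sh.

```shell
#!/bin/bash
# format_time is a hypothetical helper (not from the thread).
# It formats one HHMMSS value as HH:MM:SS.
format_time() {
    local t=$1
    # Only touch values that are exactly six digits; pass anything else through.
    if [[ $t =~ ^[0-9]{6}$ ]]; then
        # ${t:offset:length} extracts length characters starting at a zero-based offset.
        printf '%s:%s:%s\n' "${t:0:2}" "${t:2:2}" "${t:4:2}"
    else
        printf '%s\n' "$t"
    fi
}

format_time 072200   # prints 07:22:00
format_time ""       # prints an empty line
```

Because the expansion is done by the shell itself, no extra process is started per value, which is the main difference from the echo-into-awk pipelines in the question.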

+0

Thanks Chris, Jonathan and Evil. I went with Evil's solution (it was the easiest for me to understand). – user790049 2012-02-14 00:01:17

1

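If the file is large, the piping is probably what affects performance more than anything else. Processes can be cheap, but if you are doing a huge amount of processing, reducing the number of times you pass data through a pipe can pay dividends.

So you would probably be better off writing the whole script in awk (or perl). For example, awk can send its output to an arbitrary file, so the loop in your first file could be replaced by an awk script that does just that. You would not need the temporary files either.

I assume the sorting is only there to track progress, since you then know how many numbers there are. If you don't care about the sorting, you can simply do this: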

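#!/bin/sh
awk -F ',' '
{
    # The file name must be a string expression; field 5 holds the number
    # (the shell variable $line does not exist inside awk).
    out = "Numbers/Final/Final_" $5 ".txt"
    print $5","$4","$7","$8","$9","$10","$11 > out
}' datafile.txt
ls Numbers/Final > files.txt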

Also, if you do need the sorting, you can use sort -t, -k5,5 -k4,4 -k10,10 (or whatever fields the sort key actually needs to be).
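A small sketch of such a multi-key sort follows. The key order (number, then date, then first time field) is only an assumption, since the thread does not say which keys are really needed, and sort_rows is a hypothetical name.

```shell
#!/bin/sh
# sort_rows is a hypothetical wrapper; it sorts CSV rows read from stdin.
sort_rows() {
    # -t, sets the field separator; each -kN,N restricts a key to field N alone.
    sort -t, -k5,5 -k4,4 -k10,10
}

# Three sample rows in the layout used in the question, deliberately out of order.
printf '%s\n' \
    '10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,090900,090900,' \
    '10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,073200,073200,' \
    '10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,072200,072200,' | sort_rows
```

Writing each key as -kN,N matters: a bare -k5 would make the key run from field 5 to the end of the line.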

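As for formatting the times, awk also has functions, so you could have a single awk script that looks like this. It would replace both of the scripts above while keeping the same functionality (at least, as far as I can tell from a quick analysis). Note: untested, so it may contain vague syntax errors.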

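#!/usr/bin/awk -f
BEGIN {
    FS=","
}
# Insert colons into an HHMMSS value; leave empty fields empty.
function formattime (t)
{
    if (t == "") return t
    return substr(t,1,2)":"substr(t,3,2)":"substr(t,5,2)
}
{
    # The output file name must be a string; field 5 holds the number.
    out = "Numbers/Final/Final_" $5 ".txt"
    print $5","$4","$7","$8","$9","formattime($10)","formattime($11) > out
}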

This can be saved, given file mode 700, and invoked directly as:

./dostuff.awk filename

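Another awk option is to change the fields in place, so if you want to keep the whole original file but with the times formatted, you can modify the script above. Set OFS="," in the BEGIN block as well (assigning to a field makes awk rebuild the line using OFS, which defaults to a space), and change the print block to:

{
    $10=formattime($10)
    $11=formattime($11)
    print $0
}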
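Put together, the in-place variant might look like the sketch below. The wrapper name reformat_times and the six-digit check are assumptions made for this illustration; only the formattime idea comes from the answer. It also assumes every row has at least 11 fields, as in the sample data.

```shell
#!/bin/sh
# reformat_times is a hypothetical name; it reads CSV rows on stdin
# and writes them back out with fields 10 and 11 as HH:MM:SS.
reformat_times() {
    awk '
    # Assigning to a field rebuilds $0 with OFS, so OFS must be a comma too.
    BEGIN { FS = OFS = "," }
    # Insert colons into a six-digit value; pass anything else through unchanged.
    function formattime(t) {
        if (t !~ /^[0-9][0-9][0-9][0-9][0-9][0-9]$/) return t
        return substr(t, 1, 2) ":" substr(t, 3, 2) ":" substr(t, 5, 2)
    }
    {
        $10 = formattime($10)
        $11 = formattime($11)
        print
    }'
}

printf '%s\n' '10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,072200,072200,' | reformat_times
```

The whole file then goes through one awk process in a single pass, for example with reformat_times < data.txt > data_formatted.txt, where the output file name is again only a placeholder.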

If this doesn't do everything you need, hopefully it gives you some ideas that will help with the code.

0

In Perl, this is close to child's play:

#!/usr/bin/env perl 
use strict; 
use warnings; 
use English(-no_match_vars); 

local($OFS) = ","; 
while (<>) 
{ 
    my(@F) = split /,/; 
    $F[9] =~ s/(\d\d)(\d\d)(\d\d)/$1:$2:$3/ if defined $F[9]; 
    $F[10] =~ s/(\d\d)(\d\d)(\d\d)/$1:$2:$3/ if defined $F[10]; 
    print @F; 
} 

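If you don't want to use English, you can write local($,) = ","; instead; it controls the output field separator, which is set to a comma here. The code reads each line of the file, splits it on commas, takes the two time fields (indices 9 and 10, counting from zero), and, if they are not empty, inserts colons between the pairs of digits. I'm sure a 'Code Golf' solution would be a lot shorter, but this one is fairly readable if you know any Perl.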

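This will be far faster than your script, not only because it doesn't sort anything, but also because all the processing is done in a single pass through the file in a single process. Running several processes for every line of input (as your code does) is a performance disaster when the file is large.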

The output for the sample data you provided is:

10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,,, 
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,,07:16:00, 
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,07:22:00,07:22:00, 
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TAB,07:26:00,07:26:00, 
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,07:32:00,07:32:00, 
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,07:35:00,07:35:00, 
10,SRI,AA,20091210,8503,ABCXYZ,D,N,MRO,07:37:00,07:37:00, 
10,SRI,AA,20091210,8503,ABCXYZ,D,N,CPT,07:39:00,07:39:00, 
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,07:44:00,, 
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,,, 
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,,09:02:00, 
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,09:09:00,09:09:00, 
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,09:15:00,09:15:00, 
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TAB,09:19:00,09:19:00, 
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,09:25:00,09:25:00, 
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,09:29:00,09:29:00, 
10,SRI,AA,20091210,8505,ABCXYZ,D,N,MRO,09:32:00,09:32:00, 
10,SRI,AA,20091210,8505,ABCXYZ,D,N,CPT,09:35:00,09:35:00, 
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,09:45:00,, 
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,, 
10,SRI,AA,20091210,8506,ABCXYZ,U,N,CPT,,, 
10,SRI,AA,20091210,8506,ABCXYZ,U,N,MRO,,, 
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,, 
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,, 
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TAB,,, 
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,, 
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,, 
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,, 
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,, 
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,,17:01:00, 
10,SRI,AA,20091210,8510,ABCXYZ,U,N,CPT,17:04:00,17:04:00, 
10,SRI,AA,20091210,8510,ABCXYZ,U,N,MRO,17:07:00,17:07:00, 
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,17:10:00,17:10:00, 
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,17:15:00,17:15:00, 
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TAB,17:19:00,17:19:00, 
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,17:25:00,17:25:00, 
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,17:29:00,17:29:00, 
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,17:35:00,17:35:00, 
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,17:41:00,, 