2016-11-03 53 views

I'm writing a bash script to do some text processing on a file. Part of the file contains a grid of values that must be reformatted so that it has the correct number of columns.

Here is an example of one such grid, which in this case must be formatted to 16 columns;

702.0 697.0 687.0 685.0 693.0 700.0 693.0 681.0 676.0 684.0 
    694.0 700.0 704.0 710.0 710.0 710.0 
    711.0 704.0 697.0 690.0 693.0 699.5 696.0 692.0 680.0 687.0 
    696.0 705.0 709.0 714.0 716.0 714.0 
    722.0 711.0 708.0 700.0 696.0 703.0 701.0 692.0 678.0 684.0 
    695.0 707.0 712.0 713.0 716.0 717.0 
    727.0 718.0 712.0 707.0 705.0 706.5 701.0 692.0 680.0 683.0 
    693.0 706.0 714.0 718.0 720.0 718.0 
    732.0 728.0 725.0 718.0 715.0 708.0 699.0 693.0 683.0 681.0 
    694.0 703.0 711.0 715.0 723.0 727.0 
    738.0 735.0 732.0 721.0 723.0 712.0 702.0 696.0 690.0 681.0 
    693.0 701.0 709.0 712.0 720.0 726.0 
    736.5 736.5 734.0 728.0 726.5 718.8 714.5 707.5 701.0 687.0 
    684.5 695.5 703.0 708.0 716.0 721.5 
    736.0 734.0 727.0 726.0 723.0 720.0 723.0 713.0 708.0 699.0 
    678.0 686.0 696.0 706.0 712.0 714.0 
    729.0 726.0 717.0 716.0 715.0 717.0 720.0 714.0 710.0 700.0 
    678.0 679.0 689.0 700.0 702.0 708.0 
    722.0 719.0 713.0 709.0 705.0 711.0 719.0 716.0 706.0 697.0 
    680.0 679.0 682.0 694.0 698.0 702.0 
    712.0 713.0 707.0 704.0 697.0 708.5 719.0 715.0 705.0 693.0 
    678.0 680.0 682.0 683.0 685.0 691.0 
    707.0 706.0 702.0 693.0 699.0 710.5 712.0 707.0 701.0 687.0 
    677.0 687.0 686.0 686.0 680.0 682.0 

Here is my script so far, which is not performing as expected;

#!/bin/bash 

Target=${1:-"grid.dat"} 
Outfile="grid.new.dat" 
ColumnCount="16" 

RawGrid=() 
while read line; do 
    RawGrid+=($line) 
done < <(cat ${Target}) 
echo "${#RawGrid[@]} cells found!" 

echo "" > ${Outfile} 
for ((i=0; i < ${#RawGrid[@]}; i+=1)); do 
    echo -n " ${RawGrid[$i]}" >> $Outfile 
    ((i % ${ColumnCount} == 0)) && ((i > 0)) && echo "" >> $Outfile # New row 
done 

The part I'm particularly stuck on is printing the grid with the correct number of columns. Perhaps I'm not populating the array correctly?

Try `grep -o '[0-9.]\+' ip.txt | pr -16ats` – Sundeep

This doesn't seem to work if the number of columns is greater than ~20. It also strips the whitespace before each column, which must be preserved. – BZP

Agreed that the spacing is lost, but this should work regardless of the number of columns – Sundeep
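Sundeep's pipeline can be tried on a tiny sample first; the inline numbers here are made up for illustration, and 4 columns stand in for the real 16:

```shell
# grep -o pulls out one number per line; pr lays them back out in
# 4 columns: -a fills across, -t suppresses headers/footers, and
# -s' ' separates columns with a single space instead of padding.
printf '1 2 3\n4 5 6 7 8\n' | grep -o '[0-9.]\+' | pr -4ats' '
```

For the real data this becomes `grep -o '[0-9.]\+' grid.dat | pr -16ats`.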

Answer


Here is how I solved it;

#!/bin/bash

unset IFS # reset internal field separator

Target=${1:-"grid.dat"}
Outfile="grid.new.dat"
ColumnCount="16"
Separator=" "

: > "$Outfile" # truncate any previous output

Counter=0 # total cell count
while read -r line; do
    # Default IFS splits on whitespace, so this "just works"
    for field in $line; do
        Counter=$((Counter+1))
        printf '%s' "$field" >> "$Outfile"
        # If the counter is divisible by ColumnCount, insert a newline;
        # otherwise insert the separator
        if [ $((Counter%ColumnCount)) -eq 0 ]; then
            printf '\n' >> "$Outfile"
        else
            printf '%s' "$Separator" >> "$Outfile"
        fi
    done
done < "$Target" # no need for cat here
echo "${Counter} cells found!" 
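The counter-modulo test is the whole trick here; a stripped-down sketch of the same idea (the values 1..8 and cols=4 are just for illustration):

```shell
# Emit a newline after every cols-th value, a space otherwise.
cols=4
n=0
for v in 1 2 3 4 5 6 7 8; do
    n=$((n + 1))
    if [ $((n % cols)) -eq 0 ]; then
        printf '%s\n' "$v"   # end of a row
    else
        printf '%s ' "$v"    # more columns to go in this row
    fi
done
```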

On my MBP this is a little slower than the script in your question with the original dataset –

$ command time -l ./original.sh 
192 cells found! 
     0.03 real   0.01 user   0.00 sys 
    2596864 maximum resident set size 
     0 average shared memory size 
     0 average unshared data size 
     0 average unshared stack size 
     1815 page reclaims 
     0 page faults 
     0 swaps 
     0 block input operations 
     3 block output operations 
     0 messages sent 
     0 messages received 
     1 signals received 
     2 voluntary context switches 
     126 involuntary context switches 

$ command time -l ./new.sh 
192 cells found! 
     0.04 real   0.01 user   0.01 sys 
    2555904 maximum resident set size 
     0 average shared memory size 
     0 average unshared data size 
     0 average unshared stack size 
     644 page reclaims 
     0 page faults 
     0 swaps 
     0 block input operations 
     1 block output operations 
     0 messages sent 
     0 messages received 
     0 signals received 
     0 voluntary context switches 
     333 involuntary context switches 

However, because it only keeps the current line, field, and counter in memory, it can now handle any number of lines. Let's try a larger dataset...

$ for i in {1..1000}; do cat grid.dat >> monster-grid.dat; done 
$ command time -l ./new.sh monster-grid.dat 
192000 cells found! 
     28.88 real  11.38 user  10.02 sys 
    2936832 maximum resident set size 
     0 average shared memory size 
     0 average unshared data size 
     0 average unshared stack size 
     737 page reclaims 
     0 page faults 
     0 swaps 
     0 block input operations 
     5 block output operations 
     0 messages sent 
     0 messages received 
     0 signals received 
     2 voluntary context switches 
    356677 involuntary context switches 

$ command time -l ./original.sh monster-grid.dat 
192000 cells found! 
     266.32 real  222.08 user  13.82 sys 
    12320768 maximum resident set size 
     0 average shared memory size 
     0 average unshared data size 
     0 average unshared stack size 
     4222 page reclaims 
     0 page faults 
     0 swaps 
     0 block input operations 
     23 block output operations 
     0 messages sent 
     0 messages received 
     1 signals received 
    179607 voluntary context switches 
    295379 involuntary context switches 

Now we can see that performance is fairly poor in both cases, but the new script does save some memory, and with the larger dataset it is actually faster than the original. But yes, as @Sundeep suggested, pr is by far the best answer –

$ command time -l grep -o '[0-9.]\+' monster-grid.dat | pr -16ats > grid.new.dat 
     0.17 real   0.17 user   0.00 sys 
    2220032 maximum resident set size 
     0 average shared memory size 
     0 average unshared data size 
     0 average unshared stack size 
     562 page reclaims 
     0 page faults 
     0 swaps 
     0 block input operations 
     0 block output operations 
     0 messages sent 
     0 messages received 
     0 signals received 
     0 voluntary context switches 
     198 involuntary context switches
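For completeness, if the leading spacing BZP mentioned matters, awk can reflow to any column count in one pass; this is a sketch only, with cols=4 and inline sample input standing in for `-v cols=16` and `grid.dat`:

```shell
# Reflow all fields into cols columns, one space between values.
# To pad every value to a fixed width (preserving alignment), swap
# the "%s%s" format for something like "%7s%s".
printf '1 2 3\n4 5 6 7 8\n' |
awk -v cols=4 '{ for (i = 1; i <= NF; i++)
                     printf "%s%s", $i, (++n % cols ? " " : "\n") }'
```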