2013-03-31 128 views

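Parsing a complex text file into one line of field names and a second line of values

I would like to parse a text file whose first section consists of lines that are mostly text with a single number each (every such line starts with '#'). The second section of the file is made up of rows of several numbers, all related to a single structure. Since I need to merge these output files for several hundred cases, it would be very helpful if I could process each file into a single row of data. I am having trouble doing this with a combination of bash/perl/awk. Can anyone suggest a way to do it? (Example file below.)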

Thank you for your consideration.

Best wishes,

-S

# Title Segmentation Statistics 
# 
# generating_program mri_segstats 
# cvs_version $Id: mri_segstats.c,v 1.75.2.9 2013/02/16 00:09:33 greve Exp $ 
# cmdline mri_segstats --seg mri/aseg.mgz --sum stats/aseg.stats --pv mri/norm.mgz --empty --brainmask mri/brainmask.mgz --brain-vol-from-seg --excludeid 0 --excl-ctxgmwm --supratent --subcortgray --in mri/norm.mgz --in-intensity-name norm --in-intensity-units MR --etiv --surf-wm-vol --surf-ctx-vol --totalgray --euler --ctab /mnt/glusterfs/salsoman/freesurfer/ASegStatsLUT.txt --subject WCA_0162_T1_FS 
# sysname Linux 
# hostname barley15.stanford.edu 
# machine x86_64 
# user  salsoman 
# anatomy_type volume 
# 
# SUBJECTS_DIR /mnt/glusterfs/salsoman/output/FS 
# subjectname WCA_0162_T1_FS 
# Measure BrainSeg, BrainSegVol, Brain Segmentation Volume, 1089921.000000, mm^3 
# Measure BrainSegNotVent, BrainSegVolNotVent, Brain Segmentation Volume Without Ventricles, 993734.000000, mm^3 
# Measure BrainSegNotVentSurf, BrainSegVolNotVentSurf, Brain Segmentation Volume Without Ventricles from Surf, 993214.631437, mm^3 
# Measure lhCortex, lhCortexVol, Left hemisphere cortical gray matter volume, 240339.518738, mm^3 
# Measure rhCortex, rhCortexVol, Right hemisphere cortical gray matter volume, 236468.599276, mm^3 
# Measure Cortex, CortexVol, Total cortical gray matter volume, 476808.118013, mm^3 
# Measure lhCorticalWhiteMatter, lhCorticalWhiteMatterVol, Left hemisphere cortical white matter volume, 191135.667925, mm^3 
# Measure rhCorticalWhiteMatter, rhCorticalWhiteMatterVol, Right hemisphere cortical white matter volume, 180013.845498, mm^3 
# Measure CorticalWhiteMatter, CorticalWhiteMatterVol, Total cortical white matter volume, 371149.513423, mm^3 
# Measure SubCortGray, SubCortGrayVol, Subcortical gray matter volume, 52383.000000, mm^3 
# Measure TotalGray, TotalGrayVol, Total gray matter volume, 604954.118013, mm^3 
# Measure SupraTentorial, SupraTentorialVol, Supratentorial volume, 991108.631437, mm^3 
# Measure SupraTentorialNotVent, SupraTentorialVolNotVent, Supratentorial volume, 902611.631437, mm^3 
# Measure SupraTentorialNotVentVox, SupraTentorialVolNotVentVox, Supratentorial volume voxel count, 900542.000000, mm^3 
# Measure Mask, MaskVol, Mask Volume, 1694747.000000, mm^3 
# Measure BrainSegVol-to-eTIV, BrainSegVol-to-eTIV, Ratio of BrainSegVol to eTIV, 0.624390, unitless 
# Measure MaskVol-to-eTIV, MaskVol-to-eTIV, Ratio of MaskVol to eTIV, 0.970881, unitless 
# Measure lhSurfaceHoles, lhSurfaceHoles, Number of defect holes in lh surfaces prior to fixing, 239, unitless 
# Measure rhSurfaceHoles, rhSurfaceHoles, Number of defect holes in rh surfaces prior to fixing, 227, unitless 
# Measure SurfaceHoles, SurfaceHoles, Total number of defect holes in surfaces prior to fixing, 466, unitless 
# Measure EstimatedTotalIntraCranialVol, eTIV, Estimated Total Intracranial Volume, 1745576.756023, mm^3 
# SegVolFile mri/aseg.mgz 
# SegVolFileTimeStamp 2013/03/27 19:34:08 
# ColorTable /mnt/glusterfs/salsoman/freesurfer/ASegStatsLUT.txt 
# ColorTableTimeStamp 2013/02/25 22:23:16 
# InVolFile mri/norm.mgz 
# InVolFileTimeStamp 2013/03/27 14:00:28 
# InVolFrame 0 
# PVVolFile mri/norm.mgz 
# PVVolFileTimeStamp 2013/03/27 14:00:28 
# Excluding Cortical Gray and White Matter 
# ExcludeSegId 0 2 3 41 42 
# VoxelVolume_mm3 1 
# TableCol 1 ColHeader Index 
# TableCol 1 FieldName Index 
# TableCol 1 Units  NA 
# TableCol 2 ColHeader SegId 
# TableCol 2 FieldName Segmentation Id 
# TableCol 2 Units  NA 
# TableCol 3 ColHeader NVoxels 
# TableCol 3 FieldName Number of Voxels 
# TableCol 3 Units  unitless 
# TableCol 4 ColHeader Volume_mm3 
# TableCol 4 FieldName Volume 
# TableCol 4 Units  mm^3 
# TableCol 5 ColHeader StructName 
# TableCol 5 FieldName Structure Name 
# TableCol 5 Units  NA 
# TableCol 6 ColHeader normMean 
# TableCol 6 FieldName Intensity normMean 
# TableCol 6 Units  MR 
# TableCol 7 ColHeader normStdDev 
# TableCol 7 FieldName Itensity normStdDev 
# TableCol 7 Units  MR 
# TableCol 8 ColHeader normMin 
# TableCol 8 FieldName Intensity normMin 
# TableCol 8 Units  MR 
# TableCol 9 ColHeader normMax 
# TableCol 9 FieldName Intensity normMax 
# TableCol 9 Units  MR 
# TableCol 10 ColHeader normRange 
# TableCol 10 FieldName Intensity normRange 
# TableCol 10 Units  MR 
# NRows 45 
# NTableCols 10 
# ColHeaders Index SegId NVoxels Volume_mm3 StructName normMean normStdDev normMin normMax normRange 
    1 4  41962 41962.4 Left-Lateral-Ventricle   22.0753 10.2057  3.0000 94.0000 91.0000 
    2 5  2150  2149.7 Left-Inf-Lat-Vent     37.5636 16.3886  5.0000 89.0000 84.0000 
    3 7  8273  8273.3 Left-Cerebellum-White-Matter  88.0903 11.6908 21.0000 123.0000 102.0000 
    4 8  35427 35427.4 Left-Cerebellum-Cortex   56.4255 12.5475  2.0000 92.0000 90.0000 
    5 10  6087  6086.7 Left-Thalamus-Proper    92.2098 11.7928 50.0000 124.0000 74.0000 
    6 11  5101  5100.7 Left-Caudate      75.0335  9.9708 29.0000 100.0000 71.0000 
    7 12  4773  4773.0 Left-Putamen      75.7113  6.2195 48.0000 95.0000 47.0000 
    8 13  1178  1177.6 Left-Pallidum      86.3354  6.2568 59.0000 104.0000 45.0000 
    9 14  2973  2973.1 3rd-Ventricle      27.5508 11.3394  9.0000 77.0000 68.0000 
10 15  2403  2403.0 4th-Ventricle      26.8237 11.9581  6.0000 79.0000 73.0000 
11 16  18347 18347.2 Brain-Stem      82.1731 12.0144 15.0000 116.0000 101.0000 
12 17  3824  3824.2 Left-Hippocampus     66.7333  8.6661 26.0000 100.0000 74.0000 
13 18  2087  2087.1 Left-Amygdala      63.9856  7.2932 37.0000 91.0000 54.0000 
14 24  2094  2094.0 CSF        36.2929 14.6972 12.0000 90.0000 78.0000 
15 26  340  340.0 Left-Accumbens-area    69.8967  8.7139 37.0000 87.0000 50.0000 
16 28  2969  2969.5 Left-VentralDC     94.9737 13.6527 44.0000 122.0000 78.0000 
17 30  76  75.9 Left-vessel      58.3205 11.6736 27.0000 80.0000 53.0000 
18 31  1103  1102.6 Left-choroid-plexus    51.7182 16.3692 12.0000 100.0000 88.0000 
19 43  38108 38108.2 Right-Lateral-Ventricle   20.2269 10.2570  0.0000 92.0000 92.0000 
20 44  2165  2165.0 Right-Inf-Lat-Vent    30.2048 13.6808  0.0000 80.0000 80.0000 
21 46  9715  9715.4 Right-Cerebellum-White-Matter  86.9395  8.3909 25.0000 115.0000 90.0000 
22 47  41688 41688.2 Right-Cerebellum-Cortex   57.5291 10.3208  9.0000 91.0000 82.0000 
23 49  4769  4769.3 Right-Thalamus-Proper    82.0576 12.2446 18.0000 106.0000 88.0000 
24 50  4587  4587.1 Right-Caudate      69.9613 12.7863 11.0000 103.0000 92.0000 
25 51  4694  4694.4 Right-Putamen      69.9372  7.9141 48.0000 91.0000 43.0000 
26 52  1407  1406.8 Right-Pallidum     88.0501  5.7841 57.0000 105.0000 48.0000 
27 53  3160  3159.6 Right-Hippocampus     63.3511  8.9283 17.0000 95.0000 78.0000 
28 54  1877  1877.4 Right-Amygdala     57.3686  8.5163 20.0000 83.0000 63.0000 
29 58  376  376.0 Right-Accumbens-area    70.4901  9.9104 41.0000 96.0000 55.0000 
30 60  2973  2972.7 Right-VentralDC     89.6143 14.1755 29.0000 120.0000 91.0000 
31 62  105  105.1 Right-vessel      50.1458 12.1126 21.0000 78.0000 57.0000 
32 63  2843  2842.7 Right-choroid-plexus    46.3759 13.8319  6.0000 115.0000 109.0000 
33 72  68  67.9 5th-Ventricle      42.4444 11.2861 26.0000 83.0000 57.0000 
34 77  25325 25325.0 WM-hypointensities    71.8650 16.2379  5.0000 112.0000 107.0000 
35 78   0  0.0 Left-WM-hypointensities   0.0000  0.0000  0.0000  0.0000  0.0000 
36 79   0  0.0 Right-WM-hypointensities   0.0000  0.0000  0.0000  0.0000  0.0000 
37 80  153  153.1 non-WM-hypointensities   50.4551 16.1478 18.0000 88.0000 70.0000 
38 81   0  0.0 Left-non-WM-hypointensities  0.0000  0.0000  0.0000  0.0000  0.0000 
39 82   0  0.0 Right-non-WM-hypointensities  0.0000  0.0000  0.0000  0.0000  0.0000 
40 85  350  349.6 Optic-Chiasm      66.0833 15.7641 24.0000 102.0000 78.0000 
41 251  806  805.6 CC_Posterior      119.2646 18.1322 57.0000 150.0000 93.0000 
42 252  252  251.7 CC_Mid_Posterior     109.1685 16.3862 51.0000 150.0000 99.0000 
43 253  295  295.4 CC_Central      113.3418 16.2739 77.0000 140.0000 63.0000 
44 254  294  293.7 CC_Mid_Anterior     115.1645 17.9396 72.0000 149.0000 77.0000 
45 255  657  657.4 CC_Anterior      124.1047 22.5045 54.0000 166.0000 112.0000 

Answers

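Have you tried Talend Open Studio / Data Integration? TOS can automate complex transformations like this one. The final executable of a data transformation job is a jar file, which you can easily call from a shell script. Getting started with TOS takes a while, but it is very powerful. The product is licensed under GPL v2 and has a fairly active community.

Of course you could write some awk/sed/perl orgy and you would get a result, but in your case such a complex transformation could easily become unreadable and unmaintainable.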

HTH, Michael
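
For comparison, the scripted route can stay fairly small. Below is a minimal Python sketch, not a tested solution for every FreeSurfer output. It assumes that each `# Measure` line has the comma-separated layout shown in the sample (short name second, value second to last), that the table columns are the ones named on the `# ColHeaders` line, and that all files list the same measures and structures in the same order. The function names `parse_stats` and `merge_to_csv` and the `<StructName>_<column>` naming of the output fields are my own choices for illustration; the `Index`, `SegId` and `StructName` columns are left out of the output row on purpose.

```python
import csv
import sys


def parse_stats(lines):
    """Flatten one stats file into parallel lists of field names and values."""
    names, values = [], []
    col_headers = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            body = line.lstrip("#").strip()
            if body.startswith("subjectname "):
                # Keep the subject ID so merged rows can be told apart.
                names.append("subjectname")
                values.append(body.split(None, 1)[1])
            elif body.startswith("Measure "):
                # Assumed layout: name, short name, description, value, units
                parts = body[len("Measure "):].split(",")
                if len(parts) >= 5:
                    names.append(parts[1].strip())
                    values.append(parts[-2].strip())
            elif body.startswith("ColHeaders "):
                col_headers = body.split()[1:]
            continue
        # Anything else is treated as a whitespace-separated table row.
        fields = line.split()
        if "StructName" not in col_headers or len(fields) != len(col_headers):
            continue
        row = dict(zip(col_headers, fields))
        for col in col_headers:
            if col in ("Index", "SegId", "StructName"):
                continue
            names.append(f"{row['StructName']}_{col}")
            values.append(row[col])
    return names, values


def merge_to_csv(paths, out=sys.stdout):
    """Write one header row, then one row of values per input file."""
    writer = csv.writer(out)
    header_written = False
    for path in paths:
        with open(path) as fh:
            names, values = parse_stats(fh)
        if not header_written:
            writer.writerow(names)
            header_written = True
        writer.writerow(values)
```

Calling `merge_to_csv(sys.argv[1:])` from a small script and passing the stats files on the command line would print the field names once, followed by one line of values per file. If some files could be missing a measure or a structure, matching values by name (for example with `csv.DictWriter`) would be safer than relying on position.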
