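Format specific data using awk or sed

I'm currently working with a large dataset of file information formatted as blocks of data. I'm trying to take a piece of data from the File path line and append it as a new column to specific lines. The dataset contains file information formatted like this: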
File path: /d9b50a6f54d5a1f8/7b3d459a3454703c/a6d1040ea2c84e10/afcbe93ced71e5e6/2b517a561f5da8a6/aab17eb15d782d7b/af38f2bcc4998af0/0d8eb680024af333.jar
Inode Num: 22525898
Chunk Hash Chunk Size (bytes) Compression Ratio (tenth)
45:97:2a:60:e3:69 3208 10
7a:8b:8e:20:7b:38 1982 10
b9:45:3d:f4:97:88 1849 10
Whole File Hash: 865999b40fd9
File path: /d9b50a6f54d5a1f8/7b3d459a3454703c/a6d1040ea2c84e10/afcbe93ced71e5e6/2b517a561f5da8a6/1e82b13443330bb3/12fd3e87b2f62dc8/6e1a9f0b0a281564.c
Inode Num: 31881221
Chunk Hash Chunk Size (bytes) Compression Ratio (tenth)
e8:b0:cb:6f:76:ff 1344 10
19:c5:b2:aa:b3:60 613 10
11:7c:7e:76:4b:d5 1272 10
36:e0:59:49:b6:4a 581 10
9c:31:bc:8a:39:94 3296 10
01:f0:56:3a:e1:a9 1140 10
Whole File Hash: 4b28b44ae03d
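What I want to do is take the file type (.jar and .c in this example) and append it to each of the corresponding chunk hash lines, so that the final output looks like this: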
File path: /d9b50a6f54d5a1f8/7b3d459a3454703c/a6d1040ea2c84e10/afcbe93ced71e5e6/2b517a561f5da8a6/aab17eb15d782d7b/af38f2bcc4998af0/0d8eb680024af333.jar
Inode Num: 22525898
Chunk Hash Chunk Size (bytes) Compression Ratio (tenth)
45:97:2a:60:e3:69 3208 10 .jar
7a:8b:8e:20:7b:38 1982 10 .jar
b9:45:3d:f4:97:88 1849 10 .jar
Whole File Hash: 865999b40fd9
File path: /d9b50a6f54d5a1f8/7b3d459a3454703c/a6d1040ea2c84e10/afcbe93ced71e5e6/2b517a561f5da8a6/1e82b13443330bb3/12fd3e87b2f62dc8/6e1a9f0b0a281564.c
Inode Num: 31881221
Chunk Hash Chunk Size (bytes) Compression Ratio (tenth)
e8:b0:cb:6f:76:ff 1344 10 .c
19:c5:b2:aa:b3:60 613 10 .c
11:7c:7e:76:4b:d5 1272 10 .c
36:e0:59:49:b6:4a 581 10 .c
9c:31:bc:8a:39:94 3296 10 .c
01:f0:56:3a:e1:a9 1140 10 .c
Whole File Hash: 4b28b44ae03d
I already have awk code that pulls out the file type and the chunk hash lines:
awk 'match($0,/\..+/) {print substr($0,RSTART,RLENGTH)}'
awk '/Chunk Hash/{flag=1;next}/Whole File Hash:/{flag=0}flag'
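I just don't know how to combine these pieces with awk (or sed) so that the file type is appended as a new column to every line in its respective chunk block. One other thing to note: I'm trying to do this in a bash script, in case that makes a difference.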
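One way the two snippets might be combined is a single awk pass that remembers the extension from each File path line and appends it while a flag marks the chunk table. This is only a minimal sketch: the function name `append_ext` and the variables `ext` and `in_chunks` are made up for illustration, and it assumes the extension is whatever follows the last "." in the final path component, that lines have no trailing whitespace, and that a path without an extension should leave its chunk lines unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads the report on stdin, writes the annotated report to stdout.
append_ext() {
  awk '
    # Remember the extension of the current file: the last "." through end of line.
    # Assumption: a path with no extension yields an empty ext.
    /^File path:/ {
      ext = match($0, /\.[^.\/]+$/) ? substr($0, RSTART) : ""
    }
    # The footer line closes the chunk table.
    /^Whole File Hash:/ { in_chunks = 0 }
    # Inside the table, append the extension as a new column.
    in_chunks && ext != "" { $0 = $0 " " ext }
    # The header opens the table; the flag is set after the append rule
    # so the header line itself is printed unchanged.
    /^Chunk Hash/ { in_chunks = 1 }
    { print }
  '
}
```

In a bash script it could then be called as `append_ext < report.txt > report_with_ext.txt`, where both file names are placeholders. Every input line is printed, so the File path, Inode Num, header, and Whole File Hash lines pass through untouched.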
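Some lines are doubled; you should remove the 'p' command from the address range block. – SLePort
@Kenavoz Uh, yes, 'N' prints without the '-n' option... thanks! –
This works great, thanks! –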