I have a tab-delimited file on Unix with data problems

-3

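I have to make sure every row has 4 columns, but the input data is quite messy:

The first row is the header.
The second row is valid because it has 4 columns.
The third row is also valid (it is fine if the Description field is empty).

The ID field and, God bless me, the last column (Phnumber) are never empty.

As you can see, row 4 is broken because of newline characters in the Description column; it spans multiple lines.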

ID Name Description Phnumber 
1051 John 5674 I am doing good, is this task we need to fix 908342 
1065 Rohit    9876246 
10402 rob I am  
    doing good, 
    is this task we need to fix  908341 
105552 "Julin rob hain" i know what to do just let me do it  
    " 
    " 
    " 
    " 
    " 
    " 
908452 1051 Dave I am doing reporting this week 88889999

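Maybe a screenshot will make the problem easier to see.

Every row starts with a number and ends with a number. Every row should have 4 columns.

Desired output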

ID  Name Description           Phnumber 
1051 John 5674 I am doing good, is this task we need to fix 908342 
1065 Rohit              9876246 
10402 rob I am doing good, 563 is this task we need to fix  908341 
105552 "Julin rob hain" i know what to do just let me do it   908452 
1051 Dave I am doing reporting this week      88889999

This is sample data; the actual file has 12 columns. The columns in between can contain numbers, and a few are date fields (such as 2017-03-02).
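The rule stated in the question (a fixed number of columns per row, with line breaks only inside a middle field) can be sketched in awk as follows. This is a minimal sketch, not a tested fix for the real file: the function names `join_records` and `check_cols` are hypothetical, the column count is a parameter (4 for the sample, 12 for the real file), and it assumes that the header sits on line 1, that the last column never contains a line break, and that a space is an acceptable replacement for each embedded line break.

```shell
#!/bin/sh
# join_records (hypothetical name): glue physical lines together until the
# pending record holds at least `cols` tab-separated fields (default 4).
# Assumes the header is on line 1 and only middle fields contain newlines.
join_records() {
  awk -v cols="${1:-4}" '
    NR == 1 { print; next }                 # header passes through unchanged
    {
      sub(/\r$/, "")                        # drop a trailing carriage return
      if ($0 == "" && buf == "") next       # skip blank lines between records
      # a single space stands in for the embedded line break
      buf = (buf == "" ? $0 : buf " " $0)
      if (split(buf, parts, "\t") >= cols) { print buf; buf = "" }
    }
    END { if (buf != "") print buf }        # flush an incomplete last record
  '
}

# check_cols (hypothetical name): list lines whose field count is not `cols`;
# the exit status is non-zero if any such line exists.
check_cols() {
  awk -F '\t' -v cols="${1:-4}" '
    NF != cols { printf "line %d: %d fields\n", NR, NF; bad = 1 }
    END { exit bad + 0 }
  '
}
```

Something like `join_records 12 < FILE_NAME | check_cols 12` would then report any row that still does not have the expected number of fields. Counting tabs is only reliable if the free-text columns never contain a literal tab.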

Source

2017-03-25 dragon007

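Is this an AI bot pretending? – karakfa

@krivamsh I cleaned up your question a bit, but you should still **add the actual question you want to ask**. Right now it only defines and illustrates the problem – fvu

There are a million ways to produce 4 columns per line from the sample input. Post the corresponding expected output to give us a clue about the arrangement you actually need, and at least tell us what the algorithm for identifying the columns should be, even if you have no code attempt of your own to post (see [ask]). –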

This worked:

cat FILE_NAME | perl -0pe 's/\n(?!([0-9]{6}|$)\t)//g' | perl -pe 's/\r(?!([0-9]{6}|$)\t)//g' | sed '/^$/d'
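The command above keeps a line break only when the next line starts with a six-digit ID followed by a tab. The same idea can be sketched in awk without a lookahead. This is an illustration under assumptions rather than a drop-in replacement: the function name `join_by_id` is hypothetical, the ID is taken to be any run of digits (not exactly six), and each removed line break is replaced with a space instead of being deleted outright.

```shell
#!/bin/sh
# join_by_id (hypothetical name): a line opens a new record only if it starts
# with digits followed by a tab; any other line continues the previous record.
join_by_id() {
  awk '
    NR == 1 { print; next }                 # header passes through unchanged
    { sub(/\r$/, "") }                      # drop a trailing carriage return
    $0 == "" { next }                       # ignore blank lines
    /^[0-9]+\t/ {                           # looks like the start of a record
      if (have) print buf
      buf = $0; have = 1; next
    }
    have { buf = buf " " $0 }               # continuation of the open record
    END { if (have) print buf }
  '
}
```

Usage would be along the lines of `join_by_id < FILE_NAME`. A continuation line that happens to begin with digits and a tab would be mistaken for a new record, so this rule is only as safe as the assumption that description text never starts a line that way.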

Source

2017-04-04 17:41:19 dragon007

-1

awk to the rescue!

Assuming that no numeric fields appear other than the first and the last field:

awk 'NR==1; 
    NR>1 {for(i=1;i<=NF;i++) 
      {if($i~/[0-9]+/) s=!s; printf "%s", $i (s?OFS:RS)}}' file 


ID Name Description Phnumber 
1051 John I am doing good, is this task we need to fix 908342 
10423 rob I am doing good, is this task we need to fix 908341 
1052 Julin rob hain i know what to do just let me do it " " " " " " 908452 
1051 Dave I am doing reporting this week 88889999

Perhaps set OFS to \t for more structure.

Source

2017-03-25 16:39:02 karakfa
