2016-04-11 35 views
1

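For a dataset in a csv file where the first column is usually empty, I would like to programmatically insert an extra comma in every line that does not begin with a double quote. From the example below you will see that only some lines begin with "; those are the lines I want to ignore. Every other line should get an extra , (that is, prepend a character to an input line based on its first character).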

"1967 Universe",1967,"1,141.0",650.6,73.0,417.5,222.6,119.6,309.8,176.0,390.8,225.0,217.8,130.0 
,1968,"1,353.4",694.7,84.3,574.4,234.8,119.2,350.1,182.1,477.3,233.9,291.1,159.5 
,1969,"1,322.3",624.8,85.2,612.4,215.8,104.7,317.0,149.9,470.5,215.6,319.0,154.6 
,1970,"1,351.5",646.8,88.1,616.7,218.3,93.1,287.4,148.1,502.9,246.5,342.9,159.1 
,1971,"1,924.6",906.1,132.9,885.7,303.6,127.3,421.1,208.1,725.4,338.6,474.6,232.1 
"1972 Universe",1972,"2,218.9","1,033.1",148.6,"1,037.2",333.3,147.3,440.8,230.1,905.4,391.5,539.3,264.3 
,1973,"1,819.5",882.1,117.0,820.5,271.9,141.6,361.4,197.5,763.2,323.9,423.1,219.0 
,1974,"1,074.4",643.8,64.4,366.2,165.4,97.2,241.3,154.5,390.1,223.7,277.6,168.4 
,1975,939.2,675.5,63.8,199.8,129.5,92.6,241.5,169.3,292.7,222.8,275.5,190.8 
,1976,"1,296.2",893.6,93.1,309.5,152.4,111.2,326.1,219.4,401.7,292.6,416.0,270.5 
,1977,"1,690.0","1,126.1",121.3,442.7,181.9,126.8,402.4,269.3,561.1,370.1,544.6,359.9 
"1978 Universe",1978,"1,800.5","1,182.6",130.6,487.3,194.4,132.9,388.0,260.8,667.6,439.3,550.5,349.6 
,1979,"1,551.8",981.5,125.4,444.8,166.9,110.2,289.1,182.4,628.0,392.1,467.7,296.7 
,1980,"1,190.6",710.4,114.5,365.7,117.9,75.7,192.0,107.9,561.9,333.0,318.9,193.7 
,1981,985.5,564.3,101.8,319.4,109.8,65.7,133.3,78.3,491.1,270.7,251.3,149.5 
,1982,"1,000.5",546.4,88.3,365.8,106.7,65.7,126.3,67.6,543.5,281.1,224.1,132.0 
,1983,"1,605.2",901.5,133.7,570.1,164.1,112.3,187.8,117.4,862.9,443.0,390.4,228.6 

I think this could be done with sed, but I am not yet familiar enough with regular expressions to know how to do a conditional replacement.

The only example I have found is something like

sed -ri 's/[^,]+/REPLACEMENT/' file.csv 

so it might look like

sed -ri 's/[^,]+/,/' file.csv 

but I don't think that is right.

sed 's/\,/\,,/' 

almost works, but it adds the extra comma after the first comma it finds on each line, which is not what I'm after. For example:

"2004 Universe",,2004,"2,070.1","1,613.4",90.4,366.2,197.0,131.8,370.5,295.5,960.8,756.1,541.9,430.0 
,,2005,"2,155.3","1,682.0",84.0,389.3,203.8,126.6,353.9,278.7,"1,039.0",826.8,558.6,450.0 

Can anyone help?

+0

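Where should the comma be added? At the start of the line, at the end, or somewhere else? Ideally, you should both describe where the comma goes and show the desired output for the sample input. –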

+0

@JonathanLeffler: The last example implies that the comma should be _prepended_ to the line; I have updated the title accordingly. – mklement0

+1

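@mklement0 - Agreed that everything implies 'prepended', but the question shouldn't really imply it; it should state it. The trouble with the last example is that it has a comma after the string field, so it is actually an example of the last 'sed' not working as desired, rather than of what is actually wanted. I agree the inference isn't hard, but the best questions don't leave those trying to answer to infer anything (at least about the desired output). –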

Answers

2

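Running sed -i 's/^,/,&/' file will insert an additional comma on lines that start with a comma. The extended regular expression option -r is not needed.
The circumflex (^) indicates the start of the line.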
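
As a quick check of the claim that -r is not needed for this pattern, the following sketch runs the same substitution with basic and with extended regular expressions. The two input lines are modeled on the question's data but shortened, the variable names are only illustrative, and it assumes a sed that accepts -E (GNU and BSD sed both do).

```shell
# Two sample lines modeled on the question's data (shortened for illustration).
input='"1967 Universe",1967,"1,141.0",650.6
,1968,"1,353.4",694.7'

# Basic regular expressions (sed's default).
bre_out=$(printf '%s\n' "$input" | sed 's/^,/,&/')

# Extended regular expressions; -E is assumed to be available here.
ere_out=$(printf '%s\n' "$input" | sed -E 's/^,/,&/')

printf '%s\n' "$bre_out"
# "1967 Universe",1967,"1,141.0",650.6
# ,,1968,"1,353.4",694.7
```

Both variables end up with the same text, because the pattern ^, uses no syntax that differs between the two regex flavors.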

1

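For a positive formulation of the problem (prepend , to all lines that start with ,) see karakfa's answer.

To prepend , to all lines that do not start with ", try the following: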

sed -i -r 's/^[^"]/,&/' file.csv 
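  • ^[^"] matches any single character that is not a " ([^"]) at the start of the line (^) and replaces it with , followed by the string matched by the regex (&), i.e. the line's first character (which in your sample input is also a ,).

    • Note the potentially confusing different uses of ^: on the one hand, right after the opening delimiter /, as the start-of-line anchor, and on the other, as the negation operator when it is the first character inside a bracket expression (character set) ([^...]).

    • Note that the nonstandard option -r activates support for EREs (extended regular expressions), whereas sed by default expects BREs (basic regular expressions).
      The above works equally well without -r, but if using EREs is an option, they are preferable, because they offer more features and their syntax is more like that of regular expressions in other languages.

  • Lines that do not match, i.e. any lines that start with ", are passed through as is.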
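
The behavior described in the bullets above can be tried out without touching a file. This is a minimal sketch: the input lines are shortened versions of the question's data, the third line (starting with a digit) is a hypothetical case added for illustration, and the variable names are assumptions.

```shell
# Sample lines modeled on the question's data (shortened); the third line,
# which starts with a digit, is a hypothetical case not in the original file.
input='"1972 Universe",1972,"2,218.9"
,1973,"1,819.5"
1974,"1,074.4"'

# Prepend a comma to every line whose first character is not a double quote.
# & in the replacement stands for the matched first character.
result=$(printf '%s\n' "$input" | sed 's/^[^"]/,&/')

printf '%s\n' "$result"
# "1972 Universe",1972,"2,218.9"
# ,,1973,"1,819.5"
# ,1974,"1,074.4"
```

An empty line would also pass through unchanged, since [^"] has to match one actual character.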

As for what you tried:

sed -ri 's/[^,]+/,/' file.csv 

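matches one or more (+) characters other than , anywhere on the line, because the bracket expression (character set) [^,] is not anchored to the start of the string (with a ^ right after the /).

Beyond that, your replacement string does not incorporate the matched string, so it replaces the match with , rather than prepending to it.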

sed 's/\,/\,,/' 

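again lacks a regex anchored at the start of the string, and will therefore replace the first , anywhere on the line. (Also, the \ is not needed in this case.) Anchoring the regex to the start of the line avoids this:

sed -i 's/^,/,&/' file 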
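
To make the difference concrete, the sketch below feeds the same two lines to both unanchored attempts and to the anchored command. The input is shortened from the question's second example, the variable names are illustrative, and -E is used in place of -r on the assumption that the local sed accepts it (GNU and BSD sed do).

```shell
# Two lines modeled on the question's data (shortened for illustration).
input='"2004 Universe",2004,"2,070.1"
,2005,"2,155.3"'

# Attempt 1: replaces the first run of non-comma characters with a comma,
# so the matched field itself is lost.
attempt1=$(printf '%s\n' "$input" | sed -E 's/[^,]+/,/')
# ,,2004,"2,070.1"
# ,,,"2,155.3"

# Attempt 2: doubles the first comma wherever it occurs on the line.
attempt2=$(printf '%s\n' "$input" | sed 's/,/,,/')
# "2004 Universe",,2004,"2,070.1"
# ,,2005,"2,155.3"

# Anchored: only a comma at the very start of the line is doubled.
anchored=$(printf '%s\n' "$input" | sed 's/^,/,&/')
# "2004 Universe",2004,"2,070.1"
# ,,2005,"2,155.3"

printf '%s\n' "$anchored"
```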

+1

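Thanks for your answer mklement0, you have helped me understand why I was running into problems and how to fix them! I agree that, compared with what I was ultimately trying to do, the positive formulation of the problem is probably the best approach. I will also do some more reading on regular expressions so that I can better understand matching and anchoring! I tried to accept both answers as the solution, but it would only let me pick one :\ – Lazarix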

0

file.csv:

"1967 Universe",1967,"1,141.0",650.6,73.0,417.5,222.6,119.6,309.8,176.0,390.8,225.0,217.8,130.0 
,1968,"1,353.4",694.7,84.3,574.4,234.8,119.2,350.1,182.1,477.3,233.9,291.1,159.5 
,1969,"1,322.3",624.8,85.2,612.4,215.8,104.7,317.0,149.9,470.5,215.6,319.0,154.6 
,1970,"1,351.5",646.8,88.1,616.7,218.3,93.1,287.4,148.1,502.9,246.5,342.9,159.1 
,1971,"1,924.6",906.1,132.9,885.7,303.6,127.3,421.1,208.1,725.4,338.6,474.6,232.1 
"1972 Universe",1972,"2,218.9","1,033.1",148.6,"1,037.2",333.3,147.3,440.8,230.1,905.4,391.5,539.3,264.3 
,1973,"1,819.5",882.1,117.0,820.5,271.9,141.6,361.4,197.5,763.2,323.9,423.1,219.0 

Usage:

$ sed 's/^,/,,/' file.csv 

Output:

"1967 Universe",1967,"1,141.0",650.6,73.0,417.5,222.6,119.6,309.8,176.0,390.8,225.0,217.8,130.0 
,,1968,"1,353.4",694.7,84.3,574.4,234.8,119.2,350.1,182.1,477.3,233.9,291.1,159.5 
,,1969,"1,322.3",624.8,85.2,612.4,215.8,104.7,317.0,149.9,470.5,215.6,319.0,154.6 
,,1970,"1,351.5",646.8,88.1,616.7,218.3,93.1,287.4,148.1,502.9,246.5,342.9,159.1 
,,1971,"1,924.6",906.1,132.9,885.7,303.6,127.3,421.1,208.1,725.4,338.6,474.6,232.1 
"1972 Universe",1972,"2,218.9","1,033.1",148.6,"1,037.2",333.3,147.3,440.8,230.1,905.4,391.5,539.3,264.3 
,,1973,"1,819.5",882.1,117.0,820.5,271.9,141.6,361.4,197.5,763.2,323.9,423.1,219.0 