2015-11-26 87 views
0

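Convert a comma-separated CSV file to tab-separated using Java

I am trying to convert a comma-separated CSV file into a tab-separated file using Java. However, a few values inside the file contain commas themselves. Please refer to the example below: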

Direct - House,Bayer House Advertiser,537121661,,160 x 600,Bayer US Publisher,537121625,Bayer.com,537224178,160x600_MyeBay_US,538146889,2015-11-18,"8,455,844",0,0,0,0.000000,USD,0.000000,0.000000,0.000000 

Direct - House,Bayer House Advertiser,537121661,,160 x 600,Bayer US Publisher,537121625,Bayer.com,537224178,160x600_Search_SLR,538146895,2015-11-18,"20,175,240",30,0,0,0.000000,USD,0.000000,0.000000,0.000000 

Can anyone help me with how to handle these values?

Thanks.

+0

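So, what is the pattern? You mentioned that a few values contain commas. Are these values only numeric? Is each record on a single line, or can it span multiple lines? – Raf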

+0

@Raf: I have updated the records now. There are 2 records above. Also, the values causing the problem are numeric, for example "8,455,844". – user1496783

Answers

2

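I think the best approach is to rely on a pattern that does not change. You mentioned that your problem is with numbers that use a comma as the thousands separator. I see in your post that these numbers are enclosed in double quotes. Based on the following assumptions:

  1. The numbers are enclosed in double quotes
  2. There is only one such number on each line (if there is more than one, find all pairs of double quotes, store their indexes in an array or list, and check that a comma's index does not fall within the range of any pair)

Then what you do is the following:

  1. Get the first index of a double quote, e.g. 154
  2. Get the second/last index of a double quote, e.g. 159
  3. Replace every comma with \t, provided that the comma's index is less than the index of the first double quote or greater than the index of the last double quote (this skips the commas inside the number)

The code below does exactly that for you: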

import java.io.BufferedReader; 
import java.io.File; 
import java.io.FileReader; 
import java.io.PrintWriter; 
import java.util.ArrayList; 
import java.util.List; 

public class CsvToTabConvertor { 
    public static void main(String[] args) { 
     File file = new File("C:\\test_java\\csvtotab.txt"); 
     List<String> processedLines = new ArrayList<String>(); 

     //try-with-resources closes the reader and the writer even if an exception is thrown 
     try (BufferedReader br = new BufferedReader(new FileReader(file))) { 
      String line; 
      while((line=br.readLine()) != null) { 
       StringBuilder builder = new StringBuilder(line); 

       //find the number in double quotes - assuming there is only one quoted number per line 
       //(both indexes are -1 when the line has no quotes, so every comma is replaced) 
       int doubleQuoteIndexStart = builder.indexOf("\""); 
       int doubleQuoteIndexLast = builder.lastIndexOf("\""); 

       //for each line, visit every comma 
       int index = builder.indexOf(","); 

       while (index >= 0) { 
        //replace only commas outside the quoted number; 
        //consecutive commas (empty fields) become consecutive tabs 
        if(index < doubleQuoteIndexStart || index > doubleQuoteIndexLast) { 
         builder.setCharAt(index, '\t'); 
        } 

        //get next index of comma 
        index = builder.indexOf(",", index + 1); 
       } 

       //add the line to list of lines for later storage to file 
       processedLines.add(builder.toString()); 
      } 

      //write all the lines to the file 
      File outFile = new File("C:\\test_java\\csvtotab_processed.txt"); 
      try (PrintWriter writer = new PrintWriter(outFile)) { 
       for(String processedLine : processedLines) { 
        writer.println(processedLine); 
       } 
      } 
     } catch(Exception ex) { 
      //report the failure instead of silently swallowing it 
      ex.printStackTrace(); 
     } 
    } 
} 

The input file contains the following lines:

Direct - House,eBay House Advertiser,537121661,,160 x 600,eBay US Publisher,537121625,eBay.com,537224178,160x600_MyeBay_US,538146889,2015-11-18,"8,455,844",0,0,0,0.000000,USD,0.000000,0.000000,0.000000 
Direct - House,eBay House Advertiser,537121661,,160 x 600,eBay US Publisher,537121625,eBay.com,537224178,160x600_Search_SLR,538146895,2015-11-18,"20,175,240",30,0,0,0.000000,USD,0.000000,0.000000,0.000000 

The processed output file is as follows:

Direct - House eBay House Advertiser 537121661  160 x 600 eBay US Publisher 537121625 eBay.com 537224178 160x600_MyeBay_US 538146889 2015-11-18 "8,455,844" 0 0 0 0.000000 USD 0.000000 0.000000 0.000000 
Direct - House eBay House Advertiser 537121661  160 x 600 eBay US Publisher 537121625 eBay.com 537224178 160x600_Search_SLR 538146895 2015-11-18 "20,175,240" 30 0 0 0.000000 USD 0.000000 0.000000 0.000000 

Modify the code above and its logic further to suit your needs.
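
As one possible modification for lines that contain more than one quoted value, the sketch below tracks whether the current character is inside double quotes instead of storing the index of every quote pair. It is a swapped-in technique, not the code from the answer: the class name `QuoteAwareCsvToTab` and the method `toTabSeparated` are hypothetical, and it assumes that quotes are balanced and that no quoted value contains a line break.

```java
public class QuoteAwareCsvToTab {

    // Hypothetical helper (not part of the answer above): replaces every comma
    // that lies outside double quotes with a tab. Quotes are kept as they are.
    public static String toTabSeparated(String line) {
        StringBuilder out = new StringBuilder(line.length());
        boolean insideQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                // Each double quote opens or closes a quoted value.
                insideQuotes = !insideQuotes;
                out.append(c);
            } else if (c == ',' && !insideQuotes) {
                // Field separator: convert it to a tab.
                out.append('\t');
            } else {
                // Ordinary character, or a comma inside a quoted value.
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed sample line with two quoted numbers and one empty field.
        String line = "2015-11-18,\"8,455,844\",,\"20,175,240\",USD";
        System.out.println(toTabSeparated(line));
    }
}
```

Calling `toTabSeparated` on each line read by the `BufferedReader` would replace the index-based loop. For files with escaped quotes or multi-line values, a dedicated CSV parsing library is probably the safer choice.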

+0

Thanks a lot @Raf :) – user1496783

+0

@user1496783 You're welcome. If the answer helped solve your problem, you can choose to mark it as **accepted**; you can read more here: http://stackoverflow.com/help/accepted-answer – Raf

+0

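Just used this for my own problem and it worked great! Cheers @Raf. I wanted to replace commas with tabs and then full stops with commas, so I just copied and pasted the code and ran it twice in the same main file. The output of the first pass is the input of the second. Not the most elegant code, but it did the job. –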
