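I think the best approach is to rely on the pattern not changing. You mentioned that you have problems with numbers that use a comma as the thousands separator, and I can see in your post that those numbers are enclosed in double quotes. Based on the following assumptions: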
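- The numbers are enclosed in double quotes
- There is only one such number per line (if there can be more than one, find all pairs of double quotes, store their indexes in an array or list, and check that each comma's index does not fall inside any of those ranges)
you can then do the following:
- Get the index of the first double quote, e.g. 154
- Get the index of the second/last double quote, e.g. 159
- Replace each comma with a tab (\t), but only if the comma's index is less than the index of the first double quote or greater than the index of the last double quote (this skips the commas inside the number)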
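The rule in the last step reduces to a single comparison. A minimal sketch, with a hypothetical helper name `isSeparator` and a shortened sample line made up for illustration. Note that when a line has no quotes, `indexOf` and `lastIndexOf` both return -1, so every comma counts as a separator:

```java
public class CommaRule {
    // Hypothetical helper: a comma is a field separator only if it lies before
    // the first double quote or after the last one.
    static boolean isSeparator(int commaIndex, int firstQuote, int lastQuote) {
        return commaIndex < firstQuote || commaIndex > lastQuote;
    }

    public static void main(String[] args) {
        String line = "2015-11-18,\"8,455,844\",0";
        int first = line.indexOf('"');    // 11
        int last = line.lastIndexOf('"'); // 21
        // Walk every comma and report how the rule classifies it:
        // prints 10 -> tab, 13 -> keep, 17 -> keep, 22 -> tab
        for (int i = line.indexOf(','); i >= 0; i = line.indexOf(',', i + 1)) {
            System.out.println(i + " -> " + (isSeparator(i, first, last) ? "tab" : "keep"));
        }
        // No quotes: both indexes are -1, so any comma is a separator (prints true)
        System.out.println(isSeparator(3, -1, -1));
    }
}
```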
The code below does exactly that:
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.List;

public class CsvToTabConvertor {
    public static void main(String[] args) {
        File file = new File("C:\\test_java\\csvtotab.txt");
        File outFile = new File("C:\\test_java\\csvtotab_processed.txt");
        List<String> processedLines = new ArrayList<>();
        // try-with-resources closes the reader even if an exception is thrown
        try (BufferedReader br = new BufferedReader(new FileReader(file))) {
            String line;
            while ((line = br.readLine()) != null) {
                StringBuilder builder = new StringBuilder(line);
                // Find the number in double quotes - assumes at most one quoted number per line.
                // If the line has no quotes, both indexes are -1 and every comma is replaced.
                int doubleQuoteIndexStart = builder.indexOf("\"");
                int doubleQuoteIndexLast = builder.lastIndexOf("\"");
                // Visit every comma in the line
                int index = builder.indexOf(",");
                while (index >= 0) {
                    // Only commas outside the quoted number are field separators
                    if (index < doubleQuoteIndexStart || index > doubleQuoteIndexLast) {
                        builder.setCharAt(index, '\t');
                    }
                    // Consecutive commas (an empty field such as ",,") simply become
                    // consecutive tabs, which keeps the column count unchanged
                    index = builder.indexOf(",", index + 1);
                }
                // Keep the converted line so it can be written to the file later
                processedLines.add(builder.toString());
            }
        } catch (IOException ex) {
            ex.printStackTrace();
            return;
        }
        // Write all converted lines to the output file
        try (PrintWriter writer = new PrintWriter(outFile)) {
            for (String processed : processedLines) {
                writer.println(processed);
            }
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    }
}
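The parenthetical in the assumptions above (more than one quoted number per line) matters because the first/last-quote check treats everything between the first and the last quote as one range. For a line like `537121661,,"8,455,844",30,"20,175,240",USD`, the commas around `30` fall inside that range and would be kept. Below is a sketch of the list-of-ranges variant; the class and method names are hypothetical, the sample line is made up, and it assumes quotes are balanced and never escaped (no `""` inside a value):

```java
import java.util.ArrayList;
import java.util.List;

public class MultiQuoteCsvToTab {
    // Collects {start, end} index pairs for every pair of double quotes in the line.
    static List<int[]> quotedRanges(String line) {
        List<int[]> ranges = new ArrayList<>();
        int start = line.indexOf('"');
        while (start >= 0) {
            int end = line.indexOf('"', start + 1);
            if (end < 0) {
                break; // unbalanced trailing quote: ignore it
            }
            ranges.add(new int[] {start, end});
            start = line.indexOf('"', end + 1);
        }
        return ranges;
    }

    // Replaces every comma that lies outside all quoted ranges with a tab.
    static String convertLine(String line) {
        List<int[]> ranges = quotedRanges(line);
        StringBuilder builder = new StringBuilder(line);
        for (int index = line.indexOf(','); index >= 0; index = line.indexOf(',', index + 1)) {
            boolean insideQuotes = false;
            for (int[] r : ranges) {
                if (index > r[0] && index < r[1]) {
                    insideQuotes = true;
                    break;
                }
            }
            if (!insideQuotes) {
                builder.setCharAt(index, '\t');
            }
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        String converted = convertLine("537121661,,\"8,455,844\",30,\"20,175,240\",USD");
        System.out.println(converted);
        // limit -1 keeps the empty field produced by ",,"; prints 6
        System.out.println(converted.split("\t", -1).length);
    }
}
```

Splitting with `split("\t", -1)` keeps trailing and consecutive empty fields, which makes it easy to check that the column count survived the conversion. For balanced quotes, a single scan with an in-quotes flag that flips at each `"` would give the same result without storing the ranges.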
The input file contains the following lines:
Direct - House,eBay House Advertiser,537121661,,160 x 600,eBay US Publisher,537121625,eBay.com,537224178,160x600_MyeBay_US,538146889,2015-11-18,"8,455,844",0,0,0,0.000000,USD,0.000000,0.000000,0.000000
Direct - House,eBay House Advertiser,537121661,,160 x 600,eBay US Publisher,537121625,eBay.com,537224178,160x600_Search_SLR,538146895,2015-11-18,"20,175,240",30,0,0,0.000000,USD,0.000000,0.000000,0.000000
The processed output file looks like this (the tab characters are displayed as spaces here):
Direct - House eBay House Advertiser 537121661 160 x 600 eBay US Publisher 537121625 eBay.com 537224178 160x600_MyeBay_US 538146889 2015-11-18 "8,455,844" 0 0 0 0.000000 USD 0.000000 0.000000 0.000000
Direct - House eBay House Advertiser 537121661 160 x 600 eBay US Publisher 537121625 eBay.com 537224178 160x600_Search_SLR 538146895 2015-11-18 "20,175,240" 30 0 0 0.000000 USD 0.000000 0.000000 0.000000
Adapt the code above and its logic to fit any further requirements.
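So what is the pattern? You mentioned several values that contain commas; are those values only numbers? Is each record on a single line, or do records span multiple lines? – Raf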
@Raf: I have now updated the records; there are 2 records above. Also, the values causing the problem are numbers, for example "8,455,844". – user1496783
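If the quoted value is later needed as a number rather than text, stripping the quotes and the thousands separators is enough for integers like these. A minimal sketch with a hypothetical helper `parseQuotedNumber`, assuming the field holds a whole number with `,` as the separator (`java.text.NumberFormat.getNumberInstance(Locale.US)` is a locale-aware alternative):

```java
public class QuotedNumber {
    // Hypothetical helper: turns a field such as "8,455,844" (quotes included)
    // into a long by dropping the quotes and the thousands separators.
    static long parseQuotedNumber(String field) {
        String digits = field.replace("\"", "").replace(",", "");
        return Long.parseLong(digits);
    }

    public static void main(String[] args) {
        System.out.println(parseQuotedNumber("\"8,455,844\""));  // 8455844
        System.out.println(parseQuotedNumber("\"20,175,240\"")); // 20175240
    }
}
```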