There is no formal standard for the CSV format, but let us note at the outset that the ugly column you cited:
"abc, defghijk. "Lmnopqrs, "tuv,"" wxyz.",
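does not conform to what are regarded as the Basic Rules of CSV, since two of those are:
1) Fields with embedded commas must be enclosed in double quotes.
2) Each embedded double-quote character must be represented by a pair of double-quote characters.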
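If the problem column obeys rule 1), then it does not obey rule 2). But we can construe it so that it obeys rule 1), and so say where it ends, if we balance the double quotes, e.g.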
[abc, defghijk. [Lmnopqrs, ]tuv,[] wxyz.],
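The balanced outermost quotes enclose the column. The balanced inner quotes may simply lack any internal indication, other than the fact that they can be balanced, that marks them as internal.
We would like a rule that parses this text as one column, consistently with rule 1), and that also parses columns that do obey rule 2). The balancing just shown suggests this can be done, because a column that obeys both rules must also be balanceable.
The suggested rule is:
- A column extends to the first comma that is either preceded by 0 double quotes or immediately preceded by the last of an even number of double quotes.
If there is an even number of double quotes up to the comma, then we know we can balance the enclosing quotes and balance the rest in at least one way.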
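The simpler rule that you are considering:
After running into the quote, I should read the quoted junk character by character until I find ", in sequence
will fail if it encounters certain columns that do obey rule 2), such as
"Super, ""luxurious"", truck",
The simpler rule will terminate the column after ""luxurious"", but since this column conforms to rule 2), adjacent double quotes are "escaping" double quotes, with no delimiting significance. The suggested rule, on the other hand, still parses the column correctly, terminating it after truck",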
Here is a demo program in which the function get_csv_column parses a column by the suggested rule:
#include <iostream>
#include <fstream>
#include <string>
#include <cstdlib>
using namespace std;
/*
    Assume `in` is positioned at start of column.
    Accumulates chars from `in` as long as `in` is good
    until either:-
        - Have consumed a comma preceded by 0 quotes,or
        - Have consumed a comma immediately preceded by
        the last of an even number of quotes.
*/
std::string get_csv_column(ifstream & in)
{
    std::string col;
    unsigned quotes = 0;  // double quotes seen so far in this column
    char prev = 0;        // previously consumed character
    bool finis = false;
    for (int ch; !finis && (ch = in.get()) != EOF;) {
        switch(ch) {
        case '"':
            ++quotes;
            break;
        case ',':
            // Terminate on a comma with no quotes before it, or one that
            // directly follows the last of an even number of quotes.
            if (quotes == 0 || (prev == '"' && (quotes & 1) == 0)) {
                finis = true;
            }
            break;
        default:;
        }
        // Keep every consumed character, including the terminal comma.
        col += prev = ch;
    }
    return col;
}
int main()
{
    ifstream in("csv.txt");
    if (!in) {
        cout << "Open error :(" << endl;
        exit(EXIT_FAILURE);
    }
    for (std::string col; in;) {
        col = get_csv_column(in);
        cout << "<[" << col << "]>" << std::endl;
    }
    // Hitting end-of-file is expected; any other failure is a read error.
    if (!in && !in.eof()) {
        cout << "Read error :(" << endl;
        exit(EXIT_FAILURE);
    }
    exit(EXIT_SUCCESS);
}
It encloses each column in <[...]>, does not discount newlines, and includes the terminal ',' with each column.
The file csv.txt is:
...,"abc, defghijk. "Lmnopqrs, "tuv,"" wxyz.",...,
",","",
Year,Make,Model,Description,Price,
1997,Ford,E350,"Super, ""luxurious"", truck",
1997,Ford,E350,"Super, ""luxurious"" truck",
1997,Ford,E350,"ac, abs, moon",3000.00,
1999,Chevy,"Venture ""Extended Edition""","",4900.00,
1999,Chevy,"Venture ""Extended Edition, Very Large""",,5000.00,
1996,Jeep,Grand Cherokee,"MUST SELL!
air, moon roof, loaded",4799.00,
The output is:
<[...,]>
<["abc, defghijk. "Lmnopqrs, "tuv,"" wxyz.",]>
<[...,]>
<[
",",]>
<["",]>
<[
Year,]>
<[Make,]>
<[Model,]>
<[Description,]>
<[Price,]>
<[
1997,]>
<[Ford,]>
<[E350,]>
<["Super, ""luxurious"", truck",]>
<[
1997,]>
<[Ford,]>
<[E350,]>
<["Super, ""luxurious"" truck",]>
<[
1997,]>
<[Ford,]>
<[E350,]>
<["ac, abs, moon",]>
<[3000.00,]>
<[
1999,]>
<[Chevy,]>
<["Venture ""Extended Edition""",]>
<["",]>
<[4900.00,]>
<[
1999,]>
<[Chevy,]>
<["Venture ""Extended Edition, Very Large""",]>
<[,]>
<[5000.00,]>
<[
1996,]>
<[Jeep,]>
<[Grand Cherokee,]>
<["MUST SELL!
air, moon roof, loaded",]>
<[4799.00]>
Related (near duplicate): http://stackoverflow.com/a/1603175/179910