2017-04-04

C: parsing comma-separated values that contain newlines

I have a CSV data file with the following data:

H1,H2,H3 
a,"b 
c 
d",e 

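When I open it in Excel as a CSV file, it displays a sheet with the column headers H1, H2, H3 and the column values: a for H1,

the multi-line value
b
c
d
for H2,

and e for H3. I need to parse the file with a C program and have the values picked up the same way. However, my code below does not work, because one of the columns has a multi-line value: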

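#include <stdio.h>
#include <string.h>

int main(void) {
    char buff[200];
    char tokens[10][30];
    fgets(buff, 200, stdin);          /* reads only ONE physical line */
    char *ptok = buff;                /* for iterating */
    char *pch;
    int i = 0;
    /* Splits at every comma; quotes are not treated specially. */
    while ((pch = strchr(ptok, ',')) != NULL) {
        *pch = 0;
        strcpy(tokens[i++], ptok);
        ptok = pch + 1;
    }
    strcpy(tokens[i++], ptok);        /* last field keeps fgets' '\n' */
    return 0;
}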

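How can I modify this code snippet to handle multi-line values in a column? Please don't worry about the hard-coded string buffer sizes; this is test code for a proof of concept. Rather than use any third-party library, I want to do this the hard way, from first principles. Please help.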


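Parsing a CSV file *looks* simple, but there are many corner and special cases that are hard to remember, or hard to handle. For example, what if a multi-line string contains a comma? Try to find a library that handles it for you.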


For starters, you should look at how to make your code read multiple lines, and make 'buff' an arbitrary size instead of limiting it to 199 characters.


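Please don't worry about the hard-coded string buffer sizes; this is proof-of-concept test code. Rather than use any third-party library, I want to do it the hard way, from first principles.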

Answer


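The main complication in parsing "well-formed" CSV in C is precisely the handling of variable-length strings and arrays, which you are avoiding by using fixed-length strings and arrays. (The other complication is handling CSV that is not well-formed.)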

Without those complications, parsing is quite simple:

(untested)

#include <stdbool.h>
#include <stdio.h>

/* Returned for an unterminated quoted field; must differ from every
 * character value and from EOF. */
#define ERROR (EOF - 1)

/* Assumed, not shown: complete definitions of struct String, struct Row
 * and struct Table (each growing as needed, and valid when initialized
 * to {0}), plus these helpers:
 *   void stringClear(struct String*);
 *   void stringAppend(struct String*, int ch);
 *   void rowClear(struct Row*);
 *   void rowAppend(struct Row*, const struct String*);   copies the field
 *   void tableClear(struct Table*);
 *   void tableAppend(struct Table*, const struct Row*);  copies the row
 */

/* Appends a non-quoted field to s and returns the delimiter */
int readSimpleField(struct String* s) {
    for (;;) {
        int ch = getchar();
        if (ch == ',' || ch == '\n' || ch == EOF) return ch;
        stringAppend(s, ch);
    }
}

/* Appends a quoted field to s and returns the delimiter.
 * Assumes the open quote has already been read.
 * If the field is not terminated, returns ERROR, which
 * should be a value different from any character or EOF.
 * The delimiter returned is the character after the closing quote
 * (or EOF), which may not be a valid delimiter. Caller should check.
 */
int readQuotedField(struct String* s) {
    for (;;) {
        int ch = getchar();
        if (ch == EOF) return ERROR;
        if (ch == '"') {
            ch = getchar();
            if (ch != '"') return ch;   /* closing quote: ch is the delimiter */
            /* "" stands for one literal quote, appended below */
        }
        stringAppend(s, ch);            /* commas and newlines are kept */
    }
}

/* Reads a single field into s and returns the following delimiter,
 * which might be invalid.
 */
int readField(struct String* s) {
    stringClear(s);
    int ch = getchar();
    if (ch == '"') return readQuotedField(s);
    if (ch == ',' || ch == '\n' || ch == EOF) return ch;  /* empty field */
    stringAppend(s, ch);
    return readSimpleField(s);
}

/* Reads a single row into row and returns the following delimiter,
 * which might be invalid.
 */
int readRow(struct Row* row) {
    struct String field = {0};
    rowClear(row);
    /* Make sure there is at least one field */
    int ch = getchar();
    if (ch != '\n' && ch != EOF) {
        ungetc(ch, stdin);
        do {
            ch = readField(&field);
            rowAppend(row, &field);
        } while (ch == ',');
    }
    return ch;   /* (releasing field's storage is omitted) */
}

/* Reads an entire CSV file into table.
 * Returns true if the parse was successful.
 * If an error is encountered, returns false. If the end-of-file
 * indicator is set, the error was an unterminated quoted field;
 * otherwise, the next character read will be the one which
 * triggered the error.
 */
bool readCSV(struct Table* table) {
    struct Row row = {0};
    tableClear(table);
    for (;;) {
        /* Stop cleanly at EOF, including right after a final newline */
        int ch = getchar();
        if (ch == EOF) return true;
        ungetc(ch, stdin);
        ch = readRow(&row);
        tableAppend(table, &row);
        if (ch == EOF) return true;
        if (ch != '\n') {
            if (ch != ERROR) ungetc(ch, stdin);  /* push back the offender */
            return false;
        }
    }
}

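The above is "from first principles": it doesn't even use the standard C library string functions. But it takes some effort to understand and verify. Personally, I would use (f)lex, and possibly even yacc/bison (although that is a bit of overkill), to simplify the code and make the expected syntax more obvious. But handling variable-length structures in C would still be the first step.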