2011-05-05 36 views
2

Problem with string overflow with strtok

I have a file of data:

C0001|H|Espresso Classics|The traditional espresso favourites. 
C0002|H|Espresso Espresions|Delicious blend of espresso, milk, and luscious flavours. 
C0003|H|Tea & Cocoa|Gloria Jean's uses only the finest cocoas and single garden teas. Always satisfying. 
C0004|C|Iced Chocolate|Gloria Jean's version of a traditional favourite. 
C0005|C|Mocha Chillers|An icy blend of chocolate, coffee, milk and delicious mix-ins. 
C0006|C|Espresso Chillers|A creamy blend of fresh espresso, chocolate, milk, ice, and flavours. 
C0007|C|On Ice|Cool refreshing Gloria Jean's creations over ice. 

and the following code to tokenize it:

#define MAX_CAT_TOK 4
#define DATA_DELIM "|"
char *token[100];

for(i = 0; i < MAX_CAT_TOK; i++)
{
    if(i == 0) token[i] = strtok(input, DATA_DELIM);
    else token[i] = strtok(NULL, DATA_DELIM);
    printf("%s\n", token[i]);
}

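The problem is that when a string is printed after a longer string, data from the longer string is printed at the end of the shorter one. I assume this has something to do with string termination?

Does anyone see what I am doing wrong here?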

+2

What is in DATA_DELIM? '#define DATA_DELIM "|"' or '#define DATA_DELIM "|\n"' or something else? Could you upgrade your code to a fully compilable program that reads from standard input? – 2011-05-05 22:44:47

+0

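In addition to Jonathan's suggestion (though less important): paste in the expected and actual output rather than trying to describe the output. – 2011-05-05 23:40:08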

Answers

0

Here is my working code:

#include <string.h> 
#include <stdio.h> 

//#define DATA_DELIM "|" 
#define DATA_DELIM "|\n" 

int main(void) 
{ 
    enum { LINE_LENGTH = 4096 }; 
    char input[LINE_LENGTH]; 
#define MAX_CAT_TOK 4 
    char *token[100]; 

    while (fgets(input, sizeof(input), stdin) != 0)
    {
        printf("Input: %s", input);
        for (int i = 0; i < MAX_CAT_TOK; i++)
        {
            if (i == 0)
                token[i] = strtok(input, DATA_DELIM);
            else
                token[i] = strtok(NULL, DATA_DELIM);
            printf("%d: %s\n", i, token[i] != 0 ? token[i] : "<<NULL POINTER>>");
        }
    }
    return 0; 
} 

On the given data, I get:

Input: C0001|H|Espresso Classics|The traditional espresso favourites. 
0: C0001 
1: H 
2: Espresso Classics 
3: The traditional espresso favourites. 
Input: C0002|H|Espresso Espresions|Delicious blend of espresso, milk, and luscious flavours. 
0: C0002 
1: H 
2: Espresso Espresions 
3: Delicious blend of espresso, milk, and luscious flavours. 
Input: C0003|H|Tea & Cocoa|Gloria Jean's uses only the finest cocoas and single garden teas. Always satisfying. 
0: C0003 
1: H 
2: Tea & Cocoa 
3: Gloria Jean's uses only the finest cocoas and single garden teas. Always satisfying. 
Input: C0004|C|Iced Chocolate|Gloria Jean's version of a traditional favourite. 
0: C0004 
1: C 
2: Iced Chocolate 
3: Gloria Jean's version of a traditional favourite. 
Input: C0005|C|Mocha Chillers|An icy blend of chocolate, coffee, milk and delicious mix-ins. 
0: C0005 
1: C 
2: Mocha Chillers 
3: An icy blend of chocolate, coffee, milk and delicious mix-ins. 
Input: C0006|C|Espresso Chillers|A creamy blend of fresh espresso, chocolate, milk, ice, and flavours. 
0: C0006 
1: C 
2: Espresso Chillers 
3: A creamy blend of fresh espresso, chocolate, milk, ice, and flavours. 
Input: C0007|C|On Ice|Cool refreshing Gloria Jean's creations over ice. 
0: C0007 
1: C 
2: On Ice 
3: Cool refreshing Gloria Jean's creations over ice. 

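With the single-character delimiter string, I get an extra newline after each line numbered 3, because the newline that fgets() keeps stays attached to the last token.

This looks pretty much like what you wanted. So either there is a problem with your input (did you echo it as you read it?), or you have managed to find a flaky implementation of strtok(), or you are on Windows, the data lines have carriage returns as well as newlines, and you are seeing misleading output because of the stray carriage returns.

Of these, I suspect the last (Windows and stray carriage returns) is the most likely, though I was unable to reproduce the problem even with a DOS-format data file (testing on MacOS X 10.6.7 with GCC 4.6.0).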

+0

Taking this approach solved my problem, thanks. – Chris 2011-05-07 02:39:38

2

I don't see anything wrong.

I took the liberty of writing a compilable version of your code and put it on ideone. Compare it with your version...

#include <stdio.h> 
#include <string.h> 

int main(void) { 
    int i, j; 
    char *token[100]; 
    char *input; 
    char inputs[7][300] = { 
    "C0001|H|Espresso Classics|The traditional espresso favourites.", 
    "C0002|H|Espresso Espresions|Delicious blend of espresso, milk, and luscious flavours.", 
    "C0003|H|Tea & Cocoa|Gloria Jean's uses only the finest cocoas and single garden teas. Always satisfying.", 
    "C0004|C|Iced Chocolate|Gloria Jean's version of a traditional favourite.", 
    "C0005|C|Mocha Chillers|An icy blend of chocolate, coffee, milk and delicious mix-ins.", 
    "C0006|C|Espresso Chillers|A creamy blend of fresh espresso, chocolate, milk, ice, and flavours.", 
    "C0007|C|On Ice|Cool refreshing Gloria Jean's creations over ice.", 
    }; 

    for (j = 0; j < 7; j++) {
        input = inputs[j];
        for (i = 0; i < 4; i++) {
            if (i == 0) {
                token[i] = strtok(input, "|");
            } else {
                token[i] = strtok(NULL, "|");
            }
            printf("%s\n", token[i]);
        }
    }
    return 0; 
} 
+0

This would be useful if he were populating the tokens correctly. The code we need to see is missing from the OP. – mattnz 2011-05-05 23:22:02

3

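It sounds like what is happening is that your buffer input is not being properly null terminated. If, perhaps, it is initially all zeros, then the first line processed would be fine. If a longer input is then stored in it, that would still be fine. But when an entry is stored that is shorter than the previous one (for example, the fourth line in your example), it can cause problems if it is not null terminated.

For example, if the new data were copied in via memcpy and did not include the null terminating character, then tokenizing the 4th item of that line would include the previous data.

If that is the case, the solution is to make sure input is properly null terminated.

The following attempts to demonstrate what I mean: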

strcpy(input, "a|b|c|some long data"); 
tokenize(input); // where tokenize is the logic shown in the OP calling strtok 
// note the use of memcpy here rather than strcpy to show the idea 
// and also note that it copies exactly 11 characters (doesn't include the null) 
memcpy(input, "1|2|3|short", 11); 
tokenize(input); 

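In the contrived example above, the 4th item in the second tokenization is: shortlong data

Edit: In other words, the problem does not appear to be in the code shown in the OP. The problem is in how input is populated. If you add a printf before the for loop to show the actual data being parsed, you will probably find that it is not properly null terminated. Line 4 will likely show that it contains remnants of the previous line: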

printf("%s\n", input); 
+0

This is the exact problem I was having. I am using strtok; doesn't it add the null character automatically? – Chris 2011-05-06 07:09:15

+0

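@Chris: Yes, strtok adds a null *where it finds a delimiter*. In your case, it sounds like the string is not properly terminated *before* you tokenize it. When you put the string into 'input', make sure it has a null at the end. – 2011-05-06 12:08:01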