Problem with string overflow when using strtok

I have a file containing this data:

C0001|H|Espresso Classics|The traditional espresso favourites. 
C0002|H|Espresso Espresions|Delicious blend of espresso, milk, and luscious flavours. 
C0003|H|Tea & Cocoa|Gloria Jean's uses only the finest cocoas and single garden teas. Always satisfying. 
C0004|C|Iced Chocolate|Gloria Jean's version of a traditional favourite. 
C0005|C|Mocha Chillers|An icy blend of chocolate, coffee, milk and delicious mix-ins. 
C0006|C|Espresso Chillers|A creamy blend of fresh espresso, chocolate, milk, ice, and flavours. 
C0007|C|On Ice|Cool refreshing Gloria Jean's creations over ice.

and the following code to tokenize it:

#define MAX_CAT_TOK 4
#define DATA_DELIM "|"
char *token[100];

for(i = 0; i < MAX_CAT_TOK; i++)
{
    if(i == 0) token[i] = strtok(input, DATA_DELIM);
    else token[i] = strtok(NULL, DATA_DELIM);
    printf("%s\n", token[i]);
}

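The problem is that when a string is printed after a longer string, data from the longer string is printed at the end of the shorter one. I assume this has something to do with string matching?

Can anyone see what I am doing wrong here?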

2011-05-05 Chris

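What is in DATA_DELIM? '#define DATA_DELIM "|"' or '#define DATA_DELIM "|\n"' or something else? Can you upgrade your code to a fully compilable program that reads from standard input? – 2011-05-05 22:44:47

In addition to Jonathan's suggestion (though less important): paste the expected and actual output rather than trying to describe it. – 2011-05-05 23:40:08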

Here is my working code:

#include <string.h>
#include <stdio.h>

//#define DATA_DELIM "|"
#define DATA_DELIM "|\n"

int main(void)
{
    enum { LINE_LENGTH = 4096 };
    char input[LINE_LENGTH];
#define MAX_CAT_TOK 4
    char *token[100];

    while (fgets(input, sizeof(input), stdin) != 0)
    {
        printf("Input: %s", input);
        for (int i = 0; i < MAX_CAT_TOK; i++)
        {
            if (i == 0)
                token[i] = strtok(input, DATA_DELIM);
            else
                token[i] = strtok(NULL, DATA_DELIM);
            printf("%d: %s\n", i, token[i] != 0 ? token[i] : "<<NULL POINTER>>");
        }
    }
    return 0;
}

On the given data, I get:

Input: C0001|H|Espresso Classics|The traditional espresso favourites. 
0: C0001 
1: H 
2: Espresso Classics 
3: The traditional espresso favourites. 
Input: C0002|H|Espresso Espresions|Delicious blend of espresso, milk, and luscious flavours. 
0: C0002 
1: H 
2: Espresso Espresions 
3: Delicious blend of espresso, milk, and luscious flavours. 
Input: C0003|H|Tea & Cocoa|Gloria Jean's uses only the finest cocoas and single garden teas. Always satisfying. 
0: C0003 
1: H 
2: Tea & Cocoa 
3: Gloria Jean's uses only the finest cocoas and single garden teas. Always satisfying. 
Input: C0004|C|Iced Chocolate|Gloria Jean's version of a traditional favourite. 
0: C0004 
1: C 
2: Iced Chocolate 
3: Gloria Jean's version of a traditional favourite. 
Input: C0005|C|Mocha Chillers|An icy blend of chocolate, coffee, milk and delicious mix-ins. 
0: C0005 
1: C 
2: Mocha Chillers 
3: An icy blend of chocolate, coffee, milk and delicious mix-ins. 
Input: C0006|C|Espresso Chillers|A creamy blend of fresh espresso, chocolate, milk, ice, and flavours. 
0: C0006 
1: C 
2: Espresso Chillers 
3: A creamy blend of fresh espresso, chocolate, milk, ice, and flavours. 
Input: C0007|C|On Ice|Cool refreshing Gloria Jean's creations over ice. 
0: C0007 
1: C 
2: On Ice 
3: Cool refreshing Gloria Jean's creations over ice.

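With the single-character delimiter string, I get an extra newline after each line numbered 3.

This looks very much like what you want. So either there is a problem with your input (did you echo it as you read it?), or you have managed to find a flaky implementation of strtok(), or you are on Windows, the data lines have carriage returns as well as newlines, and you are seeing misleading output because of the stray carriage returns.

Of these, I suspect the last (Windows and stray carriage returns) is the most likely, although I could not reproduce the problem even with a DOS-format data file (testing on MacOS X 10.6.7 with GCC 4.6.0).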

2011-05-05 23:51:02

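Taking this approach solved my problem, thanks. – Chris 2011-05-07 02:39:38

I can't see anything wrong.

I took the liberty of writing a compilable version of your code and putting it on ideone. Compare it with your version...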

#include <stdio.h>
#include <string.h>

int main(void) {
    int i, j;
    char *token[100];
    char *input;
    char inputs[7][300] = {
        "C0001|H|Espresso Classics|The traditional espresso favourites.",
        "C0002|H|Espresso Espresions|Delicious blend of espresso, milk, and luscious flavours.",
        "C0003|H|Tea & Cocoa|Gloria Jean's uses only the finest cocoas and single garden teas. Always satisfying.",
        "C0004|C|Iced Chocolate|Gloria Jean's version of a traditional favourite.",
        "C0005|C|Mocha Chillers|An icy blend of chocolate, coffee, milk and delicious mix-ins.",
        "C0006|C|Espresso Chillers|A creamy blend of fresh espresso, chocolate, milk, ice, and flavours.",
        "C0007|C|On Ice|Cool refreshing Gloria Jean's creations over ice.",
    };

    for (j = 0; j < 7; j++) {
        input = inputs[j];
        for (i = 0; i < 4; i++) {
            if (i == 0) {
                token[i] = strtok(input, "|");
            } else {
                token[i] = strtok(NULL, "|");
            }
            printf("%s\n", token[i]);
        }
    }
    return 0;
}

2011-05-05 22:43:08 pmg

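This would be useful if he were populating the tokens correctly. The code missing from the OP is what we need to see. – mattnz 2011-05-05 23:22:02

It sounds like your buffer input is not properly null-terminated. If it is initially all zeros, then the first line processed will be fine. If a longer input is then stored in it, that will still be fine. But when an entry shorter than the previous one is stored in it (for example, the 4th line in your example), it can cause problems if it is not null-terminated.

For example, if the new data is copied in via memcpy without the null terminating character, then tokenizing the 4th item of that line will include the previous data.

If that is the case, the solution is to make sure input is properly terminated.

The following tries to demonstrate what I mean: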

strcpy(input, "a|b|c|some long data");
tokenize(input);  // where tokenize is the logic shown in the OP calling strtok

// note the use of memcpy here rather than strcpy to show the idea
// and also note that it copies exactly 11 characters (doesn't include the null)
memcpy(input, "1|2|3|short", 11);
tokenize(input);

In the contrived example above, the 4th item in the second tokenization is: shortlong data.

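Edit: In other words, the problem does not appear to be in the code shown in the OP. The problem is in how input is populated. If you add a printf before the for loop to show the actual data being parsed, you will probably find that it is not properly null-terminated. Line 4 will likely show that it contains remnants of the previous line: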

printf("%s\n", input);

2011-05-05 22:58:14

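This is the exact problem I was having. I am using strtok, which adds the null character automatically? – Chris 2011-05-06 07:09:15

@Chris: Yes, strtok adds the null *at the place where it finds a delimiter*. In your case, it sounds like the string was not properly terminated *before* tokenizing. When you put the string into 'input', make sure it has a null at the end. – 2011-05-06 12:08:01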
