Looping a parser

I want to know how to make the parser I wrote loop. I have several text files and I don't know how to do it. Here is the code.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>   /* strstr, strlen, memcpy are declared here, not in <strings.h> */

int parse(char **argv)
{
    /* code that converts a text file to a string called file_contents */

    char *target = NULL;
    char *target2 = NULL;
    char *start, *end;

    const char *tag1 = "<item>";
    const char *tag2 = "</item>";

    if ((start = strstr(file_contents, tag1)) != NULL)
    {
        start += strlen(tag1);
        if ((end = strstr(start, tag2)) != NULL)
        {
            /* copy the body of the first <item> into target */
            target = malloc(end - start + 1);
            memcpy(target, start, end - start);
            target[end - start] = '\0';
        }

        const char *tag3 = "<title>";
        const char *tag4 = "</title>";

        /* only search the copy if it was actually made */
        if (target != NULL && (start = strstr(target, tag3)) != NULL)
        {
            start += strlen(tag3);
            if ((end = strstr(start, tag4)) != NULL)
            {
                target2 = malloc(end - start + 1);
                memcpy(target2, start, end - start);
                target2[end - start] = '\0';
                printf("%s\n", target2);
                free(target2);
            }
        }

        /* same code for other tags */
    }

    free(target);

    return 2;
}

Here is a sample of the text.

<item> 
    <title>blah blah</title> 
    <otherTags>blah blah</otherTags> 
</item> <item> 
    <title>blah blah</title> 
    <otherTags>blah blah</otherTags> 
</item> <item> 
    <title>blah blah</title> 
    <otherTags>blah blah</otherTags> 
</item>

My code only parses the first item. I'm a beginner, so please guide me. Thanks.


2015-05-03 estudyante

Well, for one, your title says "loop". That pretty much covers it... a *loop*. Try one? – WhozCraig

I don't know where to put it :( or what the condition should be – estudyante

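I hope, for your own sake, that you're writing this parser as a learning exercise, because parsers for tagged files like yours already exist. This is nothing more than simplified XML, which means almost any XML parser should be able to handle it –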

It looks like all you need to do is change your if to a while and keep moving the pointer through the string as you go. I believe changing

if(start = strstr(file_contents, tag1))

to

start = file_contents; 
while(start = strstr(start, tag1))

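will give you the behavior you want (assuming the rest of the code works). The loop keeps running as long as strstr returns non-NULL on the remaining string (the part starting at start).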

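As I mentioned in my comment, I'd also suggest looking into recursive parsing if you're interested; it seems like it would work well for your case (disclaimer: I'm not a parser expert). Other than that, your code looks good, especially for a self-described newbie!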

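Edit: It looks like your code needs a bit of rework before it will loop the way I suggested. You should avoid copying the string and instead walk through it in a "nested" way. Just rearrange your if statements: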

//These really should be static or #define'd, but that's another post
const char *tag1 = "<item>";
const char *tag2 = "</item>";
const char *tag3 = "<title>";
const char *tag4 = "</title>";

if ((start = strstr(file_contents, tag1)) != NULL)
{
    start += strlen(tag1);
    char *item_end = strstr(start, tag2); //Find end of <item> up front
    char *p = strstr(start, tag3);        //Search the file string, not a copy

    //Only accept a <title> that belongs to this item
    if (p != NULL && (item_end == NULL || p < item_end))
    {
        p += strlen(tag3);
        if ((end = strstr(p, tag4)) != NULL)
        {
            target2 = malloc(end - p + 1);
            memcpy(target2, p, end - p);
            target2[end - p] = '\0';
            printf("%s\n", target2); //Replacing this with fwrite would be faster
                                     //with no malloc, but another post
            free(target2); //Don't want to leak!
        } //else, maybe return error code
    }

    /* same code for other tags */

    if (item_end != NULL)
        start = item_end + strlen(tag2); //Goto remaining string
}

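If this works, then the change I mentioned earlier should make it loop correctly. If you want to stick with your own approach, you need some other way to keep track of the rest of your string (the strcpy you mentioned in your comment might work, but it would add a lot of overhead).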


2015-05-03 13:25:55 sabreitweiser

It turned into an infinite loop. Should I copy 'end' into 'file_contents'? Or is that a bad idea? I'm thinking of adding 'strcpy(file_contents, end);'... – estudyante

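I was a bit mistaken about how you're walking through the input. It looks like you're searching for the '<item>' and '</item>' tags, copying whatever is between them, and then looking for the embedded tags in your new copy. That not only means my code is wrong, but I'm also not sure it's a good strategy. In my opinion, you should find the '<item>' tag, use it as the starting point for finding the embedded tags, then find each embedded closing tag (printing what's between them), and then find your '</item>' tag; all of this within your file string (using 'start' to keep track). – sabreitweiser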

I'll edit my answer to show you what I mean. – sabreitweiser
