How do I populate a trie in C?

I am trying to write a program that takes words and builds a trie, where each node of the trie is a struct containing a single character.

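I have a function that parses a char * into words (assume the char * contains only lowercase letters). As each word is extracted from the char *, it is passed to the function addWordOccurrence(const char* word, const int wordLength, struct tNode root). addWordOccurrence() should check whether the first letter of the word is in root.branches[i], as I loop over every possible index of root.branches (0-25, one for each lowercase letter of the alphabet). If the first letter is not in root.branches, a new tNode struct containing that letter is created. The function then moves on to the second letter of the word, comparing it with the branches of the newly created tNode, and so on...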

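The first word we try is "doctor". My trie takes the first letter 'd' and adds it to root.branches[0], then adds 'o' to root.branches[0].branches[0], which is correct. But then it adds the 'd' of "doctor" to the next 17 or so indices of its branches (so root.branches[0].branches[1] through [18]), which should not be the case. Please help!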

struct tNode{ 
    char c; 
    int occurrences; 
    struct tNode *branches; 
}; 

int addWordOccurrence(const char* word, const int wordLength, struct tNode root){
    //declare fields
    int counter, i,k,firstNull;
    counter = 0;
    while(1){
        if(counter >= wordLength){
            break;
        }
        //traverse through the word letter by letter
        for(i=0; i<wordLength; i++){
            //compare each letter to the branches of root until the letter is found or first null space
            for(k=0; k<26; k++){
                //if the letter is a branch already set root to the struct of that letter in branches
                if(root.branches[k].c == word[i]){
                    root = root.branches[k];
                    break;
                }
            }
            //the current letter of the word is not in branches
            //go through branches to find position to add the new tNode
            for(firstNull=0; firstNull<26; firstNull++){
                //set firstNull equal to the index of the first null value in branches
                if(root.branches[firstNull].c < 'a' || root.branches[firstNull].c > 'z'){
                    break;
                }
            }
            //add a new node to branches
            root.branches[firstNull].c = word[i];
            root.branches[firstNull].occurrences = 0;
            root.branches[firstNull].branches = malloc(sizeof(struct tNode) * 26);
            if(counter != wordLength){
                root = root.branches[firstNull];
            }
            counter++;
            if(counter == wordLength-2){
                root.occurrences++;
            }
        }
    }
    return 0;
}

Source

2014-11-01 G. Robinson

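What do you think the first 'break' is doing? My strong bet is that it is not doing that. – Gene 2014-11-02 00:54:25

Originally the root.occurrences++ at the end of the while loop was outside the while, so that after the last letter of the word had been read it would increment occurrences on the tNode for the last letter added ('r', if the word is 'doctor'). But when I debugged it, tNode.occurrences was 0 when it should have been 1, so the break is there to exit the while loop... I have changed it so many times that I am going crazy looking at it, sorry about that. – 2014-11-02 01:31:50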

There are a bunch of problems with the implementation:

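This is an odd design for a trie, with the letters stored in arbitrary order. Having to do a linear search at each level for the letter you want defeats the purpose of using a trie in the first place.
When you do root = root.branches[k]; you are creating a copy of the struct. It may happen to work for you in this case because things are accessed through pointers, but it is really just asking for trouble.
When you allocate a node in the loop, you do not initialize it, which means it is full of garbage/unknown data and causes problems.
Your implementation is needlessly complicated, for example your outer while (1) loop.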
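
The copy problem in the second point can be shown in isolation. This is a minimal sketch, not the asker's code: the struct `Node` and the helpers `bumpCopy` and `bumpInPlace` are hypothetical names introduced only for illustration. Assigning or passing a struct by value copies its scalar fields, so an increment applied to the copy never reaches the original node.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node, reduced to the one field that matters here. */
struct Node {
    int occurrences;
};

/* Takes the node by value: the increment is applied to a local copy
   and is lost when the function returns. */
void bumpCopy(struct Node node) {
    node.occurrences++;
}

/* Takes a pointer: the increment is applied to the caller's node. */
void bumpInPlace(struct Node *node) {
    assert(node != NULL);
    node->occurrences++;
}
```

Starting from a count of 0, `bumpCopy(n)` leaves `n.occurrences` at 0, while `bumpInPlace(&n)` raises it to 1. The same thing happens when `root.occurrences++` is applied after `root = root.branches[firstNull];`, which would be consistent with the count of 0 reported in the comment above.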

For a very simple trie, I would do something like this:

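#include <ctype.h>   /* tolower */
#include <stdbool.h> /* bool, true */
#include <stdlib.h>  /* calloc */

struct tNode {
    bool isWord;                /* true if a word ends at this node */
    struct tNode *branches[26]; /* one slot per letter, 'a'..'z' */
};

void addWordOccurrence (const char* word, const int wordLength, struct tNode* pRoot) {
    int i;
    int nodeIndex;
    struct tNode* pCurrentNode = pRoot;

    for (i = 0; i < wordLength; ++i)
    {
        /* Map the letter to its slot: 'a' -> 0 ... 'z' -> 25 */
        nodeIndex = tolower((unsigned char)word[i]) - 'a';

        if (nodeIndex >= 0 && nodeIndex <= 25)
        {
            if (pCurrentNode->branches[nodeIndex] == NULL)
            {
                /* calloc zero-fills the new node: isWord is false, branches are empty */
                pCurrentNode->branches[nodeIndex] = calloc(1, sizeof(struct tNode));
                if (pCurrentNode->branches[nodeIndex] == NULL)
                {
                    return; /* out of memory */
                }
            }

            pCurrentNode = pCurrentNode->branches[nodeIndex];
        }
    }

    pCurrentNode->isWord = true;
}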

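You could use struct tNode *branches;, but it really just adds an allocation step that you do not need. You use the character's ASCII value to map 'a' to branches[0] and 'z' to branches[25], so there is no need to search for a "free" slot, which really kills the performance of the trie. Finally, you need a terminator, such as isWord, so that you know "doctor" is a word and "docto" is not.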
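
To make the role of the terminator concrete, here is a hedged sketch of a lookup that goes with this node layout. The function names `insertWord` and `containsWord` are hypothetical and not part of the answer above. The sketch assumes NUL-terminated, lowercase input and an ASCII-style character set in which 'a' to 'z' are contiguous.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct tNode {
    bool isWord;                /* true if a word ends at this node */
    struct tNode *branches[26]; /* one slot per letter, 'a'..'z' */
};

/* Hypothetical helper: insert a lowercase, NUL-terminated word.
   Returns false on a character outside 'a'..'z' or a failed allocation. */
bool insertWord(struct tNode *root, const char *word) {
    assert(root != NULL && word != NULL);
    struct tNode *cur = root;
    for (; *word != '\0'; ++word) {
        int idx = *word - 'a';
        if (idx < 0 || idx > 25) return false;
        if (cur->branches[idx] == NULL) {
            /* calloc zero-fills the node, so its branches start out empty */
            cur->branches[idx] = calloc(1, sizeof *cur->branches[idx]);
            if (cur->branches[idx] == NULL) return false;
        }
        cur = cur->branches[idx];
    }
    cur->isWord = true; /* mark the end of a complete word */
    return true;
}

/* Hypothetical helper: true only if the whole path exists AND the
   final node carries the isWord terminator. */
bool containsWord(const struct tNode *root, const char *word) {
    assert(root != NULL && word != NULL);
    const struct tNode *cur = root;
    for (; *word != '\0'; ++word) {
        int idx = *word - 'a';
        if (idx < 0 || idx > 25) return false;
        cur = cur->branches[idx];
        if (cur == NULL) return false; /* path breaks off: not stored */
    }
    return cur->isWord;
}
```

After inserting "doctor" into a zero-initialized root (`struct tNode root = {0};`), `containsWord(&root, "doctor")` is true, while `containsWord(&root, "docto")` is false: the path for the prefix exists, but its last node is not marked as a word.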

Source

2014-11-02 01:52:37 uesp
