2012-07-20 47 views
2

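How do I count the number of words in a .txt file in C?

This is my first time posting a question. I'm working on a homework program and am a bit stuck on a few things that I hope someone can help me with. Here is what my program needs to do: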

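  • Your program must read a file containing sentences with punctuation.
  • It parses the sentences into words and punctuation marks.
  • Words are entered into a dictionary and punctuation marks into a list. Ignore case when adding words to the dictionary. Remember that the dictionary is kept in lexicographic order.
  • Each entry in the dictionary and in the list counts how many times the word or punctuation mark appears in the original text.
  • After reading the text (a line whose first character is $ terminates the text), print out the dictionary and the list with their counts.
  • Your program will then read lines of the following format: word1 < word2
  • This means that in the text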

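I've been able to read the file (hw5-input) and print its words in lexicographic order with the capitals removed, and I even have a word count, but I can't print each word on a separate line with its count. I still need to swap the words (word2 replacing word1) and print the file again, but the word count is what I really need help with. Here is what I have so far: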

#include <stdio.h> 
#include <stdlib.h> 
#include <string.h> 
#include <ctype.h> 

#define PUNCT " \n,\t!:;.-" 
#define MAX_STR_LEN 2048 

struct listNode 
{ 
    char *word; 
    struct listNode *next; 
    int wordCount; 
}; 

struct listNode *newListNode(const char * const); 
void insertWord(struct listNode *,const char * const); 
void deleteList(struct listNode *); 
void printList(struct listNode *); 

// Create new struct listNode 

struct listNode *newListNode(const char * const s) 
{ 
    struct listNode *n = 
     (struct listNode*)calloc(1,sizeof(struct listNode)); 
    n->word = (char *)calloc(strlen(s)+1,sizeof(*s)); 
    strcpy(n->word,s); 
    n->next = NULL; 
    n->wordCount = 1; 
    return n; 
} 

// Insert words into dictionary in ascending order 

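void insertWord(struct listNode *head,const char * const s) 
{ 
    // Work on a lowercase copy so the caller's const string is never modified 
    size_t len = strlen(s), k; 
    char *lower = (char *)malloc(len + 1); 
    struct listNode *p = head; 

    if (lower == NULL) 
     return; 
    for (k = 0; k <= len; k++) 
     lower[k] = (char)tolower((unsigned char)s[k]); 

// Find the node after which the word belongs 

    while ((p->next != NULL) && (strcmp(lower,p->next->word) > 0)) 
    { 
     p = p->next; 
    } 

// Duplicate word: bump its count; otherwise allocate and link a new node 

    if(p->next != NULL && strcmp(lower,p->next->word) == 0) 
    { 
     p->next->wordCount++; 
    } else { 
     struct listNode *q = newListNode(lower); 
     q->next = p->next; 
     p->next = q; 
    } 
    free(lower); 
} 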

// Free all memory allocated for the list 

void deleteList(struct listNode *head) 
{ 
    struct listNode *p = head, *q; 
    while (p != NULL) 
    { 
     q = p->next; 
     free(p->word); 
     free(p); 
     p = q; 
    } 
} 

// Print the dictionary 

void printList(struct listNode *head) 
{ 
    struct listNode *p = head->next; 

    while (p != NULL) 
    { 
     printf("%s ",p->word); 
     p = p->next; 
    } 
    puts(""); 
} 

// Enter file and print words in lexicographic order 

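int main(int argc, char *argv[]) 
{ 
    char line[MAX_STR_LEN], *s, fileName[MAX_STR_LEN]; 
    struct listNode *head = newListNode(""); 

    int i = 0; 
    int c;   // int rather than char, so EOF can be told apart from data 

    FILE *p; 

    printf("Enter file name: "); 
    if(scanf("%2047s", fileName) != 1)   // width keeps fileName from overflowing 
    { 
     deleteList(head); 
     return 1; 
    } 
    if((p = fopen(fileName, "r")) == NULL) 
    { 
     printf("File not found.\n"); 
     deleteList(head); 
     return 1; 
    } 

    // Read up to the '$' terminator, end of file, or a full buffer 
    while(i < MAX_STR_LEN - 1 && (c = getc(p)) != EOF && c != '$') 
    { 
     line[i] = (char)c; 
     i++; 
    } 
    fclose(p); 

    line[i] = '\0'; 
    for(s = strtok(line,PUNCT); s != NULL; s = strtok(NULL,PUNCT)) 
    { 
     insertWord(head,s); 
    } 
    printf("Lexicographical order: "); 
    printList(head); 
    deleteList(head); 

    return 0; 
} 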

And the input file (hw5-input) is:

Call me Ishmael. Some years ago--never mind how long precisely-- 
having little or no money in my purse, and nothing particular 
to interest me on shore, I thought I would sail about a little 
and see the watery part of the world. It is a way I have 
of driving off the spleen and regulating the circulation. 
Whenever I find myself growing grim about the mouth; 
whenever it is a damp, drizzly November in my soul; whenever I 
find myself involuntarily pausing before coffin warehouses, 
and bringing up the rear of every funeral I meet; 
and especially whenever my hypos get such an upper hand of me, 
that it requires a strong moral principle to prevent me from 
deliberately stepping into the street, and methodically knocking 
people's hats off--then, I account it high time to get to sea 
as soon as I can. This is my substitute for pistol and ball. 
With a philosophical flourish Cato throws himself upon his sword; 
I quietly take to the ship. There is nothing surprising in this. 
If they but knew it, almost all men in their degree, some time 
or other, cherish very nearly the same feelings towards 
the ocean with me. 
$ 
substitute < replacement 
whale < zebra 
myself < oneself 

I need it to print alphabetically, one word per line, like this:

a - 4 
about - 1 
account - 1 
ago - 2 
and - 5 
etc.. 
+4

When googling, the keyword you should look for is "tokeniser" or "tokenizer" – user1438003 2012-07-20 19:35:21

+0

Someone was working on this same thing a few days ago; their code may be available. – 2012-07-20 19:35:45

+0

Out of curiosity: are you restricted to using C? I would probably do a task like this with some shell script – stefan 2012-09-29 15:01:56

Answers

0

A Microsoft/Amazon-type question. I'll give you pseudocode for a C implementation:

define a struct like: 

    typedef struct node { 
     int count; 
     char *word; 
     struct node *next; 
    } Node; 

    open the file for reading; 
    for each line in the file do: 
     split the line, in order to have each word separately 
     for each word in the line do: 
      check if the word already exists in the list 
      if not, create a new node: 
        node->word = word 
        node->count = 1 
      else: 
        node->count += 1 

    sort the list by the node->word field 

The rest of the implementation is left as an exercise! But if you can use a map, your life will be much simpler!

0

I was able to get my word count working, so I just edited the print statement in my printList function:

printf("%15s (%2d)\n",p->word, p->wordCount); 

It was then able to print all the words in alphabetical order, each with its count:

Enter file name: hw5-input 
Lexicographical order: 
       a (5) 
      about (2) 
     account (1) 
      ago (1) 
      all (1) 
     almost (1) 
      an (1) 
      and (7) 
      as (2) 
      ball (1) 
     before (1) 
     bringing (1) 
      but (1) 
      call (1) 
      can (1) 
      cato (1) 
     cherish (1) 
    circulation (1) 
     coffin (1) 
      damp (1) 
     degree (1) 
    deliberately (1) 
     driving (1) 
     drizzly (1) 
    especially (1) 
      every (1) 
     feelings (1) 
      find (2) 
     flourish (1) 
      for (1) 
      from (1) 
     funeral (1) 
      get (2) 
      grim (1) 
     growing (1) 
      hand (1) 
      hats (1) 
      have (1) 
     having (1) 
      high (1) 
     himself (1) 
      his (1) 
      how (1) 
      hypos (1) 
       i (9) 
      if (1) 
      in (4) 
     interest (1) 
      into (1) 
    involuntarily (1) 
      is (4) 
     ishmael (1) 
      it (5) 
      knew (1) 
     knocking (1) 
     little (2) 
      long (1) 
      me (5) 
      meet (1) 
      men (1) 
    methodically (1) 
      mind (1) 
      money (1) 
      moral (1) 
      mouth (1) 
      my (4) 
     myself (2) 
     nearly (1) 
      never (1) 
      no (1) 
     nothing (2) 
     november (1) 
      ocean (1) 
      of (4) 
      off (2) 
      on (1) 
      or (2) 
      other (1) 
      part (1) 
    particular (1) 
     pausing (1) 
     people's (1) 
    philosophical (1) 
     pistol (1) 
     precisely (1) 
     prevent (1) 
     principle (1) 
      purse (1) 
     quietly (1) 
      rear (1) 
    regulating (1) 
     requires (1) 
      sail (1) 
      same (1) 
      sea (1) 
      see (1) 
      ship (1) 
      shore (1) 
      some (2) 
      soon (1) 
      soul (1) 
     spleen (1) 
     stepping (1) 
     street (1) 
     strong (1) 
    substitute (1) 
      such (1) 
    surprising (1) 
      sword (1) 
      take (1) 
      that (1) 
      the (10) 
      their (1) 
      then (1) 
      there (1) 
      they (1) 
      this (2) 
     thought (1) 
     throws (1) 
      time (2) 
      to (5) 
     towards (1) 
      up (1) 
      upon (1) 
      upper (1) 
      very (1) 
    warehouses (1) 
     watery (1) 
      way (1) 
     whenever (4) 
      with (2) 
      world (1) 
      would (1) 
      years (1) 

I still need to reprint the input file with the words swapped, though. Does anyone have a solution?
