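How do I count the occurrences of words in a .txt file in C?

This is my first time posting a question. I am working on a homework program and am a bit stuck on a few things that I hope someone can help me with. Here is what the program needs to do: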
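- Your program must read a file containing sentences with punctuation.
- It parses the sentences into words and punctuation marks.
- Words are entered into a dictionary and punctuation marks into a list. Ignore case when adding words to the dictionary. Remember that the dictionary is kept in lexicographic order.
- Each entry in the dictionary and in the list counts how many times that word or punctuation mark occurs in the original text.
- After reading the text (a line whose first character is $ terminates the text), print the dictionary and the list along with the counts.
- Your program will then read further lines formatted as: word1 < word2
- This means printing the text with word2 substituted for word1.
I have been able to open the file (hw5 input) and print its words in lexicographic order, ignoring capital letters. I even keep a word count, but I cannot print each word on its own line together with its count. I still need to swap the words and print the file again, but the word count is what I really need help with. Here is what I have so far: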
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#define PUNCT " \n,\t!:;.-"
#define MAX_STR_LEN 2048
struct listNode
{
    char *word;
    struct listNode *next;
    int wordCount;
};
struct listNode *newListNode(const char * const);
void insertWord(struct listNode *,const char * const);
void deleteList(struct listNode *);
void printList(struct listNode *);
// Create new struct listNode
struct listNode *newListNode(const char * const s)
{
    struct listNode *n =
        (struct listNode*)calloc(1,sizeof(struct listNode));
    n->word = (char *)calloc(strlen(s)+1,sizeof(*s));
    strcpy(n->word,s);
    n->next = NULL;
    n->wordCount = 1;
    return n;
}
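// Insert words into dictionary in ascending order
void insertWord(struct listNode *head,const char * const s)
{
    // Lower-case a private copy; s is const and must not be written through
    char *lower = (char *)calloc(strlen(s)+1,sizeof(*s));
    size_t k;
    struct listNode *p = head;
    if (lower == NULL)
    {
        return;
    }
    for (k = 0; s[k] != '\0'; k++)
    {
        // tolower() needs a value representable as unsigned char
        lower[k] = (char)tolower((unsigned char)s[k]);
    }
    // Find the position that keeps the list sorted
    while ((p->next != NULL) && (strcmp(lower,p->next->word) > 0))
    {
        p = p->next;
    }
    if (p->next != NULL && strcmp(lower,p->next->word) == 0)
    {
        // Duplicate word: count it instead of inserting it again
        p->next->wordCount++;
    } else {
        // Allocate a node only when it is actually inserted (no leak)
        struct listNode *q = newListNode(lower);
        q->next = p->next;
        p->next = q;
    }
    free(lower);
}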
// Free all memory allocated for the list
void deleteList(struct listNode *head)
{
    struct listNode *p = head, *q;
    while (p != NULL)
    {
        q = p->next;
        free(p->word);
        free(p);
        p = q;
    }
}
// Print the dictionary
void printList(struct listNode *head)
{
    struct listNode *p = head->next;
    while (p != NULL)
    {
        printf("%s ",p->word);
        p = p->next;
    }
    puts("");
}
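// Enter file and print words in lexicographic order
int main(int argc, char *argv[])
{
    char line[MAX_STR_LEN], *s, fileName[MAX_STR_LEN];
    struct listNode *head = newListNode("");
    int i = 0;
    int c; // int rather than char, so EOF can be told apart from real characters
    FILE *p;
    printf("Enter file name: ");
    // Limit the read to MAX_STR_LEN - 1 characters so fileName cannot overflow
    if (scanf("%2047s", fileName) != 1)
    {
        deleteList(head);
        return 1;
    }
    if((p = fopen(fileName, "r")) == NULL)
    {
        printf("File not found.\n");
        deleteList(head);
        return 1;
    }
    // Read up to the '$' terminator; also stop at EOF or when the buffer is full
    while((c = getc(p)) != '$' && c != EOF && i < MAX_STR_LEN - 1)
    {
        line[i] = (char)c;
        i++;
    }
    line[i] = '\0';
    fclose(p);
    for(s = strtok(line,PUNCT); s != NULL; s = strtok(NULL,PUNCT))
    {
        insertWord(head,s);
    }
    printf("Lexicographical order: ");
    printList(head);
    deleteList(head);
    return 0;
}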
And the input file (hw5 input) is:
Call me Ishmael. Some years ago--never mind how long precisely--
having little or no money in my purse, and nothing particular
to interest me on shore, I thought I would sail about a little
and see the watery part of the world. It is a way I have
of driving off the spleen and regulating the circulation.
Whenever I find myself growing grim about the mouth;
whenever it is a damp, drizzly November in my soul; whenever I
find myself involuntarily pausing before coffin warehouses,
and bringing up the rear of every funeral I meet;
and especially whenever my hypos get such an upper hand of me,
that it requires a strong moral principle to prevent me from
deliberately stepping into the street, and methodically knocking
people's hats off--then, I account it high time to get to sea
as soon as I can. This is my substitute for pistol and ball.
With a philosophical flourish Cato throws himself upon his sword;
I quietly take to the ship. There is nothing surprising in this.
If they but knew it, almost all men in their degree, some time
or other, cherish very nearly the same feelings towards
the ocean with me.
$
substitute < replacement
whale < zebra
myself < oneself
I need it to print on the following lines, alphabetically, like this:
a - 4
about - 1
account - 1
ago - 2
and - 5
etc.
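When searching with Google, the keyword you should look for is "tokeniser" or "tokenizer" – user1438003 2012-07-20 19:35:21
Someone was working on this a few days ago; their code may be usable. – 2012-07-20 19:35:45
Out of curiosity: are you restricted to using C? I would probably do a task like this with some shell scripting – stefan 2012-09-29 15:01:56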