2011-12-05 44 views
0

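C++ scanner.h: scanning the content between double quotes as one token without skipping the spaces inside the quotes

For an assignment, I am trying to count everything between a pair of double quotes as a single token.

For example: "hello world" = 1 token; "hello" "world" = 3 tokens (because the space counts as 1 token)

I created main.cpp and added the ScanQuotesAsStrings code to the 3 modules we were given: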

  • scanner.cpp
  • scanner.h
  • scanpriv.h

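Right now, "hello world" scans as 2 tokens instead of keeping the space inside the quotes. If I add a call to skipSpaces, then ordinary input without quotes, such as |hello world|, has its spaces skipped as well.

I think my problem is in scanner.cpp, in the last few functions: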

/* 
* Private method: scanToEndOfIdentifier 
* Usage: finish = scanToEndOfIdentifier(); 
* ---------------------------------------- 
* This function advances the position of the scanner until it 
* reaches the end of a sequence of letters or digits that make 
* up an identifier. The return value is the index of the last 
* character in the identifier; the value of the stored index 
* cp is the first character after that. 
*/ 
int Scanner::scanToEndOfIdentifier() { 
    while (cp < len && isalnum(buffer[cp])) { 
     if ((stringOption == ScanQuotesAsStrings) && (buffer[cp] == '"')) 
      break; 
     cp++; 
    } 
    return cp - 1; 
} 


/* Private functions */ 
/* 
* Private method: scanQuotedString 
* Usage: scanQuotedString(); 
* ------------------- 
* This function advances the position of the scanner until the 
* current character is a double quotation mark 
*/ 
void Scanner::scanQuotedString() { 
    while ((cp < len && (buffer[cp] == '"')) || (cp < len && (buffer[cp] == '"'))){ 
     cp++; 
    } 

Here is main.cc

#include "genlib.h" 
#include "simpio.h" 
#include "scanner.h" 
#include <iostream> 

/* Private function prototypes */ 

int CountTokens(string str); 

int main() { 
    cout << "Please enter a sentence: "; 
    string str = GetLine(); 

    int num = CountTokens(str); 
    cout << "You entered " << num << " tokens." << endl; 
    return 0; 
} 

int CountTokens(string str) { 

    int count = 0; 
    Scanner scanner;  // create new scanner object    
    scanner.setInput(str); // initialize the input to be scanned 

    //scanner.setSpaceOption(Scanner::PreserveSpaces); 
    scanner.setStringOption(Scanner::ScanQuotesAsStrings); 

    while (scanner.hasMoreTokens()) { // read tokens from the scanner 
     scanner.nextToken(); 
     count++; 
    } 
    return count; 
} 

Here is scanner.cpp

/* 
* File: scanner.cpp 
* ----------------- 
* Implementation for the simplified Scanner class. 
*/ 
#include "genlib.h" 
#include "scanner.h" 
#include <cctype> 
#include <iostream> 
/* 
* The details of the representation are inaccessible to the client, 
* but consist of the following fields: 
* 
* buffer -- String passed to setInput 
* len -- Length of buffer, saved for efficiency 
* cp -- Current character position in the buffer 
* spaceOption -- Setting of the space option extension 
*/ 
Scanner::Scanner() { 
    buffer = ""; 
    spaceOption = PreserveSpaces; 
} 
Scanner::~Scanner() { 
/* Empty */ 
} 
void Scanner::setInput(string str) { 
    buffer = str; 
    len = buffer.length(); 
    cp = 0; 
} 
/* 
* Implementation notes: nextToken 
* ------------------------------- 
* The code for nextToken follows from the definition of a token. 
*/ 
string Scanner::nextToken() { 
    if (cp == -1) { 
     Error("setInput has not been called"); 
    } 
    if (stringOption == ScanQuotesAsStrings) scanQuotedString(); 
    if (spaceOption == IgnoreSpaces) skipSpaces(); 
    int start = cp; 
    if (start >= len) return ""; 
    if (isalnum(buffer[cp])) { 
     int finish = scanToEndOfIdentifier(); 
     return buffer.substr(start, finish - start + 1); 
    } 
    cp++; 
    return buffer.substr(start, 1); 
} 

bool Scanner::hasMoreTokens() { 
    if (cp == -1) { 
     Error("setInput has not been called"); 
    } 
    if (stringOption == ScanQuotesAsStrings) scanQuotedString(); 
    if (spaceOption == IgnoreSpaces) skipSpaces(); 
    return (cp < len); 
} 

void Scanner::setSpaceOption(spaceOptionT option) { 
    spaceOption = option; 
} 

Scanner::spaceOptionT Scanner::getSpaceOption() { 
    return spaceOption; 
} 

void Scanner::setStringOption(stringOptionT option) { 
    stringOption = option; 
} 

Scanner::stringOptionT Scanner::getStringOption() { 
    return stringOption; 
} 


/* Private functions */ 
/* 
* Private method: skipSpaces 
* Usage: skipSpaces(); 
* ------------------- 
* This function advances the position of the scanner until the 
* current character is not a whitespace character. 
*/ 
void Scanner::skipSpaces() { 
    while (cp < len && isspace(buffer[cp])) { 
     cp++; 
    } 
} 

    /* 
    * Private method: scanToEndOfIdentifier 
    * Usage: finish = scanToEndOfIdentifier(); 
    * ---------------------------------------- 
    * This function advances the position of the scanner until it 
    * reaches the end of a sequence of letters or digits that make 
    * up an identifier. The return value is the index of the last 
    * character in the identifier; the value of the stored index 
    * cp is the first character after that. 
    */ 
    int Scanner::scanToEndOfIdentifier() { 
     while (cp < len && isalnum(buffer[cp])) { 
      if ((stringOption == ScanQuotesAsStrings) && (buffer[cp] == '"')) 
       break; 
      cp++; 
     } 
     return cp - 1; 
    } 


    /* Private functions */ 
    /* 
    * Private method: scanQuotedString 
    * Usage: scanQuotedString(); 
    * ------------------- 
    * This function advances the position of the scanner until the 
    * current character is a double quotation mark 
    */ 
    void Scanner::scanQuotedString() { 
     while ((cp < len && (buffer[cp] == '"')) || (cp < len && (buffer[cp] == '"'))){ 
      cp++; 
     } 

scanner.h

/* 
* File: scanner.h 
* --------------- 
* This file is the interface for a class that facilitates dividing 
* a string into logical units called "tokens", which are either 
* 
* 1. Strings of consecutive letters and digits representing words 
* 2. One-character strings representing punctuation or separators 
* 
* To use this class, you must first create an instance of a 
* Scanner object by declaring 
* 
* Scanner scanner; 
* 
* You initialize the scanner's input stream by calling 
* 
* scanner.setInput(str); 
* 
* where str is the string from which tokens should be read. 
* Once you have done so, you can then retrieve the next token 
* by making the following call: 
* 
* token = scanner.nextToken(); 
* 
* To determine whether any tokens remain to be read, you can call 
* the predicate method scanner.hasMoreTokens(). The nextToken 
* method returns the empty string after the last token is read. 
* 
* The following code fragment serves as an idiom for processing 
* each token in the string inputString: 
* 
* Scanner scanner; 
* scanner.setInput(inputString); 
* while (scanner.hasMoreTokens()) { 
* string token = scanner.nextToken(); 
* . . . process the token . . . 
* } 
* 
* This version of the Scanner class includes an option for skipping 
* whitespace characters, which is described in the comments for the 
* setSpaceOption method. 
*/ 
#ifndef _scanner_h 
#define _scanner_h 
#include "genlib.h" 
/* 
* Class: Scanner 
* -------------- 
* This class is used to represent a single instance of a scanner. 
*/ 
class Scanner { 
public: 
/* 
* Constructor: Scanner 
* Usage: Scanner scanner; 
* ----------------------- 
* The constructor initializes a new scanner object. The scanner 
* starts empty, with no input to scan. 
*/ 
    Scanner(); 
/* 
* Destructor: ~Scanner 
* Usage: usually implicit 
* ----------------------- 
* The destructor deallocates any memory associated with this scanner. 
*/ 
    ~Scanner(); 
/* 
* Method: setInput 
* Usage: scanner.setInput(str); 
* ----------------------------- 
* This method configures this scanner to start extracting 
* tokens from the input string str. Any previous input string is 
* discarded. 
*/ 
    void setInput(string str); 
/* 
* Method: nextToken 
* Usage: token = scanner.nextToken(); 
* ----------------------------------- 
* This method returns the next token from this scanner. If 
* nextToken is called when no tokens are available, it returns the 
* empty string. 
*/ 
    string nextToken(); 
/* 
* Method: hasMoreTokens 
* Usage: if (scanner.hasMoreTokens()) . . . 
* ------------------------------------------ 
* This method returns true as long as there are additional 
* tokens for this scanner to read. 
*/ 
    bool hasMoreTokens(); 
/* 
* Methods: setSpaceOption, getSpaceOption 
* Usage: scanner.setSpaceOption(option); 
* option = scanner.getSpaceOption(); 
* ------------------------------------------ 
* This method controls whether this scanner 
* ignores whitespace characters or treats them as valid tokens. 
* By default, the nextToken function treats whitespace characters, 
* such as spaces and tabs, just like any other punctuation mark. 
* If, however, you call 
* 
* scanner.setSpaceOption(Scanner::IgnoreSpaces); 
* 
* the scanner will skip over any white space before reading a 
* token. You can restore the original behavior by calling 
* 
* scanner.setSpaceOption(Scanner::PreserveSpaces); 
* 
* The getSpaceOption function returns the current setting 
* of this option. 
*/ 
    enum spaceOptionT { PreserveSpaces, IgnoreSpaces }; 
    void setSpaceOption(spaceOptionT option); 
    spaceOptionT getSpaceOption(); 

/* 
* Methods: setStringOption, getStringOption 
* Usage: scanner.setStringOption(option); 
*  option = scanner.getStringOption(); 
* -------------------------------------------------- 
* This method controls how the scanner reads double quotation marks 
* as input. The default is set to treat quotes just like any other 
* punctuation character: 
* scanner.setStringOption(Scanner::ScanQuotesAsPunctuation); 
* 
* Otherwise, the option: 
* scanner.setStringOption(Scanner::ScanQuotesAsStrings); 
* 
* the token starting with a quotation mark will be scanned until 
* another quotation mark is found (closing quotation). Therefore 
* the entire string within the quotation, including both quotation 
* marks counts as 1 token. 
*/ 
    enum stringOptionT { ScanQuotesAsPunctuation, ScanQuotesAsStrings }; 

    void setStringOption(stringOptionT option); 
    stringOptionT getStringOption(); 


private: 

#include "scanpriv.h" 
}; 
#endif 

**Finally, scanpriv.h**

/* 
* File: scanpriv.h 
* ---------------- 
* This file contains the private data for the simplified version 
* of the Scanner class. 
*/ 

/* Instance variables */ 
string buffer; /* The string containing the tokens */ 
int len; /* The buffer length, for efficiency */ 
int cp; /* The current index in the buffer */ 
spaceOptionT spaceOption; /* Setting of the space option */ 
stringOptionT stringOption; 

/* Private method prototypes */ 
void skipSpaces(); 
int scanToEndOfIdentifier(); 
void scanQuotedString(); 

Answers

3

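That is a long read.

There are two ways to parse quoted text:

0) State

A simple switch tells you whether you are currently inside a quotation and activates some special handling for quoted text. This is basically equivalent to #1), just inlined.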
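As a rough illustration of the "switch" idea, here is a minimal sketch of a tokenizer that carries a single boolean flag. The function name tokenizeWithState is hypothetical and is not part of the question's Scanner class; the token rules (runs of letters and digits, single punctuation characters including spaces, whole quoted strings) are assumed from the question.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper, not part of the question's Scanner class.
// Tokens are: runs of letters/digits, single punctuation characters
// (spaces included), and whole "quoted strings" including both quotes.
std::vector<std::string> tokenizeWithState(const std::string& input) {
    std::vector<std::string> tokens;
    std::string current;
    bool inQuotes = false;  // the "switch": are we inside a quoted string?

    for (char ch : input) {
        if (inQuotes) {
            current += ch;               // spaces are kept while inside quotes
            if (ch == '"') {             // closing quote ends the token
                tokens.push_back(current);
                current.clear();
                inQuotes = false;
            }
        } else if (ch == '"') {          // opening quote starts a new token
            if (!current.empty()) { tokens.push_back(current); current.clear(); }
            current += ch;
            inQuotes = true;
        } else if (std::isalnum(static_cast<unsigned char>(ch))) {
            current += ch;               // extend the current identifier
        } else {
            if (!current.empty()) { tokens.push_back(current); current.clear(); }
            tokens.push_back(std::string(1, ch));  // one-character token
        }
    }
    // Flush a trailing identifier or an unterminated quoted string.
    if (!current.empty()) tokens.push_back(current);
    return tokens;
}
```

With these assumed rules, the input "hello world" (quotes included) yields 1 token and "hello" "world" yields 3, matching the counts the question asks for.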

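1) Divide and conquer: a recursive-descent scanner

Put the state away and write a separate rule for scanning quoted text. The code is actually quite simple (C++-inspired pseudocode):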

// assume we are one behind the opening quotation mark 
for (c : text) { 
    if (is_escape (*c)) { // to support stuff like "foo's name is \"bar\"" 
     p = peek(c); 
     if (!is_valid_escape_character (peek (c))) error; 
     else { 
      make the peeked character (*p) part of the result; 
      ++c; 
     } 
    } 
    else if (is_quotation_mark (*c)) 
    { 
     return the result; // we approached the end of the string 
    } 
    else if (!is_valid_character (*c)) 
    { 
     error; // maybe you want to forbid literal control characters 
    } 
    else 
    { 
     make *c part of the result 
    } 
} 
error; // reached end of input before closing quotation mark 
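The pseudocode above could be turned into real C++ roughly as follows. This is only a sketch under assumptions: the name scanQuotedWithEscapes is made up, the set of valid escapes is assumed to be just \" and \\, and "invalid character" is assumed to mean a literal control character.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper. `pos` must index the character right after the
// opening quotation mark. On success, stores the unescaped contents in
// `result`, leaves `pos` just past the closing quotation mark, and
// returns true. Returns false on an invalid escape, a control character,
// or a missing closing quotation mark.
bool scanQuotedWithEscapes(const std::string& text, std::size_t& pos,
                           std::string& result) {
    result.clear();
    while (pos < text.size()) {
        char c = text[pos];
        if (c == '\\') {
            if (pos + 1 >= text.size()) return false;     // nothing to escape
            char next = text[pos + 1];
            if (next != '"' && next != '\\') return false; // assumed escape set
            result += next;   // the escaped character becomes part of the result
            pos += 2;
        } else if (c == '"') {
            ++pos;            // consume the closing quotation mark
            return true;
        } else if (static_cast<unsigned char>(c) < 0x20) {
            return false;     // assumption: literal control characters are forbidden
        } else {
            result += c;
            ++pos;
        }
    }
    return false;             // reached end of input before the closing quote
}
```

The caller is expected to have consumed the opening quote already, which mirrors the "one behind the opening quotation mark" assumption in the pseudocode.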

If you do not want to support escape characters, the code becomes even simpler:

// assume we are one behind the opening quotation mark 
for (c : text) { 
    if (is_quotation_mark (*c)) 
     return the result; 
    else if (!is_valid_character (*c)) 
     error; 
    else 
     make *c part of the result 
} 
error; // reached end of input before closing quotation mark 

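You should not skip the check for invalid characters, because leaving it out invites users to exploit your code and possibly undefined behavior in your program.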
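A possible C++ rendering of the simpler variant, with the validity check kept in, might look like the sketch below. The name scanQuotedSimple is hypothetical, and "valid character" is assumed here to mean anything std::isprint accepts; the real rule depends on the language being scanned.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper. `pos` indexes the first character after the opening
// quotation mark. On success, `result` holds the text between the quotes
// and `pos` is one past the closing quotation mark.
bool scanQuotedSimple(const std::string& text, std::size_t& pos,
                      std::string& result) {
    result.clear();
    for (; pos < text.size(); ++pos) {
        char c = text[pos];
        if (c == '"') {
            ++pos;            // consume the closing quotation mark
            return true;
        }
        // Assumed validity rule: only printable characters are allowed.
        if (!std::isprint(static_cast<unsigned char>(c))) return false;
        result += c;
    }
    return false;             // reached end of input before the closing quote
}
```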

+0

Great, thank you! – OverAir

0

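A quick glance over the code: when you are in ScanQuotesAsStrings mode, you act as if there were no tokens other than quoted strings; instead, the distinction should be that when you see a token that starts with '"', you branch to a separate sub-scanner.

In pseudocode (using the C++ "the end iterator is one past the last element" idiom):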

current_token.begin = cursor; 
current_token.end = current_token.begin + 1; 
if(scan_quotes_as_strings && *current_token.begin == '"') { 
    while(*current_token.end && *current_token.end != '"') 
     ++current_token.end; 
    if(*current_token.end == '"') 
     ++current_token.end; // step past the closing quote so it belongs to this token 
    return; 
} 
while(*current_token.end && *current_token.end != ' ') 
    ++current_token.end; 

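The two loops can be merged into a single one by introducing a state variable, instead of expressing the scanner state through different code paths.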
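To show where such a branch could sit in a class shaped like the one in the question, here is a cut-down sketch. MiniScanner and countTokens are hypothetical stand-ins, not the assignment's Scanner; the sketch always treats quotes as strings and always preserves spaces, so the two options from scanner.h are left out.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Cut-down, hypothetical stand-in for the question's Scanner class;
// only the parts needed to show the branch are included.
class MiniScanner {
public:
    explicit MiniScanner(const std::string& input) : buffer(input), cp(0) {}

    bool hasMoreTokens() const { return cp < buffer.size(); }

    std::string nextToken() {
        if (cp >= buffer.size()) return "";
        std::size_t start = cp;
        if (buffer[cp] == '"') {
            scanQuotedString();                       // separate sub-scanner
        } else if (std::isalnum(static_cast<unsigned char>(buffer[cp]))) {
            while (cp < buffer.size() &&
                   std::isalnum(static_cast<unsigned char>(buffer[cp]))) {
                ++cp;                                 // identifier
            }
        } else {
            ++cp;                                     // single punctuation or space
        }
        return buffer.substr(start, cp - start);
    }

private:
    // Precondition: buffer[cp] is the opening quote. Leaves cp one past the
    // closing quote, or at the end of input if there is no closing quote.
    void scanQuotedString() {
        ++cp;                                         // skip the opening quote
        while (cp < buffer.size() && buffer[cp] != '"') ++cp;
        if (cp < buffer.size()) ++cp;                 // include the closing quote
    }

    std::string buffer;
    std::size_t cp;
};

// Counts tokens the same way CountTokens in the question's main.cc does.
int countTokens(const std::string& input) {
    MiniScanner scanner(input);
    int count = 0;
    while (scanner.hasMoreTokens()) {
        scanner.nextToken();
        ++count;
    }
    return count;
}
```

Note that hasMoreTokens has no side effects here; the quote handling happens only inside nextToken, at the point where the start of a token is known.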

Also,

while ((cp < len && (buffer[cp] == '"')) || (cp < len && (buffer[cp] == '"'))) ... 

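just looks suspicious: both sides of the || are the same test, and the loop advances only while the current character is a quotation mark, so it steps over quote characters instead of running on to the closing quote.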
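To see what that condition does in isolation, the loop can be copied into a free function and probed. The function name skipLikeQuestion is hypothetical; the loop body and condition are taken from the question's scanQuotedString, with buffer and cp passed in as parameters instead of being members.

```cpp
#include <cstddef>
#include <string>

// Mirrors the loop in the question's scanQuotedString and returns the
// position at which it stops. Hypothetical free-function wrapper.
std::size_t skipLikeQuestion(const std::string& buffer, std::size_t cp) {
    const std::size_t len = buffer.size();
    // Both operands of || are identical, so this is the same as
    // while (cp < len && buffer[cp] == '"').
    while ((cp < len && (buffer[cp] == '"')) || (cp < len && (buffer[cp] == '"'))) {
        ++cp;
    }
    return cp;
}
```

Called on the input "hello world" (quotes included) with cp at 0, it stops at index 1, right after the opening quote; called on input that does not start with a quote, it does not move at all. In other words it discards quote characters rather than collecting the text between them.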

+0

I think you should check the validity of the characters you read. –

+0

Thank you very much! (Sorry, I don't have enough rep points to vote) – OverAir

+0

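The validity of the characters read is defined by the language being parsed, about which I know nothing. The current requirement is that tokens are space-delimited unless they are quoted; perhaps his/her language uses Codepage 437 input and accepts smileys in strings or identifiers. –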