Parsing a sample text file and splitting it

I am trying to go through a simple text file that contains assembly instructions. It looks like this:

TOP NOP 
VAL INT 0 
TAN LA 2,1

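This is just a small example so I can show you how it works. Basically, I put the first field of each line into label, then the second field (NOP, INT and LA) into opcode.

After that, I take the first argument (0 and 2) and put it in arg1. This is where my problem comes in. With the code I have now, the output I get when I print the argument strings is this: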

TOP 
0 
2

Obviously I only want the last two. How can I keep TOP from being picked up as my first argument?

#include <string> 
#include <iostream> 
#include <cstdlib> 
#include <string.h> 
#include <fstream> 
#include <stdio.h> 

using namespace std; 

int main(int argc, char *argv[]) 
{ 
// If no extra file is provided then exit the program with error message 
if (argc <= 1) 
{ 
    cout << "Correct Usage: " << argv[0] << " <Filename>" << endl; 
    exit (1); 
} 

// Array to hold the registers and initialize them all to zero 
int registers [] = {0,0,0,0,0,0,0,0}; 

string memory [16000]; 

string Symtablelab[1000]; 
int Symtablepos[1000]; 

string line; 
string label; 
string opcode; 
string arg1; 
string arg2; 

// Open the file that was input on the command line 
ifstream myFile; 
myFile.open(argv[1]); 

if (!myFile.is_open()) 
{ 
    cerr << "Cannot open the file." << endl; 
} 

int counter = 0; 
int i = 0; 
int j = 0; 

while (getline(myFile, line, '\n')) 
{ 
    if (line[0] == '#') 
    { 
     continue; 
    } 

    if (line.length() == 0) 
    { 
     continue; 
    } 

    if (line[0] != '\t' && line[0] != ' ') 
    { 
     string delimeters = "\t "; 

     int current; 
     int next = -1; 

     current = next + 1; 
     next = line.find_first_of(delimeters, current); 
     label = line.substr(current, next - current); 

     Symtablelab[i] = label; 

     current = next + 1; 
     next = line.find_first_of(delimeters, current); 
     opcode = line.substr(current, next - current); 

     if (opcode != "WORDS" && opcode != "INT") 
     { 
      counter += 3; 
     } 

     if (opcode == "INT") 
     { 
      counter++; 
     } 

     delimeters = ", \n\t"; 
     current = next + 1; 
     next = line.find_first_of(delimeters, current); 
     arg1 = line.substr(current, next-current); 

     cout << arg1<<endl; 

     i++; 
    } 
}

Source

2012-09-16 cadavid4j

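The check for a zero-length line should be the first check inside the loop. You are reading line[0] before you know whether it exists. – jrok

Thank you, I changed that. Do you have any idea how to fix the problem I am having? :) – cadavid4j

The problem is in how you find the start of each subsequent word: current = next + 1. You want to look for the first non-delimiter character as the start of the word, and check whether you are at the end of the line before looking for the argument.

After adding debug output, I see the following: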

>> label: start=0 end=3 value="TOP" 
>> opcode: start=4 end=4 value="" 

>> label: start=0 end=3 value="VAL" 
>> opcode: start=4 end=4 value="" 

>> label: start=0 end=3 value="TAN" 
>> opcode: start=4 end=4 value=""

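This also shows that each opcode lookup lands on another delimiter (start and end are both 4), so the value comes out empty.

The problem is that you advance by only one character after a word, so the next line.substr() picks up a delimiter.

In every lookup after the first one, change: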

current = next + 1;

to:

current = line.find_first_not_of(delimeters, next + 1);

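This lets it skip any and all delimiters and start at the next word.

Also, you only want to look for the argument when there is one, so wrap the argument lookup in if (next > 0) { ... }.

With my debug output and your original output (made conditional), this gives me: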

>> label: start=0 end=3 value="TOP" 
>> opcode: start=6 end=-1 value="NOP" 
>> label: start=0 end=3 value="VAL" 
>> opcode: start=6 end=9 value="INT" 
>> arg1: start=10 end=-1 value="0" 
0 
>> label: start=0 end=3 value="TAN" 
>> opcode: start=6 end=8 value="LA" 
>> arg1: start=9 end=10 value="2" 
2
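The fix described above can be put together as a small sketch. The names Fields, nextField and splitFields are hypothetical, not part of the question's code, and the sketch assumes every line starts with a label, as the question's loop does. It also uses std::string::size_type instead of int so that "not found" can be compared against std::string::npos directly.

```cpp
#include <string>

// Hypothetical holder for the pieces of one assembly line.
struct Fields {
    std::string label, opcode, arg1, arg2;
};

// Skips any run of delimiters starting at `pos`, then returns the token that
// follows. `pos` is left at the delimiter after the token, or at npos when
// the line is exhausted. Returns an empty string when no token is left.
static std::string nextField(const std::string& line,
                             std::string::size_type& pos,
                             const std::string& delims) {
    if (pos == std::string::npos) return "";
    std::string::size_type start = line.find_first_not_of(delims, pos);
    if (start == std::string::npos) {
        pos = std::string::npos;
        return "";
    }
    std::string::size_type end = line.find_first_of(delims, start);
    pos = end;
    if (end == std::string::npos) return line.substr(start);
    return line.substr(start, end - start);
}

// Splits "LABEL OPCODE [arg1[,arg2]]" into its fields.
Fields splitFields(const std::string& line) {
    Fields f;
    std::string::size_type pos = 0;
    f.label  = nextField(line, pos, " \t");
    f.opcode = nextField(line, pos, " \t");
    f.arg1   = nextField(line, pos, ", \t");
    f.arg2   = nextField(line, pos, ", \t");
    return f;
}
```

In the main loop, the argument would then be printed only when f.arg1 is not empty, which is what keeps lines such as the NOP one from producing output.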

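Refactor the parsing/tokenizing out of the main loop so you can focus on it. You may even want to get cppunit (or similar) to help you test your parsing functions. Without that, it helps to have one place where you emit debug output like this:

cout << ">> " << whatIsBeingDebugged << ": start=" << current 
    << " end=" << next << " value=\"" << value << "\"" << endl;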
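One way to keep that debug output in a single place is a small helper that builds the trace line. This is only a sketch: traceToken is a hypothetical name, and the positions are taken as long so that a "not found" value stored in an int still prints as -1, as in the output shown earlier.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: formats one trace line in the same shape as the
// debug output shown in this answer.
std::string traceToken(const std::string& what, long start, long end,
                       const std::string& value) {
    std::ostringstream out;
    out << ">> " << what << ": start=" << start
        << " end=" << end << " value=\"" << value << "\"";
    return out.str();
}
```

A call such as cout << traceToken("opcode", current, next, opcode) << endl; can then be dropped in after each lookup and removed again in one place.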

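Building a robust lexer and parser is the subject of many libraries (lex and yacc, flex and bison, and so on), can be an application of other tools such as regular expressions, and is even the subject of entire university courses. It is work. Just be methodical and thorough, and test the pieces in isolation, for example with unit tests in cppunit (or similar).

Source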
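As a rough illustration of testing the tokenizer in isolation, the sketch below uses plain assert from the standard library rather than cppunit. The names tokenize and runTokenizerTests are hypothetical, and treating the comma as a delimiter for every field is a simplification compared with the question's code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Splits `line` into tokens, treating any run of characters from `delims`
// as a single separator.
std::vector<std::string> tokenize(const std::string& line,
                                  const std::string& delims) {
    std::vector<std::string> tokens;
    std::string::size_type start = line.find_first_not_of(delims);
    while (start != std::string::npos) {
        std::string::size_type end = line.find_first_of(delims, start);
        if (end == std::string::npos) {
            tokens.push_back(line.substr(start));  // last token on the line
            break;
        }
        tokens.push_back(line.substr(start, end - start));
        start = line.find_first_not_of(delims, end);
    }
    return tokens;
}

// Exercises the edge cases separately from the main loop.
void runTokenizerTests() {
    const std::string delims = " \t,";
    assert(tokenize("", delims).empty());
    assert(tokenize("   \t ", delims).empty());
    assert((tokenize("TOP   NOP ", delims) ==
            std::vector<std::string>{"TOP", "NOP"}));
    assert((tokenize("TAN\tLA 2,1", delims) ==
            std::vector<std::string>{"TAN", "LA", "2", "1"}));
}
```

Calling runTokenizerTests() once at the start of main is enough to catch a regression in the splitting logic before any file is read.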

2012-09-16 20:48:30

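That just cuts off the first character of each string and leaves me with OP as output – cadavid4j

OK, the thing is, I am getting the correct opcode – cadavid4j

Wow, I have to say, thank you very much for your time and for your well-thought-out and well-written response. This is exactly what I needed, and I appreciate your help. – cadavid4j

This technique has many weaknesses, and you never check any result. For example, when you say: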

current = next + 1;

you must already know that there is exactly one delimiter between items! Otherwise you have to skip past all of them. And when you say

next = line.find_first_of(delimeters, current); 
<something> = line.substr(current, next - current)

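you should be sure that find_first_of found something. Otherwise it returns std::string::npos, which becomes -1 when stored in an int, and next - current will be negative!

If I wanted to do this job I would use regex, either from std or from boost. With a regular expression this task is a piece of cake; just use: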
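A minimal sketch of what checking those results could look like is shown below. The helper name tokenAt is hypothetical. It keeps positions as std::string::size_type and treats the "no delimiter found" case explicitly instead of relying on arithmetic with -1.

```cpp
#include <string>

// Hypothetical helper: copies the token that starts at `current` into `out`.
// Returns false when `current` is already at or past the end of the line,
// so the caller knows there was nothing to read.
bool tokenAt(const std::string& line, std::string::size_type current,
             const std::string& delims, std::string& out) {
    if (current >= line.size()) return false;
    std::string::size_type next = line.find_first_of(delims, current);
    if (next == std::string::npos) {
        out = line.substr(current);  // no delimiter left: token runs to the end
    } else {
        out = line.substr(current, next - current);
    }
    return true;
}
```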

// requires #include <regex>
std::smatch m; 
std::regex rx("\\s*(\\w+)\\s+(\\w+)(?:\\s+(\\d+)\\s*(?:,(\\d+))?)?"); 
if (std::regex_match(line, m, rx)) { 
    // we found a match here 
    string label = m.str(1); 
    string opcode = m.str(2); 
    string arg1 = m.str(3), arg2 = m.str(4); 
}

Source

2012-09-16 21:03:49 BigBoss

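Can you explain how the regex works? It looks strange to me – cadavid4j

To explain how regular expressions work I would have to write at least a few hundred lines, but it is easy: every character is compared literally, except for some markers that are matched against predefined rules. For example, \\s means any whitespace character and \\s* means zero or more \\s, so using it here means any number of whitespace characters (zero or more) may appear at the start of the line. () creates a capture group and remembers the result, so you can refer to it with m.str(i). It is very simple, but you should read up on its syntax. It may look bad, but it saves you from writing horrible code! – BigBoss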
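To make the pieces described above concrete, here is a small annotated sketch using std::regex. ParsedLine and parseWithRegex are hypothetical names, and the trailing \\s* is an addition to the answer's pattern so that lines ending in whitespace still match. As in the answer, only numeric arguments are accepted.

```cpp
#include <regex>
#include <string>

// Hypothetical result type for one parsed line.
struct ParsedLine {
    bool ok;
    std::string label, opcode, arg1, arg2;
};

ParsedLine parseWithRegex(const std::string& line) {
    // \s*          optional leading whitespace
    // (\w+)        group 1: label
    // \s+          one or more whitespace characters
    // (\w+)        group 2: opcode
    // (?: ... )?   optional, non-capturing argument part:
    //   \s+(\d+)     group 3: first numeric argument
    //   \s*(?:,(\d+))?  group 4: optional second argument after a comma
    // \s*          trailing whitespace (added here, not in the answer)
    static const std::regex rx(
        "\\s*(\\w+)\\s+(\\w+)(?:\\s+(\\d+)\\s*(?:,(\\d+))?)?\\s*");

    ParsedLine p{false, "", "", "", ""};
    std::smatch m;
    if (std::regex_match(line, m, rx)) {
        p.ok = true;
        p.label = m.str(1);
        p.opcode = m.str(2);
        p.arg1 = m.str(3);  // empty when the group did not take part
        p.arg2 = m.str(4);
    }
    return p;
}
```

Because regex_match requires the whole line to match, a line that does not fit the pattern is reported through ok == false instead of yielding partial fields.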
