2011-03-12

I have the following file, which needs to be parsed. How do I create a parser (lex/yacc) for it?

--TestFile 
Start ASDF123 
Name "John" 
Address "#6,US" 
end ASDF123 

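Lines starting with -- are treated as comment lines. The file starts with "Start" and ends with "end". The string after Start is the UserID, and the name and address values are enclosed in double quotes.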

I need to parse the file and write the parsed data to an XML file.

So the generated file would look like this:

<ASDF123> 
    <Name Value="John" /> 
    <Address Value="#6,US" /> 
</ASDF123> 

I am currently using pattern matching (regular expressions) to parse the file above. Here is my sample code:

/// <summary> 
    /// To Store the row data from the file 
    /// </summary> 
    List<String> MyList = new List<String>(); 

    String strName = ""; 
    String strAddress = ""; 
    String strInfo = ""; 

Method: ReadFile

/// <summary> 
    /// To read the file into a List 
    /// </summary> 
    private void ReadFile() 
    { 
     // "using" disposes the reader even if an exception is thrown while reading
     using (StreamReader Reader = new StreamReader(Path.Combine(Application.StartupPath, "TestFile.txt"))) 
     { 
      while (!Reader.EndOfStream) 
      { 
       MyList.Add(Reader.ReadLine()); 
      } 
     } 
    } 

Method: FormateRowData

/// <summary> 
    /// To remove blank lines and comment lines (lines starting with "--") 
    /// </summary> 
    private void FormateRowData() 
    { 
     MyList = MyList.Where(X => !string.IsNullOrWhiteSpace(X)).Where(X => !X.TrimStart().StartsWith("--")).ToList(); 
    } 

Method: ParseData

/// <summary> 
    /// To Parse the data from the List 
    /// </summary> 
    private void ParseData() 
    { 
     Match l_mMatch; 
     // Anchored to the start of the line; [^"]* accepts any text inside the quotes (e.g. "John Smith")
     Regex RegData = new Regex(@"^\s*start\s+(?<Data>\w+)", RegexOptions.IgnoreCase); 
     Regex RegName = new Regex(@"^\s*name\s+""(?<Name>[^""]*)""", RegexOptions.IgnoreCase); 
     Regex RegAddress = new Regex(@"^\s*address\s+""(?<Address>[^""]*)""", RegexOptions.IgnoreCase); 
     for (int Index = 0; Index < MyList.Count; Index++) 
     { 
      l_mMatch = RegData.Match(MyList[Index]); 
      if (l_mMatch.Success) 
       strInfo = l_mMatch.Groups["Data"].Value; 
      l_mMatch = RegName.Match(MyList[Index]); 
      if (l_mMatch.Success) 
       strName = l_mMatch.Groups["Name"].Value; 
      l_mMatch = RegAddress.Match(MyList[Index]); 
      if (l_mMatch.Success) 
       strAddress = l_mMatch.Groups["Address"].Value; 
     } 
    } 

Method: WriteFile

/// <summary> 
    /// To write parsed information into file. 
    /// </summary> 
    private void WriteFile() 
    { 
     XDocument XD = new XDocument(
          new XElement(strInfo, 
             new XElement("Name", 
              new XAttribute("Value", strName)), 
             new XElement("Address", 
              new XAttribute("Value", strAddress)))); 
     XD.Save(Application.StartupPath + "\\File.xml"); 
    } 

I have heard of parser generators.

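Please help me write a parser using lex and yacc. The reason is that my existing parser (pattern matching) is not flexible, and I also don't think it is the right way to do this.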

How do I use a parser generator? (I have read Code Project Sample One and Code Project Sample Two, but I am still not familiar with it.) Please suggest some parser generators that produce C# parsers.

Answers


Gardens Point LEX (GPLEX) and the Gardens Point Parser Generator (GPPG) are strongly influenced by LEX and YACC, and they output C# code.

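Your grammar is simple, and I think your current approach is fine, but kudos for wanting to learn the "real" way. :-) So here is my suggestion for the grammar (production rules only; this is far from a complete example. In an actual GPPG file the ... must be replaced by C# code that builds a syntax tree, and you also need token declarations and so on; read the GPPG examples in the documentation. You will also need a GPLEX file that describes the tokens):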

/* Your input file is a list of "top level elements" */ 
TopLevel : 
    TopLevel TopLevelElement { ... } 
    | /* (empty) */ 

/* A top level element is either a comment or a block. 
    The COMMENT token must be described in the GPLEX file as 
    any line that starts with -- . */ 
TopLevelElement: 
    Block { ... } 
    | COMMENT { ... } 

/* A block starts with the token START (which, in the GPLEX file, 
    is defined as the string "Start"), continues with some identifier 
    (the block name), then has a list of elements, and finally the token 
    END followed by an identifier. If you want to validate that the 
    END identifier is the same as the START identifier, you can do that 
    in the C# code that analyses the syntax tree built by GPPG. 
    The token Identifier is also defined with a regular expression in GPLEX. */ 
Block: 
    START Identifier BlockElementList END Identifier { ... } 

BlockElementList: 
    BlockElementList BlockElement { ... } 
    | /* empty */ 

/* yacc has no (A | B) grouping, so each alternative is written out. */ 
BlockElement: 
    NAME QuotedString { ... } 
    | ADDRESS QuotedString { ... } 
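The comments above leave two things to C# code: building the syntax tree in the { ... } actions, and checking that the END identifier repeats the START identifier. A minimal sketch of that C# side, assuming hypothetical node classes BlockNode and BlockElementNode (they are not part of GPPG), plus the conversion to the XML shape from the question:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical syntax-tree node for one BlockElement (NAME/ADDRESS + QuotedString).
public sealed class BlockElementNode
{
    public string Kind { get; }   // "Name" or "Address"
    public string Value { get; }  // QuotedString contents, without the quotes

    public BlockElementNode(string kind, string value)
    {
        Kind = kind;
        Value = value;
    }
}

// Hypothetical syntax-tree node for one Block: START Identifier ... END Identifier.
public sealed class BlockNode
{
    public string StartId { get; }
    public string EndId { get; }
    public List<BlockElementNode> Elements { get; }

    public BlockNode(string startId, List<BlockElementNode> elements, string endId)
    {
        StartId = startId;
        Elements = elements;
        EndId = endId;
    }
}

public static class SyntaxTreeChecks
{
    // The semantic check the grammar itself cannot express.
    public static void Validate(BlockNode block)
    {
        if (!string.Equals(block.StartId, block.EndId, StringComparison.Ordinal))
            throw new FormatException(
                $"Block '{block.StartId}' is closed by 'end {block.EndId}'.");
    }

    // Validates the block, then emits <ASDF123><Name Value="..." />...</ASDF123>.
    public static XElement ToXml(BlockNode block)
    {
        Validate(block);
        return new XElement(block.StartId,
            block.Elements.Select(e => new XElement(e.Kind, new XAttribute("Value", e.Value))));
    }
}
```

How a GPPG action gets at the values of its symbols depends on the semantic-value type declared in the .y file, so check the GPPG documentation for that part. Note that XElement throws an XmlException if the identifier is not a valid XML name (for example, one starting with a digit).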

You first need to define the grammar for your parser (the yacc part).

It would look something like this:

file : record 
    | file record 
    ; 

record: start identifier recordContent end identifier { /* rule to match the two identifiers */ } 
     ; 

recordContent: /* empty */ 
     | recordContent name value   /* can be more detailed if you require an order for the fields */ 
     ; 
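The same record structure can also be parsed without a generator. The following is a sketch of a hand-written recursive-descent parser (a different technique from the LALR tables yacc generates) working on a list of (Kind, Text) tokens. The kind strings START, END, ID, NAME, ADDRESS and STRING are hypothetical and assume a lexer that has already dropped the comment lines:

```csharp
using System;
using System.Collections.Generic;

// One parsed record: the identifier plus its (field name, value) pairs.
public sealed class Record
{
    public string Id { get; set; } = "";
    public List<KeyValuePair<string, string>> Fields { get; } = new List<KeyValuePair<string, string>>();
}

// Sketch of a recursive-descent parser; the token kinds are hypothetical.
public sealed class RecordParser
{
    private readonly IList<(string Kind, string Text)> _tokens;
    private int _pos;

    public RecordParser(IList<(string Kind, string Text)> tokens)
    {
        _tokens = tokens;
    }

    // One token of lookahead; "EOF" once the list is exhausted.
    private string PeekKind() => _pos < _tokens.Count ? _tokens[_pos].Kind : "EOF";

    // Consumes a token of the given kind and returns its text, or throws.
    private string Expect(string kind)
    {
        if (PeekKind() != kind)
            throw new FormatException($"Expected {kind} but found {PeekKind()} at token {_pos}.");
        return _tokens[_pos++].Text;
    }

    // file: one or more records (the recursion in the grammar becomes a loop)
    public List<Record> ParseFile()
    {
        var records = new List<Record> { ParseRecord() };
        while (PeekKind() == "START")
            records.Add(ParseRecord());
        if (PeekKind() != "EOF")
            throw new FormatException($"Unexpected {PeekKind()} after the last record.");
        return records;
    }

    // record: START ID recordContent END ID, plus the "identifiers must match" action
    private Record ParseRecord()
    {
        Expect("START");
        var record = new Record { Id = Expect("ID") };

        // recordContent: zero or more NAME STRING / ADDRESS STRING pairs, in any order
        while (PeekKind() == "NAME" || PeekKind() == "ADDRESS")
        {
            string fieldName = _tokens[_pos++].Text;   // keyword text, e.g. "Name"
            record.Fields.Add(new KeyValuePair<string, string>(fieldName, Expect("STRING")));
        }

        Expect("END");
        string endId = Expect("ID");
        if (endId != record.Id)
            throw new FormatException($"'start {record.Id}' is closed by 'end {endId}'.");
        return record;
    }
}
```

Each nonterminal maps to one method, which is why grammars this small are often parsed by hand; a generator pays off once the grammar grows.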

The lexical analysis (the lex part) produces the tokens. I think your regular expressions will be useful for defining them.
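As a rough illustration of turning regular expressions into token definitions, here is a minimal regex-driven lexer in C#. It is a sketch under stated assumptions, not lex output: the token kinds mirror the token names in the first answer's grammar (COMMENT, START, END, NAME, ADDRESS, Identifier, QuotedString), keywords are matched case-insensitively, and the rules are tried in order at the current position, so keywords win over the general Identifier rule:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public enum TokenKind { Comment, Start, End, Name, Address, Identifier, QuotedString }

public readonly struct Token
{
    public TokenKind Kind { get; }
    public string Text { get; }
    public Token(TokenKind kind, string text) { Kind = kind; Text = text; }
    public override string ToString() => $"{Kind}({Text})";
}

public static class RegexLexer
{
    // \G anchors each pattern at the position passed to Match(input, pos).
    // \b keeps "Start" from matching the front of an identifier like "Starter".
    private static readonly (TokenKind Kind, Regex Pattern)[] Rules =
    {
        (TokenKind.Comment,      new Regex(@"\G--[^\n]*")),
        (TokenKind.Start,        new Regex(@"\Gstart\b", RegexOptions.IgnoreCase)),
        (TokenKind.End,          new Regex(@"\Gend\b", RegexOptions.IgnoreCase)),
        (TokenKind.Name,         new Regex(@"\Gname\b", RegexOptions.IgnoreCase)),
        (TokenKind.Address,      new Regex(@"\Gaddress\b", RegexOptions.IgnoreCase)),
        (TokenKind.Identifier,   new Regex(@"\G[A-Za-z0-9_]+")),
        (TokenKind.QuotedString, new Regex("\\G\"[^\"\\n]*\"")),
    };

    public static List<Token> Tokenize(string input)
    {
        var tokens = new List<Token>();
        int pos = 0;
        while (pos < input.Length)
        {
            if (char.IsWhiteSpace(input[pos])) { pos++; continue; }  // skip blanks and newlines

            bool matched = false;
            foreach (var (kind, pattern) in Rules)
            {
                Match m = pattern.Match(input, pos);
                if (!m.Success) continue;
                string text = kind == TokenKind.QuotedString
                    ? m.Value.Substring(1, m.Length - 2)   // strip the surrounding quotes
                    : m.Value;
                tokens.Add(new Token(kind, text));
                pos += m.Length;
                matched = true;
                break;
            }
            if (!matched)
                throw new FormatException($"Unexpected character '{input[pos]}' at offset {pos}.");
        }
        return tokens;
    }
}
```

The parser (generated or hand-written) would then simply skip Comment tokens, or the lexer could drop them instead of adding them to the list.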

My answer is only a rough draft. I suggest finding a more complete lex/yacc (or flex/bison) tutorial online, and coming back here if you have more focused questions.

I also don't know whether there is a C# implementation that would let you stay in managed code. You may have to import unmanaged C/C++ code.