2010-03-04

Parsing a string in Ruby

I have a string:

input = "maybe (this is | that was) some ((nice | ugly) (day |night) | (strange (weather | time)))" 

What is the best way to parse this string in Ruby?

I mean that the script should be able to build sentences like these:

maybe this is some ugly night

maybe this is some nice night

maybe this is some strange time

and so on, you get the idea...

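Should I read the string character by character and build a state machine that uses a stack to store the parenthesised values for later evaluation, or is there a better approach?

Or is there perhaps a ready-made, out-of-the-box library for this purpose?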
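For comparison, here is a minimal sketch of the hand-written approach the question describes. It assumes the same input shape the Treetop answer below handles (plain text, parenthesised groups, and `|` only inside groups) and uses recursive descent instead of an explicit stack, so Ruby's call stack keeps track of the nesting. The class name `PhraseExpander` is hypothetical, and collapsing repeated spaces with `squeeze`/`strip` is an assumption about the desired output.

```ruby
# Hypothetical hand-written expander (a sketch, not the answer's method).
# Assumed grammar:
#   sentence := (literal | "(" sentence ("|" sentence)* ")")*
class PhraseExpander
  SPECIAL = "()|"

  def initialize(input)
    @input = input
    @pos = 0
  end

  # Returns every sentence the whole input can expand to.
  def expand
    results = parse_sentence
    if @pos < @input.length
      raise ArgumentError, "unexpected #{@input[@pos].inspect} at #{@pos}"
    end
    # Assumption: spaces around "|" are not significant, so collapse runs.
    results.map { |s| s.squeeze(" ").strip }
  end

  private

  # A sequence of pieces; their choices are multiplied out (cross product).
  def parse_sentence
    results = [""]
    while @pos < @input.length && !["|", ")"].include?(@input[@pos])
      piece = @input[@pos] == "(" ? parse_group : [parse_literal]
      results = results.product(piece).map { |a, b| a + b }
    end
    results
  end

  # "(" sentence ("|" sentence)* ")": the alternatives are appended.
  def parse_group
    @pos += 1 # skip "("
    options = parse_sentence
    while @input[@pos] == "|"
      @pos += 1 # skip "|"
      options += parse_sentence
    end
    raise ArgumentError, "missing ')' at #{@pos}" unless @input[@pos] == ")"
    @pos += 1 # skip ")"
    options
  end

  # Plain text up to the next special character.
  def parse_literal
    start = @pos
    @pos += 1 while @pos < @input.length && !SPECIAL.include?(@input[@pos])
    @input[start...@pos]
  end
end

input = "maybe (this is | that was) some ((nice | ugly) (day |night) | (strange (weather | time)))"
puts PhraseExpander.new(input).expand
```

This works for small inputs, but every new syntax feature means more hand-written parsing code, which is the main argument for a parser generator.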

Answer

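Try Treetop. It is a Ruby-like DSL for describing grammars. Parsing the string you gave should be quite easy, and because you are using a real parser, you can easily extend your grammar later.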

An example grammar for the type of string you want to parse (save it as sentences.treetop):

grammar Sentences
  rule sentence
    # A sentence is a sequence of zero or more expressions.
    expression* <Sentence>
  end

  rule expression
    # An expression is either a literal or a parenthesised expression.
    parenthesised / literal
  end

  rule parenthesised
    # A parenthesised expression contains one or more sentences.
    "(" (multiple / sentence) ")" <Parenthesised>
  end

  rule multiple
    # Multiple sentences are delimited by a pipe.
    sentence "|" (multiple / sentence) <Multiple>
  end

  rule literal
    # A literal string consists of letters (a-z, A-Z) and/or spaces.
    # Expand the character class to allow other characters too.
    [a-zA-Z ]+ <Literal>
  end
end

The grammar above needs a companion file that defines the node classes, which let us access the values of the nodes (save it as sentence_nodes.rb):

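require "treetop" # provides Treetop::Runtime::SyntaxNode

class Sentence < Treetop::Runtime::SyntaxNode
  # Cross product of two lists of strings: every string in a is
  # concatenated with every string in b. An empty a means "nothing yet".
  def combine(a, b)
    return b if a.empty?
    a.inject([]) do |values, val_a|
      values + b.collect { |val_b| val_a + val_b }
    end
  end

  # All strings this sequence of expressions can produce.
  def values
    elements.inject([]) do |values, element|
      combine(values, element.values)
    end
  end
end

class Parenthesised < Treetop::Runtime::SyntaxNode
  # elements: "(", the contents, ")"; only the contents carry values.
  def values
    elements[1].values
  end
end

class Multiple < Treetop::Runtime::SyntaxNode
  # elements: first sentence, "|", remaining alternatives.
  # The alternatives of both branches are simply appended.
  def values
    elements[0].values + elements[2].values
  end
end

class Literal < Treetop::Runtime::SyntaxNode
  # A literal produces exactly one string: the matched text.
  def values
    [text_value]
  end
end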
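To see what `combine` does on its own, the sketch below copies it out of the `Sentence` class as a plain method and feeds it hand-written value lists. These lists are hypothetical stand-ins for what the child nodes of `(nice|ugly) (day|night)` would return, so the snippet runs without Treetop. A sequence multiplies the lists of its children; a `Multiple` just appends the lists of its branches.

```ruby
# Copy of Sentence#combine from sentence_nodes.rb, as a plain method.
def combine(a, b)
  return b if a.empty? # nothing accumulated yet
  a.inject([]) do |values, val_a|
    values + b.collect { |val_b| val_a + val_b }
  end
end

# Hand-written stand-ins for the values of the three elements of the
# sentence "(nice|ugly) (day|night)": a group, the literal " ", a group.
element_values = [["nice", "ugly"], [" "], ["day", "night"]]

# What Sentence#values computes: fold the element lists with combine.
sequence = element_values.inject([]) { |acc, vals| combine(acc, vals) }
p sequence
# => ["nice day", "nice night", "ugly day", "ugly night"]

# What Multiple#values computes for "... | (strange (weather|time))":
# the value lists of the two branches are appended.
alternatives = sequence + combine(["strange "], ["weather", "time"])
p alternatives.size
# => 6
```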

The following sample program shows that parsing the example sentence you gave is quite straightforward:

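require "rubygems" # only needed on Ruby 1.8
require "treetop"
# Ruby 1.9.2+ no longer puts "." on the load path, so load the node
# classes relative to this file.
require_relative "sentence_nodes"

str = 'maybe (this is|that was) some' +
      ' ((nice|ugly) (day|night)|(strange (weather|time)))'

# Compiles sentences.treetop and defines SentencesParser.
Treetop.load "sentences"
if sentence = SentencesParser.new.parse(str)
  puts sentence.values
else
  puts "Parse error"
end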

The output of this program is:

maybe this is some nice day 
maybe this is some nice night 
maybe this is some ugly day 
maybe this is some ugly night 
maybe this is some strange weather 
maybe this is some strange time 
maybe that was some nice day 
maybe that was some nice night 
maybe that was some ugly day 
maybe that was some ugly night 
maybe that was some strange weather 
maybe that was some strange time 
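The twelve lines follow from how the node classes combine values: a sequence multiplies the number of choices of its parts, and `|` adds them. Counting for the example string:

```latex
N = \underbrace{2}_{\text{this is / that was}}
    \times \Bigl( \underbrace{2 \times 2}_{\text{(nice/ugly) (day/night)}}
    + \underbrace{2}_{\text{strange (weather/time)}} \Bigr)
  = 2 \times 6 = 12
```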

You can also inspect the syntax tree:

p sentence 

The output is here

There you have it: an extensible parsing solution that should do what you want in about 50 lines of code. Does that help?

Thanks, I have read the examples online, but I don't understand how I can handle nested parentheses... – astropanic 2010-03-04 14:56:11

Thank you! You are my hero :) – astropanic 2010-03-04 19:55:34

http://www.bestechvideos.com/2008/07/18/rubyconf-2007-treetop-syntactic-analysis-with-ruby, nice video – astropanic 2010-03-05 06:37:23