2014-01-17 160 views
1

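Split a string into sections based on headers

I have a string containing several sections named "Section 1" ... "Section 20", and I want to split this string into those individual sections. Here is an example: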

Stuff we don't care about 

Section 1 
Text within this section, may contain the word section. 

And go on for quite a bit. 

Section 15 
Another section 

I want to split this into:

["Section 1\n Text within this section, may contain the word section.\n\nAnd go on for quite a bit.", 
"Section 15 Another section"] 

I feel rather stupid for not getting this right. My attempts always capture everything. Right now I have:

/(Section.+\d+$[\s\S]+)/ 

but I can't get the greediness out of it.
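To illustrate the greediness problem: `[\s\S]+` has nothing after it to stop at, so a single match runs from the first header to the end of the string. Below is a minimal sketch of one way to rein it in, assuming every header sits alone on its own line as "Section <number>". The names `SAMPLE_TEXT`, `greedy_match` and `lazy_sections` are made up for illustration.

```ruby
# Hypothetical sample mirroring the input in the question.
SAMPLE_TEXT = <<TEXT
Stuff we don't care about

Section 1
Text within this section, may contain the word section.

And go on for quite a bit.

Section 15
Another section
TEXT

# The pattern from the question: the greedy [\s\S]+ swallows every
# later section, so there is only one match.
def greedy_match(text)
  text[/Section.+\d+$[\s\S]+/]
end

# Lazy body plus a lookahead that stops right before the next header
# line, or at the end of the string for the last section.
def lazy_sections(text)
  text.scan(/^Section \d+$[\s\S]+?(?=^Section \d+$|\z)/).map(&:strip)
end

p greedy_match(SAMPLE_TEXT).include?("Section 15") # => true
p lazy_sections(SAMPLE_TEXT).length                # => 2
```

The `strip` call is only there to drop the blank lines left between sections; remove it if the exact whitespace matters.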

+0

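Once "Section 1" is encountered, do you want to capture everything that follows? Or do you want to ignore the text after Section 20? Is the section text *always* on the line immediately following the header, or can there be paragraphs/blank lines between sections? –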

+0

The example is pretty clear. He wants each section (header + text) as an element of the array. – robertodecurnex

+0

Did that help? –

Answers

0

As I see it, a Regexp that splits the text would look like this:

/(?:\n\n|^)Section/ 

So the code is:

str = " 
Stuff we don't care about 

Section 1 
Text within this section, may contain the word section. 

And go on for quite a bit. 

Section 15 
Another section 
" 

newstr = str.split(/(?:\n\n|^)Section/, -1)[1..-1].map {|l| "Section " + l.strip } 
# => ["Section 1\nText within this section, may contain the word section.\n\nAnd go on for quite a bit.", "Section 15\nAnother section"] 
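A related sketch, offered as an alternative rather than as this answer's method: splitting on a zero-width lookahead keeps each header attached to its body, so "Section " does not have to be glued back on afterwards. It assumes every header is a line containing only "Section <number>"; `split_on_headers` is a hypothetical name.

```ruby
# Hypothetical helper: split in front of every header line so each
# header stays with its own body.
def split_on_headers(text)
  text.split(/^(?=Section \d+$)/)  # zero-width split point before each header
      .grep(/\ASection \d+$/)      # drop any preamble before the first header
      .map(&:strip)                # trim the blank lines between sections
end

input = "Stuff we don't care about\n\nSection 1\nText within this section, " \
        "may contain the word section.\n\nAnd go on for quite a bit.\n\n" \
        "Section 15\nAnother section\n"

p split_on_headers(input)
# => ["Section 1\nText within this section, may contain the word section.\n\nAnd go on for quite a bit.", "Section 15\nAnother section"]
```

Using `grep` instead of dropping the first element means the result is still right when the string starts directly with a header.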
+0

Sorry, the text within each section is more complex and may contain newlines and so on. I'll update the question. –

+0

@MattW. I've updated the answer –

0

You can use this expression:

(?m)(Section\s*\d+)(.*?\1)$ 

Live demo

+0

I couldn't get it to work. It ignores the "Another section" at the end and gives odd matches – robertodecurnex

+0

@robertodecurnex You're mistaken. "Another section" was meant to stand for something like "Section 16", for example; it does work, though. – revo

+0

No, I just took the sample and used your link -> http://www.rubular.com/r/euxXwqo03d – robertodecurnex

0

You can use scan with this regular expression: /Section\s\d+\n(?:.(?!Section\s\d+\n))*/m

string.scan(/Section\s\d+\n(?:.(?!Section\s\d+\n))*/m) 

Section\s\d+\n will match any section header.

(?:.(?!Section\s\d+\n))* will match anything up to, but not including, the next section header.

The m flag makes the dot match newlines too.

sample = <<SAMPLE
Stuff we don't care about

Section 1
Text within this section, may contain the word section.

And go on for quite a bit.

Section 15
Another section
SAMPLE

# Header lines must end right after the number: a trailing space would keep \n from matching.
sample.scan(/Section\s\d+\n(?:.(?!Section\s\d+\n))*/m)
#=> ["Section 1\nText within this section, may contain the word section.\n\nAnd go on for quite a bit.\n", "Section 15\nAnother section\n"]
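One thing to watch with the scan above: the header pattern is not anchored, so body text that happens to end a line with something like "in Section 2" would be treated as a new header. A hedged variant follows, assuming real headers always start at the beginning of a line. `ANCHORED_SECTION` and `scan_sections` are hypothetical names, and the lookahead is checked before each character instead of after it.

```ruby
# Hypothetical variant: headers must start a line (^), and the negative
# lookahead stops the body right before the next header line.
ANCHORED_SECTION = /^Section\s\d+\n(?:(?!^Section\s\d+$).)*/m

def scan_sections(text)
  text.scan(ANCHORED_SECTION).map(&:strip)
end

# "the notes in Section 2" ends a line but is not a header.
text = "Intro\n\nSection 1\nSee also\nthe notes in Section 2\nmore text\n\n" \
       "Section 15\nAnother section\n"

p scan_sections(text)
# => ["Section 1\nSee also\nthe notes in Section 2\nmore text", "Section 15\nAnother section"]
```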
0

I think the simplest approach is:

str = "Stuff we don't care about 

Section 1 
Text within this section, may contain the word section. 

And go on for quite a bit. 

Section 15 
Another section" 

str[/^Section 1.+/m] # => "Section 1\nText within this section, may contain the word section.\n\nAnd go on for quite a bit.\n\nSection 15\nAnother section" 

If you want to break the text into segments at the Section headers, start the same way, then take advantage of Enumerable's slice_before:

str = "Stuff we don't care about 

Section 1 
Text within this section, may contain the word section. 

And go on for quite a bit. 

Section 15 
Another section" 

str[/^Section 1.+/m].split("\n").slice_before(/^Section \d+/m).map{ |a| a.join("\n") } 
# => ["Section 1\nText within this section, may contain the word section.\n\nAnd go on for quite a bit.\n", 
#  "Section 15\nAnother section"] 

The slice_before documentation says:

Creates an enumerator for each chunked elements. The beginnings of chunks are defined by pattern and the block.
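A line-based sketch of the same idea, for the case where the first section is not necessarily "Section 1": walk the lines, skip everything before the first header, and start a new chunk at each header line. `HEADER` and `sections_by_line` are hypothetical names, and the pattern assumes a header is a line holding only "Section <number>", optionally followed by trailing whitespace.

```ruby
# Hypothetical header pattern: the whole line is "Section <number>".
HEADER = /\ASection \d+\s*$/

def sections_by_line(text)
  text.each_line
      .drop_while { |line| line !~ HEADER }  # skip text before the first header
      .slice_before(HEADER)                  # start a new chunk at each header line
      .map { |lines| lines.join.strip }      # each line still carries its "\n"
end

str = "Stuff we don't care about\n\nSection 1\nText within this section, " \
      "may contain the word section.\n\nAnd go on for quite a bit.\n\n" \
      "Section 15\nAnother section"

p sections_by_line(str)
# => ["Section 1\nText within this section, may contain the word section.\n\nAnd go on for quite a bit.", "Section 15\nAnother section"]
```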

+0

Note that there is a comma at the right end of the first line. The example has 2 elements. – robertodecurnex

+0

That just makes it easier. Thanks. –