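Split a string into sections based on headers

I have a string made up of several sections named "Section 1" ... "Section 20", and I want to split this string into those individual sections. Here is an example: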

Stuff we don't care about 

Section 1 
Text within this section, may contain the word section. 

And go on for quite a bit. 

Section 15 
Another section

I want to split this into:

["Section 1\nText within this section, may contain the word section.\n\nAnd go on for quite a bit.", 
"Section 15 Another section"]

I feel pretty stupid for not getting this right. My attempts always capture everything. Right now I have:

/(Section.+\d+$[\s\S]+)/

but I can't stop it from being greedy.
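For context, the attempt above captures everything because `[\s\S]+` is greedy and nothing tells it where the next header starts. The following is a minimal sketch, not a definitive fix: it assumes each header sits alone on its own line as `Section <number>` with no trailing spaces, and the names `QUESTION_TEXT`, `greedy_sections`, and `lazy_sections` are made up for illustration.

```ruby
# Sample text from the question (assumed here to have no trailing spaces).
QUESTION_TEXT = <<~EOS
  Stuff we don't care about

  Section 1
  Text within this section, may contain the word section.

  And go on for quite a bit.

  Section 15
  Another section
EOS

# Greedy: [\s\S]+ runs to the end of the string, so only one match comes back.
def greedy_sections(text)
  text.scan(/^Section \d+$[\s\S]+/)
end

# Lazy quantifier plus lookahead: stop right before the next header line
# or at the end of the string, then trim surrounding whitespace.
def lazy_sections(text)
  text.scan(/^Section \d+$[\s\S]+?(?=^Section \d+$|\z)/).map(&:strip)
end

lazy_sections(QUESTION_TEXT)
# => ["Section 1\nText within this section, may contain the word section.\n\nAnd go on for quite a bit.",
#     "Section 15\nAnother section"]
```

In Ruby, `^` and `$` always anchor at line boundaries, which is why the lookahead can recognize a header in the middle of the string without any extra flag.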

Source

2014-01-17 MattW.

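Once "Section 1" is encountered, do you want to capture everything else? Or do you want to ignore the text after Section 20? Will the section text *always* be on the line immediately after the header, or can there be paragraphs/blank lines within a section? –

The example is clear. He wants each section (header + text) as an element of the array. – robertodecurnex

Does this help? –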

It seems to me the Regexp to split the text on is as follows:

/(?:\n\n|^)Section/

So the code is:

str = "
Stuff we don't care about

Section 1
Text within this section, may contain the word section.

And go on for quite a bit.

Section 15
Another section
"

newstr = str.split(/(?:\n\n|^)Section/, -1)[1..-1].map {|l| "Section " + l.strip } 
# => ["Section 1\nText within this section, may contain the word section.\n\nAnd go on for quite a bit.", "Section 15\nAnother section"]
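A possible variation on the split above, offered only as a sketch: splitting on a zero-width lookahead keeps the header inside each piece, so "Section " does not have to be re-added afterwards. It assumes headers sit alone on a line as `Section <number>`; `SPLIT_TEXT` and `split_on_headers` are hypothetical names.

```ruby
# Same sample as in the question (assumed to have no trailing spaces).
SPLIT_TEXT = <<~EOS
  Stuff we don't care about

  Section 1
  Text within this section, may contain the word section.

  And go on for quite a bit.

  Section 15
  Another section
EOS

def split_on_headers(text)
  # The lookahead matches the empty position just before each header line,
  # so the header text itself is not consumed by the split.
  parts = text.split(/^(?=Section \d+$)/)
  # Drop whatever came before the first header, then trim each section.
  parts.select { |part| part.match?(/\ASection \d+$/) }.map(&:strip)
end

split_on_headers(SPLIT_TEXT)
# => ["Section 1\nText within this section, may contain the word section.\n\nAnd go on for quite a bit.",
#     "Section 15\nAnother section"]
```

Because the pattern requires a digit and an uppercase "Section" at the start of a line, the lowercase word "section" inside the body text does not trigger a split.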

Source

2014-01-17 16:46:08

Sorry, the text in each section is more complex and may contain newlines, etc. I'll update it. –

@MattW. I've updated the answer –

You can use this expression:

(?m)(Section\s*\d+)(.*?\1)$

Live demo

Source

2014-01-17 17:22:19 revo

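I can't get it to work. It ignores "Another section" at the end and gives odd matches – robertodecurnex

@robertodecurnex You got it wrong. "Another section" stands for "Section 16", for example; it does work though. – revo

No, I just took the sample and used your link -> http://www.rubular.com/r/euxXwqo03d – robertodecurnex

You can use scan with this regular expression: /Section\s\d+\n(?:.(?!Section\s\d+\n))*/m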

string.scan(/Section\s\d+\n(?:.(?!Section\s\d+\n))*/m)

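Section\s\d+\n will match any section header.

(?:.(?!Section\s\d+\n))* will match any character that is not immediately followed by another section header.

The m flag makes the dot match newlines too.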

sample = <<SAMPLE
Stuff we don't care about

Section 1
Text within this section, may contain the word section.

And go on for quite a bit.

Section 15
Another section
SAMPLE

sample.scan(/Section\s\d+\n(?:.(?!Section\s\d+\n))*/m) 
#=> ["Section 1\nText within this section, may contain the word section.\n\nAnd go on for quite a bit.\n", "Section 15\nAnother section\n"]
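If the section number is wanted alongside the text, the same scan idea can be sketched with capture groups and a lazy quantifier. This is an illustrative variation rather than part of the answer above; `SCAN_TEXT` and `sections_by_number` are hypothetical names, and the sketch assumes section numbers are unique and headers have no trailing spaces.

```ruby
# Same sample as above (assumed to have no trailing spaces).
SCAN_TEXT = <<~EOS
  Stuff we don't care about

  Section 1
  Text within this section, may contain the word section.

  And go on for quite a bit.

  Section 15
  Another section
EOS

def sections_by_number(text)
  # Group 1 is the section number, group 2 the body. The lazy .*? stops at
  # the next header line or at the end of the string; /m lets . match "\n".
  pairs = text.scan(/^Section (\d+)\n(.*?)(?=^Section \d+\n|\z)/m)
  pairs.map { |number, body| [number.to_i, body.strip] }.to_h
end

sections_by_number(SCAN_TEXT)
# => {1 => "Text within this section, may contain the word section.\n\nAnd go on for quite a bit.",
#     15 => "Another section"}
```

With capture groups present, `scan` returns an array of `[number, body]` pairs instead of whole matches, which is what makes the conversion to a Hash straightforward.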

Source

2014-01-17 18:18:39 robertodecurnex

I think the simplest approach is:

str = "Stuff we don't care about

Section 1
Text within this section, may contain the word section.

And go on for quite a bit.

Section 15
Another section"

str[/^Section 1.+/m] # => "Section 1\nText within this section, may contain the word section.\n\nAnd go on for quite a bit.\n\nSection 15\nAnother section"

If you want to break the text at the Section headers, start the same way, then take advantage of Enumerable's slice_before:

str = "Stuff we don't care about

Section 1
Text within this section, may contain the word section.

And go on for quite a bit.

Section 15
Another section"

str[/^Section 1.+/m].split("\n").slice_before(/^Section \d+/m).map{ |a| a.join("\n") } 
# => ["Section 1\nText within this section, may contain the word section.\n\nAnd go on for quite a bit.\n", 
#  "Section 15\nAnother section"]

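The slice_before documentation says:

Creates an enumerator for each chunked elements. The beginnings of chunks are defined by pattern and the block.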
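To make that description concrete, here is a small sketch of slice_before on an array of lines. The helper name `chunk_by_header` and the sample array are made up for illustration.

```ruby
def chunk_by_header(lines)
  # A new chunk begins at every element that matches the pattern (via ===).
  # Elements before the first match still form a leading chunk of their own.
  lines.slice_before(/\ASection \d+\z/).to_a
end

chunk_by_header(["intro", "Section 1", "a", "b", "Section 15", "c"])
# => [["intro"], ["Section 1", "a", "b"], ["Section 15", "c"]]
```

The leading `["intro"]` chunk shows why the text before the first header has to be removed, or the first chunk discarded, when only the sections are wanted.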

Source

2014-01-17 19:06:49

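Note that there is a comma at the right end of the first line. There are 2 elements in the example. – robertodecurnex

That just makes it even easier. Thanks. –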
