2013-12-19 63 views
0

Ruby regex to match substrings between a character pattern and a newline

I have data formatted like this, as a string:

"1. Enloe Medical Center - 2,000 
2. CSU Chico - 1,805 
3. Walmart Distribution Center - 1,350 
4. Pacific Coast Producers (Agribusiness) - 1,200 
5. Marysville School District - 1,000 
6. Feather River Hospital - 865 
7. Sunsweet Growers (Agriculture) - 600 
8. YRC (Freight Services) - 500 
9. Sierra Pacific Industries (Lumber Products) - 500 
10. Colusa Casino Resort - 500" 

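In a Ruby application, I'd like to create two arrays: one containing the substrings between each numbered list marker and the dash, and one containing the numbers between the dash and the newline (as integers), like so: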

labels = ["Enloe Medical Center","CSU Chico","Walmart Distribution Center","Pacific Coast Producers (Agribusiness)","Marysville School District","Feather River Hospital","Sunsweet Growers (Agriculture)","YRC (Freight Services)","Sierra Pacific Industries (Lumber Products)","Colusa Casino Resort"] 

numbers = [2000, 1805, 1350, 1200, 1000, 865, 600, 500, 500, 500] 

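I'm not that great with regular expressions; I know how to do substitution and matching, but I'm not sure where to start here. Can anyone help?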

Answers

3
labels, numbers = string.scan(/^\s*\d+\.\s+(.+)\s+-\s+([\d,]+)\s*$/).transpose 
numbers.map!{|s| s.gsub(",", "").to_i} 
+0

I like your approach, it's really nice. With a shorter regex it can be turned into a one-liner :-) labels, numbers = str.scan(/\d+\.\s(.+)\s-\ (\d.*)$/).map { |label, number| [label, number.gsub(",", "").to_i] }.transpose – bjhaid

+0

@bjhaid I think you could write even less using something like this: '/\$(?<dollars>\d+)\.(?<cents>\d+)/ =~ "$3.67"; #=> 0; dollars #=> "3"' –

+1

@DarekNędza I'm confused; your example has nothing to do with the problem under discussion – bjhaid

0
s = "1. Enloe Medical Center - 2,000 
2. CSU Chico - 1,805 
3. Walmart Distribution Center - 1,350 
4. Pacific Coast Producers (Agribusiness) - 1,200 
5. Marysville School District - 1,000 
6. Feather River Hospital - 865 
7. Sunsweet Growers (Agriculture) - 600 
8. YRC (Freight Services) - 500 
9. Sierra Pacific Industries (Lumber Products) - 500 
10. Colusa Casino Resort - 500" 

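# Numbers: take the text after "- ", strip every non-digit, convert to Integer
arr1 = s.each_line.map { |x|
    x.match(/- (.*)/)[1].gsub(/[^0-9]/, '').to_i
}

# Labels: the text between the "N. " list marker and " - "
arr2 = s.each_line.map { |x|
    x.match(/\d\. (.*) - (.*)/)[1]
}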

puts arr1 
puts arr2 
0
str = %{1. Enloe Medical Center - 2,000 
2. CSU Chico - 1,805 
3. Walmart Distribution Center - 1,350 
4. Pacific Coast Producers (Agribusiness) - 1,200 
5. Marysville School District - 1,000 
6. Feather River Hospital - 865 
7. Sunsweet Growers (Agriculture) - 600 
8. YRC (Freight Services) - 500 
9. Sierra Pacific Industries (Lumber Products) - 500 
10. Colusa Casino Resort - 500} 

numbers = str.scan(/-\ (\d.*)$/).flatten.map{|s| s.gsub(",", "").to_i} # => [2000, 1805, 1350, 1200, 1000, 865, 600, 500, 500, 500]
labels = str.scan(/\d+\.\s(.*)\s-/).flatten # => ["Enloe Medical Center", "CSU Chico", "Walmart Distribution Center", "Pacific Coast Producers (Agribusiness)", "Marysville School District", "Feather River Hospital", "Sunsweet Growers (Agriculture)", "YRC (Freight Services)", "Sierra Pacific Industries (Lumber Products)", "Colusa Casino Resort"]
0

You could do it like this:

rawlines = <<EOF 
1. Enloe Medical Center - 2,000 
2. CSU Chico - 1,805 
3. Walmart Distribution Center - 1,350 
4. Pacific Coast Producers (Agribusiness) - 1,200 
5. Marysville School District - 1,000 
6. Feather River Hospital - 865 
7. Sunsweet Growers (Agriculture) - 600 
8. YRC (Freight Services) - 500 
9. Sierra Pacific Industries (Lumber Products) - 500 
10. Colusa Casino Resort - 500 
EOF 
labels = [] 
numbers = [] 
rawlines.scan(/^[0-9]+\. ([^-]+) - ([1-9][0-9]{0,2}(?>,[0-9]{3})*)/) do |label, number| 
    labels << label 
    numbers << number.gsub(",", "").to_i  # convert to Integer, as the question asks
end 
puts labels 
puts numbers 

Note that this part of the pattern, ([1-9][0-9]{0,2}(?>,[0-9]{3})*), can easily be replaced by ([0-9,]+)
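To make the trade-off between the two number patterns concrete, here is a minimal sketch. The names `SAMPLE_LINES`, `STRICT_PATTERN`, `LOOSE_PATTERN` and `split_rows` are made up for illustration and the sample is a three-line subset of the question's data. Both patterns should give the same result on well-formed input; the looser one also accepts malformed values such as "1,,2".

```ruby
# Hypothetical sample: three lines in the same format as the question's data.
SAMPLE_LINES = <<~EOS
  1. Enloe Medical Center - 2,000
  6. Feather River Hospital - 865
  10. Colusa Casino Resort - 500
EOS

# Strict: 1 to 3 leading digits, then groups of exactly three digits after each comma.
STRICT_PATTERN = /^[0-9]+\. ([^-]+) - ([1-9][0-9]{0,2}(?>,[0-9]{3})*)/
# Loose: any run of digits and commas.
LOOSE_PATTERN = /^[0-9]+\. ([^-]+) - ([0-9,]+)/

# Returns [labels, numbers], with the numbers converted to Integers.
def split_rows(text, pattern)
  rows = text.scan(pattern)
  [rows.map { |label, _| label }, rows.map { |_, number| number.delete(",").to_i }]
end

p split_rows(SAMPLE_LINES, STRICT_PATTERN) == split_rows(SAMPLE_LINES, LOOSE_PATTERN) # => true

# On malformed input the two patterns differ:
p "1. Foo - 1,,2".scan(STRICT_PATTERN) # => [["Foo", "1"]]
p "1. Foo - 1,,2".scan(LOOSE_PATTERN)  # => [["Foo", "1,,2"]]
```

If the input is known to be clean, the loose pattern is easier to read; the strict one is only worth it when the regex also has to validate the number format.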

1

A couple of things:

/pat/m - treats a newline as a character matched by .
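A small sketch of what the /m flag does and does not change in Ruby (the constant `TWO_LINES` is just an illustrative name). In Ruby, /m only affects what . matches; ^ and $ match at every line boundary with or without it, so the line-by-line patterns in this thread should also work without the flag.

```ruby
TWO_LINES = "first line\nsecond line"

# Without /m, "." stops at the newline.
p TWO_LINES[/first.*/]    # => "first line"

# With /m, "." also matches "\n", so ".*" runs to the end of the string.
p TWO_LINES[/first.*/m]   # => "first line\nsecond line"

# ^ and $ are line anchors in Ruby regardless of /m.
p TWO_LINES.scan(/^\w+/)  # => ["first", "second"]
p TWO_LINES.scan(/\w+$/)  # => ["line", "line"]
```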

The other thing is grouping (e.g. in part 2).

You write the regexp for one line, and it is applied across the whole string:

r1 = /\d+\,\d+\s*$/m 
str.scan r1 
["2,000 ", "1,805 ", "1,350 ", "1,200 ", "1,000 "] 

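$ - matches the end of a line
\d - a digit
+ - how many times -> one or more
\s* - whitespace (0 or more times)
P.S. Since you know how to do substitution, I didn't convert the matches to numbers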
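As a follow-up sketch, the matches can be converted to integers with one more step. Note that r1 requires a comma, which is why only five values appear in the output above. The variant `ANY_NUMBER` below is an assumption of mine, not part of the answer, and the other names and the four-line sample are likewise illustrative only.

```ruby
# Hypothetical subset of the question's data (two values with commas, two without).
NUMBER_SAMPLE = <<~EOS
  1. Enloe Medical Center - 2,000
  2. CSU Chico - 1,805
  6. Feather River Hospital - 865
  10. Colusa Casino Resort - 500
EOS

# The pattern from the answer: it requires a comma, so "865" and "500" are skipped.
COMMA_ONLY = /\d+\,\d+\s*$/m
# Assumed variant: any run of digits and commas at the end of a line.
ANY_NUMBER = /[\d,]+\s*$/

# Scans the text and converts each match to an Integer.
# String#to_i ignores the trailing whitespace that \s* may have captured.
def end_of_line_numbers(text, pattern)
  text.scan(pattern).map { |match| match.delete(",").to_i }
end

p end_of_line_numbers(NUMBER_SAMPLE, COMMA_ONLY) # => [2000, 1805]
p end_of_line_numbers(NUMBER_SAMPLE, ANY_NUMBER) # => [2000, 1805, 865, 500]
```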

r2 = /\d+\.\s*([\w\s]+)\s*\-/m 
str.scan(r2).flatten 

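\d+ - matches a digit, 1 or more times
\. - matches a literal . - you have to escape it, because an unescaped . matches any character
\s* - whitespace, 0 or more times
[\w\s]+ - any word character or whitespace, 1 or more times
() - grouping: an easy way to say "I want just the part enclosed here"; more here: regexp ruby - capturing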
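One caveat, shown as a rough sketch: [\w\s]+ does not match parentheses, so labels such as "YRC (Freight Services)" are skipped by r2, and the captured labels keep their trailing space. The alternative `ANY_LABEL` is my own assumption (a lazy .+? up to the " - " separator), and all names and the four-line sample are illustrative.

```ruby
# Hypothetical subset of the question's data, two labels with parentheses.
LABEL_SAMPLE = <<~EOS
  1. Enloe Medical Center - 2,000
  4. Pacific Coast Producers (Agribusiness) - 1,200
  8. YRC (Freight Services) - 500
  10. Colusa Casino Resort - 500
EOS

# The pattern from the answer: only word characters and whitespace in the label.
WORDS_ONLY = /\d+\.\s*([\w\s]+)\s*\-/m
# Assumed variant: capture lazily up to the " - " separator.
ANY_LABEL = /^\d+\.\s+(.+?)\s+-\s/

# Collects the first capture group of every match and trims surrounding whitespace.
def labels_for(text, pattern)
  text.scan(pattern).flatten.map(&:strip)
end

p labels_for(LABEL_SAMPLE, WORDS_ONLY)
# => ["Enloe Medical Center", "Colusa Casino Resort"]
p labels_for(LABEL_SAMPLE, ANY_LABEL)
# => ["Enloe Medical Center", "Pacific Coast Producers (Agribusiness)", "YRC (Freight Services)", "Colusa Casino Resort"]
```

The lazy variant assumes no label itself contains " - "; if that can happen, a greedy (.+) anchored to the end of the line, as in the top answer, is the safer choice.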

+0

Thanks a lot for the in-depth explanation. Very helpful. –
