2013-02-08

How do I use a regular expression on Nokogiri::XML::Text objects?

I have an array called people that contains these objects:

Nokogiri::XML::Text:0x3fe41985e69c "CEO, Company_1" 
Nokogiri::XML::Text:0x3fe4194dab74 "COO, Company_2 " 
Nokogiri::XML::Text:0x3fe4195eb414 "CFO, Company_3" 

I want to split each object's text on the comma, so I tried something like this:

companies = people.each do | company | 
    company.inner_text.match("/, (.*)/") 
end 

and:

occupations = people.each do | occupation | 
    occupation.inner_text.match("/(.*),/") 
end 

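match does not seem to extract the values I need from the objects. I checked the patterns on rubular.com and they should work, but I get back the same unsplit string, "CEO, Company_1", when it should be separated so that occupations = [CEO, COO, CFO] and companies = [Company_1, Company_2, Company_3].

How can I split these objects?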

What does 'company.inner_text' return for each company, and what does 'occupation.inner_text' return for each occupation?

Nokogiri::XML::Text

I mean a sample of the returned text. Is it 'CEO,XYZ'?

Answer

Why not just split the text?

require 'nokogiri' 

xml = '<x> 
<people>CEO, Company_1</people> 
<people>COO, Company_2</people> 
<people>CFO, Company_3</people> 
</x> 
' 

doc = Nokogiri::XML(xml)
people = doc.search('people')

# map (unlike each) returns the values produced by the block
companies = people.map do |company|
    company.text.split(',')
end

pp companies

# => [["CEO", " Company_1"], ["COO", " Company_2"], ["CFO", " Company_3"]]

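If you want to get rid of the leading space in front of the company name, use:

# split on the comma plus any whitespace that follows it
companies = people.map do |company|
    company.text.split(/,\s*/)
end
# => [["CEO", "Company_1"], ["COO", "Company_2"], ["CFO", "Company_3"]]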

Or:

# split on the comma, then strip leading whitespace from each piece
companies = people.map do |company|
    company.text.split(',').map(&:lstrip)
end
# => [["CEO", "Company_1"], ["COO", "Company_2"], ["CFO", "Company_3"]]

Or use map { |s| s.sub(/^\s+/, '') } instead of lstrip.
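The question asks for two separate arrays rather than an array of pairs. A minimal sketch of one way to get there, assuming every entry contains exactly one occupation and one company: split each string into a pair, then transpose the pairs. Plain strings stand in here for whatever node.text returns, and the names texts and pairs are only illustrative.

```ruby
# Plain strings stand in for the result of calling node.text on each node.
texts = ["CEO, Company_1", "COO, Company_2 ", "CFO, Company_3"]

# Strip stray whitespace, then split each string into [occupation, company].
pairs = texts.map { |text| text.strip.split(/,\s*/, 2) }

# transpose turns the list of pairs into one array per column.
occupations, companies = pairs.transpose

p occupations # => ["CEO", "COO", "CFO"]
p companies   # => ["Company_1", "Company_2", "Company_3"]
```

Note that transpose raises an IndexError if the rows differ in length, so an entry without a comma would need to be handled before this step.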

See also "How to avoid joining all text from Nodes when scraping".
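For reference, the attempt in the question appears to fail for two independent reasons: the pattern is passed as a String that includes the slashes, so the slashes become literal characters, and each returns the collection it was called on rather than the block results. The sketch below illustrates both points with plain strings standing in for the node text; the variable names are only illustrative.

```ruby
text = "CEO, Company_1"

# A String pattern is converted to a Regexp as-is, so the slashes
# are treated as literal characters and nothing matches.
p text.match("/, (.*)/")      # => nil

# A Regexp literal matches; capture group 1 holds the wanted part.
p text.match(/, (.*)/)[1]     # => "Company_1"
p text[/(.*),/, 1]            # => "CEO"

texts = ["CEO, Company_1", "COO, Company_2 ", "CFO, Company_3"]

# each discards the block results and returns its receiver ...
unchanged = texts.each { |t| t[/, (.*)/, 1] }
p unchanged.equal?(texts)     # => true

# ... while map collects the block results into a new array.
companies_by_match = texts.map { |t| t[/, (.*)/, 1].strip }
p companies_by_match          # => ["Company_1", "Company_2", "Company_3"]
```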

If the company name contains a comma, then 'split(/,\s*/, 2)' might also be useful. – Casper

Thanks, this works great.
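To illustrate the limit argument mentioned in the comment above, here is a small sketch using a made-up entry whose company name contains a comma. The sample string is hypothetical and does not come from the question's data.

```ruby
# Hypothetical entry whose company name itself contains a comma.
text = "CEO, Company_1, Inc."

# Without a limit, every comma is a split point.
p text.split(/,\s*/)     # => ["CEO", "Company_1", "Inc."]

# A limit of 2 stops after the first comma and keeps the rest intact.
p text.split(/,\s*/, 2)  # => ["CEO", "Company_1, Inc."]
```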
