Splitting a string at multiple points using Ruby on Rails

I have a string in my database that represents a user's notes. I want to split this string so that I can separate each note into its content, user, and date.

Here is the string format:

"Example Note <i>Josh Test 12:53 PM on 8/14/12</i><br><br> Another example note <i>John Doe 12:00 PM on 9/15/12</i><br><br> Last Example Note <i>Joe Smoe 1:00 AM on 10/12/12</i><br><br>"

I need to break it into the array

["Example Note", "Josh Test", "12:53 8/14/12", "Another example note", "John Doe", "12:00 PM 9/15/12", "Last Example Note", "Joe Smoe", "1:00 AM 10/12/12"]

I'm still working on this myself. Any ideas are very welcome. Thanks! :)

Source

2013-05-31 user1977840

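That isn't the format of the string, it's one example. How much variation is there? To ask it another way, what criteria are you using to split? –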

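No variation. Each note starts right away, then the content ends with '<i>', then the name always ends with a space followed by a number. The time and date are separated by 'on', and the whole note always ends with '</i><br><br>'. No variation. – user1977840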
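The rules in this comment (content up to `<i>`, name up to the space before the first digit, time and date separated by `on`, note closed by `</i><br><br>`) can be sketched as one `scan` pattern. This is only an illustration under those stated rules; the constant `NOTE_PATTERN` and the hash keys are made-up names, not something from the thread.

```ruby
# Hypothetical pattern name; it encodes the rules stated in the comment.
NOTE_PATTERN = %r{
  \s*(?<content>.*?)\s*   # note text, up to the opening tag
  <i>
  (?<user>.*?)\s          # name, up to the space before the first digit
  (?<time>\d.*?)\son\s    # time, up to the word "on"
  (?<date>\S+?)           # date, up to the closing tag
  </i><br><br>
}x

text = "Example Note <i>Josh Test 12:53 PM on 8/14/12</i><br><br> Another example note <i>John Doe 12:00 PM on 9/15/12</i><br><br> Last Example Note <i>Joe Smoe 1:00 AM on 10/12/12</i><br><br>"

# scan returns one array of captures per note, in group order.
notes = text.scan(NOTE_PATTERN).map do |content, user, time, date|
  { content: content, user: user, posted_at: "#{time} #{date}" }
end

puts notes.first
```

Each note becomes one hash, so the first one holds "Example Note", "Josh Test", and "12:53 PM 8/14/12". Calling `notes.flat_map(&:values)` would flatten this into the single array asked for in the question.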

A simpler way is to use a regular expression.

s = "Example Note <i>Josh Test 12:53 PM on 8/14/12</i><br><br> Another example note <i>John Doe 12:00 PM on 9/15/12</i><br><br> Last Example Note <i>Joe Smoe 1:00 AM on 10/12/12</i><br><br>"
s.split(/\s+<i>|<\/i><br><br>\s?|(?<!on) (?=\d)/)
=> ["Example Note", "Josh Test", "12:53 PM on 8/14/12", "Another example note", "John Doe", "12:00 PM on 9/15/12", "Last Example Note", "Joe Smoe", "1:00 AM on 10/12/12"]

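The datetime elements are left unformatted, but you can apply whatever formatting you need to them separately.

Edit: removed an unnecessary + character.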
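As one possible way to format the datetime elements afterwards, the flat array can be grouped into triples with `each_slice(3)` and each timestamp parsed with `DateTime.strptime`. This is a sketch, not part of the original answer; the hash keys and the format string are assumptions based on the sample data (12-hour clock, month/day/two-digit year).

```ruby
require 'date'

# First six elements of the flat array returned by the split call above.
parts = ["Example Note", "Josh Test", "12:53 PM on 8/14/12",
         "Another example note", "John Doe", "12:00 PM on 9/15/12"]

# Group every three elements into one note and parse its timestamp.
notes = parts.each_slice(3).map do |content, user, stamp|
  posted_at = DateTime.strptime(stamp, '%I:%M %p on %m/%d/%y')
  { content: content.strip, user: user, posted_at: posted_at }
end

puts notes.first[:posted_at].strftime('%Y-%m-%d %H:%M') # prints "2012-08-14 12:53"
```

Once parsed, `strftime` can produce any display format, including the "12:00 PM 9/15/12" style from the question.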

Source

2013-05-31 20:24:49 depa

This is exactly what I was looking for, thank you. I'm terrible with Regexp. I'll definitely have to study it. – user1977840
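For anyone studying the pattern from the answer above, here is the same regular expression written in Ruby's extended (`x`) mode with one comment per alternative. The constant name `SPLIT_PATTERN` is just an illustrative choice. The literal space is written as `[ ]` because extended mode ignores bare whitespace.

```ruby
# Same three alternatives as the one-line pattern, spread out for reading.
SPLIT_PATTERN = %r{
    \s+<i>              # whitespace and the opening tag that follow the note text
  | </i><br><br>\s?     # the closing tag, both line breaks, and an optional space
  | (?<!on)[ ](?=\d)    # a space before a digit, unless the word "on" precedes it
}x

s = "Example Note <i>Josh Test 12:53 PM on 8/14/12</i><br><br> Another example note <i>John Doe 12:00 PM on 9/15/12</i><br><br> Last Example Note <i>Joe Smoe 1:00 AM on 10/12/12</i><br><br>"

parts = s.split(SPLIT_PATTERN)
puts parts.inspect
```

The third alternative is what separates the name from the time. The negative lookbehind keeps "on 8/14/12" attached to the time, because the space before the date is preceded by "on".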

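You can use Nokogiri to parse out the text you need with XPath/CSS selectors. As a simple, bare-bones example to get you started, the following maps each i tag to a new element in an array: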

require 'nokogiri'

html = Nokogiri::HTML("Example Note <i>Josh Test 12:53 PM on 8/14/12</i><br><br> Another example note <i>John Doe 12:00 PM on 9/15/12</i><br><br> Last Example Note <i>Joe Smoe 1:00 AM on 10/12/12</i><br><br>")

# Collect the text inside every <i> element.
my_array = html.css('i').map { |node| node.content }
#=> ["Josh Test 12:53 PM on 8/14/12", "John Doe 12:00 PM on 9/15/12", "Joe Smoe 1:00 AM on 10/12/12"]

With CSS selectors, you can easily do things like this:

require 'nokogiri'

html = Nokogiri::HTML("<h1>My Message</h1><p>Hi today's date is: <time>Friday, May 31st</time></p>")
message_header = html.css('h1').first.content #=> "My Message"
# content includes the text of child elements, so the <time> text is part of it
message_body = html.css('p').first.content #=> "Hi today's date is: Friday, May 31st"
message_sent_at = html.css('p > time').first.content #=> "Friday, May 31st"

Source

2013-05-31 19:37:27 Noz

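Are you saying the HTML tags need to be there already? I can't edit the data in the database. It will always be in the form I showed above, because unfortunately that's how it was saved for 100,000 users. I'm trying to fix someone else's mistake. – user1977840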

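@user1977840 This is just an example to get you started. As long as the HTML data in the database is structured with some common pattern (for example, the date and name data always come after tag X and before tag Y), you can tailor the Nokogiri selectors as needed to select and parse the relevant parts of the data. If the HTML is malformed, then you might be better off using XSS selectors. – Noz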

Maybe this will be useful:

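require 'date'

text = "Example Note <i>Josh Test 12:53 PM on 8/14/12</i><br><br> Another example note <i>John Doe 12:00 PM on 9/15/12</i><br><br> Last Example Note <i>Joe Smoe 1:00 AM on 10/12/12</i><br><br>"

# Every note ends with two <br> tags, so split on those first.
notes = text.split('<br><br>')

pro_notes = []

notes.each do |note_e|
    # Left of <i> is the note text; right of it is "name time AM/PM on date</i>".
    notes_temp = note_e.split('<i>')
    words = notes_temp[1].split(' ')

    # Drop the closing tag from the date, e.g. "8/14/12</i>" -> "8/14/12".
    date = words[5].gsub('</i>', '')

    # Assumes a two-word name, as in the sample data.
    full_name = words[0] + ' ' + words[1]
    nn = notes_temp[0].strip
    # Include AM/PM (words[3]); without it, afternoon times parse as morning.
    dt = DateTime.strptime(date + ' ' + words[2] + ' ' + words[3], '%m/%d/%y %I:%M %p')

    pro_notes << [full_name, nn, dt]
end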

Source

2013-05-31 19:59:18 ajt

Perfect. I added a strip in there to get rid of the whitespace and it works. Thanks a lot! :) – user1977840
