2016-09-19 48 views
-1

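How to find an expression in a text file and process all lines until the next occurrence of that expression, repeating until the end of the file

I have a text file: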

Some comment on the 1st line of the file. 

processing date:   31.8.2016 
amount:     -1.23 
currency:    EUR 
balance:     1234.56 
payer reference:   /VS123456/SS0011223344/KS1212 
type of the transaction: Some type of the transaction 1 
additional info:   Amount: 1.23 EUR 29.08.2016 Place: 123456789XY 



processing date:   30.8.2016 
amount:     -2.23 
currency:    EUR 
balance:     12345.56 
payer reference:   /VS123456/SS0011223344/KS1212 
type of the transaction: Some type of the transaction 2 
additional info:   Amount: 2.23 EUR 28.08.2016 Place: 123456789XY 



processing date:   29.8.2016 
amount:     -3.23 
currency:    EUR 
balance:     123456.56 
payer reference:   /VS123456/SS0011223344/KS1212 
type of the transaction: Some type of the transaction 2 
additional info:   Amount: 2.23 EUR 27.08.2016 Place: 123456789XY 

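I need to process the file so that the values on the right-hand side (31.8.2016, -1.23, EUR, 1234.56, etc.) are stored in a MySQL database.

So far I have only managed to return either the first line that contains a specific string, or all such lines, using find or find_all. That is not enough: I somehow need to identify each block that starts with "processing date:" and ends with "additional info:", process the values in it, then move on to the next block, and so on until the end of the file.

Any hints on how to achieve this?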

+0

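It can be done in many ways, but the simplest is to read the whole file as a string and then call '.split(/^processing date /)'. You will get a list of segments, each starting with the date and ending with the blank lines that precede the next item. It is simple, but it may fail if your file is huge, for example gigabytes. – quetzalcoatl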
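A minimal sketch of the approach described in the comment above, for files small enough to read into memory. The sample `text` is made-up data, and the pattern is written as a lookahead on `processing date:` (with the colon, as it appears in the file) so that each segment keeps its first line. Both are assumptions, not the commenter's exact code.

```ruby
# Made-up sample data standing in for File.read('data.txt').
text = <<~DATA
  Some comment on the 1st line of the file.

  processing date:   31.8.2016
  amount:     -1.23


  processing date:   30.8.2016
  amount:     -2.23
DATA

# Split at the start of every line that begins with "processing date:".
# The lookahead (?=...) is zero-width, so the marker stays in its segment.
segments = text.split(/^(?=processing date:)/)

# Drop the leading comment and trim the blank lines after each block.
blocks = segments.select { |s| s.start_with?('processing date:') }.map(&:strip)

blocks.each { |b| puts "'#{b}'" }
```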

+0

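Your question is premature. You need to try first, and then, when you run into a problem, write a detailed question about that specific problem. Please read "[ask]" and the linked pages, and "[mcve]". Also, "[How much research effort is expected of Stack Overflow users?](http://meta.stackoverflow.com/a/261593/128421)" will help you understand what we expect. –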

Answers

1

I'd start with this:

File.foreach('data.txt', "\n\n") do |li| 
    next unless li[/^processing/] 
    puts "'#{li.strip}'" 
end 

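If "data.txt" contains your content, foreach reads the file and passes paragraphs of text, rather than single lines, to the block as li. Once you have those, you can manipulate them as needed. This is fast and memory-efficient, and it avoids the scalability problems that readlines, or any approach that reads the whole file into memory at once, can have.

This is the output: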

'processing date:   31.8.2016 
amount:     -1.23 
currency:    EUR 
balance:     1234.56 
payer reference:   /VS123456/SS0011223344/KS1212 
type of the transaction: Some type of the transaction 1 
additional info:   Amount: 1.23 EUR 29.08.2016 Place: 123456789XY' 
'processing date:   30.8.2016 
amount:     -2.23 
currency:    EUR 
balance:     12345.56 
payer reference:   /VS123456/SS0011223344/KS1212 
type of the transaction: Some type of the transaction 2 
additional info:   Amount: 2.23 EUR 28.08.2016 Place: 123456789XY' 
'processing date:   29.8.2016 
amount:     -3.23 
currency:    EUR 
balance:     123456.56 
payer reference:   /VS123456/SS0011223344/KS1212 
type of the transaction: Some type of the transaction 2 
additional info:   Amount: 2.23 EUR 27.08.2016 Place: 123456789XY' 

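The wrapping ' characters show that the file is read in chunks, or paragraphs, delimited by "\n\n", and that each chunk is then stripped of its trailing whitespace.

See the foreach documentation for more information.

split(':', 2) is your friend: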
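The sample file separates blocks with three blank lines, so a "\n\n" separator also yields chunks that contain nothing but newlines; the `next unless` guard skips them. As a hedged alternative, Ruby's line-reading methods treat an empty-string separator as paragraph mode, where a run of blank lines counts as one break. The sketch below uses `StringIO` with made-up data in place of the file so that it is self-contained; `File.foreach('data.txt', '')` should behave the same way.

```ruby
require 'stringio'

# Made-up sample: blocks separated by one or three blank lines.
sample = "Some comment.\n\n" \
         "processing date: 31.8.2016\namount: -1.23\n\n\n\n" \
         "processing date: 30.8.2016\namount: -2.23\n"

# An empty separator selects paragraph mode: consecutive blank lines are
# treated as a single paragraph break.
paragraphs = StringIO.new(sample).each_line('').map(&:strip)

# Keep only the transaction blocks, dropping the leading comment.
records = paragraphs.select { |para| para.start_with?('processing date:') }
records.each { |para| puts "'#{para}'" }
```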

'processing date:   31.8.2016'.split(':', 2) # => ["processing date", "   31.8.2016"] 
'amount:     -1.23'.split(':', 2) # => ["amount", "     -1.23"] 
'currency:    EUR'.split(':', 2) # => ["currency", "    EUR"] 
'balance:     1234.56'.split(':', 2) # => ["balance", "     1234.56"] 
'payer reference:   /VS123456/SS0011223344/KS1212'.split(':', 2) # => ["payer reference", "   /VS123456/SS0011223344/KS1212"] 
'type of the transaction: Some type of the transaction 1'.split(':', 2) # => ["type of the transaction", " Some type of the transaction 1"] 
'additional info:   Amount: 1.23 EUR 29.08.2016 Place: 123456789XY'.split(':', 2) # => ["additional info", "   Amount: 1.23 EUR 29.08.2016 Place: 123456789XY"] 

From that, you can do:

text = 'processing date:   31.8.2016 
amount:     -1.23 
currency:    EUR 
balance:     1234.56 
payer reference:   /VS123456/SS0011223344/KS1212 
type of the transaction: Some type of the transaction 1 
additional info:   Amount: 1.23 EUR 29.08.2016 Place: 123456789XY' 

text.lines.map{ |li| li.split(':', 2).map(&:strip) }.to_h 
# => {"processing date"=>"31.8.2016", "amount"=>"-1.23", "currency"=>"EUR", "balance"=>"1234.56", "payer reference"=>"/VS123456/SS0011223344/KS1212", "type of the transaction"=>"Some type of the transaction 1", "additional info"=>"Amount: 1.23 EUR 29.08.2016 Place: 123456789XY"} 

There are many ways to continue parsing the information into more usable data, but that's for you to figure out.
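As one possible next step, here is a sketch that combines the paragraph reading and split(':', 2) steps above and converts a few fields to typed values before they go to the database. The helper names `parse_block` and `each_transaction` and the sample data are hypothetical, and the date format `%d.%m.%Y` is inferred from the sample file. The MySQL insert itself is left out.

```ruby
require 'date'
require 'bigdecimal'
require 'stringio'

# Hypothetical helper: one paragraph of "key: value" lines -> Hash.
def parse_block(block)
  block.lines
       .select { |li| li.include?(':') } # ignore lines without a key
       .map { |li| li.split(':', 2).map(&:strip) }
       .to_h
end

# Hypothetical helper: yield one Hash per transaction block read from io.
def each_transaction(io)
  return enum_for(:each_transaction, io) unless block_given?

  io.each_line("\n\n") do |para|
    next unless para[/^processing date:/]

    row = parse_block(para.strip)
    # Convert the fields that are not plain text.
    row['processing date'] = Date.strptime(row['processing date'], '%d.%m.%Y')
    row['amount']  = BigDecimal(row['amount'])
    row['balance'] = BigDecimal(row['balance'])
    yield row
  end
end

# Made-up sample standing in for File.open('data.txt').
sample = StringIO.new(<<~DATA)
  Some comment on the 1st line of the file.

  processing date:   31.8.2016
  amount:     -1.23
  currency:    EUR
  balance:     1234.56
  additional info:   Amount: 1.23 EUR 29.08.2016 Place: 123456789XY


  processing date:   30.8.2016
  amount:     -2.23
  currency:    EUR
  balance:     12345.56
DATA

rows = each_transaction(sample).to_a
rows.each do |r|
  puts [r['processing date'].iso8601, r['amount'].to_s('F'), r['currency']].join(' ')
end
```

With a real file, `File.open('data.txt') { |f| each_transaction(f) { |row| ... } }` would stream one block at a time, so each row could be handed to a database insert without loading the whole file.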

+0

Thanks! I'll check this out. – stacky33

+0

Thanks for the hint! It helped me. – stacky33
