2016-09-23 58 views
3

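Scala: parsing messages into individual fields

In Scala, I want to parse each fixed-length message (length = 20) as a single unit. Each message is appended directly to the end of the previous one, with no newline in between. I tried the following, but any optimization or performance improvement is welcome.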

/* Field lengths: id=3, name=5, city=8, port=3, indicator=1 (20 characters in total) */

def layout(rec: String) = {
  val id   = rec.take(3)
  val name = rec.drop(3).take(5)
  val city = rec.drop(3 + 5).take(8)
  val port = rec.drop(3 + 5 + 8).take(3)
  val ind  = rec.drop(3 + 5 + 8 + 3).take(1)
  println((id, name, city, port, ind))
}

// Each record is padded with spaces to exactly 20 characters.
val messages = "101Jim  Portland990Y102JamesHouston 990X103John Boston  880Y"
messages grouped(20) foreach { x => layout(x) }


In REPL, 

scala> :load work.scala 
Loading work.scala... 
layout: (rec: String)Unit 
messages: String = 101Jim  Portland990Y102JamesHouston 990X103John Boston  880Y
(101,Jim  ,Portland,990,Y)
(102,James,Houston ,990,X)
(103,John ,Boston  ,880,Y)

scala> 

Answer

5

You can do this quite nicely with a regular expression:

val messages = "101Jim  Portland990Y102JamesHouston 990X103John Boston  880Y"

val RecordPattern = """(\d{3})(.{5})(.{8})(\d{3})(.)""".r 

val records = messages.grouped(20).map { 
    case RecordPattern(id, name, city, port, ind) => (id, name, city, port, ind) 
} 

And then:

scala> records.foreach(println) 
(101,Jim  ,Portland,990,Y)
(102,James,Houston ,990,X)
(103,John ,Boston  ,880,Y)
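
One caveat worth noting: `map` with a single `case` throws a `MatchError` as soon as a chunk does not match the pattern, for example a truncated final record. A minimal sketch of a more forgiving variant, assuming it is acceptable to skip such chunks silently, uses `collect` instead. The names `SafeRecords` and `parseAll` are hypothetical and only serve the illustration.

```scala
import scala.util.matching.Regex

object SafeRecords {
  // Same pattern as above: id (3 digits), name (5), city (8), port (3 digits), indicator (1).
  val RecordPattern: Regex = """(\d{3})(.{5})(.{8})(\d{3})(.)""".r

  // Hypothetical helper: keeps the chunks that match and drops the rest
  // instead of throwing a MatchError.
  def parseAll(messages: String): List[(String, String, String, String, String)] =
    messages.grouped(20).collect {
      case RecordPattern(id, name, city, port, ind) => (id, name, city, port, ind)
    }.toList
}
```

If dropping bad input silently is not what you want, the same pattern match could return an `Either` or log the offending chunk instead.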

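This is also likely to perform a bit better than splitting the string with collection operations like drop and take, but the difference is probably small, and the main advantage is clarity.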
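
For comparison, here is a rough sketch of a third option that slices each record with `substring` at precomputed offsets, so each field is a single slice rather than a `drop` followed by a `take`. Whether it is measurably faster than either approach would need a benchmark. The `FixedWidth` object, the `Record` case class and the helper names are assumptions made for this example, not part of the original code.

```scala
object FixedWidth {
  // Hypothetical record type; field names follow the layout comment in the question.
  final case class Record(id: String, name: String, city: String, port: String, ind: String)

  // Field widths from the question: id=3, name=5, city=8, port=3, indicator=1.
  val widths: List[Int] = List(3, 5, 8, 3, 1)
  // Running offsets 0, 3, 8, 16, 19, 20; the last one is the record length.
  val offsets: List[Int] = widths.scanLeft(0)(_ + _)
  val recordLength: Int = offsets.last

  // Returns None for a chunk of the wrong length instead of throwing.
  def parse(rec: String): Option[Record] =
    if (rec.length != recordLength) None
    else {
      // Pair each start offset with the next one and slice the field between them.
      val f = offsets.zip(offsets.tail).map { case (from, to) => rec.substring(from, to) }
      Some(Record(f(0), f(1), f(2), f(3), f(4)))
    }

  // Splits the input into fixed-length chunks and keeps the ones that parse.
  def parseAll(messages: String): List[Record] =
    messages.grouped(recordLength).flatMap(chunk => parse(chunk).toList).toList
}
```

Unlike the regular expression, this version does not check that `id` and `port` are digits; it only checks the record length.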

+0

@Travis, thanks for the solution. Suppose the message is sent in EBCDIC format, i.e. it comes from a mainframe system. How can I read it as EBCDIC and convert it to ASCII? – stack0114106

+0

@stack0114106 That should be a separate question. :) –