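Zipping non-sequential lists based on a condition

2017-08-15

I have a scenario where I need to zip two lists in Scala based on a condition. The items may not be in order. What is the best way to do this?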

I want to group each DirectRetailOffsetInvoice with the DirectRetailCM that has the same requestId into a tuple.

object Main extends App { 
    case class SalesDoc(val id: Int, val name: String, val requestId: String) {} 
    val list = List(
    SalesDoc(1, "ILLEGAL", "1"), 
    SalesDoc(2, "DirectRetailCM", "1"), 

    SalesDoc(3, "DirectRetailOffsetInvoice", "2"), 
    SalesDoc(4, "DirectRetailCM", "2"), 
    SalesDoc(5, "OTHER", "2"), 

    SalesDoc(5, "DirectRetailCM", "LEFTOUT"), 
    SalesDoc(6, "ILLEGAL2", "4"), 

    SalesDoc(5, "OTHER", "3"), 
    SalesDoc(7, "DirectRetailOffsetInvoice", "4"), 
    SalesDoc(8, "DirectRetailCM", "4") 
) 

// I expect zip results of drOffsetInvoice and drCms as 
List(
    (SalesDoc(3, "DirectRetailOffsetInvoice", "2"), SalesDoc(4, "DirectRetailCM", "2")), 
    (SalesDoc(7, "DirectRetailOffsetInvoice", "4"), SalesDoc(8, "DirectRetailCM", "4")) 
) 
} 

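The initial approach I can think of is:

  • Group the DirectRetailCM items: list.filter(e => e.name == "DirectRetailCM")
  • Group the DirectRetailOffsetInvoice items: list.filter(e => e.name == "DirectRetailOffsetInvoice")
  • Zip the two, but they may not be in the same order
  • There may be rows with no counterpart

Can you suggest other approaches I should consider?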

Answers

1
// You don't need the val keyword for a case class 
case class SalesDoc(id: Int, name: String, requestId: String) 

val list = List(
    SalesDoc(1, "ILLEGAL", "1"), 
    SalesDoc(2, "DirectRetailCM", "1"), 

    SalesDoc(3, "DirectRetailOffsetInvoice", "2"), 
    SalesDoc(4, "DirectRetailCM", "2"), 
    SalesDoc(5, "OTHER", "2"), 

    SalesDoc(5, "DirectRetailCM", "LEFTOUT"), 
    SalesDoc(6, "ILLEGAL2", "4"), 

    SalesDoc(5, "OTHER", "3"), 
    SalesDoc(7, "DirectRetailOffsetInvoice", "4"), 
    SalesDoc(8, "DirectRetailCM", "4") 
) 

// Find all of the DirectRetailOffsetInvoice items 
val offsets = list.filter(_.name == "DirectRetailOffsetInvoice") 

// Map over all of the DirectRetailOffsetInvoice items and see if there is matching DirectRetailCM item 
val maybeMatched = offsets.map(offset => { 
    val maybeCm = list.find(i => i.requestId == offset.requestId && i.name == "DirectRetailCM") 

    // Return a tuple of type (SalesDoc, Option[SalesDoc]) 
    (offset, maybeCm) 
}) 

// Map over the tuples and only take the ones where there was a match, and extract it from the Option to create a tuple of (SalesDoc, SalesDoc) 
val output = maybeMatched.collect { case (s1, Some(s2)) => (s1, s2) } 

output.foreach(println) 
// (SalesDoc(3,DirectRetailOffsetInvoice,2),SalesDoc(4,DirectRetailCM,2)) 
// (SalesDoc(7,DirectRetailOffsetInvoice,4),SalesDoc(8,DirectRetailCM,4)) 
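Thanks Tyler. I would probably add one more filter for DirectRetailCM so that the whole list is not scanned for every DirectRetailOffsetInvoice.

If you have many more rows than your example suggests, you may want to build a Map so that lookups are faster. – Tyler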
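The Map-based lookup suggested in the comment above could look roughly like the following. This is only a sketch: the names cmByRequestId and output are made up for illustration, and it assumes there is at most one DirectRetailCM per requestId.

```scala
case class SalesDoc(id: Int, name: String, requestId: String)

val list = List(
  SalesDoc(3, "DirectRetailOffsetInvoice", "2"),
  SalesDoc(4, "DirectRetailCM", "2"),
  SalesDoc(5, "DirectRetailCM", "LEFTOUT"),
  SalesDoc(7, "DirectRetailOffsetInvoice", "4"),
  SalesDoc(8, "DirectRetailCM", "4")
)

// Index the DirectRetailCM rows by requestId once.
// Assumption: at most one DirectRetailCM per requestId; with duplicates, the last one wins.
val cmByRequestId: Map[String, SalesDoc] =
  list.filter(_.name == "DirectRetailCM").map(cm => cm.requestId -> cm).toMap

// Look up each DirectRetailOffsetInvoice in the Map; invoices without a match are dropped.
val output: List[(SalesDoc, SalesDoc)] =
  list
    .filter(_.name == "DirectRetailOffsetInvoice")
    .flatMap(offset => cmByRequestId.get(offset.requestId).map(cm => (offset, cm)))
```

Each lookup is then a hash lookup instead of a scan of the whole list with find, which should matter only once the list gets large.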

1
list.filter(s => s.name == "DirectRetailCM" || s.name == "DirectRetailOffsetInvoice") 
    .groupBy(_.requestId) 
    .collect { case (_, List(a, b)) => (a, b) } 
    .toList 

// List[(SalesDoc, SalesDoc)] 
1

You can implement this with standard Scala combinators:

list
    .filter(sd => sd.name == "DirectRetailCM" || sd.name == "DirectRetailOffsetInvoice")
    .groupBy(_.requestId)
    .flatMap {
        case (_, List(a, b)) => List(a -> b)
        case _ => List.empty
    }

This gives you:

res3: scala.collection.immutable.Map[SalesDoc,SalesDoc] = 
     Map(
     SalesDoc(3,DirectRetailOffsetInvoice,2) -> SalesDoc(4,DirectRetailCM,2), 
     SalesDoc(7,DirectRetailOffsetInvoice,4) -> SalesDoc(8,DirectRetailCM,4)) 

If the input sequence is not ordered with DirectRetailOffsetInvoice before DirectRetailCM, you will need to handle that.
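One possible way to handle that ordering issue, as a sketch rather than part of the original answer, is to pick each side of the pair by name inside every group instead of relying on position. The names unordered and pairs are hypothetical, and the sketch assumes at most one document of each kind per requestId (find takes the first one it sees).

```scala
case class SalesDoc(id: Int, name: String, requestId: String)

// Sample input where the DirectRetailCM appears before its DirectRetailOffsetInvoice.
val unordered = List(
  SalesDoc(4, "DirectRetailCM", "2"),
  SalesDoc(3, "DirectRetailOffsetInvoice", "2"),
  SalesDoc(5, "DirectRetailCM", "LEFTOUT")
)

val pairs: List[(SalesDoc, SalesDoc)] =
  unordered
    .groupBy(_.requestId)
    .values
    .flatMap { group =>
      // Select each side by name, so the order within the group does not matter.
      // Groups missing either side yield None and are dropped.
      for {
        offset <- group.find(_.name == "DirectRetailOffsetInvoice")
        cm     <- group.find(_.name == "DirectRetailCM")
      } yield (offset, cm)
    }
    .toList
```

Note that groupBy returns a Map, so the order of the resulting pairs is not guaranteed to follow the input order.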