Filtering a DataFrame in Scala

Suppose I have a DataFrame created from a text file using a case class schema. Below is the data stored in the DataFrame.

id - type - qt - p

1, X, 10, 100.0

2, Y, 20, 200.0

1, Y, 15, 150.0

1, X, 5, 120.0

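I need to filter the DataFrame by "id" and "type", and then, for each "id", iterate through the DataFrame to do some calculations. I tried it this way, but it did not work. Code snippet: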

    // `type` is a reserved word in Scala, so it needs backticks as a field name.
    case class MyClass(id: Int, `type`: String, qt: Long, PRICE: Double)

    val df = sc.textFile("xyz.txt")
      .map(_.split(","))
      .map(p => MyClass(p(0).trim.toInt, p(1), p(2).trim.toLong, p(3).trim.toDouble))
      .toDF().cache()

    val productList: List[Int] = df.map { row => row.getInt(0) }.distinct.collect.toList

    // The attempt that does not work: df holds Row objects, so filter
    // cannot take a MyClass => Boolean function here.
    val xList: List[RDD[MyClass]] = productList.map {
      productId => df.filter({ item: MyClass => (item.id == productId) && (item.`type` == "X") })
    }.toList
    val yList: List[RDD[MyClass]] = productList.map {
      productId => df.filter({ item: MyClass => (item.id == productId) && (item.`type` == "Y") })
    }.toList


2016-09-06 Advika

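Taking the idea of collecting the distinct IDs from your example, simply loop over all the IDs and filter the DataFrame by the current ID. After that, you have a DataFrame containing only the relevant data: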

val df3 = sc.textFile("src/main/resources/importantStuff.txt") //Your data here 
    .map(_.split(",")) 
    .map(p => MyClass(p(0).trim.toInt, p(1), p(2).trim.toLong, p(3).trim.toDouble)).toDF().cache() 

val productList: List[Int] = df3.map{row => row.getInt(0)}.distinct.collect.toList 

println(productList) 

productList.foreach(id => { 
    val sqlDF = df3.filter(df3("id") === id) 
    sqlDF.show() 
})

Inside the loop, sqlDF is the DataFrame holding the relevant data; you can then run your calculations on it.
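
Since the question asks for filtering by both "id" and "type", here is a minimal sketch of how the same column-based filter could be extended to two conditions. It assumes Spark 2.x or later with a SparkSession; the names Record, FilterSketch, sample and subset are made up for illustration, the "type" field is renamed to kind to avoid the Scala reserved word, and the rows are the sample data from the question, inlined rather than read from a file.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical schema: `kind` stands in for the question's "type" column.
case class Record(id: Int, kind: String, qt: Long, price: Double)

object FilterSketch {
  // Builds a small DataFrame from the sample rows given in the question.
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      Record(1, "X", 10L, 100.0),
      Record(2, "Y", 20L, 200.0),
      Record(1, "Y", 15L, 150.0),
      Record(1, "X", 5L, 120.0)
    ).toDF()
  }

  // Column expressions: === compares values, && combines the two conditions.
  def subset(df: DataFrame, id: Int, kind: String): DataFrame =
    df.filter((col("id") === id) && (col("kind") === kind))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("filter-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df = sample(spark).cache()

    // Collect the distinct ids to the driver, as in the answer above.
    val ids: List[Int] = df.select("id").distinct().as[Int].collect().toList

    ids.foreach { id =>
      subset(df, id, "X").show() // rows of type X for this id
      subset(df, id, "Y").show() // rows of type Y for this id
    }

    spark.stop()
  }
}
```

Each call to subset launches its own Spark job, so this pattern is reasonable for a handful of ids. If the per-id calculation is really an aggregate, a groupBy over "id" and "kind" would probably scale better than looping on the driver.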


2016-09-06 11:14:17

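Thanks, that works. I need one more bit of help, as described here: sqlDF.foreach(row => { caluculation(row); row.getLong(2) = row.getLong(2) - x }). After the calculation for each row, I need to update only row.getLong(2) [a single value, not the whole column] and keep the rest of the DataFrame as it is. Can you suggest how to do this? – Advika

@Advika I suggest you open a new thread for this question. First, that will make it easier to read and will expose it to more people. Also, since the answer solved the problem, please mark it as accepted. –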
