2016-11-29

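How to remove empty DataFrames from a sequence of DataFrames in Scala

How do I remove empty DataFrames from a sequence of DataFrames? In the code snippet below, `twoColDF` contains many empty DataFrames. I also have a question about the for loop below: is there a way to make it more efficient? I tried rewriting it as the line below, but it didn't work: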

//finalDF2 = (1 until colCount).flatMap(j => groupCount(j).map(y=> finalDF.map(a=>a.filter(df(cols(j)) === y)))).toSeq.flatten 

var twoColDF: Seq[Seq[DataFrame]] = null
if (colCount == 2) {
  val i = 0
  for (j <- i + 1 until colCount) {
    twoColDF = groupCount(j).map(y => {
      finalDF.map(x => x.filter(df(cols(j)) === y))
    })
  }
}
finalDF = twoColDF.flatten

Your question is hard to understand: what is the type of `finalDF`? What is `groupCount`?

Answer


Given a sequence of DataFrames, you can access each DataFrame's underlying RDD and use `isEmpty` to filter out the empty ones:

import org.apache.spark.sql.DataFrame

val input: Seq[DataFrame] = ???
val result = input.filter(!_.rdd.isEmpty())
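The question's `twoColDF` is nested (`Seq[Seq[DataFrame]]`), so the filter would be applied after flattening, along the lines of `twoColDF.flatten.filter(!_.rdd.isEmpty())`. Here is a minimal Spark-free sketch of that shape; `StubFrame` is a hypothetical stand-in for `DataFrame` that keeps its rows in memory, and its `isEmpty` plays the role of `rdd.isEmpty()`, so this only illustrates the collection logic, not Spark's execution.

```scala
// Spark-free sketch: StubFrame is a hypothetical stand-in for DataFrame,
// holding its rows in memory so emptiness is a plain collection check.
final case class StubFrame(name: String, rows: Seq[Seq[String]]) {
  // Plays the role of df.rdd.isEmpty() in the Spark version.
  def isEmpty: Boolean = rows.isEmpty
}

object FlattenAndDropEmpty {
  // Same nesting as the question's twoColDF: Seq[Seq[DataFrame]].
  val twoColDF: Seq[Seq[StubFrame]] = Seq(
    Seq(StubFrame("a=x", Seq(Seq("x", "1"))), StubFrame("a=y", Seq.empty)),
    Seq(StubFrame("b=1", Seq.empty), StubFrame("b=2", Seq(Seq("y", "2"), Seq("z", "2"))))
  )

  // Flatten the nesting first, then keep only the non-empty frames.
  val result: Seq[StubFrame] = twoColDF.flatten.filter(!_.isEmpty)

  def main(args: Array[String]): Unit =
    println(result.map(_.name)) // List(a=x, b=2)
}
```

With real DataFrames, each `rdd.isEmpty()` call launches a small Spark job, so checking a long sequence is not free.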

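As for your other question: I don't understand what your code is trying to do, but I would first try to convert it into a more functional style (removing the use of `var` and imperative conditionals). If I'm guessing the meaning of your input correctly, here's something that might be equivalent to what you're trying to do: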

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

val input: Seq[DataFrame] = ???

// map of column index to column values - 
// for each combination we'd want a new DF where that column has that value 
// I'm assuming values are Strings, can be anything else 
val groupCount: Map[Int, Seq[String]] = ??? 

// for each combination of DF + column + value - produce the filtered DF where this column has this value 
val perValue: Seq[DataFrame] = for { 
    df <- input 
    index <- groupCount.keySet 
    value <- groupCount(index) 
} yield df.filter(col(df.columns(index)) === value) 

// remove empty results: 
val result: Seq[DataFrame] = perValue.filter(!_.rdd.isEmpty())
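A for-comprehension like this is compiled into nested `flatMap`/`map` calls, which is also the shape of the one-liner attempted in the question. The sketch below checks that equivalence on plain Scala collections. `Frame` and its `filterEq` method are hypothetical stand-ins for `DataFrame` and `df.filter(col(name) === value)`, and the keys of `groupCount` are sorted only to make the output order deterministic in this example.

```scala
// Hypothetical stand-in for DataFrame: column names plus in-memory rows.
final case class Frame(columns: Seq[String], rows: Seq[Map[String, String]]) {
  // Plays the role of df.filter(col(name) === value).
  def filterEq(name: String, value: String): Frame =
    copy(rows = rows.filter(_.get(name).contains(value)))

  // Plays the role of df.rdd.isEmpty().
  def isEmpty: Boolean = rows.isEmpty
}

object ForComprehensionSketch {
  val input: Seq[Frame] = Seq(
    Frame(Seq("a", "b"), Seq(Map("a" -> "x", "b" -> "1"), Map("a" -> "y", "b" -> "2")))
  )

  // Column index -> values to split on, as in the answer's groupCount.
  val groupCount: Map[Int, Seq[String]] = Map(0 -> Seq("x", "z"), 1 -> Seq("1", "2"))

  // Sorted keys keep the output order deterministic for this sketch.
  val indices: Seq[Int] = groupCount.keys.toSeq.sorted

  // for-comprehension form
  val viaFor: Seq[Frame] = for {
    df    <- input
    index <- indices
    value <- groupCount(index)
  } yield df.filterEq(df.columns(index), value)

  // The equivalent nested flatMap/map calls the compiler produces.
  val viaFlatMap: Seq[Frame] =
    input.flatMap(df =>
      indices.flatMap(index =>
        groupCount(index).map(value => df.filterEq(df.columns(index), value))))

  // Drop the empty results (here: the frame for a == "z").
  val result: Seq[Frame] = viaFor.filterNot(_.isEmpty)

  def main(args: Array[String]): Unit = {
    println(viaFor == viaFlatMap) // true
    println(result.size)          // 3
  }
}
```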