6
我正在使用spark 1.6.1,並且我有這樣一個數據框。如何僅將「立方體」用於火花數據框上的特定字段?
+-------------+-----------+-----------------+-------+-------+-------+----------+-------+-------+-------+-------+
| scene_id| action_id| classifier|os_name|country|app_ver| p0value|p1value|p2value|p3value|p4value|
+-------------+-----------+-----------------+-------+-------+-------+----------+-------+-------+-------+-------+
| test_home|scene_enter| test_home|android| KR| 5.6.3|__OTHERS__| false| test| test| test|
......
而且我想通過使用多維數據集操作獲得如下數據框。
(所有領域分組,但只有「OS_NAME」,「國家」,「app_ver」字段立方)
+-------------+-----------+-----------------+-------+-------+-------+----------+-------+-------+-------+-------+---+
| scene_id| action_id| classifier|os_name|country|app_ver| p0value|p1value|p2value|p3value|p4value|cnt|
+-------------+-----------+-----------------+-------+-------+-------+----------+-------+-------+-------+-------+---+
| test_home|scene_enter| test_home|android| KR| 5.6.3|__OTHERS__| false| test| test| test| 9|
| test_home|scene_enter| test_home| null| KR| 5.6.3|__OTHERS__| false| test| test| test| 35|
| test_home|scene_enter| test_home|android| null| 5.6.3|__OTHERS__| false| test| test| test| 98|
| test_home|scene_enter| test_home|android| KR| null|__OTHERS__| false| test| test| test|101|
| test_home|scene_enter| test_home| null| null| 5.6.3|__OTHERS__| false| test| test| test|301|
| test_home|scene_enter| test_home| null| KR| null|__OTHERS__| false| test| test| test|225|
| test_home|scene_enter| test_home|android| null| null|__OTHERS__| false| test| test| test|312|
| test_home|scene_enter| test_home| null| null| null|__OTHERS__| false| test| test| test|521|
......
我試圖像下面,但它似乎是緩慢的和醜陋..
var cubed = df
.cube($"scene_id", $"action_id", $"classifier", $"country", $"os_name", $"app_ver", $"p0value", $"p1value", $"p2value", $"p3value", $"p4value")
.count
.where("scene_id IS NOT NULL AND action_id IS NOT NULL AND classifier IS NOT NULL AND p0value IS NOT NULL AND p1value IS NOT NULL AND p2value IS NOT NULL AND p3value IS NOT NULL AND p4value IS NOT NULL")
任何更好的解決方案? 請執行我的壞英語.. ^^;
在此先感謝..
謝謝,但'由'cube'操作@CarlosVilchez產生NULL'值... –