I have a very large Spark SQL statement that I am trying to split into smaller chunks for better code readability. I do not want to join anything, just combine the results. How do I break the big Spark SQL query into smaller queries and merge them?
Current working SQL statement:
val dfs = x.map(field => spark.sql(s"""
  select 'test' as Table_Name,
         '$field' as Column_Name,
         min($field) as Min_Value,
         max($field) as Max_Value,
         approx_count_distinct($field) as Unique_Value_Count,
         (select 100 * approx_count_distinct($field) / count(1)
          from tempdftable) as perc
  from tempdftable
"""))
I am trying to pull the following query out of the SQL above:
(SELECT 100 * approx_count_distinct($field)/count(1) from tempdftable) as perc
with this logic:
val Perce = x.map(field => spark.sql(s"(SELECT 100 * approx_count_distinct($field)/count(1) from parquetDFTable)"))
and then merge this val Perce back into the first big SQL statement as shown below, but it does not work:
val dfs = x.map(field => spark.sql(s"""
  select 'test' as Table_Name,
         '$field' as Column_Name,
         min($field) as Min_Value,
         max($field) as Max_Value,
         approx_count_distinct($field) as Unique_Value_Count,
         '""" + Perce + """'
  from tempdftable
"""))
How should this be written?
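One way to keep the statement readable (a sketch, assuming the temp view `tempdftable` and the column list `x` from the question; the helper name `profileQuery` is illustrative, not Spark API) is to build each per-column query as a plain string, run one small query per column, and combine the resulting DataFrames with `union` rather than splicing a DataFrame back into a SQL string — `Perce` is a collection of DataFrames, so interpolating it into the query text can only produce its `toString`, never its values:

```scala
// Builds the small per-column profiling query as a string, so the big
// statement is assembled from readable pieces. The percentage subquery
// stays inline; it is scoped per field anyway.
def profileQuery(field: String, table: String = "tempdftable"): String =
  s"""select 'test' as Table_Name,
     |       '$field' as Column_Name,
     |       min($field) as Min_Value,
     |       max($field) as Max_Value,
     |       approx_count_distinct($field) as Unique_Value_Count,
     |       (select 100 * approx_count_distinct($field) / count(1)
     |        from $table) as perc
     |from $table""".stripMargin

// With a SparkSession `spark` and the column list `x` in scope,
// run one query per column and stack the results row-wise:
//   val dfs      = x.map(field => spark.sql(profileQuery(field)))
//   val combined = dfs.reduce(_ union _)
```

All the per-column queries produce the same schema, so `union` (row-wise concatenation, no join keys) is the natural way to merge them.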
Thanks Glennie! It helped, and I am accepting this answer, but I am better at SQL, and there are a few expressions I have used, such as the analytic function RANK, that I do not know how to implement in pure Spark. – sabby
Importing 'org.apache.spark.sql.functions._' gives you most (all?) of the SQL functions, including 'rank' ;) –
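For instance, SQL's RANK analytic function corresponds to `rank().over(...)` with a window specification (a sketch; the column names "category" and "value" are illustrative, not from the question):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// Ranks rows within each "category" partition by "value", highest first,
// adding the result as a new column "rnk".
def addRank(df: DataFrame): DataFrame = {
  val w = Window.partitionBy("category").orderBy(col("value").desc)
  df.withColumn("rnk", rank().over(w))
}
```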
Thanks again Glennie! May I ask where I can get this information — any documentation I can refer to so I can become a master? :P :) – sabby