2017-06-17 81 views
1

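I want to run the same code over different sets of macro variables, as I would in SAS, and then append all of the resulting tables together. Since I come from a SAS background, I am confused about how to do this in a PySpark environment. Any help is much appreciated! How do I loop over macro variables in PySpark the way I would in SAS?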

Example code is below:

Step 1: Define the macro variables

lastyear_st=201615 
lastyear_end=201622 

thisyear_st=201715 
thisyear_end=201722 

Step 2: Loop through the various macro variables

Code:
customer_spend=sqlContext.sql(""" 
select a.customer_code, 
sum(case when a.week_id between %d and %d then a.spend else 0 end) as spend 
from tableA 
group by a.card_code 
""" 
%(lastyear_st,lastyear_end) 
(thisyear_st,thisyear_end)) 

Step 3: Append each of the datasets populated above to the base table

Answers

1
import os
import tempfile

# macroVars holds your start and end values as a list of lists,
# where each inner list contains one start value and one end value.
macroVars = [[201615, 201622], [201715, 201722]]

# Choose the output path once, outside the loop, so that every
# iteration appends to the same location.
output_path = os.path.join(tempfile.mkdtemp(), 'data')

# Loop through the list of lists.
for start, end in macroVars:

    # Prepare the query using the values of start and end.
    # tableA is aliased as a, and GROUP BY uses the selected column.
    query = """
        SELECT a.customer_code,
               SUM(CASE WHEN a.week_id BETWEEN {} AND {}
                        THEN a.spend ELSE 0 END) AS spend
        FROM tableA a
        GROUP BY a.customer_code
    """.format(start, end)

    # Execute the query.
    customer_spend = sqlContext.sql(query)

    # Depending on your base table setup, use the appropriate
    # write command, for example:
    customer_spend \
        .write.mode('append') \
        .parquet(output_path)
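
As an alternative to writing each result to disk, the per-range results can be stacked in memory and written once at the end. The following is only a sketch under some assumptions: `build_query` and `union_all_ranges` are hypothetical helper names, the table is aliased as `a`, and the SQL context is assumed to be Spark 2.0 or later, where `DataFrame.union` is available (older versions use `unionAll`).

```python
from functools import reduce

# Week ranges taken from the question: [start, end] pairs.
macro_vars = [[201615, 201622], [201715, 201722]]

QUERY_TEMPLATE = """
SELECT a.customer_code,
       SUM(CASE WHEN a.week_id BETWEEN {start} AND {end}
                THEN a.spend ELSE 0 END) AS spend
FROM tableA a
GROUP BY a.customer_code
"""


def build_query(start, end):
    """Return the SQL text for one [start, end] week range."""
    return QUERY_TEMPLATE.format(start=start, end=end)


def union_all_ranges(sql_context, ranges):
    """Run one query per range and stack the results into one DataFrame.

    sql_context is assumed to expose .sql(query) returning a DataFrame,
    as sqlContext does in the loop above.
    """
    frames = [sql_context.sql(build_query(start, end)) for start, end in ranges]
    # DataFrame.union appends rows by column position.
    return reduce(lambda left, right: left.union(right), frames)
```

With a live session this would be called as `combined = union_all_ranges(sqlContext, macro_vars)`, after which `combined` can be written to the base table in a single step.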

Hi Pushkr, thank you. Can I also use string values in the list? I mean, it could be [['a','b','c'],[1,2,'x']] and so on. –


Yes, you can use strings as well. – Pushkr
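
To illustrate mixing strings and numbers, here is a minimal sketch that only builds the query text. The labels `last_year` and `this_year` and the output column `period` are hypothetical names chosen for the example. A string that should appear as a SQL literal needs its own single quotes in the template, while integers can be substituted directly. Values are assumed to come from trusted code rather than user input.

```python
# Each inner list mixes a string label with numeric week bounds.
# The labels 'last_year' and 'this_year' are hypothetical.
macro_vars = [['last_year', 201615, 201622], ['this_year', 201715, 201722]]

# Named placeholders keep the template readable; '{label}' is quoted so
# that SQL treats the substituted string as a literal value.
template = (
    "SELECT a.customer_code, '{label}' AS period, "
    "SUM(CASE WHEN a.week_id BETWEEN {start} AND {end} "
    "THEN a.spend ELSE 0 END) AS spend "
    "FROM tableA a GROUP BY a.customer_code"
)

queries = []
for label, start, end in macro_vars:
    queries.append(template.format(label=label, start=start, end=end))
```

Each entry in `queries` could then be passed to `sqlContext.sql(...)` inside the same loop.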


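Can I also define a macro variable separately, outside the array, and reference it inside the array? For example: a = """spend > 0 then 1 else 0 end""" and then [[a, 1, 2], [a, 2, 4]] –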
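
In plain Python this works, because a list stores whatever object a variable refers to. The sketch below defines a SQL fragment once and reuses it in each inner list. The complete `CASE WHEN ... END` expression, the `has_spend` column name, and the `WHERE` filter are assumptions made for illustration and are not part of the original question.

```python
# A reusable SQL fragment defined once, outside the list.
# The full CASE expression is an assumption based on the comment above.
spend_flag = "CASE WHEN a.spend > 0 THEN 1 ELSE 0 END"

# The same variable is referenced in each inner list.
macro_vars = [[spend_flag, 1, 2], [spend_flag, 2, 4]]

# Hypothetical query shape: has_spend is 1 if the customer spent
# anything within the week range, else 0.
template = (
    "SELECT a.customer_code, MAX({expr}) AS has_spend "
    "FROM tableA a WHERE a.week_id BETWEEN {start} AND {end} "
    "GROUP BY a.customer_code"
)

queries = [
    template.format(expr=expr, start=start, end=end)
    for expr, start, end in macro_vars
]
```

Because the fragment is spliced into the query as text, it has to be valid SQL on its own wherever the template places it.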