2017-07-20 26 views
0
joindf.printSchema() 
root 
|-- order_customer_id: string (nullable = true) 
|-- order_date: string (nullable = true) 
|-- order_id: string (nullable = true) 
|-- order_status: string (nullable = true) 
|-- order_item_id: string (nullable = true) 
|-- order_item_order_id: string (nullable = true) 
|-- order_item_product_id: string (nullable = true) 
|-- order_item_product_price: string (nullable = true) 
|-- order_item_quantity: string (nullable = true) 
|-- order_item_subtotal: string (nullable = true) 



joindf.show(5) 
+-----------------+--------------------+--------+------------+-------------+-------------------+---------------------+------------------------+-------------------+-------------------+ 
|order_customer_id|   order_date|order_id|order_status|order_item_id|order_item_order_id|order_item_product_id|order_item_product_price|order_item_quantity|order_item_subtotal| 
+-----------------+--------------------+--------+------------+-------------+-------------------+---------------------+------------------------+-------------------+-------------------+ 
|   10153|2013-08-17 00:00:...| 4061| COMPLETE|  10153|    4080|     365|     59.99|     4|    239.96| 
|   10153|2014-01-12 00:00:...| 27596|  PENDING|  10153|    4080|     365|     59.99|     4|    239.96| 
|   10153|2014-07-18 00:00:...| 56604|  CLOSED|  10153|    4080|     365|     59.99|     4|    239.96| 
|   10153|2013-08-14 00:00:...| 58259| COMPLETE|  10153|    4080|     365|     59.99|     4|    239.96| 
|   10153|2013-08-14 00:00:...| 58269|  PENDING|  10153|    4080|     365|     59.99|     4|    239.96| 
+-----------------+--------------------+--------+------------+-------------+-------------------+---------------------+------------------------+-------------------+-------------------+ 

我在此RDD上使用combineByKey()生成結果,該結果給出了每天每個狀態的總訂單和總金額。 下面是代碼:使用combineByKey時的錯誤

joindf.map(lambda x: ((str(x[1]),str(x[3])),(float(x[9]),int(x[2])))) 
.combineByKey(lambda v: (v[0],set(v[1])) , 
       lambda acc,v: (acc[0]+v[0],v[1].add(acc[1])), 
       lambda acc1,acc2 : (acc1[0]+acc2[0],acc1[1].update(acc2[1]))) 

這是給錯誤。

TypeError: 'int' object is not iterable

我哪裏錯了?請幫助。

回答

0

你已經有一個數據幀,你不需要將其轉換爲RDD,並且執行操作。

據我所知你可以做如下,但是代碼是Scala中,你可以將其轉換到Python

joindf.groupBy(split($"order_date", " ")(0).as("order_date")) 
    .agg(sum($"order_item_quantity"), sum($"order_item_subtotal")) 

希望這有助於!

+0

這是否幫助了? –