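Modifying pandas code for a PySpark DataFrame

I have the code snippet below, which is used to create a plot. I want to modify it to work in PySpark, but I am not sure how to proceed. The problem is that I cannot iterate over a column in PySpark, and my attempts to turn this into a function have not succeeded.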

Background: the DataFrame has a column named City, which is simply the name of the city as a string.

cities = [i.City for i in df.select('City').distinct().collect()] 

stack = [] 

for city in cities: 
    df = sqlContext.sql( 'SELECT `Complaint Type`, COUNT(*) as `counts` '
          'FROM c311 ' 
          'WHERE City = "{}" COLLATE NOCASE ' 
          'GROUP BY `Complaint Type` ' 
          'ORDER BY counts DESC'.format(city)) 

    stack.append(Bar(x=df['Complaint Type'], y=df.counts, name=city.capitalize()))

My goal is to then send this to toPandas() and plot it locally. However, I get an error, since Column is not iterable. How can I solve this in PySpark?


2016-12-12 aws_apprentice

You can:

from pyspark.sql.functions import upper, col 

pdf = df.withColumn("city", upper(col("city"))) \
    .groupBy("Complaint Type").pivot("city").count() \
    .toPandas()

(or group by city and pivot by type), and use it from there.
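
Once the pivoted result is in pandas, the per-city traces from the question could be built along these lines. This is only a sketch: the sample frame is made up to stand in for the output of toPandas(), the helper name build_traces is hypothetical, and it assumes the pivot yields one row per complaint type, one count column per city, and nulls where a city has no complaints of that type. The traces are plain dicts here so that the snippet runs without a plotting library.

```python
import pandas as pd

# Made-up stand-in for the frame returned by .toPandas() after
# groupBy("Complaint Type").pivot("city").count():
# one row per complaint type, one count column per city.
pdf = pd.DataFrame({
    "Complaint Type": ["Noise", "Heating", "Parking"],
    "BROOKLYN": [120, 80, None],   # None: no such complaints in that city
    "QUEENS": [60, None, 45],
})


def build_traces(pdf, type_col="Complaint Type"):
    """Return one bar-trace description per city column, largest count first."""
    traces = []
    for city in pdf.columns.drop(type_col):
        # Keep only complaint types that actually occur in this city.
        counts = pdf[[type_col, city]].dropna()
        counts = counts.sort_values(city, ascending=False)
        traces.append({
            "x": counts[type_col].tolist(),
            "y": counts[city].astype(int).tolist(),
            "name": city.capitalize(),
        })
    return traces


traces = build_traces(pdf)
```

Each dict carries the same x, y and name values that the question passes to Bar, so, assuming Bar accepts those keyword arguments, something like `stack = [Bar(**t) for t in traces]` would replace the original loop. Because the values are ordinary Python lists rather than Spark Column objects, nothing here triggers the "Column is not iterable" error.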


2016-12-13 13:54:24
