
How do I get a value from a Row object in a Spark DataFrame?

averageCount = (wordCountsDF 
       .groupBy().mean()).head() 

I get:

Row(avg(count)=1.6666666666666667)

But when I try:

averageCount = (wordCountsDF 
       .groupBy().mean()).head().getFloat(0) 

I get the following error:

AttributeError                            Traceback (most recent call last)
<ipython-input-...> in <module>()
      1 # TODO: Replace with appropriate code
----> 2 averageCount = (wordCountsDF
      3                 .groupBy().mean()).head().getFloat(0)
      4 
      5 print averageCount

/databricks/spark/python/pyspark/sql/types.py in __getattr__(self, item)
   1270             raise AttributeError(item)
   1271         except ValueError:
-> 1272             raise AttributeError(item)
   1273 
   1274     def __setattr__(self, key, value):

AttributeError: getFloat

What am I doing wrong?

Answers


I figured it out. This returns the value I was after:

averageCount = (wordCountsDF 
       .groupBy().mean()).head()[0] 
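
Since head() returns a pyspark.sql.Row, another option (not from the original answer, just a sketch using Row.asDict()) is to convert the row into a plain Python dict first:

# head() gives a single Row; asDict() turns it into an ordinary dict 
row = wordCountsDF.groupBy().mean().head() 
averageCount = row.asDict()['avg(count)']   # key matches the Row shown above 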

DataFrame rows inherit from namedtuple (in the collections library), so while you can index them like an ordinary tuple as shown above, you will probably want to access them by field name. After all, that is the point of named tuples, and it is also more robust to future changes. Like so:

averageCount = wordCountsDF.groupBy().mean().head()['avg(count)'] 
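
The generated field name depends on the column being aggregated; for the wordCountsDF in the question it is 'avg(count)', as the Row output shows. If you would rather not depend on that auto-generated name, one sketch (using pyspark.sql.functions.avg with an explicit alias; the alias name here is just an example) is:

from pyspark.sql import functions as F 

# Name the aggregate explicitly instead of relying on 'avg(count)' 
averageCount = (wordCountsDF 
       .groupBy() 
       .agg(F.avg('count').alias('averageCount')) 
       .head()['averageCount']) 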

This also works:

averageCount = (wordCountsDF 
       .groupBy().mean('count').collect())[0][0] 
print averageCount
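
For a self-contained illustration of these access patterns, here is a minimal sketch; the SparkSession setup and the sample data are invented for the example and are not part of the original question:

from pyspark.sql import SparkSession 

spark = SparkSession.builder.master('local[*]').getOrCreate() 

# Hypothetical stand-in for wordCountsDF: a 'word' column and a 'count' column 
wordCountsDF = spark.createDataFrame( 
    [('cat', 1), ('dog', 2), ('elephant', 2)], ['word', 'count']) 

row = wordCountsDF.groupBy().mean('count').head() 

print(row[0])              # positional access -> 1.666... 
print(row['avg(count)'])   # access by the generated field name 
print(row.asDict())        # {'avg(count)': 1.666...} 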