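groupBy in SparkR does not give the desired result

I have created a DataFrame from mtcars and grouped it by gear and cyl. I then compute the maximum of hp and disp. The problem is with the groups: there should be 8 of them, but I only get 6.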

library(SparkR)  # package names are case-sensitive
xx = as.DataFrame(sqlContext, data = mtcars)

head(agg(groupBy(xx, "gear", "cyl"), hp = 'max'))
  gear cyl max(hp)
1    3   8     245
2    5   4     113
3    3   4      97
4    4   4     109
5    5   6     175
6    3   6     110

Update 1:

I have another question. The groupBy documentation gives this example:

## Examples 

## Not run: 
    # Compute the average for all numeric columns grouped by department. 
    avg(groupBy(df, "department")) 

    # Compute the max age and average salary, grouped by department and gender. 
    agg(groupBy(df, "department", "gender"), salary="avg", "age" -> "max") 

## End(Not run)

Similarly, for mtcars I came up with:

agg(groupBy(xx, "gear", "cyl"), qsec ="avg", "disp" -> "max")

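First, my understanding is that this should give the maximum of disp, but the code does not seem to work. It throws the error below. Second, the code does work with = in place of ->. So is this a typo in the documentation, or something else?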

unable to find an inherited method for function ‘groupBy’ for signature ‘"function"’

My SparkR version is SparkR_1.6.1.

2016-12-07 Chirayu Chamoli

Your aggregation works fine, but you wrapped it in 'head', which only shows the first 6 rows. Replace it with collect, like this:

df <- as.DataFrame(mtcars) 
gp = agg(groupBy(df, df$gear, df$cyl), hp = 'max') 
collect(gp)

Just a note: I am using Spark 2.0.2.
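
The 6-row cap comes from head() itself, not from the grouping, and the same effect can be reproduced without Spark. The following is a minimal sketch in base R, assuming a plain data.frame and using aggregate() as a stand-in for SparkR's agg(groupBy(...)); the variable name full is only illustrative.

```r
# Base R analogue of the grouped max (assumption: no Spark session,
# aggregate() stands in for SparkR's agg(groupBy(...))).
full <- aggregate(hp ~ gear + cyl, data = mtcars, FUN = max)

# Number of gear/cyl groups in the full result.
n_groups <- nrow(full)

# head() defaults to n = 6, so it never returns more than 6 rows.
n_head <- nrow(head(full))

# Passing n explicitly lifts the cap.
n_head_all <- nrow(head(full, n = nrow(full)))

print(c(groups = n_groups, head_default = n_head, head_all = n_head_all))
```

In SparkR the equivalent choice is between head(gp), which previews a few rows, and collect(gp), which brings the whole result back as a local data.frame.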

2016-12-08 11:23:51

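Oh. How could I miss the 'head'? Thanks for the "collect" suggestion.

Could you take a look at the update?

Could you tell me which Spark version you are using? If the point of "disp" -> "max" is just to get the maximum of the disp variable, you only need to replace the arrow with '='.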
