pyspark: distinctCount - AnalysisException: u"cannot resolve 'X' given input columns:

I have the following DataFrame:

Id | field_A | field_B | field_C | field_D 
1 | cat | 12  | black | 11 
1 | dog | 128  | white | 19 
2 | dog | 35  | yellow | 20 
2 | dog | 21  | brown | 4 
3 | bird | 10  | blue | 7 
4 | cow | 99  | brown | 34

I want to keep only the rows whose Id has distinctCount('field_A') = 1, that is, the Ids associated with exactly one kind of animal. The final result should be:

Id | field_A | field_B | field_C | field_D 
2 | dog | 35  | yellow | 20 
2 | dog | 21  | brown | 4 
3 | bird | 10  | blue | 7 
4 | cow | 99  | brown | 34

I started with the following approach:

myDF.groupBy(['Id']).agg(countDistinct('field_A')).alias('distinct_A_count').filter('distinct_A_count = 1').show(20,False)

Then I got the following error:

AnalysisException: u"cannot resolve 'distinct_A_count' given input columns: [Id, count(field_A)];"

Does anyone know what I am doing wrong? Thanks!

Source

2016-06-24 Edamame

I got it working by using withColumnRenamed instead of alias:

myDF.groupBy(['Id']).agg(countDistinct('field_A')).withColumnRenamed('count(field_A)','distinct_A_count').filter('distinct_A_count = 1').show(20,False)

Source

2016-06-24 21:51:07 Edamame
