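How to check the intersection of two DataFrame columns in Spark
Using pyspark or sparkr (preferably both), how can I get the intersection of two DataFrame columns? For example, in sparkr I have the following DataFrames: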
newHires <- data.frame(name = c("Thomas", "George", "George", "John"),
                       surname = c("Smith", "Williams", "Brown", "Taylor"))
salesTeam <- data.frame(name = c("Lucas", "Bill", "George"),
                        surname = c("Martin", "Clark", "Williams"))
newHiresDF <- createDataFrame(newHires)
salesTeamDF <- createDataFrame(salesTeam)
#Intersect works for the entire DataFrames
newSalesHire <- intersect(newHiresDF, salesTeamDF)
head(newSalesHire)
name surname
1 George Williams
#Intersect does not work for single columns
newSalesHire <- intersect(newHiresDF$name, salesTeamDF$name)
head(newSalesHire)
Error in as.vector(y) : no method for coercing this S4 class to a vector
How can I make intersect work on a single column?
It works fine in pyspark: 'spark.createDataFrame(["A", "B", "X"], StringType()).intersect(spark.createDataFrame(["Z", "Y", "X"], StringType()))'