Spark: mapping each value of an array in a column
I have a JSON dataset in the following format, one entry per line.
{ "sales_person_name" : "John", "products" : ["apple", "mango", "guava"]}
{ "sales_person_name" : "Tom", "products" : ["mango", "orange"]}
{ "sales_person_name" : "John", "products" : ["apple", "banana"]}
{ "sales_person_name" : "Steve", "products" : ["apple", "mango"]}
{ "sales_person_name" : "Tom", "products" : ["mango", "guava"]}
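I want to find out who sold the most mangoes, and so on. So I want to load the file into a DataFrame and, for each transaction, emit a (key, value) pair of (product, name) for every product value in the array.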
var df = spark.read.json("s3n://sales-data.json")
df.printSchema()
root
|-- sales_person_name: string (nullable = true)
|-- products: array (nullable = true)
var nameProductsMap = df.select("sales_person_name", "products").show()
+-----------------+--------------------+
|sales_person_name|            products|
+-----------------+--------------------+
|             John|   [mango, apple,...|
|              Tom|  [mango, orange,...|
|             John|   [apple, banana...|
var resultMap = df.select("products", "sales_person_name")
  .map(r => (r(1), r(0)))
  .show() // This is where I am stuck.
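I can't figure out the right way to explode() r(0) and emit each of its values, one at a time, together with the r(1) value. Can anyone suggest a way? Thanks!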
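For reference, a minimal sketch of what explode() does on this kind of data. It assumes spark-shell or a Scala script, inlines the five sample rows instead of reading the S3 file, and the alias `product` is just a name chosen for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

// In spark-shell this simply returns the session that already exists.
val spark = SparkSession.builder().master("local[*]").appName("explode-sketch").getOrCreate()
import spark.implicits._

// Stand-in for spark.read.json(...): same two columns as the sample data.
val salesDf = Seq(
  ("John", Seq("apple", "mango", "guava")),
  ("Tom", Seq("mango", "orange")),
  ("John", Seq("apple", "banana")),
  ("Steve", Seq("apple", "mango")),
  ("Tom", Seq("mango", "guava"))
).toDF("sales_person_name", "products")

// explode emits one output row per array element and repeats the
// other selected columns, giving (product, sales_person_name) pairs.
val pairs = salesDf.select(
  explode(col("products")).as("product"),
  col("sales_person_name")
)

pairs.show()
```

Note that explode() skips rows whose array is null or empty, so a sales person with no products would not appear in `pairs`.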
What is the expected output for the given example? – Nyavro
Mango: John (4), Tom (2), Greg (1) ... Banana: Tom (5), John (2) ... – lazywiz
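A per-product count like this could be sketched by grouping the exploded pairs. This is only one possible approach, assuming the five sample rows from the question (so the numbers will not match the illustrative ones above) and a hypothetical alias `product`.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

val spark = SparkSession.builder().master("local[*]").appName("count-sketch").getOrCreate()
import spark.implicits._

// Inlined copy of the sample data; the real job would use spark.read.json(...).
val salesData = Seq(
  ("John", Seq("apple", "mango", "guava")),
  ("Tom", Seq("mango", "orange")),
  ("John", Seq("apple", "banana")),
  ("Steve", Seq("apple", "mango")),
  ("Tom", Seq("mango", "guava"))
).toDF("sales_person_name", "products")

// One row per (product, seller), with the number of transactions
// in which that seller listed that product.
val counts = salesData
  .select(explode(col("products")).as("product"), col("sales_person_name"))
  .groupBy("product", "sales_person_name")
  .count()
  .orderBy(col("product"), col("count").desc)

counts.show()
```

Because the result is ordered by count within each product, the first row for a product is its top seller (ties are returned in no particular order).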
I want something like this: var actorHashtagsMap = df.select("products", "sales_person_name").map(r => r(0).map(x => (x, r(1)))) – lazywiz
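The idea in this comment can be made to compile by reading the row fields with typed getters and using flatMap instead of map, since r(0) on its own is typed as Any and has no map method. A rough sketch, assuming spark-shell or a Scala script and the inlined sample rows; the output column names are chosen for illustration only.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("flatmap-sketch").getOrCreate()
import spark.implicits._ // supplies the encoder for (String, String)

// Inlined copy of the sample data; the real job would use spark.read.json(...).
val salesRows = Seq(
  ("John", Seq("apple", "mango", "guava")),
  ("Tom", Seq("mango", "orange")),
  ("John", Seq("apple", "banana")),
  ("Steve", Seq("apple", "mango")),
  ("Tom", Seq("mango", "guava"))
).toDF("sales_person_name", "products")

// getSeq and getString give typed access to the Row fields;
// flatMap flattens the per-row sequences into one row per pair.
val flatPairs = salesRows
  .select("products", "sales_person_name")
  .flatMap(r => r.getSeq[String](0).map(product => (product, r.getString(1))))
  .toDF("product", "sales_person_name")

flatPairs.show()
```

This yields the same pairs as explode(); the explode() form stays within the DataFrame API, while flatMap runs an ordinary Scala function on each row.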