pyspark: Expanding a DenseVector into a tuple in an RDD
I have the following RDD, where each record is a tuple of (bigint, vector):
myRDD.take(5)
[(1, DenseVector([9.2463, 1.0, 0.392, 0.3381, 162.6437, 7.9432])),
(1, DenseVector([9.2463, 1.0, 0.392, 0.3381, 162.6437, 7.9432])),
(0, DenseVector([5.0, 20.0, 0.3444, 0.3295, 54.3122, 4.0])),
(1, DenseVector([9.2463, 1.0, 0.392, 0.3381, 162.6437, 7.9432])),
(1, DenseVector([9.2463, 2.0, 0.392, 0.3381, 162.6437, 7.9432]))]
How do I expand the DenseVector so that its values become part of the tuple itself? That is, I want the above to become:
[(1, 9.2463, 1.0, 0.392, 0.3381, 162.6437, 7.9432),
(1, 9.2463, 1.0, 0.392, 0.3381, 162.6437, 7.9432),
(0, 5.0, 20.0, 0.3444, 0.3295, 54.3122, 4.0),
(1, 9.2463, 1.0, 0.392, 0.3381, 162.6437, 7.9432),
(1, 9.2463, 2.0, 0.392, 0.3381, 162.6437, 7.9432)]
Thanks!
Hint: 'Vector' is iterable. Everything else is basic Python (argument unpacking may be useful, but it is not required). – zero323
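Following the hint, here is a minimal sketch of the argument-unpacking approach, assuming Python 3.5+ (needed for `*` unpacking inside a tuple display). A plain list stands in for the DenseVector so the snippet runs without Spark; because DenseVector is iterable, the same function should work on it unchanged. `expand_record` is a hypothetical helper name, not part of the PySpark API.

```python
def expand_record(label, vector):
    # `*vector` unpacks every element of the iterable into the new tuple,
    # so the result is flat: (label, v0, v1, ...).
    return (label, *vector)

# Stand-in for one record of myRDD: (label, DenseVector([...])).
# A plain list is used only so this example runs without Spark.
record = (0, [5.0, 20.0, 0.3444, 0.3295, 54.3122, 4.0])

# Argument unpacking: *record passes the label and the vector as two arguments.
flat = expand_record(*record)
print(flat)  # (0, 5.0, 20.0, 0.3444, 0.3295, 54.3122, 4.0)
```

On the RDD itself this would presumably be applied as `newRDD = myRDD.map(lambda rec: expand_record(*rec))`.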
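Thanks, zero323! I tried newRDD = myRDD.map(lambda x: (x[0], tuple(x[1]))). It does expand the DenseVector into a tuple, but I still end up with a tuple nested inside a tuple, like: (1, (9.2463, 1.0, 0.392, 0.3381, 162.6437, 7.9432)). Any hints on flattening this nested tuple into a single tuple? Thanks! – Edamame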
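The nesting happens because `(x[0], tuple(x[1]))` builds a 2-tuple whose second element is itself a tuple. One way to avoid it, sketched below under the assumption that each record is `(label, iterable_vector)`, is to concatenate a one-element tuple with the vector's values using `+`, which does not rely on Python 3 unpacking syntax. Plain lists again stand in for DenseVector, and `flatten_record` is a hypothetical name. The `float(v)` conversion is an optional choice: DenseVector iterates over NumPy float64 values, and converting them gives plain Python floats.

```python
def flatten_record(record):
    label, vector = record
    # (label,) is a one-element tuple; `+` concatenates it with the vector's
    # values instead of nesting the vector tuple inside a new tuple.
    return (label,) + tuple(float(v) for v in vector)

# Stand-ins for records of myRDD; plain lists replace DenseVector here.
records = [
    (1, [9.2463, 1.0, 0.392, 0.3381, 162.6437, 7.9432]),
    (0, [5.0, 20.0, 0.3444, 0.3295, 54.3122, 4.0]),
]

nested = [(r[0], tuple(r[1])) for r in records]  # what the lambda in the comment produces
flat = [flatten_record(r) for r in records]

print(nested[1])  # (0, (5.0, 20.0, 0.3444, 0.3295, 54.3122, 4.0))
print(flat[1])    # (0, 5.0, 20.0, 0.3444, 0.3295, 54.3122, 4.0)
```

Applied to the RDD, this would be `newRDD = myRDD.map(flatten_record)`, or inline as `myRDD.map(lambda x: (x[0],) + tuple(x[1]))`.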