How to convert an RDD to a sparse matrix in PySpark
I have a key/value pair RDD:
{(("a", "b"), 1), (("a", "c"), 3), (("c", "d"), 5)}
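How can I get the following sparse matrix from it (rows and columns ordered a, b, c, d, with each pair filled in symmetrically)?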
0 1 3 0
1 0 0 0
3 0 0 5
0 0 5 0
That is, either
from pyspark.mllib.linalg import Matrices
Matrices.sparse(4, 4, [0, 2, 3, 5, 6], [1, 2, 0, 0, 3, 2], [1, 3, 1, 3, 5, 5])
or
import numpy as np
from scipy.sparse import csc_matrix
data = [1, 3, 1, 3, 5, 5]
indices = [1, 2, 0, 0, 3, 2]
indptr = [0, 2, 3, 5, 6]
csc_matrix((data, indices, indptr), shape=(4, 4), dtype=np.float64)
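For reference, here is a minimal sketch of how the three CSC arrays above could be derived from the pairs. It assumes the RDD is small enough to bring to the driver (for example with rdd.collect()), that labels map to indices in sorted order, and that each pair should be mirrored to make the matrix symmetric. The helper name pairs_to_csc is hypothetical, not part of PySpark or SciPy.

```python
def pairs_to_csc(pairs):
    """Build symmetric CSC arrays from ((row_label, col_label), value) pairs."""
    # Assumption: labels map to indices in sorted order ("a" -> 0, "b" -> 1, ...).
    labels = sorted({label for (r, c), _ in pairs for label in (r, c)})
    index = {label: i for i, label in enumerate(labels)}
    n = len(labels)

    # Mirror each entry so the matrix is symmetric, as in the expected output.
    entries = {}
    for (r, c), v in pairs:
        i, j = index[r], index[c]
        entries[(i, j)] = v
        entries[(j, i)] = v

    # CSC layout: entries ordered by column, then by row within each column.
    ordered = sorted(entries.items(), key=lambda kv: (kv[0][1], kv[0][0]))
    row_indices = [i for (i, _), _ in ordered]
    values = [v for _, v in ordered]

    # Count entries per column, then accumulate into column pointers.
    col_ptrs = [0] * (n + 1)
    for (_, j), _ in ordered:
        col_ptrs[j + 1] += 1
    for j in range(n):
        col_ptrs[j + 1] += col_ptrs[j]
    return n, col_ptrs, row_indices, values


# Stand-in for rdd.collect() on the RDD from the question.
pairs = [(("a", "b"), 1), (("a", "c"), 3), (("c", "d"), 5)]
n, col_ptrs, row_indices, values = pairs_to_csc(pairs)
```

The results could then be passed to Matrices.sparse(n, n, col_ptrs, row_indices, values) or to csc_matrix((values, row_indices, col_ptrs), shape=(n, n)). Collecting to the driver only suits matrices that fit in local memory.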