I am working with an RDD that contains:
[u'1,0,0,0,0,0,0,0,1,2013,52,0,4,1,0',
u'1,0,0,0,1,1,0,1,1,2012,49,1,1,0,1',
u'1,0,0,0,1,1,0,0,1,2012,49,1,1,0,1',
u'0,1,0,0,0,0,1,1,1,2014,45,0,0,1,0']
With this code:
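# sc is an existing SparkContext; group rows by the 10th field (index 9, the year)
rdd = rdd.groupBy(lambda x: x.split(",")[9])
# collect() brings every (key, rows) pair to the driver, then each group becomes a new RDD
new_rdds = [sc.parallelize(x[1]) for x in rdd.collect()]
for x in new_rdds:
    print(x.collect())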
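To see what the groupBy key does without a Spark cluster, here is a plain-Python sketch that groups the same sample strings by the field at index 9. It only mimics the grouping locally (it is not Spark), and the helper name `year_key` is hypothetical. Note that the key is a string such as "2014", not an integer.

```python
from collections import defaultdict

rows = [
    u'1,0,0,0,0,0,0,0,1,2013,52,0,4,1,0',
    u'1,0,0,0,1,1,0,1,1,2012,49,1,1,0,1',
    u'1,0,0,0,1,1,0,0,1,2012,49,1,1,0,1',
    u'0,1,0,0,0,0,1,1,1,2014,45,0,0,1,0',
]

def year_key(line):
    # Same key as the groupBy lambda: the 10th comma-separated field, still a string
    return line.split(",")[9]

# Collect rows into lists keyed by year, like groupBy does per key
groups = defaultdict(list)
for line in rows:
    groups[year_key(line)].append(line)

for key in sorted(groups):
    print(key, groups[key])
```

Spark does not guarantee the order of groups, which is why the sketch sorts the keys before printing.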
I get:
[u'1,0,0,0,0,0,0,0,1,2013,52,0,4,1,0']
[u'1,0,0,0,1,1,0,1,1,2012,49,1,1,0,1', u'1,0,0,0,1,1,0,0,1,2012,49,1,1,0,1']
[u'0,1,0,0,0,0,1,1,1,2014,45,0,0,1,0']
Is there a way to get only a specific RDD, for example the one where x[9] = 2014,
so that I get:
[u'0,1,0,0,0,0,1,1,1,2014,45,0,0,1,0']
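A minimal sketch, assuming the goal is only the rows whose index-9 field equals 2014: instead of grouping and collecting every group, a predicate can be passed to PySpark's `RDD.filter`. Because `split` returns strings, the comparison must be against "2014" rather than the integer 2014. The helper `has_year` is hypothetical, and the Spark call is left as a comment (it assumes the original `rdd` and a live SparkContext) so the block runs without Spark.

```python
def has_year(line, year="2014"):
    # Hypothetical helper: True when the 10th comma-separated field equals `year`.
    # Fields come from str.split, so compare with the string "2014", not the int 2014.
    return line.split(",")[9] == year

rows = [
    u'1,0,0,0,0,0,0,0,1,2013,52,0,4,1,0',
    u'1,0,0,0,1,1,0,1,1,2012,49,1,1,0,1',
    u'1,0,0,0,1,1,0,0,1,2012,49,1,1,0,1',
    u'0,1,0,0,0,0,1,1,1,2014,45,0,0,1,0',
]

# Plain-Python check of the predicate on the sample data
selected = [line for line in rows if has_year(line)]
print(selected)

# With Spark (assumes the original, ungrouped `rdd`):
# rdd_2014 = rdd.filter(has_year)
# print(rdd_2014.collect())
```

Filtering keeps the work distributed and avoids pulling every group to the driver with `collect()`.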