From a JSON RDD I have extracted particular values, printed with pprint as shown below:
[u'{']
[u'"hash" ', u' "0000000000000000059134ebb840559241e8e2799f3ebdff56723efecfd6567a",']
[u'"confirmations" ', u' 969,']
[u'"size" ', u' 52543,']
[u'"height" ', u' 395545,']
[u'"version" ', u' 4,']
[u'"merkleroot" ', u' "8cf3eea32f692e5ebc9c25bb912ab3aff43c02761609d52cdd48afc5a05918fb",']
[u'"tx" ', u' [']
[u'"b3df3d5fedadd07a46753af556c336c41e038a9aec7ddd9921ad249828fd6d66",']
[u'"4ada431255d104c1c76ef56bdef4186ea89793223133e535383ff39d5a322910",']
I want to extract the second-to-last value, [u'"b3df3d5fedadd07a46753af556c336c41e038a9aec7ddd9921ad249828fd6d66",']
How can I get this value? Indexing does not work. The code is as follows:
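from pyspark import SparkContext
from pyspark.streaming import StreamingContext
import json
# Create a local StreamingContext with two worker threads and a batch interval of 1 second
sc = SparkContext("local[2]", "txcount")
ssc = StreamingContext(sc, 1)
# Each line received on the socket becomes one record
lines = ssc.socketTextStream("localhost", 9999)
# json.dumps followed by json.loads gives back the original string unchanged
dump_rdd = lines.map(lambda x: json.dumps(x))
load_rdd = dump_rdd.map(lambda x: json.loads(x))
# Split each line into key and value on ":"
tx = load_rdd.map(lambda x: x.split(":"))
tx.pprint()
ssc.start()
ssc.awaitTermination()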
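To see why indexing fails, here is a plain-Python sketch (no Spark) that applies the same per-line steps to a few of the lines from the output above. Each socket line is its own record, so the wanted hash ends up alone in a one-element list, and no index into any single record can reach it. `sample_lines` and `per_line_record` are hypothetical names, and the spacing around ":" is assumed from the printed output.

```python
import json

# Hypothetical sample: a few of the lines the socket would deliver for one
# pretty-printed block (values copied from the pprint output above).
sample_lines = [
    '{',
    '"hash" : "0000000000000000059134ebb840559241e8e2799f3ebdff56723efecfd6567a",',
    '"tx" : [',
    '"b3df3d5fedadd07a46753af556c336c41e038a9aec7ddd9921ad249828fd6d66",',
]


def per_line_record(line):
    """Apply the same steps as dump_rdd, load_rdd and tx to a single line."""
    round_tripped = json.loads(json.dumps(line))  # unchanged string
    return round_tripped.split(":")


# One list per line, exactly like the records printed by tx.pprint()
records = [per_line_record(line) for line in sample_lines]
for record in records:
    print(record)
```

The last record is `['"b3df...",']`: the hash is a separate record, not an element of the `"tx"` record.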
Just don't try. You shouldn't pass multi-line input this way. – zero323
@zero323 Say this were a single line, how would you extract it then? – user2065276
@zero323 Post your answer so I can accept it. – user2065276
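zero323's point can be checked in plain Python: a pretty-printed block parses only as a whole, while each individual line (which is what socketTextStream hands to map) is just a fragment. A minimal sketch, using a hypothetical shortened block built from the values above:

```python
import json

# Hypothetical, shortened block in the same pretty-printed layout,
# one list entry per line, as socketTextStream would deliver it.
block_lines = [
    '{',
    '"height" : 395545,',
    '"tx" : [',
    '"b3df3d5fedadd07a46753af556c336c41e038a9aec7ddd9921ad249828fd6d66",',
    '"4ada431255d104c1c76ef56bdef4186ea89793223133e535383ff39d5a322910"]',
    '}',
]


def parses(text):
    """Return True if text is one complete JSON document."""
    try:
        json.loads(text)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


per_line = [parses(line) for line in block_lines]  # every line on its own
whole_block = parses("\n".join(block_lines))       # the block as one string
print(per_line)
print(whole_block)
```

No single line is valid JSON, but the joined block is, so per-line records cannot be parsed into a usable structure.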
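If each block arrived as a single line, it could be parsed directly with json.loads and the hash picked by key and index. A minimal sketch, assuming the producer writes every block as one line (for example with json.dumps and no indent); `first_tx` and `single_line` are hypothetical names:

```python
import json


def first_tx(line):
    """Parse one block serialized on a single line and return tx[0]."""
    block = json.loads(line)
    return block["tx"][0]


# Hypothetical single-line record, shortened to fields shown in the question.
single_line = json.dumps({
    "hash": "0000000000000000059134ebb840559241e8e2799f3ebdff56723efecfd6567a",
    "height": 395545,
    "tx": [
        "b3df3d5fedadd07a46753af556c336c41e038a9aec7ddd9921ad249828fd6d66",
        "4ada431255d104c1c76ef56bdef4186ea89793223133e535383ff39d5a322910",
    ],
})

print(first_tx(single_line))
```

In the streaming job this would replace the dumps/loads/split chain with something like `lines.map(first_tx).pprint()`. DStream.pprint() prints only the first 10 elements of each batch by default, so the hash called second-to-last above is probably just the first entry of the block's `tx` array, which is why the sketch takes index 0.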