Why do tasks in the same task set have very different execution times?
I am doing some string processing on Spark. My code snippet:
val rdd = sc.objectFile[(String, String)]("some hdfs url", 1);
rdd.cache.count // let cache happen
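// Merge two partial results by feeding both lists back into finder
// (finder is defined elsewhere and is not shown in this post).
val combOp = (f: List[String], g: List[String]) => {
  for (x <- f) {
    finder.processEntry(x)
  }
  for (x <- g) {
    finder.processEntry(x)
  }
  finder.result
}
// Process each partition locally and emit one partial result per partition;
// the second argument (true) is preservesPartitioning.
val res = rdd.mapPartitions(x => {
  for (e <- x) {
    finder.processEntry(e)
  }
  Iterator(finder.result)
}, true).reduce(combOp)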
My dataset is about 10 GB. I am running Spark on a 24-core machine with 48 GB of memory. Configuration file:
spark.driver.memory 1g
spark.executor.memory 30g
spark.executor.extraJavaOptions -Xloggc:/var/log/gcmemory.log -XX:+PrintGCDetails
spark.executor.cores 4
Execution log excerpt:
INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 10.60.1.143, ANY, 1642 bytes)
INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 10.60.1.143, ANY, 1642 bytes)
INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, 10.60.1.143, ANY, 1642 bytes)
INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, 10.60.1.143, ANY, 1642 bytes)
INFO BlockManagerMasterEndpoint: Registering block manager 10.60.1.143:42850 with 15.5 GB RAM, BlockManagerId(0, 10.60.1.143, 42850)
INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 10.60.1.143:42850 (size: 1766.0 B, free: 15.5 GB)
INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.60.1.143:42850 (size: 16.8 KB, free: 15.5 GB)
INFO BlockManagerInfo: Added rdd_1_3 in memory on 10.60.1.143:42850 (size: 219.7 MB, free: 15.3 GB)
INFO BlockManagerInfo: Added rdd_1_1 in memory on 10.60.1.143:42850 (size: 229.7 MB, free: 15.1 GB)
INFO BlockManagerInfo: Added rdd_1_2 in memory on 10.60.1.143:42850 (size: 221.5 MB, free: 14.9 GB)
INFO TaskSetManager: Starting task 4.0 in stage 0.0 (TID 4, 10.60.1.143, ANY, 1642 bytes)
INFO TaskSetManager: Starting task 5.0 in stage 0.0 (TID 5, 10.60.1.143, ANY, 1642 bytes)
INFO TaskSetManager: Starting task 6.0 in stage 0.0 (TID 6, 10.60.1.143, ANY, 1642 bytes)
INFO TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 6345 ms on 10.60.1.143 (1/34)
INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 6351 ms on 10.60.1.143 (2/34)
INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 6354 ms on 10.60.1.143 (3/34)
INFO BlockManagerInfo: Added rdd_1_0 in memory on 10.60.1.143:42850 (size: 220.6 MB, free: 14.7 GB)
INFO TaskSetManager: Starting task 7.0 in stage 0.0 (TID 7, 10.60.1.143, ANY, 1642 bytes)
INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 6454 ms on 10.60.1.143 (4/34)
INFO BlockManagerInfo: Added rdd_1_5 in memory on 10.60.1.143:42850 (size: 219.9 MB, free: 14.4 GB)
INFO TaskSetManager: Starting task 8.0 in stage 0.0 (TID 8, 10.60.1.143, ANY, 1642 bytes)
INFO TaskSetManager: Finished task 5.0 in stage 0.0 (TID 5) in 2287 ms on 10.60.1.143 (5/34)
INFO BlockManagerInfo: Added rdd_1_4 in memory on 10.60.1.143:42850 (size: 222.7 MB, free: 14.2 GB)
INFO TaskSetManager: Starting task 9.0 in stage 0.0 (TID 9, 10.60.1.143, ANY, 1642 bytes)
INFO BlockManagerInfo: Added rdd_1_6 in memory on 10.60.1.143:42850 (size: 210.7 MB, free: 14.0 GB)
INFO TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) in 2350 ms on 10.60.1.143 (6/34)
INFO TaskSetManager: Starting task 10.0 in stage 0.0 (TID 10, 10.60.1.143, ANY, 1642 bytes)
INFO TaskSetManager: Finished task 6.0 in stage 0.0 (TID 6) in 2356 ms on 10.60.1.143 (7/34)
INFO BlockManagerInfo: Added rdd_1_7 in memory on 10.60.1.143:42850 (size: 214.6 MB, free: 13.8 GB)
INFO TaskSetManager: Starting task 11.0 in stage 0.0 (TID 11, 10.60.1.143, ANY, 1642 bytes)
INFO TaskSetManager: Finished task 7.0 in stage 0.0 (TID 7) in 2289 ms on 10.60.1.143 (8/34)
INFO BlockManagerInfo: Added rdd_1_8 in memory on 10.60.1.143:42850 (size: 216.3 MB, free: 13.6 GB)
INFO TaskSetManager: Starting task 12.0 in stage 0.0 (TID 12, 10.60.1.143, ANY, 1642 bytes)
INFO TaskSetManager: Finished task 8.0 in stage 0.0 (TID 8) in 2430 ms on 10.60.1.143 (9/34)
INFO BlockManagerInfo: Added rdd_1_11 in memory on 10.60.1.143:42850 (size: 216.5 MB, free: 13.4 GB)
INFO BlockManagerInfo: Added rdd_1_10 in memory on 10.60.1.143:42850 (size: 216.5 MB, free: 13.2 GB)
INFO TaskSetManager: Starting task 13.0 in stage 0.0 (TID 13, 10.60.1.143, ANY, 1642 bytes)
INFO TaskSetManager: Finished task 11.0 in stage 0.0 (TID 11) in 2416 ms on 10.60.1.143 (10/34)
INFO TaskSetManager: Starting task 14.0 in stage 0.0 (TID 14, 10.60.1.143, ANY, 1642 bytes)
INFO TaskSetManager: Finished task 10.0 in stage 0.0 (TID 10) in 2445 ms on 10.60.1.143 (11/34)
INFO BlockManagerInfo: Added rdd_1_9 in memory on 10.60.1.143:42850 (size: 231.4 MB, free: 12.9 GB)
INFO TaskSetManager: Starting task 15.0 in stage 0.0 (TID 15, 10.60.1.143, ANY, 1642 bytes)
INFO TaskSetManager: Finished task 9.0 in stage 0.0 (TID 9) in 2528 ms on 10.60.1.143 (12/34)
INFO BlockManagerInfo: Added rdd_1_12 in memory on 10.60.1.143:42850 (size: 217.3 MB, free: 12.7 GB)
INFO TaskSetManager: Starting task 16.0 in stage 0.0 (TID 16, 10.60.1.143, ANY, 1642 bytes)
INFO TaskSetManager: Finished task 12.0 in stage 0.0 (TID 12) in 1797 ms on 10.60.1.143 (13/34)
INFO BlockManagerInfo: Added rdd_1_14 in memory on 10.60.1.143:42850 (size: 215.8 MB, free: 12.5 GB)
INFO TaskSetManager: Starting task 17.0 in stage 0.0 (TID 17, 10.60.1.143, ANY, 1642 bytes)
INFO TaskSetManager: Finished task 14.0 in stage 0.0 (TID 14) in 1748 ms on 10.60.1.143 (14/34)
INFO BlockManagerInfo: Added rdd_1_13 in memory on 10.60.1.143:42850 (size: 220.9 MB, free: 12.3 GB)
INFO TaskSetManager: Starting task 18.0 in stage 0.0 (TID 18, 10.60.1.143, ANY, 1642 bytes)
INFO TaskSetManager: Finished task 13.0 in stage 0.0 (TID 13) in 1812 ms on 10.60.1.143 (15/34)
INFO BlockManagerInfo: Added rdd_1_15 in memory on 10.60.1.143:42850 (size: 233.8 MB, free: 12.1 GB)
INFO TaskSetManager: Starting task 19.0 in stage 0.0 (TID 19, 10.60.1.143, ANY, 1642 bytes)
INFO TaskSetManager: Finished task 15.0 in stage 0.0 (TID 15) in 1756 ms on 10.60.1.143 (16/34)
INFO BlockManagerInfo: Added rdd_1_16 in memory on 10.60.1.143:42850 (size: 221.6 MB, free: 11.9 GB)
INFO TaskSetManager: Starting task 20.0 in stage 0.0 (TID 20, 10.60.1.143, ANY, 1642 bytes)
INFO TaskSetManager: Finished task 16.0 in stage 0.0 (TID 16) in 2600 ms on 10.60.1.143 (17/34)
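Why do the first tasks in a task set take so much longer to run than the later ones? The first four tasks (0 to 3) finish in about 6.3 s, while tasks 4 to 11 take about 2.3 to 2.5 s and tasks 12 to 15 about 1.8 s. Any help is much appreciated.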
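To make the gap easier to see, the per-task durations can be pulled out of the log excerpt above. This is only a minimal sketch, assuming the "Finished task" lines keep exactly the format shown in the log; the object name TaskDurations and its parse helper are hypothetical and are not part of Spark.

```scala
// Hypothetical helper (not part of Spark): extracts (task index, duration in ms)
// from TaskSetManager "Finished task" log lines.
object TaskDurations {
  // Assumes lines look like: "Finished task 3.0 in stage 0.0 (TID 3) in 6345 ms ..."
  private val Finished =
    """Finished task (\d+)\.\d+ in stage \S+ \(TID \d+\) in (\d+) ms""".r.unanchored

  // Lines that do not match the pattern (e.g. "Starting task") are skipped.
  def parse(lines: Seq[String]): Seq[(Int, Long)] =
    lines.collect { case Finished(task, ms) => (task.toInt, ms.toLong) }

  def main(args: Array[String]): Unit = {
    // Three lines copied from the log excerpt in this post.
    val sample = Seq(
      "INFO TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 6345 ms on 10.60.1.143 (1/34)",
      "INFO TaskSetManager: Finished task 5.0 in stage 0.0 (TID 5) in 2287 ms on 10.60.1.143 (5/34)",
      "INFO TaskSetManager: Finished task 12.0 in stage 0.0 (TID 12) in 1797 ms on 10.60.1.143 (13/34)"
    )
    parse(sample).sortBy(_._1).foreach { case (task, ms) =>
      println(s"task $task: $ms ms")
    }
  }
}
```

Sorting by task index rather than by completion order lines the durations up with the order in which the tasks were launched, which is the comparison being asked about here.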
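I am sure the data is evenly partitioned, and it runs on a single SMP machine, so there is no noise. I wonder whether it could be JVM class loader overhead. – Amos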