2015-10-20 54 views

Failed to run MRkmeans in R on a larger file

I am trying to use MRkmeans in RStudio 0.99.484 with Hadoop-2.3.0 (Windows build). With one input file (755 × 1682 real values, 21 MB) the job completes successfully, but with another file (4832 × 3952 real values, 317 MB) I get errors and the map-reduce job fails; the full MR output and errors are shown below. Would my problem be solved by setting a larger size in rmr.options(backend.parameters)? If so, I need a sample code.

rmr: DEPRECATED: Please use 'rm -r' instead. 
rmr: `/Users/SETUPC~1/AppData/Local/Temp/RtmpQ9MVgC/file10f06a465c65': No  such file or directory 
rmr: DEPRECATED: Please use 'rm -r' instead. 
rmr: `/Users/SETUPC~1/AppData/Local/Temp/RtmpQ9MVgC/file10f0634072aa': No such file or directory 
15/10/19 21:49:56 WARN zlib.ZlibFactory: Failed to load/initialize native-zlib library 
15/10/19 21:49:56 INFO compress.CodecPool: Got brand-new compressor [.deflate] 
packageJobJar: [/C:/tmp/hadoop-Koohi/hadoop-unjar740024213403447693/] []  C:\Users\SETUPC~1\AppData\Local\Temp\streamjob2283559356588490466.jar tmpDir=null 
15/10/19 21:54:03 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032 
15/10/19 21:54:03 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032 
15/10/19 21:54:12 INFO mapred.FileInputFormat: Total input paths to process : 1 
15/10/19 21:54:13 INFO mapreduce.JobSubmitter: number of splits:2 
15/10/19 21:54:14 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1445275456322_0003 
15/10/19 21:54:15 INFO impl.YarnClientImpl: Submitted application application_1445275456322_0003 
15/10/19 21:54:15 INFO mapreduce.Job: The url to track the job:  http://Hamidreza:8088/proxy/application_1445275456322_0003/ 
15/10/19 21:54:15 INFO mapreduce.Job: Running job: job_1445275456322_0003 
15/10/19 21:54:34 INFO mapreduce.Job: Job job_1445275456322_0003 running in uber mode : false 
15/10/19 21:54:34 INFO mapreduce.Job: map 0% reduce 0% 
15/10/19 21:55:04 INFO mapreduce.Job: map 1% reduce 0% 
15/10/19 21:56:07 INFO mapreduce.Job: map 9% reduce 0% 
15/10/19 21:56:31 INFO mapreduce.Job: map 10% reduce 0% 
15/10/19 21:56:41 INFO mapreduce.Job: map 11% reduce 0% 
15/10/19 21:56:55 INFO mapreduce.Job: map 19% reduce 0% 
15/10/19 21:56:58 INFO mapreduce.Job: map 20% reduce 0% 
15/10/19 21:57:07 INFO mapreduce.Job: map 21% reduce 0% 
15/10/19 21:57:19 INFO mapreduce.Job: map 26% reduce 0% 
15/10/19 21:57:25 INFO mapreduce.Job: map 27% reduce 0% 
15/10/19 21:57:28 INFO mapreduce.Job: map 31% reduce 0% 
15/10/19 21:57:31 INFO mapreduce.Job: map 39% reduce 0% 
15/10/19 21:57:34 INFO mapreduce.Job: map 46% reduce 0% 
15/10/19 21:57:44 INFO mapreduce.Job: map 47% reduce 0% 
15/10/19 21:57:47 INFO mapreduce.Job: map 50% reduce 0% 
15/10/19 21:57:49 INFO mapreduce.Job: map 66% reduce 0% 
15/10/19 21:57:50 INFO mapreduce.Job: map 67% reduce 0% 
15/10/19 21:57:50 INFO mapreduce.Job: Task Id : attempt_1445275456322_0003_m_000000_0, Status : FAILED 
Container  [pid=container_1445275456322_0003_01_000002,containerID=container_1445275456322_0003_01_000002] is running beyond physical memory limits. Current usage: 1.1 GB of 1 GB physical memory used; 1.3 GB of 2.1 GB virtual memory used. Killing container. 
Dump of the process-tree for container_1445275456322_0003_01_000002 : 
    |- PID CPU_TIME(MILLIS) VMEM(BYTES) WORKING_SET(BYTES) 
    |- 176 15 716800 2641920 
    |- 6680 17515 979025920 955031552 
    |- 5660 0 512000 1769472 
    |- 6288 31 1675264 2793472 
    |- 6976 11296 363868160 241926144 
    |- 2816 0 1736704 2416640 

Container killed on request. Exit code is 137 
Container exited with a non-zero exit code 137 

15/10/19 21:57:51 INFO mapreduce.Job: map 17% reduce 0% 
15/10/19 21:58:12 INFO mapreduce.Job: map 18% reduce 0% 
15/10/19 21:58:13 INFO mapreduce.Job: map 22% reduce 0% 
15/10/19 21:58:50 INFO mapreduce.Job: map 26% reduce 0% 
15/10/19 21:58:55 INFO mapreduce.Job: map 31% reduce 0% 
15/10/19 21:59:10 INFO mapreduce.Job: map 47% reduce 0% 
15/10/19 21:59:11 INFO mapreduce.Job: map 51% reduce 0% 
15/10/19 21:59:13 INFO mapreduce.Job: map 60% reduce 0% 
15/10/19 21:59:17 INFO mapreduce.Job: map 63% reduce 0% 
15/10/19 21:59:28 INFO mapreduce.Job: Task Id :  attempt_1445275456322_0003_m_000000_1, Status : FAILED 
Container [pid=container_1445275456322_0003_01_000004,containerID=container_1445275456322_0003_01_000004] is running beyond physical memory limits. Current usage: 1.2 GB of 1 GB physical memory used; 1.3 GB of 2.1 GB virtual memory used. Killing container. 
Dump of the process-tree for container_1445275456322_0003_01_000004 : 
    |- PID CPU_TIME(MILLIS) VMEM(BYTES) WORKING_SET(BYTES) 
    |- 5420 0 716800 2641920 
    |- 1420 62 1671168 2785280 
    |- 5432 13531 375529472 302137344 
    |- 4016 15 507904 1765376 
    |- 4204 17125 971837440 951898112 
    |- 4208 15 1732608 2404352 

Container killed on request. Exit code is 137 
Container exited with a non-zero exit code 137 

15/10/19 21:59:29 INFO mapreduce.Job: map 30% reduce 0% 
15/10/19 21:59:35 INFO mapreduce.Job: map 33% reduce 0% 
15/10/19 21:59:53 INFO mapreduce.Job: map 34% reduce 0% 
15/10/19 21:59:56 INFO mapreduce.Job: map 50% reduce 0% 
15/10/19 22:00:03 INFO mapreduce.Job: map 72% reduce 0% 
15/10/19 22:00:06 INFO mapreduce.Job: map 83% reduce 0% 
15/10/19 22:00:16 INFO mapreduce.Job: map 100% reduce 0% 
15/10/19 22:00:16 INFO mapreduce.Job: Task Id : attempt_1445275456322_0003_m_000000_2, Status : FAILED 
Container   [pid=container_1445275456322_0003_01_000005,containerID=container_1445275456322_0003_01_000005] is running beyond physical memory limits. Current usage: 1.2 GB of 1 GB physical memory used; 1.3 GB of 2.1 GB virtual memory used. Killing container. 
Dump of the process-tree for container_1445275456322_0003_01_000005 : 
    |- PID CPU_TIME(MILLIS) VMEM(BYTES) WORKING_SET(BYTES) 
    |- 5904 15 1732608 2412544 
    |- 6872 0 712704 2629632 
    |- 4664 14546 971898880 951922688 
    |- 3632 78 1667072 2785280 
    |- 6092 0 512000 1769472 
    |- 6924 13203 371974144 314916864 

Container killed on request. Exit code is 137 
Container exited with a non-zero exit code 137 
15/10/19 22:00:17 INFO mapreduce.Job: map 50% reduce 0% 
15/10/19 22:00:20 INFO mapreduce.Job: map 50% reduce 17% 
15/10/19 22:00:27 INFO mapreduce.Job: map 76% reduce 17% 
15/10/19 22:00:30 INFO mapreduce.Job: map 83% reduce 17% 
15/10/19 22:00:38 INFO mapreduce.Job: map 100% reduce 17% 
15/10/19 22:00:39 INFO mapreduce.Job: map 100% reduce 100% 
15/10/19 22:00:41 INFO mapreduce.Job: Job job_1445275456322_0003 failed with state FAILED due to: Task failed task_1445275456322_0003_m_000000 
Job failed as tasks failed. failedMaps:1 failedReduces:0 

15/10/19 22:00:45 INFO mapreduce.Job: Counters: 40 
File System Counters 
    FILE: Number of bytes read=0 
    FILE: Number of bytes written=79441152 
    FILE: Number of read operations=0 
    FILE: Number of large read operations=0 
    FILE: Number of write operations=0 
    HDFS: Number of bytes read=63636256 
    HDFS: Number of bytes written=0 
    HDFS: Number of read operations=5 
    HDFS: Number of large read operations=0 
    HDFS: Number of write operations=0 
Job Counters 
    Failed map tasks=4 
    Killed map tasks=1 
    Killed reduce tasks=1 
    Launched map tasks=6 
    Launched reduce tasks=1 
    Other local map tasks=4 
    Data-local map tasks=2 
    Total time spent by all maps in occupied slots (ms)=714657 
    Total time spent by all reduces in occupied slots (ms)=39170 
    Total time spent by all map tasks (ms)=714657 
    Total time spent by all reduce tasks (ms)=39170 
    Total vcore-seconds taken by all map tasks=714657 
    Total vcore-seconds taken by all reduce tasks=39170 
    Total megabyte-seconds taken by all map tasks=731808768 
    Total megabyte-seconds taken by all reduce tasks=40110080 
Map-Reduce Framework 
    Map input records=78 
    Map output records=56 
    Map output bytes=79348969 
    Map output materialized bytes=79349223 
    Input split bytes=93 
    Combine input records=0 
    Spilled Records=56 
    Failed Shuffles=0 
    Merged Map outputs=0 
    GC time elapsed (ms)=4670 
    CPU time spent (ms)=161251 
    Physical memory (bytes) snapshot=373673984 
    Virtual memory (bytes) snapshot=395513856 
    Total committed heap usage (bytes)=306708480 
File Input Format Counters 
    Bytes Read=63636163 
15/10/19 22:00:45 ERROR streaming.StreamJob: Job not Successful! 
Streaming Command Failed! 

Error in mr(map = map, reduce = reduce, combine = combine,  vectorized.reduce, : 
    hadoop streaming failed with error code 1 In addition: Warning message: 
running command '/hadoop-2.3.0/bin/hadoop jar /hadoop-2.3.0/share/hadoop/tools/lib/hadoop-streaming-2.3.0.jar -D "stream.map.input=typedbytes" -D "stream.map.output=typedbytes" -D "stream.reduce.input=typedbytes" -D "stream.reduce.output=typedbytes" -D "mapreduce.map.java.opts=-Xmx400M" -D "mapreduce.reduce.java.opts=-Xmx400M" -files "/Users/SETUPC~1/AppData/Local/Temp/RtmpQ9MVgC/rmr-local-env10f0780c2119,/Users/SETUPC~1/AppData/Local/Temp/RtmpQ9MVgC/rmr-global-env10f03b794070,/Users/SETUPC~1/AppData/Local/Temp/RtmpQ9MVgC/rmr-streaming-map10f06b4f59ee,/Users/SETUPC~1/AppData/Local/Temp/RtmpQ9MVgC/rmr-streaming-reduce10f054f5e9e" -input "/tmp/file10f08e55037" -output "/tmp/file10f03d086dcc" -mapper "Rscript --vanilla ./rmr-streaming-map10f06b4f59ee" -reducer "Rscript --vanilla ./rmr-streaming-reduce10f054f5e9e" -inputformat "org.apache.hadoop.streaming.AutoInputFormat" -outputformat "o [... truncated]

Answer


If you are referring to the file in the package's tests directory, it is not really meant as a large data sample. It is also not clear to me that you should be using k-means when you have roughly as many columns as rows: if you have k centers, D dimensions, and P points, you are fitting kD parameters to P points, and when D and P are about the same size I don't think that is a statistically sound procedure. Even if I am wrong about that, the data is partitioned by rows, so there is no scalability in the number of columns; you would need to look at a different algorithm. It is also unclear what your target data size is, and 300 MB is not really mapreduce-size. This kind of memory problem usually happens because each container allocates all of its memory to the java process, leaving little or nothing for the R process. See help("hadoop.settings").
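As a sketch of the kind of adjustment help("hadoop.settings") describes: the idea is to enlarge the YARN container while keeping the JVM heap small, so the remainder of the container's memory is left for the R process. The property names are the standard Hadoop 2.x ones, but the specific values below are illustrative assumptions, not recommendations for this cluster.

```r
# Sketch only: raise the container size for map/reduce tasks while capping
# the Java heap, so R has memory to work with inside each container.
# Values (2048 MB containers, 400 MB heap) are assumptions; tune to your node.
library(rmr2)

rmr.options(backend.parameters = list(
  hadoop = list(
    D = "mapreduce.map.memory.mb=2048",       # YARN container size, map tasks
    D = "mapreduce.map.java.opts=-Xmx400M",   # keep the JVM heap small
    D = "mapreduce.reduce.memory.mb=2048",    # YARN container size, reduce tasks
    D = "mapreduce.reduce.java.opts=-Xmx400M"
  )
))
```

The same list can also be passed per-job via the backend.parameters argument of mapreduce(). Whether this is enough depends on how much physical memory the single node actually has to give each container.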


Thank you, dear Piccolbo. By default my single-node hadoop allocates 400MB to each container. I tried to use a larger size, but I am new to this field and don't know how to change "rmr.options(backend.parameters)" so that a larger size is allocated to the containers. – Hamidreza


That's why I wrote help("hadoop.settings"). You are not the only one with this problem. – piccolbo