What is AWSRequestMetricsFullSupport and how do I turn it off?

I'm trying to save some data from a Spark DataFrame to an S3 bucket. It's very simple:
dataframe.saveAsParquetFile("s3://kirk/my_file.parquet")
The data is saved successfully, but the UI stays busy for a very long time, and I get thousands of lines like these:
2015-09-04 20:48:19,591 INFO [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[200], ServiceName=[Amazon S3], AWSRequestID=[5C3211750F4FF5AB], ServiceEndpoint=[https://kirk.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[63.827], HttpRequestTime=[62.919], HttpClientReceiveResponseTime=[61.678], RequestSigningTime=[0.05], ResponseProcessingTime=[0.812], HttpClientSendRequestTime=[0.038],
2015-09-04 20:48:19,610 INFO [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[204], ServiceName=[Amazon S3], AWSRequestID=[709DA41540539FE0], ServiceEndpoint=[https://kirk.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[18.064], HttpRequestTime=[17.959], HttpClientReceiveResponseTime=[16.703], RequestSigningTime=[0.06], ResponseProcessingTime=[0.003], HttpClientSendRequestTime=[0.046],
2015-09-04 20:48:19,664 INFO [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[204], ServiceName=[Amazon S3], AWSRequestID=[1B1EB812E7982C7A], ServiceEndpoint=[https://kirk.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[54.36], HttpRequestTime=[54.26], HttpClientReceiveResponseTime=[53.006], RequestSigningTime=[0.057], ResponseProcessingTime=[0.002], HttpClientSendRequestTime=[0.034],
2015-09-04 20:48:19,675 INFO [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[404], Exception=[com.amazonaws.services.s3.model.AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found; Request ID: AF6F960F3B2BF3AB), S3 Extended Request ID: CLs9xY8HAxbEAKEJC4LS1SgpqDcnHeaGocAbdsmYKwGttS64oVjFXJOe314vmb9q], ServiceName=[Amazon S3], AWSErrorCode=[404 Not Found], AWSRequestID=[AF6F960F3B2BF3AB], ServiceEndpoint=[https://kirk.s3.amazonaws.com], Exception=1, HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[10.111], HttpRequestTime=[10.009], HttpClientReceiveResponseTime=[8.758], RequestSigningTime=[0.043], HttpClientSendRequestTime=[0.044],
2015-09-04 20:48:19,685 INFO [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[404], Exception=[com.amazonaws.services.s3.model.AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found; Request ID: F2198ACEB4B2CE72), S3 Extended Request ID: J9oWD8ncn6WgfUhHA1yqrBfzFC+N533oD/DK90eiSvQrpGH4OJUc3riG2R4oS1NU], ServiceName=[Amazon S3], AWSErrorCode=[404 Not Found], AWSRequestID=[F2198ACEB4B2CE72], ServiceEndpoint=[https://kirk.s3.amazonaws.com], Exception=1, HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[9.879], HttpRequestTime=[9.776], HttpClientReceiveResponseTime=[8.537], RequestSigningTime=[0.05], HttpClientSendRequestTime=[0.033],
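I can understand that some users might be interested in logging the latency of S3 operations, but is there any way to disable any and all monitoring and logging from AWSRequestMetricsFullSupport?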
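When I check the Spark UI, it tells me the job completed relatively quickly, but the console is flooded with these messages for a long time afterwards.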
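For context, I'm saving a DataFrame with 1m rows and 500 columns. The save takes about 20 seconds, but the latency messages keep appearing in my console for more than 20 minutes.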
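
One thing that might be worth trying is raising the log level for the logger that emits these lines. This is only a sketch and rests on two assumptions: that the cluster uses log4j 1.x configured through a `log4j.properties` file (as Spark 1.x does by default), and that the full logger name is `com.amazonaws.latency` (the log output above shows it abbreviated as `amazonaws.latency`). Since the lines are emitted at INFO, a threshold of WARN or higher should drop them:

```
# Assumption: log4j 1.x properties file picked up by the Spark driver
# (for example conf/log4j.properties).
# Assumption: "com.amazonaws.latency" is the full name of the logger
# that appears as "amazonaws.latency" in the output above.
log4j.logger.com.amazonaws.latency=WARN
```

This only hides the messages. It would not by itself show whether the S3 requests behind them are still running after the job reports completion.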