2016-12-12

Why does DataFrameWriter.parquet() write out files/partitions with no data in them?

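I am trying to migrate from Spark 1.6.1 to Spark 2.0.0. My question may not be entirely about the Spark version; it may also be related to the compression format. I am reading parquet files that were written by Spark 1.6.1 with gzip compression. I need to add an extra literal column and save the result back to disk. This step now runs on Spark 2.0.0. I noticed that the output contains a large number of small files that hold only metadata. Previously, when I loaded these parquet files, the number of partitions (df.rdd.partitions.size) matched the number of parquet splits on disk. I later realized this was because the gzip format is not splittable. However, I do not understand why Spark writes the empty partitions back to disk. Spark 2.0 compresses parquet files with snappy by default.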

What I know so far

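  • gzip (the 1.6.1 default) is a non-splittable format, so it yields as many in-memory partitions as there are splits on disk.
  • Using snappy lets Spark load the data with the available executors, which explains the dynamic nature of the partition count.
  • sc.hadoopConfiguration.set("parquet.enable.summary-metadata", "true")
  • I have also turned off schema merging.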
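
The settings listed above can be collected in one place. This is a minimal sketch for a Spark 2.0 spark-shell session, where `sc` and `spark` are predefined. The `parquet.enable.summary-metadata` key is the one from the list; `spark.sql.parquet.mergeSchema` and `spark.sql.parquet.compression.codec` are the standard Spark SQL keys that I am assuming correspond to the other points. Pinning the codec to gzip is optional and shown only for comparison with the 1.6.1 output.

```scala
// Run inside spark-shell (Spark 2.0), where `sc` and `spark` already exist.

// Ask the Parquet writer to produce the _metadata / _common_metadata summary files.
sc.hadoopConfiguration.set("parquet.enable.summary-metadata", "true")

// Turn off schema merging when reading Parquet
// (assumed to be the setting meant by "schema merging" above).
spark.conf.set("spark.sql.parquet.mergeSchema", "false")

// Optional: keep gzip instead of the Spark 2.0 default (snappy).
spark.conf.set("spark.sql.parquet.compression.codec", "gzip")

val df = spark.read.parquet("/user/srikary/data/2016/07/05")
```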

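Questions

  • Why does Spark write empty files to disk?
  • How can I make Spark optimize the partition size to 128MB per file? (I know this can be achieved with repartition/coalesce. What I need is a way to compute the argument for repartition.)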
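
For the repartition argument, a rough starting point is the total size divided by the target file size, rounded up. The sketch below is plain Scala with no Spark dependency; `partitionsFor` and its parameter names are hypothetical, and the byte count is an approximation based on the gzip listing in this question (18 part files of about 34.5 MB each). The size on disk is the compressed size, so the result is only an estimate once the codec changes.

```scala
object RepartitionEstimate {
  // Target size per output file: 128 MB.
  val TargetFileBytes: Long = 128L * 1024 * 1024

  /** Number of partitions needed so that each holds at most `targetBytes`,
    * computed as a ceiling division; never returns less than 1. */
  def partitionsFor(totalBytes: Long, targetBytes: Long = TargetFileBytes): Int = {
    require(totalBytes >= 0, "totalBytes must be non-negative")
    require(targetBytes > 0, "targetBytes must be positive")
    val n = (totalBytes + targetBytes - 1) / targetBytes
    math.max(1L, n).toInt
  }
}

// Approximate input size: 18 gzip part files of about 34,576,052 bytes each.
val approxInputBytes = 18L * 34576052L
val numPartitions = RepartitionEstimate.partitionsFor(approxInputBytes)
println(numPartitions)
```

With these numbers the estimate is 5. The value could then be passed to `df.repartition(numPartitions)` before writing.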

Commands and listings of the two directories used to verify my claims

HADOOP_CONF_DIR=/etc/hive/conf /home/srikar/spark-2.0.0/bin/spark-shell --master yarn --deploy-mode client --driver-class-path '/etc/hive/conf' --num-executors 100 --executor-memory 6g --driver-memory 8g 

Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_111) 
Type in expressions to have them evaluated. 
Type :help for more information. 

scala> val df = spark.read.parquet("/user/srikary/data/2016/07/05") 
df: org.apache.spark.sql.DataFrame = [user_uuid: string, client_uuid: string ... 2 more fields] 

scala> df.rdd.partitions.size 
res1: Int = 95 

scala> df.write.parquet("/user/srikary/test/partitions_test") 
Output of the test above
$ hadoop fs -ls /user/srikary/data/2016/07/05
Found 21 items 
-rw-r--r-- 3 srikar srikar   0 2016-10-24 01:09 /user/srikary/data/2016/07/05/_SUCCESS 
-rw-r--r-- 3 srikar srikar  473 2016-10-24 01:09 /user/srikary/data/2016/07/05/_common_metadata 
-rw-r--r-- 3 srikar srikar  12303 2016-10-24 01:09 /user/srikary/data/2016/07/05/_metadata 
-rw-r--r-- 3 srikar srikar 34576052 2016-10-24 01:08 /user/srikary/data/2016/07/05/part-r-00000-982889cd-a118-46a0-8349-732f8c3fd678.gz.parquet 
-rw-r--r-- 3 srikar srikar 34574386 2016-10-24 01:08 /user/srikary/data/2016/07/05/part-r-00001-982889cd-a118-46a0-8349-732f8c3fd678.gz.parquet 
-rw-r--r-- 3 srikar srikar 34575034 2016-10-24 01:08 /user/srikary/data/2016/07/05/part-r-00002-982889cd-a118-46a0-8349-732f8c3fd678.gz.parquet 
-rw-r--r-- 3 srikar srikar 34588117 2016-10-24 01:08 /user/srikary/data/2016/07/05/part-r-00003-982889cd-a118-46a0-8349-732f8c3fd678.gz.parquet 
-rw-r--r-- 3 srikar srikar 34578050 2016-10-24 01:08 /user/srikary/data/2016/07/05/part-r-00004-982889cd-a118-46a0-8349-732f8c3fd678.gz.parquet 
-rw-r--r-- 3 srikar srikar 34584603 2016-10-24 01:08 /user/srikary/data/2016/07/05/part-r-00005-982889cd-a118-46a0-8349-732f8c3fd678.gz.parquet 
-rw-r--r-- 3 srikar srikar 34595888 2016-10-24 01:09 /user/srikary/data/2016/07/05/part-r-00006-982889cd-a118-46a0-8349-732f8c3fd678.gz.parquet 
-rw-r--r-- 3 srikar srikar 34582493 2016-10-24 01:08 /user/srikary/data/2016/07/05/part-r-00007-982889cd-a118-46a0-8349-732f8c3fd678.gz.parquet 
-rw-r--r-- 3 srikar srikar 34594552 2016-10-24 01:08 /user/srikary/data/2016/07/05/part-r-00008-982889cd-a118-46a0-8349-732f8c3fd678.gz.parquet 
-rw-r--r-- 3 srikar srikar 34584819 2016-10-24 01:08 /user/srikary/data/2016/07/05/part-r-00009-982889cd-a118-46a0-8349-732f8c3fd678.gz.parquet 
-rw-r--r-- 3 srikar srikar 34601397 2016-10-24 01:08 /user/srikary/data/2016/07/05/part-r-00010-982889cd-a118-46a0-8349-732f8c3fd678.gz.parquet 
-rw-r--r-- 3 srikar srikar 34580279 2016-10-24 01:08 /user/srikary/data/2016/07/05/part-r-00011-982889cd-a118-46a0-8349-732f8c3fd678.gz.parquet 
-rw-r--r-- 3 srikar srikar 34651221 2016-10-24 01:08 /user/srikary/data/2016/07/05/part-r-00012-982889cd-a118-46a0-8349-732f8c3fd678.gz.parquet 
-rw-r--r-- 3 srikar srikar 34605249 2016-10-24 01:08 /user/srikary/data/2016/07/05/part-r-00013-982889cd-a118-46a0-8349-732f8c3fd678.gz.parquet 
-rw-r--r-- 3 srikar srikar 34561204 2016-10-24 01:08 /user/srikary/data/2016/07/05/part-r-00014-982889cd-a118-46a0-8349-732f8c3fd678.gz.parquet 
-rw-r--r-- 3 srikar srikar 34603328 2016-10-24 01:08 /user/srikary/data/2016/07/05/part-r-00015-982889cd-a118-46a0-8349-732f8c3fd678.gz.parquet 
-rw-r--r-- 3 srikar srikar 34575536 2016-10-24 01:08 /user/srikary/data/2016/07/05/part-r-00016-982889cd-a118-46a0-8349-732f8c3fd678.gz.parquet 
-rw-r--r-- 3 srikar srikar 34597036 2016-10-24 01:08 /user/srikary/data/2016/07/05/part-r-00017-982889cd-a118-46a0-8349-732f8c3fd678.gz.parquet 

With snappy

$ hadoop fs -ls /user/srikary/test/partitions_test
Found 96 items 
-rw-r--r-- 3 srikary srikary   0 2016-12-12 00:45 /user/srikary/test/partitions_test/_SUCCESS 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00000-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00001-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary 59317007 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00002-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00003-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00004-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00005-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary56 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00006-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00007-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00008-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00009-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary 59322819 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00010-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00011-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00012-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00013-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary 59313102 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00014-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00015-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00016-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00017-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary 59323721 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00018-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00019-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00020-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00021-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary 59316186 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00022-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00023-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00024-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00025-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary 59323141 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00026-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00027-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  467 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00028-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00029-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary 59322078 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00030-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00031-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00032-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00033-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary 59325795 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00034-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00035-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00036-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00037-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary 59329053 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00038-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00039-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  467 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00040-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  467 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00041-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary 59317677 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00042-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  467 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00043-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00044-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00045-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary 59324442 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00046-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00047-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00048-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00049-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary 59325743 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00050-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00051-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00052-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00053-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary 59317381 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00054-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00055-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00056-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00057-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary 59324735 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00058-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  467 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00059-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00060-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00061-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary 59320296 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00062-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00063-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  467 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00064-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  467 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00065-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary 59312148 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00066-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  467 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00067-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00068-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00069-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary 59326905 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00070-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  467 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00071-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00072-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00073-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary 59326284 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00074-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00075-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00076-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  467 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00077-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00078-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  467 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00079-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  467 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00080-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00081-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  467 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00082-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00083-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00084-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00085-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00086-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00087-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  467 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00088-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  467 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00089-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00090-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00091-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00092-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00093-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 
-rw-r--r-- 3 srikary srikary  516 2016-12-12 00:45 /user/srikary/test/partitions_test/part-r-00094-99ba2a41-5c27-4e30-9ebc-4b055013ec66.snappy.parquet 

Answers

Why does Spark write empty files to disk?

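It writes out the contents of each partition, whether or not the partition is empty. This is normal behavior in Spark and related systems.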
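
If the near-empty files are unwanted, the number of partitions can be reduced before writing. The following is a minimal sketch for spark-shell, assuming the same input path as in the question. The column name `source`, its value, the output path and the choice of 18 partitions (the number of input files) are all hypothetical and only there for illustration.

```scala
// Run inside spark-shell (Spark 2.0), where `spark` is the SparkSession.
import org.apache.spark.sql.functions.lit

val df = spark.read.parquet("/user/srikary/data/2016/07/05")

// Add the extra literal column (name and value are placeholders).
val withLiteral = df.withColumn("source", lit("example"))

// repartition(n) shuffles rows into n partitions of roughly equal size,
// so each output file should contain data. coalesce(n) would avoid the
// shuffle but only merges existing partitions, which may stay uneven.
withLiteral
  .repartition(18)
  .write
  .parquet("/user/srikary/test/partitions_repartitioned")
```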

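How can I make Spark optimize the partition size to 128MB per file? (I know this can be achieved with repartition/coalesce, but I need to compute the argument for repartition.)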

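This is a hard question to answer. In general, it is not possible to get an exact value for the number of partitions. Parquet applies different compression techniques to the data, so the resulting file size depends on the content of the data as much as on its raw volume.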
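
One way to approximate it anyway is to calibrate on a pilot write: compare the size of one input file with the size of the file it produced, then scale the total accordingly. The sketch below is plain Scala; `estimateOutputFiles` is a hypothetical helper, and the two sample sizes are taken from the listings in the question (a 34,576,052-byte gzip file and a 59,317,007-byte snappy file), on the assumption that both hold the same rows.

```scala
object OutputFileEstimate {
  /** Estimate how many files of `targetBytes` the output will need, given the
    * total input size and one observed (input, output) size pair. */
  def estimateOutputFiles(totalInputBytes: Long,
                          sampleInputBytes: Long,
                          sampleOutputBytes: Long,
                          targetBytes: Long): Int = {
    require(sampleInputBytes > 0 && targetBytes > 0, "sizes must be positive")
    // Observed growth or shrink factor caused by the codec change.
    val ratio = sampleOutputBytes.toDouble / sampleInputBytes.toDouble
    val estimatedOutputBytes = totalInputBytes * ratio
    math.max(1, math.ceil(estimatedOutputBytes / targetBytes).toInt)
  }
}

val files = OutputFileEstimate.estimateOutputFiles(
  totalInputBytes   = 18L * 34576052L,  // approximate gzip input size
  sampleInputBytes  = 34576052L,        // one gzip part file
  sampleOutputBytes = 59317007L,        // one snappy part file
  targetBytes       = 128L * 1024 * 1024
)
println(files)
```

With these numbers the estimate is 8, whereas dividing the gzip input size by 128 MB directly would give 5. The gap shows why the figure can only be approximate: it shifts with the codec and with the data itself.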
