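Running queries in parallel in Hive

I have been using Hive for a while, but I never thought about this before. I am trying to run the queries inside a `hive -f sql-file` in parallel. Does anyone know how to do this? Thanks.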

2013-02-11 user1653240

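Hive converts HiveQL queries into MapReduce jobs, and MapReduce jobs can run in parallel depending on the size of the cluster and the type of scheduler configured. So Hive queries will automatically run in parallel on the Hadoop cluster.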

2013-02-12 06:22:55

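I want to run the independent queries in a file in parallel. For example: 1) SELECT COUNT(1) FROM t1; 2) SELECT COUNT(1) FROM t2; I want to run 1) and 2) in parallel, so that 1) does not block 2). – user1653240 2013-02-12 19:57:23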

Open two separate Hive shells and execute the HiveQL queries. – 2013-02-13 07:11:41

That is definitely one way. Is there a way to make the queries in a SQL file non-blocking? For example, by specifying some Hive flag... – user1653240 2013-02-14 07:05:13

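Any query in Hive is compiled into MapReduce and run on Hadoop. MapReduce is a parallel processing framework, so each of your Hive queries will run and process data in parallel.

I asked the same question, but in a slightly different way. For more details, see here.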


2013-02-12 09:35:45

@user1653240 To run queries independently at the same time, this is what I do:

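Put the queries into different files, e.g. select count(1) from t1 -> file1.sql, select count(1) from t2 -> file2.sql.
Use nohup and the & operator. Taking file1.sql and file2.sql as an example, run: nohup hive -f file1.sql & nohup hive -f file2.sql, which runs the two queries in parallel.
If you want the whole thing to run in the background, just add an & at the end. For example: (nohup hive -f file1.sql & nohup hive -f file2.sql) &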
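The steps above can be sketched as a small bash function that starts one Hive process per file, waits for all of them, and reports how many failed. This is only a sketch: the `HIVE_CMD` variable, the function name, and the per-file `.log` naming are assumptions made for illustration, not part of the answer.

```shell
#!/usr/bin/env bash
# HIVE_CMD is a hypothetical override hook; it defaults to the real `hive` CLI.
HIVE_CMD="${HIVE_CMD:-hive}"

# Start one Hive process per SQL file, then wait for all of them.
# Returns the number of files whose Hive process exited non-zero.
run_hive_files_in_parallel() {
    local pids=() failed=0 file pid
    for file in "$@"; do
        # One background process per file; its output goes to <file>.log.
        nohup "$HIVE_CMD" -f "$file" > "${file}.log" 2>&1 &
        pids+=("$!")
    done
    for pid in "${pids[@]}"; do
        # `wait PID` returns the exit status of that background process.
        wait "$pid" || failed=$((failed + 1))
    done
    return "$failed"
}

# Usage: run_hive_files_in_parallel file1.sql file2.sql
```

Waiting on each PID keeps the caller from reporting success before both queries have finished, which the bare `&` form does not do.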


2016-01-18 20:54:47 legbird

The Hive query planner should be able to parallelise things in certain cases. You need to set a configuration option though:

hive.exec.parallel

Default Value: false
Added In: Hive 0.5.0

Whether to execute jobs in parallel. Applies to MapReduce jobs that can run in parallel, for example jobs processing different source tables before a join. As of Hive 0.14, also applies to move tasks that can run in parallel, for example moving files to insert targets during multi-insert.

Taken from https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties

If you want to run completely independent queries in parallel, running them as separate jobs from separate files, as the others suggested, is probably the best option.
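To make the `hive.exec.parallel` option concrete, here is a minimal sketch of enabling it for a single run through the Hive CLI's `--hiveconf property=value` option. The same effect can be had by putting `SET hive.exec.parallel=true;` at the top of the SQL file. `HIVE_CMD`, the function name, and `queries.sql` are placeholders chosen for illustration. As the property description says, this parallelises independent jobs within a query plan; it is not expected to make separate statements in one file run side by side.

```shell
#!/usr/bin/env bash
# HIVE_CMD is a hypothetical override hook; it defaults to the real `hive` CLI.
HIVE_CMD="${HIVE_CMD:-hive}"

# Run one SQL file with hive.exec.parallel enabled for this invocation only.
run_with_parallel_stages() {
    "$HIVE_CMD" --hiveconf hive.exec.parallel=true -f "$1"
}

# Usage: run_with_parallel_stages queries.sql
```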


2016-01-19 15:44:01 LiMuBei

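This is what I ended up doing, because I could not find a way to do it from within Hive itself. Just replace the file name and database with your own.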

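# This file should have all the queries separated with semicolon ';'
# Join the file into one line so `cut` can split it on ';'. Quoting the
# variable keeps characters such as '*' from being glob-expanded.
queries=$(tr '\n' ' ' < queries_file.sql)
count=0
while true; do
    ((count++))
    # -s makes cut print nothing when the text contains no ';' at all.
    query=$(printf '%s\n' "${queries}" | cut -s -d';' -f"${count}")
    # Stop at the first empty or whitespace-only field.
    if [ -z "${query//[[:space:]]/}" ]; then
        wait  # let the background queries finish before reporting
        echo "Completed executing $((count - 1)) queries."
        exit
    fi
    echo "${query}"
    hive --database "your_db" -e "${query};" &

    # This is optional. If you want to give some gap, say after every 5
    # concurrent queries, use this. Or remove next 4 lines.
    mod=$((count % 5))
    if [ "${mod}" -eq 0 ]; then
        sleep 30
    fi
done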
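The script above throttles with a fixed `sleep` after every fifth query. A different technique, not the one used in the answer, is to cap the number of queries running at once with bash's `wait -n`, so a new query starts as soon as a slot frees up. The sketch below assumes bash 4.3 or later (for `wait -n`) and that every statement ends with `;` and contains no `;` inside string literals. `HIVE_CMD`, `MAX_JOBS`, and the function name are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# HIVE_CMD and MAX_JOBS are hypothetical knobs for this sketch.
HIVE_CMD="${HIVE_CMD:-hive}"
MAX_JOBS="${MAX_JOBS:-5}"

# Run every ';'-terminated statement in a file, at most MAX_JOBS at a time.
# Arguments: <sql_file> <database>
run_queries_bounded() {
    local sql_file=$1 database=$2 query running=0
    # read -d ';' yields one statement per iteration (without the ';').
    while IFS= read -r -d ';' query; do
        # Skip chunks that contain only whitespace.
        [[ -z "${query//[[:space:]]/}" ]] && continue
        if (( running >= MAX_JOBS )); then
            wait -n                    # block until any one query exits
            running=$((running - 1))
        fi
        # stdin comes from /dev/null so the job cannot consume the SQL file.
        "$HIVE_CMD" --database "$database" -e "${query};" < /dev/null &
        running=$((running + 1))
    done < "$sql_file"
    wait                               # wait for the remaining queries
}

# Usage: run_queries_bounded queries_file.sql your_db
```

Text after the last `;` is ignored, so a final statement without a terminating semicolon would not be run.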


2016-07-28 13:24:26 PratPor
