2017-08-09 22 views

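Executing a template multiple times with BigQuery as sink

For BigQuery batch pipelines, a template can only be executed once, because the BigQuery job ID is set at template creation time. I am using Apache Beam v2.0.0 and cannot execute a template more than once. Can this restriction be avoided by using Beam at HEAD? If so, the first thing I would like to know is: what is Beam at HEAD? And what exact changes does my Apache Beam program need in order to support executing a template multiple times?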

Maven dependencies:

<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-sdks-java-io-jms</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-examples-java</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-examples-java8</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-sdks-common-fn-api</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-sdks-java-build-tools</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-sdks-java-core</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-sdks-java-extensions-google-cloud-platform-core</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-sdks-java-extensions-join-library</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-sdks-java-extensions-protobuf</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-sdks-java-extensions-sorter</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-sdks-java-io-amqp</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-sdks-java-io-jdbc</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-sdks-java-io-kafka</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-sdks-java-io-kinesis</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-sdks-java-io-mongodb</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-sdks-java-io-mqtt</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-sdks-java-io-solr</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-runners-core-construction-java</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-runners-core-java</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-runners-direct-java</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-runners-google-cloud-dataflow-java</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 
<dependency> 
    <groupId>org.apache.beam</groupId> 
    <artifactId>beam-sdks-common-runner-api</artifactId> 
    <version>2.2.0-SNAPSHOT</version> 
</dependency> 

Answer

1

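This is issue BEAM-2058. It should be fixed if you use the latest code from the Beam GitHub repository. You do not need to do anything other than build the new version of Beam and update your pom.xml to use it.

Alternatively, wait for the Beam 2.1.0 release, which is currently being prepared.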
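
As a rough sketch of the pom.xml update described in this answer, the Beam version can be kept in a single Maven property so that every Beam dependency picks up the locally built snapshot. The property name `beam.version` is an assumption made for illustration and does not appear in the original question. Only two of the question's artifacts are shown. The snapshot resolves only after Beam has been built from source and installed into the local Maven repository.

```xml
<!-- Sketch only: the property name beam.version is hypothetical. -->
<properties>
    <!-- Resolved from the local Maven repository after building Beam from source.
         Can be changed to a released version that contains the fix once one is available. -->
    <beam.version>2.2.0-SNAPSHOT</beam.version>
</properties>

<dependencies>
    <!-- BigQuery and other Google Cloud Platform connectors -->
    <dependency>
        <groupId>org.apache.beam</groupId>
        <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
        <version>${beam.version}</version>
    </dependency>
    <!-- Dataflow runner, used when creating and executing templates -->
    <dependency>
        <groupId>org.apache.beam</groupId>
        <artifactId>beam-runners-google-cloud-dataflow-java</artifactId>
        <version>${beam.version}</version>
    </dependency>
</dependencies>
```

Keeping the version in one place makes it harder to end up with a mix of Beam versions on the classpath, which could otherwise leave the old behavior in effect.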

+0

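Thanks for your reply. I updated my pom.xml based on the pom.xml in the GitHub repository, but I still face the same issue. The GitHub repository shows multiple code components under different folders. Could you tell me the exact path of the latest code so that I can use it in my Dataflow program? Also, which snippets (from the GitHub code) do I need to add to my Dataflow program to fix the template execution issue? Sorry, I am new to GitHub. Could you help me with this?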

+0

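You should not modify your code based on what is in GitHub. Instead, clone the GitHub repository and install it with Maven. That makes the 2.2.0-SNAPSHOT version available to your code. Then update your pom.xml to reference the 2.2.0-SNAPSHOT version of Beam that you built, and use it.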

+0

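The [Beam contribution guide](https://beam.apache.org/contribute/contribution-guide/#one-time-setup) also describes how to build Beam from the GitHub repository, which may help with installing the latest version. Once that is done, you do not need to change anything in your project other than switching to the new version you built.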