2017-01-10

ClassCastException when reading an XML file with XmlSource in Google Cloud Dataflow

I am using XmlSource.from to read an XML file stored in a Cloud Storage bucket.

XmlSource<Data> source = XmlSource.<Data>from("gs://<my-url>/TestData.xml") 
     .withRootElement("data") 
     .withRecordElement("record") 
     .withRecordClass(Data.class); 

p.apply(Read.from(source))
    .apply(RemoveDuplicates.<Data>create())
    .apply(ParDo.of(new XMLPipeline.CreateItemQtyMapping()))
    .apply(Combine.<String, Integer>perKey(new SumIntegers()))
    .apply("FormatResults", MapElements.via(
        new SimpleFunction<KV<String, Integer>, String>() {
            @Override
            public String apply(KV<String, Integer> input) {
                return input.getKey() + "," + input.getValue();
            }
        }))
    .apply(TextIO.Write.to("gs://<my-url>.appspot.com/pos-pipeline-output/ItemCounts"));

p.run(); 

But I get this exception:

2017-01-09T14:01:31.107Z: Error: (c88c756cabe0dbec): java.io.IOException: Failed to start reading from source: StaticValueProvider{value=gs://<my-url>/TestData.xml} range [48524, 97048)
at com.google.cloud.dataflow.sdk.runners.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:534) 
at com.google.cloud.dataflow.sdk.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:387) 
at com.google.cloud.dataflow.sdk.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:217) 
at com.google.cloud.dataflow.sdk.util.common.worker.ReadOperation.start(ReadOperation.java:182) 
at com.google.cloud.dataflow.sdk.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:69) 
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.executeWork(DataflowWorker.java:284) 
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.doWork(DataflowWorker.java:220) 
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:170) 
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.doWork(DataflowWorkerHarness.java:192) 
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:172) 
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:159) 
at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
at java.lang.Thread.run(Thread.java:745) 
Caused by: java.lang.ClassCastException: com.sun.xml.internal.stream.XMLInputFactoryImpl cannot be cast to org.codehaus.stax2.XMLInputFactory2 
    at com.google.cloud.dataflow.sdk.io.XmlSource$XMLReader.setUpXMLParser(XmlSource.java:490) 
    at com.google.cloud.dataflow.sdk.io.XmlSource$XMLReader.startReading(XmlSource.java:356) 
    at com.google.cloud.dataflow.sdk.io.FileBasedSource$FileBasedReader.startImpl(FileBasedSource.java:528) 
    at com.google.cloud.dataflow.sdk.io.OffsetBasedSource$OffsetBasedReader.start(OffsetBasedSource.java:281) 
    at com.google.cloud.dataflow.sdk.runners.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:531) 
    ... 14 more 

These are the dependencies in my pom.xml:

<dependencies>
    <dependency>
        <groupId>com.google.cloud.dataflow</groupId>
        <artifactId>google-cloud-dataflow-java-sdk-all</artifactId>
        <version>1.9.0</version>
    </dependency>

    <dependency>
        <groupId>com.google.cloud</groupId>
        <artifactId>google-cloud-storage</artifactId>
        <version>0.7.0</version>
    </dependency>

    <dependency>
        <groupId>org.codehaus.woodstox</groupId>
        <artifactId>stax2-api</artifactId>
        <version>4.0.0</version>
    </dependency>
</dependencies>

I'm not sure what is wrong here. Can anyone give me some pointers?

Thanks,

Abhishek


This looks like it might be a bug. I'll dig into it, but in the meantime you can work around the problem by using SDK 1.8.0.

Answers


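What solved

java.lang.ClassCastException: com.sun.xml.internal.stream.XMLInputFactoryImpl cannot be cast to org.codehaus.stax2.XMLInputFactory2

for me was to depend only on the Woodstox implementation artifact, org.codehaus.woodstox:woodstox-core-asl, instead of stax2-api by itself. stax2-api contains only the StAX2 interfaces, so without a Woodstox implementation on the classpath the JDK's built-in factory is returned and the cast fails.

woodstox-core-asl already has transitive dependencies on StAX and StAX2 (javax.xml.stream:stax-api and org.codehaus.woodstox:stax2-api).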
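To check which StAX implementation is actually picked up on a given classpath, a small diagnostic along these lines may help. It is only a sketch: the class name StaxFactoryCheck and its helper methods are made up for illustration, and it assumes that XmlSource obtains its factory through XMLInputFactory.newInstance(), which is what the stack trace suggests. It uses reflection so that it compiles even when stax2-api is absent.

```java
import javax.xml.stream.XMLInputFactory;

// Hypothetical diagnostic helper, not part of the Dataflow SDK.
public class StaxFactoryCheck {

    // Returns the class name of the StAX factory that the JDK's
    // service lookup resolves on the current classpath.
    public static String describeFactory() {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        return factory.getClass().getName();
    }

    // Returns true if the given object implements the StAX2 factory type.
    // The type is looked up reflectively, so a missing stax2-api jar
    // yields false instead of a compile or link error.
    public static boolean implementsStax2(Object factory) {
        try {
            Class<?> stax2 = Class.forName("org.codehaus.stax2.XMLInputFactory2");
            return stax2.isInstance(factory);
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        System.out.println("StAX factory in use: " + factory.getClass().getName());
        System.out.println("Implements XMLInputFactory2: " + implementsStax2(factory));
    }
}
```

If the first line names the JDK's com.sun.xml.internal.stream.XMLInputFactoryImpl, no Woodstox implementation was found. Note that this reflects the classpath of wherever it runs; the Dataflow workers only see the same result if the same jars are staged with the job.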
