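I need to index 5 different kinds of XML files. They share a similar structure, but each one differs slightly. How can I index these different types of XML using the DIH in Solr?
Example 1: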
<?xml version="1.0"?>
<manifest>
<metadata>
<isbn>9780815341291</isbn>
<title>Essential Cell Biology,Third Edition</title>
<authors>
<author>Alberts;Bruce</author>
<author>Bray;Dennis</author>
</authors>
<categories>
<category>SCABC</category>
<category>SCDEF</category>
</categories>
</metadata>
<resources>
<audioresource>
<uuid>123456789</uuid>
<source>03_Mutations_Origin_Cancer.mp3</source>
<mimetype>audio/mpeg</mimetype>
<title>Part Three - Mutations and the Origin of Cancer</title>
<description>123</description>
<chapters>
<chapter>1</chapter>
</chapters>
</audioresource>
</resources>
</manifest>
Example 2:
<?xml version="1.0"?>
<manifest>
<metadata>
<isbn>9780815341291</isbn>
<title>Essential Cell Biology,Third Edition</title>
<authors>
<author>FN:Alberts;Bruce</author>
<author>FN:Bray;Dennis</author>
</authors>
<categories>
<category>SCABC</category>
<category>SCGHI</category>
</categories>
</metadata>
<resources>
<glossaryresource>
<uuid>123456789</uuid>
<term>A subunit </term>
<definition>The portion of a bacterial exotoxin that interferes with normal host cell function. </definition>
<chapters>
<chapter>10</chapter>
</chapters>
</glossaryresource>
</resources>
</manifest>
My dih-config.xml file is as follows:
<dataConfig>
<dataSource name="fileReader" type="FileDataSource" encoding="UTF-8"/>
<document>
<entity name="dir" rootEntity="false" dataSource="null" processor="FileListEntityProcessor" fileName="^.*\.xml$" recursive="true" baseDir="X:/tmp/npr">
<entity name="audioresource"
rootEntity="true"
dataSource="fileReader"
url="${dir.fileAbsolutePath}"
stream="false"
logTemplate=" processing ${dir.fileAbsolutePath}"
logLevel="debug"
processor="XPathEntityProcessor"
forEach="/manifest/metadata | /manifest/metadata/authors | /manifest/metadata/categories | /manifest/metadata/resources | /manifest/resources/audioresource | /manifest/resources/audioresource/chapters"
transformer="DateFormatTransformer">
<field column="category" xpath="/manifest/metadata/categories/category" />
<field column="author" xpath="/manifest/metadata/authors/author" />
<field column="book_title" xpath="/manifest/metadata/title" />
<field column="isbn" xpath="/manifest/metadata/isbn"/>
<field column="id" xpath="/manifest/resources/audioresource/uuid"/>
<field column="mimetype" xpath="/manifest/resources/audioresource/mimetype" />
<field column="title" xpath="/manifest/resources/audioresource/title"/>
<field column="description" xpath="/manifest/resources/audioresource/description"/>
<field column="chapter" xpath="/manifest/resources/audioresource/chapters/chapter"/>
<field column="source" xpath="/manifest/resources/audioresource/source"/>
</entity>
</entity>
</document>
</dataConfig>
I'm not very familiar with XPath. I can't use wildcards in element names, can I? I tried that and it doesn't work.
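As far as I can tell from the documentation, XPathEntityProcessor only supports a subset of XPath, so the only fallback I can think of is one sub-entity per resource type, sitting next to the existing audioresource entity. Below is a rough, untested sketch of what I imagine for the glossary files. The `term` and `definition` columns are hypothetical names that would need matching fields in my schema, and the shared metadata fields (isbn, book_title, author, category) are left out here for brevity.

```xml
<!-- Untested sketch: a second sub-entity for glossary files, a sibling of "audioresource". -->
<entity name="dir" rootEntity="false" dataSource="null" processor="FileListEntityProcessor" fileName="^.*\.xml$" recursive="true" baseDir="X:/tmp/npr">
  <entity name="glossaryresource"
          rootEntity="true"
          dataSource="fileReader"
          url="${dir.fileAbsolutePath}"
          stream="false"
          processor="XPathEntityProcessor"
          forEach="/manifest/resources/glossaryresource">
    <!-- "term" and "definition" are assumed field names, not part of my current schema -->
    <field column="id" xpath="/manifest/resources/glossaryresource/uuid"/>
    <field column="term" xpath="/manifest/resources/glossaryresource/term"/>
    <field column="definition" xpath="/manifest/resources/glossaryresource/definition"/>
    <field column="chapter" xpath="/manifest/resources/glossaryresource/chapters/chapter"/>
  </entity>
</entity>
```

That would mean five nearly identical entities, which is what I was hoping to avoid with a wildcard.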
Thanks very much in advance.