2010-09-20 73 views

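I need to index 5 different kinds of XML files. They share a similar structure, but each has slight differences. How can I index the different types of XML using the DIH (DataImportHandler) in Solr?

Example 1: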

<?xml version="1.0"?> 
<manifest> 
    <metadata> 
       <isbn>9780815341291</isbn> 
       <title>Essential Cell Biology,Third Edition</title> 
       <authors> 
         <author>Alberts;Bruce</author> 
         <author>Bray;Dennis</author> 
       </authors> 
       <categories> 
         <category>SCABC</category> 
         <category>SCDEF</category> 
       </categories> 
    </metadata> 
    <resources> 
       <audioresource> 
         <uuid>123456789</uuid> 
         <source>03_Mutations_Origin_Cancer.mp3</source> 
         <mimetype>audio/mpeg</mimetype> 
         <title>Part Three - Mutations and the Origin of Cancer</title> 
         <description>123</description> 
         <chapters> 
           <chapter>1</chapter> 
         </chapters> 
       </audioresource> 
    </resources> 
</manifest> 

Example 2:

<?xml version="1.0"?> 
<manifest> 
     <metadata> 
       <isbn>9780815341291</isbn> 
       <title>Essential Cell Biology,Third Edition</title> 
       <authors> 
         <author>FN:Alberts;Bruce</author> 
         <author>FN:Bray;Dennis</author> 
       </authors> 
       <categories> 
         <category>SCABC</category> 
         <category>SCGHI</category> 
       </categories> 
     </metadata> 

     <resources> 
       <glossaryresource> 
         <uuid>123456789</uuid> 
         <term>A subunit </term> 
         <definition>The portion of a bacterial exotoxin that interferes with normal host cell function. </definition> 
         <chapters> 
           <chapter>10</chapter> 
         </chapters> 
       </glossaryresource> 
     </resources> 
</manifest> 

My dih-config.xml file is as follows:

<dataConfig> 
     <dataSource name="fileReader" type="FileDataSource" encoding="UTF-8"/> 
     <document> 
       <entity name="dir" rootEntity="false" dataSource="null" processor="FileListEntityProcessor" fileName="^.*\.xml$" recursive="true" baseDir="X:/tmp/npr"> 
         <entity name="audioresource" 
             rootEntity="true" 
             dataSource="fileReader" 
             url="${dir.fileAbsolutePath}" 
             stream="false" 
             logTemplate=" processing ${dir.fileAbsolutePath}" 
             logLevel="debug" 
             processor="XPathEntityProcessor" 
             forEach="/manifest/metadata | /manifest/metadata/authors | /manifest/metadata/categories | /manifest/metadata/resources | /manifest/resources/audioresource | /manifest/resources/audioresource/chapters" 
             transformer="DateFormatTransformer"> 

             <field column="category" xpath="/manifest/metadata/categories/category" /> 
             <field column="author" xpath="/manifest/metadata/authors/author" /> 
             <field column="book_title" xpath="/manifest/metadata/title" /> 
             <field column="isbn" xpath="/manifest/metadata/isbn"/> 
             <field column="id" xpath="/manifest/resources/audioresource/uuid"/> 
             <field column="mimetype" xpath="/manifest/resources/audioresource/mimetype" /> 
             <field column="title" xpath="/manifest/resources/audioresource/title"/> 
             <field column="description" xpath="/manifest/resources/audioresource/description"/> 
             <field column="chapter" xpath="/manifest/resources/audioresource/chapters/chapter"/> 
             <field column="source" xpath="/manifest/resources/audioresource/source"/> 
         </entity> 
       </entity> 
     </document> 
</dataConfig> 

I'm not very familiar with XPath. I can't use wildcards in element names, can I? I tried it and it doesn't work.

Thanks very much in advance.

Answer


I'm currently investigating a similar problem. Have you tried creating an XSLT? The entity element has an optional "xsl" attribute.
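
To make the XSLT idea concrete, here is a minimal sketch of a stylesheet that renames every child of <resources> (audioresource, glossaryresource, and so on) to a common <resource> element and keeps the original element name in a type attribute. The file name normalize-manifest.xsl, the <resource> element, and the type attribute are all made up for illustration. I'm also assuming the stylesheet runs as a preprocessor before the field xpaths are evaluated, so please check that against the DIH documentation for your Solr version.

```xml
<?xml version="1.0"?>
<!-- normalize-manifest.xsl (hypothetical file name) -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Identity template: copy every node and attribute unchanged by default -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Rename each direct child of <resources> to <resource> and record
       the original element name in a "type" attribute (both names are
       assumptions for this sketch, not part of the original files) -->
  <xsl:template match="/manifest/resources/*">
    <resource type="{local-name()}">
      <xsl:apply-templates select="@*|node()"/>
    </resource>
  </xsl:template>
</xsl:stylesheet>
```

With something like this referenced from the inner entity (for example xsl="normalize-manifest.xsl", path hypothetical), the forEach and field xpaths could point at /manifest/resources/resource/... instead of one path per resource type, e.g. /manifest/resources/resource/uuid for the id. Fields that only exist in some types (term, definition, mimetype, source) would simply be empty for the others.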
