2013-08-22

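XSLT splitting a large single parent node into smaller child nodes

I recently asked this question, but realized I hadn't explained it very clearly. I have a large .csv file (8000+ rows) of invoices, where each invoice has multiple rows. I parse it into an XML structure like the one below (simplified).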

Input 1 - $XMLInput

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <row>
     <invoiceNumber>1</invoiceNumber>
     <invoiceText>invoice 1-1</invoiceText>
     <position>1</position>
     ...
    </row>
    <row>
     <invoiceNumber>1</invoiceNumber>
     <invoiceText>invoice 1-2</invoiceText>
     <position>2</position>
     ...
    </row>
    <row>
     <invoiceNumber>2</invoiceNumber>
     <invoiceText>invoice 2-1</invoiceText>
     <position>3</position>
     ...
    </row>
    <row>
     <invoiceNumber>2</invoiceNumber>
     <invoiceText>invoice 2-2</invoiceText>
     <position>4</position>
     ...
    </row>
    <row>
     <invoiceNumber>3</invoiceNumber>
     <invoiceText>invoice 3-1</invoiceText>
     <position>5</position>
     ...
    </row>
    <row>
     <invoiceNumber>3</invoiceNumber>
     <invoiceText>invoice 3-2</invoiceText>
     <position>6</position>
     ...
    </row>
</root>

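Input 2 - $maxBatchSize. Description: a constant; break to the next batch when a batch would become larger than this size.

Input 3 - $listOfInvoices. Description: a repeating variable holding the unique invoice numbers in the document. For example: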

<root> 
    <row> 
     <invoiceNumber>1</invoiceNumber> 
    </row> 
    <row> 
     <invoiceNumber>2</invoiceNumber> 
    </row> 
    <row> 
     <invoiceNumber>3</invoiceNumber> 
    </row> 
</root> 

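To improve performance, I need to group these elements by invoiceNumber into batches of no more than x nodes each (x is a variable to be imported). From there I will send each batch to a sub-process instead of processing the whole original document at once. For example, for the XML document above, if the batch size may be no larger than 3, I need the following XML output: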

Output 1 - $XMLOutput

<root>
    <batch>
     <row>
      <invoiceNumber>1</invoiceNumber>
      <invoiceText>invoice 1-1</invoiceText>
      <position>1</position>
      ...
     </row>
     <row>
      <invoiceNumber>1</invoiceNumber>
      <invoiceText>invoice 1-2</invoiceText>
      <position>2</position>
      ...
     </row>
     <row>
      <invoiceNumber>2</invoiceNumber>
      <invoiceText>invoice 2-1</invoiceText>
      <position>3</position>
      ...
     </row>
     <row>
      <invoiceNumber>2</invoiceNumber>
      <invoiceText>invoice 2-2</invoiceText>
      <position>4</position>
      ...
     </row>
    </batch>
    <batch>
     <row>
      <invoiceNumber>3</invoiceNumber>
      <invoiceText>invoice 3-1</invoiceText>
      <position>5</position>
      ...
     </row>
     <row>
      <invoiceNumber>3</invoiceNumber>
      <invoiceText>invoice 3-2</invoiceText>
      <position>6</position>
      ...
     </row>
    </batch>
</root>

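It is a requirement that all rows of an invoice are sent in the same batch. My initial attempt (XSLT 2.0) is below. I tried to simulate a while loop by recursively calling a template that appends invoice groups to the current node. When the maximum batch size is reached, I recursively call the batch template to create a new batch. I pass the invoice and batch counters along with each recursive call.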

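Edit: Thanks to Ken's help I'm getting closer. I do need to split the invoices by row count each time, not strictly by the number of invoices. In theory, if the following worked, I still don't know how to make sure an invoice number isn't already present in the preceding sibling nodes.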

<?xml version="1.0" encoding="UTF-8"?> 
<xsl:stylesheet version="2.0" xmlns:bpws="http://schemas.xmlsoap.org/ws/2003/03/business-process/" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fn="http://www.w3.org/2005/xpath-functions" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> 
<xsl:variable name="batch-size" select="40" as="xs:integer"/> 
<xsl:variable name="input" select="bpws:getVariableData('sortedInvoicesByBU')"/> 
<xsl:key name="invoice-lines-by-invoice-number" match="row" use="invoiceNumber4z"/> 

<xsl:template match="/"> 
    <xsl:element name="batches"> 
     <!--establish batches from possible non-contiguous invoice numbers--> 
     <xsl:for-each-group select="$input/*:UPSData/*:row" group-by="(position() - 1) idiv $batch-size"> 
      <xsl:for-each select="distinct-values($input/*:UPSData/*:row/*:invoiceNumber4z)[not(.=preceding-sibling::item)]"> 
       <xsl:element name="UPSData"> 
        <xsl:for-each select="current()"> 
         <xsl:for-each select="key('invoice-lines-by-invoice-number',.,$input)"> 
          <!--copy rows as they are--> 
          <xsl:copy-of select="."/> 
         </xsl:for-each> 
        </xsl:for-each> 
       </xsl:element> 
      </xsl:for-each> 
     </xsl:for-each-group> 
    </xsl:element> 
</xsl:template> 
</xsl:stylesheet> 
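The check asked about in the edit, keeping each invoice number only at its first occurrence, is an order-preserving de-duplication. Counting rows per invoice is the other input that batching by row count needs. A minimal Python sketch, assuming the simplified element names from Input 1. SAMPLE, unique_invoice_numbers and lines_per_invoice are hypothetical names used only for illustration:

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical trimmed sample mirroring Input 1 (only invoiceNumber kept).
SAMPLE = """<root>
  <row><invoiceNumber>1</invoiceNumber></row>
  <row><invoiceNumber>1</invoiceNumber></row>
  <row><invoiceNumber>2</invoiceNumber></row>
  <row><invoiceNumber>2</invoiceNumber></row>
  <row><invoiceNumber>3</invoiceNumber></row>
  <row><invoiceNumber>3</invoiceNumber></row>
</root>"""


def unique_invoice_numbers(root):
    """Invoice numbers in first-occurrence order, each listed once."""
    seen = set()
    ordered = []
    for row in root.findall("row"):
        number = row.findtext("invoiceNumber")
        if number not in seen:  # skip numbers already seen in an earlier row
            seen.add(number)
            ordered.append(number)
    return ordered


def lines_per_invoice(root):
    """Number of rows belonging to each invoice number."""
    return Counter(row.findtext("invoiceNumber") for row in root.findall("row"))


root = ET.fromstring(SAMPLE)
print(unique_invoice_numbers(root))  # ['1', '2', '3']
print(lines_per_invoice(root)["2"])  # 2
```

In XSLT 2.0, distinct-values() already removes duplicates, so no preceding-sibling test is needed. Note, however, that XPath 2.0 leaves the order of its result implementation-dependent. The explicit loop above makes document order a deliberate choice.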

Answers


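I tell my students that one can torture a stylesheet as much as necessary to finally get it working, but that doesn't make it maintainable, or even mean it does things the right way. I hope you'll accept the analysis that you are treating XSLT as an imperative programming language. That doesn't do it justice, and it only makes the things you would do in C or Java more difficult, verbose and awkward.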

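But if you use XSLT the way it was designed, it is easier than an imperative language. Because everything is based on XML, you can start from a picture of the result you want. Because the stylesheet is shorter, it is easier to maintain. Once you understand the declarative instructions being used, you don't have to untangle an imperative algorithm. An XSLT processor can optimize a declarative approach, but if it must follow an imperative approach as written, with no chance to optimize it, it is obliged to work slowly.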

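The solution below produces exactly your Output1 result. Note how I determine the unique invoice numbers and then efficiently filter them against the valid ones. Then I batch them according to the batch size, which is a parameter. There are no called templates and no counters of any kind: it is a solution that uses only XSLT 2.0's built-in facilities.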

Excluding the declarations of global parameters and variables, and the comments, it is only five elements long: <root>, <xsl:for-each-group>, <batch>, <xsl:for-each> and <xsl:copy-of>.

As for why yours doesn't work, I don't know. The approach you took doesn't "feel" like XSLT. It feels like an XSLT rendering of some procedural, imperative approach.

t:\ftemp>type numbers.xml 
<root> 
    <row> 
     <invoiceNumber>1</invoiceNumber> 
    </row> 
    <row> 
     <invoiceNumber>2</invoiceNumber> 
    </row> 
    <row> 
     <invoiceNumber>3</invoiceNumber> 
    </row> 
</root> 

t:\ftemp>type invoices.xml 
<?xml version="1.0" encoding="UTF-8"?> 
<root> 
    <row> 
     <invoiceNumber>1</invoiceNumber> 
     <invoiceText>invoice 1-1</invoiceText> 
     <position>1</position> 
     ... 
    </row> 
    <row> 
     <invoiceNumber>1</invoiceNumber> 
     <invoiceText>invoice 1-2</invoiceText> 
     <position>2</position> 
     ... 
    </row> 
    <row> 
     <invoiceNumber>2</invoiceNumber> 
     <invoiceText>invoice 2-1</invoiceText> 
     <position>3</position> 
     ... 
    </row> 
    <row> 
     <invoiceNumber>2</invoiceNumber> 
     <invoiceText>invoice 2-2</invoiceText> 
     <position>4</position> 
     ... 
    </row> 
    <row> 
     <invoiceNumber>3</invoiceNumber> 
     <invoiceText>invoice 3-1</invoiceText> 
     <position>5</position> 
     ... 
    </row> 
    <row> 
     <invoiceNumber>3</invoiceNumber> 
     <invoiceText>invoice 3-2</invoiceText> 
     <position>6</position> 
     ... 
    </row> 
</root> 

t:\ftemp>call xslt2 invoices.xml invoices.xsl 
<?xml version="1.0" encoding="UTF-8"?> 
<root> 
    <batch> 
     <row> 
     <invoiceNumber>1</invoiceNumber> 
     <invoiceText>invoice 1-1</invoiceText> 
     <position>1</position> 
     ... 
    </row> 
     <row> 
     <invoiceNumber>1</invoiceNumber> 
     <invoiceText>invoice 1-2</invoiceText> 
     <position>2</position> 
     ... 
    </row> 
     <row> 
     <invoiceNumber>2</invoiceNumber> 
     <invoiceText>invoice 2-1</invoiceText> 
     <position>3</position> 
     ... 
    </row> 
     <row> 
     <invoiceNumber>2</invoiceNumber> 
     <invoiceText>invoice 2-2</invoiceText> 
     <position>4</position> 
     ... 
    </row> 
    </batch> 
    <batch> 
     <row> 
     <invoiceNumber>3</invoiceNumber> 
     <invoiceText>invoice 3-1</invoiceText> 
     <position>5</position> 
     ... 
    </row> 
     <row> 
     <invoiceNumber>3</invoiceNumber> 
     <invoiceText>invoice 3-2</invoiceText> 
     <position>6</position> 
     ... 
    </row> 
    </batch> 
</root> 

t:\ftemp>type invoices.xsl 
<?xml version="1.0" encoding="US-ASCII"?> 
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
       version="2.0"> 

<xsl:output indent="yes"/> 

<xsl:param name="batch-size" select="2"/> 

<xsl:variable name="valid-numbers" 
       select="doc('numbers.xml')/root/row/invoiceNumber"/> 

<xsl:template match="/"> 
    <xsl:variable name="invoiceLines" select="root/row"/> 
    <root> 
    <!--establish batches from possible non-contiguous invoice numbers--> 
    <xsl:for-each-group group-by="(position() - 1) idiv $batch-size" 
     select="distinct-values($invoiceLines/invoiceNumber)[.=$valid-numbers]"> 
     <!--create a batch using all invoice lines for all numbers in group--> 
     <batch> 
     <xsl:for-each select="$invoiceLines[invoiceNumber=current-group()]"> 
      <!--copy rows as they are--> 
      <xsl:copy-of select="."/> 
     </xsl:for-each> 
     </batch> 
    </xsl:for-each-group> 
    </root> 
</xsl:template> 

</xsl:stylesheet> 
t:\ftemp>rem Done! 
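The grouping step in the stylesheet above can be mirrored in a few lines of Python to check the batching arithmetic. This is an illustrative sketch only. batch_by_invoice_count and the trimmed SAMPLE document are hypothetical, and the sketch assumes that the distinct invoice numbers come back in first-occurrence order:

```python
import xml.etree.ElementTree as ET

# Hypothetical trimmed sample mirroring invoices.xml (invoiceText omitted).
SAMPLE = """<root>
  <row><invoiceNumber>1</invoiceNumber><position>1</position></row>
  <row><invoiceNumber>1</invoiceNumber><position>2</position></row>
  <row><invoiceNumber>2</invoiceNumber><position>3</position></row>
  <row><invoiceNumber>2</invoiceNumber><position>4</position></row>
  <row><invoiceNumber>3</invoiceNumber><position>5</position></row>
  <row><invoiceNumber>3</invoiceNumber><position>6</position></row>
</root>"""


def batch_by_invoice_count(root, valid_numbers, batch_size):
    """Group whole invoices into batches of at most batch_size invoice numbers."""
    rows = root.findall("row")
    # Like distinct-values(...)[. = $valid-numbers], in first-occurrence order.
    numbers = []
    for row in rows:
        n = row.findtext("invoiceNumber")
        if n in valid_numbers and n not in numbers:
            numbers.append(n)
    # Like group-by="(position() - 1) idiv $batch-size" (positions start at 1).
    groups = {}
    for pos, n in enumerate(numbers, start=1):
        groups.setdefault((pos - 1) // batch_size, []).append(n)
    # Like $invoiceLines[invoiceNumber = current-group()]: rows in document order.
    return [[r for r in rows if r.findtext("invoiceNumber") in group]
            for group in groups.values()]


root = ET.fromstring(SAMPLE)
batches = batch_by_invoice_count(root, {"1", "2", "3"}, 2)
print([[r.findtext("position") for r in batch] for batch in batches])
# [['1', '2', '3', '4'], ['5', '6']]
```

With a batch size of 2 invoices, this gives the same two batches as the transcript: invoices 1 and 2 first, then invoice 3.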

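I have edited this answer to add the alternative below. Because you state that you have 8 million input records, I think using a key lookup table will perform better than my simple variable predicate. It produces the same result with one extra XSLT instruction in the template, and it removes a variable that is no longer needed. (The extra instruction could be avoided, but I think this is more readable.)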

<?xml version="1.0" encoding="US-ASCII"?> 
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
       version="2.0"> 

<xsl:output indent="yes"/> 

<xsl:param name="batch-size" select="2"/> 

<xsl:variable name="valid-numbers" 
       select="doc('numbers.xml')/root/row/invoiceNumber"/> 

<xsl:key name="invoice-lines-by-invoice-number" 
     match="row" use="invoiceNumber"/> 

<xsl:variable name="input" select="/"/> 

<xsl:template match="/"> 
    <root> 
    <!--establish batches from possible non-contiguous invoice numbers--> 
    <xsl:for-each-group group-by="(position() - 1) idiv $batch-size" 
     select="distinct-values(root/row/invoiceNumber)[.=$valid-numbers]"> 
     <!--create a batch using all invoice lines for all numbers in group--> 
     <batch> 
     <xsl:for-each select="current-group()"> 
      <xsl:for-each 
        select="key('invoice-lines-by-invoice-number',.,$input)"> 
      <!--copy rows as they are--> 
      <xsl:copy-of select="."/> 
      </xsl:for-each> 
     </xsl:for-each> 
     </batch> 
    </xsl:for-each-group> 
    </root> 
</xsl:template> 

</xsl:stylesheet> 
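The benefit of the key comes from building an invoice-number index once and then looking each invoice up directly, instead of testing every row against every number in the group. Below is a Python sketch of the same idea, with a dictionary standing in for xsl:key. index_rows_by_invoice, batch_with_index and SAMPLE are hypothetical names. This is an analogy, not a description of how any particular XSLT processor implements keys:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical trimmed sample mirroring invoices.xml (invoiceText omitted).
SAMPLE = """<root>
  <row><invoiceNumber>1</invoiceNumber><position>1</position></row>
  <row><invoiceNumber>1</invoiceNumber><position>2</position></row>
  <row><invoiceNumber>2</invoiceNumber><position>3</position></row>
  <row><invoiceNumber>2</invoiceNumber><position>4</position></row>
  <row><invoiceNumber>3</invoiceNumber><position>5</position></row>
  <row><invoiceNumber>3</invoiceNumber><position>6</position></row>
</root>"""


def index_rows_by_invoice(root):
    """Build the lookup once, like <xsl:key match="row" use="invoiceNumber"/>."""
    index = defaultdict(list)
    for row in root.findall("row"):
        index[row.findtext("invoiceNumber")].append(row)
    return index


def batch_with_index(root, valid_numbers, batch_size):
    index = index_rows_by_invoice(root)
    # Dict keys keep first-insertion order, so each number appears once.
    numbers = [n for n in index if n in valid_numbers]
    batches = []
    for start in range(0, len(numbers), batch_size):
        group = numbers[start:start + batch_size]
        # One dictionary lookup per invoice instead of a scan over every row.
        batches.append([row for n in group for row in index[n]])
    return batches


root = ET.fromstring(SAMPLE)
batches = batch_with_index(root, {"1", "2", "3"}, 2)
print([[r.findtext("position") for r in batch] for batch in batches])
# [['1', '2', '3', '4'], ['5', '6']]
```

There is one difference to be aware of. Both this sketch and the key-based stylesheet emit rows invoice by invoice, in group order. The predicate version copies rows in document order instead. The two agree whenever all rows of an invoice are contiguous, as they are in the sample.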

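Thank you again. I definitely agree that I was trying to take a procedural approach and force XSLT to fit it; I just need to start learning how to think in a functional language. As for why the procedural approach I tried didn't work, I'd say it's because that isn't how XSLT was designed. – rwolters3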


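That makes two of my questions you've helped with. I will definitely download and study your book, and check whether you have any lectures or courses in my area, or recordings of previous lectures. Also, I made a small typo: I meant to say 8000 (8 thousand) records, not 8 million, and the processing time is much faster. – rwolters3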


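I have updated my StackOverflow profile with the upcoming lecture series; the information is also at http://www.CraneSoftwrights.com/schedule.htm#calendar. At http://www.CraneSoftwrights.com/links/udemy-ptux-online.htm there are 5 hours of free streaming video lectures on XSLT/XPath, and you don't even need to set up a username to watch them. Udemy has streaming versions of the DVDs, which can be purchased via the http://www.CraneSoftwrights.com/training/ptux/ptux-video.htm page. Both include exercises with complete answers. The standalone books have no exercises. –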


Please don't mark this as the answer, because my previous answer answered the original question.

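The code below answers the follow-up question of how to batch by the total number of invoice lines without splitting an invoice across two batches.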

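I couldn't think of a declarative way to do this, so the answer below is an imperative, recursive solution. It is written so that an XSLT processor that implements tail recursion will not consume stack space. I also take advantage of native XSLT features (key tables and sequences) that would be harder to use in other languages.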

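The code is very compact: only one section actually writes out a batch of invoices, rather than several blocks of batch-writing code. I'm pleased with how it turned out.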

I welcome any suggestions for improvement, or alternative solutions that are tighter than this.

t:\ftemp>type numbers.xml 
<root> 
    <row> 
     <invoiceNumber>1</invoiceNumber> 
    </row> 
    <row> 
     <invoiceNumber>2</invoiceNumber> 
    </row> 
    <row> 
     <invoiceNumber>3</invoiceNumber> 
    </row> 
    <row> 
     <invoiceNumber>4</invoiceNumber> 
    </row> 
    <row> 
     <invoiceNumber>5</invoiceNumber> 
    </row> 
</root> 

t:\ftemp>type invoices.xml 
<?xml version="1.0" encoding="UTF-8"?> 
<root> 
    <row> 
     <invoiceNumber>1</invoiceNumber> 
     <invoiceText>invoice 1-1</invoiceText> 
     <position>1</position> 
     ... 
    </row> 
    <row> 
     <invoiceNumber>1</invoiceNumber> 
     <invoiceText>invoice 1-2</invoiceText> 
     <position>2</position> 
     ... 
    </row> 
    <row> 
     <invoiceNumber>2</invoiceNumber> 
     <invoiceText>invoice 2-1</invoiceText> 
     <position>3</position> 
     ... 
    </row> 
    <row> 
     <invoiceNumber>2</invoiceNumber> 
     <invoiceText>invoice 2-2</invoiceText> 
     <position>4</position> 
     ... 
    </row> 
    <row> 
     <invoiceNumber>3</invoiceNumber> 
     <invoiceText>invoice 3-1</invoiceText> 
     <position>5</position> 
     ... 
    </row> 
    <row> 
     <invoiceNumber>3</invoiceNumber> 
     <invoiceText>invoice 3-2</invoiceText> 
     <position>6</position> 
     ... 
    </row> 
    <row> 
     <invoiceNumber>4</invoiceNumber> 
     <invoiceText>invoice 4-1</invoiceText> 
     <position>7</position> 
     ... 
    </row> 
    <row> 
     <invoiceNumber>4</invoiceNumber> 
     <invoiceText>invoice 4-2</invoiceText> 
     <position>8</position> 
     ... 
    </row> 
    <row> 
     <invoiceNumber>4</invoiceNumber> 
     <invoiceText>invoice 4-3</invoiceText> 
     <position>9</position> 
     ... 
    </row> 
    <row> 
     <invoiceNumber>4</invoiceNumber> 
     <invoiceText>invoice 4-4</invoiceText> 
     <position>10</position> 
     ... 
    </row> 
    <row> 
     <invoiceNumber>4</invoiceNumber> 
     <invoiceText>invoice 4-5</invoiceText> 
     <position>11</position> 
     ... 
    </row> 
    <row> 
     <invoiceNumber>4</invoiceNumber> 
     <invoiceText>invoice 4-6</invoiceText> 
     <position>12</position> 
     ... 
    </row> 
    <row> 
     <invoiceNumber>5</invoiceNumber> 
     <invoiceText>invoice 5-1</invoiceText> 
     <position>13</position> 
     ... 
    </row> 
    <row> 
     <invoiceNumber>5</invoiceNumber> 
     <invoiceText>invoice 5-2</invoiceText> 
     <position>14</position> 
     ... 
    </row> 
</root> 

t:\ftemp>call xslt2 invoices.xml invoices.xsl 
<?xml version="1.0" encoding="UTF-8"?> 
<root> 
    <!--Batch max lines: 5--> 
    <batch> 
    <!--invoice numbers: 1 2--> 
    <!--total line count: 4--> 
    <row> 
     <invoiceNumber>1</invoiceNumber> 
     <invoiceText>invoice 1-1</invoiceText> 
     <position>1</position> 
     ... 
    </row> 
     <row> 
     <invoiceNumber>1</invoiceNumber> 
     <invoiceText>invoice 1-2</invoiceText> 
     <position>2</position> 
     ... 
    </row> 
     <row> 
     <invoiceNumber>2</invoiceNumber> 
     <invoiceText>invoice 2-1</invoiceText> 
     <position>3</position> 
     ... 
    </row> 
     <row> 
     <invoiceNumber>2</invoiceNumber> 
     <invoiceText>invoice 2-2</invoiceText> 
     <position>4</position> 
     ... 
    </row> 
    </batch> 
    <batch> 
    <!--invoice numbers: 3--> 
    <!--total line count: 2--> 
    <row> 
     <invoiceNumber>3</invoiceNumber> 
     <invoiceText>invoice 3-1</invoiceText> 
     <position>5</position> 
     ... 
    </row> 
     <row> 
     <invoiceNumber>3</invoiceNumber> 
     <invoiceText>invoice 3-2</invoiceText> 
     <position>6</position> 
     ... 
    </row> 
    </batch> 
    <batch> 
    <!--invoice numbers: 4--> 
    <!--total line count: 6--> 
    <row> 
     <invoiceNumber>4</invoiceNumber> 
     <invoiceText>invoice 4-1</invoiceText> 
     <position>7</position> 
     ... 
    </row> 
     <row> 
     <invoiceNumber>4</invoiceNumber> 
     <invoiceText>invoice 4-2</invoiceText> 
     <position>8</position> 
     ... 
    </row> 
     <row> 
     <invoiceNumber>4</invoiceNumber> 
     <invoiceText>invoice 4-3</invoiceText> 
     <position>9</position> 
     ... 
    </row> 
     <row> 
     <invoiceNumber>4</invoiceNumber> 
     <invoiceText>invoice 4-4</invoiceText> 
     <position>10</position> 
     ... 
    </row> 
     <row> 
     <invoiceNumber>4</invoiceNumber> 
     <invoiceText>invoice 4-5</invoiceText> 
     <position>11</position> 
     ... 
    </row> 
     <row> 
     <invoiceNumber>4</invoiceNumber> 
     <invoiceText>invoice 4-6</invoiceText> 
     <position>12</position> 
     ... 
    </row> 
    </batch> 
    <batch> 
    <!--invoice numbers: 5--> 
    <!--total line count: 2--> 
    <row> 
     <invoiceNumber>5</invoiceNumber> 
     <invoiceText>invoice 5-1</invoiceText> 
     <position>13</position> 
     ... 
    </row> 
     <row> 
     <invoiceNumber>5</invoiceNumber> 
     <invoiceText>invoice 5-2</invoiceText> 
     <position>14</position> 
     ... 
    </row> 
    </batch> 
</root> 

t:\ftemp>type invoices.xsl 
<?xml version="1.0" encoding="US-ASCII"?> 
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
       version="2.0"> 

<xsl:output indent="yes"/> 

<xsl:param name="batch-size" select="5"/> 

<xsl:variable name="valid-numbers" 
       select="doc('numbers.xml')/root/row/invoiceNumber"/> 

<xsl:key name="invoice-lines-by-invoice-number" 
     match="row" use="invoiceNumber"/> 

<xsl:variable name="input" select="/"/> 

<xsl:template match="/"> 
    <root> 
    <xsl:text>&#xa; </xsl:text> 
    <xsl:comment select="'Batch max lines:',$batch-size"/> 
    <xsl:text>&#xa; </xsl:text> 
    <xsl:call-template name="next-batch"> 
     <xsl:with-param name="remaining-numbers" 
     select="distinct-values(root/row/invoiceNumber)[.=$valid-numbers]"/> 
    </xsl:call-template> 
    </root> 
</xsl:template> 

<xsl:template name="next-batch"> 
    <xsl:param name="this-batch-lines" select="0"/> 
    <xsl:param name="this-batch-numbers" select="()"/> 
    <xsl:param name="remaining-numbers" required="yes"/> 
    <xsl:variable name="this-invoice" select="$remaining-numbers[1]"/> 
    <xsl:variable name="this-invoice-lines" 
    select="count(key('invoice-lines-by-invoice-number',$this-invoice,$input))"/> 

    <xsl:choose> 
    <xsl:when test="not($this-invoice) and not($this-batch-lines)"> 
     <!--nothing to clean up and nothing more to do--> 
    </xsl:when> 
    <xsl:when test="not($this-invoice) (:last invoice complete:) or
        ($this-batch-lines + $this-invoice-lines > $batch-size
         and exists($this-batch-numbers))
         (:this invoice exceeds limit of a non-empty batch:)">
     <!--clean up previous unfinished batch--> 
     <batch> 
     <xsl:text>&#xa; </xsl:text> 
     <xsl:comment select="'invoice numbers:',$this-batch-numbers"/> 
     <xsl:text>&#xa; </xsl:text> 
     <xsl:comment select="'total line count:',$this-batch-lines"/> 
     <xsl:text>&#xa; </xsl:text> 
     <xsl:copy-of select="for $num in $this-batch-numbers return 
         key('invoice-lines-by-invoice-number',$num,$input)"/> 
     </batch> 
     <xsl:if test="$this-invoice"> 
     <!--continue with the next batch comprised of this invoice only--> 
     <xsl:call-template name="next-batch"> 
      <xsl:with-param name="this-batch-lines" 
          select="$this-invoice-lines"/> 
      <xsl:with-param name="this-batch-numbers" 
          select="$this-invoice"/> 
      <xsl:with-param name="remaining-numbers" 
          select="$remaining-numbers[position()>1]"/> 
     </xsl:call-template> 
     </xsl:if> 
     <!--the cleaned up batch was the last batch, template recursion ends--> 
    </xsl:when> 
    <xsl:otherwise> 
     <!--a batch limit has not been exceeded; add this invoice to batch--> 
     <xsl:call-template name="next-batch"> 
     <xsl:with-param name="this-batch-lines" 
         select="$this-batch-lines + $this-invoice-lines"/> 
     <xsl:with-param name="this-batch-numbers" 
         select="($this-batch-numbers,$this-invoice)"/> 
     <xsl:with-param name="remaining-numbers" 
          select="$remaining-numbers[position()>1]"/> 
     </xsl:call-template> 
    </xsl:otherwise> 
    </xsl:choose> 
</xsl:template> 

</xsl:stylesheet> 
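The tail-recursive next-batch template passes a running line count and the invoice numbers of the open batch from one call to the next. Writing the same greedy rule as an ordinary loop may make the control flow easier to follow. This is a Python sketch, not the answer's code. batch_by_line_count and its parameters are hypothetical names, and the sketch assumes the per-invoice line counts are already known (the stylesheet obtains them with count(key(...))):

```python
from typing import Dict, List


def batch_by_line_count(numbers: List[str], line_counts: Dict[str, int],
                        max_lines: int) -> List[List[str]]:
    """Greedily pack whole invoices into batches of at most max_lines rows.

    An invoice that alone exceeds max_lines still gets a batch of its own,
    because an invoice is never split.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_lines = 0
    for number in numbers:
        lines = line_counts[number]
        # Close the open batch if adding this invoice would exceed the limit;
        # checking `current` avoids emitting an empty batch.
        if current and current_lines + lines > max_lines:
            batches.append(current)
            current, current_lines = [], 0
        current.append(number)
        current_lines += lines
    if current:  # flush the last, unfinished batch
        batches.append(current)
    return batches


# Line counts taken from the invoices.xml in the transcript above.
counts = {"1": 2, "2": 2, "3": 2, "4": 6, "5": 2}
print(batch_by_line_count(["1", "2", "3", "4", "5"], counts, 5))
# [['1', '2'], ['3'], ['4'], ['5']]
```

With a 5-line limit, this reproduces the four batches in the transcript. Invoice 4 has 6 lines, so it still gets a batch of its own.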