2012-03-09 37 views
1

美好的一天!XSLT2不區分大小寫的正則表達式

這是我第一次在這裏發佈一個問題,所以請耐心等待。

我有在如何使匹配不敏感的情況下,一個問題,我想補充的標誌,但不知道這是正確的地方插旗子,因爲我仍然代碼似乎不區分大小寫。

預先感謝您分享您的想法,我可以如何解決這個問題。

--elmer

我做編輯這個帖子添加更多的代碼我做了和任何一個源XML文件的樣品要進行測試。索引條目列表上

基地我需要搜索從源XML每一個項目進行搜索,並且必須是不區分大小寫。例如,如果搜索單詞「縮寫」,所有單詞「縮寫詞或縮寫詞」必須在縮寫或縮寫詞旁邊有錨元素。

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xd="http://www.oxygenxml.com/ns/doc/xsl" xmlns:saxon="http://saxon.sf.net/" 
xmlns:ati="http://www.asiatype.com/Functions" exclude-result-prefixes="#all"> 

<xsl:output method="xml" encoding="UTF-8" indent="no"/> 

<xsl:template match="@*|node()"> 
    <xsl:copy> 
     <xsl:apply-templates select="@*|node()"/> 
    </xsl:copy> 
</xsl:template> 

<!-- create variable for EALL_Index--> 
<xsl:variable name="index" as="element() +"> 
    <xsl:for-each select="for $doc in document('EALL_Index_test.xml') return $doc/descendant::item"> 
     <xsl:sort select="string-length(normalize-space(text()[1]))" order="descending"/> 
     <list> 
      <entry> 
       <xsl:variable name="itemString"> 
        <xsl:analyze-string select="text()[1]" regex="(\[)|(\])|(\()|(\))"> 
         <xsl:matching-substring> 
          <xsl:value-of select="replace(regex-group(1),'\[','\\[')"/> 
          <xsl:value-of select="replace(regex-group(2),'\]','\\]')"/> 
          <xsl:value-of select="replace(regex-group(3),'\(','\\(')"/> 
          <xsl:value-of select="replace(regex-group(4),'\)','\\)')"/> 
         </xsl:matching-substring> 
         <xsl:non-matching-substring> 
          <xsl:value-of select="."/> 
         </xsl:non-matching-substring> 
        </xsl:analyze-string> 
       </xsl:variable> 
       <xsl:value-of select="normalize-space($itemString)"/> 
      </entry> 
      <xsl:for-each select="link"> 
       <link target="{@targets}" id="{@n}"> 
       </link> 
      </xsl:for-each> 
     </list> 
    </xsl:for-each> 
</xsl:variable> 

<xsl:template match="text()[ancestor::*[self::simplearticle or self::complexarticle]]"> 
    <xsl:variable name="id" as="xs:string" select="current()/ancestor::*[self::simplearticle or self::complexarticle]/@id"/> 
    <xsl:variable name="srchString" as="xs:string" select="."/> 
    <xsl:choose> 
     <xsl:when test="some $term in $index[link/@target=$id]/entry satisfies matches($srchString, concat('(^|\W)(',$term, '[s]?)($|\W)'))"> 
      <xsl:message select="concat('(^|\W)(',string-join($index[link/@target=$id]/entry, '[s]?|'), ')($|\W)')"/> 
      <xsl:analyze-string select="$srchString" regex="{concat('(^|\W)(',string-join($index[link/@target=$id]/entry, '[s]?|'), ')($|\W)')}"> 
       <xsl:matching-substring> 
        <xsl:value-of select="regex-group(1)"/> 
        <xsl:for-each select="$index[matches(regex-group(2), concat('(^|\W)(',entry, '[s]?)($|\W)'))]/link[@target=$id]"> 
         <xsl:if test="matches($srchString, concat('(^|\W)(',entry, '[s]?)($|\W)'))"> 
          <anchor target="{@target}" id="{@id}"/> 
         </xsl:if> 
        </xsl:for-each> 
        <xsl:value-of select="regex-group(2)"/> 
        <xsl:value-of select="regex-group(3)"/> 
       </xsl:matching-substring> 
      <xsl:non-matching-substring> 
       <xsl:value-of select="."/> 
      </xsl:non-matching-substring> 
     </xsl:analyze-string> 
    </xsl:when> 
    <xsl:otherwise> 
     <xsl:sequence select="."/> 
    </xsl:otherwise> 
    </xsl:choose> 
</xsl:template> 

索引條目源XML:搜索

<div2 id="EALL-index"> 
<head>EALL3 Index</head> 
<art><simplearticle id="SIM-INDEX" entry="" volume="" page=""> 
<pseudoarticle> 
<articleentry><mainentry>Index of Terms</mainentry></articleentry></pseudoarticle><list> 
<item>Abbreviation <link type="eall" targets="SIM-0002" n="idx-abbreviation-01">Abbreviations</link>, <link type="eall" targets="SIM-0021" n="idx-abbreviation-02">Compounds</link>, <link type="eall" targets="COM-vol3-0192" n="idx-abbreviation-03">Lexicography: Monolingual Dictionaries</link>, <link type="eall" targets="COM-vol3-0276" n="idx-abbreviation-04">Punctuation</link></item> 
</list></simplearticle></art></div2> 

源XML:

<?xml version="1.0" encoding="UTF-8"?> 
<art> 
<simplearticle id="SIM-0002" entry="Abbreviations" volume="1" page="1:1a" sortcode="abbreviations"> 
<pseudoarticle> 
<articleentry> 
<mainentry>Abbreviations</mainentry> 
</articleentry> 
<p>Generally speaking, there are four main categories of abbreviations encountered in Arabic texts:</p> 
<p> 
<list> 
<label>i.</label> 
<item>i. Suspensions: abbreviation by truncation of the letters at the end of the word, e.g.&#160;&#1575;&#1604;&#1605;&#1589;&#1600; = <hi rend="italic">al-mu&#7779;annif</hi>. Perhaps the most interesting here is the case of suspensions that look like, or were considered by some to be, numerals. To this category belong the signs that resemble the numerals &#1778; and &#1635;, but which may represent the unpointed <hi rend="italic">t&#257;&#702;</hi> and <hi rend="italic">&#353;&#299;n</hi> (for <hi rend="italic">tam&#257;m</hi> and <hi rend="italic">&#353;ar&#7717;</hi>) when used in conjunction with marginal glosses.</item> 
<label>ii.</label> 
<item>ii. Contractions: abbreviating by means of omitting some letters in the middle of the word, but not the beginning or the ending, e.g. &#1602;&#1607; (<hi rend="italic">qawlu-hu</hi>).</item> 
<label>iii.</label> 
<item>iii. Sigla: using one letter to represent the whole word, e.g. &#1605; (<hi rend="italic">matn</hi>).</item> 
<label>iv.</label> 
<item>iv. Abbreviation symbols: symbols in the form of logographs used for whole words. A typical abbreviation symbol is the horizontal stroke (sometimes hooked at the end) which represents the word <hi rend="italic">sana</hi> &#8216;year&#8217;. Another example is the &#8216;two teeth stroke&#8217; (which looks like two unpointed <hi rend="italic">b&#257;</hi>'s), which represents the word &#1602;&#1601; &#8216;stop&#8217;, or the suspension &#1601;&#1578;&#1600; (for <hi rend="italic">fa-ta&#702;ammal-hu/h&#257;</hi> &#8216;reflect on it&#8217;), used in manuscripts for notabilia or side-heads.</item> 
</list> 
</p> 
<p>Closely connected with these abbreviations is a contraction of a group of words into one &#8216;portmanteau&#8217; word (<hi rend="italic">na&#7717;t</hi>; <xref target="SIM-0021">compounds</xref>), for instance <hi rend="italic">basmala</hi> (<hi rend="italic">bi-sm All&#257;h</hi>) <hi rend="italic">&#7717;amdala</hi> (<hi rend="italic">al-&#7717;amdu li-ll&#257;h</hi>) and <hi rend="italic">&#7779;alwala</hi> (<hi rend="italic">&#7779;all&#257; ll&#257;h &#703;alayhi</hi>). To all intents and purposes, the word <hi rend="italic">na&#7717;t</hi> corresponds to an acronym, i.e. a word formed from the abbreviation of, in most cases, the initial letters of each word in the construct. Most of these constructs are textual and pious formulae. Apart from <hi rend="italic">basmala, &#7717;amdala</hi>, and <hi rend="italic">&#7779;alwala</hi>, we encounter <hi rend="italic">&#7789;albaqa</hi> (<hi rend="italic">&#7789;&#257;la ll&#257;h baq&#257;&#702;a-hu</hi>), <hi rend="italic">&#7717;awqala</hi> or <hi rend="italic">&#7717;awlaqa</hi> (<hi rend="italic">l&#257; &#7717;awla wa-l&#257; quwwata &#702;ill&#257; bi-ll&#257;h</hi>), <hi rend="italic">&#7779;al&#703;ama</hi> (a synonym of <hi rend="italic">&#7779;alwala</hi>), <hi rend="italic">&#7717;asbala</hi> (<hi rend="italic">&#7717;asbun&#257; all&#257;h</hi>), <hi rend="italic">ma&#353;&#702;ala</hi> (<hi rend="italic">m&#257; &#353;&#257;&#702;a ll&#257;h</hi>), <hi rend="italic">sab&#7717;ala</hi> (<hi rend="italic">sub&#7717;&#257;na ll&#257;h</hi>), and <hi rend="italic">&#7717;ay&#703;ala</hi> (<hi rend="italic">&#7717;ayya &#703;al&#257; &#7779;-&#7779;al&#257;t</hi>) (as-Samarr&#257;'&#299; 1987; Gacek 2001).</p> 
<p>Abbreviations, especially contractions and sigla, may be (and often are) accompanied by a horizontal stroke (<hi rend="italic">tilde</hi>) placed above them. This mark may resemble the <hi rend="italic">madda</hi> but has nothing to do with the latter's function in Arabic script. Suspensions, on the other hand, were indicated by a long downward stroke, a mark that is very likely to have been borrowed from Greek and Latin paleographic practice.</p> 
<p>The use of abbreviations was quite popular among Muslim scholars, although originally some of the abbreviations, such as those relating to the prayer for the Prophet (<hi rend="italic">ta&#7779;liya, &#7779;alwala</hi>), were disapproved of. In the manuscript age, abbreviations were extensively used, not only in the body of the text but also in marginalia, ownership statements, and in the primitive critical apparatus (Ben Cheneb 1920; Ma&#7717;f&#363;&#7827; 1964).</p> 
<p>Medieval scholars could not always agree on the meaning of some of the abbreviations used in manuscripts. The letter &#1581;, for instance, which is used to separate one <hi rend="italic">&#702;isn&#257;d</hi> from another, was thought by some to stand for <hi rend="italic">&#7717;&#257;&#702;il</hi> or <hi rend="italic">&#7717;ayl&#363;la</hi> &#8216;separation&#8217; and by others for <hi rend="italic">&#7717;ad&#299;&#7791;</hi> and even <hi rend="italic">&#7779;a&#7717;&#7717;a</hi>. Some scholars even thought that the letter <hi rend="italic">&#7717;&#257;&#702;</hi> should be pointed (&#1582; &#8211; <hi rend="italic">x&#257;&#702; mu&#703;jama</hi>) to stand for <hi rend="italic">&#702;isn&#257;d &#702;&#257;xar</hi> &#8216;another <hi rend="italic">&#702;isn&#257;d</hi>&#8217;. The contemporary scholar may face a similar dilemma (see e.g. Ali&#269; 1976).</p> 
<p>Abbreviations in manuscripts are often unpointed and appear sometimes in the form of word-symbols (logographs). Here, the context, whether textual or geographical, is of great importance. Thus, for instance, what appears to be the letter &#1591; may in fact be a &#1592;, and what appears to be an <hi rend="italic">&#703;ayn</hi> or <hi rend="italic">ayn</hi>, in its initial (&#1593;&#1600;) or isolated form (&#1593;), may actually be an unpointed <hi rend="italic">n&#363;n</hi> or <hi rend="italic">x&#257;&#702;</hi> (for <hi rend="italic">nusxa &#702;uxr&#257;</hi> &#8216;another copy&#8217;). Similarly, the same word or abbreviation can have two different functions and/or meanings. For example, the words <hi rend="italic">&#7717;&#257;&#353;iya</hi> and <hi rend="italic">f&#257;&#702;ida</hi> can stand for a gloss or a side-head (&#8216;nota bene&#8217;), while the &#1589; or &#1589;&#1600; can be an abbreviation of <hi rend="italic">&#7779;a&#7717;&#7717;a</hi> (when used for an omission/ insertion or evident correction) or <hi rend="italic">&#702;a&#7779;l</hi> (the body of the text), or it can stand for <hi rend="italic">&#7693;abba</hi> &#8216;door-bolt&#8217;, a mark indicating an uncertain reading and having, for all intents and purposes, the function of a question mark or &#8216;sic&#8217;. Also, the abbreviation &#1606; may stand for <hi rend="italic">bay&#257;n</hi> &#8216;explanation&#8217; or <hi rend="italic">nusxa &#702;uxr&#257;</hi>; the latter is often found in manuscripts of Persian/ Indian provenance.</p> 
<p>The earliest use of abbreviations in the Arabic language is probably connected with its orthography and possibly the &#8216;mysterious letters&#8217; (<hi rend="italic">al-&#7717;ur&#363;f al-muqa&#7789;&#7789;a&#703;a</hi>) at the beginning of some chapters of the <hi rend="italic">Qur&#702;&#257;n</hi> (Bellamy 1973). In terms of orthography, for instance, the initial form of <hi rend="italic">j&#299;m</hi> (&#1580;&#1600;) or <hi rend="italic">m&#299;m</hi> (&#1605;&#1600;) was regarded by some scholars as an abbreviation of <hi rend="italic">jazma</hi>. Furthermore, the unpointed initial form of <hi rend="italic">&#353;&#299;n</hi> (&#1587;&#1600;) was used for <hi rend="italic">ta&#353;d&#299;d</hi> (or <hi rend="italic">&#353;adda</hi>), and the initial form of <hi rend="italic">&#7779;&#257;d</hi> (&#1589;&#1600;) was thought to represent <hi rend="italic">wa&#7779;la</hi> (or <hi rend="italic">&#7779;ila</hi>) (Wright 1967:13&#8211;14, 19; Gacek 2001:23).</p> 
<p>Most of the abbreviations are found in the body of the text. They were introduced in order to speed up the process of transcription and their usage varied according to the subject or type of a given work. Abbreviations can be found in almost all types of works, but especially in compositions on the recitation of the <hi rend="italic">Qur&#702;&#257;n</hi>, compilation and criticism of <hi rend="italic">&#7716;ad&#299;&#7791;</hi>, philosophy, lexicography, poetry, genealogy, biography, and astronomy. The lists of these are often included in prefaces and frequently concern either the names of authors or titles of compositions. In addition, we find didactic poems that were composed specifically in order to help memorize given sets of abbreviations (see, e.g., &#703;Alaw&#257;n 1972). They are especially common in works on <hi rend="italic">&#7716;ad&#299;&#7791;</hi> and jurisprudence (both Sunni and Shi&#703;i) (al-M&#257;maq&#257;n&#299; 1992; a&#7827;-Zufayr&#299; 2002), and although some abbreviations were standardized, most were specific to a given work. Among the commonly used abbreviations for major <hi rend="italic">&#7716;ad&#299;&#7789;</hi> compilations are: &#1582; (al-Bux&#257;r&#299;), &#1605; (Muslim or M&#257;lik), &#1583; (&#702;Ab&#363; D&#257;&#702;&#363;d), &#1578; (at-Tirmi&#7695;&#299;), &#1603; (M&#257;lik), &#1607; (&#702;Ab&#363; &#7694;arr or Ibn M&#257;ja), &#1606; or &#1587; (an-Nas&#257;&#702;&#299;), and the like (Gacek 1989:56).</p> 
</pseudoarticle> 
</simplearticle> 
</art> 

輸出示例:

<art> 
<simplearticle id="SIM-0002" entry="Abbreviations" volume="1" sortcode="abbreviations" page="1:1a"> 
    <pseudoarticle> 
     <articleentry> 
      <mainentry><anchor target="SIM-0002" id="idx-abbreviation-01"/>Abbreviations</mainentry> 
     </articleentry> 
     <p>Generally speaking, there are four main categories of <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviations encountered in Arabic texts:</p> 
     <p> 
      <list type="simple" TEIform="list"> 
       <label TEIform="label">i.</label> 
       <item TEIform="item">i. Suspensions: <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviation by truncation of the letters at the end of the word, e.g. المصـ = <hi rend="italic">al-muṣannif</hi>. Perhaps the most interesting here is the case of suspensions that look like, or were considered by some to be, numerals. To this category belong the signs that resemble the numerals ۲ and ٣, but which may represent the unpointed <hi rend="italic">tāʾ</hi> and <hi rend="italic">šīn</hi> (for <hi rend="italic">tamām</hi> and <hi rend="italic">šarḥ</hi>) when used in conjunction with marginal glosses.</item> 
       <label TEIform="label">ii.</label> 
       <item TEIform="item">ii. Contractions: abbreviating by means of omitting some letters in the middle of the word, but not the beginning or the ending, e.g. قه (<hi rend="italic">qawlu-hu</hi>).</item> 
       <label TEIform="label">iii.</label> 
       <item TEIform="item">iii. Sigla: using one letter to represent the whole word, e.g. م (<hi rend="italic">matn</hi>).</item> 
       <label TEIform="label">iv.</label> 
       <item TEIform="item">iv. Abbreviation symbols: symbols in the form of logographs used for whole words. A typical <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviation symbol is the horizontal stroke (sometimes hooked at the end) which represents the word <hi rend="italic">sana</hi> ‘year’. Another example is the ‘two teeth stroke’ (which looks like two unpointed <hi rend="italic">bā</hi>'s), which represents the word قف ‘stop’, or the suspension فتـ (for <hi rend="italic">fa-taʾammal-hu/hā</hi> ‘reflect on it’), used in manuscripts for notabilia or side-heads.</item> 
      </list> 
     </p> 
     <p>Closely connected with these <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviations is a contraction of a group of words into one ‘portmanteau’ word (<hi rend="italic">naḥt</hi>; <xref target="SIM-0021">compounds</xref>), for instance <hi rend="italic">basmala</hi> (<hi rend="italic">bi-sm Allāh</hi>) <hi rend="italic">ḥamdala</hi> (<hi rend="italic">al-ḥamdu li-llāh</hi>) and <hi rend="italic">ṣalwala</hi> (<hi rend="italic">ṣallā llāh ʿalayhi</hi>). To all intents and purposes, the word <hi rend="italic">naḥt</hi> corresponds to an <anchor target="SIM-0002" id="idx-acronym-01"/>acronym, i.e. a word formed from the <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviation of, in most cases, the initial letters of each word in the construct. Most of these constructs are textual and pious formulae. Apart from <hi rend="italic">basmala, ḥamdala</hi>, and <hi rend="italic">ṣalwala</hi>, we encounter <hi rend="italic">ṭalbaqa</hi> (<hi rend="italic">ṭāla llāh baqāʾa-hu</hi>), <hi rend="italic">ḥawqala</hi> or <hi rend="italic">ḥawlaqa</hi> (<hi rend="italic">lā ḥawla wa-lā quwwata ʾillā bi-llāh</hi>), <hi rend="italic">ṣalʿama</hi> (a synonym of <hi rend="italic">ṣalwala</hi>), <hi rend="italic">ḥasbala</hi> (<hi rend="italic">ḥasbunā allāh</hi>), <hi rend="italic">mašʾala</hi> (<hi rend="italic">mā šāʾa llāh</hi>), <hi rend="italic">sabḥala</hi> (<hi rend="italic">subḥāna llāh</hi>), and <hi rend="italic">ḥayʿala</hi> (<hi rend="italic">ḥayya ʿalā ṣ-ṣalāt</hi>) (as-Samarrā'ī 1987; Gacek 2001).</p> 
     <p><anchor target="SIM-0002" id="idx-abbreviation-01"/>Abbreviations, especially contractions and sigla, may be (and often are) accompanied by a horizontal stroke (<hi rend="italic">tilde</hi>) placed above them. This mark may resemble the <hi rend="italic">madda</hi> but has nothing to do with the latter's function in Arabic script. Suspensions, on the other hand, were indicated by a long downward stroke, a mark that is very likely to have been borrowed from Greek and Latin paleographic practice.</p> 
     <p>The use of <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviations was quite popular among Muslim scholars, although originally some of the <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviations, such as those relating to the prayer for the Prophet (<hi rend="italic">taṣliya, ṣalwala</hi>), were disapproved of. In the manuscript age, <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviations were extensively used, not only in the body of the text but also in marginalia, ownership statements, and in the primitive critical apparatus (Ben Cheneb 1920; Maḥfūẓ 1964).</p> 
     <p>Medieval scholars could not always agree on the meaning of some of the <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviations used in manuscripts. The letter ح, for instance, which is used to separate one <hi rend="italic">ʾisnād</hi> from another, was thought by some to stand for <hi rend="italic">ḥāʾil</hi> or <hi rend="italic">ḥaylūla</hi> ‘separation’ and by others for <hi rend="italic">ḥadīṯ</hi> and even <hi rend="italic">ṣaḥḥa</hi>. Some scholars even thought that the letter <hi rend="italic">ḥāʾ</hi> should be pointed (خ – <hi rend="italic">xāʾ muʿjama</hi>) to stand for <hi rend="italic">ʾisnād ʾāxar</hi> ‘another <hi rend="italic">ʾisnād</hi>’. The contemporary scholar may face a similar dilemma (see e.g. Alič 1976).</p> 
     <p><anchor target="SIM-0002" id="idx-abbreviation-01"/>Abbreviations in manuscripts are often unpointed and appear sometimes in the form of word-symbols (logographs). Here, the context, whether textual or geographical, is of great importance. Thus, for instance, what appears to be the letter ط may in fact be a ظ, and what appears to be an <hi rend="italic">ʿayn</hi> or <hi rend="italic">ayn</hi>, in its initial (عـ) or isolated form (ع), may actually be an unpointed <hi rend="italic">nūn</hi> or <hi rend="italic">xāʾ</hi> (for <hi rend="italic">nusxa ʾuxrā</hi> ‘another copy’). Similarly, the same word or <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviation can have two different functions and/or meanings. For example, the words <hi rend="italic">ḥāšiya</hi> and <hi rend="italic">fāʾida</hi> can stand for a gloss or a side-head (‘nota bene’), while the ص or صـ can be an <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviation of <hi rend="italic">ṣaḥḥa</hi> (when used for an omission/ insertion or evident correction) or <hi rend="italic">ʾaṣl</hi> (the body of the text), or it can stand for <hi rend="italic">ḍabba</hi> ‘door-bolt’, a mark indicating an uncertain reading and having, for all intents and purposes, the function of a question mark or ‘sic’. Also, the <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviation ن may stand for <hi rend="italic">bayān</hi> ‘explanation’ or <hi rend="italic">nusxa ʾuxrā</hi>; the latter is often found in manuscripts of Persian/ Indian provenance.</p> 
     <p>The earliest use of <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviations in the Arabic language is probably connected with its orthography and possibly the ‘mysterious letters’ (<hi rend="italic">al-ḥurūf al-muqaṭṭaʿa</hi>) at the beginning of some chapters of the <hi rend="italic">Qurʾān</hi> (Bellamy 1973). In terms of orthography, for instance, the initial form of <hi rend="italic">jīm</hi> (جـ) or <hi rend="italic">mīm</hi> (مـ) was regarded by some scholars as an <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviation of <hi rend="italic">jazma</hi>. Furthermore, the unpointed initial form of <hi rend="italic">šīn</hi> (سـ) was used for <hi rend="italic">tašdīd</hi> (or <hi rend="italic">šadda</hi>), and the initial form of <hi rend="italic">ṣād</hi> (صـ) was thought to represent <hi rend="italic">waṣla</hi> (or <hi rend="italic">ṣila</hi>) (Wright 1967:13–14, 19; Gacek 2001:23).</p> 
     <p>Most of the <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviations are found in the body of the text. They were introduced in order to speed up the process of transcription and their usage varied according to the subject or type of a given work. Abbreviations can be found in almost all types of works, but especially in compositions on the recitation of the <hi rend="italic">Qurʾān</hi>, compilation and criticism of <hi rend="italic">Ḥadīṯ</hi>, philosophy, lexicography, poetry, genealogy, biography, and astronomy. The lists of these are often included in prefaces and frequently concern either the names of authors or titles of compositions. In addition, we find didactic poems that were composed specifically in order to help memorize given sets of <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviations (see, e.g., ʿAlawān 1972). They are especially common in works on <hi rend="italic">Ḥadīṯ</hi> and jurisprudence (both Sunni and Shiʿi) (al-Māmaqānī 1992; aẓ-Zufayrī 2002), and although some <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviations were standardized, most were specific to a given work. Among the commonly used <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviations for major <hi rend="italic">Ḥadīṭ</hi> compilations are: خ (al-Buxārī), م (Muslim or Mālik), د (ʾAbū Dāʾūd), ت (at-Tirmiḏī), ك (Mālik), ه (ʾAbū Ḏarr or Ibn Māja), ن or س (an-Nasāʾī), and the like (Gacek 1989:56).</p> 
     <p>Specific to <hi rend="italic">Ḥadīṯ</hi> literature are other <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviations connected with the frequent repetitions of such expressions as <hi rend="italic">ḥaddaṯ anā, ʾaxbaranā</hi>, and <hi rend="italic">ʾanbaʾanā</hi>, which were commonly abbreviated as: دثنا ,ثنا ,نا (<hi rend="italic">ḥaddaṯ anā</hi>); ابنا ,ارنا ,انا (<hi rend="italic">ʾaxbaranā</hi>); ق ثنا ,قثنا (<hi rend="italic">qāla ḥaddaṯ anā</hi>). The transition from one <hi rend="italic">ʾisnād</hi> to another, as mentioned above, was marked with ح (<hi rend="italic">ḥāʾil, taḥwīl, ḥaylula, ḥadīṯ</hi> or <hi rend="italic">ṣaḥḥa</hi>) (Gacek 1989:56), and for the evaluation of <hi rend="italic">ḥadīṯ</hi>s the following <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviations were used:ض (<hi rend="italic">ḍaʿīf</hi>), صح (<hi rend="italic">ṣaḥīḥ</hi>), ح (<hi rend="italic">ḥasan</hi>); م (<hi rend="italic">majhūl</hi>), مو (<hi rend="italic">muwāfiq</hi> or <hi rend="italic">mawqūf</hi>), قف (<hi rend="italic">mawqūf</hi>); ق (<hi rend="italic">muwaṭṭaq</hi> or <hi rend="italic">muttafaq ʿalayhi</hi>), ل (<hi rend="italic">mursal</hi>) (e.g. Gacek 1985:xiv, 96).</p> 
    </pseudoarticle> 
</simplearticle> 

+0

它不能傷害,以顯示你試圖實現(供應輸入和預期產出)。我有這樣的感覺,它可以用更簡單直觀的方式完成。 – Maestro13 2012-03-09 07:08:24

+0

根據您的要求,我用樣本XML和樣本輸出對我的帖子進行了編輯。希望它現在讓我清楚。再次感謝。 – rhemb 2012-03-09 08:08:51

+0

這裏有更多的代碼來說明問題,難道你不能更簡潔地解釋它嗎?在諸如replace()和matches()之類的XPath函數中,有一個「flags」參數,您可以將其設置爲「i」以進行與個案無關的匹配。你不能在正則表達式本身中包含標誌。 – 2012-03-09 08:29:03

回答

0

我的其他答案是不使用替換函數 - 這會導致不完整的結果,因爲「縮寫」的出現不會被替換。 所以我嘗試了另一種方法,包括利用邁克爾建議的「我」標誌;不幸的是,這更復雜,因爲它需要序列化到錨元素的字符串。見下文。
注意:此解決方案也有一個小缺陷:案例不會保留爲像「縮寫」這樣的事件。 此外,略有改善的連載模板將導致< anchor .../>,而不是< anchor ...>< /anchor>

EDITED下面對其中兩個問題都已經解決了一個版本。

<?xml version="1.0" encoding="UTF-8"?> 
<xsl:stylesheet version="2.0" 
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
     xmlns:f="data:f" 
     xmlns:xs="http://www.w3.org/2001/XMLSchema" 
     exclude-result-prefixes="f xs"> 

    <xsl:output method="xml" indent="yes"/> 

    <xsl:variable name="searchString" select="//simplearticle/@sortcode"/> 
    <xsl:variable name="anchor"> 
     <xsl:element name="anchor"> 
      <xsl:attribute name="target" select="//simplearticle/@id"/> 
      <xsl:attribute name="id" select="concat('idx-', $searchString,'-01')"/> 
     </xsl:element> 
    </xsl:variable> 

    <xsl:template match="/"> 
     <xsl:apply-templates mode="main"/> 
    </xsl:template> 

    <xsl:template match="@*|node()" mode="main"> 
     <xsl:copy> 
      <xsl:apply-templates select="@*|node()" mode="main"/> 
     </xsl:copy> 
    </xsl:template> 

    <xsl:template match="text()" mode="main"> 
     <!--<xsl:variable name="replacement" select="concat(f:serialize($anchor), $searchString)"/>--> 
     <xsl:variable name="anchorAdd" select="f:serialize($anchor)"/> 
     <xsl:variable name="replacement"> 
      <xsl:value-of select="$anchorAdd"/> 
      <xsl:value-of select="$searchString"/> 
     </xsl:variable> 
     <xsl:value-of select="replace(., $searchString, $replacement, 'i')" disable-output-escaping="yes"/> 
    </xsl:template> 

    <xsl:template match="*" mode="serialize"> 
     <xsl:text>&lt;</xsl:text> 
     <xsl:value-of select="name()"/> 
     <xsl:apply-templates mode="serialize" select="@*"/> 
     <xsl:text>&gt;</xsl:text> 
     <xsl:apply-templates mode="serialize"/> 
     <xsl:text>&lt;/</xsl:text><xsl:value-of select="name()"/><xsl:text>&gt;</xsl:text> 
    </xsl:template> 

    <xsl:template match="text()" mode="serialize"> 
     <xsl:value-of select="."/> 
    </xsl:template> 

    <xsl:template match="@*" mode="serialize"> 
     <xsl:text> </xsl:text><xsl:value-of select="name()"/>="<xsl:value-of select="."/>" 
    </xsl:template> 

    <xsl:function name="f:serialize"> 
     <xsl:param name="xml"/> 
     <xsl:apply-templates mode="serialize" select="$xml"/> 
    </xsl:function> 
</xsl:stylesheet> 

EDITED

說明:
(1)利用反向引用$ 0爲字符串被替換
(2)仍然不得不使用替換變量雖然這似乎有些奇怪;原因是使用$ anchorAdd時的錯誤消息直接
(3)序列:測試上(文本)節點的非空 - 在這種情況下/>而不是應用模板

<?xml version="1.0" encoding="UTF-8"?> 
<xsl:stylesheet version="2.0" 
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
     xmlns:f="data:f" 
     exclude-result-prefixes="f"> 

    <xsl:output method="xml" indent="yes"/> 

    <xsl:variable name="searchString" select="lower-case(//simplearticle/@sortcode)"/> 
    <xsl:variable name="anchor"> 
     <xsl:element name="anchor"> 
      <xsl:attribute name="target" select="//simplearticle/@id"/> 
      <xsl:attribute name="id" select="concat('idx-', $searchString,'-01')"/> 
     </xsl:element> 
    </xsl:variable> 

    <xsl:template match="/"> 
     <xsl:apply-templates mode="main"/> 
    </xsl:template> 

    <xsl:template match="@*|node()" mode="main"> 
     <xsl:copy> 
      <xsl:apply-templates select="@*|node()" mode="main"/> 
     </xsl:copy> 
    </xsl:template> 

    <xsl:template match="text()" mode="main"> 
     <xsl:variable name="anchorAdd" select="f:serialize($anchor)"/> 
     <xsl:variable name="replacement"> 
      <xsl:value-of select="$anchorAdd"/> 
     </xsl:variable> 
     <xsl:value-of select="replace(., $searchString, concat($replacement, '$0'), 'i')" disable-output-escaping="yes"/> 
    </xsl:template> 

    <xsl:template match="*" mode="serialize"> 
     <xsl:text>&lt;</xsl:text> 
     <xsl:value-of select="name()"/> 
     <xsl:apply-templates mode="serialize" select="@*"/> 
     <xsl:choose> 
      <xsl:when test="not (string(.))"> 
       <xsl:text>/&gt;</xsl:text> 
      </xsl:when> 
      <xsl:otherwise> 
       <xsl:text>&gt;</xsl:text> 
       <xsl:apply-templates mode="serialize"/> 
       <xsl:text>&lt;/</xsl:text><xsl:value-of select="name()"/><xsl:text>&gt;</xsl:text> 
      </xsl:otherwise> 
     </xsl:choose> 
    </xsl:template> 

    <xsl:template match="text()" mode="serialize"> 
     <xsl:value-of select="."/> 
    </xsl:template> 

    <xsl:template match="@*" mode="serialize"> 
     <xsl:text> </xsl:text><xsl:value-of select="name()"/>="<xsl:value-of select="."/>"<xsl:text/> 
    </xsl:template> 

    <xsl:function name="f:serialize"> 
     <xsl:param name="xml"/> 
     <xsl:apply-templates mode="serialize" select="$xml"/> 
    </xsl:function> 
</xsl:stylesheet> 
0

除了詳細信息(比如不同的項目列表和選擇從何處構建錨點以及在循環多個藝術元素時加上更改),下面應該讓您再次參與(不需要正則表達式,只需要使用適當的功能)。我還沒有取代文中縮寫的出現。或者是代表你的錯誤?還應該在「縮寫」旁邊處理「縮寫」嗎?那麼怎麼做呢?比如拿searchString的「s」來處理那個單詞呢?

<?xml version="1.0" encoding="UTF-8"?> 
<xsl:stylesheet version="2.0" 
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
     xmlns:f="data:f" 
     exclude-result-prefixes="f"> 

    <xsl:output method="xml" indent="yes"/> 

    <xsl:variable name="searchString" select="//simplearticle/@sortcode"/> 
    <xsl:variable name="anchor"> 
     <xsl:element name="anchor"> 
      <xsl:attribute name="target" select="//simplearticle/@id"/> 
      <xsl:attribute name="id" select="concat('idx-', $searchString,'-01')"/> 
     </xsl:element> 
    </xsl:variable> 

    <xsl:template match="/"> 
     <xsl:apply-templates/> 
    </xsl:template> 

    <xsl:template match="@*|node()"> 
     <xsl:copy> 
      <xsl:apply-templates select="@*|node()"/> 
     </xsl:copy> 
    </xsl:template> 

    <xsl:template match="text()"> 
     <xsl:call-template name="checkWord"> 
      <xsl:with-param name="splitText" select="tokenize(., ' ')"/> 
     </xsl:call-template> 
    </xsl:template> 

    <xsl:template name="checkWord"> 
     <xsl:param name="splitText"/> 
     <xsl:for-each select="$splitText"> 
      <xsl:if test="lower-case(.) = $searchString"> 
       <xsl:copy-of select="$anchor"/> 
      </xsl:if> 
      <xsl:value-of select="."/><xsl:text> </xsl:text> 
     </xsl:for-each> 
    </xsl:template> 

</xsl:stylesheet> 
+0

謝謝你幫助我解決這個問題...但是,如果我讓你感到困惑... – rhemb 2012-03-09 09:15:33

+0

@ElmerBanate感謝我接受我的答案 - 但只有當你喜歡它時:-) – Maestro13 2012-03-09 09:31:44

0

最後,我能夠解決我的問題,所以我在這裏發佈工作XSLT。

我第一次添加的 'i' 到我的比賽功能:

test="some $term in $index[link/@target=$id]/entry satisfies matches($srchString, concat('(^|\W)(',$term, '[s]?)($|\W)'),'i'); 

的for-each select="$index[matches(regex-group(2), concat('(^|\W)(',entry, '[s]?)($|\W)'),'i')]/link[@target=$id]

if test="matches($srchString, concat('(^|\W)(',entry, '[s]?)($|\W)'),'i') 

then add regex to match the lowercase word(s) by adding lower-case(string-join($index[link/@target=$id]/entry, '[s]?|')) in to the analyze-string. 

analyze-string select="$srchString" regex="{concat('(^|\W)(',lower-case(string-join($index[link/@target=$id]/entry, '[s]?|')), '[s]?|', string-join($index[link/@target=$id]/entry, '[s]?|'), ')($|\W)')}" 

使用此正則表達式我能得到較低的情況下,帶有首字母大寫的「縮寫」和「縮寫」。

(^|\W)(abbreviation[s]?|acronym[s]?|Abbreviation[s]?|acronym)($|\W) 

特別感謝邁克爾·凱Maestro13

<?xml version="1.0" encoding="UTF-8"?> 
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xd="http://www.oxygenxml.com/ns/doc/xsl" xmlns:saxon="http://saxon.sf.net/" 
    xmlns:ati="http://www.asiatype.com/Functions" exclude-result-prefixes="#all"> 

    <xsl:output method="xml" encoding="UTF-8" indent="no"/> 

    <xsl:template match="@*|node()"> 
     <xsl:copy> 
      <xsl:apply-templates select="@*|node()"/> 
     </xsl:copy> 
    </xsl:template> 

    <xsl:variable name="Entries"> 
     <xsl:for-each select="for $doc in collection(iri-to-uri(concat('original-eall-xml/', '?select=*.xml'))) return $doc/descendant::art/*"> 
      <list> 
       <entry> 
        <xsl:value-of select=".//mainentry"/> 
       </entry> 
       <id> 
        <xsl:value-of select="./@id"/> 
       </id> 
       <page> 
        <xsl:value-of select="./@page"/> 
       </page> 
      </list> 
     </xsl:for-each> 
    </xsl:variable> 

    <xsl:template match="art/*"> 
     <xsl:variable name="myId"> 
      <xsl:value-of select="tokenize(@id,' ')[1]"/> 
     </xsl:variable> 
     <xsl:variable name="myStrings" select="//text()"/> 
     <xsl:variable name="myEntry" select="descendant::mainentry/text()"/> 
     <xsl:copy> 
      <xsl:copy-of select="@* except @page"/> 
      <xsl:choose> 
       <xsl:when test="$Entries//entry[.=$myEntry] and $Entries//entry[.=$myEntry]/following-sibling::id[.=$myId]"> 
        <xsl:attribute name="page"><xsl:value-of select="$Entries//entry[.=$myEntry]/following-sibling::page"/></xsl:attribute> 
       </xsl:when> 
       <xsl:otherwise> 
        <xsl:attribute name="page">0</xsl:attribute> 
        <!--<xsl:message select="concat('mainentry: &quot;', $myEntry, '&quot; - volume: &quot;', @volume, '&quot; - id: &quot;',$myId, '&quot;')"/>--> 
       </xsl:otherwise> 
      </xsl:choose> 

      <xsl:apply-templates/> 
     </xsl:copy> 

    </xsl:template>  

    <!-- create variable for EALL_Index--> 
    <xsl:variable name="index" as="element() +"> 
     <xsl:for-each select="for $doc in document('EALL_Index_test.xml') return $doc/descendant::item"> 
      <xsl:sort select="string-length(normalize-space(text()[1]))" order="descending"/> 
      <list> 
       <entry> 
        <xsl:variable name="itemString"> 
         <xsl:analyze-string select="text()[1]" regex="(\[)|(\])|(\()|(\))"> 
          <xsl:matching-substring> 
           <xsl:value-of select="replace(regex-group(1),'\[','\\[')"/> 
           <xsl:value-of select="replace(regex-group(2),'\]','\\]')"/> 
           <xsl:value-of select="replace(regex-group(3),'\(','\\(')"/> 
           <xsl:value-of select="replace(regex-group(4),'\)','\\)')"/> 
          </xsl:matching-substring> 
          <xsl:non-matching-substring> 
           <xsl:value-of select="."/> 
          </xsl:non-matching-substring> 
         </xsl:analyze-string> 
        </xsl:variable> 
        <xsl:value-of select="normalize-space($itemString)"/> 
       </entry> 
       <xsl:for-each select="link"> 
        <link target="{@targets}" id="{@n}"> 
        </link> 
       </xsl:for-each> 
      </list> 
     </xsl:for-each> 
    </xsl:variable> 

    <xsl:template match="text()[ancestor::*[self::simplearticle or self::complexarticle]]"> 
     <xsl:variable name="id" as="xs:string" select="current()/ancestor::*[self::simplearticle or self::complexarticle]/@id"/> 
     <xsl:variable name="srchString" as="xs:string" select="."/> 
     <xsl:choose> 
      <xsl:when test="some $term in $index[link/@target=$id]/entry satisfies matches($srchString, concat('(^|\W)(',$term, '[s]?)($|\W)'),'i')"> 
       <xsl:message select="concat('(^|\W)(',lower-case(string-join($index[link/@target=$id]/entry, '[s]?|')), '[s]?|', string-join($index[link/@target=$id]/entry, '[s]?|'), ')($|\W)')"/> 
       <xsl:analyze-string select="$srchString" regex="{concat('(^|\W)(',lower-case(string-join($index[link/@target=$id]/entry, '[s]?|')), '[s]?|', string-join($index[link/@target=$id]/entry, '[s]?|'), ')($|\W)')}"> 
        <xsl:matching-substring> 
         <xsl:value-of select="regex-group(1)"/> 
         <xsl:for-each select="$index[matches(regex-group(2), concat('(^|\W)(',entry, '[s]?)($|\W)'),'i')]/link[@target=$id]"> 
          <xsl:if test="matches($srchString, concat('(^|\W)(',entry, '[s]?)($|\W)'),'i')"> 
           <anchor target="{@target}" id="{@id}"/> 
          </xsl:if> 
         </xsl:for-each> 
         <xsl:value-of select="regex-group(2)"/> 
         <xsl:value-of select="regex-group(3)"/> 
        </xsl:matching-substring> 
       <xsl:non-matching-substring> 
        <xsl:value-of select="."/> 
       </xsl:non-matching-substring> 
      </xsl:analyze-string> 
     </xsl:when> 
     <xsl:otherwise> 
      <xsl:sequence select="."/> 
     </xsl:otherwise> 
     </xsl:choose> 
    </xsl:template> 
</xsl:stylesheet>