2014-01-07

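Efficient string replacement in PHP

Is there any way to make this code more efficient? I'm not looking for someone to write my code for me, just to point me in the right direction...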

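$string = preg_replace('/<ref[^>]*>([\s\S]*?)<\/ref[^>]*>/', '', $string); // remove <ref>...</ref> blocks
$string = preg_replace('/{{(.*?)\}}/s', '', $string);                      // remove {{templates}}
$string = preg_replace('/File:(.*?)\\n/s', '', $string);                   // remove File: lines
$string = preg_replace('/==(.*?)\=\\n/s', '', $string);                    // remove ==headings==
$string = str_replace('|', '/', $string);                                  // turn | into /
$string = str_replace('[[', '', $string);                                  // drop opening link brackets
$string = str_replace(']]', '', $string);                                  // drop closing link brackets
$string = strip_tags($string);                                             // remove remaining HTML tags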

The catch, however, is that the replacements have to happen in this order...

Sample input text:

===API sharing and reuse via virtual machine=== 
{{Expand section|date=December 2013}} 

Some languages like those running in a [[virtual machine]] (e.g. [[List of CLI languages|.NET CLI compliant languages]] in the [[Common Language Runtime]] (CLR), and [[List of JVM languages|JVM compliant languages]] in the [[Java Virtual Machine]]) can share an API. In this case, a virtual machine enables [[language interoperability]], by abstracting a programming language using an intermediate [[bytecode]] and its [[language binding]]s.==Web APIs== 
{{Main|Web API}} 
When used in the context of [[web development]], an API is typically defined as a set of [[Hypertext Transfer Protocol]] (HTTP) request messages, along with a definition of the structure of response messages, which is usually in an Extensible Markup Language ([[XML]]) or JavaScript Object Notation ([[JSON]]) format. While "web API" historically has been virtually synonymous for [[web service]], the recent trend (so-called [[Web 2.0]]) has been moving away from Simple Object Access Protocol ([[SOAP]]) based web services and [[service-oriented architecture]] (SOA) towards more direct [[representational state transfer]] (REST) style [[web resource]]s and [[resource-oriented architecture]] (ROA).<ref> 
{{cite web 
|first  = Djamal 
|last  = Benslimane 
|coauthors = Schahram Dustdar, and Amit Sheth 
|title  = Services Mashups: The New Generation of Web Applications 
|url   = http://dsonline.computer.org/portal/site/dsonline/menuitem.9ed3d9924aeb0dcd82ccc6716bbe36ec/index.jsp?&pName=dso_level1&path=dsonline/2008/09&file=w5gei.xml&xsl=article.xsl 
|work  = IEEE Internet Computing, vol. 12, no. 5 
|publisher = Institute of Electrical and Electronics Engineers 
|pages  = 13–15
|year  = 2008 
}} 
</ref> Part of this trend is related to the [[Semantic Web]] movement toward [[Resource Description Framework]] (RDF), a concept to promote web-based [[ontology engineering]] technologies. Web APIs allow the combination of multiple APIs into new applications known as [[mashup (web application hybrid)|mashup]]s.<ref> 
{{citation 
|first  = James 
|last  = Niccolai 
|title  = So What Is an Enterprise Mashup, Anyway? 
|url   = http://www.pcworld.com/businesscenter/article/145039/so_what_is_an_enterprise_mashup_anyway.html 
|work  = [[PC World (magazine)|PC World]] 
|date  = 2008-04-23 
}}</ref> 

Sample output (with the current script):

Some languages like those running in a virtual machine (e.g. List of CLI languages/.NET CLI compliant languages in the Common Language Runtime (CLR), and List of JVM languages/JVM compliant languages in the Java Virtual Machine) can share an API. In this case, a virtual machine enables language interoperability, by abstracting a programming language using an intermediate bytecode and its language bindings. 
When used in the context of web development, an API is typically defined as a set of Hypertext Transfer Protocol (HTTP) request messages, along with a definition of the structure of response messages, which is usually in an Extensible Markup Language (XML) or JavaScript Object Notation (JSON) format. While "web API" historically has been virtually synonymous for web service, the recent trend (so-called Web 2.0) has been moving away from Simple Object Access Protocol (SOAP) based web services and service-oriented architecture (SOA) towards more direct representational state transfer (REST) style web resources and resource-oriented architecture (ROA). Part of this trend is related to the Semantic Web movement toward Resource Description Framework (RDF), a concept to promote web-based ontology engineering technologies. Web APIs allow the combination of multiple APIs into new applications known as mashup (web application hybrid)/mashups. 

Maybe a sample of the input data and a sample of the output would help, so that we could suggest a completely different approach...?! – deceze

Sure! Editing the post now... –

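You could put all of these patterns into an array and use a single preg_replace call to simplify the code. The same goes for str_replace. Beyond that, what specific problem are you having that makes you want it to be more "efficient"? What makes you think this isn't already the most "efficient" solution? Have you run any benchmarks? –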
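A minimal sketch of the array approach from the comment, using the question's own patterns unchanged. preg_replace() accepts an array of patterns and applies them one after another in array order, and str_replace() does the same with arrays of search and replace strings, so the order of the original calls is preserved. The function name cleanWithArrays and the sample strings are hypothetical.

```php
<?php
// Hypothetical helper: the question's eight calls collapsed into three.
function cleanWithArrays(string $string): string
{
    // preg_replace() runs each pattern over the whole string, in array order,
    // so the original sequence of replacements is kept.
    $string = preg_replace([
        '/<ref[^>]*>([\s\S]*?)<\/ref[^>]*>/', // <ref>...</ref> blocks
        '/{{(.*?)\}}/s',                      // {{templates}}
        '/File:(.*?)\\n/s',                   // File: lines
        '/==(.*?)\=\\n/s',                    // ==headings==
    ], '', $string);

    // str_replace() with arrays also works in order: | becomes /, then [[ and ]] are dropped.
    $string = str_replace(['|', '[[', ']]'], ['/', '', ''], $string);

    return strip_tags($string);
}

echo cleanWithArrays('[[List of JVM languages|JVM compliant languages]]'), "\n";
```

This only simplifies the code; each pattern still scans the string separately, so any speed difference would need a benchmark to confirm.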

Answers

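Since you are only removing things from your string (i.e., the replacement is always the same, an empty string), you can put everything into a single preg_replace. That way the string is parsed only once.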

You can also optimize the subpatterns by avoiding lazy quantifiers and removing useless capture groups.

For example:

$str = preg_replace('~{{(?>[^}]++|}(?!}))*+}}|\||\[\[|]]~', '', $str); 

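would replace your second line and the three str_replace calls (note, though, that it deletes | instead of converting it to / as your first str_replace does).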
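To check that the optimized template subpattern behaves like the question's lazy one, this sketch runs both over a few made-up strings. Both stop at the first }} after {{, including across newlines, which is why the possessive version needs no s modifier. The sample strings are illustrative only.

```php
<?php
// The question's lazy template pattern vs. the answer's possessive/atomic one.
$lazy       = '/{{(.*?)\}}/s';
$possessive = '~{{(?>[^}]++|}(?!}))*+}}~';

$samples = [
    "a {{x}y}} b {{z}} c",                    // a single } inside a template
    "before {{cite web\n|title = X\n}} after", // template spanning several lines
    "no templates here",
];

foreach ($samples as $s) {
    // Both patterns should remove exactly the same text.
    var_dump(preg_replace($lazy, '', $s) === preg_replace($possessive, '', $s));
}
```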

Details:

~          # pattern delimiter
{{         # literal: {{
(?>        # open an atomic group (no backtracking inside, so the pattern fails faster)
    [^}]++ # one or more characters other than } (possessive: same effect as atomic grouping)
    |      # OR
    }(?!}) # a } not followed by another }
)*+        # repeat the atomic group zero or more times (possessive)
}}         # literal: }}
|          # OR
\|         # literal: |
|          # OR
\[\[       # literal: [[
|          # OR
]]         # literal: ]]
~          # pattern delimiter
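The annotated breakdown above can also be handed to PCRE as-is: with the x (extended) modifier, whitespace and # comments inside the pattern are ignored. A sketch reusing the answer's pattern unchanged; the sample string is made up for illustration.

```php
<?php
// The answer's pattern in extended (x) mode; PCRE skips the whitespace and comments.
$pattern = <<<'REGEX'
~
{{              # literal: {{
(?>             # atomic group: no backtracking into it
    [^}]++      # one or more characters other than }, possessive
  |             # OR
    }(?!})      # a } not followed by another }
)*+             # repeat the group zero or more times, possessive
}}              # literal: }}
| \|            # OR a literal |
| \[\[          # OR a literal [[
| ]]            # OR a literal ]]
~x
REGEX;

$compact = '~{{(?>[^}]++|}(?!}))*+}}|\||\[\[|]]~';
$s = 'See [[List of CLI languages|.NET CLI]] {{Main|Web API}}';

// Both forms remove the brackets, the | and the whole template.
echo preg_replace($pattern, '', $s), "\n";
var_dump(preg_replace($pattern, '', $s) === preg_replace($compact, '', $s));
```

The output also shows the side effect mentioned above: the | disappears instead of becoming /.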

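Now you only need to add your first, third and fourth patterns to this one in the same way. Note that you don't need the s modifier, since the pattern never uses the dot.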
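A sketch of what the combined version could look like, assuming the goal is the same output as the question's code. Two assumptions of mine: the first, third and fourth patterns are rewritten in the answer's style (no lazy quantifiers, no capture groups), and | is left out of the combined pattern because the question turns it into / rather than deleting it, so it gets its own str_replace. The function name cleanWikitext and the sample text are hypothetical.

```php
<?php
// Hypothetical helper: one regex pass for every deletion, then the | -> / swap.
function cleanWikitext(string $str): string
{
    $removals = '~'
        . '<ref[^>]*+>(?:[^<]++|<(?!/ref))*+</ref[^>]*+>' // <ref>...</ref> blocks
        . '|{{(?>[^}]++|}(?!}))*+}}'                      // {{templates}}
        . '|File:[^\n]*+\n'                               // File: lines up to the newline
        . '|==(?:[^=]++|=(?!\n))*+=\n'                    // ==headings== up to the first "=\n"
        . '|\[\[|]]'                                      // link brackets
        . '~';

    // The engine scans left to right, so a template or <ref> block is removed
    // as a whole before any | or [[ inside it is reached.
    $str = preg_replace($removals, '', $str);
    $str = str_replace('|', '/', $str);

    return strip_tags($str);
}

$in = "===Heading===\n{{Expand section|date=December 2013}}\n"
    . "Some [[virtual machine]] text [[List of CLI languages|.NET CLI]]."
    . "<ref>{{cite web|title=X}}</ref> End <b>bold</b>.";

echo cleanWikitext($in), "\n";
```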

About strip_tags:

You could try replacing it with a subpattern as well:

$str = preg_replace('~<[^>]++>~', '', $str); 

But be careful, because your text can contain several traps, for example:

blah blah blah <!-- blah > --> blah blah 
or 
<div theuglyattribute=">"> 

It's possible to avoid all of these problems, but your pattern will become very long.
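To make the traps concrete, this sketch runs the naive pattern and a somewhat longer one over the two examples above. The longer pattern is my own assumption, not part of the answer: it removes an HTML comment as a whole and lets quoted attribute values contain >. It still doesn't cover every case (CDATA sections or script contents, for instance), so treat it as an illustration rather than a replacement for strip_tags.

```php
<?php
// Naive tag pattern from the answer.
$naive = '~<[^>]++>~';

// Longer sketch: a whole <!-- ... --> comment, or a tag whose attribute
// values may be double- or single-quoted and contain '>'.
$robust = '~<!--(?:[^-]++|-(?!->))*+-->|<(?:[^>"\']++|"[^"]*+"|\'[^\']*+\')*+>~';

$samples = [
    'blah blah blah <!-- blah > --> blah blah',
    '<div theuglyattribute=">">text</div>',
];

foreach ($samples as $s) {
    echo preg_replace($naive, '', $s), "\n";  // leaves debris behind
    echo preg_replace($robust, '', $s), "\n"; // removes the comment or tag entirely
}
```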

If the replacements absolutely must happen in that order, I don't think merging them all into one will work. – tenub

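@tenub I don't think that should be a problem, given the patterns the OP uses. At most there might need to be two separate replacements, the second one replacing '|' with '/'. – Jerry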

@tenub: I don't think it's a problem, but if you're worried about it, you can still do the replacements in the same order. –