2016-10-04

REGEX: find a string within a string

I seem to write a regular expression about once a year, and I always end up asking for help.

Here is a string (a search query from Solr), and I want to select every instance of the search term.

Here's the input:

http://server:8080/solr/app/select?q=(title_st_en%3Atheory+OR+title_st_ar%3Atheory+OR+title_st_da%3Atheory+OR+title_st_fr%3Atheory+OR+title_st_de%3Atheory+OR+title_st_it%3Atheory+OR+title_st_no%3Atheory+OR+title_st_sv%3Atheory+OR+title_st_ru%3Atheory+OR+title_st_es%3Atheory+OR+title_st_bg%3Atheory+OR+title_st_cs%3Atheory+OR+title_st_tr%3Atheory+OR+title_st_nl%3Atheory+OR+title_st_zh-cn%3Atheory+OR+title_st_zh-tw%3Atheory+OR+title_st_hr%3Atheory+OR+title_st_et%3Atheory+OR+title_st_he%3Atheory+OR+title_st_hu%3Atheory+OR+title_st_ja%3Atheory+OR+title_st_ko%3Atheory+OR+title_st_pl%3Atheory+OR+title_st_ro%3Atheory+OR+title_st_th%3Atheory+OR+title_st_vi%3Atheory+OR+content_stemming_en%3Atheory+OR+content_stemming_no%3Atheory+OR+(backfields%3Atheory))+AND+(((virtualPath%3A%22%5C%5CSERVER%5C%5CU_TEST%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_SYSTEM%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_!CONTACTS%22)+AND+-(virtualPath%3A%22%5C%5CSERVER%5C%5CU_TEST%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_SYSTEM%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_NDSF%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_NDSFMAG%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_NDSFRA%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_NM%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_INTERNAL%5C%5CL%22+OR+virtualPath%3A 

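I need to select any text between each "%3A" and "+OR", plus the final "%3Atheory))". In this case that is the word "theory", but it will be a different word each time; the only thing known is that it will be alphabetic text between '%3A' and '+OR'. It needs to stop at "+AND+".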

`/%3A(.*?)[+OR]/g` is what I've got so far, which is a good start, I guess... It doesn't find "%3Atheory))", and it doesn't stop at "+AND+".

What I'm struggling with is the "find this or find that" part, and stopping at a given point in the string.

Can anyone offer some guidance?
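A note on the attempt above: in regex syntax `[+OR]` is a character class, so it matches exactly one character that is `+`, `O` or `R`, not the three-character sequence `+OR`. The sketch below uses .NET's `Regex` (the character-class semantics are the same as in the JavaScript-style `/.../g` literal) on a shortened, hypothetical sample; the uppercase term `ROBOT` is made up purely to show what happens when a term contains `O` or `R`.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical, shortened sample in the same shape as the Solr query above.
string sample = "q=(a%3Atheory+OR+b%3AROBOT+OR+(c%3Atheory))+AND+(d%3Ax)";

// [+OR] is a character class: the lazy group stops at the first '+', 'O' or 'R'.
string[] classMatches = Regex.Matches(sample, @"%3A(.*?)[+OR]")
    .Cast<Match>().Select(m => m.Groups[1].Value).ToArray();

// \+OR matches the literal three-character sequence "+OR".
string[] sequenceMatches = Regex.Matches(sample, @"%3A(.*?)\+OR")
    .Cast<Match>().Select(m => m.Groups[1].Value).ToArray();

Console.WriteLine(string.Join(", ", classMatches));
Console.WriteLine(string.Join(", ", sequenceMatches));
```

With the class, `ROBOT` yields an empty capture and the last term comes back as `theory))`; with `\+OR`, the last term is missed entirely because it is followed by `))` rather than `+OR`, which matches the two symptoms described in the question.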

Something like [this](https://regex101.com/r/xhEq1j/1)?

Yes! I didn't realize I had to escape each closing parenthesis. Almost there: I need it to stop at the '+AND+' expression.

Like [so](https://regex101.com/r/xhEq1j/2)?
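The patterns behind the regex101 links are not reproduced here. As one possible single-pattern alternative (an assumption, not necessarily what those links contain), .NET regexes accept variable-length lookbehind, so a negative lookbehind `(?<!\+AND\+.*)` can reject every `%3A` that appears after the first `+AND+`. The sample string is a shortened, hypothetical stand-in for the real query, and the pattern assumes the terms are purely alphabetic, as stated in the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical, shortened sample; the real query continues after "+AND+".
string sample = "q=(a%3Atheory+OR+b%3Aquantum+OR+(c%3Atheory))+AND+(d%3Atheory+OR+e%3Az))";

// (?<!\+AND\+.*)  reject any %3A that has "+AND+" somewhere before it
//                 (.NET allows variable-length lookbehind).
// %3A([A-Za-z]+)  capture the alphabetic search term.
// (?=\+OR|\)\))   the term must be followed by "+OR" or by "))".
var pattern = new Regex(@"(?<!\+AND\+.*)%3A([A-Za-z]+)(?=\+OR|\)\))");

string[] terms = pattern.Matches(sample)
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToArray();

Console.WriteLine(string.Join(", ", terms));
```

This keeps everything in one expression, at the cost of relying on a .NET-specific lookbehind feature; the two-step approaches in the answers avoid that dependency.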

Answers

It may be better to split this into two operations, using String.Split and Regex.Matches, like this:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = @"http://server:8080/solr/app/select?q=(title_st_en%3Atheory+OR+title_st_ar%3Atheory+OR+title_st_da%3Atheory+OR+title_st_fr%3Atheory+OR+title_st_de%3Atheory+OR+title_st_it%3Atheory+OR+title_st_no%3Atheory+OR+title_st_sv%3Atheory+OR+title_st_ru%3Atheory+OR+title_st_es%3Atheory+OR+title_st_bg%3Atheory+OR+title_st_cs%3Atheory+OR+title_st_tr%3Atheory+OR+title_st_nl%3Atheory+OR+title_st_zh-cn%3Atheory+OR+title_st_zh-tw%3Atheory+OR+title_st_hr%3Atheory+OR+title_st_et%3Atheory+OR+title_st_he%3Atheory+OR+title_st_hu%3Atheory+OR+title_st_ja%3Atheory+OR+title_st_ko%3Atheory+OR+title_st_pl%3Atheory+OR+title_st_ro%3Atheory+OR+title_st_th%3Atheory+OR+title_st_vi%3Atheory+OR+content_stemming_en%3Atheory+OR+content_stemming_no%3Atheory+OR+(backfields%3Atheory))+AND+(((virtualPath%3A%22%5C%5CSERVER%5C%5CU_TEST%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_SYSTEM%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_!CONTACTS%22)+AND+-(virtualPath%3A%22%5C%5CSERVER%5C%5CU_TEST%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_SYSTEM%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_NDSF%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_NDSFMAG%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_NDSFRA%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_NM%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_INTERNAL%5C%5CL%22+OR+virtualPath%3A";
// A term ends at "+OR" or, for the last one, at "))".
Regex regex = new Regex(@"%3A(.*?)(?:\+OR|\)\))");

// Everything before the first "+AND+" holds the search terms.
var splitted = input.Split(new[] { "+AND+" }, StringSplitOptions.None);
var matches = regex.Matches(splitted.First());

foreach (Match m in matches)
{
    // Or whatever you like to do with your matches
    Console.WriteLine(m.Groups[1].Value);
}
```
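A minimal, self-contained sketch of the same split-then-match idea, packaged as a hypothetical helper `ExtractTerms` and run on a shortened, hypothetical sample. It splits on the full `+AND+` token, so a search term that happens to contain the letters `AND` (the made-up term `ANDROID` here) is not cut in half.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the search terms that appear before the first "+AND+".
static List<string> ExtractTerms(string query)
{
    // Keep only the part of the query before the first "+AND+".
    string head = query.Split(new[] { "+AND+" }, StringSplitOptions.None)[0];

    // Each term sits between "%3A" and either "+OR" or the closing "))".
    var regex = new Regex(@"%3A(.*?)(?:\+OR|\)\))");
    return regex.Matches(head).Cast<Match>().Select(m => m.Groups[1].Value).ToList();
}

// Shortened, hypothetical sample; the term ANDROID contains the letters "AND".
string sample = "q=(a%3AANDROID+OR+b%3AANDROID+OR+(c%3AANDROID))+AND+(d%3Ax+OR+e%3Ay)";
List<string> terms = ExtractTerms(sample);
Console.WriteLine(string.Join(", ", terms));
```

Splitting on bare `AND` would leave only `q=(a%3A` as the first segment for this sample, so no terms would be found.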

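Regex.Split keeps the separator text in its output when the pattern contains capturing parentheses. So for the text given in the question, code like the following splits it into pieces: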

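```csharp
using System;
using System.Text.RegularExpressions;

// theInputText holds the Solr query URL from the question.
// The outer capturing group makes Regex.Split keep each matched separator.
string[] pieces = Regex.Split(theInputText, "(%3A.*?\\+(?:AND|OR))");
foreach (string ss in pieces)
    Console.WriteLine(ss);
```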

Here is a small part of the output:

```text
+virtualPath
%3A%22%5C%5CSERVER%5C%5CP_SYSTEM%22+OR
+virtualPath
%3A%22%5C%5CSERVER%5C%5CP_!CONTACTS%22)+AND
+-(virtualPath
%3A%22%5C%5CSERVER%5C%5CU_TEST%5C%5CL%22+OR
+virtualPath
```
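To try this without the full URL, here is a sketch that runs the same `Regex.Split` pattern on a shortened, hypothetical sample and prints each piece. Because the whole pattern is wrapped in a capturing group, each matched `%3A...+OR` or `%3A...+AND` separator appears in the array between the pieces of text around it.

```csharp
using System;
using System.Text.RegularExpressions;

// Shortened, hypothetical stand-in for the Solr query in the question.
string sample = "q=(a%3Atheory+OR+(b%3Atheory))+AND+(c%3Ax+OR+d";

// The outer group captures each separator, so Regex.Split returns it as a piece too.
string[] pieces = Regex.Split(sample, "(%3A.*?\\+(?:AND|OR))");

foreach (string piece in pieces)
{
    Console.WriteLine(piece);
}
```

The output alternates between text that sat between separators (`q=(a`, `+(b`, ...) and the captured separators themselves (`%3Atheory+OR`, `%3Atheory))+AND`, ...).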

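With the string split into pieces, it should be a simple matter to filter for the array elements that have the right starting and ending characters, and to find the final `%3Atheory...` entry.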
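One way to do that filtering, sketched under the assumption that only the captured separators matter: walk the pieces in order, keep those that start with `%3A` and end with `+OR` or `+AND`, strip the prefix, the operator suffix and any closing parentheses, and stop after the first piece that ends in `+AND`. The helper name `TermsBeforeAnd` and the sample strings are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: collects the search terms from the Regex.Split pieces,
// stopping after the first piece that ends in "+AND".
static List<string> TermsBeforeAnd(string query)
{
    string[] pieces = Regex.Split(query, "(%3A.*?\\+(?:AND|OR))");
    var terms = new List<string>();

    foreach (string piece in pieces)
    {
        bool endsWithAnd = piece.EndsWith("+AND", StringComparison.Ordinal);
        bool endsWithOr = piece.EndsWith("+OR", StringComparison.Ordinal);

        // Keep only the captured separators: "%3A"..."+OR" or "%3A"..."+AND".
        if (!piece.StartsWith("%3A", StringComparison.Ordinal) || !(endsWithAnd || endsWithOr))
            continue;

        // Drop the "%3A" prefix and the operator suffix, then any closing parentheses.
        int suffixLength = endsWithAnd ? "+AND".Length : "+OR".Length;
        string term = piece.Substring(3, piece.Length - 3 - suffixLength).TrimEnd(')');
        terms.Add(term);

        // "+AND" marks the end of the search-term section.
        if (endsWithAnd)
            break;
    }
    return terms;
}

string sample = "q=(a%3Atheory+OR+(b%3Atheory))+AND+(c%3Ax+OR+d";
Console.WriteLine(string.Join(", ", TermsBeforeAnd(sample)));
```

Stopping at the first `+AND` piece is what keeps the `virtualPath` clauses out of the result.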

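Note: the question talks about `+OR` and `+AND+`, but every `+OR` is immediately followed by a `+`, so it may be better to include that final `+` in the expression, as in `...OR)\\+)`.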
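A quick sketch of the effect of that change, on a shortened, hypothetical sample: with the trailing `\\+` inside the captured group, the `+` after each operator moves into the separator piece, so the in-between pieces no longer begin with `+`.

```csharp
using System;
using System.Text.RegularExpressions;

// Shortened, hypothetical sample.
string sample = "q=(a%3Atheory+OR+(b%3Atheory))+AND+(c%3Ax+OR+d";

// Pattern from the answer: the '+' after each operator starts the next piece.
string[] withoutPlus = Regex.Split(sample, "(%3A.*?\\+(?:AND|OR))");

// Variant with the trailing \\+ inside the group: that '+' now ends each separator.
string[] withPlus = Regex.Split(sample, "(%3A.*?\\+(?:AND|OR)\\+)");

Console.WriteLine(string.Join(" | ", withoutPlus));
Console.WriteLine(string.Join(" | ", withPlus));
```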

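Note: the inner parentheses in the regex are non-capturing, i.e. `(?: )`. If they were capturing parentheses, the `AND` and `OR` captures would also be included in the output array.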
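To see the difference the non-capturing group makes, the sketch below (on a shortened, hypothetical sample) splits once with `(?:AND|OR)` and once with a capturing `(AND|OR)`. In the second case `Regex.Split` also returns the text captured by the inner group, so each match contributes an extra `AND` or `OR` element.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Shortened, hypothetical sample.
string sample = "q=(a%3Atheory+OR+(b%3Atheory))+AND+(c%3Ax+OR+d";

// Inner group is non-capturing: only the outer separator is added to the output.
string[] nonCapturing = Regex.Split(sample, "(%3A.*?\\+(?:AND|OR))");

// Inner group is capturing: its "AND"/"OR" text is added as an extra element per match.
string[] capturing = Regex.Split(sample, "(%3A.*?\\+(AND|OR))");

Console.WriteLine($"non-capturing: {nonCapturing.Length} pieces");
Console.WriteLine($"capturing: {capturing.Length} pieces");
Console.WriteLine(string.Join(" | ", capturing.Where(p => p == "AND" || p == "OR")));
```

With three matches in the sample, the capturing version produces three extra elements, which would then have to be filtered out again.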