Parsing robots.txt with a regular expression

I have the following robots.txt as an example:

 
User-agent: googlebot 
User-agent: slurp 
User-agent: msnbot 
User-agent: teoma 
User-agent: W3C-checklink 
User-agent: WDG_SiteValidator 
Disallow:/
Disallow: /js/ 
Disallow: /Web_References/ 
Disallow: /webresource.axd 
Disallow: /scriptresource.axd 

User-agent: Mediapartners-Google* 
Disallow: 

User-agent: * 
Disallow: /webresource.axd 
Disallow: /scriptresource.axd 
Disallow: /js/ 
Disallow: /Web_References/

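I may be asking too much of regular expressions, but I would like to write an expression that returns the matches grouped and ordered like this: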

 
Matches 
- [0] 
    - [UserAgents] 
     - "googlebot" 
     - "slurp" 
     - "msnbot" 
     - "teoma" 
     - "W3C-checklink" 
     - "WDG_SiteValidator" 
    - [Routes] 
     - [0] 
      - [Permission] "Allow" 
      - [Url] "/" 
     - [1] 
      - [Permission] "Disallow" 
      - [Url] "/js/" 
     - [2] 
      - [Permission] "Disallow" 
      - [Url] "/Web_References/" 

... 

etc 

...

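I have written individual regular expressions that match the elements of the document, but I cannot get them to work once they are pieced together. Could someone point out where I am going wrong?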

Patterns

User agent: (?:user-agent:\s*)(?<UserAgent>[a-z_0-9-*]*)

Permission: (?<Permission>(?:allow|disallow))(?:\s*:\s*)(?<Url>[/0-9_a-z.]*)

My attempt

((?<UserAgents>(?:user-agent:\s*)(?<UserAgent>[a-z_0-9-*]*))+(?<Routes>(?<Permission>(?:allow|disallow))(?:\s*:\s*)(?<Url>[/0-9_a-z.]*))+)+

FYI, I am using Expresso to debug these patterns, with the following options checked: Multiline, Compiled, and Ignore Case.

Source

2011-09-20 jameskind

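What matches *do* you get? – Alex

@Alex: I don't get any matches. – jameskind

Is there a reason you are trying to do this with one big regex rather than using your individual regexes separately? (Or even using regex at all?) Smashing them all together will not make your program better; it only makes things unreadable and the code more convoluted. – Amber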
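For what it's worth, keeping the expressions separate could look roughly like the sketch below. It is only one way to do it, under some assumptions: the type names `Route`, `RobotsGroup` and `RobotsLineParser` are made up for illustration, the two patterns are loosened versions of the ones in the question (`\S*` instead of the explicit character classes), and a User-agent line that follows a rule is assumed to start a new group, which is what the desired output in the question suggests.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical container types, named after the groups in the question.
public sealed class Route
{
    public string Permission = "";
    public string Url = "";
}

public sealed class RobotsGroup
{
    public List<string> UserAgents = new List<string>();
    public List<Route> Routes = new List<Route>();
}

public static class RobotsLineParser
{
    // Each pattern is applied to one line at a time, so \s* cannot run across a line break.
    static readonly Regex AgentLine = new Regex(
        @"^\s*user-agent\s*:\s*(?<UserAgent>\S*)\s*$", RegexOptions.IgnoreCase);
    static readonly Regex RuleLine = new Regex(
        @"^\s*(?<Permission>allow|disallow)\s*:\s*(?<Url>\S*)\s*$", RegexOptions.IgnoreCase);

    public static List<RobotsGroup> Parse(string text)
    {
        var groups = new List<RobotsGroup>();
        RobotsGroup current = null;
        bool lastWasRule = false;

        foreach (string line in text.Split('\n'))
        {
            Match m = AgentLine.Match(line);
            if (m.Success)
            {
                // Assumption: a User-agent line after a rule opens a new group.
                if (current == null || lastWasRule)
                {
                    current = new RobotsGroup();
                    groups.Add(current);
                }
                current.UserAgents.Add(m.Groups["UserAgent"].Value);
                lastWasRule = false;
                continue;
            }

            m = RuleLine.Match(line);
            if (m.Success && current != null)
            {
                current.Routes.Add(new Route
                {
                    Permission = m.Groups["Permission"].Value,
                    Url = m.Groups["Url"].Value
                });
                lastWasRule = true;
            }
            // Blank lines, comments and other directives are simply skipped.
        }
        return groups;
    }

    public static void Main()
    {
        string text = "User-agent: googlebot \nUser-agent: slurp \nDisallow:/\nDisallow: /js/ \n\n"
                    + "User-agent: * \nDisallow: /js/ \n";
        foreach (RobotsGroup g in Parse(text))
        {
            Console.WriteLine("[UserAgents] " + string.Join(", ", g.UserAgents));
            foreach (Route r in g.Routes)
                Console.WriteLine("  [" + r.Permission + "] \"" + r.Url + "\"");
        }
    }
}
```

The grouping logic lives in ordinary code here, so each regex only has to recognise a single line, and adding support for another directive means adding one more small pattern rather than reworking a large one.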

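Try this:

^User-agent: (?<UserAgent>.*?)$|^(?<Permission>Allow|Disallow): (?<Url>.*?)$

I'm not sure what output format you want, but the regex above matches and names the parts you are interested in. Maybe you can build on top of it. I hardly ever write C#, but something like this might work:

using System;
using System.Text.RegularExpressions;

// subjectString holds the contents of robots.txt.
try {
    Regex regexObj = new Regex("^User-agent: (?<UserAgent>.*?)$|^(?<Permission>Allow|Disallow): (?<Url>.*?)$", RegexOptions.IgnoreCase | RegexOptions.Multiline);
    Match matchResults = regexObj.Match(subjectString);
    while (matchResults.Success) {
        // Group 0 is the whole match; the named groups start at index 1.
        for (int i = 1; i < matchResults.Groups.Count; i++) {
            Group groupObj = matchResults.Groups[i];
            if (groupObj.Success) {
                // group name: regexObj.GroupNameFromNumber(i)
                // matched text: groupObj.Value (may carry trailing spaces or a '\r'; Trim() if needed)
                // match start: groupObj.Index
                // match length: groupObj.Length
            }
        }
        matchResults = matchResults.NextMatch();
    }
} catch (ArgumentException) {
    // Syntax error in the regular expression
}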
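If a single expression really is wanted, a likely reason the combined attempt in the question finds nothing is that no part of it consumes the line break between one line and the next, so the repeated groups can never continue onto the following line. The sketch below is one possible way around that, not a definitive parser: it relies on .NET keeping every repetition of a named group in `Group.Captures`. The class name `RobotsCaptureDemo` and the method `FindRecords` are made up for illustration, and the pattern assumes that each record is an unbroken run of User-agent lines followed directly by an unbroken run of Allow/Disallow lines, with no comments, blank lines or other directives in between.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RobotsCaptureDemo
{
    // One match per record: a run of User-agent lines, then a run of Allow/Disallow lines.
    // [ \t]* is used instead of \s* so that skipping whitespace never crosses a line break;
    // the line break itself is consumed explicitly by \r?\n (or \z on the last line).
    const string Pattern =
        @"(?:^User-agent:[ \t]*(?<UserAgent>\S*)[ \t]*\r?\n)+" +
        @"(?:^(?<Permission>Allow|Disallow)[ \t]*:[ \t]*(?<Url>\S*)[ \t]*(?:\r?\n|\z))+";

    public static MatchCollection FindRecords(string text)
    {
        return Regex.Matches(text, Pattern, RegexOptions.IgnoreCase | RegexOptions.Multiline);
    }

    public static void Main()
    {
        string text = "User-agent: googlebot \nUser-agent: slurp \nDisallow:/\nDisallow: /js/ \n\n"
                    + "User-agent: * \nDisallow: /js/";

        foreach (Match record in FindRecords(text))
        {
            Console.WriteLine("[UserAgents]");
            foreach (Capture agent in record.Groups["UserAgent"].Captures)
                Console.WriteLine("  \"" + agent.Value + "\"");

            // Permission and Url are captured once per rule line, so index i pairs them up.
            CaptureCollection permissions = record.Groups["Permission"].Captures;
            CaptureCollection urls = record.Groups["Url"].Captures;
            Console.WriteLine("[Routes]");
            for (int i = 0; i < permissions.Count; i++)
                Console.WriteLine("  [" + permissions[i].Value + "] \"" + urls[i].Value + "\"");
        }
    }
}
```

This depends on a .NET-specific feature: most other regex engines keep only the last repetition of a capturing group, so the same pattern would not give the full lists elsewhere.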

Source

2011-09-20 14:18:37 StackOverflowNewbie
