Regex - get a specific part of the title

I have a title that is structured like this:

<title>WebsiteName | Page title | Slogan</title>

Currently, in C# I use this to get the title:

Regex.Match(pageSource, 
       @"\<title\b[^>]*\>\s*(?<Title>[\s\S]*?)\</title\>", 
       RegexOptions.IgnoreCase).Groups["Title"].Value;

However, I want to get only the page title.

Source

2013-05-08 ItsGreg

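Is it HTML you are parsing? – Anirudha 2013-05-08 17:46:55

What do you want to match in the title you provided? Just 'Page title'? – 2013-05-08 17:51:21

Break your problem down. Use some form of DOM parsing tool to parse the HTML. See the answers below. Then use a regex or simple string operations on the title content. – Mithon 2013-05-08 18:00:27

Try this: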

@"\<title[^>]*\>[^|]*\|\s*(?<Title>[^|]*?)\|[^<]*\</title\>" 

"\<title[^>]*\>"    //Title tag
"[^|]*"             //Everything up to the first pipe
"\|\s*"             //First pipe and any leading white space
"(?<Title>[^|]*?)"  //The page title section between the pipes
"\|"                //Second pipe
"[^<]*"             //Everything after the second pipe up to the closing title tag
"\</title\>"        //Closing title tag

Source
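
For reference, here is a minimal sketch of how the pattern above could be wired into C#. The class name TitleExtractor, the method name ExtractPageTitle, and the choice to return an empty string when nothing matches are assumptions made for illustration, not part of the original answer. Because the lazy group runs up to the second pipe, the capture keeps the trailing space, so the sketch trims it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TitleExtractor
{
    // Pattern from the answer above: capture the text between the first and second pipe.
    private static readonly Regex PageTitlePattern = new Regex(
        @"\<title[^>]*\>[^|]*\|\s*(?<Title>[^|]*?)\|[^<]*\</title\>",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: returns the middle segment, or an empty string if there is no match.
    public static string ExtractPageTitle(string pageSource)
    {
        Match match = PageTitlePattern.Match(pageSource);
        // The named group still holds the space before the second pipe, so trim it.
        return match.Success ? match.Groups["Title"].Value.Trim() : string.Empty;
    }

    public static void Main()
    {
        string pageSource =
            "<html><head><title>WebsiteName | Page title | Slogan</title></head></html>";
        Console.WriteLine(ExtractPageTitle(pageSource)); // prints: Page title
    }
}
```

A title without two pipes does not match at all, which is why the helper checks match.Success before reading the group.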

2013-05-08 17:54:42 Cemafor

Works like a charm! Thanks :) – ItsGreg 2013-05-11 17:33:33

If you just want to get the Page Title, then try this:

\|(.*)\|

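Your second match group (group 1; group 0 is the whole match) will contain the title if you pass in the string you provided. If you find yourself doing anything more complex than this, then regex is probably not the right tool. There are better ways to parse HTML.

Source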
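
As a rough illustration of the group numbering, the sketch below applies the pattern to the title text from the question. The names PipeCaptureDemo and MiddleSegment are hypothetical. Note that .* is greedy, so with more than two pipes the capture would run from the first pipe to the last one.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PipeCaptureDemo
{
    // Hypothetical helper: returns the text between the first and last pipe, trimmed.
    public static string MiddleSegment(string title)
    {
        Match match = Regex.Match(title, @"\|(.*)\|");
        // Groups[0] is the whole match including both pipes;
        // Groups[1] is the parenthesised capture between them.
        return match.Success ? match.Groups[1].Value.Trim() : string.Empty;
    }

    public static void Main()
    {
        string title = "WebsiteName | Page title | Slogan";
        Console.WriteLine(Regex.Match(title, @"\|(.*)\|").Groups[0].Value); // prints: | Page title |
        Console.WriteLine(MiddleSegment(title));                             // prints: Page title
    }
}
```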

2013-05-08 17:47:52

Avoid using regex to parse HTML.

You can use HtmlAgilityPack instead.

This will get you the HTML title:

HtmlDocument doc = new HtmlDocument(); 
doc.Load(yourStream);  
string title=doc.DocumentNode.SelectSingleNode("//title").InnerText;

Now, to get the page title: assuming your title always has the same form as in your example, once you have the title text you can use this expression

(?<=\|).+?(?=\|)
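
A small sketch of applying this expression to the title text, once it has been obtained, might look like the following. The class and method names are assumptions for illustration. With lookarounds the pipes are not part of the match, so Match.Value can be used directly. The second method uses a plain string split instead of the regex, a swapped-in alternative along the lines of the "simple string operations" suggested in the comments.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Hypothetical helper: first run of text that is preceded and followed by a pipe.
    public static string BetweenPipes(string title)
    {
        Match match = Regex.Match(title, @"(?<=\|).+?(?=\|)");
        // The match keeps the spaces next to the pipes, so trim them.
        return match.Success ? match.Value.Trim() : string.Empty;
    }

    // Alternative without regex: split on the pipe and take the second segment.
    public static string BetweenPipesBySplit(string title)
    {
        string[] parts = title.Split('|');
        return parts.Length >= 3 ? parts[1].Trim() : string.Empty;
    }

    public static void Main()
    {
        string title = "WebsiteName | Page title | Slogan";
        Console.WriteLine(BetweenPipes(title));        // prints: Page title
        Console.WriteLine(BetweenPipesBySplit(title)); // prints: Page title
    }
}
```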

Source

2013-05-08 17:49:20 Anirudha

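I think he wants the 'Page title' part inside the title tag? It's not entirely clear... – 2013-05-08 17:53:40

@AbeMiessler Good catch.. will edit the answer.. thanks – Anirudha 2013-05-08 17:58:07

My first thought was to use HAP, but I decided not to because I think it would be slower.. – ItsGreg 2013-05-08 18:32:40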
