Need to improve my RegExp

I have been using an HTTP library (winhttp) for two weeks, and now I want to improve the RegExp I use to retrieve some data from a target website.

Consider the following HTML code:

Total Posts:</span> 22,423</li>

What I want to do is retrieve only the number and store it in a variable:

regex = "Total Posts:</span> \\S+"; 

if(std::regex_search(regexs, regexmatch, regex)) 
{ 
    temp = regexmatch[0]; 
    found = temp.find(","); 
    if(found != std::string::npos) 
     temp.erase(found, 1); 
    temp.erase(0, 19); 
    temp.erase(temp.end() - 5, temp.end()); 
    User._Posts = ConvertStringToInteger(temp); 
}

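I use a regex and then strip the surrounding parts by hand, because I have not figured out how to retrieve just the part of the pattern I care about rather than the whole match. I hope someone can understand what I mean. I have checked the documentation but found nothing that helps.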

Source

2013-10-21 user23842348943292

This may be relevant: http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html

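Yes, it is better not to use RegEx for this task. Regular expressions perform poorly, and there are plenty of (X)HTML parsers that can parse text quickly on a pointer basis. – schmijos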

To match only the part you want, use a capture group with std::regex_search.

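A capture group captures a matched region inside the regular expression, and each captured region is represented by a sub_match. You can use smatch, the match_results specialization for std::string, to hold the sub-matches, and then use operator[] to get each one. Index 0 is the whole match; index 1 is the first capture group.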
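As a rough sketch of how those indices line up, the helper below returns one sub-match by index. The name capture_or_empty is hypothetical and exists only for illustration; it is not part of the standard library.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper (not a standard function): returns the text of
// sub-match `group` from the first match of `pattern` in `text`, or an
// empty string if there is no match or no such group.
std::string capture_or_empty(const std::string& text,
                             const std::regex& pattern,
                             std::size_t group) {
    std::smatch match;
    // match.size() is 1 + the number of capture groups when a match is found.
    if (std::regex_search(text, match, pattern) && group < match.size()) {
        return match[group].str();  // sub_match::str() copies the matched text
    }
    return "";
}
```

With the pattern "Total Posts:</span> ([^<]+)", group 0 gives the whole matched text including the "Total Posts:</span> " prefix, while group 1 gives only what the parentheses matched.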

Example:

const std::string foo = "Total Posts:</span> 22,423</li>"; 

std::regex rgx("Total Posts:</span> ([^<]+)"); 
std::smatch match; 

if (std::regex_search(foo.begin(), foo.end(), match, rgx)) { 
    std::cout << match[1] << '\n'; 
}

Output:

22,423
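Since the question wants the value stored as an integer, here is a minimal sketch of one way to go from the captured text to a number. It assumes C++17 (for std::optional), the function name parse_total_posts is made up for illustration, and std::stol stands in for the asker's own ConvertStringToInteger, whose definition is not shown.

```cpp
#include <algorithm>
#include <optional>
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical helper: finds the number after "Total Posts:</span>",
// strips thousands separators, and converts it to long.
// Returns std::nullopt when the pattern is absent or the value is too large.
std::optional<long> parse_total_posts(const std::string& html) {
    // Capture only digits and commas, so no trailing markup needs erasing.
    static const std::regex pattern("Total Posts:</span>\\s*([0-9][0-9,]*)");
    std::smatch match;
    if (!std::regex_search(html, match, pattern)) {
        return std::nullopt;
    }
    std::string digits = match[1].str();
    // Erase-remove idiom: drop every ',' from the captured text.
    digits.erase(std::remove(digits.begin(), digits.end(), ','), digits.end());
    try {
        return std::stol(digits);
    } catch (const std::out_of_range&) {
        return std::nullopt;  // number does not fit in a long
    }
}
```

Because the capture group only accepts digits and commas, the fixed-offset erase calls from the question are no longer needed, and the code keeps working if the surrounding markup changes length.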

Source

2013-10-21 08:04:13 hwnd
