How can I use fscanf to extract HTML?

I have a file in which every line contains one div element like the ones below. How can I use fscanf to extract the HTML?

<div style="random properties" id="keyword1:string id:int">text</div> 
<div style="random properties" id="keyword1:string id:int">text</div> 
<div style="random properties" id="keyword2:string id:int">text</div> 
<div style="random properties" id="keyword2:string id:int">text</div>

Can fscanf give me a list of the text and id values from the lines that match a given keyword, for example keyword1?

Source

2012-10-17 Poul K. Sørensen

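Have you tried it? – marcinj

Yes, you can. But you will also keep your sanity if you use an HTML parsing library, or even a parser generator such as yacc. – StoryTeller

Is there a particular reason you want to use 'fscanf'? This reads a bit like an [XY problem](http://meta.stackexchange.com/a/66378/166663)... – ildjarn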
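As a rough illustration of the "yes, you can" above, here is a minimal scanf-style sketch. It assumes the id attribute literally has the form `KEYWORD:STRING id:INT` (for example `id="keyword1:foo id:42"`), that the style value and the text are non-empty, and that the text contains no `<`. The names `DivEntry`, `parse_div_line` and `read_matching`, as well as the buffer sizes, are made up for this example. Lines are read with `fgets` and parsed with `sscanf`, so a malformed line cannot leave the stream positioned mid-line; the same format string could be passed to `fscanf` directly.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical record type for this sketch; not part of the original question.
struct DivEntry {
    std::string keyword;  // e.g. "keyword1"
    std::string name;     // the "string" part that follows the keyword
    int id = 0;           // the integer after "id:"
    std::string text;     // the text between <div ...> and </div>
};

// Parses one line of the assumed form
//   <div style="..." id="KEYWORD:STRING id:INT">TEXT</div>
// Returns true only if all four fields were converted.
bool parse_div_line(const char* line, DivEntry& out) {
    assert(line != nullptr);
    char keyword[32], name[64], text[256];
    int id = 0;
    // %*[^"] skips the style value without storing it; the field widths
    // keep each conversion inside its buffer. The closing </div> is not
    // checked, because literals after the last conversion do not affect
    // the return value.
    const int converted = std::sscanf(
        line,
        " <div style=\"%*[^\"]\" id=\"%31[^:\"]:%63[^ \"] id:%d\">%255[^<]",
        keyword, name, &id, text);
    if (converted != 4) return false;
    out.keyword = keyword;
    out.name = name;
    out.id = id;
    out.text = text;
    return true;
}

// Collects every entry in the file whose keyword equals `keyword`.
// Lines longer than the buffer are split by fgets and will fail to parse.
std::vector<DivEntry> read_matching(std::FILE* file, const std::string& keyword) {
    std::vector<DivEntry> result;
    char line[1024];
    while (std::fgets(line, sizeof line, file)) {
        DivEntry entry;
        if (parse_div_line(line, entry) && entry.keyword == keyword) {
            result.push_back(entry);
        }
    }
    return result;
}
```

Calling `read_matching(file, "keyword1")` on an open `FILE*` would return the matching entries in file order. Any change in attribute order, quoting, or line breaks makes the format string stop matching, which is the fragility the comments above warn about.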

You can simply read it with a regular expression:

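// Requires <iostream>, <regex> and <string>.
// `in` is any open std::istream, e.g. a std::ifstream on the input file.
std::string s;
// Group 1: the integer at the end of the id attribute; group 2: the div's text.
// [^\"]*? is lazy so that (\\d+) captures the whole number, not only its last digit.
// The trailing \\s* tolerates whitespace after </div>.
std::regex r("<div style=\"[^\"]*\" id=\"[^\"]*?(\\d+)\">((?:(?!</div>).)*)</div>\\s*");
while (std::getline(in, s)) {
    std::smatch m;
    if (std::regex_match(s, m, r)) {
        std::cout << "id = " << m.str(1) << ", text = " << m.str(2) << '\n';
    } else {
        std::cout << "invalid pattern" << '\n';
    }
}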

If you want to learn more about regex, see http://en.cppreference.com/w/cpp/regex
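The question also asks for the results per keyword. A minimal sketch of that, under the assumption that the keyword is everything in the id attribute before the first colon and the id is the last run of digits before the closing quote: the function name `group_by_keyword` and the `Entries` alias are made up for this example.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <regex>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// (id, text) pairs collected for one keyword; a hypothetical alias.
using Entries = std::vector<std::pair<int, std::string>>;

// Reads the stream line by line and groups the (id, text) pairs by keyword.
// Lines that do not match the assumed format are silently skipped.
std::map<std::string, Entries> group_by_keyword(std::istream& in) {
    // Group 1: keyword, group 2: integer id, group 3: text inside the div.
    static const std::regex pattern(
        R"re(\s*<div style="[^"]*" id="([^":]+):[^"]*?(\d+)">(.*?)</div>\s*)re");
    std::map<std::string, Entries> groups;
    std::string line;
    while (std::getline(in, line)) {
        std::smatch m;
        if (std::regex_match(line, m, pattern)) {
            assert(m.size() == 4);  // whole match plus three capture groups
            // std::stoi throws std::out_of_range if the number does not fit an int.
            groups[m.str(1)].emplace_back(std::stoi(m.str(2)), m.str(3));
        }
    }
    return groups;
}
```

Any `std::istream` works as input, such as a `std::ifstream` opened on the file or a `std::istringstream` for testing; `groups["keyword1"]` then holds the pairs for that keyword in file order.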

Source

2012-10-17 23:02:44 BigBoss

This assumes that it is not a real HTML file and that every line actually starts with <div and ends with </div>. – Beached

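That is the input format shown in the question. – BigBoss

That is correct, but it sounds like he/she wants a method for this. A solution like this may work, but it is brittle against things like newlines inside the div tag or changes in formatting. http://stackoverflow.com/questions/489522/library-recommendation-c-html-parser – Beached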
