Extracting classes from demangled symbols


I am trying to extract (full) class names from the demangled symbol output of nm using boost::regex. This sample program

#include <vector> 

namespace Ns1 
{ 
namespace Ns2 
{ 
    template<typename T, class Cont> 
    class A 
    { 
    public: 
     A() {} 
     ~A() {} 
     void foo(const Cont& c) {} 
     void bar(const A<T,Cont>& x) {} 

    private: 
     Cont cont; 
    }; 
} 
} 

int main() 
{ 
    Ns1::Ns2::A<int,std::vector<int> > a; 
    Ns1::Ns2::A<int,std::vector<int> > b; 
    std::vector<int> v; 

    a.foo(v); 
    a.bar(b); 
}

will produce the following symbols for class A:

Ns1::Ns2::A<int, std::vector<int, std::allocator<int> > >::A() 
Ns1::Ns2::A<int, std::vector<int, std::allocator<int> > >::bar(Ns1::Ns2::A<int, std::vector<int, std::allocator<int> > > const&) 
Ns1::Ns2::A<int, std::vector<int, std::allocator<int> > >::foo(std::vector<int, std::allocator<int> > const&) 
Ns1::Ns2::A<int, std::vector<int, std::allocator<int> > >::~A()

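I would prefer to extract the class (instance) name Ns1::Ns2::A<int, std::vector<int, std::allocator<int> > > with a single regular expression pattern, but I have trouble parsing the class specifiers that occur recursively inside the <> pairs.

Does anyone know how to do this with a regular expression pattern (one that is supported by boost::regex)?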

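My solution (based on David Hammen's answer, hence the accept):

I don't use a (single) regular expression to extract class and namespace symbols. I have written a simple function that strips an enclosing character pair (e.g. <> or ()) from the tail of a symbol string: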

std::string stripBracketPair(char openingBracket,char closingBracket,const std::string& symbol, std::string& strippedPart) 
{ 
    std::string result = symbol; 

    if(!result.empty() && 
     result[result.length() -1] == closingBracket) 
    { 
     size_t openPos = result.find_first_of(openingBracket); 
     if(openPos != std::string::npos) 
     { 
      strippedPart = result.substr(openPos); 
      result = result.substr(0,openPos); 
     } 
    } 
    return result; 
}
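One caveat with this helper: find_first_of locates the first opening bracket anywhere in the symbol, not necessarily the one that pairs with the trailing closing bracket. For a symbol such as Ns1::A<int>::B<char> it would cut at the first <. A minimal sketch of a matching-aware variant is shown below; stripTrailingPair is a name made up for illustration and is not part of the original solution.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the post): if `symbol` ends with `closing`,
// walk backwards to the bracket that matches it and cut the symbol there.
// The removed text is stored in `strippedPart`.
std::string stripTrailingPair(char opening, char closing,
                              const std::string& symbol,
                              std::string& strippedPart)
{
    strippedPart.clear();
    if (symbol.empty() || symbol[symbol.size() - 1] != closing)
        return symbol;                      // nothing to strip

    int depth = 0;                          // closing brackets not yet matched
    for (std::size_t i = symbol.size(); i-- > 0; )
    {
        if (symbol[i] == closing)
            ++depth;
        else if (symbol[i] == opening && --depth == 0)
        {
            strippedPart = symbol.substr(i);
            return symbol.substr(0, i);
        }
    }
    return symbol;                          // unbalanced input: leave untouched
}
```

With this variant, stripping '(' and ')' from Foo::Bar(int, void (*)(int)) leaves Foo::Bar, and stripping '<' and '>' from Ns1::A<int>::B<std::vector<int> > leaves Ns1::A<int>::B.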

It is used by two other functions that extract the namespace and the class from a symbol:

std::string extractNamespace(const std::string& symbol) 
{ 
    std::string ns; 
    std::string strippedPart; 
    std::string cls = extractClass(symbol); 
    if(!cls.empty()) 
    { 
     cls = stripBracketPair('<','>',cls,strippedPart); 
     std::vector<std::string> classPathParts; 

     boost::split(classPathParts,cls,boost::is_any_of("::"),boost::token_compress_on); 
     ns = buildNamespaceFromSymbolPath(classPathParts); 
    } 
    else 
    { 
     // Assume this symbol is a namespace global function/variable 
     std::string globalSymbolName = stripBracketPair('(',')',symbol,strippedPart); 
     globalSymbolName = stripBracketPair('<','>',globalSymbolName,strippedPart); 
     std::vector<std::string> symbolPathParts; 

     boost::split(symbolPathParts,globalSymbolName,boost::is_any_of("::"),boost::token_compress_on); 
     ns = buildNamespaceFromSymbolPath(symbolPathParts); 
     std::vector<std::string> wsSplitted; 
     boost::split(wsSplitted,ns,boost::is_any_of(" \t"),boost::token_compress_on); 
     if(wsSplitted.size() > 1) 
     { 
      ns = wsSplitted[wsSplitted.size() - 1]; 
     } 
    } 

    if(isClass(ns)) 
    { 
     ns = ""; 
    } 
    return ns; 
}

std::string extractClass(const std::string& symbol) 
{ 
    std::string cls; 
    std::string strippedPart; 
    std::string fullSymbol = symbol; 
    boost::trim(fullSymbol); 
    fullSymbol = stripBracketPair('(',')',fullSymbol,strippedPart); // use the trimmed copy, not the raw symbol 
    fullSymbol = stripBracketPair('<','>',fullSymbol,strippedPart); 

    size_t pos = fullSymbol.find_last_of(':'); 
    if(pos != std::string::npos) 
    { 
     --pos; 
     cls = fullSymbol.substr(0,pos); 
     std::string untemplatedClassName = stripBracketPair('<','>',cls,strippedPart); 
     if(untemplatedClassName.find('<') == std::string::npos && 
     untemplatedClassName.find(' ') != std::string::npos) 
     { 
      cls = ""; 
     } 
    } 

    if(!cls.empty() && !isClass(cls)) 
    { 
     cls = ""; 
    } 
    return cls; 
}

The buildNamespaceFromSymbolPath() function simply concatenates the valid namespace parts:

std::string buildNamespaceFromSymbolPath(const std::vector<std::string>& symbolPathParts) 
{ 
    if(symbolPathParts.size() >= 2) 
    { 
     std::ostringstream oss; 
     bool firstItem = true; 
     for(unsigned int i = 0;i < symbolPathParts.size() - 1;++i) 
     { 
      if((symbolPathParts[i].find('<') != std::string::npos) || 
       (symbolPathParts[i].find('(') != std::string::npos)) 
      { 
       break; 
      } 
      if(!firstItem) 
      { 
       oss << "::"; 
      } 
      else 
      { 
       firstItem = false; 
      } 
      oss << symbolPathParts[i]; 
     } 
     return oss.str(); 
    } 
    return ""; 
}

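Finally, the isClass() function uses a regular expression to scan all symbols for a constructor (unfortunately this does not seem to work for classes that contain only member functions):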

std::set<std::string> allClasses; 

bool isClass(const std::string& classSymbol) 
{ 
    std::set<std::string>::iterator foundClass = allClasses.find(classSymbol); 
    if(foundClass != allClasses.end()) 
    { 
     return true; 
    } 

    std::string strippedPart; 
    std::string constructorName = stripBracketPair('<','>',classSymbol,strippedPart); 
    std::vector<std::string> constructorPathParts; 

    boost::split(constructorPathParts,constructorName,boost::is_any_of("::"),boost::token_compress_on); 
    if(constructorPathParts.size() > 1) 
    { 
     constructorName = constructorPathParts.back(); 
    } 
    boost::replace_all(constructorName,"(","[\\(]"); 
    boost::replace_all(constructorName,")","[\\)]"); 
    boost::replace_all(constructorName,"*","[\\*]"); 

    std::ostringstream constructorPattern; 
    std::string symbolPattern = classSymbol; 
    boost::replace_all(symbolPattern,"(","[\\(]"); 
    boost::replace_all(symbolPattern,")","[\\)]"); 
    boost::replace_all(symbolPattern,"*","[\\*]"); 
    constructorPattern << "^" << symbolPattern << "::" << constructorName << "[\\(].+$"; 
    boost::regex reConstructor(constructorPattern.str()); 

    for(std::vector<NmRecord>::iterator it = allRecords.begin(); 
     it != allRecords.end(); 
     ++it) 
    { 
     if(boost::regex_match(it->symbolName,reConstructor)) 
     { 
      allClasses.insert(classSymbol); 
      return true; 
     } 
    } 
    return false; 
}

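As mentioned, the last function cannot reliably find a class name if the class provides no constructor, and it is slow on large symbol tables. But at least this seems to cover what you can get out of nm's symbol information.

I have left the regex tag on this question so that other users can find out that a regular expression is not the right approach.

Source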

2012-09-16 πάντα ῥεῖ

Doesn't 'nm' come with a '--demangle' option? Why redo the whole thing yourself?

@KerrekSB I am already using the demangled symbols; I want to extract the class names from them.

Oh, OK. However, it looks like the template syntax does not describe a regular language. It is more like XML (and we all know how that goes: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454).

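This is hard to do even with Perl's extended regular expressions, which are considerably more powerful than anything in C++. I suggest a different tack:

First get rid of the things that don't look like functions, such as data (look for the D designator). Things like "virtual thunk to this", "virtual table for that" and so on will also get in your way; get rid of them before you do the main parsing. This filtering is where a regular expression can help. What you should be left with are functions. For each function,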

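Get rid of the stuff after the final closing parenthesis. For example, Foo::Bar(int,double) const becomes Foo::Bar(int,double).
Strip the function arguments. The problem here is that you can have parentheses inside the parentheses, e.g. functions that take function pointers as arguments, which in turn might take function pointers as arguments. Don't use a regular expression. Use the fact that parentheses match. After this step, Foo::Bar(int,double) becomes Foo::Bar, while a::b::Baz<lots<of<template>, stuff>>::Baz(int, void (*)(int, void (*)(int))) becomes a::b::Baz<lots<of<template>, stuff>>::Baz.
Now work on the front end. Use a similar scheme to parse through the template stuff. With this, that messy a::b::Baz<lots<of<template>, stuff>>::Baz becomes a::b::Baz::Baz.
At this stage, your functions will look like a::b:: ... ::ClassName::function_name. There is a slight problem here with free functions in some namespace. Destructors are a dead giveaway of a class; there is no doubt that you have a class name if the function name starts with a tilde. Constructors are a near giveaway, as long as you don't have a namespace Foo in which you have defined a function Foo.
Finally, you may want to re-insert the template stuff you cut out.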
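The first three steps can be sketched with plain bracket counting. The version below folds them into one forward pass instead of three separate ones, which is a simplification of the procedure described above, not the answerer's own code. bareQualifiedName is a hypothetical name, and the sketch assumes the symbol carries no return type prefix and no operator names such as operator(), operator< or operator->.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: reduce a demangled function symbol to its bare
// qualified name, e.g. "a::b::Baz<x<y> >::Baz(int) const" -> "a::b::Baz::Baz".
// Assumption: no return type prefix and no operator(), operator<, operator->
// in the symbol, since those would confuse the bracket counting.
std::string bareQualifiedName(const std::string& symbol)
{
    std::string name;
    int angleDepth = 0;                     // nesting level inside <...>
    for (std::size_t i = 0; i < symbol.size(); ++i)
    {
        const char c = symbol[i];
        if (c == '<')
            ++angleDepth;                   // entering template arguments
        else if (c == '>' && angleDepth > 0)
            --angleDepth;                   // leaving template arguments
        else if (angleDepth == 0)
        {
            if (c == '(')
                break;                      // argument list starts: drop the rest
            name += c;                      // keep only top-level characters
        }
    }
    return name;
}
```

Because everything inside <...> is skipped, a parenthesis that belongs to a template argument is never mistaken for the start of the argument list. Re-inserting the template arguments (the last step) would require recording the skipped text, which this sketch does not do.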

Source

2012-09-16 13:17:18

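Thanks for your answer. I have followed this direction now, but still using RE. I was trying to avoid writing a parser for the '()' and '<>' pair matching, but that seems to be the better approach rather than RE.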

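It really isn't something regular expressions handle well. Look at the Perl module Text::Balanced. It has the full power of Perl regular expressions at hand, and yet it uses a counting mechanism.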

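Thanks for the hint, David. Simply stripping the enclosing character pairs seems more promising for the analysis. I will post my solution as soon as I am satisfied with the results. I am also trying to extract namespaces, so I at least have to consider looking up class constructor methods to distinguish (nested) classes from namespaces.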

I did the extraction with a simple C++ function.

See the link for the complete code; the idea behind it is:

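There are base-level tokens separated by ::.
If there are N base-level tokens, the first N-1 describe the class name and the last one is the function.
On an opening ( or < we go up one level (+1).
On a closing ) or > we go down one level (-1).
Base level, of course, means level == 0.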

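I have a strong feeling that this cannot be done with a regular expression, because we have an unlimited number of bracket levels. My function is limited to 255 levels; it could be switched to std::stack<char> for unlimited levels.

The function: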

std::vector<std::string> parseCppName(std::string line) 
{ 
    std::vector<std::string> retVal; 
    int level = 0; 
    char closeChars[256]; 

    size_t startPart = 0; 
    for (size_t i = 0; i < line.length(); ++i) 
    { 
     if (line[i] == ':' && level == 0) 
     { 
      if (i + 1 >= line.length() || line[i + 1] != ':') 
      throw std::runtime_error("missing :"); 
      retVal.push_back(line.substr(startPart, i - startPart)); 
      startPart = ++i + 1; 
     } 
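     else if (line[i] == '(' || line[i] == '<') { 
      // closeChars has a fixed size; refuse deeper nesting instead of overflowing it 
      if (level >= static_cast<int>(sizeof(closeChars))) 
       throw std::runtime_error("nesting too deep"); 
      closeChars[level++] = (line[i] == '(') ? ')' : '>'; 
     } 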
     else if (level > 0 && line[i] == closeChars[level - 1]) { 
     --level; 
     } 
     else if (line[i] == '>' || line[i] == ')') { 
     throw std::runtime_error("Extra)>"); 
     } 
    } 
    if (level > 0) 
     throw std::runtime_error("Missing)>"); 
    retVal.push_back(line.substr(startPart)); 
    return retVal; 
}
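The std::stack<char> variant mentioned above might look like the following sketch. It keeps the same splitting logic but replaces the fixed array with a stack, so the nesting depth is bounded only by available memory. The name parseCppNameUnbounded and the error messages are made up for this example.

```cpp
#include <cstddef>
#include <stack>
#include <stdexcept>
#include <string>
#include <vector>

// Variant of parseCppName with std::stack<char> in place of the fixed-size
// array. Splits a demangled name at every "::" that is not nested inside
// (...) or <...>. The function name is hypothetical.
std::vector<std::string> parseCppNameUnbounded(const std::string& line)
{
    std::vector<std::string> parts;
    std::stack<char> closers;               // closing brackets still expected
    std::size_t startPart = 0;

    for (std::size_t i = 0; i < line.length(); ++i)
    {
        const char c = line[i];
        if (c == ':' && closers.empty())
        {
            if (i + 1 >= line.length() || line[i + 1] != ':')
                throw std::runtime_error("missing :");
            parts.push_back(line.substr(startPart, i - startPart));
            ++i;                            // skip the second ':'
            startPart = i + 1;
        }
        else if (c == '(')
            closers.push(')');
        else if (c == '<')
            closers.push('>');
        else if (!closers.empty() && c == closers.top())
            closers.pop();
        else if (c == '>' || c == ')')
            throw std::runtime_error("unexpected closing bracket");
    }
    if (!closers.empty())
        throw std::runtime_error("missing closing bracket");
    parts.push_back(line.substr(startPart));
    return parts;
}
```

For the foo symbol from the question this yields four parts: Ns1, Ns2, the full A<...> instantiation, and foo(...) with its argument list.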

Source

2012-09-16 17:52:26 PiotrNycz

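I agree with your feeling about using regular expressions for this. So far I have developed a parser based on @David Hammen's hints. I may come back to your proposal when I see a need to improve the current solution. Please also note my comment about the namespace extraction use case.

I think it will be hard to distinguish nested classes from namespaces without remembering all lines. After parsing all lines, each N-1 part (as given by my function) names a class. The others are namespaces. But this will be broken by empty classes, I mean classes without functions, c-tors and d-tors. – PiotrNycz

Actually I will store all input lines and look up the constructor symbols to find the "real" classes among them. This will not cover any class that contains only static functions and has no (not even a default) constructor. But that is still fine for my purposes.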
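The idea discussed in these comments, remembering all lines and treating a scope as a class once a constructor or destructor symbol for it shows up, could be sketched as follows. The input is assumed to be symbols already split into their top-level :: components (for example by a function like parseCppName); baseName and collectClasses are hypothetical names. As noted earlier in the thread, a namespace Foo containing a free function Foo would be misreported as a class.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Cut a path component at the first '<' or '(' so that "A<int>" and "A()"
// both reduce to "A". Hypothetical helper for this sketch.
std::string baseName(const std::string& part)
{
    return part.substr(0, part.find_first_of("<("));
}

// Given symbols already split into top-level "::" components, collect the
// qualified names whose last two components look like a constructor
// ("A", "A(...)") or a destructor ("A", "~A()").
std::set<std::string> collectClasses(
    const std::vector<std::vector<std::string> >& symbols)
{
    std::set<std::string> classes;
    for (std::size_t s = 0; s < symbols.size(); ++s)
    {
        const std::vector<std::string>& parts = symbols[s];
        if (parts.size() < 2)
            continue;                       // no enclosing scope at all
        const std::string owner = baseName(parts[parts.size() - 2]);
        const std::string member = baseName(parts.back());
        if (member != owner && member != "~" + owner)
            continue;                       // ordinary function or variable
        std::string qualified;              // join everything but the last part
        for (std::size_t i = 0; i + 1 < parts.size(); ++i)
        {
            if (i > 0)
                qualified += "::";
            qualified += parts[i];
        }
        classes.insert(qualified);
    }
    return classes;
}
```

Any scope prefix that never appears in the resulting set can then be treated as a namespace, with the limitation already mentioned for classes that export no constructor or destructor symbol.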
