Boost Spirit: slow parsing optimization

I'm new to Spirit and to Boost in general. I'm trying to parse a section of a VRML file that looks like this:

point 
      [ 
      #coordinates written in meters. 
      -3.425386e-001 -1.681608e-001 0.000000e+000, 
      -3.425386e-001 -1.642545e-001 0.000000e+000, 
      -3.425386e-001 -1.603483e-001 0.000000e+000,

The comment line starting with # is optional.
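To make the format concrete, here is a minimal hand-written sketch that reads the same triples using only the standard library. It is not the Spirit grammar from this question, and `parse_points` and `skip_filler` are hypothetical names. It assumes that the first occurrence of `point` followed by `[` starts the block, and it is more lenient than a real VRML parser (commas are treated like whitespace). A routine like this may be useful as a timing baseline when judging how slow a grammar really is.

```cpp
#include <cctype>
#include <cstdlib>
#include <string>
#include <vector>

struct Point { double a = 0, b = 0, c = 0; };

// Skip whitespace, commas and '#' comments (a comment runs to end of line).
static const char* skip_filler(const char* p, const char* end) {
    while (p != end) {
        if (*p == '#') {
            while (p != end && *p != '\n') ++p;
        } else if (*p == ',' || std::isspace(static_cast<unsigned char>(*p))) {
            ++p;
        } else {
            break;
        }
    }
    return p;
}

// Hypothetical baseline: find "point", then '[', then read triples of doubles
// until something that is not a number (for example ']') is reached.
std::vector<Point> parse_points(const std::string& text) {
    std::vector<Point> out;
    std::size_t pos = text.find("point");
    if (pos == std::string::npos) return out;
    pos = text.find('[', pos);
    if (pos == std::string::npos) return out;

    const char* p = text.c_str() + pos + 1;
    const char* end = text.c_str() + text.size();
    for (;;) {
        Point pt;
        double* fields[3] = { &pt.a, &pt.b, &pt.c };
        bool complete = true;
        for (double* field : fields) {
            p = skip_filler(p, end);
            char* next = nullptr;
            double value = std::strtod(p, &next);  // c_str() is null-terminated
            if (next == p) { complete = false; break; }  // no number here
            *field = value;
            p = next;
        }
        if (!complete) break;  // an incomplete trailing triple is dropped
        out.push_back(pt);
    }
    return out;
}
```

Calling `parse_points(text)` returns one `Point` per complete triple; parsing stops at the first token that is not a number, such as a closing `]`.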

I've written a grammar that works fine, but parsing takes a long time. I'd like to optimize it to run faster. My code looks like this:

struct Point 
{ 
    double a; 
    double b; 
    double c; 

    Point() : a(0.0), b(0.0), c(0.0){} 
}; 

BOOST_FUSION_ADAPT_STRUCT 
(
    Point, 
    (double, a) 
    (double, b) 
    (double, c) 
) 

namespace qi = boost::spirit::qi; 
namespace repo = boost::spirit::repository; 

template <typename Iterator> 
struct PointParser : 
public qi::grammar<Iterator, std::vector<Point>(), qi::space_type> 
{ 
    PointParser() : PointParser::base_type(start, "PointGrammar") 
    { 
     singlePoint = qi::double_>>qi::double_>>qi::double_>>*qi::lit(","); 
     comment = qi::lit("#")>>*(qi::char_("a-zA-Z.") - qi::eol); 
     prefix = repo::seek[qi::lexeme[qi::skip[qi::lit("point")>>qi::lit("[")>>*comment]]]; 
     start %= prefix>>qi::repeat[singlePoint];  

     //BOOST_SPIRIT_DEBUG_NODES((prefix)(comment)(singlePoint)(start)); 
    } 

    qi::rule<Iterator, Point(), qi::space_type>    singlePoint; 
    qi::rule<Iterator, qi::space_type>      comment; 
    qi::rule<Iterator, qi::space_type>      prefix; 
    qi::rule<Iterator, std::vector<Point>(), qi::space_type> start; 
};

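The section I want to parse is located in the middle of the input text, so I need to skip part of the text to get to it. I used repo::seek to do that. Is this the best way?

I run the parser in the following way: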

std::vector<Point> points; 
typedef PointParser<std::string::const_iterator> pointParser; 
pointParser g2; 

auto start = ch::high_resolution_clock::now(); 
bool r = phrase_parse(Data.begin(), Data.end(), g2, qi::space, points); 
auto end = ch::high_resolution_clock::now(); 

auto duration = ch::duration_cast<boost::chrono::milliseconds>(end - start).count();

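Parsing about 80K entries in the input text takes about 2.5 seconds, which is quite slow for my needs. My question: is there a way to write the parsing rules in a more optimized way so that it runs (much) faster? How can I improve this implementation?

I'm new to Spirit, so some explanation would be much appreciated.

Source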

2015-10-06 Slava C

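If the same code takes anywhere between 36 ms and 2.5 s on a similar (ballpark) system with similar input, my guess is that the latter was built without optimizations. – rubenvb

Not sure about that. I'm using the precompiled build of the Boost 1.59 libraries for VS2013. Can you suggest any optimization flags? –

Boost(.Spirit) is heavily templated, so the precompiled libraries are not what slows you down... why do you think *your* code is compiled too slowly? Spirit is compiled when you use it, not up front. You need to make sure you are timing a release build of your code. – rubenvb

I've hooked your grammar into a Nonius benchmark and generated uniformly random input data of about 85k lines (download: http://stackoverflow-sehe.s3.amazonaws.com/input.txt, 7.4 MB).

Are you measuring the time in a release build?
Are you using slow file input?

When reading the file up-front, I consistently get a time of ~36 ms to parse the whole lot.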
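The two questions above can be checked directly. The sketch below, using only the standard library, reports whether the translation unit was compiled with `NDEBUG` (commonly defined in release configurations, though that depends on the build setup), reads the whole file into memory before the clock starts, and times only the work passed to it. The helper names `build_kind`, `read_file`, and `elapsed_ms` are hypothetical and are not part of Boost or Nonius.

```cpp
#include <chrono>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Rough indicator only: NDEBUG is usually defined by release configurations,
// but this is an assumption about the build setup, not a guarantee.
std::string build_kind() {
#ifdef NDEBUG
    return "release-like";
#else
    return "debug-like";
#endif
}

// Read the whole file into memory so that disk I/O stays outside the timing.
std::string read_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}

// Time a single call of `work` in milliseconds using a monotonic clock.
template <typename F>
double elapsed_ms(F&& work) {
    auto t0 = std::chrono::steady_clock::now();
    work();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

For example, `auto text = read_file("input.txt");` followed by `elapsed_ms([&] { /* run phrase_parse over text */ })` keeps file I/O out of the measurement. A single run like this is much noisier than the repeated sampling a benchmark framework performs, so treat it as a sanity check only.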

clock resolution: mean is 17.616 ns (40960002 iterations) 

benchmarking sample 
collecting 100 samples, 1 iterations each, in estimated 3.82932 s 
mean: 36.0971 ms, lb 35.9127 ms, ub 36.4456 ms, ci 0.95 
std dev: 1252.71 μs, lb 762.716 μs, ub 2.003 ms, ci 0.95 
found 6 outliers among 100 samples (6%) 
variance is moderately inflated by outliers

Code: see below.

Notes:

You seem to be conflicted about using the skipper and seek together. I suggest you simplify prefix:
```
comment  = '#' >> *(qi::char_ - qi::eol); 

prefix  = repo::seek[ 
        qi::lit("point") >> '[' >> *comment 
       ]; 
```
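prefix will use the space skipper and ignore any matched attributes (because of the rule's declared type). Make comment implicitly a lexeme by dropping the skipper from the rule declaration: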
```
// implicit lexeme: 
    qi::rule<Iterator>      comment; 
```
Note: for more background information, see Boost spirit skipper issues.

Live On Coliru

#include <boost/fusion/adapted/struct.hpp> 
#include <boost/spirit/include/qi.hpp> 
#include <boost/spirit/repository/include/qi_seek.hpp> 

namespace qi = boost::spirit::qi; 
namespace repo = boost::spirit::repository; 

struct Point { double a = 0, b = 0, c = 0; }; 

BOOST_FUSION_ADAPT_STRUCT(Point, a, b, c) 

template <typename Iterator> 
struct PointParser : public qi::grammar<Iterator, std::vector<Point>(), qi::space_type> 
{ 
    PointParser() : PointParser::base_type(start, "PointGrammar") 
    { 
     singlePoint = qi::double_ >> qi::double_ >> qi::double_ >> *qi::lit(','); 

     comment  = '#' >> *(qi::char_ - qi::eol); 

     prefix  = repo::seek[ 
      qi::lit("point") >> '[' >> *comment 
      ]; 

     //prefix  = repo::seek[qi::lexeme[qi::skip[qi::lit("point")>>qi::lit("[")>>*comment]]]; 

     start  %= prefix >> *singlePoint; 

     //BOOST_SPIRIT_DEBUG_NODES((prefix)(comment)(singlePoint)(start)); 
    } 

    private: 
    qi::rule<Iterator, Point(), qi::space_type>    singlePoint; 
    qi::rule<Iterator, std::vector<Point>(), qi::space_type> start; 
    qi::rule<Iterator, qi::space_type>      prefix; 
    // implicit lexeme: 
    qi::rule<Iterator> comment; 
}; 

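#include <nonius/benchmark.h++> 
#include <nonius/main.h++> 
#include <boost/iostreams/device/mapped_file.hpp> 
#include <algorithm>  // std::min 
#include <cassert>    // assert 
#include <iostream>   // std::cout 
#include <string> 
#include <vector> 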

static boost::iostreams::mapped_file_source src("input.txt"); 

NONIUS_BENCHMARK("sample", [](nonius::chronometer cm) { 
    std::vector<Point> points; 

    using It = char const*; 
    PointParser<It> g2; 

    cm.measure([&](int) { 
     It f = src.begin(), l = src.end(); 
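     return phrase_parse(f, l, g2, qi::space, points); 

     // NOTE: everything below is unreachable because of the return above; 
     // it is diagnostic code for checking the parse result by hand. 
     bool ok = phrase_parse(f, l, g2, qi::space, points); 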
     if (ok) 
      std::cout << "Parsed " << points.size() << " points\n"; 
     else 
      std::cout << "Parsed failed\n"; 

     if (f!=l) 
      std::cout << "Remaining unparsed input: '" << std::string(f,std::min(f+30, l)) << "'\n"; 

     assert(ok); 
    }); 
})

Output from another run, live:

http://stackoverflow-sehe.s3.amazonaws.com/30dd790b-8b52-4eab-a130-8d6896207b2f.html (click to show all individual samples)

Source

2015-10-06 12:43:40 sehe

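I've put the recorded live stream (minus the first 13 minutes...) here: https://www.livecoding.tv/video/another-spirit-grammar-benchmark-nonius/ ([experiment](http://chat.stackoverflow.com/transcript/10?m=24182469#24182469)) – sehe

Thank you very much for your reply. I am measuring the time in a **debug** build. I'm not sure what you mean by "reading the file up-front". I read the file using **boost::iostreams::mapped_file_source** and pass the data to the parser via **std::string**. I've stripped out all rules except **singlePoint**, and I still get about 2.5 seconds. –

I tried removing the skipper from the _comment_ rule and simplifying the rules as you suggested, but the parser doesn't work correctly. What am I doing wrong here? –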
