2015-02-11 58 views
1

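Remove HTML tags before indexing into elasticsearch

I am working on indexing HTML documents using Nodejs. However, even before involving Nodejs, I tried to run the manual indexing shown below, and it does not seem to work. What am I missing?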

Index a sample HTML tag using the html_strip filter:

curl -XPOST 'localhost:9200/bhs/articles/_analyzer?tokenizer=standard&char_filters=html_strip' -d ' 
{ 
    "content" : "<title>Dilip Kumar</title>" 
}' 

Search to get all documents:

http://localhost:9200/bhs/articles/_search 

It gives the following result:

{
    "took": 4,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 1,
        "hits": [
            {
                "_index": "bhs",
                "_type": "articles",
                "_id": "AUt2TGl9aadd5iLJ3mue",
                "_score": 1,
                "_source": {
                    "content": "<title>Dilip Kumar</title>"
                }
            }
        ]
    }
}

Ideally, the tags should not be indexed, because I used html_strip to remove them.

+0

I am asking in the context of elasticsearch, not JavaScript. – joy 2015-02-11 02:04:44

+0

I see that the tags are being indexed as well, so when I search for "title" this document comes back as a result. It seems I am missing something basic. – joy 2015-02-11 02:23:13

+0

What is the mapping for your articles type? Have you told it to use a custom analyzer? – 2015-02-11 17:31:05

Answer

0

What you see in the returned search results is the stored content, i.e. the _source, not the individual terms that were indexed.

Seeing what has actually been indexed is more challenging: indexed terms are not designed to be returned to the user, only to be used for lookups.

However, you can access and view them with a script:

curl 'http://localhost:9200/bhs/articles/_search?pretty=true' -d '{
    "query" : { "match_all" : { } },
    "script_fields": {
        "terms" : {
            "script": "doc[field].values",
            "params": { "field": "content" }
        }
    }
}'

Source: https://stackoverflow.com/q/28460670 2015-02-11 17:29:49

+0

Thanks for explaining _source. I do not want to index the tags, i.e. <title>. Currently I can search using the word "title", and I do not want "title" to match just because it is part of <title>. How should I index the content without the HTML tags? – joy 2015-02-12 06:09:02

+0

What is the mapping for your articles type? Have you told it to use a custom analyzer? – 2015-02-12 08:13:13

+0

I created two posts by mistake, because I did not realize both were about the same problem. Could you check the post below for the mapping I am using? http://stackoverflow.com/questions/28445684/why-html-tag-is-searchable-even-if-it-was-filtered-in-elastic-search/28446814?noredirect=1#comment45231786_28446814 – joy 2015-02-12 16:44:50
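For anyone landing here with the same question: analysis is configured in the index settings and the field mapping, so html_strip only takes effect once the content field is mapped to an analyzer that includes it. Below is a minimal sketch of such a request body, built in Python so it can be inspected before sending. The analyzer name html_text and both helper functions are hypothetical, and the body assumes the Elasticsearch 1.x syntax in use when this thread was written (mapping types and "type": "string"); newer releases use "text" and no mapping types. approximate_terms is only a rough regex stand-in for analysis, not the real html_strip implementation; it is there to show why "title" matches when the tags are not stripped.

```python
import json
import re

# Hypothetical analyzer name; any name works as long as the mapping refers to it.
ANALYZER_NAME = "html_text"


def build_index_body(type_name="articles", field="content"):
    """Build settings and a mapping that attach an html_strip analyzer to one field.

    Assumes Elasticsearch 1.x syntax (mapping types, "type": "string").
    """
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    ANALYZER_NAME: {
                        "type": "custom",
                        "char_filter": ["html_strip"],
                        "tokenizer": "standard",
                        "filter": ["lowercase"],
                    }
                }
            }
        },
        "mappings": {
            type_name: {
                "properties": {
                    field: {"type": "string", "analyzer": ANALYZER_NAME}
                }
            }
        },
    }


def approximate_terms(text, strip_html):
    """Rough stand-in for analysis: optionally drop tags, then lowercase word tokens."""
    if strip_html:
        text = re.sub(r"<[^>]+>", " ", text)
    return [token.lower() for token in re.findall(r"\w+", text)]


if __name__ == "__main__":
    # This JSON would be the body of the index-creation request (PUT /bhs).
    print(json.dumps(build_index_body(), indent=2))
    doc = "<title>Dilip Kumar</title>"
    print(approximate_terms(doc, strip_html=False))
    print(approximate_terms(doc, strip_html=True))
```

The analyzer of an existing field cannot be changed in place, so the index would need to be created with this body and the documents indexed again. The _source returned by a search will still contain the original HTML; only the indexed terms change.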