2016-02-24 33 views

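Solr returns error 500 for a term vector query

In the Solr query web interface I want to get term vector values, for example the terms with the highest term frequency.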

To do this, I use the query http://domain/tvrh?q=text:[* TO *]&wt=json&indent=true&tv.all=true&terms.fl=text

This query returns the following error:

"termVectors": [ 
"uniqueKeyFieldName", 
"_id", 
"14708d4c-7145-46b7-98d0-727baff35ab9", 
[ 
    "uniqueKey", 
    "14708d4c-7145-46b7-98d0-727baff35ab9" 
] 
], 
    "error": { 
"trace": "java.lang.NullPointerException 
at org.apache.solr.handler.component.TermVectorComponent.process(TermVectorComponent.java:329) 
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:277) 
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143) 
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068) 
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669) 
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462) 
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:214) 
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179) 
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) 
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) 
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) 
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577) 
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223) 
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127) 
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) 
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) 
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061) 
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) 
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215) 
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110) 
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) 
at org.eclipse.jetty.server.Server.handle(Server.java:499) 
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310) 
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257) 
at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540) 
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635) 
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555) 
at java.lang.Thread.run(Thread.java:745) 
", 
      "code": 500 

Any ideas?

Edit:

The description field in my schema.xml:

<field name="description" 
type="text_general" 
indexed="true" 
stored="true" 
multiValued="true" 
termVectors="true" 
termPositions="true" 
termOffsets="true"/> 


<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100"> 
    <analyzer type="index"> 
    <tokenizer class="solr.StandardTokenizerFactory"/> 
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stop-words-all-sorted.txt" /> 
    <!-- in this example, we will only use synonyms at query time 
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/> 
    --> 
    <filter class="solr.LowerCaseFilterFactory"/> 
    </analyzer> 
    <analyzer type="query"> 
    <tokenizer class="solr.StandardTokenizerFactory"/> 
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stop-words-all-sorted.txt" /> 
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/> 
    <filter class="solr.LowerCaseFilterFactory"/> 
    </analyzer> 
</fieldType> 

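Following Karsten's suggestion, I used /terms to get term frequencies independent of any query: http://localhost:8983/solr/core/terms?wt=json&indent=true&terms.fl=description

I now get term frequencies, but the response shows the whole stored text instead of individual terms.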

Answer


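You are getting a NullPointerException because IndexReader#getTermVectors returned null, so most likely you did not index term vectors (termVectors="true").

You can enable term vectors in the field definition in schema.xml (documents indexed before the change have to be re-indexed for it to take effect). For example: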

<field name="includes" 
    type="text_general" 
    indexed="true" 
    stored="true" 
    multiValued="true" 
    termVectors="true" 
    termPositions="true" 
    termOffsets="true" /> 
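
Once term vectors are indexed, the terms with the highest frequency can be read from the /tvrh response. The following is a minimal Python sketch, not a definitive implementation. It assumes the response keeps the flat `[key, value, key, value, ...]` list layout visible in the error output from the question, and that each term carries a `tf` entry. The base URL, the helper names, and the `tv.tf` choice are assumptions for illustration. Note that the Term Vector Component selects fields with `tv.fl`; `terms.fl` belongs to the Terms Component.

```python
from urllib.parse import urlencode


def build_tvrh_url(base_url, field, query="*:*"):
    """Build a /tvrh request URL. base_url is a hypothetical core URL."""
    params = {
        "q": query,
        "wt": "json",
        "indent": "true",
        "tv": "true",
        "tv.fl": field,    # field whose term vectors are returned
        "tv.tf": "true",   # include the term frequency for each term
    }
    return base_url.rstrip("/") + "/tvrh?" + urlencode(params)


def flat_to_dict(flat):
    """Turn a flat [key, value, key, value, ...] list into a dict."""
    return dict(zip(flat[0::2], flat[1::2]))


def top_terms(response, field, n=5):
    """Return, per document id, the n (term, tf) pairs with the highest tf."""
    result = {}
    docs = flat_to_dict(response["termVectors"])
    for doc_id, doc_entry in docs.items():
        if not isinstance(doc_entry, list):
            continue  # skip scalar entries such as "uniqueKeyFieldName"
        fields = flat_to_dict(doc_entry)
        terms = flat_to_dict(fields.get(field, []))
        scored = [
            (term, flat_to_dict(stats).get("tf", 0))
            for term, stats in terms.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        result[doc_id] = scored[:n]
    return result
```

A document without term vectors for the requested field simply yields an empty list here, which makes a missing termVectors flag easy to spot.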

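By the way:

If you want document frequencies that are not restricted to a query, you should use the Terms Component.

If you want document frequencies restricted to the documents matching a query, you can use faceting (with "text" as the facet field).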
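
The two options above can be sketched in Python as follows. This is only an illustration under assumptions: the base URL and helper names are hypothetical, the handlers are assumed to be registered at /terms and /select, and the JSON responses are assumed to use Solr's default flat `[term, count, term, count, ...]` list layout (found under `terms` for the Terms Component and under `facet_counts` / `facet_fields` for faceting).

```python
from urllib.parse import urlencode


def build_terms_url(base_url, field, limit=10):
    """Terms Component: document frequencies over the whole index."""
    params = {
        "wt": "json",
        "terms": "true",
        "terms.fl": field,
        "terms.limit": limit,
    }
    return base_url.rstrip("/") + "/terms?" + urlencode(params)


def build_facet_url(base_url, field, query, limit=10):
    """Faceting: counts restricted to the documents matching the query."""
    params = {
        "q": query,
        "rows": 0,  # only the facet counts are needed, not the documents
        "wt": "json",
        "facet": "true",
        "facet.field": field,
        "facet.limit": limit,
    }
    return base_url.rstrip("/") + "/select?" + urlencode(params)


def pair_counts(flat):
    """Turn a flat [term, count, term, count, ...] list into (term, count) pairs."""
    return list(zip(flat[0::2], flat[1::2]))
```

For example, `pair_counts(response["terms"]["description"])` would give the (term, count) pairs from a /terms response, and the same helper applies to `response["facet_counts"]["facet_fields"]["text"]` for the facet request.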

Karsten, for the `text` field the Schema Browser shows the following flags: Indexed, Tokenized, Stored, Multivalued – Jakub

@Jakub, I added an example of how to add the missing termVectors flag. –

Thanks, Karsten. – Jakub