Error while running the Solr index
I am using the command below to run solrindex on the data crawled by Nutch:
bin/nutch solrindex http://127.0.0.1:8983/solr/ /app/hadoop/tmp/crawled_pages/crawldb -linkdb /app/hadoop/tmp/crawled_pages/linkdb /app/hadoop/tmp/crawled_pages/segments/*
I get the error below, and I have not been able to find the root cause of the problem.
org.apache.solr.common.SolrException: ERROR: [doc=http://www.bbc.co.uk/portugueseafrica/arquivo/index.shtml] unknown field 'cache'
ERROR: [doc=http://www.bbc.co.uk/portugueseafrica/arquivo/index.shtml] unknown field 'cache'
request: http://127.0.0.1:8983/solr/update?wt=javabin&version=2
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
at org.apache.nutch.indexer.solr.SolrWriter.write(SolrWriter.java:124)
at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:55)
at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:44)
at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.write(ReduceTask.java:457)
at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:497)
at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:195)
at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:51)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:519)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
2012-12-10 10:05:49,198 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
Has anyone had a similar problem?
I do not understand what the root cause of the following error is:
org.apache.solr.common.SolrException: ERROR: [doc=http://www.bbc.co.uk/portugueseafrica/arquivo/index.shtml] unknown field 'cache'
Where is the 'cache' field? It looks like the problem is in your schema; please check it. – luchosrock
When I look at the page http://www.bbc.co.uk/portugueseafrica/arquivo/index.shtml, it is there in the HTML code. Other than that, I have no clue. – Swamy
Do you mean schema.xml in the Solr configuration? – Swamy
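
For readers following the schema suggestion in the comments: Solr rejects a document with "unknown field" when the document carries a field name that the schema.xml of the target core does not declare. A minimal sketch of what such a declaration could look like is shown here. The attribute values (`type="string"`, `stored="true"`, `indexed="false"`) are assumptions chosen for illustration, not taken from the thread, and the `string` field type is assumed to be defined elsewhere in the same schema.xml.

```xml
<!-- Hypothetical fragment of Solr's conf/schema.xml (assumed layout). -->
<fields>
  <!-- ... existing field declarations stay as they are ... -->

  <!-- Declares the 'cache' field named in the error message.
       Type and stored/indexed flags are assumptions; adjust to your needs. -->
  <field name="cache" type="string" stored="true" indexed="false"/>
</fields>
```

Solr typically has to be restarted (or the core reloaded) after schema.xml changes before re-running the `bin/nutch solrindex` command. It may also be worth comparing the schema.xml that Solr is using with the one shipped in the Nutch `conf` directory, since a mismatch between the two is one plausible source of this kind of error.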