我正在使用nutch 2.1並爬行一個網站。問題在於抓取程序不斷顯示抓取url spinwaiting/active,並且由於抓取需要很多時間,所以與mysql的連接會超時。我怎樣才能減少一次提取數量,以便mysql不會超時?在nutch中有一個設置,我可以說只能獲取100或500個URL,然後解析並存儲到mysql,然後再次獲取下一個100或500個URL?nutch爬行陷入spinwaiting或活動。如何減少獲取週期?
錯誤消息:
Unexpected error for http://www.example.com
java.io.IOException: java.sql.BatchUpdateException: The last packet successfully received from the server was 36,928,172 milliseconds ago. The last packet sent successfully to the server was 36,928,172 milliseconds ago. is longer than the server configured value of 'wait_timeout'. You should consider either expiring and/or testing connection validity before use in your application, increasing the server configured values for client timeouts, or using the Connector/J connection property 'autoReconnect=true' to avoid this problem.
at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:340)
at org.apache.gora.mapreduce.GoraRecordWriter.write(GoraRecordWriter.java:65)
at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:587)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at org.apache.nutch.fetcher.FetcherReducer$FetcherThread.output(FetcherReducer.java:663)
at org.apache.nutch.fetcher.FetcherReducer$FetcherThread.run(FetcherReducer.java:534)
Caused by: java.sql.BatchUpdateException: The last packet successfully received from the server was 36,928,172 milliseconds ago. The last packet sent successfully to the server was 36,928,172 milliseconds ago. is longer than the server configured value of 'wait_timeout'. You should consider either expiring and/or testing connection validity before use in your application, increasing the server configured values for client timeouts, or using the Connector/J connection property 'autoReconnect=true' to avoid this problem.
at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:2028)
at com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1451)
at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:328)
... 5 more
Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: The last packet successfully received from the server was 36,928,172 milliseconds ago. The last packet sent successfully to the server was 36,928,172 milliseconds ago. is longer than the server configured value of 'wait_timeout'. You should consider either expiring and/or testing connection validity before use in your application, increasing the server configured values for client timeouts, or using the Connector/J connection property 'autoReconnect=true' to avoid this problem.
at sun.reflect.GeneratedConstructorAccessor49.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1116)
at com.mysql.jdbc.MysqlIO.send(MysqlIO.java:3364)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1983)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2163)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2624)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2127)
at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2427)
at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:1980)
... 7 more
Caused by: java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at com.mysql.jdbc.MysqlIO.send(MysqlIO.java:3345)
... 13 more