Ruby Google request fails
I'm creating a Google dork tool that sends a URL-encoded query to google.com and returns the results as an array of links.
#!/usr/bin/env ruby
require 'cgi'
require 'socket'
# define full path to library
cwd = File.expand_path(File.dirname(__FILE__))
lib = File.join(cwd, "lib")
# require project library files
Dir.new(lib).each do |x|
  next unless x[/\.rb$/]
  begin
    require File.join(lib, x)
  rescue LoadError
    # a bare rescue only catches StandardError, which LoadError is not
    raise LoadError, "Failed to load #{x}."
  end
end
# build the google dork
def query(ext, site, inurl, intitle, intext)
  query = ""
  dorks = %w(ext site inurl intitle intext)
  values = [ext, site, inurl, intitle, intext]
  dorks.zip(values).each do |dork, value|
    next if value.nil?
    # quote the value for the in* operators (inurl, intitle, intext)
    value = %Q("#{value}") if dork.match(/^in/)
    # every term is followed by a space, so the result ends with one
    query += "#{dork}:#{value} "
  end
  query
end
# sends the search query to google.com
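def search(host, query, agent)
  sock, links = TCPSocket.new(host, 80), []
  # query ends with a space, which escapes to "+"; chop removes it
  query = CGI::escape(query).chop
  # the agent argument is currently unused; the variant with headers is disabled:
  # "GET /search?q=#{query} HTTP/1.0\r\nUser-Agent: #{agent}\r\nConnection: Close\r\n\r\n"
  request = "GET /search?q=#{query} HTTP/1.0\r\n\r\n"
  sock.puts request
  response = sock.read
  sock.close
  body = response.split("\r\n\r\n", 2)[1]
  body.split("url?q=").each do |link|
    # keep only the part before the first "&"; to_s guards against an empty chunk
    link = link.to_s.split("&", 0)[0].to_s
    links << link if link.match(/^https?/) && link !~ /^http:\/\/webcache/
  end
  links
end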
agent = RandomAgent.new
host = "google.com"
q = query(ARGV[0], ARGV[1], ARGV[2], ARGV[3], ARGV[4])
puts search(host, q, agent.randomize)
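To check the request line without touching the network, the encoding step can be isolated. This is only a sketch: `build_dork` and `request_line` are hypothetical names that mirror what `query` and `search` do above, not part of the script itself.

```ruby
require 'cgi'

# Hypothetical helper mirroring the script's query method: pairs each
# operator with its value and quotes the in* operators.
def build_dork(ext, site, inurl = nil, intitle = nil, intext = nil)
  dorks  = %w(ext site inurl intitle intext)
  values = [ext, site, inurl, intitle, intext]
  dorks.zip(values).reject { |_, v| v.nil? }.map do |dork, v|
    v = %Q("#{v}") if dork.start_with?("in")
    "#{dork}:#{v}"
  end.join(" ")
end

# Hypothetical helper: builds the request line only, nothing is sent.
def request_line(query)
  "GET /search?q=#{CGI.escape(query)} HTTP/1.0"
end

puts request_line(build_dork("pdf", "github.com", "email"))
# prints: GET /search?q=ext%3Apdf+site%3Agithub.com+inurl%3A%22email%22 HTTP/1.0
```

Because the terms are joined with single spaces here, no trailing character needs to be chopped after escaping.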
For some reason I haven't figured out, the request works if I send it manually, but when my Ruby script sends it, it returns an HTTP 302 response. For example:
GET /search?q=ext%3Apdf+site%3Agithub.com+inurl%3A%22email%22 HTTP/1.0
This is the request my script generates. When the script sends it, I get an HTTP 302 response. If I send the same request manually with nc, results are returned.
nc google.com 80
GET /search?q=ext%3Apdf+site%3Agithub.com+inurl%3A%22email%22 HTTP/1.0
On top of that, if I send only this:
GET /search?q=ext%3Apdf+site%3Agithub.com HTTP/1.0
it works. The third parameter seems to break it for some reason, and I can't figure out why. Thanks.
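One way to narrow this down might be to look at where the 302 points, since a 302 is a redirect and normally carries a Location header. The sketch below assumes the raw response is available as a string, as returned by `sock.read`. `redirect_info` is a hypothetical helper name, and the sample response is made up for illustration, not captured from google.com.

```ruby
# Hypothetical helper: pulls the status code and Location header out of a
# raw HTTP response string.
def redirect_info(response)
  head = response.split("\r\n\r\n", 2)[0].to_s
  lines = head.split("\r\n")
  # status line looks like "HTTP/1.0 302 Found"
  status = lines.first.to_s[/\AHTTP\/\d\.\d (\d{3})/, 1]
  # header names are case-insensitive, hence the /i flag
  location = lines.drop(1).map { |l| l[/\ALocation:\s*(.*)\z/i, 1] }.compact.first
  [status && status.to_i, location]
end

# Made-up response for illustration only.
sample = "HTTP/1.0 302 Found\r\nLocation: http://example.com/elsewhere\r\nContent-Length: 0\r\n\r\n"
status, location = redirect_info(sample)
puts "status=#{status} location=#{location}"
# prints: status=302 location=http://example.com/elsewhere
```

Calling something like this on `response` before the body is split would show the redirect target for the three-term query, which can then be compared with what the two-term query returns.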