2012-11-02 43 views

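wget --mirror does not create the Varnish cache

I have Varnish Cache installed on port 80 of my server, with Apache as the content server on port 8080. If I run wget --mirror example.com, it should crawl my whole site and populate the Varnish cache, right? It doesn't. For example, when I run wget --mirror example.com, I can see in the output that it fetched example.com/abc.html. But when I then visit example.com/abc.html in my browser, the response headers show a Varnish MISS (and the page takes a long time to load). If I request the same URL in the browser a second time, the cache has been populated by then, because the response headers show a Varnish HIT.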

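Here is another interesting fact: if I run just wget example.com/abc.html, it does populate the Varnish cache for that page! And one more: if I run wget --mirror example.com/abc.html, it populates the cache for abc.html but for no other pages.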

So, for some reason, wget --mirror example.com populates the Varnish cache for the home page but not for any other page.
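One way to compare what wget and the browser each get back is to pull the cache status out of the response headers from the command line. The sketch below assumes the status is exposed in an X-Cache header (the VCL further down only sets it when the backend response carries X-Cache-Debug); the function name cache_status is made up for illustration.

```shell
# Print the X-Cache value (HIT or MISS) found in HTTP response headers on stdin.
# Prints "unknown" when no X-Cache header is present.
# Accepts both raw headers and the indented form printed by "wget -S".
cache_status() {
    awk -F': *' '
        { sub(/^[ \t]+/, ""); sub(/\r$/, "") }      # trim indent and trailing CR
        tolower($1) == "x-cache" { print $2; found = 1; exit }
        END { if (!found) print "unknown" }
    '
}

# Hypothetical usage against the site from the question:
#   wget -S --spider --header='Accept-Encoding: gzip' \
#        http://example.com/abc.html 2>&1 | cache_status
```

Running it once without and once with the Accept-Encoding header shows whether the two kinds of request share a cache entry.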

I am using Magento, if that makes any difference.

I have tried: wget --mirror --no-http-keep-alive example.com, but it doesn't work.

Here is my Varnish VCL:

# This is a basic VCL configuration file for PageCache powered by Varnish for Magento module. 

# default backend definition. Set this to point to your content server. 
backend default { 
    .host = "127.0.0.1"; 
    .port = "8080"; 
} 

# admin backend with longer timeout values. Set this to the same IP & port as your default server. 
backend admin { 
    .host = "127.0.0.1"; 
    .port = "8080"; 
    .first_byte_timeout = 18000s; 
    .between_bytes_timeout = 18000s; 
} 

# add your Magento server IP to allow purges from the backend 
acl purge { 
    "localhost"; 
    "127.0.0.1"; 
} 


sub vcl_recv { 
    if (client.ip ~ purge) { 
     set req.hash_always_miss = true; 
    } 

if (req.restarts == 0) { 
    if (req.http.x-forwarded-for) { 
     set req.http.X-Forwarded-For = 
     req.http.X-Forwarded-For + ", " + client.ip; 
    } else { 
     set req.http.X-Forwarded-For = client.ip; 
    } 
} 

if (req.request != "GET" && 
    req.request != "HEAD" && 
    req.request != "PUT" && 
    req.request != "POST" && 
    req.request != "TRACE" && 
    req.request != "OPTIONS" && 
    req.request != "DELETE" && 
    req.request != "PURGE") { 
    /* Non-RFC2616 or CONNECT which is weird. */ 
    return (pipe); 
} 

# purge request 
if (req.request == "PURGE") { 
    if (!client.ip ~ purge) { 
     error 405 "Not allowed."; 
    } 
    ban("obj.http.X-Purge-Host ~ " + req.http.X-Purge-Host + " && obj.http.X-Purge-URL ~ " + req.http.X-Purge-Regex + " && obj.http.Content-Type ~ " + req.http.X-Purge-Content-Type); 
    error 200 "Purged."; 
} 

# switch to admin backend configuration 
if (req.http.cookie ~ "adminhtml=") { 
    set req.backend = admin; 
} 

# we only deal with GET and HEAD by default  
if (req.request != "GET" && req.request != "HEAD") { 
    return (pass); 
} 

# normalize url in case of leading HTTP scheme and domain 
set req.url = regsub(req.url, "^http[s]?://[^/]+", ""); 

# static files are always cacheable. remove SSL flag and cookie 
if (req.url ~ "^/(media|js|skin)/.*\.(png|jpg|jpeg|gif|css|js|swf|ico)$") { 
    unset req.http.Https; 
    unset req.http.Cookie; 
} 

# not cacheable by default 
if (req.http.Authorization || req.http.Https) { 
    return (pass); 
} 

# do not cache any page from 
# - index files 
# - ... 
if (req.url ~ "^/(index)") { 
    return (pass); 
} 

# as soon as we have a NO_CACHE cookie pass request 
if (req.http.cookie ~ "NO_CACHE=") { 
    return (pass); 
} 

# normalize Accept-Encoding header 
# http://varnish.projects.linpro.no/wiki/FAQ/Compression 
if (req.http.Accept-Encoding) { 
    if (req.url ~ "\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|swf|flv)$") { 
     # No point in compressing these 
     remove req.http.Accept-Encoding; 
    } elsif (req.http.Accept-Encoding ~ "gzip") { 
     set req.http.Accept-Encoding = "gzip"; 
    } elsif (req.http.Accept-Encoding ~ "deflate" && req.http.user-agent !~ "MSIE") { 
     set req.http.Accept-Encoding = "deflate"; 
    } else { 
      # unknown algorithm 
      remove req.http.Accept-Encoding; 
     } 
    } 

    # remove Google gclid parameters 
    set req.url = regsuball(req.url,"\?gclid=[^&]+$",""); # strips when QS = "?gclid=AAA" 
    set req.url = regsuball(req.url,"\?gclid=[^&]+&","?"); # strips when QS = "?gclid=AAA&foo=bar" 
    set req.url = regsuball(req.url,"&gclid=[^&]+",""); # strips when QS = "?foo=bar&gclid=AAA" or QS = "?foo=bar&gclid=AAA&bar=baz" 

    return (lookup); 
} 

# sub vcl_pipe { 
#  # Note that only the first request to the backend will have 
#  # X-Forwarded-For set. If you use X-Forwarded-For and want to 
#  # have it set for all requests, make sure to have: 
#  # set bereq.http.connection = "close"; 
#  # here. It is not set by default as it might break some broken web 
#  # applications, like IIS with NTLM authentication. 
#  return (pipe); 
# } 
# 
# sub vcl_pass { 
#  return (pass); 
# } 
# 
sub vcl_hash { 
    hash_data(req.url); 
    if (req.http.host) { 
     hash_data(req.http.host); 
    } else { 
     hash_data(server.ip); 
    } 
    if (!(req.url ~ "^/(media|js|skin)/.*\.(png|jpg|jpeg|gif|css|js|swf|ico)$")) { 
     call design_exception; 
    } 
    return (hash); 
} 
# 
# sub vcl_hit { 
#  return (deliver); 
# } 
# 
# sub vcl_miss { 
#  return (fetch); 
# } 

sub vcl_fetch { 
    if (beresp.status == 500) { 
     set beresp.saintmode = 10s; 
     return (restart); 
    } 
    set beresp.grace = 5m; 

    # add ban-lurker tags to object 
    set beresp.http.X-Purge-URL = req.url; 
    set beresp.http.X-Purge-Host = req.http.host; 

    if (beresp.status == 200 || beresp.status == 301 || beresp.status == 404) { 
     if (beresp.http.Content-Type ~ "text/html" || beresp.http.Content-Type ~ "text/xml") { 
      if ((beresp.http.Set-Cookie ~ "NO_CACHE=") || (beresp.ttl < 1s)) { 
       set beresp.ttl = 0s; 
       return (hit_for_pass); 
      } 

      # marker for vcl_deliver to reset Age: 
      set beresp.http.magicmarker = "1"; 

      # Don't cache cookies 
      unset beresp.http.set-cookie; 
      } else { 
       # set default TTL value for static content 
       set beresp.ttl = 4h; 
      } 
     return (deliver); 
    } 

    return (hit_for_pass); 
} 

sub vcl_deliver { 
    # debug info 
    if (resp.http.X-Cache-Debug) { 
     if (obj.hits > 0) { 
      set resp.http.X-Cache = "HIT"; 
      set resp.http.X-Cache-Hits = obj.hits; 
     } else { 
      set resp.http.X-Cache = "MISS"; 
     } 
     set resp.http.X-Cache-Expires = resp.http.Expires; 
    } else { 
     # remove Varnish/proxy header 
     remove resp.http.X-Varnish; 
     remove resp.http.Via; 
     remove resp.http.Age; 
     remove resp.http.X-Purge-URL; 
     remove resp.http.X-Purge-Host; 
    } 

    if (resp.http.magicmarker) { 
     # Remove the magic marker 
     unset resp.http.magicmarker; 

     set resp.http.Cache-Control = "no-store, no-cache, must-revalidate, post-check=0, pre-check=0"; 
     set resp.http.Pragma = "no-cache"; 
     set resp.http.Expires = "Mon, 31 Mar 2008 10:00:00 GMT"; 
     set resp.http.Age = "0"; 
    } 
} 

# sub vcl_error { 
#  set obj.http.Content-Type = "text/html; charset=utf-8"; 
#  set obj.http.Retry-After = "5"; 
#  synthetic {" 
# <?xml version="1.0" encoding="utf-8"?> 
# <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
# "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> 
# <html> 
# <head> 
#  <title>"} + obj.status + " " + obj.response + {"</title> 
# </head> 
# <body> 
#  <h1>Error "} + obj.status + " " + obj.response + {"</h1> 
#  <p>"} + obj.response + {"</p> 
#  <h3>Guru Meditation:</h3> 
#  <p>XID: "} + req.xid + {"</p> 
#  <hr> 
#  <p>Varnish cache server</p> 
# </body> 
# </html> 
# "}; 
#  return (deliver); 
# } 
# 
# sub vcl_init { 
# return (ok); 
# } 
# 
# sub vcl_fini { 
# return (ok); 
# } 

sub design_exception { 
} 
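
To reason about which Accept-Encoding value a given client ends up with after vcl_recv, the normalization rules above can be mirrored in plain shell. This is only a sketch for thinking through the rules: the function name normalize_accept_encoding is made up, it does not talk to Varnish, and whether different values really lead to separate cache variants depends on the backend sending Vary: Accept-Encoding and on the Varnish gzip settings.

```shell
# Mirror of the Accept-Encoding normalization in vcl_recv.
# $1 = request URL path, $2 = Accept-Encoding header, $3 = User-Agent (optional).
# Prints the normalized header value; prints nothing when the header is removed.
normalize_accept_encoding() {
    url=$1 ae=$2 ua=${3:-}
    # No header sent: nothing to normalize.
    [ -n "$ae" ] || return 0
    # Already-compressed file types: the header is removed.
    case $url in
        *.jpg|*.png|*.gif|*.gz|*.tgz|*.bz2|*.tbz|*.mp3|*.ogg|*.swf|*.flv) return 0 ;;
    esac
    # gzip wins over everything else.
    case $ae in
        *gzip*) echo gzip; return 0 ;;
    esac
    # deflate is kept only for non-MSIE user agents; anything else is removed.
    case $ae in
        *deflate*)
            case $ua in
                *MSIE*) ;;
                *) echo deflate ;;
            esac
            ;;
    esac
}

# A browser-style request and a request without gzip normalize differently:
normalize_accept_encoding /abc.html 'gzip, deflate'   # prints: gzip
normalize_accept_encoding /abc.html 'identity'        # prints nothing
```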

Edit with answer:

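I don't know whether adding --no-cookies fixed it (I'm not sure whether wget --mirror stores cookies; if it does, that would explain the fix) or whether adding the headers fixed it, but the following works and populates the Varnish cache, as I can see from my browser: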

wget --spider --recursive --no-cookies --header "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" --header "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3" --header "Accept-Language: en-US,en;q=0.8" --header "Cache-Control: max-age=0" --header "Connection: keep-alive" --header "Host: www.example.com" --header "User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.19 (KHTML, like Gecko) Ubuntu/10.04 Chromium/18.0.1025.168 Chrome/18.0.1025.168 Safari/535.19" www.example.com 

Second edit, regarding the previous answer. IMPORTANT

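Anyone using Magento: do not use my solution above. Because of --no-cookies, Magento ends up creating a new session file under the var/session folder for every request. This caused my session folder to fill up with 250,000 files on every run of the wget command above! With the folder full, none of my customers could add anything to their basket, because Magento could not create any more session files for them. I am looking for other options to solve my problem.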
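
A quick way to notice this kind of growth before it becomes a problem is to count the files in the session folder before and after a crawl. The sketch below is an assumption-laden helper, not part of Magento: count_session_files is a made-up name, the path in the usage comment is a placeholder, and -maxdepth requires GNU or BSD find.

```shell
# Count regular files directly under a session directory.
# $1 = directory (assumed here to be Magento's var/session).
count_session_files() {
    find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Hypothetical usage around a crawl:
#   before=$(count_session_files /path/to/magento/var/session)
#   ... run the wget crawl ...
#   after=$(count_session_files /path/to/magento/var/session)
#   echo "crawl created $((after - before)) session files"
```

If the difference is close to the number of URLs crawled, the crawler is starting a new session on every request.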

Answers


You should consider two points:

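  • All modern browsers send an Accept-Encoding header containing gzip, so the cache entries created by your spider will not be used for them unless the spider sends that header too (a decent backend that generates gzipped responses adds a Vary: Accept-Encoding header).
  • Your backend generates a cookie for every user that arrives without one. Your script should keep its cookies, or, if the cookies don't matter, your cache rules should ignore them. However, if your response contains a shopping cart or anything else that depends heavily on the user, state, or cookies, that content cannot be cached, so the response has to be regenerated every time. You could move the variable parts into JavaScript or an iframe, but you would need to (re)design your application to make it cacheable.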
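
Both points can be combined into a two-pass warm-up. This is a rough sketch under assumptions, not a tested recipe for this site: the function names extract_urls and warm_urls and the file names crawl.log and cookies.txt are made up. The first pass lets wget discover the URLs; the second pass requests each one with a browser-like Accept-Encoding header and a shared cookie jar. The passes are split because a wget build that cannot decompress gzip may fail to follow links when the header is sent during the recursive crawl itself.

```shell
# Extract the unique URLs that wget visited from its log output on stdin.
# wget starts each request with a line like:
#   --2012-11-02 10:00:00--  http://example.com/abc.html
extract_urls() {
    awk '/^--[0-9]/ { print $3 }' | sort -u
}

# Request every URL on stdin with "Accept-Encoding: gzip", sharing one
# cookie jar so the backend does not start a new session for each request.
# $1 = cookie jar file (created if missing).
warm_urls() {
    jar=$1
    [ -e "$jar" ] || : > "$jar"
    while IFS= read -r url; do
        wget --quiet --output-document=/dev/null \
             --header='Accept-Encoding: gzip' \
             --load-cookies "$jar" --save-cookies "$jar" --keep-session-cookies \
             "$url"
    done
}

# Hypothetical usage:
#   wget --spider --recursive --output-file=crawl.log example.com
#   extract_urls < crawl.log | warm_urls cookies.txt
```

Pages whose responses set a NO_CACHE cookie will still bypass the cache under the VCL in the question, so this only helps for content the rules allow to be cached.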