2013-04-11 49 views
0

GWT filter is never executed

I am developing a GWT filter to make my GWT application crawlable. The idea is that when it receives an ugly URL like this:

http://www.myapp.com/?_escaped_fragment_=v;id=Mv67mC13Yizr

it renders the page for the pretty one:

http://www.myapp.com/#!v;id=Mv67mC13Yizr
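
To make the intended mapping concrete, here is a minimal sketch of the ugly-to-pretty URL conversion as a standalone function. The class and method names (`EscapedFragmentUrls`, `toPrettyUrl`) are hypothetical and not part of the application; the sketch also assumes `_escaped_fragment_` is the last query parameter, which is how crawlers are expected to send it.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper, for illustration only.
final class EscapedFragmentUrls {
    private static final String PARAM = "_escaped_fragment_=";

    // baseUrl: scheme, host and path without the query string.
    // queryString: the raw query string (may be null).
    static String toPrettyUrl(String baseUrl, String queryString) {
        if (queryString == null) {
            return baseUrl;
        }
        int index = queryString.indexOf(PARAM);
        if (index == -1) {
            return baseUrl + "?" + queryString; // nothing to rewrite
        }
        // Keep any parameters that precede _escaped_fragment_, minus the trailing '&'.
        String before = queryString.substring(0, index);
        if (before.endsWith("&")) {
            before = before.substring(0, before.length() - 1);
        }
        String fragment;
        try {
            fragment = URLDecoder.decode(queryString.substring(index + PARAM.length()), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
        StringBuilder sb = new StringBuilder(baseUrl);
        if (!before.isEmpty()) {
            sb.append('?').append(before);
        }
        return sb.append("#!").append(fragment).toString();
    }

    public static void main(String[] args) {
        System.out.println(toPrettyUrl("http://www.myapp.com/",
                "_escaped_fragment_=v;id=Mv67mC13Yizr"));
    }
}
```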

However, the code never reaches doFilter(). Why?

web.xml

<filter> 
    <filter-name>guiceFilter</filter-name> 
    <filter-class>com.google.inject.servlet.GuiceFilter</filter-class> 
</filter> 

<filter-mapping> 
    <filter-name>guiceFilter</filter-name> 
    <url-pattern>/*</url-pattern> 
</filter-mapping> 

DispatchServletModule.java

public class DispatchServletModule extends ServletModule { 

    @Override 
    public void configureServlets() { 
     serve("/" + ActionImpl.DEFAULT_SERVICE_NAME) 
       .with(DispatchServiceImpl.class); 
     filter("/").through(CrawlerServiceImpl.class); 
    } 
} 

CrawlerServiceImpl.java

@Singleton 
    public final class CrawlerServiceImpl implements Filter { 
     private static final String ESCAPED_FRAGMENT_FORMAT1 = "_escaped_fragment_="; 
     private final int ESCAPED_FRAGMENT_LENGTH1 = ESCAPED_FRAGMENT_FORMAT1.length(); 
     private static final String ESCAPED_FRAGMENT_FORMAT2 = "&"+ESCAPED_FRAGMENT_FORMAT1; 
     private final int ESCAPED_FRAGMENT_LENGTH2 = ESCAPED_FRAGMENT_FORMAT2.length(); 

     @Inject(optional = true) 
     private final Provider<WebClient> webClientProvider = null; 

     @Override 
     public void init(FilterConfig filterConfig) throws ServletException { 
     } 
     @Override 
     public void destroy() { 
     } 

     @Override 
     public void doFilter(ServletRequest request, ServletResponse response, 
      FilterChain chain) throws IOException, ServletException { 
     HttpServletRequest req = (HttpServletRequest) request; 
     HttpServletResponse res = (HttpServletResponse) response; 
     String queryString = req.getQueryString(); 

     final String requestURI = req.getRequestURI(); 
     if ((queryString != null) && (queryString.contains(ESCAPED_FRAGMENT_FORMAT1))) { 
      try { 
      StringBuilder pageNameSb = new StringBuilder("http://"); 
      pageNameSb.append(req.getServerName()); 
      if (req.getServerPort() != 0) { 
       pageNameSb.append(":"); 
       pageNameSb.append(req.getServerPort()); 
      } 
      pageNameSb.append(requestURI); 
      queryString = rewriteQueryString(queryString); 
      pageNameSb.append(queryString); 
      String pageName = pageNameSb.toString(); 
      WebClient webClient; 
      if(webClientProvider == null) 
       webClient = new WebClient(BrowserVersion.FIREFOX_3_6); 
      else 
       webClient = webClientProvider.get(); 

      webClient.setThrowExceptionOnScriptError(false); 
      webClient.setJavaScriptEnabled(true); 
      HtmlPage page = webClient.getPage(pageName); 

      res.setContentType("text/html;charset=UTF-8"); 
      PrintWriter out = res.getWriter(); 
      out.println("<hr />"); 
      out.println("<center><h3>You are viewing a non-interactive page that is intended for the crawler. " 
       + "You probably want to see this page: <a href=\"" 
       + pageName 
       + "\">" 
       + pageName + "</a></h3></center>"); 
      out.println("<hr />");  
      out.println(page.asXml()); 
      webClient.closeAllWindows(); 
      out.println(""); 
      out.close(); 
      } 
      catch(Exception e) { 
      } 
     } else { 
      chain.doFilter(request, response); 
     } 
     } 

     private String rewriteQueryString(String queryString) throws UnsupportedEncodingException { 
     int index = queryString.indexOf(ESCAPED_FRAGMENT_FORMAT2); 
     int length = ESCAPED_FRAGMENT_LENGTH2; 
     if (index == -1) { 
      index = queryString.indexOf(ESCAPED_FRAGMENT_FORMAT1); 
      length = ESCAPED_FRAGMENT_LENGTH1; 
     } 
     if (index != -1) { 
      StringBuilder queryStringSb = new StringBuilder(); 
      if (index > 0) { 
      queryStringSb.append("?"); 
      queryStringSb.append(queryString.substring(0, index)); 
      } 
      queryStringSb.append("#!"); 
      queryStringSb.append(URLDecoder.decode(queryString.substring(index 
       + length, queryString.length()), "UTF-8")); 
      return queryStringSb.toString(); 
     } 
     return queryString; 
     } 
} 

Answers

1

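Your <url-pattern> is invalid: * is only allowed as a /* suffix or as a *. prefix of the pattern. Also, patterns only apply to the path, not to the query string.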
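
The matching rules above can be sketched roughly as follows. This is a simplified illustration, not the container's actual implementation; `UrlPatternSketch` and `matches` are hypothetical names. The `path` argument deliberately never contains the query string, because the container does not consider it when matching.

```java
// Simplified sketch of servlet-style URL pattern matching, for illustration only.
final class UrlPatternSketch {

    // pattern: a <url-pattern> value; path: the request path WITHOUT the query string.
    static boolean matches(String pattern, String path) {
        // Path mapping: "/*" or "/some/prefix/*"
        if (pattern.startsWith("/") && pattern.endsWith("/*")) {
            String prefix = pattern.substring(0, pattern.length() - 2);
            return path.equals(prefix) || path.startsWith(prefix + "/");
        }
        // Extension mapping: "*.ext"
        if (pattern.startsWith("*.")) {
            return path.endsWith(pattern.substring(1));
        }
        // Anything else is compared literally, so a '*' elsewhere is not a wildcard.
        return pattern.equals(path);
    }

    public static void main(String[] args) {
        // The request http://www.myapp.com/?_escaped_fragment_=... has the path "/".
        System.out.println(matches("/", "/"));
        System.out.println(matches("/?_escaped_fragment_=*", "/"));
    }
}
```

Under these rules a pattern such as `/?_escaped_fragment_=*` can only ever match a path that is literally that string, which a real request path never is.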

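You have to map your filter to / and, inside the filter, check for the _escaped_fragment_ parameter (personally, I'd check that getMethod() is "GET" and then use getParameter("_escaped_fragment_")) to decide whether to use the WebClient to crawl and render the page server-side, or just chain to the next filter.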
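
A minimal sketch of that decision, kept free of the servlet API so it can run on its own. `CrawlerRequestCheck` and `isCrawlerRequest` are hypothetical names; in a real filter the two arguments would come from `req.getMethod()` and `req.getParameter("_escaped_fragment_")`.

```java
// Hypothetical helper, for illustration only.
final class CrawlerRequestCheck {

    // method: value of HttpServletRequest.getMethod()
    // escapedFragment: value of getParameter("_escaped_fragment_"), or null if absent
    static boolean isCrawlerRequest(String method, String escapedFragment) {
        // An empty value is still a crawler request (it corresponds to a bare "#!").
        return "GET".equals(method) && escapedFragment != null;
    }

    public static void main(String[] args) {
        // Inside doFilter() this would read roughly:
        //   if (isCrawlerRequest(req.getMethod(), req.getParameter("_escaped_fragment_"))) {
        //       ... render the page with WebClient ...
        //   } else {
        //       chain.doFilter(request, response);
        //   }
        System.out.println(isCrawlerRequest("GET", "v;id=Mv67mC13Yizr"));
    }
}
```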

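Note that filters you declare in your web.xml won't be injected by Guice, so, as Dvd Prd said, you'd probably rather declare your filter in a Guice ServletModule. Note that, as with standard mappings, only the path is matched, so the above still applies (even filterRegex() won't help).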

+0

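Thanks. Following what you said in your last paragraph, I have just removed those filter entries from web.xml in the code above and inserted Dvd Prd's code into my DispatchServletModule.java. But it still doesn't work. Please see the new code above. – Arturo 2013-04-11 12:17:54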

+1

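As I said, patterns only match on the path, so even when binding in a Guice 'ServletModule' you have to bind to '/' and modify your filter to handle the query string (the first part of my answer) – 2013-04-11 12:55:48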

+0

Thanks. I've updated it again: filter("/").through(CrawlerServiceImpl.class); but the code in doFilter() is still never executed. Any ideas? – Arturo 2013-04-11 14:32:40

1

Guice takes over all filters. To add yours, you need to declare it in your Guice servlet module:

filter("/?_escaped_fragment_=*").through(CrawlerServiceImpl.class);

+1

'CrawlerServletFilter' is declared and mapped before 'GuiceFilter', so it will be executed before 'GuiceFilter'. – 2013-04-11 11:28:03