2014-02-20 64 views

How can I get a script-generated email ID with Jsoup?

I have a Document object:

Document secDoc = Jsoup.connect(a.attr("abs:href")).timeout(30*1000).get(); 
String txt = secDoc.text(); 

Now, when I debug the code above and inspect the value of secDoc, I see that the page source contains this element:

For questions about your order, including anything shipping or billing related, please email <script type="text/javascript">write_email('oatmealsupport','gmail.com')</script>. 

If you look at the page itself, you can see a line like: For questions about your order, including anything shipping or billing related, please email [email protected] We only do email support at this time. Interestingly, the email ID on this page is generated by a script. When I do an Inspect Element, I get:

<p> 
       For questions about your order, including anything shipping or billing related, please email <a href="mailto:[email protected]">[email protected]</a><script type="text/javascript">write_email('oatmealsupport','gmail.com')</script>. 
       We only do email support at this time.<br><br> 
       Hours of operation: <strong>Monday-Friday 8am - 6pm PT.</strong> 
       <br> 
       <strong>Shipping Times</strong>: 
       We strive to fulfill the orders within 3-5 working days. When we are really busy we may take a day or two longer. 
       We ship orders Monday - Friday, so if your order is placed Friday evening we may not be able to process it until the following Monday. 
       If we are behind, it may be a few days before we respond. The Oatmeal is an extremely small operation so please be patient. 
       <br> 
       <a href="http://shop.theoatmeal.com/pages/shipping">More Shipping Info</a><br><br> 
       Questions about shirt sizes? <a href="http://shop.theoatmeal.com/pages/shipping#shirts">Shirt Sizing Info</a> 
      </p> 

So the anchor <a href="mailto:[email protected]">[email protected]</a> is being generated by the script.

Is there any way I can get this anchor using Jsoup (or any other means)?

Answer

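For this particular site, the user and domain parts of the address are in the script tag, so select the script tag, get its text, parse that text with a regular expression, and then concatenate the user and domain with an @ in between. Your selector could be just script:contains(write_email), assuming write_email is not used elsewhere on the page. This only works where the address appears in the page text, even if it is split into two pieces.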
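The parsing step could be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the class name `EmailFromScript`, the method `extractEmail`, and the regular expression are all hypothetical, and the pattern assumes the call always looks like `write_email('user','domain')` with single-quoted arguments. The sketch uses only the Java standard library so it runs on its own. With Jsoup, the input string would come from the script element, whose contents are read with `Element.data()` rather than `text()`.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmailFromScript {
    // Assumed call shape: write_email('user','domain'), single-quoted arguments.
    private static final Pattern WRITE_EMAIL = Pattern.compile(
            "write_email\\(\\s*'([^']*)'\\s*,\\s*'([^']*)'\\s*\\)");

    // Returns user@domain if the script text contains a write_email call.
    static Optional<String> extractEmail(String scriptText) {
        Matcher m = WRITE_EMAIL.matcher(scriptText);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(m.group(1) + "@" + m.group(2));
    }

    public static void main(String[] args) {
        // With Jsoup, the input would come from the parsed document, e.g.:
        //   for (Element s : secDoc.select("script")) { extractEmail(s.data()); }
        String scriptText = "write_email('oatmealsupport','gmail.com')";
        System.out.println(extractEmail(scriptText).orElse("no address found"));
    }
}
```

Returning an `Optional` keeps the caller from having to handle nulls for script tags that do not contain a `write_email` call, so the helper can be applied to every script element on the page.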

In general, Jsoup is not a JavaScript engine. If you want to see the same page a human sees in a web browser, you can try a browser automation tool such as Selenium.