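How do I parse a date from a URL format?

My database contains URLs stored in a text field. Each URL contains a representation of the report's date, and the date is missing from the report itself.

So I need to parse the date in the URL field into a string representation such as: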

2010-10-12 
2007-01-03 
2008-02-07

What is the best way to extract the dates?

Some are in this format:

http://e.com/data/invoices/2010/09/invoices-report-wednesday-september-1st-2010.html 

http://e.com/data/invoices/2010/09/invoices-report-thursday-september-2-2010.html 

http://e.com/data/invoices/2010/09/invoices-report-wednesday-september-15-2010.html 

http://e.com/data/invoices/2010/09/invoices-report-monday-september-13th-2010.html 

http://e.com/data/invoices/2010/08/invoices-report-monday-august-30th-2010.html 

http://e.com/data/invoices/2009/05/invoices-report-friday-may-8th-2009.html 

http://e.com/data/invoices/2010/10/invoices-report-wednesday-october-6th-2010.html 

http://e.com/data/invoices/2010/09/invoices-report-tuesday-september-21-2010.html

Note the inconsistent use of th after the day of the month, as in these two dates:

http://e.com/data/invoices/2010/09/invoices-report-wednesday-september-15-2010.html 

http://e.com/data/invoices/2010/09/invoices-report-monday-september-13th-2010.html

Others are in this format (with three hyphens before the date starts, no year at the end, and an optional invoices- before report):

http://e.com/data/invoices/2010/09/invoices-report---wednesday-september-1.html 

http://e.com/data/invoices/2010/09/invoices-report---thursday-september-2.html 

http://e.com/data/invoices/2010/09/invoices-report---wednesday-september-15.html 

http://e.com/data/invoices/2010/09/invoices-report---monday-september-13.html 

http://e.com/data/invoices/2010/08/report---monday-august-30.html 

http://e.com/data/invoices/2009/05/report---friday-may-8.html 

http://e.com/data/invoices/2010/10/report---wednesday-october-6.html 

http://e.com/data/invoices/2010/09/report---tuesday-september-21.html

Source

2010-10-19 snoopy

You want a regular expression like this:

"^http://e.com/data/invoices/(\\d{4})/(\\d{2})/\\D+(\\d{1,2})"

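This takes advantage of the fact that everything up through the /year/month/ part of the URL is always the same, and that no digits follow it until the day of the month. Once you have those three numbers, you don't care about anything else.

The first capture group is the year, the second is the month, and the third is the day. The day may not have a leading zero; either convert the strings to integers and format them as needed, or just check the string length and, if it isn't two, prepend the string "0".

For example: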

import java.util.regex.*;

class URLDate {
    public static void main(String[] args) {
        String text = "http://e.com/data/invoices/2010/09/invoices-report-wednesday-september-1st-2010.html";
        String regex = "http://e.com/data/invoices/(\\d{4})/(\\d{2})/\\D+(\\d{1,2})";
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(text);
        if (m.find()) {
            int count = m.groupCount();
            System.out.println("matched with groups:");
            // Group 0 is the whole match; groups 1..count are the captures.
            for (int i = 0; i <= count; ++i) {
                String group = m.group(i);
                System.out.format("\t%d: %s\n", i, group);
            }
        } else {
            System.out.println("failed to match!");
        }
    }
}

This gives the output:

matched with groups: 
    0: http://e.com/data/invoices/2010/09/invoices-report-wednesday-september-1
    1: 2010 
    2: 09 
    3: 1
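The zero-padding step described in the answer can be sketched roughly as follows. This is only a minimal illustration, not part of the original answer: the class name UrlDateFormatter and the helper toIsoDate are made up for the example, and the dots in the host name are escaped, which the original pattern does not do. It converts the three groups to integers and lets String.format add the leading zeros.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class UrlDateFormatter {
    // Same idea as the answer's pattern: year and month come from the path,
    // the day is the first run of digits in the file name.
    private static final Pattern DATE_IN_URL = Pattern.compile(
        "^http://e\\.com/data/invoices/(\\d{4})/(\\d{2})/\\D+(\\d{1,2})");

    // Returns the date as yyyy-MM-dd, or null if the URL does not match.
    static String toIsoDate(String url) {
        Matcher m = DATE_IN_URL.matcher(url);
        if (!m.find()) {
            return null;
        }
        int year = Integer.parseInt(m.group(1));
        int month = Integer.parseInt(m.group(2));
        int day = Integer.parseInt(m.group(3));
        // %02d pads single-digit months and days with a leading zero.
        return String.format("%04d-%02d-%02d", year, month, day);
    }

    public static void main(String[] args) {
        // One URL from each of the two formats in the question.
        System.out.println(toIsoDate(
            "http://e.com/data/invoices/2010/09/invoices-report-wednesday-september-1st-2010.html"));
        System.out.println(toIsoDate(
            "http://e.com/data/invoices/2009/05/report---friday-may-8.html"));
    }
}
```

Because the year and month are taken from the path, the same helper should handle the second format, where the file name has no year. It does not validate that the result is a real calendar date.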

(Note that to use Matcher.matches() instead of Matcher.find(), you would have to make the pattern consume the entire input string by appending .*$ to it.)
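To make the difference between find() and matches() concrete, here is a small sketch. The class name MatchesVsFind and its three helper methods are invented for this illustration, and the host-name dots are escaped as an extra precaution. find() only needs the pattern to match somewhere in the input, while matches() requires the pattern to cover the whole string.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, for illustration only.
class MatchesVsFind {
    // Pattern that stops right after the day of the month.
    static final String PREFIX =
        "^http://e\\.com/data/invoices/(\\d{4})/(\\d{2})/\\D+(\\d{1,2})";

    // find(): succeeds if the pattern matches some part of the input.
    static boolean findsPrefix(String url) {
        return Pattern.compile(PREFIX).matcher(url).find();
    }

    // matches() with the same pattern: fails, because text remains after the day.
    static boolean matchesPrefix(String url) {
        return Pattern.compile(PREFIX).matcher(url).matches();
    }

    // matches() with .*$ appended: the pattern now consumes the rest of the URL.
    static boolean matchesWhole(String url) {
        return Pattern.compile(PREFIX + ".*$").matcher(url).matches();
    }

    public static void main(String[] args) {
        String url =
            "http://e.com/data/invoices/2010/09/invoices-report---wednesday-september-15.html";
        System.out.println("find, prefix pattern:    " + findsPrefix(url));
        System.out.println("matches, prefix pattern: " + matchesPrefix(url));
        System.out.println("matches, with .*$:       " + matchesWhole(url));
    }
}
```

The capture groups are the same in both working variants; only the extent of group 0 differs.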

Source

2010-10-19 17:21:35

Perfect. Thanks for the warning about 'matches()' and 'find()'. – snoopy 2010-10-19 19:00:28
