2011-11-04

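Regex replacement function contains too much text

I am a Python newbie. My script (below) contains a function named "fn_regex_raw_date_string" that is meant to convert a "raw" date string like this: Mon, Oct 31, 2011 at 8:15 PM into a date string like this: _2011-Oct-31_PM_8-15_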

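Question 1: When the "raw" date string contains extraneous characters, e.g. (xxxxxMon, Oct 31, 2011 at 8:15 PMyyyyyy), how should I modify my regex routine to exclude the extraneous characters?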

I was tempted to remove my comments from the script below to make it simpler to read, but I thought it might be more helpful to leave them in.

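Question 2: I suspect that I should code another function that replaces the "Oct" in "_2011-Oct-31_PM_8-15_" with "10". But I can't help wondering whether there is some way to include that functionality in my fn_regex_raw_date_string function.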

Any help would be much appreciated.

Thank you, Marceepoo

import sys
import re
import pdb
#pdb.set_trace()

def fn_get_datestring_sysarg():
    this_scriptz_FULLName = sys.argv[0]
    try:
        date_string_raw = sys.argv[1]
    #except Exception, e:
    except IndexError:
        date_string_raw_error = this_scriptz_FULLName + ': sys.argv[1] error: No command line argument supplied'
        print(date_string_raw_error)
        # Re-raise so the caller does not continue with an undefined date string
        raise
    #returnval = this_scriptz_FULLName + '\n' + date_string_raw
    returnval = date_string_raw
    return returnval

def fn_regex_raw_date_string(date_string_raw): 
    # Do re replacements 
    # p:\Data\VB\Python_MarcsPrgs\Python_ItWorks\FixCodeFromLegislaturezCalifCode_MikezCode.py 
    # see also (fnmatch) p:\Data\VB\Python_MarcsPrgs\Python_ItWorks\bookmarkPDFs.aab.py 

    #srchstring = r"(.?+)(Sun|Mon|Tue|Wed|Thu|Fri|Sat)(, )(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)( )([\d]{1,2})(, )([\d]{4})( at )([\d]{1,2})(\:)([\d]{1,2})( )(A|P)(M)(.?+)"
    srchstring = r"(Sun|Mon|Tue|Wed|Thu|Fri|Sat)(, )(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)( )([\d]{1,2})(, )([\d]{4})( at )([\d]{1,2})(\:)([\d]{1,2})( )(A|P)(M)"

    srchstring = re.compile(srchstring)  
    replacement = r"_\7-\3-\5_\13M_\9-\11_" 
    #replacement = r"_\8-\4-\6_\14M_\10-\12_"  
    regex_raw_date_string = srchstring.sub(replacement, date_string_raw) 

    return regex_raw_date_string 

# Example input: Mon, Oct 31, 2011 at 8:15 PM
if __name__ == '__main__':
    try:
        this_scriptz_FULLName = sys.argv[0]
        date_string_raw = fn_get_datestring_sysarg()
        date_string_mbh = fn_regex_raw_date_string(date_string_raw)
        print(date_string_mbh)
    except Exception:
        print('error occurred - fn_get_datestring_sysarg()')

Answer

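This code uses a regular expression to strip everything from the start of the string up to the abbreviated weekday, and everything after AM or PM through to the end of the string.

It then calls datetime.strptime(date_str, date_format), which does the hard work of parsing and gives us a datetime instance: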

from datetime import datetime 

import calendar 
import re 

# ------------------------------------- 

# _months = "|".join(calendar.month_abbr[1:]) 
_weekdays = "|".join(calendar.day_abbr) 

_clean_regex = re.compile(r""" 
    ^
    .*? 
    (?=""" + _weekdays + """) 
    | 
    (?<=AM|PM) 
    .*? 
    $ 
""", re.X) 

# ------------------------------------- 

def parseRawDateString(raw_date_str):
    try:
        date_str = _clean_regex.sub("", raw_date_str)
        return datetime.strptime(date_str, "%a, %b %d, %Y at %I:%M %p")

    except ValueError:
        print("Error parsing date from '{}'!".format(raw_date_str))
        raise

# ------------------------------------- 

if __name__ == "__main__": 
    from sys import argv 

    s = argv[1] if len(argv) > 1 else "xxxxxMon, Oct 31, 2011 at 8:15 PMyyyyyy" 

    print("Raw date:  '{}'".format(s)) 
    d = parseRawDateString(s) 
    print("datetime object:") 
    print(d) 
    print("Formatted date: '{}'".format(d.strftime("%A, %d %B %Y @ %I:%M %p")))
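
For Question 2, once the date has been parsed into a datetime there is no need for a separate "Oct" to "10" lookup, because the month number is already available. Below is a minimal sketch of that idea, not a definitive implementation. The names `_DATE_RE` and `to_filename_stamp` are hypothetical (they do not appear in the question or in the code above), and the target layout `_2011-10-31_PM_8-15_` is my assumption about what the asker wants. It searches for the date inside the raw string instead of stripping the surroundings, and it assumes an English (C) locale for strptime.

```python
import re
from datetime import datetime

# Hypothetical names for illustration only.
# Weekday and month abbreviations are hard-coded English values.
_DATE_RE = re.compile(
    r"(?:Sun|Mon|Tue|Wed|Thu|Fri|Sat), "
    r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) "
    r"\d{1,2}, \d{4} at \d{1,2}:\d{2} [AP]M"
)


def to_filename_stamp(raw_date_str):
    """Find a date like 'Mon, Oct 31, 2011 at 8:15 PM' anywhere in the
    string and return it in the assumed form '_2011-10-31_PM_8-15_'."""
    match = _DATE_RE.search(raw_date_str)
    if match is None:
        raise ValueError("no date found in {!r}".format(raw_date_str))

    # strptime assumes English weekday/month names (C or English locale).
    d = datetime.strptime(match.group(0), "%a, %b %d, %Y at %I:%M %p")

    # Build the 12-hour clock parts by hand so the output does not depend
    # on locale-specific or platform-specific strftime codes.
    hour12 = d.hour % 12 or 12
    meridiem = "AM" if d.hour < 12 else "PM"

    # %m yields the zero-padded month number, which replaces "Oct" with "10".
    return "_{:%Y-%m-%d}_{}_{}-{:02d}_".format(d, meridiem, hour12, d.minute)


if __name__ == "__main__":
    print(to_filename_stamp("xxxxxMon, Oct 31, 2011 at 8:15 PMyyyyyy"))
```

Because `search` only returns the matched date, any text before or after it is ignored, which also covers Question 1. If the month abbreviation should be kept, `%b` can be used in place of `%m`.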