Is there a better way to parse dates in Python? I'm trying to turn free-form date strings into meaningful dates. So far I've come up with this function:
```python
import re
from time import strptime


class PersondataParser(object):
    # Hypothetical class name: the question shows only __parseDate. Because
    # __log is name-mangled, both methods must live in the same class.

    def __log(self, message):
        # Stand-in for the question's own logging helper (not shown)
        print(message)

    def __parseDate(self, rawDate):
        """Parse raw date string into YYYY-MM-DD"""
        rawDate = rawDate.strip()
        if not rawDate:
            return u""
        if "{{Birth year and age|" in rawDate:
            # {{Birth year and age|1933}}: only the year is known
            fields = rawDate.replace("{{", "").replace("}}", "").split("|")
            return fields[1].strip() + "-01-01"
        if "{{Birth date and age|" in rawDate:
            # {{Birth date and age|YYYY|M|D}}: year, month, day as positional fields
            fields = rawDate.replace("{{", "").replace("}}", "").split("|")
            year, month, day = [f.strip() for f in fields[1:4]]
            return year + "-" + month.zfill(2) + "-" + day.zfill(2)
        if "{{" in rawDate:
            self.__log(u"XXX Date parse error (unknown template): " + rawDate)
            return u"1770-01-01"

        # "August 23, 1967" or "Aug. 23, 1967" (the old [a-zA-Z]* pattern
        # could never match the "." that the code later tried to strip)
        match = re.match(r"([a-zA-Z]+)\.? ([0-9]{1,2}), ([0-9]{4})", rawDate)
        if match:
            monthName, day, year = match.groups()
            month = None
            # Try the full month name (%B) first, then the abbreviation (%b)
            for fmt in ("%B", "%b"):
                try:
                    month = str(strptime(monthName, fmt).tm_mon)
                    break
                except ValueError:
                    pass
            if month is None:
                self.__log(u"XXX Date parse error: " + rawDate)
                return u"1770-01-01"
            return year + "-" + month.zfill(2) + "-" + day.zfill(2)

        # "1990-01-29" or "1990-1-29"
        match = re.match(r"([0-9]{4})-([0-9]{1,2})-([0-9]{1,2})", rawDate)
        if match:
            year, month, day = match.groups()
            return year + "-" + month.zfill(2) + "-" + day.zfill(2)

        self.__log(u"XXX Date parse error: " + rawDate)
        return u"1770-01-01"
```
Am I on the right track, or is there a better way to go about this?
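Update: by "free-form string" I mean that the data comes from wiki pages, specifically the person-data template. The date fields in this template are free-form because humans have typed whatever they liked into them. Usually the value is a date in any number of formats, or it is itself another wiki template describing the date. Here are some examples of what might be in the field: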
{{Birth year and age|1933}}
August 23, 1967
1990-01-29
23 August 1967
1999
a;lsdfhals;djkfh
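For the plain (non-template) values, one alternative to hand-written regexes is to try a list of `datetime.strptime` formats in order and keep the first one that consumes the whole string. This is a sketch, assuming the format list below (built only from the sample values above) is representative; `parse_plain_date`, `DATE_FORMATS` and `FALLBACK` are hypothetical names, and `%B`/`%b` only match English month names under the default C locale.

```python
from datetime import datetime

# Assumed format list, derived from the sample values in the question;
# extend it as new formats turn up in the data.
DATE_FORMATS = (
    "%B %d, %Y",  # August 23, 1967
    "%b %d, %Y",  # Aug 23, 1967
    "%d %B %Y",   # 23 August 1967
    "%d %b %Y",   # 23 Aug 1967
    "%Y-%m-%d",   # 1990-01-29 (also accepts 1990-1-29)
    "%Y",         # 1999 -> 1999-01-01, same convention as the year-only template
)

FALLBACK = "1770-01-01"  # sentinel value used by the question's code


def parse_plain_date(raw):
    """Return raw as YYYY-MM-DD, or FALLBACK if no known format matches."""
    raw = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            # strptime must consume the whole string, so "1999" can only
            # match "%Y" and trailing garbage makes every format fail
            return datetime.strptime(raw, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return FALLBACK
```

A side benefit of going through `datetime` is validation: `strptime` rejects impossible dates such as `1990-02-31`, which a purely regex-based branch would pass through unchanged.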
Could you specify what you mean by "free-form date string"? What is the '{{Birth year and age|' check in your function for? –
Aha, I just realized you're parsing *Wikipedia* data pages; that's where the funny '{{' syntax comes from. –
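Since the '{{' values are MediaWiki template calls, the template branches could split each call into a name plus positional and named parameters instead of matching hard-coded prefixes with `replace`/`split`. A minimal sketch, assuming templates are not nested and that named parameters are written `key=value` (for example `df=yes`); `split_template` and `template_to_date` are hypothetical helpers.

```python
import re

# Matches a single {{Name|param|param...}} call; nested templates are not handled.
TEMPLATE_RE = re.compile(r"\{\{\s*([^|{}]+?)\s*(?:\|([^{}]*))?\}\}")


def split_template(raw):
    """Return (name, positional, named) for a wiki template, or None."""
    m = TEMPLATE_RE.search(raw)
    if m is None:
        return None
    positional, named = [], {}
    if m.group(2) is not None:
        for param in m.group(2).split("|"):
            key, sep, value = param.partition("=")
            if sep:
                named[key.strip()] = value.strip()  # e.g. df=yes
            else:
                positional.append(param.strip())
    return m.group(1), positional, named


def template_to_date(raw):
    """Map a birth-date template to YYYY-MM-DD, or None if not handled."""
    parsed = split_template(raw)
    if parsed is None:
        return None
    name, pos, _named = parsed
    key = name.lower()  # assumption: tolerate capitalization differences
    if key == "birth year and age" and len(pos) >= 1:
        return pos[0] + "-01-01"
    if key == "birth date and age" and len(pos) >= 3:
        return pos[0] + "-" + pos[1].zfill(2) + "-" + pos[2].zfill(2)
    return None
```

Returning `None` for unrecognized templates leaves the caller free to log the value and fall back to its own sentinel, as the question's code does.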