DataNitro and regular expressions (Python)

I have this column in MS Excel 2010 that holds a combination of zip codes and email IDs.

I am trying to extract the zip codes (20530, 90012-3308, etc.).

20530 [email protected] 
    20530 [email protected] 
    20530 [email protected] 
    20530 [email protected] 
    20004 [email protected] 
    20530 [email protected] 
    90012-3308 [email protected] 
    90012-3308 [email protected] 
    90012 [email protected]

I tried Python's re module.

import re 


for i in range(1, 10):  # rows 1 through 9
    Cell(i, 4).value = re.findall(r'\d+', Cell(i, 1).value)  # store the result in column 4

I ran the regex on that column and got this result:

[u'20530'] 
[u'20530'] 
[u'20530'] 
[u'20530'] 
[u'20004', u'9'] 
[u'20530', u'8'] 
[u'90012', u'3308'] 
[u'90012', u'3308'] 
[u'90012']

How can I extract the results in a human-readable zip code form?

Source

2014-05-15 hky404

What were the results of the regex experiment? Why not just '.split()[0]'? – jonrsharpe

'[u'20530'] [u'20530'] [u'20530'] [u'20530'] [u'20004', u'9'] [u'20530', u'8'] [u'90012', u'3308'] [u'90012', u'3308'] [u'90012']' – hky404

The following regex matches each string and captures the zip code as group 1:

([\d\-]+)\s+[\[email protected]\.]+

Here is Python code that extracts all the zip codes at once:

import re 
text = r''' 20530 [email protected] 
    20530 [email protected] 
    20530 [email protected] 
    20530 [email protected] 
    20004 [email protected] 
    20530 [email protected] 
    90012-3308 [email protected] 
    90012-3308 [email protected] 
    90012 [email protected]''' 
re.compile(r'([\d\-]+)\s+[\[email protected]\.]+').findall(text)
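The stray u'9' and u'8' in the question's output appear because '\d+' also matches digits inside the email address. A minimal sketch of a stricter alternative follows, assuming every row starts with a 5-digit zip code or a ZIP+4 code. The pattern, the helper name extract_zip, and the example.com addresses are illustrative assumptions, since the real addresses are redacted in the question.

```python
import re

# Hypothetical sample rows; the addresses are placeholders because the
# originals are redacted in the question.
rows = [
    "20530 user1@example.com",
    "20004 user9@example.com",
    "90012-3308 user8@example.com",
]

# Assumed pattern: five digits, optionally followed by a hyphen and four digits,
# anchored at the start of the row so digits in the email are never considered.
ZIP_RE = re.compile(r"^\s*(\d{5}(?:-\d{4})?)\b")


def extract_zip(row):
    """Return the leading zip code as a string, or None if the row has none."""
    match = ZIP_RE.match(row)
    return match.group(1) if match else None


zips = [extract_zip(row) for row in rows]
print(zips)  # ['20530', '20004', '90012-3308']
```

Because the match is anchored at the start of the row and returns a single string, each result is a plain zip code rather than a list of digit runs.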

Source

2014-05-15 19:05:50

Why can't you just split?

>>> '20530 [email protected]'.split() 
['20530', '[email protected]']

Then just grab the first element.

>>> '20530 [email protected]'.split()[0] 
'20530'

For all of the data:

l = ['20530 [email protected]', 
    '20530 [email protected]', 
    '20530 [email protected]', 
    '20530 [email protected]', 
    '20004 [email protected]', 
    '20530 [email protected]', 
    '90012-3308 [email protected]', 
    '90012-3308 [email protected]', 
    '90012 [email protected]'] 

[entry.split()[0] for entry in l]

Result

['20530', '20530', '20530', '20530', '20004', '20530', '90012-3308', '90012-3308', '90012']

Source

2014-05-15 19:03:47 CoryKramer

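Actually, I am working with 400 rows here, and manually adding commas and quotes to build a list like that is not feasible. I am talking about your **list l** – hky404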

You can do 'str(Cell(i).value)', for example, to turn it into a string. – CoryKramer
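The suggestion above can be sketched as a small helper that converts each cell value to a string and keeps the first whitespace-separated token. This is a minimal sketch, not DataNitro code: the name first_token and the sample values are hypothetical, and it assumes blank cells arrive as None or an empty string and that the text is plain ASCII.

```python
def first_token(value):
    """Return the first whitespace-separated token of a cell value, or None."""
    if value is None:
        return None  # assumed representation of a blank cell
    parts = str(value).split()
    return parts[0] if parts else None


# Hypothetical stand-ins for values read from the worksheet.
values = ["20530 user1@example.com", "  90012-3308 user8@example.com", "", None]
tokens = [first_token(v) for v in values]
print(tokens)  # ['20530', '90012-3308', None, None]
```

With a helper like this, the 400 rows never need to be typed out as a literal list; the values only have to be read from the sheet in a loop.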

Just an additional note to make the answers specific to DataNitro, which is what your original question was about.

If you do a lot of looping like this in DataNitro, an efficient way to read in a whole column is:

l = Cell("A1").vertical 
# returns a list of all values starting in A1 going down to 1st blank cell

Combining this with the split-based solution above (CoryKramer's answer) gives you the answer in a two-liner:

l = Cell("A1").vertical 
[entry.split()[0] for entry in l]

Or, if you prefer the flexibility of regex, Jonathan Benn's answer becomes:

import re

l = Cell("A1").vertical
[re.compile(r'([\d\-]+)\s+[\[email protected]\.]+').findall(entry) for entry in l]
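Note that findall returns a list for every entry, so the comprehension above yields a list of lists. A minimal sketch of flattening this to plain strings follows. It uses a hard-coded list in place of Cell("A1").vertical (which exists only inside DataNitro), and a simplified pattern that keeps only the first capture group; both are assumptions made for illustration.

```python
import re

# Stand-in for l = Cell("A1").vertical; the addresses are placeholders.
l = ["20530 user1@example.com", "90012-3308 user8@example.com"]

# Assumed simplified pattern: leading digits and hyphens followed by whitespace.
pattern = re.compile(r"([\d\-]+)\s+")

# findall gives one list per row.
nested = [pattern.findall(entry) for entry in l]
print(nested)  # [['20530'], ['90012-3308']]

# match gives at most one result per row, so the output is a flat list.
matches = (pattern.match(entry) for entry in l)
flat = [m.group(1) if m else None for m in matches]
print(flat)  # ['20530', '90012-3308']
```

The flat list is the shape you would want before writing one zip code per cell back to the sheet.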

Source

2014-07-30 08:22:06 Joop
