2013-01-20 24 views
1

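Python regex pattern does not match in a string, even though it matches in testers (retest, Regex Coach, http://ksamuel.pythonanywhere.com)

I have built a regex pattern for my target: a string containing data that comes from a CSV file. I am almost a newbie at programming, and I am really stuck at this step. I have been trying hard to solve it, because regex is (I think...) the best option for my problem, which is to search for data in CSV files that differ somewhat from one another but follow a pattern defined by a formal protocol (MIAME files, from the bioinformatics field). Here is my code: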

import re
ficheiro = open(raw_input('write the name of the file (formato CSV):'), 'r')
lista_file = ficheiro.readlines()
str_file = str(lista_file)
list_spr = []
value_spr = []
for a in str_file:
    regex_spr = re.search(r"(spr[0-9]{4})[^\t.]*\t([0-9.]+)", a, re.I|re.M)
    print regex_spr.group()
    list_spr += regex_spr.group(1)
    value_spr += regex_spr.group(2)

But the result is always some error involving 'NoneType', such as:

Traceback (most recent call last): 
    File "C:\EDPython27\test\put_words_in_dict.py", line 112, in <module> 
    print regex_spr.group() 
AttributeError: 'NoneType' object has no attribute 'group' 

Below is part of the str_file content that I used to test the pattern:

('Reporter Identifier\tVALUE\n', 'spr0320060100000320\t4.784064198\n', 'spr0963060100000963\t3.646246197\n', 'spr1586060100001584\t5.755770215\n', 'spr1102060100001101\t5.794439261\n', 'spr1727060100001725\t6.452100774\n', 'spr0552060100000552\t6.816527711\n', 'spr0807060100000807\t3.185267941\n', 'spr0322060100000322\t5.889496662\n', 'spr0971060100000971\t3.112604228\n', 'spr0490060100000490\t6.608164616\n', 'spr0471060100000471\t6.807244139\n', 'spr60100000321\t5.331036948\n', 'spr1070060100001069\t6.408937689\n', 'spr1585060100001583\t6.157044216\n', 'spr1189060100001188\t3.481847857\n', 'spr1191060100001190\t3.523784616\n', 'spr1081060100001080\t6.708517655\n', 'spr1071060100001070\t7.092586967\n', 'spr1101060100001100\t6.294650154\n', 'spr0561060100000561\t7.52495517\n', 'spr0802060100000802\t8.299020685\n', 'spr1195060100001194\t6.143485258\n', 'spr0470060100000470\t5.869271803\n', 'spr1944060100001941\t7.060765363\n', 'spr0968060100000968\t6.276636704\n', 'spr1072060100001071\t7.267895537\n', 'spr0972060100000972\t5.535911422\n', 'spr1821060100001819\t7.660640949\n', 'spr0316060100000316\t6.399083059\n', 'spr0129060100000129\t6.693897057\n', 'spr0966060100000966\t6.208969299\n', 'spr0323060100000323\t6.230187159\n', 'spr1466060100001465\t7.609506586\n', 'spr0964060100000964\t6.286528191\n', 'spr1665060100001663\t5.597969101\n', 'spr0969060100000969\t5.122425278\n', 'spr1394060100001393\t7.310099682\n', 'spr0683060100000683\t7.397780719\n', 'spr1649060100001647\t6.121430945\n', 'spr0536060100000536\t7.936838283\n', 'spr1020060100001020\t7.339227818\n', 'spr0682060100000682\t7.435907739\n', 'spr0606060100000606\t6.251491879\n', 'spr0491060100000491\t5.400560984\n', 'spr0939060100000939\t6.928170725\n', 'spr1492060100001491\t7.451461913\n', 'spr0965060100000965\t5.610110186\n', 'spr1188060100001187\t3.384989187\n', 'spr1296060100001295\t5.927021756\n') 

Thanks in advance for any advice.

+3

`re.search` returns `None` when no match is found. –

+0

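Yes, I know. But my problem is that I don't know why, and mainly where the error in the regex is, since it works in the testers. Also, please ignore the last two lines. I don't even know whether they are syntactically correct – BioInfoPT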

Answers

1

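From the docs on re.search():

"Scan through string looking for a location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern"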
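
As a small illustration of the quoted behaviour, the sketch below applies the pattern from the question to two lines taken from the sample data: one data line and the header line. The variable names (`pattern`, `data_line`, `header_line`, `hit`, `miss`) are made up for this example, and the single-argument `print(...)` calls are written so that they run on both Python 2 and Python 3.

```python
import re

# Pattern copied from the question.
pattern = r"(spr[0-9]{4})[^\t.]*\t([0-9.]+)"

# Two lines copied from the sample data in the question.
data_line = 'spr0320060100000320\t4.784064198\n'
header_line = 'Reporter Identifier\tVALUE\n'

hit = re.search(pattern, data_line, re.I)     # a match object
miss = re.search(pattern, header_line, re.I)  # None: the header has no "spr" identifier

print(hit.group(1))   # spr0320
print(hit.group(2))   # 4.784064198
print(miss is None)   # True
```

Calling `miss.group()` here would raise the same `AttributeError` shown in the traceback.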

So the solution here would be to check whether regex_spr is None or not.

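for a in str_file:
    regex_spr = re.search(r"(spr[0-9]{4})[^\t.]*\t([0-9.]+)", a, re.I|re.M)
    if regex_spr is not None:
        print regex_spr.group()
        # append() adds each captured string as a single list element;
        # += with a string would add its characters one by one
        list_spr.append(regex_spr.group(1))
        value_spr.append(regex_spr.group(2))
    else:
        pass  # no match for this item: do something else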
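
The None check stops the crash, but it may not be enough on its own. Assuming str_file is built as in the question, with str() applied to the list returned by readlines(), the loop iterates over single characters of that string rather than over lines, so the pattern can never match. The sketch below searches each line instead. The function name `extract_spr` and the two-line `sample_lines` list are hypothetical, introduced only for illustration, and this is one possible approach rather than the answerer's own method.

```python
import re

# Pattern copied from the question, compiled once for reuse.
SPR_PATTERN = re.compile(r"(spr[0-9]{4})[^\t.]*\t([0-9.]+)", re.I)


def extract_spr(lines):
    """Collect identifiers and values from an iterable of text lines.

    Hypothetical helper for illustration; lines without a match
    (such as the header) are skipped.
    """
    list_spr = []
    value_spr = []
    for line in lines:
        match = SPR_PATTERN.search(line)
        if match is None:
            continue
        list_spr.append(match.group(1))
        value_spr.append(match.group(2))
    return list_spr, value_spr


# A few lines copied from the sample data in the question.
sample_lines = [
    'Reporter Identifier\tVALUE\n',
    'spr0320060100000320\t4.784064198\n',
    'spr0963060100000963\t3.646246197\n',
]

# str() on a list yields its repr; iterating over it gives single characters.
first_chars = list(str(sample_lines))[:3]
print(first_chars)  # ['[', "'", 'R']

ids, values = extract_spr(sample_lines)
print(ids)     # ['spr0320', 'spr0963']
print(values)  # ['4.784064198', '3.646246197']
```

With a real file, the open file object (or the list from readlines()) can be passed to `extract_spr` directly, since both yield one line at a time.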