2014-09-19 50 views
-1

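String comparison with NA fails in an if statement

I am using Python 2.7 to convert my tabular data into a matrix. While doing some analysis along the way, I check whether a cell contains NA (the data is R output, and I used NA for missing data points). If a cell contains NA, I skip the analysis for it and move on to the next one.

This works for some rows (the first three), but not for the fourth. The value there is also NA, and I am checking it in exactly the same way.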

Code (this particular script is run from CMD):

def findMax(l, i):
    r = []
    for x in range(0, 3):
        if not l[i] == "NA":  # Problem
            print l[i]
            if float(l[i]) <= 15:
                if not l[i-1] == "NA":
                    if float(l[i-1]) <= 0.05:
                        if not l[i-2] == "NA":
                            r.append(float(l[i-2]))

        i = i+12
    if len(r) != 0:
        return max(r)
    else:
        return 0


fIn = open("D:/projects/salmon/rawData_full.csv", "r")
fOut = open("D:/projects/salmon/dataAsMatrix.txt", "w")
fOut.write("Prot"+"\t"+"2 min"+"\t"+"5 min"+"\t"+"10 min"+"\t"+"20 min"+"\n")

for line in fIn:
    cols = line.split(";")
    if cols[6] != "NA":
        hgnc_symbol = cols[6]
        vals = [findMax(cols, 9), findMax(cols, 12), findMax(cols, 15), findMax(cols, 18)]
        m = max(vals)
        if m != 0:
            mi = [i for i, j in enumerate(vals) if j == m]  # Problem
            if mi == [0]:
                fOut.write(hgnc_symbol+"\t"+"1"+"\t"+"0"+"\t"+"0"+"\t"+"0"+"\n")
            elif mi == [1]:
                fOut.write(hgnc_symbol+"\t"+"0"+"\t"+"1"+"\t"+"0"+"\t"+"0"+"\n")
            elif mi == [2]:
                fOut.write(hgnc_symbol+"\t"+"0"+"\t"+"0"+"\t"+"1"+"\t"+"0"+"\n")
            elif mi == [3]:
                fOut.write(hgnc_symbol+"\t"+"0"+"\t"+"0"+"\t"+"0"+"\t"+"1"+"\n")

fIn.close()
fOut.close()

Output:

D:\projects\salmon>python processDataAsMatrix.py 
17.278 
16.37 
13.072 
11.251 
23.81 
4.3903 
8.284 
22.255 
5.9456 
25.727 
15.511 
13.448 

18.857 
17.056 
15.106 
33.84 
3.9582 
5.4985 

18.857 
17.056 
15.106 
33.84 
3.9582 
5.4985 

NA 

Traceback (most recent call last): 
    File "processDataAsMatrix.py", line 29, in <module> 
    vals = [findMax(cols, 9), findMax(cols, 12), findMax(cols, 15), findMax(cols 
, 18)] 
    File "processDataAsMatrix.py", line 8, in findMax 
    if float(l[i]) <= 15: 
ValueError: could not convert string to float: NA 

Table:

1st row: ZYX 0.030963842 0.44073 17.278 0.026328939 0.34735 11.251 -0.020729408 0.40571 8.284 0.12169113 0.047 25.727 -0.038389092 0.23603 16.37 -0.028881936 0.39508 23.81 0.017909396 0.41499 22.255 0.258158193 0.021821 15.511 -0.01200769 0.33594 13.072 0.049101678 0.34596 43.903 0.019365575 0.44196 59.456 0.157124196 0.19583 13.448 
2nd row: ZYX 0.046846204 0.31797 18.857 0.146097014 0.0034837 15.106 0.221048912 0.0011114 33.84 0.492229415 3.61e-07 39.582 NA NA NA NA NA NA NA NA NA NA NA NA 0.011612729 0.49258 17.056 -0.076600534 0.071586 NA 0.371141778 7.49e-05 NA 0.507383556 0.0017682 54.985 
3rd row: ZYX 0.046846204 0.32115 18.857 0.146097014 0.0032917 15.106 0.221048912 0.00099106 33.84 0.492229415 2.27e-07 39.582 NA NA NA NA NA NA NA NA NA NA NA NA 0.011612729 0.49293 17.056 -0.128999496 0.01102 NA 0.220709405 0.011875 NA 0.507383556 0.0017682 54.985 
4th row: ZYX NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 
+2

Is it possible that the string values contain whitespace (e.g. `"NA "` or something similar)? – BrenBarn 2014-09-19 07:50:53

+1

Try printing the value with `repr(l[i])` and see what is actually in the string. – 2014-09-19 07:55:46
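
To illustrate the suggestion, here is a minimal sketch. The sample line is made up for the example and is not taken from the asker's file. `repr` shows characters that a plain `print` hides, such as a trailing newline left on the last field after splitting:

```python
# Hypothetical semicolon-separated line, as iterating over a file would
# yield it: the line terminator is still attached to the last field.
line = "ZYX;NA;NA;NA\n"
cols = line.split(";")

last = cols[-1]
print(repr(last))       # the escaped newline is visible inside the quotes
print(last == "NA")     # False: the field is "NA\n", not "NA"
print(cols[1] == "NA")  # True: inner fields carry no newline
```

Only the last column of each line is affected, which is consistent with the comparison failing for some cells and not others.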

+0

You really should learn about [boolean operations](https://docs.python.org/2/reference/expressions.html#boolean-operations) instead of nesting `if` statements five levels deep. – 2014-09-19 07:58:31
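
As a rough sketch of what this comment suggests, the five nested checks in `findMax` could be combined with `and`. The name `find_max` and the `groups`/`step` parameters are introduced here for illustration only, the debugging `print` is dropped, and the fields are assumed to be free of stray whitespace already:

```python
def find_max(cols, i, groups=3, step=12):
    """Flat variant of the question's findMax (illustrative only).

    `and` short-circuits left to right, so float() is only called on a
    field after that field has been compared against "NA".
    """
    results = []
    for _ in range(groups):
        a, b, c = cols[i - 2], cols[i - 1], cols[i]
        if (c != "NA" and float(c) <= 15
                and b != "NA" and float(b) <= 0.05
                and a != "NA"):
            results.append(float(a))
        i += step
    # Same fallback as the original: 0 when nothing qualified.
    return max(results) if results else 0
```

This does not by itself fix the failing comparison. It only flattens the control flow so each condition sits on one level.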

Answers

0

Thanks to everyone's help: `repr` showed that each line ends with `\n`, so all I needed to do was `line = line.rstrip()`. Now it works.
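
A minimal sketch of this fix, assuming semicolon-separated input as in the question. The helper name `split_fields` and the sample line are hypothetical:

```python
def split_fields(line, sep=";"):
    """Strip trailing whitespace (including the newline), then split."""
    return line.rstrip().split(sep)


raw = "ZYX;0.5;NA\n"                  # made-up line with a trailing newline
print(raw.split(";")[-1] == "NA")     # False: last field is "NA\n"
print(split_fields(raw)[-1] == "NA")  # True after rstrip()
print(float("13.448\n"))              # float() ignores surrounding whitespace
```

The last line probably explains why the first three rows appeared to work: their final column held a number, and `float()` accepts a trailing newline. That would also account for the blank lines in the printed output. In the fourth row the final column is `NA`, so the field `"NA\n"` passed the `== "NA"` check and then failed in `float()`.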