2013-03-18 41 views
0

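I have a text file containing a table whose columns describe particular parameters, and I want to match line numbers against the rows of that table:

size magnitude luminosity

I only need particular data (specific rows and columns). So far, I have Python code that collects the line numbers I need. What I don't know is how to match those numbers against the text file to get the right rows, along with the values in the magnitude and luminosity columns. Any suggestions on how to approach this?

Here is a sample of my code (the # comments describe what I have done and what I want to do):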

temp_ListMatch = (point[5]).strip()
if temp_ListMatch:
    ListMatchaddress = (point[5]).strip()
    ListMatchaddress = re.sub(r'\s', '_', ListMatchaddress)
    ListMatch_dirname = '/projects/XRB_Web/apmanuel/499/Lists/' + ListMatchaddress
    #print ListMatch_dirname+"\n"

    try:
        file5 = open(ListMatch_dirname, 'r')
    except IOError:
        print 'Cannot open: '+ListMatch_dirname

    Optparline = []
    for line in file5:
        point5 = line.split()
        j = int(point5[1])
        Optparline.append(j)
        #Basically file5 contains the line numbers I need,
        #and I have appended these numbers to the variable j.
        temp_others = (point[4]).strip()
        if temp_others:
            othersaddress = (point[4]).strip()
            othersaddress = re.sub(r'\s', '_', othersaddress)
            othersbase_dirname = '/projects/XRB_Web/apmanuel/499/Lists/' + othersaddress
            try:
                file6 = open(othersbase_dirname, 'r')
            except IOError:
                print 'Cannot open: '+othersbase_dirname

            gmag = []
            z = []
            rh = []
            gz = []

            for line in file6:
                point6 = line.split()
                f = float(point6[2])
                g = float(point6[4])
                h = float(point6[6])
                i = float(point6[9])
        # So now I have opened file 6 where this list of data is, and have
        # identified the columns of elements that I need.
        # I only need the particular rows (provided by line number)
        # with these elements chosen. That is where I'm stuck!
+0

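Maybe I'm missing something, but why not increment a counter for every new line you read? You can use that line index as a dictionary key when you store the line's contents, use it to decide whether you need the line, and so on. – Patashu 2013-03-18 03:31:42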

+0

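Am I right in thinking that you have one file containing lines in the format 'size magnitude luminosity', and another file with a list of line numbers? And that you only want to extract the magnitude and luminosity columns for the specified lines? – 2013-03-18 04:27:27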

+0

What is "point"? What does it hold? – pradyunsg 2013-03-18 04:55:22

Answers

0

Load the whole data file into a pandas DataFrame (assuming the data file has a header row from which the column names can be taken):

import pandas as pd 
df = pd.read_csv('/path/to/file') 

Load the file of line numbers into a pandas Series (assuming there is one number per line):

# squeeze = True makes the function return a series 
row_numbers = pd.read_csv('/path/to/rows_file', squeeze = True) 

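Return only the rows listed in the line-number file, restricted to the magnitude and luminosity columns (assuming the first row is numbered 0):

relevant_rows = df.loc[row_numbers, ['magnitude', 'luminosity']]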
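
For comparison, the line-counter idea suggested in the comments can be sketched in plain Python without pandas. This is only an illustration under some assumptions: the data file is whitespace-separated with no header row, line numbering starts at 0 (as assumed above), and the names select_rows, data and line_numbers are hypothetical, with a short in-memory list standing in for the two files.

```python
def select_rows(data_lines, wanted_line_numbers, column_indices, first_line=0):
    """Return {line_number: [values]} for the wanted lines only.

    data_lines: any iterable of text lines (a list or an open file).
    column_indices: positions of the columns to keep, counted from 0.
    first_line: number given to the first line (0 here by assumption).
    """
    wanted = set(wanted_line_numbers)  # set gives fast membership tests
    selected = {}
    for line_number, line in enumerate(data_lines, start=first_line):
        if line_number not in wanted:
            continue  # skip rows that were not requested
        fields = line.split()
        selected[line_number] = [float(fields[i]) for i in column_indices]
    return selected


# Hypothetical sample rows in the order: size magnitude luminosity
data = [
    "1.0 20.5 3.2",
    "2.0 19.8 4.1",
    "3.0 21.1 2.7",
    "4.0 18.9 5.0",
]
line_numbers = [1, 3]  # stands in for the numbers read from the other file

# Keep columns 1 and 2, i.e. magnitude and luminosity.
result = select_rows(data, line_numbers, column_indices=[1, 2])
print(result)  # {1: [19.8, 4.1], 3: [18.9, 5.0]}
```

Because iterating over an open file yields its lines, the same function should also accept a file object in place of the list, and passing first_line=1 covers the case where the line numbers are counted from 1.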