2016-08-04
0

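I hope I can get some help with this problem: comparing two text files line by line.

I have two text files of about 10,000 lines each (call them File 1 and File 2) that come from an FEM analysis. The files are structured as follows:

File 1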

 .... 
    Element   Facet   Node CNORMF.Magnitude  CNORMF.CNF1  CNORMF.CNF2  CNORMF.CNF3   CPRESS   CSHEAR1   CSHEAR2 CSHEARF.Magnitude CSHEARF.CSF1 CSHEARF.CSF2 CSHEARF.CSF3 

     881    3   6619    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0. 
     881    3   6648    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0. 
     881    3   6653    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0. 
     930    3   6452    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0. 
     930    3   6483    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0. 
     930    3   6488    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0. 
     1244    2   7722    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0. 
     1244    2   7724    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0. 
     1244    2   7754    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0. 
     2380    2   3757  304.326E-06 -123.097E-06 -203.689E-06 -189.663E-06  564.697E-06 -281.448E-06  22.5357E-06  152.710E-06  144.843E-06 -26.7177E-06 -40.3387E-06 
     2380    2   3826  226.603E-06 -85.9859E-06 -161.270E-06 -133.967E-06  270.594E-06 -134.865E-06  10.7988E-06  117.700E-06  116.217E-06 -4.67318E-06 -18.0298E-06 
     2380    2   3848  10.4740E-03 -2.01174E-03 -6.63900E-03 -7.84743E-03  771.739E-06 -384.638E-06  30.7983E-06  5.24148E-03  5.12795E-03 -541.446E-06 -940.251E-06 
     2894    2   8253    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0. 
     2894    2   8255    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0. 
     2894    2   8270    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0. 
     3372    2   5920    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0. 
     3372    2   5961  52.7705E-03  12.2948E-03 -40.8019E-03 -31.1251E-03  7.36309E-03 -2.56505E-03 -502.055E-06  18.8167E-03  17.9038E-03  2.12060E-03  5.38774E-03 
     3372    2   5996    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0. 
     3936    3   6782    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0. 
     3936    3   6852    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0. 
     3936    3   6857    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0. 
     3937    4   6410    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0. 
     3937    4   6452    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0. 
     3937    4   6488    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0. 
     3955    2   6940    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0. 
     3955    2   6941    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0. 
     3955    2   6993    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0. 
     4024    2   8027    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0. 
     4024    2   8050    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0. 
     .... 

File 2

 .... 
     Node COORD.Magnitude  COORD.COOR1  COORD.COOR2  COORD.COOR3  U.Magnitude   U.U1   U.U2   U.U3 
      1   131.691   14.5010  -92.2190  -92.8868   1.93638  188.252E-03  -1.64949 -996.662E-03 
      2   131.336   10.9038  -92.2281  -92.8663   1.93341  188.250E-03  -1.64672 -995.468E-03 
      3   132.130   18.7534  -92.4681  -92.5002   1.93968  188.190E-03  -1.65258 -997.959E-03 
      4   130.769   1.97638  -92.5186  -92.3953   1.92580  188.179E-03  -1.63965 -992.387E-03 
      5   130.560  -4.04517  -93.1433  -91.3993   1.92030  188.026E-03  -1.63459 -990.122E-03 
      6   132.422   24.0768  -93.9662  -90.1454   1.94282  187.819E-03  -1.65564 -999.062E-03 
      7   130.377  -8.39503  -94.1640  -89.7827   1.91586  187.774E-03  -1.63054 -988.235E-03 
      8   126.321   13.6556  -88.0641  -89.5278   1.93579  192.554E-03  -1.64736 -998.202E-03 
      9   125.963   4.31065  -88.6558  -89.3771   1.92786  192.145E-03  -1.64012 -994.852E-03 
      10   130.037   3.02359  -94.4877  -89.2894   1.92501  187.692E-03  -1.63909 -991.871E-03 
      11   126.692   18.5888  -88.1164  -89.1107   1.93970  192.653E-03  -1.65097 -999.810E-03 
      12   125.751  -1.96189  -89.1238  -88.6928   1.92231  192.010E-03  -1.63500 -992.572E-03 
      13   125.719  -3.46723  -89.2798  -88.4437   1.92094  191.971E-03  -1.63373 -992.005E-03 
      14   130.026   7.42596  -95.0372  -88.4289   1.92818  187.556E-03  -1.64210 -993.086E-03 
      15   130.736   16.3557  -95.3755  -87.9092   1.93527  187.472E-03  -1.64873 -995.891E-03 
      16   130.251  -12.8122  -95.5572  -87.5783   1.91105  187.430E-03  -1.62618 -986.163E-03 
      17   130.250   12.8770  -95.6602  -87.4548   1.93216  187.401E-03  -1.64586 -994.616E-03 
      18   125.609  -7.73838  -90.1949  -87.0785   1.91668  191.718E-03  -1.62985 -990.191E-03 
      19   124.466  -6.21492  -88.8834  -86.9075   1.91827  192.783E-03  -1.63095 -991.270E-03 
      20   126.958   23.9470  -89.5421  -86.7584   1.94289  192.337E-03  -1.65406  -1.00096 
      21   121.210   6.64491  -84.7929  -86.3587   1.92993  196.112E-03  -1.64059 -997.316E-03 
      22   121.369   12.5781  -84.3620  -86.3434   1.93495  196.450E-03  -1.64514 -999.468E-03 
     .... 

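I want to do the following steps:

  1. Delete the first two columns from File 1.
  2. Compare the node labels of the two files.
  3. Write the lines that have the same node label side by side to an output text file in "rpt" format.

Here is the code I have used. It seems to work for small files, but for large files it takes a very long time.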

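# Read both report files into lists of lines
nodEl = open("P:/File1.rpt", "r")
uniNod = open("P:/File2.rpt", "r")

row_nodEl = nodEl.readlines()
row_uniNod = uniNod.readlines()

nodEl.close()
uniNod.close()

output = open("P:/output.rpt", "w")

for index, line in enumerate(row_nodEl):
    # Keep only the data rows of File 1 (index ranges are specific to these files)
    if index > 23081 and index < 40572 and index != 23083 and index != 23084:
        temp = line.strip()
        temp2 = " ".join(temp.split())
        # var = [element, facet, node, remaining columns]
        var = temp2.split(" ", 3)
        # Every File 1 row rescans all of File 2: this nested loop is the slow part
        for index2, line2 in enumerate(row_uniNod):
            if index2 > 11412 and index2 < 21258 and index2 != 11414 and index2 != 11415:
                # Use line2 here (the posted code reused `line` by mistake)
                temp3 = line2.strip()
                temp4 = " ".join(temp3.split())
                # var2 = [node, remaining columns]
                var2 = temp4.split(" ", 1)
                if var[2] == var2[0]:
                    output.write("%s %s %s\n" % (var[2], var[3], var2[1]))

output.close()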

Any suggestion is more than welcome!

Answers

1

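You are comparing every line of one file (with m lines) against every line of the other file (with n lines). This gives a time complexity of O(m*n), which means that two files of 10,000 lines each produce 100,000,000 comparisons.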
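To make the O(m*n) figure concrete, here is a small sketch that counts the comparisons made by two nested loops. The function name and the synthetic label lists are assumptions for illustration only; they are not taken from the question's files.

```python
def count_nested_comparisons(labels_1, labels_2):
    """Match labels with two nested loops and count every comparison made."""
    comparisons = 0
    matches = []
    for a in labels_1:
        for b in labels_2:
            comparisons += 1
            if a == b:
                matches.append(a)
    return comparisons, matches


# Synthetic labels: 200 in the first list, 300 in the second, 100 shared
m_labels = [str(i) for i in range(200)]
n_labels = [str(i) for i in range(100, 400)]

comparisons, matches = count_nested_comparisons(m_labels, n_labels)
print(comparisons)   # 200 * 300 = 60000
print(len(matches))  # 100
```

The count is always the product of the two lengths, regardless of how many labels actually match.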

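You can speed up your code by changing the way you read the values. Consider reading the files into dictionaries instead of lists. Each key in a dictionary would be a node number, and each value the complete line.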

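With this approach, you could do the following:

  1. Load the first file into a dictionary.
  2. Load the second file into a dictionary.
  3. For each node in the first dictionary, find the corresponding node in the second dictionary.

In Python, this would look something like this: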

# load_file is a helper you need to write: it returns a dict {node label: line}
file_contents_1 = load_file("P:/File1.rpt")
file_contents_2 = load_file("P:/File2.rpt")

for node_label in file_contents_1:
    # Skip nodes that have no corresponding values in the second file
    if node_label not in file_contents_2:
        continue
    # Do something
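The snippet above leaves `load_file` undefined. The following is a minimal sketch of how it might look, not the answer's own implementation. The `key_column` parameter, the helpers `parse_rows` and `merge_rows`, and the rule used to detect data rows (an integer in the node column) are all assumptions made for illustration.

```python
def parse_rows(lines, key_column):
    """Return {node_label: remaining columns} for the data rows in `lines`."""
    rows = {}
    for line in lines:
        fields = line.split()
        # Assumption: a data row has an integer node label in key_column;
        # headers, blank lines and "...." separators fail this check.
        if len(fields) <= key_column or not fields[key_column].isdigit():
            continue
        # Columns up to and including the label are dropped from the value.
        rows[fields[key_column]] = " ".join(fields[key_column + 1:])
    return rows


def load_file(path, key_column=0):
    """Hypothetical helper: read a report file into a dictionary."""
    with open(path, "r") as handle:
        return parse_rows(handle, key_column)


def merge_rows(rows_1, rows_2):
    """Join rows that share a node label, one output line per match."""
    return [
        "%s %s %s" % (label, values, rows_2[label])
        for label, values in rows_1.items()
        if label in rows_2
    ]
```

For the files in the question, the node label is the third column of File 1 and the first column of File 2, so the calls would be `load_file("P:/File1.rpt", key_column=2)` and `load_file("P:/File2.rpt", key_column=0)`. One caveat: in the File 1 sample the same node appears under more than one element (node 6452 is listed for elements 930 and 3937), so a plain dictionary keeps only the last row per node. If every row must be kept, store a list of rows per key instead.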

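The benefit of this approach is that each file is loaded once, independently of the other, so the time complexity becomes linear, O(m+n). Looking up the corresponding node in the second file takes constant time on average, because of the way dictionaries are implemented (i.e. as hash tables).

This should make your code a lot faster.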

+0

You could load just the first dictionary and then loop sequentially over the second file. Not sure how that would affect performance. – agentp

+0

@agentp That would work too. It would probably be less memory-intensive. – hgazibara
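One possible reading of this suggestion is sketched below: only one file is indexed in a dictionary and the other is read row by row. The function name `join_streaming`, the choice to index File 2 and stream File 1, and the integer-label test for data rows are assumptions for illustration.

```python
def join_streaming(lines_1, lines_2):
    """Index File 2 by node label, then stream File 1 row by row."""
    # Build the lookup table from File 2 (node label is the first column).
    by_node = {}
    for line in lines_2:
        fields = line.split()
        if fields and fields[0].isdigit():
            by_node[fields[0]] = " ".join(fields[1:])

    # Walk File 1 once; the node label is the third column.
    for line in lines_1:
        fields = line.split()
        # Assumption: data rows have an integer in the node column.
        if len(fields) < 3 or not fields[2].isdigit():
            continue
        match = by_node.get(fields[2])
        if match is not None:
            # Element and facet columns are dropped from the output.
            yield "%s %s %s" % (fields[2], " ".join(fields[3:]), match)
```

Because the function is a generator and accepts any iterable of lines, an open file object can be passed as `lines_1`, so File 1 never has to be held in memory in full. A node that appears several times in File 1 produces one output row per occurrence.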

+0

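Thanks for your answer. I loaded both files as dictionaries and then compared them using your approach. Everything works well now. @agentp Do you mean loading only one file as a dictionary and then comparing each of its labels against each line of the second text file? – drSlump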