How to eliminate equal lines in a file

I have a file.txt with lines like the following:

1. 0. 3.21 
1. 1. 2.11 
1. 2. 1.554 
1. 0. 3.21 
1. 3. 1.111 
1. 2. 1.554

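As you can see, I have two pairs of lines that are equal to each other (the first and the fourth, and the third and the sixth). I am trying to eliminate the duplicates so that I end up with lines like these: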

1. 0. 3.21 
1. 1. 2.11 
1. 2. 1.554 
1. 3. 1.111

My attempt in Fortran is:

 program mean 
     implicit none 
     integer :: i,j,n,s,units 
     REAL*8,allocatable:: x(:),y(:),amp(:) 

      ! open the file I want to change 

      OPEN(UNIT=10,FILE='oldfile.dat') 
      n=0 
      DO 
       READ(10,*,END=100)   
       n=n+1 
      END DO 

    100  continue 
      rewind(10) 
     allocate(x(n),y(n),amp(n)) 
    s=0 

     ! save the numbers from the file in three different vectors 

     do s=1, n 
      read(10,*) x(s), y(s),amp(s) 
     end do 
     !---------------------! 

    ! Open the file that should contains the new data without repetition  
    units=107 
    open(unit=units,file='newfile.dat') 

    ! THIS SHOULD WRITE ONLY NOT EQUAL ELEMENTS of THE oldfile.dat: 
    ! scan the elements in the third column and write only the elements for which 
    ! the if statement is true, namely: write only the elements (x,y,amp) that have 
    ! different values in the third column. 

    do i=1,n 
     do j = i+1,n 
     if (amp(i) .ne. amp(j)) then ! 
     write(units,*),x(j),y(j),amp(j) 
     end if 
     end do 
    end do 
    end program

But the output file looks like this:

1.000000  1.000000  2.110000  
    1.000000  2.000000  1.554000  
    1.000000  3.000000  1.111000  
    1.000000  2.000000  1.554000  
    1.000000  2.000000  1.554000  
    1.000000  0.0000000E+00 3.210000  
    1.000000  3.000000  1.111000  
    1.000000  2.000000  1.554000  
    1.000000  0.0000000E+00 3.210000  
    1.000000  3.000000  1.111000  
    1.000000  3.000000  1.111000  
    1.000000  2.000000  1.554000  
    1.000000  2.000000  1.554000

I don't understand what is wrong with the if condition. Could you please help me?

Thanks a lot!

Source

2014-10-09 Panichi Pattumeros PapaCastoro

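Much better. Now, is the input file you show truly representative of your real input file? How many lines would there be in a typical input file? – 2014-10-09 13:34:15

@HighPerformanceMark yes, it is exactly the same: a matrix with three real-valued columns and n rows, where n = 100000 (more or less; this is the typical number of lines in the output). – 2014-10-09 14:01:03

Whatever the algorithm, consider using string operations to do the whole thing (assuming that "equal" lines are equal in their text representation). It will simplify the code, run faster, and your output will automatically be formatted the same as the input. – agentp 2014-10-09 15:56:10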
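As a rough illustration of the string-based idea in the comment above, the sketch below compares whole lines as text and keeps the first occurrence of each. It uses awk, which nobody in the thread mentions, so treat the tool choice as my own assumption. The sample data is copied from the question and written to a temporary file that stands in for the real input.

```shell
#!/bin/sh
# Sample input copied from the question, written to a temporary file.
input=$(mktemp)
printf '%s\n' '1. 0. 3.21' '1. 1. 2.11' '1. 2. 1.554' \
              '1. 0. 3.21' '1. 3. 1.111' '1. 2. 1.554' > "$input"

# seen[$0] counts how often the exact text of a line has appeared so far.
# The pattern is true only on the first appearance, so awk prints each
# distinct line once, in its original order and with its original formatting.
deduped=$(awk '!seen[$0]++' "$input")

printf '%s\n' "$deduped"
rm -f "$input"
```

Because the comparison is purely textual, two lines that differ only in whitespace (for example a trailing space) count as different lines here.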

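I wouldn't fix your approach; I'd abandon it entirely. What you have is an O(n^2) algorithm, which is fine for a small number of lines, but on 10^5 lines you will execute the if statement 0.5 * 10^10 times. Fortran is fast, but this is needlessly wasteful.

I'd first sort the file (O(n log n)) and then scan it (O(n)), eliminating duplicates. I probably wouldn't use Fortran for the sorting; I'd use one of the Linux utilities such as sort. Then I'd probably use uniq as well, and end up doing no Fortran programming at all.

If you want to write the de-duplicated file in the original order, then I'd add a line number to each line, sort, uniquify, and then re-sort by line number.

I believe that recent versions of Windows, the ones that support PowerShell, have equivalent commands.

If I absolutely had to do all this in Fortran, I'd write a sort routine (or rather, pull one out of my bag of tricks) and proceed. I'd be inclined to read the lines as strings and sort them as text, rather than messing around with real numbers and their tricky notion of equality. For 10^5 lines I'd read the entire file into an array, sort it into another array, and carry on.

Finally, I think the logic of your if statement is unsound. It decides whether to write a line to the new file based solely on the (in)equality of the third field, amp. It should surely consider all three fields of lines i and j, more like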
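The sort-then-scan approach described above might look like the following in a POSIX shell. This is only a sketch: the sample data comes from the question, the temporary file stands in for the real input, and LC_ALL=C is my own addition to make the sort order byte-wise and predictable.

```shell
#!/bin/sh
# Sample input copied from the question, written to a temporary file.
sample=$(mktemp)
printf '%s\n' '1. 0. 3.21' '1. 1. 2.11' '1. 2. 1.554' \
              '1. 0. 3.21' '1. 3. 1.111' '1. 2. 1.554' > "$sample"

# sort makes identical lines adjacent (the O(n log n) step);
# uniq then drops adjacent repeats in a single pass (the O(n) step).
unique_lines=$(LC_ALL=C sort "$sample" | uniq)

printf '%s\n' "$unique_lines"
rm -f "$sample"
```

Running `sort -u` on the file should give the same result in one step. Either way the output comes out in sorted order, not in the order of the input file.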
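The number, sort, uniquify, re-sort sequence could be sketched as the pipeline below. The answer does not say which tools to use for adding and stripping the line numbers, so awk and cut are my own choices here. The sample is a hypothetical reordering of the question's values, picked so that the original order differs from the sorted order.

```shell
#!/bin/sh
# Hypothetical sample: values from the question, reordered on purpose.
infile=$(mktemp)
printf '%s\n' '1. 3. 1.111' '1. 0. 3.21' '1. 3. 1.111' \
              '1. 1. 2.11' '1. 0. 3.21' > "$infile"

# Pipeline stages:
#   1. awk prefixes every line with its line number (NR).
#   2. sort orders by content (field 2 to end of line); ties are broken
#      by line number, so the earliest copy of each line comes first.
#   3. uniq -f 1 drops adjacent repeats while ignoring the number field.
#   4. sort -n puts the survivors back in their original order.
#   5. cut strips the line number again.
restored=$(
  awk '{ print NR, $0 }' "$infile" |
  LC_ALL=C sort -k2 -k1,1n |
  uniq -f 1 |
  sort -n -k1,1 |
  cut -d' ' -f2-
)

printf '%s\n' "$restored"
rm -f "$infile"
```

The sketch assumes that duplicate lines are byte-for-byte identical; lines that differ only in spacing would survive as separate entries.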

if (any([ x(i)/=x(j), y(i)/=y(j), amp(i)/=amp(j) ])) then

Source

2014-10-09 14:30:20

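It works! It works very well, and it is fast. I used 'sort -n -k 3 oldfile.txt >> sort.txt' to sort the file so that lines with equal values in the third column end up next to each other. Then I just used 'uniq sort.txt >> newfile.txt' and that's it! Thank you very much! – 2014-10-09 16:08:49

Just to fix the brute-force loop, it should look like this: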

do i=1,n 
    j=1 
    do while(j.lt.i.and.amp(i) .ne. amp(j)) 
    j=j+1 
    enddo 
    if(j.eq.i)write(units,*)x(i),y(i),amp(i) 
end do

or

do i=1,n 
    do j=1,i-1 
    if (amp(i) .eq. amp(j)) exit 
    enddo 
    if(j.eq.i)write(units,*)x(i),y(i),amp(i) 
end do

Source

2014-10-09 16:04:22 agentp
