2013-01-10 100 views
-1

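Porting C# Levenshtein distance to Java

I am working on an existing spell-checking application. Can anyone help me port this part of the code? I can't carry the pointers or stackalloc over to Java because no equivalents exist. I need a Java method with exactly the same functionality.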

public static unsafe double GNULevenstein(string word1, string word2) 
    { 
     // This algorithm normally computes the un-normalized distance between two strings.
     fixed (char* word1Ptr = word1) 
     fixed (char* word2Ptr = word2) 
     { 
      char* pointerToWord1 = word1Ptr; 
      char* pointerToWord2 = word2Ptr; 

      /* skip equal start sequence, if any */ 
      if (word1.Length >= word2.Length) 
      { 
       while (*pointerToWord1 == *pointerToWord2) 
       { 
        /* if we already used up one string, 
        * then the result is the length of the other */ 
        if (*pointerToWord1 == '\0') break; 
        pointerToWord1++; 
        pointerToWord2++; 
       } 
      } 
      else // word1 < word2
      { 
       while (*pointerToWord1 == *pointerToWord2) 
       { 
        /* if we already used up one string, 
        * then the result is the length of the other */ 
        if (*pointerToWord2 == '\0') break; 
        pointerToWord1++; 
        pointerToWord2++; 
       } 
      } 

      /* length count #1*/ 
      int len1 = word1.Length - (int)(pointerToWord1 - word1Ptr); 
      int len2 = word2.Length - (int)(pointerToWord2 - word2Ptr); 


      /* if we already used up one string, then 
      the result is the length of the other */ 
      if (*pointerToWord1 == '\0') 
       return ExportResult(len2 , word1.Length,word2.Length , false); 
      if (*pointerToWord2 == '\0') 
       return ExportResult(len1, word1.Length, word2.Length, false); 

      /* length count #2*/ 
      pointerToWord1 += len1; 
      pointerToWord2 += len2; 

      /* cut off equal tail sequence, if any */
      while (*--pointerToWord1 == *--pointerToWord2) 
      { 
       len1--; 
       len2--; 
      } 

      /* reset pointers, adjust length */ 
      pointerToWord1 -= len1++; 
      pointerToWord2 -= len2++; 

      /* possible dist too great? */
      //if ((len1 - len2 >= 0 ? len1 - len2 : -(len1 - len2)) >= char.MaxValue) return 1; 
      if (Math.Abs(len1 - len2) >= char.MaxValue) 
       return ExportResult(1, false); // no similarity 

      char* tmp; 
      /* swap if l2 longer than l1 */ 
      if (len1 < len2) 
      { 
       tmp = pointerToWord1; 
       pointerToWord1 = pointerToWord2; 
       pointerToWord2 = tmp; 
       len1 ^= len2; 
       len2 ^= len1; 
       len1 ^= len2; 
      } 

      /* fill initial row */ 

      int i, j, n; 

      n = (*pointerToWord1 != *pointerToWord2) ? 1 : 0; 
      char* r = stackalloc char[len1 * 2]; 

      char* p1, p2; 
      for (i = 0, p1 = r; i < len1; i++, *p1++ = (char)n++, p1++) 
      { /*empty*/} 


      /* calc. rowwise */ 
      for (j = 1; j < len2; j++) 
      { 
       /* init pointers and col#0 */ 
       p1 = r + ((j & 1) == 0 ? 1 : 0); 
       p2 = r + (j & 1); 
       n = *p1 + 1; 
       *p2++ = (char)n; p2++; 
       pointerToWord2++; 

       /* foreach column */ 
       for (i = 1; i < len1; i++) 
       { 
        if (*p1 < n) n = *p1 + (*(pointerToWord1 + i) != *pointerToWord2 ? 1 : 0); /* replace cheaper than delete? */ 
        p1++; 
        if (*++p1 < n) n = *p1 + 1; /* insert cheaper than current best? */
        *p2++ = (char)n++; /* update field and cost for next col's delete */ 
        p2++; 
       } 
      } 

      /* return result */ 
      return ExportResult(n - 1, word1.Length, word2.Length, false); 
     } 


    } 
+0

I don't think the Java language allows either of these concepts: pointers or manual allocation of memory on the stack. –

+0

Yes, you're right, but it can be changed into a method with the same functionality in Java, and that is exactly what I want – Navid
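
For what it's worth, the two pointer walks at the top of the C# method (skipping the equal start sequence and cutting off the equal tail) do not need pointers at all; plain integer indices into the strings do the same job. A minimal sketch, assuming the hypothetical helper names `commonPrefix` and `commonSuffix`, which are not part of the original code:

```java
public class TrimSketch {
    /** Length of the leading run of characters that a and b share. */
    public static int commonPrefix(CharSequence a, CharSequence b) {
        int limit = Math.min(a.length(), b.length());
        int i = 0;
        // The index i plays the role of pointerToWord1/pointerToWord2.
        while (i < limit && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }

    /** Length of the shared trailing run that does not overlap the first `prefix` chars. */
    public static int commonSuffix(CharSequence a, CharSequence b, int prefix) {
        int i = a.length();
        int j = b.length();
        int n = 0;
        // Walk backwards from both ends, like the *--pointer loop in the C# code,
        // but with an explicit bound instead of relying on a terminator.
        while (i > prefix && j > prefix && a.charAt(i - 1) == b.charAt(j - 1)) {
            i--;
            j--;
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String a = "spelling";
        String b = "spilling";
        int p = commonPrefix(a, b);
        int s = commonSuffix(a, b, p);
        // Only the differing middle parts still need a distance computation.
        System.out.println(a.substring(p, a.length() - s) + " / " + b.substring(p, b.length() - s));
    }
}
```

For "spelling" and "spilling" this leaves only "e" and "i" to compare. The explicit bounds checks replace the `'\0'` tests in the C# version, since Java strings expose no terminator.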

Answers

3

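This method looks like it was lazily ported from C/C++ rather than written in C#. String operations in C# are usually fast enough that using unsafe and working directly with char*s is a waste of time and effort...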

I just Googled the method name, and it looks like you simply want a Java implementation of the Levenshtein distance, so, from the same link:

public class LevenshteinDistance {
    private static int minimum(int a, int b, int c) {
        return Math.min(Math.min(a, b), c);
    }

    public static int computeLevenshteinDistance(CharSequence str1,
            CharSequence str2) {
        int[][] distance = new int[str1.length() + 1][str2.length() + 1];

        // First column and first row: distance to or from the empty prefix.
        for (int i = 0; i <= str1.length(); i++)
            distance[i][0] = i;
        for (int j = 1; j <= str2.length(); j++)
            distance[0][j] = j;

        // Each cell is the cheapest of deletion, insertion, and substitution.
        for (int i = 1; i <= str1.length(); i++)
            for (int j = 1; j <= str2.length(); j++)
                distance[i][j] = minimum(
                        distance[i - 1][j] + 1,
                        distance[i][j - 1] + 1,
                        distance[i - 1][j - 1]
                                + ((str1.charAt(i - 1) == str2.charAt(j - 1)) ? 0 : 1));

        return distance[str1.length()][str2.length()];
    }
}
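
If the concern is the memory layout rather than the result, the `stackalloc` buffer in the C# method appears to hold two interleaved rows of the matrix, and in Java that can be approximated with two ordinary heap-allocated `int[]` rows. The following is only a sketch under that assumption: the class name `TwoRowLevenshtein` is hypothetical, it is not a line-by-line port, and it returns the raw distance without the `ExportResult` normalization, whose definition is not shown in the question.

```java
public class TwoRowLevenshtein {
    public static int distance(CharSequence s, CharSequence t) {
        // Keep the shorter string along the row so the two buffers stay small.
        if (s.length() < t.length()) {
            CharSequence tmp = s;
            s = t;
            t = tmp;
        }
        // Two heap arrays stand in for the single interleaved stackalloc buffer.
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j; // distance from the empty prefix of s
        }
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(prev[j] + 1, curr[j - 1] + 1),
                        prev[j - 1] + cost);
            }
            // Swap the rows instead of copying them.
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return prev[t.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // prints 3
    }
}
```

This uses memory proportional to the shorter string instead of the full matrix, and it yields the same number as `computeLevenshteinDistance` above because both apply the same recurrence.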
+0

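I have already seen the code above, but it is not the same as the method I mentioned, which is why I want to convert exactly that method. You would be doing me a big favor if you helped me :) – Navid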

+0

@NavidKayhaniRad What is the difference between the method you are talking about and the one Robert proposed? –