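Optimizing Jaro-Winkler algorithm

I got the code for the Jaro-Winkler algorithm below from this website. I need to run it 150,000 times to get the distances between strings. This takes a long time, because I am running it on an Android mobile device.

Can it be optimized further?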
public class Jaro {
    /**
     * Gets the similarity of the two strings using the Jaro distance.
     *
     * @param string1 the first input string
     * @param string2 the second input string
     * @return a value between 0 and 1 giving the similarity
     */
    public float getSimilarity(final String string1, final String string2) {
        // half the length of the shorter string, rounded up (the distance used for acceptable transpositions)
        final int halflen = (Math.min(string1.length(), string2.length()) + 1) / 2;
        // get the common characters in both directions
        final StringBuffer common1 = getCommonCharacters(string1, string2, halflen);
        final StringBuffer common2 = getCommonCharacters(string2, string1, halflen);
        // nothing in common: similarity is 0
        if (common1.length() == 0 || common2.length() == 0) {
            return 0.0f;
        }
        // common strings of different lengths are treated as not similar
        if (common1.length() != common2.length()) {
            return 0.0f;
        }
        // count positions where the common strings differ; half of that is the number of transpositions
        int transpositions = 0;
        final int n = common1.length();
        for (int i = 0; i < n; i++) {
            if (common1.charAt(i) != common2.charAt(i)) {
                transpositions++;
            }
        }
        transpositions /= 2; // integer division, so an odd count is truncated
        // Jaro metric: mean of the two match ratios and the transposition ratio
        return (common1.length() / (float) string1.length()
                + common2.length() / (float) string2.length()
                + (common1.length() - transpositions) / (float) common1.length()) / 3.0f;
    }

    /**
     * Returns a string buffer of the characters from string1 that also occur in string2
     * within the given distance of their position in string1.
     *
     * @param string1 the string whose characters are looked up
     * @param string2 the string searched in
     * @param distanceSep the maximum allowed distance between matching positions
     * @return a string buffer of the characters from string1 found in string2 within
     *         distanceSep of their position in string1
     */
    private static StringBuffer getCommonCharacters(final String string1, final String string2, final int distanceSep) {
        // buffer for the common characters found
        final StringBuffer returnCommons = new StringBuffer();
        // copy of string2; matched characters are blanked out so they cannot match twice
        final StringBuffer copy = new StringBuffer(string2);
        final int n = string1.length();
        final int m = string2.length();
        for (int i = 0; i < n; i++) {
            final char ch = string1.charAt(i);
            // set when a match is found, for a quick exit from the inner loop
            boolean foundIt = false;
            // compare ch with the characters up to distanceSep positions on either side of i,
            // inclusive, clamped to the bounds of string2 (so its last character is checked too)
            for (int j = Math.max(0, i - distanceSep); !foundIt && j <= Math.min(i + distanceSep, m - 1); j++) {
                if (copy.charAt(j) == ch) {
                    foundIt = true;
                    // append the character found
                    returnCommons.append(ch);
                    // blank out the matched character in the copy of string2
                    copy.setCharAt(j, (char) 0);
                }
            }
        }
        return returnCommons;
    }
}
I should mention that throughout the whole process I create only one instance of the class, so it is instantiated just once:
jaro = new Jaro();
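If you want to test it and need examples so as not to break the script, you will find them here, in another thread about optimizing the Python version.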
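Since one instance is reused for every comparison, one option is to keep scratch buffers in that instance and find the matches only once per pair. The class below is a minimal sketch assuming the textbook Jaro definition rather than the variant in the question: each match is recorded with boolean flags on both strings inside a window of floor(max(len1, len2) / 2) - 1, so the two common-character sequences always have the same length and the second getCommonCharacters pass disappears. Because that window differs from the "half the shorter length, rounded up" window above, and transpositions are kept as a float here, scores will not always equal those of the original class. The names FastJaro and similarity are hypothetical, and because the buffers live in the instance it must not be shared between threads.

```java
import java.util.Arrays;

final class FastJaro {
    // Scratch flags reused across calls; grown on demand so repeated calls do not allocate.
    // Not thread-safe: use one instance per thread.
    private boolean[] matched1 = new boolean[0];
    private boolean[] matched2 = new boolean[0];

    float similarity(final String s1, final String s2) {
        final int len1 = s1.length();
        final int len2 = s2.length();
        if (len1 == 0 || len2 == 0) {
            return 0.0f;
        }
        // textbook match window: floor(max(len1, len2) / 2) - 1, never negative
        final int window = Math.max(0, Math.max(len1, len2) / 2 - 1);
        if (matched1.length < len1) matched1 = new boolean[len1];
        if (matched2.length < len2) matched2 = new boolean[len2];
        Arrays.fill(matched1, 0, len1, false);
        Arrays.fill(matched2, 0, len2, false);

        // single pass: pair each character of s1 with the first unmatched equal character of s2 in the window
        int matches = 0;
        for (int i = 0; i < len1; i++) {
            final char c = s1.charAt(i);
            final int end = Math.min(len2 - 1, i + window);
            for (int j = Math.max(0, i - window); j <= end; j++) {
                if (!matched2[j] && s2.charAt(j) == c) {
                    matched1[i] = true;
                    matched2[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) {
            return 0.0f;
        }

        // walk the matched characters of both strings in order and count mismatched pairs
        int mismatches = 0;
        int k = 0;
        for (int i = 0; i < len1; i++) {
            if (!matched1[i]) continue;
            while (!matched2[k]) k++;
            if (s1.charAt(i) != s2.charAt(k)) mismatches++;
            k++;
        }
        final float m = matches;
        final float t = mismatches / 2.0f; // transpositions, kept as a float
        return (m / len1 + m / len2 + (m - t) / m) / 3.0f;
    }
}
```

For example, `new FastJaro().similarity("MARTHA", "MARHTA")` gives about 0.944 (six matches, one transposition).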
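I was skeptical, but I ran some tests, and it looks like char arrays really are about ten times faster than StringBuffer. If you want to avoid using actual char arrays, StringBuilder is only about twice as slow as char arrays. – Rubys 2010-05-17 22:02:09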
Could you post that code snippet to help guide me? – Pentium10 2010-05-18 07:30:57
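Following the char-array suggestion in the comments, here is a minimal sketch of the same two-pass computation with the StringBuffer work replaced by plain char arrays. It assumes the question's approach: a window of half the shorter length rounded up, searched on both sides of each position inclusively, matched characters blanked with (char) 0, and the transposition count truncated to an int. The class CharArrayCommons and its methods are hypothetical names, and no timings are claimed beyond those reported in the comment.

```java
final class CharArrayCommons {
    // Writes the characters of s1 found in s2 within distanceSep of their position into out
    // and returns how many were found. scratch must hold at least s2.length chars,
    // out at least s1.length chars.
    static int commonCharacters(final char[] s1, final char[] s2, final int distanceSep,
                                final char[] scratch, final char[] out) {
        final int n = s1.length;
        final int m = s2.length;
        // working copy of s2; matched positions are blanked so they cannot match twice
        System.arraycopy(s2, 0, scratch, 0, m);
        int count = 0;
        for (int i = 0; i < n; i++) {
            final char ch = s1[i];
            final int end = Math.min(i + distanceSep, m - 1);
            for (int j = Math.max(0, i - distanceSep); j <= end; j++) {
                if (scratch[j] == ch) {
                    out[count++] = ch;
                    scratch[j] = (char) 0;
                    break;
                }
            }
        }
        return count;
    }

    static float similarity(final String a, final String b) {
        final char[] s1 = a.toCharArray();
        final char[] s2 = b.toCharArray();
        // half the length of the shorter string, rounded up
        final int halflen = (Math.min(s1.length, s2.length) + 1) / 2;
        final char[] scratch = new char[Math.max(s1.length, s2.length)];
        final char[] common1 = new char[s1.length];
        final char[] common2 = new char[s2.length];
        final int c1 = commonCharacters(s1, s2, halflen, scratch, common1);
        final int c2 = commonCharacters(s2, s1, halflen, scratch, common2);
        // nothing in common, or common strings of different lengths: not similar
        if (c1 == 0 || c1 != c2) {
            return 0.0f;
        }
        int transpositions = 0;
        for (int i = 0; i < c1; i++) {
            if (common1[i] != common2[i]) transpositions++;
        }
        transpositions /= 2; // integer division, as in the question
        return (c1 / (float) s1.length + c2 / (float) s2.length
                + (c1 - transpositions) / (float) c1) / 3.0f;
    }
}
```

This version still allocates four arrays per call; for 150,000 comparisons they could be kept as fields of a single instance and grown only when a longer string arrives.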