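As acfrancis has already answered: it doesn't get much simpler than using the built-in levenshtein function.
However, to answer your final question: yes, doing it the way you suggest is feasible and not too difficult.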
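For comparison, the built-in route could look roughly like the sketch below. The helper name levenshteinSimilarity and the choice to normalise by the longer string's length are my own assumptions for illustration, not something from acfrancis's answer.

```php
<?php
// Hypothetical helper (not from the original answers): turns the
// Levenshtein edit distance into a 0-100 similarity score.
function levenshteinSimilarity($a, $b)
{
    $a = strtolower($a);
    $b = strtolower($b);
    $longest = max(strlen($a), strlen($b));
    if ($longest === 0) {
        return 100.0; // two empty strings are identical
    }
    // Note: before PHP 8.0, levenshtein() only accepts strings
    // of up to 255 characters.
    $distance = levenshtein($a, $b);
    return round((1 - $distance / $longest) * 100, 2);
}

// 3 edits over 7 characters, so about 57.14
echo levenshteinSimilarity('kitten', 'sitting');
```

This compares characters rather than words, so reordering a clause lowers the score much more than it does with the word-based function below.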
Code
function checkQuestions($para1, $para2){
    //Lowercase, replace non-alphanumerics with spaces, split into words, drop empties and duplicates
    $arr1 = array_unique(array_filter(explode(' ', preg_replace('/[^a-zA-Z0-9]/', ' ', strtolower($para1)))));
    $arr2 = array_unique(array_filter(explode(' ', preg_replace('/[^a-zA-Z0-9]/', ' ', strtolower($para2)))));
    $intersect = array_intersect($arr1, $arr2);
    $p1 = count($arr1); //Number of unique words in para1
    $p2 = count($arr2); //Number of unique words in para2
    $in = count($intersect); //Number of words in intersect
    $lowest = ($p1 < $p2) ? $p1 : $p2; //Which is smaller p1 or p2?
    if($lowest === 0){ //Avoid division by zero when an input contains no words
        return array('Average' => '0.00', 'Smallest' => '0.00');
    }
    return array(
        'Average' => number_format((100/(($p1+$p2)/2)) * $in, 2), //Percentage the same compared to average length of questions
        'Smallest' => number_format((100/$lowest) * $in, 2) //Percentage the same compared to shortest question
    );
}
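Explanation
1. We define a function that accepts two arguments (the arguments being the questions we want to compare).
2. We filter the input and convert it to an array
   - Make the input lowercase: strtolower
   - Filter out non-alphanumeric characters: preg_replace
3. We explode the filtered string on spaces
4. We filter the created array
   - Remove empty values: array_filter
   - Remove duplicates: array_unique
5. Repeat 2-4 for the second question
6. Find the words that match in both arrays and move them to a new array, $intersect
7. Count the number of words in each of the three arrays: $p1, $p2 and $in
8. Calculate the percentage similarity and return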
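The filtering steps above can be traced on a single string. This is only an illustrative sketch; the sample sentence and the intermediate variable names are made up for the demonstration.

```php
<?php
// Sample input chosen for illustration (not from the question).
$question = 'The average, of THE numbers?';

$lower = strtolower($question);
// 'the average, of the numbers?'

$clean = preg_replace('/[^a-zA-Z0-9]/', ' ', $lower);
// 'the average  of the numbers ' (punctuation became spaces)

$parts = explode(' ', $clean);
// 7 elements, including two empty strings left by the extra spaces

$words = array_unique(array_filter($parts));
// array_filter drops the empty strings, array_unique drops the second 'the'

// array_values() only renumbers the keys for display
print_r(array_values($words));
// Array ( [0] => the [1] => average [2] => of [3] => numbers )
```

One thing to be aware of: array_filter without a callback removes every falsy value, so a word consisting of the single character 0 would be dropped as well.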
You then need to set a threshold for how similar two questions must be before they are considered the same, e.g. 80%.
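One possible way to apply such a threshold is a small wrapper like the one below. The function name isSameQuestion, its parameters and the 80.0 default are assumptions for illustration; it simply takes an array of the shape that checkQuestions returns.

```php
<?php
// Hypothetical wrapper: $scores has the shape returned by checkQuestions(),
// i.e. array('Average' => '...', 'Smallest' => '...').
function isSameQuestion(array $scores, $threshold = 80.0, $key = 'Average')
{
    // The scores are formatted strings, so cast before comparing.
    return (float) $scores[$key] >= $threshold;
}

$scores = array('Average' => '93.33', 'Smallest' => '100.00');

var_dump(isSameQuestion($scores));                   // bool(true)
var_dump(isSameQuestion($scores, 95.0));             // bool(false)
var_dump(isSameQuestion($scores, 95.0, 'Smallest')); // bool(true)
```

Which key and which cut-off work best depends on your data, so it is probably worth trying a few values against questions you already know to be duplicates.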
N.B.
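- The function returns an array of two values. The first compares the number of shared words against the average length of the two input questions, the second against the shortest one only. You could modify it to return a single value.
- I used number_format for the percentages... but you would probably be fine returning an int.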
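To make the difference between those return types concrete, here is a small sketch using the word counts of the first pair of questions in the second example below (16 and 14 unique words, 14 of them shared). The variable names $asString, $asFloat and $asInt are only illustrative.

```php
<?php
// Counts taken from the first comparison in the second example:
// 16 and 14 unique words, 14 shared.
$p1 = 16;
$p2 = 14;
$in = 14;

$percent = (100 / (($p1 + $p2) / 2)) * $in;

$asString = number_format($percent, 2); // string "93.33", good for display
$asFloat  = round($percent, 2);         // float, handy for comparisons
$asInt    = (int) round($percent);      // int 93, if whole percentages suffice

var_dump($asString, $asFloat, $asInt);
```

Returning a float or an int avoids relying on PHP's string-to-number conversion when the result is compared against a threshold.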
Examples
Example 1
$question1 = 'The average of 20 numbers is zero. Of them, at the most, how many may be greater than zero?';
$question2 = 'The average of 20 numbers is zero. Of them how many may be greater than zero?';
if(checkQuestions($question1, $question2)['Average'] >= 80){
    echo "Questions are the same...";
}
else{
    echo "Questions are not the same...";
}
//Output: Questions are the same...
Example 2
$para1 = 'The average of 20 numbers is zero. Of them, at the most, how many may be greater than zero?';
$para2 = 'The average of 20 numbers is zero. Of them how many may be greater than zero?';
$para3 = 'The average of 20 numbers is zero. Of them how many may be greater than zero, at the most?';
var_dump(checkQuestions($para1, $para2));
var_dump(checkQuestions($para1, $para3));
var_dump(checkQuestions($para2, $para3));
/**
Output:
array(2) {
  ["Average"]=>
  string(5) "93.33"
  ["Smallest"]=>
  string(6) "100.00"
}
array(2) {
  ["Average"]=>
  string(6) "100.00"
  ["Smallest"]=>
  string(6) "100.00"
}
array(2) {
  ["Average"]=>
  string(5) "93.33"
  ["Smallest"]=>
  string(6) "100.00"
}
*/
Thanks! I'll give this a try now. – PEPLOVE