我正在製作抄襲檢測程序。成品將逐句比較兩個完整的文本文檔;在這一點上,我只是測試我的算法,以便與句子進行比較,並給出一個0到1之間的數字,表示他們的單詞有多相似。我可能會對std :: set作品感到困惑。我的代碼不能正常工作
我會嘗試逐句通過代碼,並告訴你什麼是問題。
指令和一個函數聲明:
#include <iostream>
#include <vector>
#include <string>
#include <algorithm>
#include <math.h>
#include <set>
double sntnc_cmpr_qtnt(const std::vector<std::string>&, const std::vector<std::string>&);
主要取字符串兩個陣列,並將它們放入載體。我知道這似乎沒用,但它只是爲了我的測試目的。我計算兩個字符串向量之間的句子比較商(應該是2個句子)。
int main (int argc, char* const argv[]) {
std::string arr1[] = {"Yo", "dawg", "I", "heard", "you", "like", "functions", "so", "we", "put", "a", "function", "inside"};
std::vector<std::string> str1, str2;
for (int i = 0; i < sizeof(arr1)/sizeof(std::string); ++i)
str1.push_back(arr1[i]);
std::string arr2[] = {"Yo", "dawg", "I", "heard", "you", "like", "cars", "so", "we", "put", "a", "car", "inside"};
for (int i = 0; i < sizeof(arr2)/sizeof(std::string); ++i)
str2.push_back(arr2[i]);
std::cout << sntnc_cmpr_qtnt(str1, str2);
return 0;
}
以下是句子比較商數函數。它計算2個句子之間共同的單詞數量。
雖然有些事情出錯了。我的點數(「cnt」)到了158,顯然太高了。我無法弄清楚爲什麼。
double sntnc_cmpr_qtnt(const std::vector<std::string>& s1, const std::vector<std::string>& s2) {
// Place the words of sentences s1 and s2 each into seperate sets s1_set and s2_set:
std::set<std::string> s1set, s2set;
for (std::vector<std::string>::const_iterator it = s1.begin(); it != s1.end(); ++it)
s1set.insert(*it);
for (std::vector<std::string>::const_iterator it = s2.begin(); it != s2.end(); ++it)
s2set.insert(*it);
/* Compute the proportion of words in common between str1_set and str2_set,
multiped by 1 over 1 minus the squareroot of the size of the smaller set.
This is the sentence comparison quotient that is returned. */
double cnt(0.0);
for (std::set<std::string>::iterator it1 = s1set.begin(); it1 != s1set.end(); ++it1) {
for (std::set<std::string>::iterator it2 = s2set.begin(); it2 != s2set.end(); ++it2) {
if ((*it1).compare(*it2))
cnt += 1.0;
}
}
if (cnt == 0.0) {
return 0.0;
} else {
double minsz = (double)std::min(s1set.size(), s2set.size());
return ((1-1/sqrt(minsz))*cnt/minsz);
}
}
你可能想看看['string :: compare'](http://en.cppreference.com/w/cpp/string/basic_string/compare)實際返回的結果。 – sbabbi
sizeof(std :: string)= 8 ....我敢肯定,這是一個指向字符串的指針的大小 – user2967799
啊,我明白了,sbabbi。哎呦。 – user2967799