Calculating Euclidean distance for KNN

I have seen many examples of calculating Euclidean distance for KNN, but none for sentiment classification.

For example, I have the sentence "a very close game".

How do I calculate its Euclidean distance to the sentence "a great game"?

Source

2017-09-05 xx4xx4

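It is not clear what you mean by the "Euclidean distance" of a sentence. To get any distance you first need to fix an encoding - for example, count vectors, a binary version, or tf-idf vectors. –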
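To make the comment above concrete, here is a minimal sketch of the count-vector option, applied to the two sentences from the question. The helper names (`count_vector`, `euclidean_distance`) and the whitespace tokenization are assumptions made for illustration, not something the thread prescribes.

```python
import math
from collections import Counter


def count_vector(sentence, vocabulary):
    """Return the word counts of `sentence`, ordered by `vocabulary`."""
    counts = Counter(sentence.lower().split())
    return [counts[word] for word in vocabulary]


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


s1 = "a very close game"
s2 = "a great game"

# Shared vocabulary: every distinct word from both sentences, sorted.
vocabulary = sorted(set(s1.lower().split()) | set(s2.lower().split()))

v1 = count_vector(s1, vocabulary)
v2 = count_vector(s2, vocabulary)

print(vocabulary)                  # ['a', 'close', 'game', 'great', 'very']
print(v1)                          # [1, 1, 1, 0, 1]
print(v2)                          # [1, 0, 1, 1, 0]
print(euclidean_distance(v1, v2))  # sqrt(3), about 1.732
```

The two vectors differ in three positions ("close", "great", "very"), so the distance is the square root of 3. A different encoding, such as tf-idf, would generally give a different number for the same pair.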

Suppose you have training data like this [link](https://i.stack.imgur.com/PrqAF.png) and you have to use KNN to classify the sentence "a very close game"... something like that – xx4xx4
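The classification step described in this comment could look roughly like the sketch below. The training sentences and labels are invented for illustration (the contents of the linked image are not reproduced in this thread), and count vectors with a naive whitespace tokenizer are only one possible encoding.

```python
import math
from collections import Counter

# HYPOTHETICAL training set: these sentences and labels are made up for
# illustration and are NOT taken from the image linked in the comment.
TRAINING = [
    ("a great game", "sports"),
    ("the election was over", "not sports"),
    ("very clean match", "sports"),
    ("a clean but forgettable game", "sports"),
    ("it was a close election", "not sports"),
]


def tokenize(sentence):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return sentence.lower().split()


def vectorize(sentence, vocabulary):
    """Count vector of `sentence`, ordered by `vocabulary`."""
    counts = Counter(tokenize(sentence))
    return [counts[word] for word in vocabulary]


def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn_classify(query, training, k=3):
    """Label `query` by majority vote among its k nearest training sentences."""
    vocabulary = sorted(
        {word for sentence, _ in training for word in tokenize(sentence)}
        | set(tokenize(query))
    )
    query_vector = vectorize(query, vocabulary)
    # Sort training items by distance to the query and keep the k closest.
    neighbours = sorted(
        training,
        key=lambda item: euclidean_distance(
            query_vector, vectorize(item[0], vocabulary)
        ),
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


print(knn_classify("a very close game", TRAINING, k=3))
```

With this made-up data the query is labelled "sports": its nearest neighbour is "a great game" (distance sqrt(3)), and three other sentences tie at sqrt(5), at most one of which is "not sports". Ties like this are common with short sentences, so the choice of k and of the encoding matters a lot in practice.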

The data consists of sentence strings. As I mentioned earlier, there are many ways to vectorize them. –

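Think of a sentence as a point in a multidimensional space: only after you define a coordinate system can you compute a Euclidean distance. For example, you could introduce:

O1 - sentence length (Length)
O2 - word count (WordsCount)
O3 - alphabetical center (I just thought of this one). It can be computed as the arithmetic mean of the alphabetical centers of the words in the sentence.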

CharsIndex = Sum(Char.indexInWord) / CharsCountInWord;
CharsCode = Sum(Char.charCode) / CharsCount;
AlphWordCoordinate = [CharsIndex, CharsCode];
WordsIndex = Sum(Words.CharsIndex) / WordsCount;
WordsCode = Sum(Words.CharsCode) / WordsCount;
AlphaSentenceCoordinate = (WordsIndex^2 + WordsCode^2 + WordIndexInSentence^2)^(1/2);

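So a Euclidean distance (here, the distance of the sentence's point from the origin) can be computed as follows:

EuclidianSentenceDistance = (WordsCount^2 + Length^2 + AlphaSentenceCoordinate^2)^(1/2)

Now every sentence can be converted to a point in three-dimensional space, such as P[Length, Words, AlphaCoordinate]. With these distances you can compare and classify sentences.

I don't think this is an ideal approach, but I wanted to show you the idea.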

import math


def calc_word_alpha_center(word):
    """Return the (mean character index, mean character code) of a word."""
    chars_index = 0
    chars_codes = 0
    for index, char in enumerate(word):
        chars_index += index
        chars_codes += ord(char)
    chars_count = len(word)
    # Floor division (//) reproduces the integer division behind the console
    # output below (apparently from Python 2); use / for a true mean.
    return (chars_index // chars_count, chars_codes // chars_count)


def calc_alpha_distance(words):
    """Combine the per-word centers into one alphabetical coordinate."""
    word_chars_index = 0
    word_code = 0
    word_index = 0
    for index, word in enumerate(words):
        point = calc_word_alpha_center(word)
        word_chars_index += point[0]
        word_code += point[1]
        word_index += index
    words_count = len(words)
    chars_index = word_chars_index // words_count
    code = word_code // words_count
    index = word_index // words_count
    return math.sqrt(chars_index ** 2 + code ** 2 + index ** 2)


def calc_sentence_euclidean_distance(sentence):
    """Distance from the origin of the point (length, word count, alpha)."""
    length = len(sentence)
    words = sentence.split(" ")
    words_count = len(words)
    alpha_distance = calc_alpha_distance(words)
    return math.sqrt(length ** 2 + words_count ** 2 + alpha_distance ** 2)


sentence1 = "a great game"
sentence2 = "A great game"

distance1 = calc_sentence_euclidean_distance(sentence1)
distance2 = calc_sentence_euclidean_distance(sentence2)

print(sentence1)
print(distance1)

print(sentence2)
print(distance2)

Console output

a great game 
101.764433866 
A great game 
91.8477000256
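The numbers above are each sentence's distance from the origin. To compare two sentences, one would instead take the distance between their feature points P[Length, Words, AlphaCoordinate]. The following is a hedged sketch of that step: the function names `word_center`, `sentence_point` and `point_distance` are hypothetical, true division is used for the means (so the coordinates differ slightly from the output above), and sentences are assumed to be non-empty.

```python
import math


def word_center(word):
    """Mean character index and mean character code of a non-empty word."""
    n = len(word)
    mean_index = sum(range(n)) / n
    mean_code = sum(ord(char) for char in word) / n
    return mean_index, mean_code


def sentence_point(sentence):
    """Map a non-empty sentence to the point (length, word count, alpha)."""
    words = sentence.split()
    centers = [word_center(word) for word in words]
    n = len(words)
    words_index = sum(center[0] for center in centers) / n
    words_code = sum(center[1] for center in centers) / n
    mean_position = sum(range(n)) / n  # mean index of a word in the sentence
    alpha = math.sqrt(words_index ** 2 + words_code ** 2 + mean_position ** 2)
    return (len(sentence), n, alpha)


def point_distance(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


p1 = sentence_point("a very close game")
p2 = sentence_point("a great game")

print(p1)
print(p2)
print(point_distance(p1, p2))
```

Because the three coordinates live on very different scales (character codes are around 100, word counts are single digits), the largest-scale feature tends to dominate the distance. Rescaling the features before comparing would be a natural refinement.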

Source

2017-09-05 17:44:29 slesh

I'm confused... could you try the calculation with the example I have? For instance, something like this link: https://stackoverflow.com/questions/17053459/how-to-transform-a-text-to-vector – xx4xx4

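I have added a code example. You can play with it and try to implement a better-quality function, because right now, as you can see, the function is sensitive to small changes such as letter case. – slesh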

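I have read the code, but I think it is different from what I want to do... Suppose: training sentence: "a great game"; unlabeled sentence: "a very close game". I want to calculate the Euclidean distance between the two sentences. From what I've read, I should convert each sentence into a binary vector, as in the link in my previous comment... – xx4xx4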
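A sketch of the binary encoding described in this last comment, under the assumption that each vector position marks whether a vocabulary word is present (1) or absent (0). The helper name `binary_vector` is hypothetical. The sketch also shows a shortcut: for binary vectors, the squared Euclidean distance equals the number of words that occur in exactly one of the two sentences.

```python
import math


def binary_vector(sentence, vocabulary):
    """1 if the vocabulary word occurs in the sentence, else 0."""
    words = set(sentence.lower().split())
    return [1 if word in words else 0 for word in vocabulary]


training = "a great game"
unlabeled = "a very close game"

training_words = set(training.lower().split())
unlabeled_words = set(unlabeled.lower().split())
vocabulary = sorted(training_words | unlabeled_words)

t = binary_vector(training, vocabulary)
u = binary_vector(unlabeled, vocabulary)
distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(t, u)))

# Words that appear in exactly one of the two sentences.
only_in_one = training_words ^ unlabeled_words

print(t)                             # [1, 0, 1, 1, 0]
print(u)                             # [1, 1, 1, 0, 1]
print(sorted(only_in_one))           # ['close', 'great', 'very']
print(distance)                      # sqrt(3), about 1.732
print(math.sqrt(len(only_in_one)))   # same value via the set shortcut
```

No word repeats within either sentence, so a count encoding would produce the same vectors here. The two encodings only diverge once a word occurs more than once in a sentence.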
