2017-07-11

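Text document clustering in R based on PSO and K-means

I am new to particle swarm optimization (PSO). I have read research papers on clustering based on PSO and K-means, but I could not find a working example. Any kind of help is much appreciated. Thanks in advance!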

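I want to cluster text documents in R using PSO and K-means. The basic idea is that PSO first gives me optimized values for the cluster centroids, and I then use those PSO-optimized centroids as the initial cluster centroids for k-means to obtain the document clusters.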
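To make the second stage of that idea concrete, here is a minimal sketch of seeding k-means with externally supplied centroids. It uses only `kmeans()` from base R; the names `dtm_toy` and `pso_centroids` are hypothetical, and the centroid values are a stand-in for whatever PSO would return.

```r
# Toy document-term matrix: 4 documents, 7 terms (illustrative values only).
dtm_toy <- matrix(rep_len(seq(1, 20, 1), 28), nrow = 4, ncol = 7, byrow = TRUE)

# Hypothetical stand-in for the PSO result: one row per cluster centroid.
pso_centroids <- dtm_toy[1:2, ]

# Passing a matrix as `centers` makes kmeans() start from these centroids
# instead of drawing random starting points.
fit <- kmeans(dtm_toy, centers = pso_centroids)

fit$cluster   # cluster label of each document
fit$centers   # final centroids after k-means refinement
```

The number of rows in `centers` fixes the number of clusters, so the PSO stage has to return exactly k centroids with one column per term.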

Below is the code showing what I have done so far:

#Import library 
library(pdist) 
library(hydroPSO) 

# Create a matrix and suppose it is the document-term matrix obtained after
# cleaning the corpus.

(In my actual data I have 20 documents and 951 terms, i.e. dim(DTM) = 20 * 951.)

# The 20 values are recycled to fill the 4 x 7 = 28 cells.
matri <- matrix(data = rep_len(seq(1, 20, 1), 28), nrow = 4, ncol = 7, byrow = TRUE)
matri
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    1    2    3    4    5    6    7
[2,]    8    9   10   11   12   13   14
[3,]   15   16   17   18   19   20    1
[4,]    2    3    4    5    6    7    8

#Initially select first and second row as centroids 
cj <- matri[1:2,] 

#Calculate Euclidean Distance of each data point from centroids 
vm <- as.data.frame(t(as.matrix(pdist(matri, cj)))) 
vm 
        V1       V2       V3        V4
1  0.00000 18.52026 34.81379  2.645751
2 18.52026  0.00000 21.51744 15.874508

# Create binary matrix S: S[i, j] is 1 if instance j is allocated to cluster i, otherwise 0.
S <- matrix(data = 0, nrow = nrow(vm), ncol = ncol(vm))

for (j in seq_len(ncol(vm))) {
  # Row index of the nearest centroid for instance j
  S[which.min(vm[, j]), j] <- 1
}

S 
     [,1] [,2] [,3] [,4]
[1,]    1    0    0    1
[2,]    0    1    1    0

# Apply `hydroPSO()` to get optimised values of the centroids.
set.seed(5486)
D <- 4 # Dimension
lower <- rep(0, D)
upper <- rep(10, D)
m_s <- matrix(data = NA, nrow = nrow(S), ncol = ncol(matri))

# Objective function to be minimised
Fn <- function(y) {
  # Mean vector of each cluster, using y as the membership matrix
  for (k in 1:nrow(y)) {
    m_s[k, ] <- colSums(matri[y[k, ] == 1, ]) / sum(y[k, ])
  }

  sm <- sum(m_s) / nrow(S)
  return(sm)
}

hh1 <- hydroPSO(S, fn = Fn, lower = lower, upper = upper,
                control = list(write2disk = FALSE, npart = 3))

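But the hydroPSO() call above does not work. It fails with the error "Error in 1:nrow(y) : argument of length 0". I searched for this error but did not find a solution that worked for me.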
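A likely cause, assuming hydroPSO() hands the objective function a plain numeric vector (as optim-style optimisers do) instead of the matrix that was passed in: a vector has no dim attribute, so nrow() returns NULL and 1:NULL fails. The sketch below reproduces this in base R only; `y` and `y_mat` are illustrative names.

```r
S <- matrix(c(1, 0, 0, 1,
              0, 1, 1, 0), nrow = 2, byrow = TRUE)

# Assumption: this is roughly what the objective function receives.
y <- as.vector(S)

nrow(y)   # NULL, because a plain vector has no dimensions

# Capture the error instead of stopping the script.
err <- tryCatch(1:nrow(y), error = function(e) e)
conditionMessage(err)

# Restoring the shape inside the objective function avoids the failure.
y_mat <- matrix(y, nrow = 2)
nrow(y_mat)
```

as.vector() flattens column by column and matrix() refills column by column, so `y_mat` has the same layout as `S`.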

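I also made some changes to my objective function, and this time hydroPSO() ran, but I suspect the result is not correct. I pass my initial centroid matrix, of dimension 2 * 7, as the parameter, but the function returns only 1 * 7 optimized values. I do not understand why.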

set.seed(5486)
D <- 7 # Dimension
lower <- rep(0, D)
upper <- rep(10, D)

Fn <- function(x) {
  # Euclidean distance of each data point from the centroid(s) in x
  vm <- as.data.frame(t(as.matrix(pdist(matri, x))))

  # Binary membership matrix: 1 marks the nearest centroid of each instance
  S <- matrix(data = 0, nrow = nrow(vm), ncol = ncol(vm))
  for (j in seq_len(ncol(vm))) {
    S[which.min(vm[, j]), j] <- 1
  }

  # Mean vector of each cluster (drop = FALSE keeps single-row clusters as matrices)
  m_s <- matrix(data = NA, nrow = nrow(S), ncol = ncol(matri))
  for (k in seq_len(nrow(S))) {
    m_s[k, ] <- colSums(matri[S[k, ] == 1, , drop = FALSE]) / sum(S[k, ])
  }

  sm <- sum(m_s) / nrow(S)
  return(sm)
}

hh1 <- hydroPSO(cj, fn = Fn, lower = lower, upper = upper,
                control = list(write2disk = FALSE, npart = 2, K = 2))

Output of the call above:

## $par 
##    Param1    Param2    Param3    Param4    Param5    Param6    Param7
## 8.6996174 2.1952303 5.6903588 0.4471795 3.7103161 1.6605425 8.2717574
## 
## $value 
## [1] 61.5 
## 
## $best.particle 
## [1] 1 
## 
## $counts 
## function.calls     iterations    regroupings
##           2000           1000              0
## 
## $convergence 
## [1] 3 
## 
## $message 
## [1] "Maximum number of iterations reached" 
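One possible reading of this output, assuming the search space is sized by `lower` and `upper` (D = 7 here) and the objective receives a single vector of that length: the vector is treated as one centroid, every document lands in the same cluster, and the objective reduces to the sum of the column means of matri. That would explain both the 7 returned parameters and a value of 61.5 that does not depend on the parameters. The names `S_one`, `objective` and `D_needed` below are illustrative.

```r
matri <- matrix(rep_len(seq(1, 20, 1), 28), nrow = 4, ncol = 7, byrow = TRUE)

# With one centroid, the membership matrix is a single row of ones.
S_one <- matrix(1, nrow = 1, ncol = nrow(matri))

# Same arithmetic as in the objective function, for that single cluster.
m_s <- colSums(matri[S_one[1, ] == 1, , drop = FALSE]) / sum(S_one[1, ])
objective <- sum(m_s) / nrow(S_one)
objective   # 61.5, whatever the centroid values are

# Two centroids with 7 terms each need a 14-dimensional search space.
k <- 2
D_needed <- k * ncol(matri)
D_needed    # 14
```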

I think I am passing the arguments to hydroPSO() in the wrong way. Please point out where I am going wrong.

Thank you very much!

Answer

Instead of passing cj to hydroPSO(), I used as.vector(t(cj)) in my second approach, and it worked fine for me. I got 14 optimized values.
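The answer does not show the rest of the setup, so the following is only a sketch of how the flattened vector might be used. The objective `wcss` (sum of squared distances to the nearest centroid), the bounds taken from the data range, and the `npart` and `maxit` values are assumptions for illustration, not part of the original post. The hydroPSO() call is guarded so the rest runs without the package installed.

```r
matri <- matrix(rep_len(seq(1, 20, 1), 28), nrow = 4, ncol = 7, byrow = TRUE)
k  <- 2
cj <- matri[1:k, ]

# Flatten row by row: centroid 1 first, then centroid 2 (14 values).
par0 <- as.vector(t(cj))

# Hypothetical objective: rebuild the k x p centroid matrix, then sum each
# document's squared distance to its nearest centroid.
wcss <- function(x) {
  centroids <- matrix(x, nrow = k, byrow = TRUE)
  d2 <- sapply(seq_len(k), function(i) {
    rowSums(sweep(matri, 2, centroids[i, ])^2)
  })
  sum(apply(d2, 1, min))
}

wcss(par0)   # objective value at the starting centroids

# One bound per flattened value; using the data range is an assumed choice.
lower <- rep(min(matri), length(par0))
upper <- rep(max(matri), length(par0))

if (requireNamespace("hydroPSO", quietly = TRUE)) {
  set.seed(5486)
  res <- hydroPSO::hydroPSO(par = par0, fn = wcss, lower = lower, upper = upper,
                            control = list(write2disk = FALSE, npart = 10,
                                           maxit = 200))
  # Reshape the 14 optimized values back into a 2 x 7 centroid matrix.
  pso_centroids <- matrix(res$par, nrow = k, byrow = TRUE)
}
```

The byrow = TRUE reshape mirrors as.vector(t(cj)), so row i of the rebuilt matrix is centroid i. The resulting matrix can then be given to kmeans() through its `centers` argument.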