2012-08-28

Answer

Looking further into the WEKA source, this is the method that does the normalization:

private void normalizeInstance(Instance inst, int firstCopy) throws Exception
{
    double docLength = 0;

    if (m_AvgDocLength < 0)
    {
        throw new Exception("Average document length not set.");
    }

    // Compute length of document vector
    for (int j = 0; j < inst.numValues(); j++)
    {
        if (inst.index(j) >= firstCopy)
        {
            docLength += inst.valueSparse(j) * inst.valueSparse(j);
        }
    }
    docLength = Math.sqrt(docLength);

    // Normalize document vector
    for (int j = 0; j < inst.numValues(); j++)
    {
        if (inst.index(j) >= firstCopy)
        {
            double val = inst.valueSparse(j) * m_AvgDocLength / docLength;
            inst.setValueSparse(j, val);
            if (val == 0)
            {
                System.err.println("setting value " + inst.index(j) + " to zero.");
                j--;
            }
        }
    }
}

The most relevant part appears to be:

double val = inst.valueSparse(j) * m_AvgDocLength/docLength; 
inst.setValueSparse(j, val); 

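So the normalization appears to be value = currentValue * averageDocumentLength / actualDocumentLength. Note that the "actual document length" here (docLength) is not a word count: the first loop computes it as the Euclidean (L2) norm of the document's vector, i.e. the square root of the sum of the squared values.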
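
To make the formula concrete, here is a minimal sketch that applies the same scaling to a plain double[] instead of a WEKA Instance. The class name DocLengthNormalization and the method normalize are made up for illustration and are not part of WEKA, and the guard for an all-zero vector is my own addition (the WEKA method above has no such check).

```java
import java.util.Arrays;

// Hypothetical helper, not part of WEKA: mirrors the formula
// value * avgDocLength / docLength from normalizeInstance.
final class DocLengthNormalization {

    static double[] normalize(double[] values, double avgDocLength) {
        // "Document length" is the Euclidean (L2) norm of the vector.
        double docLength = 0;
        for (double v : values) {
            docLength += v * v;
        }
        docLength = Math.sqrt(docLength);

        double[] result = Arrays.copyOf(values, values.length);
        if (docLength == 0) {
            // Assumption added for this sketch: leave an all-zero vector
            // unchanged instead of dividing by zero.
            return result;
        }
        for (int j = 0; j < result.length; j++) {
            result[j] = result[j] * avgDocLength / docLength;
        }
        return result;
    }

    public static void main(String[] args) {
        // The vector (3, 4) has L2 norm 5; with an average length of 10
        // every value is scaled by 10 / 5 = 2.
        double[] out = normalize(new double[] {3.0, 4.0}, 10.0);
        System.out.println(Arrays.toString(out)); // prints [6.0, 8.0]
    }
}
```

After this scaling, every non-zero document vector ends up with an L2 norm equal to the average document length, which is why long and short documents become comparable.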