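UPDATE: Calculate vector length based on the str value of a specific column in Python

I am trying to measure the length of a vector based on the values in the first column of my input data. For example, my input data looks like this: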

dog nmod+n+-n 4 
dog nmod+n+n-a-commitment-n 6 
child into+ns-j+vn-pass-rb-divide-v 3 
child nmod+n+ns-commitment-n 5 
child nmod+n+n-pledge-n 3 
hello nmod+n+ns 2

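I want to compute a value over the rows that share the same value in the first column. For example, I will compute one value from all rows with dog in the first column, then one value from all rows with child in the first column, and so on.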

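I have already worked out the math for computing the vector length (Euclidean norm). However, I am not sure how to perform the calculation by grouping identical values in the first column.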
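For reference, the Euclidean norm mentioned here is the square root of the sum of the squared weights. A minimal sketch, where `euclidean_norm` is a hypothetical helper name that does not appear in the original code:

```python
import math


def euclidean_norm(weights):
    """Return the Euclidean (L2) norm of an iterable of numbers."""
    return math.sqrt(sum(w * w for w in weights))


# The two "dog" rows in the sample data carry the weights 4 and 6.
print(euclidean_norm([4, 6]))
```

The helper only covers the arithmetic; grouping the rows by their first column is a separate step.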

Here is the code I have written so far:

#!/usr/bin/python
import os
import sys
import getopt
import datetime
import math

print("starting:", datetime.datetime.now())


def countVectorLength(infile, outfile):
    # Note: outfile is accepted but never written to; results only go to stdout.
    with open(infile, 'r') as inputfile:
        # Seed the first group from the first line of the file.
        flem, _, fw = next(inputfile).split()
        current_lem = flem
        weights = [float(fw)]
        for line in inputfile:
            lem, _, w = line.split()
            if lem == current_lem:
                weights.append(float(w))
            else:
                # The lemma changed: report the finished group, then start a new one.
                print(current_lem, math.sqrt(sum(weight ** 2 for weight in weights)))

                current_lem = lem
                weights = [float(w)]

        # The last group is never followed by a lemma change, so it is printed by hand.
        print(current_lem, math.sqrt(sum(weight ** 2 for weight in weights)))

    print("Finish:", datetime.datetime.now())


path = '/Path/to/Input/'
pathout = '/Path/to/Output'
listing = os.listdir(path)
for infile in listing:
    outfile = 'output' + infile
    print("current file is:" + infile)

    countVectorLength(os.path.join(path, infile), os.path.join(pathout, outfile))

This code outputs the vector length for each individual lemma. The data above gives me the following output:

dog 7.211102550927978 
child 6.48074069840786 
hello 2

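UPDATE: I have kept working on this and managed to get working code; the code sample above has been updated. However, as you can see, the code has a problem with the output for the last line of each file, which I have more or less solved by adding it manually. Because of this problem, I cannot iterate cleanly over the directory: the results for all files end up appended to a single document via >. Is there a cleaner, more Pythonic way to write the results directly to a separate corresponding file in the outpath directory?

Source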
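One possible shape for this, sketched under assumptions rather than offered as the definitive fix: `itertools.groupby` closes the final group on its own, so no manual handling of the last line is needed, and each input file gets its own output file. The function names `vector_lengths`, `write_vector_lengths`, and `process_directory` are hypothetical, and the sketch assumes that rows for the same lemma are adjacent and that every non-blank line has at least three whitespace-separated fields.

```python
import math
import os
from itertools import groupby


def vector_lengths(lines):
    """Yield (lemma, Euclidean norm) for each run of consecutive lines sharing a lemma."""
    rows = (line.split() for line in lines if line.strip())
    for lemma, group in groupby(rows, key=lambda row: row[0]):
        yield lemma, math.sqrt(sum(float(row[2]) ** 2 for row in group))


def write_vector_lengths(infile, outfile):
    """Write one 'lemma length' line per group to its own output file."""
    with open(infile) as src, open(outfile, "w") as dst:
        for lemma, length in vector_lengths(src):
            dst.write("{} {}\n".format(lemma, length))


def process_directory(path, pathout):
    """Hypothetical driver: produce one output file per input file."""
    for name in os.listdir(path):
        write_vector_lengths(os.path.join(path, name),
                             os.path.join(pathout, "output" + name))
```

Calling `process_directory('/Path/to/Input/', '/Path/to/Output')` would then mirror the loop in the question. Note that `os.listdir` also returns subdirectory names, which this sketch does not filter out.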

2013-12-11 owwoow14

The first thing you need to do is transform the input into something like

dog => [4,2] 
child => [3,5,3] 
etc

Like this:

from collections import defaultdict
data = defaultdict(list)
for line in file:  # 'file' stands in for your own opened input file
    line = line.split('\t')
    data[line[0]].append(line[2])  # values are still strings at this point

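Once that is done, the rest is obvious:

import math

def vector_len(vec):
    # you already got that: the Euclidean norm from the question
    return math.sqrt(sum(float(v) ** 2 for v in vec))

vector_lens = {name: vector_len(values) for name, values in data.items()}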
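To make the idea concrete, here is a self-contained sketch of the same dictionary-based grouping. It is an illustration under assumptions, not the answerer's exact code: `group_weights` is a hypothetical helper, the input is an in-memory list instead of a file, and fields are split on any whitespace rather than on tabs only.

```python
import math
from collections import defaultdict


def group_weights(lines):
    """Map each first-column value to the list of its third-column weights."""
    data = defaultdict(list)
    for line in lines:
        fields = line.split()  # any whitespace, not only tabs
        if len(fields) < 3:
            continue  # skip blank or short lines
        data[fields[0]].append(float(fields[2]))
    return data


def vector_len(vec):
    """Euclidean norm of a list of numbers."""
    return math.sqrt(sum(v * v for v in vec))


sample = [
    "dog nmod+n+-n 4",
    "dog nmod+n+n-a-commitment-n 6",
    "hello nmod+n+ns 2",
]
vector_lens = {name: vector_len(values)
               for name, values in group_weights(sample).items()}
print(vector_lens)
```

Because the weights are collected per key, this variant does not depend on rows for the same lemma being adjacent, at the cost of holding all weights in memory.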

Source

2013-12-11 17:12:03 georg

This gives a traceback error: "TypeError: 'type' object is not iterable" - see updated code – owwoow14

@owwoow14: What I posted is not working code - I deliberately left out some details that you should figure out yourself. – georg
