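Variable uninitialized for measuring CosineSimilarity in Text::Document

I am using Text::Document to compute the cosine similarity between two documents. When I try to print the variable that holds the cosine similarity score ($sim), I get the warning: "Use of uninitialized value $sim in concatenation (.) or string at ...". As far as I can tell, I assign this variable on the line immediately above the print statement. Admittedly, this is my first foray into the Text::Document module, so my object construction may well be wrong, ugly, or otherwise problematic. Any ideas what is going wrong with the variable initialization?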
use strict;
use warnings;
use autodie;
use Text::Document;

### BEGIN BY READING IN EACH FILE ONE BY ONE. ###
################## LOOP BEGIN ##################
# Process every plain file in each subdirectory of the parent directory
my $parent = "D:/Cleaned 10Ks";
my ($par_dir, $sub_dir);
opendir($par_dir, $parent);
while (my $sub_folders = readdir($par_dir)) {
    next if ($sub_folders =~ /^\.\.?$/);  # skip . and ..
    my $path = $parent . '/' . $sub_folders;
    next unless (-d $path);  # skip anything that isn't a directory
    chdir($path) or die "Can't chdir to $path: $!";
    for my $filename (grep -f, glob('*')) {
        open my ($fh), '<', $filename;
        my $data1 = do { local $/; <$fh> };
        my $data2 = Text::Document->new(file=>'$data1');
        my $data3 = $data2->WriteToString();
        my $data4 = Text::Document::NewFromString($data3);
        my ($comp_id, $year, $rest) = split '-', $filename, 3;
        # Two-digit previous year, wrapping 00 back to 99
        my $prev_year = ($year ne '00') ? sprintf('%02d', $year - 1) : '99';
        my $prev_year_base = join '-', $comp_id, $prev_year;
        my ($prev_year_file) = glob "$prev_year_base*";
        next unless defined $prev_year_file;  # no prior-year file to compare against
        open my ($fh_prior), '<', $prev_year_file;
        my $data1_prior = do { local $/; <$fh_prior> };
        my $data2_prior = Text::Document->new(file=>'$data1_prior');
        my $data3_prior = $data2_prior->WriteToString();
        my $data4_prior = Text::Document::NewFromString($data3_prior);
        my $sim = $data4->CosineSimilarity($data4_prior);
        print "The cosine similarity score is $sim\n";
    }
}
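
A possible explanation, offered as a sketch rather than a definitive diagnosis: as far as the Text::Document documentation shows, `new` accepts options such as `lowercase` and `compressed` but no `file` option, and text is loaded with `AddContent`. The single-quoted `'$data1'` is also the literal string `$data1`, not the file contents. If both documents therefore end up with no terms, a cosine similarity has a zero norm in its denominator and cannot be computed, which would explain an undefined `$sim`. The standalone Perl below illustrates that effect without using the module. `term_counts` and `cosine_similarity` are hypothetical helpers written for this illustration, not part of Text::Document.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::Document):
# count how often each lowercased word occurs in a string.
sub term_counts {
    my ($text) = @_;
    my %count;
    $count{lc $_}++ for $text =~ /\w+/g;
    return \%count;
}

# Hypothetical helper: cosine similarity of two term-count hashes.
# Returns undef when either document has no terms, because the
# vector norm in the denominator would be zero.
sub cosine_similarity {
    my ($u, $v) = @_;
    my ($dot, $norm_u, $norm_v) = (0, 0, 0);
    $norm_u += $_ ** 2 for values %$u;
    $norm_v += $_ ** 2 for values %$v;
    return undef unless $norm_u && $norm_v;
    for my $term (keys %$u) {
        $dot += $u->{$term} * $v->{$term} if exists $v->{$term};
    }
    return $dot / sqrt($norm_u * $norm_v);
}

my $this_year = term_counts('risk factors and revenue growth');
my $last_year = term_counts('risk factors and revenue decline');
my $empty     = term_counts('');   # stands in for a document with no content

for my $pair ([$this_year, $last_year], [$this_year, $empty]) {
    my $sim = cosine_similarity(@$pair);
    # Check definedness before interpolating to avoid the warning
    print defined $sim
        ? sprintf("The cosine similarity score is %.3f\n", $sim)
        : "Similarity is undefined (one document has no terms)\n";
}
```

In the script above, the corresponding change would presumably be to create each document with `Text::Document->new()` and then call `AddContent($data1)` on it. That would also make the `WriteToString` / `NewFromString` round trip unnecessary. Guarding the print with `defined $sim` keeps the loop quiet for any file that really is empty.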
Your answer helps clarify the syntax of this module. Thanks! – Rick