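Vowpal Wabbit varinfo and ngrams: non-existent combinations

I'm trying to use vw to find words or phrases that predict whether someone will open an email. The target is 1 if they opened the email and 0 otherwise. My data looks like this: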
1 |A this is a test
0 |A this test is only temporary
1 |A i bought a new polo shirt
1 |A that was a great online sale
I saved it to a file named test1.txt and ran the following command to generate 2-grams and also output the variable information:
C:\~\vw>perl vw-varinfo.pl -V --ngram 2 test1.txt >> out.txt
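When I look at the output, it contains bigrams that I don't see anywhere in the original data. Is this a bug, or am I misunderstanding something?
Output: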
FeatureName HashVal MinVal MaxVal Weight RelScore
A^a 239656 0.00 1.00 +0.1664 100.00%
A^is 7514 0.00 1.00 +0.0772 46.38%
A^test 12331 0.00 1.00 +0.0772 46.38%
A^this 169573 0.00 1.00 +0.0772 46.38%
A^bought 245782 0.00 1.00 +0.0650 39.06%
A^i 245469 0.00 1.00 +0.0650 39.06%
A^new 51974 0.00 1.00 +0.0650 39.06%
A^polo 48680 0.00 1.00 +0.0650 39.06%
A^shirt 73882 0.00 1.00 +0.0650 39.06%
A^great 220692 0.00 1.00 +0.0610 36.64%
A^online 147727 0.00 1.00 +0.0610 36.64%
A^sale 242707 0.00 1.00 +0.0610 36.64%
A^that 206586 0.00 1.00 +0.0610 36.64%
A^was 223274 0.00 1.00 +0.0610 36.64%
A^a^bought 216990 0.00 0.00 +0.0000 0.00%
A^bought^great 7122 0.00 0.00 +0.0000 0.00%
A^great^i 190625 0.00 0.00 +0.0000 0.00%
A^i^is 76227 0.00 0.00 +0.0000 0.00%
A^is^new 140536 0.00 0.00 +0.0000 0.00%
A^new^online 69117 0.00 0.00 +0.0000 0.00%
A^online^only 173498 0.00 0.00 +0.0000 0.00%
A^only^polo 51059 0.00 0.00 +0.0000 0.00%
A^polo^sale 131483 0.00 0.00 +0.0000 0.00%
A^sale^shirt 191329 0.00 0.00 +0.0000 0.00%
A^shirt^temporary 81555 0.00 0.00 +0.0000 0.00%
A^temporary^test 90632 0.00 0.00 +0.0000 0.00%
A^test^that 13689 0.00 0.00 +0.0000 0.00%
A^that^this 127863 0.00 0.00 +0.0000 0.00%
A^this^was 22011 0.00 0.00 +0.0000 0.00%
Constant 116060 0.00 0.00 +0.1465 0.00%
A^only 62951 0.00 1.00 -0.0490 -29.47%
A^temporary 44641 0.00 1.00 -0.0490 -29.47%
For example, A^bought^great never actually appears in any of the original input lines. Am I doing something wrong?
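To make the mismatch concrete, here is a minimal sketch in plain Python (not vw itself) that lists the adjacent word pairs that really occur in the four example lines. The helper name `adjacent_bigrams` is my own, and the script makes no attempt to reproduce how vw or vw-varinfo builds or names features; it only checks which pairs exist in the raw text.

```python
# Sketch only: enumerate the adjacent word pairs in the example data.
# This does not model vw's hashing or vw-varinfo's feature naming.
lines = [
    "1 |A this is a test",
    "0 |A this test is only temporary",
    "1 |A i bought a new polo shirt",
    "1 |A that was a great online sale",
]

def adjacent_bigrams(example):
    # Split off the label; what follows "|" is the namespace and its tokens.
    _, features = example.split("|", 1)
    tokens = features.split()[1:]  # the first token is the namespace name "A"
    return list(zip(tokens, tokens[1:]))

observed = set()
for line in lines:
    observed.update(adjacent_bigrams(line))

print(("bought", "great") in observed)  # False: never adjacent in the data
print(("bought", "a") in observed)      # True: from "i bought a new polo shirt"
```

None of the zero-weight pairs in the output, such as bought/great, shows up in this set, which is what prompted the question.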
Thanks for the information and the tips! – screechOwl 2014-10-17 11:50:36
It may look minor, but the tip is very valuable! – 2015-02-04 00:21:09