2016-03-23 163 views
1

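Redirect print output to a .txt file in Python

I am a complete beginner in Python. I have tried many of the approaches from Stack Overflow answers to this question, but none of them worked in my script.
I have this small script that works, but I can't get its huge result into a .txt file so that I can analyze the data. How do I redirect the print output to a .txt file on my computer?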

from nltk.util import ngrams 
import collections 

with open("text.txt", "rU") as f: 
    sixgrams = ngrams(f.read().decode('utf8').split(), 2) 

result = collections.Counter(sixgrams) 
print result 
for item, count in sorted(result.iteritems()):
    if count >= 2:
        print " ".join(item).encode('utf8'), count
+2

If you are a complete beginner in Python, and especially since it seems you are doing NLP, I suggest you switch to Python 3 entirely!

Answers

4

The print statement in Python 2.x supports redirection (>> fileobj):

... 
with open('output.txt', 'w') as f: 
    print >>f, result 
    for item, count in sorted(result.iteritems()):
        if count >= 2:
            print >>f, " ".join(item).encode('utf8'), count

In Python 3.x, the print function accepts an optional keyword argument file:

print("....", file=f) 

If you use from __future__ import print_function in Python 2.6+, the approach above also works in Python 2.x.
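As a rough Python 3 sketch of the file= approach applied to the question's loop: the word list, the zip-based bigrams (used so the sketch runs without NLTK) and the temporary output path are all assumptions for illustration, not part of the original script.

```python
import collections
import os
import tempfile

# Stand-in for the question's text; the real script reads text.txt instead.
words = "a b a b c a b".split()

# Bigrams built with zip so the sketch does not need NLTK (an assumption).
result = collections.Counter(zip(words, words[1:]))

# Hypothetical output location inside a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "output.txt")

with open(out_path, "w", encoding="utf-8") as f:
    for item, count in sorted(result.items()):
        if count >= 2:
            # file=f sends the line to the file instead of the console.
            print(" ".join(item), count, file=f)

# Read the file back to see what was written.
with open(out_path, encoding="utf-8") as f:
    written = f.read()
```

Because the file is opened in text mode with an explicit encoding, the .encode('utf8') call from the Python 2 version is no longer needed.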

5

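Just do this on the command line: python script.py > output.txt (use a name other than the script's input file text.txt, because the shell truncates the target file before the script starts reading).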
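If the redirect should happen inside the script rather than in the shell, Python 3.4+ offers contextlib.redirect_stdout. The sketch below assumes a hypothetical report() function standing in for code that prints, and a temporary output path chosen only for illustration.

```python
import contextlib
import os
import tempfile

def report():
    # Hypothetical stand-in for existing code that calls print().
    print("on Thursday", 3)

# Hypothetical output location inside a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "output.txt")

# Every print() inside the block goes to the file instead of the console.
with open(out_path, "w", encoding="utf-8") as f, contextlib.redirect_stdout(f):
    report()

with open(out_path, encoding="utf-8") as f:
    captured = f.read()
```

This leaves the printing code untouched, which can be handy when the print calls live in code you would rather not edit.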

1

Using a BufferedWriter, you can do it like this:

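import io

# Python 2 code, like the question; result is the Counter built there.
pathOut = "output.txt"  # assumed output path, not given in the original
out = io.BufferedWriter(io.FileIO(pathOut, "wb"))
out.write(str(result) + "\n")
for item, count in sorted(result.iteritems()):
    if count >= 2:
        out.write(" ".join(item).encode('utf8') + " " + str(count) + "\n")

out.flush()
out.close()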
0

As Antti mentioned, you should prefer Python 3 and leave all the annoying Python 2 junk behind you. The following script works in both Python 2 and Python 3.

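To read and write files, use the open function from the io module, which is compatible with both Python 2 and Python 3. Always use the with statement to open resources such as files. with wraps the execution of a block in a Python context manager. File objects implement the context manager protocol and are closed automatically when the with block is left.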
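A small sketch of that closing behaviour, assuming a hypothetical demo.txt in a temporary directory:

```python
import io
import os
import tempfile

# Hypothetical file name, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"hello\n")
    open_inside = not f.closed   # the file is still open inside the block

closed_after = f.closed          # the file is closed once the block is left
```

No explicit flush() or close() call is needed; leaving the block takes care of both, even if an exception is raised inside it.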

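Independent of Python: if you want to read a text file, you need to know the file's encoding to read it correctly (if you are unsure, try utf-8 first). Besides that, the canonical name of the UTF-8 codec is utf-8, and mode U is deprecated.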
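To illustrate why the encoding matters, the sketch below writes a Euro sign as UTF-8 and reads it back twice, once with the matching codec and once with latin-1. The file name is hypothetical, and latin-1 is picked only as an example of a wrong guess.

```python
import io
import os
import tempfile

# Hypothetical file name, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "euro.txt")

with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"100 \u20ac")       # "100 " followed by the Euro sign

with io.open(path, encoding="utf-8") as f:
    right = f.read()             # decoded with the codec used for writing

with io.open(path, encoding="latin-1") as f:
    wrong = f.read()             # each UTF-8 byte becomes its own character
```

The wrong guess does not raise an error here; it silently produces three garbage characters in place of the Euro sign, which is why it pays to state the encoding explicitly.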

#!/usr/bin/env python 
# -*- coding: utf-8; mode: python -*- 

from nltk.util import ngrams 
import collections 
import io, sys 

def main(inFile, outFile):

    with io.open(inFile, encoding="utf-8") as i:
        sixgrams = ngrams(i.read().split(), 2)

    result = collections.Counter(sixgrams)
    templ = "%-10s %s\n"

    with io.open(outFile, "w", encoding="utf-8") as o:

        o.write(templ % (u"count", u"words"))
        o.write(templ % (u"-" * 10, u"-" * 30))

        # Sorting might be expensive. Before sort, filter items you don't want
        # to handle, btw. place *count* in front of the tuple.

        filtered = [(c, w) for w, c in result.items() if c > 1]
        filtered.sort(reverse=True)

        for count, item in filtered:
            o.write(templ % (count, " ".join(item)))

if __name__ == '__main__':
    sys.exit(main("text.txt", "out_text.txt"))

With this input file text.txt:

At eight o'clock on Thursday morning and Arthur didn't feel very good 
he missed 100 € on Thursday morning. The Euro symbol of 100 € is here 
to test the encoding of non ASCII characters, because encoding errors 
do occur only on Thursday morning. 

I get the following out_text.txt:

count      words
---------- ------------------------------
3          on Thursday
2          Thursday morning.
2          100 €
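
The counts above can be cross-checked without NLTK. The sketch below is Python 3 only and builds the bigrams with zip, which is an assumption standing in for ngrams(..., 2); the filter, sort and template are the ones from the script.

```python
import collections

text = """At eight o'clock on Thursday morning and Arthur didn't feel very good
he missed 100 € on Thursday morning. The Euro symbol of 100 € is here
to test the encoding of non ASCII characters, because encoding errors
do occur only on Thursday morning.
"""

words = text.split()

# Pair each word with its successor; a stand-in for ngrams(words, 2).
pairs = collections.Counter(zip(words, words[1:]))

# Filter before sorting, with the count placed first so it drives the order.
filtered = [(c, w) for w, c in pairs.items() if c > 1]
filtered.sort(reverse=True)

templ = "%-10s %s"
lines = [templ % (count, " ".join(item)) for count, item in filtered]
```

Note that "Thursday morning" (first sentence, no period) and "Thursday morning." are counted as different bigrams because split() keeps punctuation attached to the words.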