Python: How do I capture output to a text file? (Only 25 of 530 lines are captured right now)

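I've done a fair amount of lurking on SO and a fair amount of searching and reading, but I must also admit to being a relative noob when it comes to programming. I'm trying to learn, so I've been playing with Python's NLTK. In the script below, I can get everything to work, except that it only writes out what looks like the first screen of a multi-screen output.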

Here's the script:

#! /usr/bin/env python 

import nltk 

# First we have to open and read the file: 

thefile = open('all_no_id.txt') 
raw = thefile.read() 

# Second we have to process it with nltk functions to do what we want 

tokens = nltk.wordpunct_tokenize(raw) 
text = nltk.Text(tokens) 

# Now we can actually do stuff with it: 

concord = text.concordance("cultural") 

# Now to save this to a file 

fileconcord = open('ccord-cultural.txt', 'w') 
fileconcord.writelines(concord) 
fileconcord.close()

And here's the beginning of the output file:

Building index... 
Displaying 25 of 530 matches: 
y .   The Baobab Tree : Stories of Cultural Continuity The continuity evident 
regardless of ethnicity , and the cultural legacy of Africa as well . This Af

What am I missing to get all 530 matches written to the file?


2012-06-15 John Laudun

According to the manual, text.concordance(self, word, width=79, lines=25) takes additional parameters.

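I don't see a way to get the size of the concordance index, but the concordance printing code seems to contain this line: lines = min(lines, len(offsets)). So you can simply pass a very large number, such as sys.maxsize, as the last argument:

import sys
concord = text.concordance("cultural", 75, sys.maxsize)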
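To see why a huge value is safe, here is a minimal sketch of just the clamping step quoted above. print_concordance is a hypothetical stand-in, not NLTK's function, and it works on a plain list of match offsets rather than real text; sys.maxsize is the Python 3 counterpart of Python 2's sys.maxint.

```python
import sys


def print_concordance(offsets, lines=25):
    """Hypothetical stand-in for NLTK's concordance printer.

    Mirrors only the quoted clamping step, lines = min(lines, len(offsets)),
    and returns how many matches were actually printed.
    """
    lines = min(lines, len(offsets))  # never ask for more matches than exist
    print("Displaying %d of %d matches:" % (lines, len(offsets)))
    for offset in offsets[:lines]:
        print("match at token %d" % offset)
    return lines


shown_default = print_concordance(list(range(530)))              # prints 25 matches
shown_all = print_concordance(list(range(530)), sys.maxsize)     # prints all 530
```

Because of the min(), the oversized argument simply collapses to the real number of matches, so nothing breaks when there are fewer hits than requested.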

Addendum:

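Looking at your original code again, I don't see how it could have worked. text.concordance doesn't return anything; it writes everything to stdout with print. So the simple fix is to redirect standard output to your file, like this: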

import sys

# ... read the file and build `text` exactly as in the question ...

# Open the output file
fileconcord = open('ccord-cultural.txt', 'w')
# Save the old stdout stream
tmpout = sys.stdout
# Redirect all print calls to that file
sys.stdout = fileconcord
try:
    # Run the method; everything it prints now goes to the file
    text.concordance("cultural", 200, sys.maxsize)
finally:
    # Restore stdout first (in case you need to print something else), then close the file
    sys.stdout = tmpout
    fileconcord.close()
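On Python 3.4 and later, contextlib.redirect_stdout performs the same stdout swap and restores sys.stdout automatically, even if the call raises an exception. Below is a minimal sketch that captures into an in-memory string. fake_concordance is a hypothetical stand-in for text.concordance so that the example runs without NLTK or a corpus.

```python
import contextlib
import io


def fake_concordance(word, lines=25):
    """Hypothetical stand-in for text.concordance: it only prints and returns None."""
    print("Displaying %d matches of %r:" % (lines, word))
    for i in range(lines):
        print("... context around match %d of %s ..." % (i + 1, word))


def capture_printed_output(func, *args, **kwargs):
    """Call func and return everything it printed to stdout as one string."""
    buffer = io.StringIO()
    # Inside the with-block, print() writes to buffer instead of the console
    with contextlib.redirect_stdout(buffer):
        func(*args, **kwargs)
    return buffer.getvalue()


captured = capture_printed_output(fake_concordance, "cultural", lines=3)
```

With a real nltk.Text, the same idea would be to open the file and redirect into it directly, for example `with open('ccord-cultural.txt', 'w') as out, contextlib.redirect_stdout(out): text.concordance('cultural', lines=sys.maxsize)`.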

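Another option is to use the corresponding classes directly and skip the Text wrapper: just copy the bits from here and combine them with the bits from here.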
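As a rough illustration of "use the analysis directly and get the lines back instead of printing them", here is a simplified, hypothetical re-implementation. It is not NLTK's code, and its output formatting only approximates NLTK's. Recent NLTK releases also provide Text.concordance_list(), which returns the matches instead of printing them. Check whether your installed version has it before relying on it.

```python
def concordance_lines(tokens, word, width=79):
    """Return each concordance line for `word` as a string instead of printing it.

    Simplified, hypothetical re-implementation (not NLTK's code): matches are
    case-insensitive and the keyword sits roughly in the middle of a line of at
    most about `width` characters.
    """
    half = max((width - len(word) - 2) // 2, 0)  # characters of context per side
    target = word.lower()
    result = []
    for i, token in enumerate(tokens):
        if token.lower() != target:
            continue
        # A token plus its separating space is at least 2 characters,
        # so taking `half` tokens per side always gives enough context
        left_text = " ".join(tokens[max(0, i - half):i])
        left = left_text[max(0, len(left_text) - half):]
        right = " ".join(tokens[i + 1:i + 1 + half])[:half]
        result.append("%s %s %s" % (left.rjust(half), token, right))
    return result


tokens = "the cultural legacy of Africa and Cultural Continuity".split()
matches = concordance_lines(tokens, "cultural", width=40)
```

Since the function returns a plain list, writing every match is then just `out.write("\n".join(matches) + "\n")` on an open file, and no stdout redirection is needed.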


2012-06-15 03:49:21 bezmax

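Interesting. When I add the extra arguments, I get a blank file and the following response: 'Traceback (most recent call last): File "concordance.py", line 23, in <module> fileconcord.writelines(concord) TypeError: writelines() requires an iterable argument' (sorry for cramming this together; I couldn't find a way to enter multiple line breaks in this comment box).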
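That TypeError follows from concordance returning None. Once the print-only call is assigned to concord, writelines receives None, and None is not iterable. Here is a minimal reproduction that uses a hypothetical prints_only function in place of text.concordance and an in-memory io.StringIO in place of the real file. The exact error message differs between Python versions.

```python
import io


def prints_only():
    """Hypothetical stand-in for text.concordance: prints, returns nothing."""
    print("Displaying 25 of 530 matches:")


concord = prints_only()  # a function without a return statement returns None
buffer = io.StringIO()   # in-memory stand-in for the output file
error_message = ""
try:
    buffer.writelines(concord)  # writelines(None) cannot iterate over None
except TypeError as exc:
    error_message = str(exc)
```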

One more discovery: the second argument lets me set the amount of context. When I change it to 250, each line is 250 characters in total. Sweet!

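@JohnLaudun Strange. Looking at the nltk source, I don't see how your original code could have worked at all, and apparently it shouldn't. 'text.concordance' doesn't return anything; it prints everything with 'print'. So I guess you could redirect stdout to your file instead. See the addition to my post for details. – bezmax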

Update:

I found this thread, write text.concordance output to a file, from the nltk user group. It's from 2010 and notes:

Documentation for the Text class says: "is intended to support initial exploration of texts (via the interactive console). ... If you wish to write a program which makes use of these analyses, then you should bypass the Text class, and use the appropriate analysis function or class directly instead."

If nothing in the package has changed since then, this may be the source of your problem.

--- Earlier ---

I don't see a problem with how you use writelines() to write the file:

file.writelines(sequence)

Write a sequence of strings to the file. The sequence can be any iterable object producing strings, typically a list of strings. There is no return value. (The name is intended to match readlines(); writelines() does not add line separators.)

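Note the last part, about line separators. Have you looked at the output file in a different editor? Maybe the data is there but doesn't display correctly because the line separators are missing?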
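Here is a quick way to see the point about missing line separators. This sketch uses an in-memory io.StringIO in place of the real output file, and the sample strings are made up:

```python
import io

lines = ["first match", "second match"]

# writelines() writes the strings back to back, exactly as given
joined = io.StringIO()
joined.writelines(lines)

# Appending "\n" to each string yields one line per item
separated = io.StringIO()
separated.writelines(line + "\n" for line in lines)
```

joined ends up holding a single run-together line. separated holds one match per line, which is usually what you want when you later open the file in an editor.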

Are you sure this line produces the data you want to write out?

concord = text.concordance("cultural")

I'm not familiar with nltk, so I'm only asking to rule out one possible source of the problem.


2012-06-15 03:06:39 Levon

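Sorry for showing how early I am in my understanding, but do you have a suggestion for how to use 'writelines' here? I did, in fact, try it in an earlier version of the script, with no luck.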

Also, I hadn't come across that discussion thread, so, first of all, thanks. Clearly, I'm going to have to wrap my head around the Text class.

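@JohnLaudun I think you are using 'writelines' correctly. From the earlier discussion on the nltk user list, it seems you can't simply print the data out. Unfortunately, I'm not familiar enough with 'nltk' to suggest anything. Maybe nltk has changed since then? That is where I would have to look to get the data dumped to a file. – Levon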
