2017-07-12 68 views
0

Error while reading huge text in Python

Using Python 3, my requirement is to read email files from a directory and filter out the HTML tags in them.

I have managed to do this to a large extent. When I try to read the contents of my output, it raises an error:

for line in output.splitlines(): 
AttributeError: 'int' object has no attribute 'splitlines' 

import glob
import os

for file in glob.glob('spam/*.*'):
    output = os.system("python html2txt.py " + file)
    # The AttributeError above is raised on the next line.
    for line in output.splitlines():
        print(line)

When I print the output, it shows the filtered text. Any help is appreciated.

+7

Read the documentation for the return value of `os.system`. Also, why do you need to run a `python` OS command from within Python? Import the module instead.

Answers

1

Try this as a replacement for the code you provided:

import glob

files = glob.glob('spam/*.*')

for f in files:
    with open(f) as spam_file:
        for line in spam_file:
            print(line)

If these files are HTML files, I would suggest looking into BeautifulSoup.
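As a rough illustration of tag filtering without extra installs, here is a minimal sketch that uses the standard library's `html.parser` in place of BeautifulSoup. The names `TextExtractor` and `strip_tags` are made up for this example, and real email HTML may need more care (encodings, malformed markup). With BeautifulSoup installed, `BeautifulSoup(html, "html.parser").get_text()` does a similar job.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text and skips <script>/<style> bodies."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # Start ignoring data while inside script or style elements.
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text that is outside script/style elements.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        return "".join(self._chunks)


def strip_tags(html):
    """Return the text content of an HTML string with the tags removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Inside the loop above, this could be applied as `print(strip_tags(spam_file.read()))`, assuming each file contains only HTML.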

0

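The return value of os.system(command) is system-dependent; it is meant to be the (encoded) exit value of the process, represented as an int. From the documentation for os.system: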

On Unix, the return value is the exit status of the process encoded in the format specified for wait(). Note that POSIX does not specify the meaning of the return value of the C system() function, so the return value of the Python function is system-dependent.

On Windows, the return value is that returned by the system shell after running command, given by the Windows environment variable COMSPEC: on command.com systems (Windows 95, 98 and ME) this is always 0; on cmd.exe systems (Windows NT, 2000 and XP) this is the exit status of the command run; on systems using a non-native shell, consult your shell documentation.

On no system does it return a str, and splitlines() is a str method.

You are calling a str method on an int, which is why you get the error:

AttributeError: 'int' object has no attribute 'splitlines'
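If the script really has to run as a separate process, one way to get its printed text as a str is `subprocess.run` with a pipe on standard output. This is only a sketch: `run_and_capture` is a name invented here, `text=True` needs Python 3.7 or later, and it assumes html2txt.py takes a file path as its argument and writes the filtered text to standard output.

```python
import subprocess
import sys


def run_and_capture(args):
    """Run a command and return everything it wrote to stdout as a str."""
    result = subprocess.run(
        args,
        stdout=subprocess.PIPE,  # capture stdout instead of letting it reach the terminal
        text=True,               # decode bytes to str
        check=True,              # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


# Assumed usage with the script from the question:
# for path in glob.glob('spam/*.*'):
#     output = run_and_capture([sys.executable, "html2txt.py", path])
#     for line in output.splitlines():
#         print(line)
```

Passing the command as a list avoids building a shell string by concatenation, so file names with spaces are handled without extra quoting.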

0

On Unix, the return value is the exit status of the process encoded in the format specified for wait(). Note that POSIX does not specify the meaning of the return value of the C system() function, so the return value of the Python function is system-dependent.

On Windows, the return value is that returned by the system shell after running command. The shell is given by the Windows environment variable COMSPEC: it is usually cmd.exe, which returns the exit status of the command run; on systems using a non-native shell, consult your shell documentation. python docs

So your output variable is an integer, not the result of the file being parsed by the html2txt.py script.

Why run another Python script outside the current process? Can't you simply import whatever function does the work from that module?

There is also an email module that can help you.
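For instance, the standard library's `email` package can parse a raw message and hand back its HTML (or plain text) body. The sketch below assumes each file in the directory is a single raw RFC 5322 message; `extract_body` is a hypothetical helper name, not part of the library.

```python
from email import message_from_bytes, policy


def extract_body(raw_bytes):
    """Return the HTML body of a raw message, or the plain-text one, or None."""
    # policy.default makes the parser return an EmailMessage with get_body().
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    part = msg.get_body(preferencelist=("html", "plain"))
    if part is None:
        return None
    # get_content() decodes the transfer encoding and charset to a str.
    return part.get_content()
```

The file would be opened in binary mode, for example `extract_body(open(path, "rb").read())`, so that the parser can apply the charset declared in the message headers.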