Accessing a BytesIO object after creation

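I am using slate (https://pypi.python.org/pypi/slate) in a scrapy spider to try to extract text from multiple PDFs in a directory. I am not interested in saving the actual PDFs to disk, so I was advised to look at the io.BytesIO class described at https://docs.python.org/2/library/io.html#buffered-streams. Based on "Creating bytesIO object", I have initialized a BytesIO instance with the PDF body, but now I need to pass the data to the slate module. So far I have: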

def save_pdf(self, response): 
    in_memory_pdf = BytesIO(response.body) 

    with open(in_memory_pdf, 'rb') as f:
        doc = slate.PDF(f)
        print(doc[0])

I am getting:

in_memory_pdf.read(response.body) 
TypeError: integer argument expected, got 'str'

How can I get this working?

Edit:

with open(in_memory_pdf, 'rb') as f: 
TypeError: coercing to Unicode: need string or buffer, _io.BytesIO found

Edit 2:

def save_pdf(self, response): 
    in_memory_pdf = BytesIO(bytes(response.body)) 
    in_memory_pdf.seek(0) 
    doc = slate.PDF(in_memory_pdf) 
    print(doc)

Source

2016-09-30 user61629

Try `in_memory_pdf = BytesIO(bytes(response.body))`. – martineau

Thanks, that solved the initial problem! – user61629

Try using [`StringIO`](https://docs.python.org/2/library/stringio.html#module-StringIO) instead of `BytesIO`. Also note that with either one you don't need the `with open(...) as f`; just rewind it to the beginning after creating it with `in_memory_pdf.seek(0)`, and then use `in_memory_pdf` _instead_ of `f`. – martineau
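The point in the comment above can be illustrated with a minimal sketch. It assumes only that the consumer accepts any file-like object; `read_header` is a hypothetical stand-in for a call such as `slate.PDF(...)`, and `body` is made-up data standing in for `response.body`.

```python
from io import BytesIO


def read_header(fileobj, size=8):
    """Hypothetical consumer: reads from any file-like object it is given."""
    return fileobj.read(size)


# Made-up bytes standing in for response.body (a downloaded PDF).
body = b"%PDF-1.4\n(remaining bytes of the file)"

# BytesIO is already an open, file-like object, so open() is not needed.
in_memory_pdf = BytesIO(body)

# Rewind to the start. A freshly created buffer is already at position 0,
# so this is harmless here; it matters after an earlier read or write.
in_memory_pdf.seek(0)

header = read_header(in_memory_pdf)
print(header)
```

Passing the buffer to `open()` fails because `open()` expects a path, not an object that is already file-like.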

You already know the answer. It is stated explicitly in the Python TypeError message and is clear from the documentation:

class io.BytesIO([initial_bytes])

BytesIO accepts bytes, and you are passing it the content, i.e. response.body, which is a string.
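As a side note, the traceback quoted in the question points at a `read()` call rather than at the constructor. `read()` takes an optional integer byte count, not data. A minimal sketch, with a made-up `body` standing in for `response.body`:

```python
from io import BytesIO

# Made-up payload standing in for response.body.
body = b"example payload"
buf = BytesIO(body)

try:
    # Wrong: read() expects a number of bytes to read, not the data itself.
    buf.read(body)
    raised = False
except TypeError:
    raised = True

# Right: pass a byte count, or nothing to read up to the end.
first = buf.read(7)
rest = buf.read()
```

The failed call does not move the stream position, so the reads that follow still start from the beginning of the buffer.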

Source

2016-09-30 20:24:04

Thank you, very useful information! I used `in_memory_pdf = BytesIO(bytes(response.body))` from above. The problem now is the error I have added to the question above. – user61629

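Your solution is almost given by the error itself. You are passing a byte string; as the error says, coerce it to an int. It should work, just as bytes worked. –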
