2011-12-31

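OCR documents with GData Python v3.0

I have the following implementation for uploading PDF files to Google Docs, taken from the GData API samples. I would like OCR text recognition to be performed on the uploaded files, but I'm not sure how to enable OCR in the gdata docs Python API. So my question is: is there a way to enable OCR recognition on PDF files using the gdata Python v3.0 API?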

Answer


I managed to get my PDF files OCR'ed using the following code:

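import gdata.data
import gdata.docs.client
import gdata.docs.data

def UploadResourceSample(filename, filepath, fullpath):
    """Upload a PDF, have Google Docs OCR it, and convert it to a document."""
    # CreateClient() is the helper from the gdata docs samples that returns
    # an authenticated DocsClient; it is not defined in this snippet.
    client = CreateClient()
    doc = gdata.docs.data.Resource(type='document', title=filename)

    path = fullpath
    print 'Selected file at: %s' % path

    # Create a MediaSource pointing to the PDF file
    media = gdata.data.MediaSource()
    media.SetFileHandle(path, 'application/pdf')

    # ocr=true asks the server to run OCR on the uploaded file;
    # ocr-language=de sets the recognition language (German here)
    create_uri = gdata.docs.client.RESOURCE_UPLOAD_URI + '?ocr=true&ocr-language=de'
    # Pass the MediaSource and the OCR-enabled URI when creating the Resource
    doc = client.CreateResource(doc, create_uri=create_uri, media=media)
    print 'Created, and uploaded:', doc.title.text, doc.resource_id.text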
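The only OCR-specific part above is the query string appended to the upload URI. If you need a different recognition language per file, a small helper can build that URI instead of hard-coding `de`. This is a minimal Python 3 sketch, assuming the same `ocr` and `ocr-language` parameters the answer uses; `build_ocr_upload_uri` is a hypothetical name, not part of gdata, and the base URI in the example is a placeholder standing in for `gdata.docs.client.RESOURCE_UPLOAD_URI`.

```python
from urllib.parse import urlencode


def build_ocr_upload_uri(base_uri, language):
    """Append the OCR query parameters from the answer to an upload URI.

    build_ocr_upload_uri is a hypothetical helper, not part of gdata.
    `language` is a language code such as 'de' (German) or 'en'.
    """
    query = urlencode([("ocr", "true"), ("ocr-language", language)])
    # Use '&' if the base URI already has a query string, '?' otherwise.
    separator = "&" if "?" in base_uri else "?"
    return base_uri + separator + query


# Placeholder base URI; the answer uses gdata.docs.client.RESOURCE_UPLOAD_URI.
print(build_ocr_upload_uri("https://example.com/upload", "de"))
# prints https://example.com/upload?ocr=true&ocr-language=de
```

Using `urlencode` keeps the parameter values escaped correctly, and checking for an existing `?` avoids producing a malformed URI if the base upload URI ever carries its own query parameters.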