Extract words from a PDF with Go?

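package main

import (
    "bytes"
    "fmt"

    "github.com/ledongthuc/pdf"
)

// Written against the ledongthuc/pdf API as of this answer (2017);
// newer releases may use different signatures for pdf.Open and GetPlainText.

func main() {
    content, err := readPdf("test.pdf") // read a local PDF file
    if err != nil {
        panic(err)
    }
    fmt.Println(content)
}

// readPdf returns the plain text of every page in the PDF at path.
func readPdf(path string) (string, error) {
    r, err := pdf.Open(path)
    if err != nil {
        return "", err
    }
    totalPage := r.NumPage()

    var textBuilder bytes.Buffer
    // Pages are numbered starting at 1.
    for pageIndex := 1; pageIndex <= totalPage; pageIndex++ {
        p := r.Page(pageIndex)
        if p.V.IsNull() {
            continue // skip pages that have no content
        }
        textBuilder.WriteString(p.GetPlainText("\n"))
    }
    return textBuilder.String(), nil
}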
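
The code above returns the whole document as one string, while the question asks for words. A minimal sketch of that last step, using only the standard library: `splitWords` is a hypothetical helper (it is not part of ledongthuc/pdf), and the sample string stands in for whatever `readPdf` returns. It assumes a "word" is any run of letters or digits, which may not match every use case.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// splitWords is a hypothetical helper, not part of any PDF library.
// It breaks text into words, treating every rune that is neither a
// letter nor a digit as a separator.
func splitWords(text string) []string {
	return strings.FieldsFunc(text, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

func main() {
	// Stand-in for the string returned by readPdf.
	content := "Hello, PDF world!\nPage 2 starts here."
	for _, w := range splitWords(content) {
		fmt.Println(w)
	}
}
```

Note that with this rule hyphenated words and contractions are split at the hyphen or apostrophe; adjust the separator function if those should stay together.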


2017-03-14 03:20:23

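I have a bug with your lib, but I am not able to post it in the 'ledongthuc/pdf' Git issues. – LeMoussel

@LeMoussel, I don't know why you can't create an issue in my project. Either way, you can post the error here and I will do my best to help you –

@Le Dong Thuc: please see [How to extract plain text from PDF in golang](https://stackoverflow.com/questions/44560265/how-to-extract-plain-text-from-pdf-in-golang) – LeMoussel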

I tried several Go PDF libraries and found that sajari/docconv works the way I expected.

It is easy to use; here is an example:

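package main

import (
    "fmt"
    "log"

    "github.com/sajari/docconv"
)

func main() {
    // Convert the file at the given local path to text.
    res, err := docconv.ConvertPath("your-file.pdf")
    if err != nil {
        log.Fatal(err)
    }
    // res.Body holds the extracted plain text.
    fmt.Println(res.Body)
}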


2017-09-18 07:39:07 Daoctor
