2012-06-04 196 views
-2

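I have not used Java much and don't have much idea about processing PDF tables. I want to read a table from a PDF file using the iText Java library. How should I proceed? How can I read a table in a PDF using iText for Java?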

+1

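It would actually help if you made your question more specific and added some source code showing what you have already done and what you have tried that has not worked so far. – brimborium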

+0

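I have not used Java much. I have used the command line (java -jar pdfbox-app-x.y.z.jar ExtractText [OPTIONS] [text file]) to convert the PDF into a format from which I could process the table, but I don't have much idea how to go on from there. I have also used the code from http://www.roseindia.net/tutorial/java/itext/convertpdfToTextFile.html. –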

Answers

-3

My solution

package com.geek.tutorial.itext.table;

import java.io.FileOutputStream;
import com.lowagie.text.Document;
import com.lowagie.text.pdf.PdfPTable;
import com.lowagie.text.pdf.PdfWriter;

public class SimplePDFTable
{
    public SimplePDFTable() throws Exception
    {
     Document document = new Document();
     // Bind the document to the output file before opening it.
     PdfWriter.getInstance(document,
      new FileOutputStream("SimplePDFTable.pdf"));
     document.open();
     PdfPTable table = new PdfPTable(2); // table with two columns
     // First row
     table.addCell("1");
     table.addCell("2");
     // Second row
     table.addCell("3");
     table.addCell("4");
     // Third row
     table.addCell("5");
     table.addCell("6");
     // Add the finished table to the document and flush it to disk.
     document.add(table);
     document.close();
    }

    public static void main(String[] args)
    {
     try
     {
      new SimplePDFTable();
     }
     catch(Exception e)
     {
      e.printStackTrace();
     }
    }
}
+0

vijaykamma, this code writes a table into a PDF... I want to read a table from a PDF. –

6

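You can extract text from the content stream, but for ordinary PDF files the result will be plain text (without any structure). If there is a table on the page, that table will not be recognized as such. You will get the content and some white space, but that is not a tabular structure! Only if you have a tagged PDF can you obtain an XML file. If the PDF contains tags that are recognized as table tags, this will be reflected in the PDF.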

This is what I found here
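To illustrate what "content and some white space" means in practice, here is a minimal sketch of a heuristic that rebuilds rows from extracted plain text. It assumes, and this is only an assumption, that the extractor leaves at least two spaces between columns and one line per table row; many PDFs will not satisfy that. The class name WhitespaceColumns and the sample text are made up for this illustration and use only the JDK, not iText.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WhitespaceColumns {
    // Splits every non-blank line into cells wherever two or more
    // whitespace characters occur in a row (assumed column gap).
    public static List<List<String>> toRows(String extractedText) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : extractedText.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines between rows
            }
            rows.add(Arrays.asList(trimmed.split("\\s{2,}")));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample standing in for text returned by a PDF text extractor.
        String sample = "Region    Thermal    Hydro\n"
                      + "North     120.5      30.2\n";
        for (List<String> row : toRows(sample)) {
            System.out.println(row);
        }
        // prints [Region, Thermal, Hydro] and then [North, 120.5, 30.2]
    }
}
```

A cell that itself contains two consecutive spaces, or columns separated by a single space, would be split incorrectly, which is exactly the loss of structure the quoted answer describes.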

+0

@soumitra still no solution? –

1

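To read the contents of a table from a PDF file, you only need to convert the PDF into text using any API (I used iText's PdfTextExtractor.getTextFromPage()) and then process that text in a Java program. Once the text has been read, the main task is done. You then have to filter out the data you need, which you can do by applying the split method of the String class repeatedly until you reach the records you want.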
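As a rough sketch of the repeated-split idea, the snippet below isolates one number from a line of text. The sample string and the class name ChainedSplit are invented for illustration; real extractor output will differ, so the token index has to be found by inspecting the actual text.

```java
class ChainedSplit {
    // Returns the first value that follows the word THERMAL.
    public static String thermalValue(String page) {
        // Everything after the first "THERMAL", e.g. " 100.0 95.5 HYDRO ..."
        String afterThermal = page.split("THERMAL")[1];
        String[] tokens = afterThermal.split(" ");
        // tokens[0] is the empty string because the remainder starts with
        // a space, so the first real value sits at index 1.
        return tokens[1];
    }

    public static void main(String[] args) {
        // Made-up sample standing in for extracted page text.
        String page = "NORTHERN THERMAL 100.0 95.5 HYDRO 40.0 38.2";
        System.out.println(thermalValue(page)); // prints 100.0
    }
}
```

Note that String.split takes a regular expression and keeps leading empty strings, which is why index positions can be shifted by one or more when the text contains runs of spaces.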

Below is part of my code, in which I extract records from the PDF file and write them to a .csv file. You can view the PDF file here: http://www.cea.nic.in/reports/monthly/generation_rep/actual/jan13/opm_02.pdf

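// Imports required by this excerpt (package names assume iText 5.x).
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;

public static void genrateCsvMonth_Region(String pdfpath, String csvpath) {
     try {
      // Open the CSV in append mode once so that it is created if missing.
      BufferedWriter writer1 = new BufferedWriter(new FileWriter(csvpath,
        true));
      writer1.close();
      // Write the header row only if the file is still empty.
      BufferedReader br = new BufferedReader(new FileReader(csvpath));
      String line = br.readLine();
      br.close();
      if (line == null) {
       BufferedWriter headerWriter = new BufferedWriter(new FileWriter(
         csvpath, true));
       headerWriter.append("REGION,");
       headerWriter.append("YEAR,");
       headerWriter.append("MONTH,");
       headerWriter.append("THERMAL,");
       headerWriter.append("NUCLEAR,");
       headerWriter.append("HYDRO,");
       headerWriter.append("TOTAL\n");
       headerWriter.close();
      }
      // Reading the pdf file..
      PdfReader reader = new PdfReader(pdfpath);
      BufferedWriter writer = new BufferedWriter(new FileWriter(csvpath,
        true));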

      // Extracting records from page into String.. 
      String page = PdfTextExtractor.getTextFromPage(reader, 1); 
      // Extracting month and Year from String.. 
      String period1[] = page.split("PEROID"); 
      String period2[] = period1[0].split(":"); 
      String month[] = period2[1].split("-"); 
      String period3[] = month[1].split("ENERGY"); 
      String year[] = period3[0].split("VIS"); 

      // Extracting Northen region 
      String northen[] = page.split("NORTHEN REGION"); 
      String nthermal1[] = northen[0].split("THERMAL"); 
      String nthermal2[] = nthermal1[1].split(" "); 

      String nnuclear1[] = northen[0].split("NUCLEAR"); 
      String nnuclear2[] = nnuclear1[1].split(" "); 

      String nhydro1[] = northen[0].split("HYDRO"); 
      String nhydro2[] = nhydro1[1].split(" "); 

      String ntotal1[] = northen[0].split("TOTAL"); 
      String ntotal2[] = ntotal1[1].split(" "); 

      // Appending filtered data into CSV file.. 
      writer.append("NORTHEN" + ","); 
      writer.append(year[0] + ","); 
      writer.append(month[0] + ","); 
      writer.append(nthermal2[4] + ","); 
      writer.append(nnuclear2[4] + ","); 
      writer.append(nhydro2[4] + ","); 
      writer.append(ntotal2[4] + "\n"); 

      // Extracting Western region 
      String western[] = page.split("WESTERN"); 

      String wthermal1[] = western[1].split("THERMAL"); 
      String wthermal2[] = wthermal1[1].split(" "); 

      String wnuclear1[] = western[1].split("NUCLEAR"); 
      String wnuclear2[] = wnuclear1[1].split(" "); 

      String whydro1[] = western[1].split("HYDRO"); 
      String whydro2[] = whydro1[1].split(" "); 

      String wtotal1[] = western[1].split("TOTAL"); 
      String wtotal2[] = wtotal1[1].split(" "); 

      // Appending filtered data into CSV file.. 
      writer.append("WESTERN" + ","); 
      writer.append(year[0] + ","); 
      writer.append(month[0] + ","); 
      writer.append(wthermal2[4] + ","); 
      writer.append(wnuclear2[4] + ","); 
      writer.append(whydro2[4] + ","); 
      writer.append(wtotal2[4] + "\n"); 

      // Extracting Southern Region 
      String southern[] = page.split("SOUTHERN"); 

      String sthermal1[] = southern[1].split("THERMAL"); 
      String sthermal2[] = sthermal1[1].split(" "); 

      String snuclear1[] = southern[1].split("NUCLEAR"); 
      String snuclear2[] = snuclear1[1].split(" "); 

      String shydro1[] = southern[1].split("HYDRO"); 
      String shydro2[] = shydro1[1].split(" "); 

      String stotal1[] = southern[1].split("TOTAL"); 
      String stotal2[] = stotal1[1].split(" "); 

      // Appending filtered data into CSV file.. 
      writer.append("SOUTHERN" + ","); 
      writer.append(year[0] + ","); 
      writer.append(month[0] + ","); 
      writer.append(sthermal2[4] + ","); 
      writer.append(snuclear2[4] + ","); 
      writer.append(shydro2[4] + ","); 
      writer.append(stotal2[4] + "\n"); 

      // Extracting eastern region 
      String eastern[] = page.split("EASTERN"); 

      String ethermal1[] = eastern[1].split("THERMAL"); 
      String ethermal2[] = ethermal1[1].split(" "); 

      String ehydro1[] = eastern[1].split("HYDRO"); 
      String ehydro2[] = ehydro1[1].split(" "); 

      String etotal1[] = eastern[1].split("TOTAL"); 
      String etotal2[] = etotal1[1].split(" "); 
      // Appending filtered data into CSV file.. 
      writer.append("EASTERN" + ","); 
      writer.append(year[0] + ","); 
      writer.append(month[0] + ","); 
      writer.append(ethermal2[4] + ","); 
      writer.append(" " + ","); 
      writer.append(ehydro2[4] + ","); 
      writer.append(etotal2[4] + "\n"); 

      // Extracting northernEastern region 
      String neestern[] = page.split("NORTH"); 

      String nethermal1[] = neestern[2].split("THERMAL"); 
      String nethermal2[] = nethermal1[1].split(" "); 

      String nehydro1[] = neestern[2].split("HYDRO"); 
      String nehydro2[] = nehydro1[1].split(" "); 

      String netotal1[] = neestern[2].split("TOTAL"); 
      String netotal2[] = netotal1[1].split(" "); 

      writer.append("NORTH EASTERN" + ","); 
      writer.append(year[0] + ","); 
      writer.append(month[0] + ","); 
      writer.append(nethermal2[4] + ","); 
      writer.append(" " + ","); 
      writer.append(nehydro2[4] + ","); 
      writer.append(netotal2[4] + "\n"); 
      writer.close();
      // Release the PDF file handle as well.
      reader.close();

     } catch (IOException ioe) { 
      ioe.printStackTrace(); 
     } 

    }
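
The per-region blocks above repeat the same split pattern, so they could be folded into one helper. The following is only a sketch under stated assumptions: the names RegionRows, tokenAfter and csvRow are hypothetical, the sample section text is invented, and the helper trims and splits on any whitespace, so its token index is not the same as the index 4 used with split(" ") above. A label that is missing from a section (such as NUCLEAR for the eastern region) yields an empty CSV field.

```java
import java.util.regex.Pattern;

class RegionRows {
    // Returns the whitespace-separated token at position `index` (0-based)
    // after the first occurrence of `label`, or "" if it is not available.
    public static String tokenAfter(String section, String label, int index) {
        String[] parts = section.split(Pattern.quote(label), 2);
        if (parts.length < 2) {
            return ""; // label not present in this section
        }
        String[] tokens = parts[1].trim().split("\\s+");
        return index < tokens.length ? tokens[index] : "";
    }

    // Builds one CSV line: REGION,YEAR,MONTH,THERMAL,NUCLEAR,HYDRO,TOTAL
    public static String csvRow(String region, String year, String month,
                                String section, int index) {
        StringBuilder row = new StringBuilder(region)
                .append(',').append(year)
                .append(',').append(month);
        for (String label : new String[] {"THERMAL", "NUCLEAR", "HYDRO", "TOTAL"}) {
            row.append(',').append(tokenAfter(section, label, index));
        }
        return row.toString();
    }

    public static void main(String[] args) {
        // Made-up section text; real extractor output will look different.
        String section = "THERMAL 10 11 12 HYDRO 20 21 22 TOTAL 30 32 34";
        System.out.println(csvRow("EASTERN", "2013", "JAN", section, 2));
        // prints EASTERN,2013,JAN,12,,22,34
    }
}
```

Returning an empty string instead of indexing blindly avoids the ArrayIndexOutOfBoundsException that the chained splits would throw when a label or value is absent from the page.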