2012-12-05

How do I access topic names in PDFs using poppler?

I am using poppler and want to get the topic or heading for a specific page number. How can I do this with poppler?


Which frontend (API) are you using ([glib](http://people.freedesktop.org/~ajohnson/docs/poppler-glib/), [qt](http://people.freedesktop.org/~aacid/docs/qt4/))? I think you have to use the PDF's index/TOC. See this [related question](http://stackoverflow.com/q/7131906/1381638).

Answers


This uses the glib API, since I don't know which API you want.

I am fairly sure no topic or title is stored with a specific page. You have to walk the index, if the PDF has one.

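Walk the index with backtracking. If you are lucky, every index node contains a PopplerActionGotoDest (check the type!). You can get the title (gchar *title) from the PopplerAction object and the page number (int page_num) from the PopplerDest it contains. page_num should be the first page of that section.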

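Assume your PDF has an index made of PopplerActionGotoDest objects. Then you simply walk it and check page_num at each node. If page_num > searching_num, go back one step. Once you are at the right parent, walk its children. That should give you the best match. I just wrote some code for it: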

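#include <stdio.h>
#include <poppler.h>

//returns a newly allocated title for page num (free it with g_free), or NULL
gchar* getTitle(PopplerIndexIter *iter, int num, PopplerIndexIter *last,PopplerDocument *doc) 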
{ 
    int cur_num = 0; 
    int next; 
    PopplerAction * action; 
    PopplerDest * dest; 
    gchar * title = NULL; 
    PopplerIndexIter * last_tmp; 

    do 
    { 
      action = poppler_index_iter_get_action(iter); 
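      if (action->type != POPPLER_ACTION_GOTO_DEST) { 
       printf("No GOTO_DEST!\n"); 
       //release what we hold before giving up 
       poppler_action_free(action); 
       if (last) { 
        poppler_index_iter_free(last); 
       } 
       return NULL; 
      } 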

      //get page number of current node 
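      if (action->goto_dest.dest->type == POPPLER_DEST_NAMED) { 
       //resolve the named destination; it may be missing from the document 
       dest = poppler_document_find_dest (doc, action->goto_dest.dest->named_dest); 
       if (dest) { 
        cur_num = dest->page_num; 
        poppler_dest_free(dest); 
       } else { 
        cur_num = 0; //unknown page 
       } 
      } else { 
       cur_num = action->goto_dest.dest->page_num; 
      } 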
      //printf("cur_num: %d, %d\n",cur_num,num); 

      //free action, as we don't need it anymore 
      poppler_action_free(action); 

      //are there nodes following this one? 
      last_tmp = poppler_index_iter_copy(iter); 
      next = poppler_index_iter_next (iter); 

      //descend 
      if (!next || cur_num > num) { 
       if ((!next && cur_num < num) || cur_num == num) { 
        //descend current node 
        if (last) { 
         poppler_index_iter_free(last); 
        } 
        last = last_tmp; 
       } else { 
        //the copy of the current node is not needed 
        poppler_index_iter_free(last_tmp); 
       } 
       //descend last node (backtracking) 
       if (last) { 
        PopplerIndexIter *child = poppler_index_iter_get_child (last); 
        gchar * tmp = NULL; 
        if (child) { 
         //pass NULL so the recursion cannot descend into this node again 
         tmp = getTitle(child,num,NULL,doc); 
         poppler_index_iter_free (child); 
        } 
        if (!tmp) { 
         //leaf node, or no child matched: use this node's own title 
         action = poppler_index_iter_get_action(last); 
         if (action->type == POPPLER_ACTION_GOTO_DEST) { 
          tmp = g_strdup (action->any.title); 
         } 
         poppler_action_free(action); 
        } 
        poppler_index_iter_free (last); 

        return tmp; 
       } else { 
        return NULL; 
       } 
      } 

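      if (cur_num > num || (next && cur_num != 0)) { 
       // free last index_iter 
       if (last) { 
        poppler_index_iter_free(last); 
       } 
       last = last_tmp; 
      } else { 
       // page unknown: keep the previous node and drop the copy 
       poppler_index_iter_free(last_tmp); 
      } 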
    } 
    while (next); 

    return NULL; 
} 

getTitle is called like this:

//PopplerDest page numbers start at 1, so count pages from 1 
for (i = 1; i <= num_pages; i++) { 
      iter = poppler_index_iter_new (document); 
      title = getTitle(iter,i,NULL,document); 
      poppler_index_iter_free (iter); 

      if (title) { 
       printf("title of %d: %s\n",i, title); 
       g_free(title); 
      } else { 
       printf("%d: no title\n",i); 
      } 
    } 