2015-12-03

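Document Conversion Watson service not working?

I've been trying to use the IBM Watson Document Conversion service on the demo PDF, but it doesn't split the document into smaller sections. Instead, it creates a single answer unit, which is very long: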

"text": "Watson is an artificially intelligent computer system capable of answering questions posed in natural language,[2] developed in IBM's DeepQA project by a research team led by principal investigator David Ferrucci. Watson was named after IBM's first CEO and industrialist Thomas J. Watson.[3][4] The computer system was specifically developed to answer questions on the quiz show Jeopardy![5] In 2011, Watson competed on Jeopardy! against former winners Brad Rutter and Ken Jennings.[3][6] Watson received the first place prize of $1 million.[7] Watson had access to 200 million pages of structured and unstructured content consuming four terabytes of disk storage[8] including the full text of Wikipedia,[9] but was not connected to the Internet during the game.[10][11] For each clue, Watson's three most probable responses were displayed on the television screen. Watson consistently outperformed its human opponents on the game's signaling device, but had trouble responding to a few categories, notably those having short clues containing only a few words. In February 2013, IBM announced that Watson software system's first commercial application would be for utilization management decisions in lung cancer treatment at Memorial Sloan- Kettering Cancer Center in conjunction with health insurance company WellPoint.[12] IBM Watson's former business chief Manoj Saxena says that 90% of nurses in the field who use Watson now follow its guidance.[13]" 

Thanks in advance!

Answer


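Unfortunately, that demo PDF is not the best document to use: currently, answer units are split on heading tags (h1-h6), and the PDF doesn't contain any headings. =(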
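To illustrate why a heading-less PDF comes back as one long answer unit, here is a minimal Python sketch of heading-based splitting. It is not the service's actual implementation; HeadingSplitter and split_by_headings are hypothetical names, and the only behavior carried over from the answer is that a new unit starts at each h1-h6 tag. With no headings, all of the text falls into a single unit.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingSplitter(HTMLParser):
    """Illustrative only: starts a new unit every time an h1-h6 tag opens."""

    def __init__(self):
        super().__init__()
        self.units = []            # finished units: {"title": str, "text": str}
        self._in_heading = False
        self._current = {"title": "", "text": []}

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._flush()          # a heading closes the previous unit
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS:
            self._in_heading = False

    def handle_data(self, data):
        if self._in_heading:
            self._current["title"] += data
        else:
            self._current["text"].append(data)

    def _flush(self):
        # Collapse whitespace so text from separate tags reads as one string.
        text = " ".join(" ".join(self._current["text"]).split())
        if self._current["title"] or text:
            self.units.append({"title": self._current["title"].strip(), "text": text})
        self._current = {"title": "", "text": []}

    def close(self):
        super().close()
        self._flush()              # emit whatever followed the last heading


def split_by_headings(html):
    parser = HeadingSplitter()
    parser.feed(html)
    parser.close()
    return parser.units


no_headings = "<p>Watson is a computer system.</p><p>It competed on Jeopardy!</p>"
with_headings = ("<h1>Overview</h1><p>Watson is a computer system.</p>"
                 "<h2>Jeopardy!</h2><p>It competed in 2011.</p>")
print(len(split_by_headings(no_headings)))    # 1
print(len(split_by_headings(with_headings)))  # 2
```

Because every paragraph of the demo PDF lands before (and without) any heading, a splitter of this shape has nowhere to cut, which matches the single long unit in the question.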

If you set conversion_target to NORMALIZED_HTML, you can see the converted PDF before it is split into answer units: it contains paragraphs, but no headings.
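To check this on your own file, you could save the NORMALIZED_HTML output to a string and count its tags. A minimal sketch, assuming the output is ordinary HTML; TagCounter and describe_structure are hypothetical helpers, and the code does not call the Watson service itself:

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts every opening tag seen in the document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def describe_structure(normalized_html):
    """Report how many heading (h1-h6) and paragraph tags the HTML contains."""
    counter = TagCounter()
    counter.feed(normalized_html)
    counter.close()
    headings = sum(counter.counts[f"h{level}"] for level in range(1, 7))
    return {"headings": headings, "paragraphs": counter.counts["p"]}


sample = "<html><body><p>Watson is a computer system.</p><p>In 2011 it won.</p></body></html>"
print(describe_structure(sample))  # {'headings': 0, 'paragraphs': 2}
```

A result with zero headings but several paragraphs is exactly the situation described above.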

In the future, we expect to also allow splitting answer units by paragraph, but this hasn't been released yet.
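Until paragraph-based splitting is available, one possible client-side workaround is to request NORMALIZED_HTML and split the paragraphs yourself. This is a sketch under that assumption, not a feature of the service; ParagraphSplitter and split_by_paragraphs are hypothetical names, and the function simply returns the text of each p element as its own unit:

```python
from html.parser import HTMLParser


class ParagraphSplitter(HTMLParser):
    """Collects the text of each <p> element as a separate unit."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_paragraph = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()          # tolerate a missing </p> before the next <p>
            self._in_paragraph = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()
            self._in_paragraph = False

    def handle_data(self, data):
        if self._in_paragraph:
            self._buffer.append(data)

    def _flush(self):
        # "".join keeps inline markup like <b> from splitting words apart.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.paragraphs.append(text)
        self._buffer = []

    def close(self):
        super().close()
        self._flush()              # handle a final unclosed <p>


def split_by_paragraphs(normalized_html):
    parser = ParagraphSplitter()
    parser.feed(normalized_html)
    parser.close()
    return parser.paragraphs


html = ("<p>Watson is an artificially intelligent computer system.</p>"
        "<p>Watson was named after IBM's first CEO.</p>")
for unit in split_by_paragraphs(html):
    print(unit)
```

Splitting on paragraphs yields many small units instead of one, at the cost of losing any section titles a heading-based split would provide.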

UPDATE: We updated the demo site to use a different PDF, which is a better example.


You can get a much better sample PDF here: https://github.com/mfulgo/document-conversion-nodejs/raw/master/public/data/samplePDF.pdf


Hi Matt! Thanks for your help, it really worked for me! -Tanmay – TajyMany