R: "subscript out of bounds" when using the tm function Corpus on LexisNexis data

I am trying to create a corpus of LexisNexis articles using the tm package. The articles were exported from LexisNexis as .html and are parsed into R with the tm.plugin.lexisnexis package like this:

> library("tm") 
> library("tm.plugin.lexisnexis") 
> src <- LexisNexisSource("~/Desktop/lexisnexis.html")

Following the instructions in the tm.plugin.lexisnexis documentation, I then build the tm corpus like this:

> data <- Corpus(src, readerControl = list(language = NA)) 
Error in getNodeSet(tree, "//div[@class = 'c3']/p[@class = 'c1']/span[@class = 'c4']")[[1]] : 
    subscript out of bounds

What does this error mean, and how can I fix it?

Sample HTML data: link

Asked 2016-01-06 by ageil

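Hmm, I'm not sure I understand. Am I missing something in the .html file, or is the `src` object incomplete? – ageil

Not sure what is going on there. For a general explanation of the error above, see http://stackoverflow.com/questions/15031338/subscript-out-of-bounds-general-definition-and-solution –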
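As a minimal illustration of the general case covered by the linked question, the sketch below reproduces the same message in base R. It assumes, for illustration only, that the XPath query in the traceback matched nothing, so the empty list `matches` stands in for the result of `getNodeSet()`; the variable names are hypothetical.

```r
# An empty list stands in for a query result with no matches
# (assumption: this is what getNodeSet() returned in the failing call).
matches <- list()

# Taking the first element of an empty list raises an error;
# tryCatch() captures its message instead of stopping the script.
msg <- tryCatch(
  matches[[1]],
  error = function(e) conditionMessage(e)
)
print(msg)

# Guarded access: check the length before indexing.
first <- if (length(matches) > 0) matches[[1]] else NULL
```

The guard on the last line only avoids the error; it does not make the missing element appear, so the underlying mismatch still has to be fixed where the query is written.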

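I'm the author of the package. Because the format used by LexisNexis is undocumented, the package is currently broken. I'll try to fix it, but it will happen faster if somebody proposes a patch. :-)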

Answered 2016-01-09 21:47:11
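One way to check whether a given export still matches what the reader expects is to run the XPath expression from the error message against the file directly. The sketch below is not part of tm.plugin.lexisnexis. It uses the XML package, and the inline `html` string is a made-up stand-in for an export whose class names differ from those in the query.

```r
if (requireNamespace("XML", quietly = TRUE)) {
  # Hypothetical markup: same structure, but different class names
  # from the ones the XPath expression looks for.
  html <- paste0(
    "<html><body><div class='c0'><p class='c1'>",
    "<span class='c2'>Some headline</span></p></div></body></html>"
  )
  doc <- XML::htmlParse(html, asText = TRUE)

  # XPath expression copied from the error message in the question.
  xpath <- "//div[@class = 'c3']/p[@class = 'c1']/span[@class = 'c4']"
  nodes <- XML::getNodeSet(doc, xpath)

  # Zero matches means that indexing the result with [[1]] would fail.
  n_matches <- length(nodes)
  print(n_matches)
}
```

To test a real export, replace the inline string with `XML::htmlParse("~/Desktop/lexisnexis.html")`. A count of zero would suggest that the file's markup differs from what the reader was written for, which is consistent with the answer above.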
