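Error in xmlTreeParse in R: "XML content does not seem to be XML"
I am working through the topicmodels tutorial in R. Around page 12, they strip out HTML tags and Greek letters: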
R> library("XML")
R> remove_HTML_markup <- function(s) {
+ doc <- htmlTreeParse(s, asText = TRUE, trim = FALSE)
+ xmlValue(xmlRoot(doc))
+ }
R> remove_HTML_markup(JSS_papers[1,"description"])
Error: XML content does not seem to be XML, nor to identify a file name ...
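JSS_papers stores the metadata associated with a collection of papers downloaded from the journal. The entries under the description tag are the abstracts of the articles. This one does not contain any tags: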
JSS_papers[1,"description"] = "The fit of a variogram model to spatially-distributed
data is often difficult to assess. A graphical diagnostic written in S-plus is
introduced that allows the user to determine both the general quality of the fit of a
variogram model, and to find specific pairs of locations that do not have measurements
that are consonant with the fitted variogram. It can help identify nonstationarity,
outliers, and poor variogram fit in general. Simulated data sets and a set of soil
nitrogen concentration data are examined using this graphical diagnostic."
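Because the abstract above contains no tags at all, one way to check whether the parser is really the sticking point is to strip markup without it. The following is only a minimal sketch under that assumption: strip_markup_regex is a hypothetical helper that is not part of the tutorial or of the XML package, and it uses base R only. It removes anything that looks like a tag and leaves tag-free text unchanged; it does not decode HTML entities.

```r
# Hypothetical helper (not from the tutorial or the XML package):
# remove tag-like substrings with a regular expression instead of a parser.
strip_markup_regex <- function(s) {
  s <- as.character(s)        # coerce in case a list element or factor is passed
  gsub("<[^>]+>", "", s)      # drop every substring of the form <...>
}

# Text with a tag: the tag is removed, the content is kept.
strip_markup_regex("The <b>fit</b> of a variogram model")

# Text without tags: returned unchanged.
strip_markup_regex("No tags here")
```

If this returns the abstract unchanged while remove_HTML_markup still fails, that would suggest the problem lies in how the installed XML package handles tag-free text, not in the data itself.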
It works for me. Can you post your sessionInfo()? – nograpes