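You really should read up on XML namespaces and how they work in R, and on XPath in general. Also, xml2
is a newer XML package with some nice features that you should take a look at.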
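To make the namespace point concrete, here is a minimal sketch on a made-up two-entry feed (the `feed` object and its contents are purely illustrative, not the Treasury data). It shows why an unprefixed XPath finds nothing once a document declares a default namespace, and that xml2 exposes that namespace under the generated prefix `d1`.

```r
library(xml2)

# Hypothetical miniature Atom-style feed, used only for illustration
feed <- read_xml(
  '<feed xmlns="http://www.w3.org/2005/Atom">
     <entry><title>a</title></entry>
     <entry><title>b</title></entry>
   </feed>'
)

# xml2 lists the default namespace under the generated prefix "d1"
xml_ns(feed)

# An unprefixed name only matches elements in *no* namespace,
# so this finds no nodes at all
n_plain <- length(xml_find_all(feed, "//entry"))

# With the d1 prefix, both entries are matched
n_prefixed <- length(xml_find_all(feed, "//d1:entry", xml_ns(feed)))
```

The `xml_ns_rename()` call in the code for the real feed does nothing more than swap `d1` for a friendlier prefix.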
library(xml2)
# read the doc
doc <- read_xml("http://data.treasury.gov/feed.svc/DailyTreasuryYieldCurveRateData?$filter=year(NEW_DATE)%20eq%202005")
# libxml2 + R == "meh" handling of default namespaces
ns <- xml_ns_rename(xml_ns(doc), d1="default")
# all the info is in the properties tag so focus on it
props <- xml_find_all(doc, "//default:entry/default:content/m:properties", ns)
# lots of ways to extract, but this data is "regular" enough to take a
# rather simplistic approach. Extract all the node values which will be
# separated by newlines. Convert newlines to tabs, trim the whole thing
# and read it in as a table.
dat <- read.table(text=trimws(gsub("\n", "\t", xml_text(props))),
                  sep="\t", stringsAsFactors=FALSE)
# column names would be good so build those from one property node
colnames(dat) <- xml_name(xml_children(props[[1]]))
# boom: done
str(dat)
## 'data.frame': 250 obs. of 14 variables:
## $ Id : int 3040 3041 3042 3043 3044 3045 3046 3047 3048 3049 ...
## $ NEW_DATE : chr " 2005-11-14T00:00:00" " 2005-11-10T00:00:00" " 2005-11-15T00:00:00" " 2005-11-17T00:00:00" ...
## $ BC_1MONTH : num 3.93 3.89 4.01 3.98 4 ...
## $ BC_3MONTH : num 4.02 3.97 4.01 4.01 4 ...
## $ BC_6MONTH : num 4.35 4.3 4.34 4.3 4.3 ...
## $ BC_1YEAR : num 4.4 4.34 4.38 4.32 4.34 ...
## $ BC_2YEAR : num 4.5 4.44 4.47 4.37 4.42 ...
## $ BC_3YEAR : num 4.52 4.48 4.5 4.39 4.43 ...
## $ BC_5YEAR : num 4.54 4.49 4.51 4.39 4.43 ...
## $ BC_7YEAR : num 4.57 4.51 4.52 4.42 4.45 ...
## $ BC_10YEAR : num 4.61 4.55 4.56 4.46 4.49 ...
## $ BC_20YEAR : num 4.9 4.85 4.83 4.75 4.77 ...
## $ BC_30YEAR : logi NA NA NA NA NA NA ...
## $ BC_30YEARDISPLAY: int 0 0 0 0 0 0 0 0 0 0 ...
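
If the newline-splitting trick feels too fragile, another option is to pull each field by name and convert types explicitly, which also turns `NEW_DATE` into a real `Date`. This is only a sketch under assumptions: the inline sample below is a hypothetical two-entry stand-in for the feed, the `d:` prefix and both namespace URIs are assumed rather than taken from the output above, and `field()` is a helper made up for this example.

```r
library(xml2)

# Hypothetical two-entry sample mimicking the feed's structure;
# the d: prefix and the namespace URIs are assumptions
sample_xml <- '<feed xmlns="http://www.w3.org/2005/Atom"
  xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices"
  xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">
  <entry><content type="application/xml"><m:properties>
    <d:Id>3040</d:Id>
    <d:NEW_DATE>2005-11-14T00:00:00</d:NEW_DATE>
    <d:BC_1MONTH>3.93</d:BC_1MONTH>
  </m:properties></content></entry>
  <entry><content type="application/xml"><m:properties>
    <d:Id>3041</d:Id>
    <d:NEW_DATE>2005-11-10T00:00:00</d:NEW_DATE>
    <d:BC_1MONTH>3.89</d:BC_1MONTH>
  </m:properties></content></entry>
</feed>'

doc2   <- read_xml(sample_xml)
ns2    <- xml_ns_rename(xml_ns(doc2), d1 = "default")
props2 <- xml_find_all(doc2, "//default:entry/default:content/m:properties", ns2)

# Illustrative helper: text of one named child per properties node
field <- function(name) {
  xml_text(xml_find_first(props2, paste0("./d:", name), ns2))
}

dat2 <- data.frame(
  Id        = as.integer(field("Id")),
  # keep only the yyyy-mm-dd part before converting to Date
  NEW_DATE  = as.Date(substr(field("NEW_DATE"), 1, 10)),
  BC_1MONTH = as.numeric(field("BC_1MONTH")),
  stringsAsFactors = FALSE
)
str(dat2)
```

It is more typing than the one-liner, but each column's type is stated up front instead of being guessed by `read.table()`.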