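How can I get the "Categories" from Wikipedia using rvest in R?
I am using rvest in R to get the categories listed at the very bottom of a Wikipedia page. I used SelectorGadget to identify the HTML node that holds the categories. The code I am using is as follows: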
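library(rvest)  # provides read_html(), html_nodes(), html_text() and %>%
thepage <- read_html("https://en.wikipedia.org/wiki/San_Diego")
# Select the whole category box and take its text as a single string
Categories <- thepage %>%
  html_nodes("#mw-normal-catlinks") %>%
  html_text()
Categories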
The result I get is:
"Categories: San Diego1769 establishments in California1850 establishments in CaliforniaCities in San Diego County, CaliforniaCounty seats in CaliforniaIncorporated cities and towns in CaliforniaPopulated coastal places in CaliforniaPopulated places established in 1769San Antonio-San Diego Mail LineSan Diego County, CaliforniaSan Diego metropolitan areaSpanish mission settlements in North AmericaSpecial economic zones of the United StatesStagecoach stops in the United States"
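As you can see, there is no separator between the categories. The first category is "San Diego", the second is "1769 establishments in California", and so on. How can I get these categories as a list, or separate them in some other way?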
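One possible approach, sketched under the assumption that each category sits in its own `li` element inside `#mw-normal-catlinks`: select the list items rather than the whole box, so that `html_text()` returns one string per category. The `snippet` below is a hypothetical, hand-written stand-in for the page markup so the example runs without network access; it is not copied from the live page.

```r
library(rvest)

# Hypothetical markup imitating the assumed structure of the category box.
snippet <- paste0(
  '<div id="mw-normal-catlinks">',
  '<a href="#">Categories</a>: ',
  '<ul>',
  '<li><a href="#">San Diego</a></li>',
  '<li><a href="#">1769 establishments in California</a></li>',
  '<li><a href="#">County seats in California</a></li>',
  '</ul>',
  '</div>'
)

# read_html() parses a literal HTML string as well as a URL.
page <- read_html(snippet)

# Selecting each <li> yields one node per category, so html_text()
# returns a character vector instead of one concatenated string.
categories <- page %>%
  html_nodes("#mw-normal-catlinks li") %>%
  html_text()

print(categories)
```

If the live page follows the same structure, applying the selector `"#mw-normal-catlinks li"` to `thepage` from the question should give one element per category, and the leading "Categories:" label is dropped because it lies outside the list items.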