I have a list of URLs like this:
mydata <- read.table(header=TRUE, stringsAsFactors=FALSE, text="
Id
https://www.example.com/dp/c/830216013?q=%3Arelevance%3Abrickpattern%3ADecorative%2FArt+Deco%3Abrickpattern%3AFloral%3Abrickpattern%3AGeometric%3Abrickpattern%3AGraphic%3Abrickpattern%3ATropical%3Aprice%3A300%2C10500&page=7&gridValue=4
https://www.example.com/dp/c/830216013?q=%3Arelevance%3Averticalsizegroupformat%3AIN%2040%3Averticalcolorfamily%3ABlack%3Averticalcolorfamily%3ABlue%3Averticalcolorfamily%3AWhite
https://www.example.com/dp/c/830316016?q=%3Arelevance%3Averticalcolorfamily%3AWhite&gclid=CjwKEAjw9_jJBRCXycSarr3csWcSJABthk07W_H0RxQtOPZX7VdD9CSmK4S01BMYdXbtc0XxC0OeChoCky_w_wcB
https://www.example.com/dp/c/830216013?q=%3Arelevance%3Abrand%3AFLYING%20MACHINE%3Abrand%3AMUFTI%3Abrand%3AUNITED%20COLORS%20OF%20BENETTON
https://www.example.com/dp/c/830216013?q=%3Arelevance%3Averticalsizegroupformat%3AIN%2038%3Averticalsizegroupformat%3AIN%2039%3Averticalsizegroupformat%3AIN%20M%3Averticalsizegroupformat%3AUK%2039%3Averticalsizegroupformat%3AUK%20M%3Averticalsizegroupformat%3AUK%20S%3Averticalsizegroupformat%3AUS%20M%3Averticalsizegroupformat%3AUS%20S%3Abrickpattern%3ASolid%3Averticalcolorfamily%3ABlack%3Averticalcolorfamily%3AWhite
https://www.example.com/dp/c/830216013?q=%3Aprce-asc%3Abricksleeve%3AShort%3Aprice%3A300%2C10500&page=2&gridValue=4
https://www.example.com/dp/c/830216013??q=%3Aprce-asc%3Abrand%3AUS+POLO%3Abricksleeve%3AShort%3Aprice%3A300%2C10500
https://www.example.com/dp/c/830216013?q=%3Arelevance%3Abrand%3AAJIO%3Abrand%3ABASICS%3Abrand%3ACelio%3Abrand%3ADNMX%3Abrand%3AGAS%3Abrand%3ALEVIS%3Abrand%3ANETPLAY%3Abrand%3ASIN%3Abrand%3ASUPERDRY%3Abrand%3AUS%20POLO%3Abrand%3AVIMAL%3Abrand%3AVIMAL%20APPARELS%3Abrand%3AVOI%20JEANS
https://www.example.com/dp/c/830216013?q=%3Arelevance%3Abrand%3ABritish+Club%3Abrand%3ACelio%3Abrand%3AFLYING+MACHINE%3Aprice%3A300%2C10500&page=1&gridValue=4
")
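I need to pull the parameter values, such as brand, verticalcolorfamily, q= and so on, out of the URLs. These parameters are the filters applied on the website.
The output I am looking for is a data frame with three columns: parameter, value, and the frequency with which the value occurs. For example: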
parameter | value | frequency
----------|----------------|----------
brand | FLYING+MACHINE | 2
q= | relevance | 5
price | 300%2C10500 | 2
brand | BASICS | 1
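So far, the only thing I can think of is to collect, for each URL, a vector of strings split on every alternate '%3A' as the delimiter: [q=%3Arelevance, brickpattern%3ADecorative%2FArt+Deco, brickpattern%3AFloral, brickpattern%3AGeometric, brickpattern%3AGraphic, brickpattern%3ATropical, price%3A300%2C10500].
Then I would put each element into a column of a data frame, split it again on '%3A', and do a group-by. Suggestions for other approaches would be much appreciated. Also, if I should use this approach, I don't know how to split using every alternate '%3A' as the delimiter.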
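One possible way to get the pairs without splitting on every alternate '%3A' is to split on every '%3A' and then treat the odd-numbered pieces as parameters and the even-numbered pieces as values. The sketch below uses base R only and two of the URLs from the data above. The helper name extract_filters is hypothetical. Decoding the values (turning '+' and '%20' into spaces) is an assumption, made so that FLYING%20MACHINE and FLYING+MACHINE count as the same value; drop that step to keep the raw encoded strings.

```r
# Two URLs copied from the sample data in the question
urls <- c(
  "https://www.example.com/dp/c/830216013?q=%3Arelevance%3Abrand%3AFLYING%20MACHINE%3Abrand%3AMUFTI%3Abrand%3AUNITED%20COLORS%20OF%20BENETTON",
  "https://www.example.com/dp/c/830216013?q=%3Arelevance%3Abrand%3ABritish+Club%3Abrand%3ACelio%3Abrand%3AFLYING+MACHINE%3Aprice%3A300%2C10500&page=1&gridValue=4"
)

# Hypothetical helper: returns one row per (parameter, value) pair in a URL
extract_filters <- function(url) {
  # Text between "q=" and the next "&" (or the end of the string)
  q <- regmatches(url, regexpr("(?<=[?&]q=)[^&]*", url, perl = TRUE))
  if (length(q) == 0) return(NULL)

  # Split on every "%3A"; the string starts with "%3A", so the first
  # piece is "" and stands in for the "q=" parameter itself
  parts <- strsplit(q, "%3[Aa]")[[1]]
  n <- length(parts) - length(parts) %% 2  # ignore an unpaired trailing piece
  if (n == 0) return(NULL)

  keys <- parts[seq(1, n, by = 2)]
  vals <- parts[seq(2, n, by = 2)]
  keys[keys == ""] <- "q="

  # Assumption: normalise "+" to a space, then percent-decode the value
  vals <- vapply(gsub("+", " ", vals, fixed = TRUE),
                 utils::URLdecode, character(1), USE.NAMES = FALSE)

  data.frame(parameter = keys, value = vals, stringsAsFactors = FALSE)
}

# Stack the pairs from all URLs, then count each (parameter, value) combination
pairs <- do.call(rbind, lapply(urls, extract_filters))
freq <- aggregate(frequency ~ parameter + value,
                  data = transform(pairs, frequency = 1L), FUN = sum)
freq <- freq[order(-freq$frequency, freq$parameter, freq$value), ]
print(freq, row.names = FALSE)
```

To run this on the full data, urls could be replaced with mydata$Id, converted with as.character() if the column was read in as a factor.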
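It is not clear how you arrive at your output – Sotos
@Sotos For each URL, starting from "q=", take the data in pairs separated by "%3A": the first item of each pair is the parameter and the second is its value. –
Please take a look at [urltools](https://cran.r-project.org/web/packages/urltools/vignettes/urltools.html); it may already cover what you are trying to achieve. – sebastianmm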