Using grep to extract parts of HTML
The HTML I'm working with looks like this:
<a class="title may-blank" data-event-action="title" href="/r/gaming/comments/6t8dj0/we_can_play_singleplayer_games_off_the_internet/" tabindex="1" data-href-url="/r/gaming/comments/6t8dj0/we_can_play_singleplayer_games_off_the_internet/" data-inbound-url="/r/gaming/comments/6t8dj0/we_can_play_singleplayer_games_off_the_internet/?utm_content=title&utm_medium=hot&utm_source=reddit&utm_name=frontpage" rel="">We can play singleplayer games OFF THE INTERNET? Are they seriously that out of touch to advertise this?</a>
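There are many lines like this one.
From each of them I only want what is between the quotes in href="http://xxxxxxxx"
and the text that follows rel="">yyyyyyyyyy;
everything else is unnecessary.
The output should look like this, with each block on its own line: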
<a href="http://xxxxxxxx" rel="">yyyyyyyyyy</a>
Any ideas on how I could go about doing this?
This looks like a reddit link, so you may also want to look at the [reddit API](https://www.reddit.com/dev/api/) instead of parsing the HTML by hand – user3151902
See https://stackoverflow.com/a/1732454/1682509 – Reeno
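
If the reddit API is not an option, one alternative to grep is a real HTML parser. The following is only a minimal sketch using Python's standard `html.parser` module, not a definitive solution. The names `TitleLinkParser` and `extract_links` are made up for illustration, and selecting anchors by the `title` class is an assumption based on the single sample line in the question.

```python
from html import escape
from html.parser import HTMLParser


class TitleLinkParser(HTMLParser):
    """Collect (href, text) pairs for <a> tags whose class list contains "title".

    Filtering on the "title" class is an assumption taken from the sample
    markup in the question; adjust it if the real pages differ.
    """

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently being read, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if "title" in classes and attr.get("href"):
            self._href = attr["href"]
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a matching <a> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return one '<a href="..." rel="">text</a>' string per matching link."""
    parser = TitleLinkParser()
    parser.feed(html)
    parser.close()
    return [
        '<a href="{}" rel="">{}</a>'.format(escape(href), escape(text, quote=False))
        for href, text in parser.links
    ]
```

To get one block per line, as asked, the returned strings can be printed in a loop, for example `for line in extract_links(page_source): print(line)`, where `page_source` is a hypothetical variable holding the downloaded HTML.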