Python / BeautifulSoup: find all <a href> with specific anchor text
I am trying to use Beautiful Soup to parse HTML and find every href whose link has a specific anchor text.
<a href="http://example.com">TEXT</a>
<a href="http://example.com/link">TEXT</a>
<a href="http://example.com/page">TEXT</a>
All the links I am looking for have exactly the same anchor text, in this case TEXT. I am NOT looking for the word TEXT itself; I want to use the word TEXT to find all the different hrefs.
Edit:
For clarification, I am looking for something similar to using the class to find the links:
<a href="http://example.com" class="visible">TEXT</a>
<a href="http://example.com/link" class="visible">TEXT</a>
<a href="http://example.com/page" class="visible">TEXT</a>
and then using
findAll('a', 'visible')
except that the HTML I am parsing doesn't have a class, only the same anchor text every time.
I am trying to find a quicker way. To me this takes a little longer to process, since it finds ALL hrefs and then compares each one to the text to find a match. Preferably I could parse directly for the required links, the way you can do findAll('a', 'className') when the link has a class. – cwal
@cwal Oh, gotcha (my bad - long day :)). Try the updated version - it builds it into the filter. Does that do what you want? This will load them as a generator as opposed to loading all of them, so I believe this is the fastest you will get (as there needs to be some way up front for BS to check if a link fits your criteria). Happy to help think through another way if this doesn't work. – RocketDonkey
This certainly looks like it works! I had tried this, but without the href=True, and it didn't seem to work. Unfortunately I don't have the time right now to check whether it works for me, but I will as soon as possible and post back my results. Thank you! – cwal
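For anyone reading along, here is a minimal sketch of what filtering on the anchor text directly might look like. It is not necessarily the exact code from the answer: it assumes bs4 4.4.0 or later (where the `string` argument is available), the helper name `hrefs_with_text` is made up for illustration, and the sample HTML is the question's markup plus two extra links added to show what gets excluded.

```python
from bs4 import BeautifulSoup

# Sample markup: the three links from the question, plus one link with
# different text and one <a> without an href (both added for illustration).
html = """
<a href="http://example.com">TEXT</a>
<a href="http://example.com/link">TEXT</a>
<a href="http://example.com/page">TEXT</a>
<a href="http://example.com/other">SOMETHING ELSE</a>
<a>TEXT</a>
"""

soup = BeautifulSoup(html, "html.parser")


def hrefs_with_text(soup, anchor_text):
    """Yield the href of every <a> whose text is exactly anchor_text.

    Hypothetical helper, not part of Beautiful Soup.
    """
    # href=True keeps only <a> tags that actually have an href attribute;
    # string=anchor_text keeps only tags whose .string equals anchor_text.
    for a in soup.find_all("a", href=True, string=anchor_text):
        yield a["href"]


links = list(hrefs_with_text(soup, "TEXT"))
print(links)
```

Note that `find_all` itself still returns a full result list; the generator only defers pulling out each href. The matching is done inside the filter, so no separate comparison loop over every link is needed.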