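Complex regex to extract author names in Python
I'm trying, without much success, to write a regular expression that grabs the content of any HTML element whose class is one of (author|byline|writer).
Here is what I have so far: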
<([A-Z][A-Z0-9]*)class=\"(byLineTag|byline|author|by)\"[^>]*>(.*?)</\1>
What I need to match:
<h6 class="byline">By <a rel="author" href="http://topics.nytimes.com/top/reference/timestopics/people/e/jack_ewing/index.html?inline=nyt-per" title="More Articles by Jack Ewing" class="meta-per">JACK EWING</a> and <a rel="author" href="http://topics.nytimes.com/top/reference/timestopics/people/t/landon_jr_thomas/index.html?inline=nyt-per" title="More Articles by Landon Thomas Jr." class="meta-per">LANDON THOMAS Jr.</a></h6>
or
<div class="noindex"><span class="by">By </span><span class="byline"><a href="javascript:NewWindow(575,480,'/apps/pbcs.dll/personalia?ID=sshemkus',0)" title="Email Reporter">Sarah Shemkus</a></span></div>
Any help would be greatly appreciated. -Stefan
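For reference, here is a rough sketch of why the pattern above seems to match neither sample. It requires an uppercase tag name (no case-insensitive flag is set) and allows no whitespace between the tag name and `class=`. The adjusted pattern is only a guess at the intent, the name `extract_bylines` is made up for illustration, and the `href` values in the samples are shortened to `#`. It still breaks on nested tags of the same name, single-quoted attributes, or multiple class names.

```python
import re

# The pattern from the question, unchanged.
ORIGINAL = re.compile(
    r'<([A-Z][A-Z0-9]*)class=\"(byLineTag|byline|author|by)\"[^>]*>(.*?)</\1>'
)

# Adjusted pattern (an assumption about the intent, not a robust HTML matcher):
#   \s+      whitespace between the tag name and its attributes
#   [^>]*?   lets other attributes appear before class
#   flags    IGNORECASE for lowercase tags, DOTALL for multi-line content
ADJUSTED = re.compile(
    r'<([A-Z][A-Z0-9]*)\s+[^>]*?class="(byLineTag|byline|author|by)"[^>]*>(.*?)</\1>',
    re.IGNORECASE | re.DOTALL,
)

TAG = re.compile(r'<[^>]+>')  # crude tag stripper for the captured content


def extract_bylines(html):
    """Return the text of each matched element with inner tags removed."""
    return [TAG.sub('', m.group(3)).strip() for m in ADJUSTED.finditer(html)]


# Samples from the question, with href values shortened to "#".
sample_nyt = ('<h6 class="byline">By <a rel="author" href="#" class="meta-per">'
              'JACK EWING</a> and <a rel="author" href="#" class="meta-per">'
              'LANDON THOMAS Jr.</a></h6>')
sample_div = ('<div class="noindex"><span class="by">By </span>'
              '<span class="byline"><a href="#" title="Email Reporter">'
              'Sarah Shemkus</a></span></div>')

print(ORIGINAL.search(sample_nyt))   # None: lowercase tag, no space allowed
print(extract_bylines(sample_nyt))
print(extract_bylines(sample_div))
```

Note that the second sample yields the `by` span ("By") as a separate result, so some post-filtering would still be needed.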
http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags#1732454 – MitMaro
Don't do this. See MitMaro's link. Imagine something like '
Could you post some sample input and the expected output? – Stephan
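As a sketch of the parser-based route the comments above point to, something like the following might work using only the standard library's `html.parser`. The class name `BylineParser`, the `TARGET_CLASSES` set (taken from the regex in the question), and the shortened `href="#"` values are assumptions for illustration. It is not a complete solution for malformed HTML.

```python
from html.parser import HTMLParser

# Class names taken from the regex in the question.
TARGET_CLASSES = {"byLineTag", "byline", "author", "by"}
# Tags with no closing tag; ignored so the depth counter stays balanced.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class BylineParser(HTMLParser):
    """Collect the text inside every element carrying a target class."""

    def __init__(self):
        super().__init__()
        self.depth = 0     # nesting depth inside a matching element (0 = outside)
        self.chunks = []   # text pieces of the element being collected
        self.bylines = []  # finished results

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        if self.depth:
            self.depth += 1              # nested tag inside a match
        elif classes & TARGET_CLASSES:
            self.depth = 1               # entering a matching element
            self.chunks = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:              # matching element just closed
            text = " ".join("".join(self.chunks).split())
            self.bylines.append(text)

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def find_bylines(html):
    parser = BylineParser()
    parser.feed(html)
    parser.close()
    return parser.bylines


# Second sample from the question, with the href shortened to "#".
html = ('<div class="noindex"><span class="by">By </span>'
        '<span class="byline"><a href="#" title="Email Reporter">'
        'Sarah Shemkus</a></span></div>')
print(find_bylines(html))
```

Unlike the regex, this handles attributes in any order, multiple class names on one element, and nested tags of the same name, as long as the markup is reasonably balanced.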