Parse HTML code and print it out

-3

This HTML page (partial code below) contains multiple links of the form (a href="https://twitter.com/$name"). I need to parse out every $name and print them on the page. How can this be done?

<td>Apr 01 2011<br><b>527 
    </b> 
</td> 
<td> 
              <a href="https://twitter.com/al_rasekhoon" class="twitter-follow-button" data-show-count="false" data-lang="" data-width="60px" > al_rasekhoon</a> 
</td>         
</tr> 
    <tr class="rowc"><td colspan="11"></td></tr>

Source

2012-09-09 LeoSam

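What is "multiple"? What is "$name"? – tomsv

Multiple means there are more elements like this one to parse – LeoSam

He means multiple elements. – Asciiom

You need to loop over your $names array and print the correct a tag for each entry in the array, like this: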

<?php foreach ($names as $name) { ?>
    <a href="https://twitter.com/<?php echo htmlspecialchars($name) ?>"><?php echo htmlspecialchars($name) ?></a>
<?php } ?>

Source

2012-09-09 08:32:22 Asciiom

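Let's say I need to parse the names from the page names.html. How can I apply this to that code? – LeoSam

I'm not sure I understand you correctly, but that seems tricky. It would be better to keep an array of names somewhere and use it to print both the content of names.html and the content of the page with the Twitter links. – Asciiom

This answer has been accepted, but it doesn't seem to solve the OP's problem. – FilmJ

Sounds like screen scraping, and for that you need to traverse the DOM. Regular expressions would be very unreliable.

DOMDocument may help you, but you may want to look at a screen-scraping library such as BeautifulSoup (or some PHP equivalent).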
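As a rough illustration of the parser-based approach, here is a minimal Python sketch that uses the standard library's html.parser instead of DOMDocument or BeautifulSoup (so it is a stand-in for the idea, not either of those libraries). The names TwitterLinkParser and extract_twitter_users are made up for this example, and it assumes the username is the first path segment of a twitter.com link.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class TwitterLinkParser(HTMLParser):
    """Collects the first path segment of every <a href> pointing at twitter.com."""

    def __init__(self):
        super().__init__()
        self.users = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        href = dict(attrs).get("href")
        if not href:
            return
        url = urlparse(href)
        if url.netloc.lower() not in ("twitter.com", "www.twitter.com"):
            return
        # Assumption: the username is the first path segment of the link
        name = url.path.strip("/").split("/")[0]
        if name:
            self.users.append(name)


def extract_twitter_users(html):
    """Return the twitter.com usernames linked from the given HTML string."""
    parser = TwitterLinkParser()
    parser.feed(html)
    parser.close()
    return parser.users


sample = '''<td><a href="https://twitter.com/al_rasekhoon" class="twitter-follow-button"> al_rasekhoon</a></td>
<td><a href="http://example.com/">elsewhere</a></td>'''
print("\n".join(extract_twitter_users(sample)))
```

Because the parser reads attributes rather than raw text, it should not matter whether href comes first in the tag or how the tag is spaced.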

Source

2012-09-09 08:47:59 FilmJ

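'You **need** to traverse the DOM' - nope, 'explode' has got you covered, no need to use an HTML parser for this :) – l4mpi

I'm not sure this would cover all your bases if you have a generic web page with more complex and nested content (or applet tags?), but for what the OP posted it would probably work just fine. – FilmJ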

Ah, I didn't consider applet tags in my answer - but just at ' – l4mpi

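If I understand correctly, you get an HTML page from somewhere and want to extract all the linked Twitter users? You can either parse the HTML code or do this with some string splitting. This code is untested, but should give you an idea: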

$input = '(the html code)';
// href prefix as it appears in the question's HTML
$marker = 'href="https://twitter.com/';
$links = explode('<a ', $input); // split input by start of link tags
foreach ($links as $link) {
    // cut off everything after the closing '>'
    $link = explode('>', $link, 2)[0];
    // skip this link if it doesn't go to twitter.com
    if (strpos($link, $marker) === false) { continue; }
    // split by the href attribute and keep everything after 'twitter.com/'
    $link = explode($marker, $link, 2)[1];
    // cut off everything after the " ending the href attribute
    $link = explode('"', $link, 2)[0];
    // now $link should contain the twitter username
    echo $link, "\n";
}

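Note: if the page contains other links to Twitter that are not user pages (for example, a link to the Twitter FAQ), they will be printed too. You will need to filter those out manually.

PHP sucks, let's do this in Python!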

html = '(the html code)'
marker = 'href="https://twitter.com/'  # href prefix as it appears in the question's HTML
links = [l.split(">", 1)[0] for l in html.split("<a ")]
twitter_links = [l for l in links if marker in l]
twitter_hrefs = [l.split(marker, 1)[1] for l in twitter_links]
users = [l.split('"', 1)[0] for l in twitter_hrefs]
print('\n'.join(users))
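The manual filtering mentioned in the note above could look roughly like the sketch below. It rests on two assumptions that are not in the original answer: that a user handle consists of 1 to 15 letters, digits, or underscores, and that a small, hand-maintained set of site paths (NON_USER_PATHS, purely illustrative and incomplete) is enough to drop pages such as the FAQ. The function name filter_usernames is likewise made up.

```python
import re

# Assumption: a user handle is 1-15 letters, digits, or underscores.
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{1,15}")

# Hypothetical, incomplete list of twitter.com paths that are not user pages.
NON_USER_PATHS = {"faq", "about", "help", "login", "search", "share", "intent"}


def filter_usernames(candidates):
    """Keep strings that look like user handles; drop site pages and duplicates."""
    seen = set()
    result = []
    for candidate in candidates:
        name = candidate.strip().strip("/")
        if not USERNAME_RE.fullmatch(name):
            continue  # empty, too long, or contains '/', '?', etc.
        key = name.lower()
        if key in NON_USER_PATHS or key in seen:
            continue
        seen.add(key)
        result.append(name)
    return result


print(filter_usernames(["al_rasekhoon", "faq", "share?url=x", "", "al_rasekhoon"]))
```

A deny-list like this will always lag behind the real site, so it is only a starting point for whatever pages actually show up in your input.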

Source

2012-09-09 10:55:30 l4mpi
