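Parsing a broken HTML page in Python

I want to parse a broken HTML page that contains comments. All the well-known HTML parsers, such as BeautifulSoup, lxml, and HTMLParser, raise syntax errors on it. The markup is below. How can I ignore the broken part and parse the rest of the page?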
<html xmlns="http://www.w3.org/1999/xhtml"><head>
<script language="JavaScript">
<!--
function setTimeOffsetVars (Link) {
// code removed
}
<!-- Image Preloader - takes an array of images to preload -->
function warningCheck(e, warnMsg) {
// code removed
}
-->
</script>
</head>
<body topmargin="0" leftmargin="0" rightmargin="0" bottommargin="0" marginwidth="0" marginheight="0">
<!-- lot of useful code -->
</body></html>
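
One possible direction, offered as a sketch rather than a confirmed fix for this exact page: in Python 3, the standard-library `html.parser.HTMLParser` is written to tolerate invalid markup and treats everything between `<script>` and `</script>` as raw text, so the nested `<!-- ... -->` inside the script block should not derail it. The class name `LenientCollector` and the helper `parse_broken_html` are hypothetical names chosen for illustration, and the behaviour may differ on older Python 2 parsers like the ones that produced the errors above.

```python
from html.parser import HTMLParser


class LenientCollector(HTMLParser):
    """Hypothetical helper: records tags, comments, and visible text,
    ignoring whatever sits inside <script> or <style> elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tags = []        # start tags in document order
        self.comments = []    # comments found outside script/style
        self.text = []        # non-empty text found outside script/style
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Script bodies arrive here as plain data, so drop them.
        stripped = data.strip()
        if stripped and not self._skip_depth:
            self.text.append(stripped)

    def handle_comment(self, data):
        self.comments.append(data.strip())


def parse_broken_html(markup):
    """Feed the markup to the tolerant parser and return the collector."""
    parser = LenientCollector()
    parser.feed(markup)
    parser.close()
    return parser
```

Calling `parse_broken_html(page_source)` returns the collector, whose `tags`, `comments`, and `text` lists can then be inspected. If third-party packages are an option, BeautifulSoup with the `html5lib` or `lxml` tree builder is another commonly used route for malformed pages, though that has not been checked against this page.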