0
我想用urllib python庫解析網站。我寫了這個:我無法獲得網頁中的html頁面的正文元素python
from bs4 import BeautifulSoup
from urllib.request import HTTPCookieProcessor, build_opener
from http.cookiejar import FileCookieJar
def makeSoup(url):
jar = FileCookieJar("cookies")
opener = build_opener(HTTPCookieProcessor(jar))
html = opener.open(url).read()
return BeautifulSoup(html, "lxml")
def articlePage(url):
return makeSoup(url)
Links = "http://collegeprozheh.ir/%d9%85%d9%82%d8%a7%d9%84%d9%87- %d9%85%d8%af%d9%84-%d8%b1%d9%82%d8%a7%d8%a8%d8%aa%db%8c-%d8%af%d8%b1-%d8%b5%d9%86%d8%b9%d8%aa-%d9%be%d9%86%d9%84-%d9%87%d8%a7%db%8c-%d8%ae%d9%88%d8%b1%d8%b4%db%8c%d8%af/"
print(articlePage(Links))
但是網站沒有返回body標籤的內容。 這是我的程序的結果是:
cURL = window.location.href;
var p = new Date();
second = p.getTime();
GetVars = getUrlVars();
setCookie("Human" , "15421469358743" , 10);
check_coockie = getCookie("Human");
if (check_coockie != "15421469358743")
document.write("Could not Set cookie!");
else
window.location.reload(true);
</script>
</head><body></body>
</html>
我認爲該cookie引起了這個問題。