I am trying to parse an HTML file (demo.html) and make all of its relative links absolute. Here is how I tried to do this in a Python script:
from bs4 import BeautifulSoup

# Read the source document
f = open('demo.html', 'r')
html_text = f.read()
f.close()

soup = BeautifulSoup(html_text)

# Prefix the href of every <a> tag with the site root
for a in soup.findAll('a'):
    for x in a.attrs:
        if x == 'href':
            temp = a[x]
            a[x] = "http://www.esplanade.com.sg" + temp

# Same for the href of every <link> tag
for a in soup.findAll('link'):
    for x in a.attrs:
        if x == 'href':
            temp = a[x]
            a[x] = "http://www.esplanade.com.sg" + temp

# Same for the src of every <script> tag
for a in soup.findAll('script'):
    for x in a.attrs:
        if x == 'src':
            temp = a[x]
            a[x] = "http://www.esplanade.com.sg" + temp

# Write the result; binary mode because the text is encoded to UTF-8 bytes
f = open("demo_result.html", "wb")
f.write(soup.prettify().encode("utf-8"))
f.close()
However, the output file demo_result.html contains many unexpected changes. For example,
<script type="text/javascript" src="/scripts/ddtabmenu.js" />
/***********************************************
* DD Tab Menu script- (c) Dynamic Drive DHTML code library (www.dynamicdrive.com)
* + Drop Down/ Overlapping Content-
* This notice MUST stay intact for legal use
* Visit Dynamic Drive at http://www.dynamicdrive.com/ for full source code
***********************************************/
</script>
is changed to
<script src="http://www.esplanade.com.sg/scripts/ddtabmenu.js" type="text/javascript">
</script>
</head>
<body>
<p>
/***********************************************
* DD Tab Menu script- (c) Dynamic Drive DHTML code library (www.dynamicdrive.com)
* + Drop Down/ Overlapping Content-
* This notice MUST stay intact for legal use
* Visit Dynamic Drive at http://www.dynamicdrive.com/ for full source code
***********************************************/
Can someone tell me where I am going wrong?
Thanks and warmest regards.
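It works fine on my end. – duck
@user1471175 - What do you mean? Does it only convert the links, without changing other parts of the HTML as I described in my question? –
Sorry, I was looking at the wrong error :) – duck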
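A side note on the link rewriting in the script itself, separate from the markup changes being asked about: prepending the site root to every href and src also alters values that are already absolute, such as a link to http://www.dynamicdrive.com/. A minimal sketch of one alternative, assuming Python 3 and the standard library's urllib.parse.urljoin; the helper name make_absolute and the constant BASE_URL are hypothetical and not part of the original script:

```python
from urllib.parse import urljoin

# Site root taken from the question; the trailing slash is added here.
BASE_URL = "http://www.esplanade.com.sg/"


def make_absolute(url, base=BASE_URL):
    """Hypothetical helper: resolve url against base.

    Relative URLs are joined onto the base; URLs that already carry
    a scheme and host are returned unchanged.
    """
    return urljoin(base, url)


if __name__ == "__main__":
    for link in ("/scripts/ddtabmenu.js", "http://www.dynamicdrive.com/"):
        print(make_absolute(link))
```

In the loops from the question, the assignment would then read a[x] = make_absolute(a[x]) instead of concatenating the two strings.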