後鏈接這是一個find_all('a')
的結果(這是很長):解析HTML與美麗的湯讓HREF
</a>, <a class="btn text-default text-dark clear_filters pull-right group-ib" href="#" id="export_dialog_close" title="Cancel"><span class="glyphicon glyphicon-remove"></span><span>Cancel</span></a>, <a href="/ais/index/port_moves/all/include_anchs:no/ship_type:7/_:3525d580eade08cfdb72083b248185a9/in_transit:no/time_interval:1474948800.0_1475035200.00/per_page:50/portname:NOVOROSSIYSK/cb:6651/move_type:1/sort:SHIPNAME/direction:asc">Vessel Name</a>, <a href="/ais/index/port_moves/all/include_anchs:no/ship_type:7/_:3525d580eade08cfdb72083b248185a9/in_transit:no/time_interval:1474948800.0_1475035200.00/per_page:50/portname:NOVOROSSIYSK/cb:6651/move_type:1/sort:TIMESTAMP_UTC/direction:asc">Timestamp</a>, <a href="/ais/index/port_moves/all/include_anchs:no/ship_type:7/_:3525d580eade08cfdb72083b248185a9/in_transit:no/time_interval:1474948800.0_1475035200.00/per_page:50/portname:NOVOROSSIYSK/cb:6651/move_type:1/sort:PORT_NAME/direction:asc">Port</a>, <a href="/ais/index/port_moves/all/include_anchs:no/ship_type:7/_:3525d580eade08cfdb72083b248185a9/in_transit:no/time_interval:1474948800.0_1475035200.00/per_page:50/portname:NOVOROSSIYSK/cb:6651/move_type:1/sort:MOVE_TYPE_NAME/direction:asc">Port Call type</a>, <a href="/ais/index/port_moves/all/include_anchs:no/ship_type:7/_:3525d580eade08cfdb72083b248185a9/in_transit:no/time_interval:1474948800.0_1475035200.00/per_page:50/portname:NOVOROSSIYSK/cb:6651/move_type:1/sort:ELAPSED/direction:asc">Time Elapsed</a>, <a href="/en/ais/details/ships/shipid:465271/imo:9495595/mmsi:373571000/vessel:SIDER%20LUCK/_:3525d580eade08cfdb72083b248185a9" title="View details for: SIDER LUCK">SIDER LUCK</a>, <a href="/en/ais/details/ports/767/port_name:NOVOROSSIYSK/_:3525d580eade08cfdb72083b248185a9" title="View details for: NOVOROSSIYSK">NOVOROSSIYSK</a>, <a href="/en/ais/details/ports/163/port_name:MILAZZO/_:3525d580eade08cfdb72083b248185a9" title="View details for: MILAZZO">MILAZZO</a>, <a href="/en/ais/details/ships/shipid:288753/imo:9389693/mmsi:249474000/vessel:OOCL%20ISTANBUL/_:3525d580eade08cfdb72083b248185a9" title="View details for: OOCL ISTANBUL">OOCL ISTANBUL</a>, <a href="/en/ais/details/ports/767/port_name:NOVOROSSIYSK/_:3525d580eade08cfdb72083b248185a9" title="View details for: NOVOROSSIYSK">NOVOROSSIYSK</a>, <a href="/en/ais/details/ports/17436/port_name:AMBARLI/_:3525d580eade08cfdb72083b248185a9" title="View details for: AMBARLI">AMBARLI</a>, <a href="/en/ais/details/ships/shipid:754480/imo:9045613/mmsi:636014098/vessel:TK%20ROTTERDAM/_:3525d580eade08cfdb72083b248185a9" title="View details for: TK ROTTERDAM">TK ROTTERDAM</a>, <a href="/en/ais/details/ports/767/port_name:NOVOROSSIYSK/_:3525d580eade08cfdb72083b248185a9" title="View details for: NOVOROSSIYSK">NOVOROSSIYSK</a>, <a href="/en/ais/details/ports/3504/port_name:DILISKELESI/_:3525d580eade08cfdb72083b248185a9" title="View details for: DILISKELESI">DILISKELESI</a>, <a href="/en/ais/details/ships/shipid:412277/imo:9039585/mmsi:353430000/vessel:SEA%20AEOLIS/_:3525d580eade08cfdb72083b248185a9" title="View details for: SEA AEOLIS">SEA AEOLIS</a>, <a href="/en/ais/details/ports/767/port_name:NOVOROSSIYSK/_:3525d580eade08cfdb72083b248185a9" title="View details for: NOVOROSSIYSK">NOVOROSSIYSK</a>, <a href="/en/ais/details/ports/1/port_name:PIRAEUS/_:3525d580eade08cfdb72083b248185a9" title="View details for: PIRAEUS">PIRAEUS</a>, <a href="/en/ais/details/ships/shipid:346713/imo:7614599/mmsi:273327300/vessel:SOLIDAT/_:3525d580eade08cfdb72083b248185a9" title="View details for: SOLIDAT">SOLIDAT</a>, <a href="/en/ais/details/ports/767/port_name:NOVOROSSIYSK/_:3525d580eade08cfdb72083b248185a9" title="View details for: NOVOROSSIYSK">NOVOROSSIYSK</a>, <a href="/en/ais/details/ports/883/port_name:SEVASTOPOL/_:3525d580eade08cfdb72083b248185a9" title="View details for: SEVASTOPOL">SEVASTOPOL</a>, <a href="/en/ais/details/ships/shipid:752974/imo:9195298/mmsi:636011072/vessel:OCEANPRINCESS/_:3525d580eade08cfdb72083b248185a9" title="View details for: OCEANPRINCESS">OCEANPRINCESS</a>, <a href="/en/ais/details/ports/767/port_name:NOVOROSSIYSK/_:3525d580eade08cfdb72083b248185a9" title="View details for: NOVOROSSIYSK">NOVOROSSIYSK</a>, <a href="/en/ais/details/ports/21780/port_name:EREGLI/_:3525d580eade08cfdb72083b248185a9" title="View details for: EREGLI">EREGLI</a>, <a href="/en/ais/details/ships/shipid:201260/imo:9385075/mmsi:235102768/vessel:EMERALD%20BAY/_:3525d580eade08cfdb72083b248185a9" title="View details for: EMERALD BAY">EMERALD BAY</a>, <a href="/en/ais/details/ports/767/port_name:NOVOROSSIYSK/_:3525d580eade08cfdb72083b248185a9" title="View details for: NOVOROSSIYSK">NOVOROSSIYSK</a>, <a href="/en/ais/details/ports/767/port_name:NOVOROSSIYSK/_:3525d580eade08cfdb72083b248185a9" title="View details for: NOVOROSSIYSK">NOVOROSSIYSK</a>, <a href="/en/ais/details/ships/shipid:418956/imo:9102746/mmsi:356579000/vessel:MSC%20DON%20GIOVANNI/_:3525d580eade08cfdb72083b248185a9" title="View details for: MSC DON GIOVANNI">MSC DON GIOVANNI</a>, <a href="/en/ais/details/ports/767/port_name:NOVOROSSIYSK/_:3525d580eade08cfdb72083b248185a9" title="View details for: NOVOROSSIYSK">NOVOROSSIYSK</a>, <a href="/en/ais/details/ports/67/port_name:CONSTANTA/_:3525d580eade08cfdb72083b248185a9" title="View details for: CONSTANTA">CONSTANTA</a>, <a href="/en/ais/details/ships/shipid:748395/imo:9460734/mmsi:622121422/vessel:WADI%20SAFAGA/_:3525d580eade08cfdb72083b248185a9" title="View details for: WADI SAFAGA">WADI SAFAGA</a>, <a href="/en/ais/details/ports/767/port_name:NOVOROSSIYSK/_:3525d580eade08cfdb72083b248185a9" title="View details for: NOVOROSSIYSK">NOVOROSSIYSK</a>, <a href="/en/ais/details/ports/997/port_name:DAMIETTA/_:3525d580eade08cfdb72083b248185a9" title="View details for: DAMIETTA">DAMIETTA</a>
我要拉出來,與開頭的字符串所/en/ais/details/ships/shipid:
如:
<a href="/en/ais/details/ships/shipid:465271/imo:9495595/mmsi:373571000/vessel:SIDER%20LUCK/_:3525d580eade08cfdb72083b248185a9"
我能夠複製這些例子(Find specific link w/ beautifulsoup或How to get Beautiful Soup to get link from href and class?),但我寧願不使用正則表達式。
到目前爲止,我有:
for i in ase: #ase is where the html is sotred
print(i.get('href')) #prints everysingle href.
總之,我的問題是我怎麼只保留有沒有使用正則表達式我感興趣的字符串href
的?
等一下,你真的可以做'element.get('attr')'而不是'element.attrs.get('attr')'?!看起來好多了! e:檢查過的文檔,它在那裏提到。不知道我爲什麼錯過了這麼久。 – n1c9
@ n1c9很好,不是嗎?有一個原因叫做「美麗」湯,哈哈。 – elethan
None - > error'中的'string',你只需要使用.get,如果你希望在你調用的節點上不存在href,考慮到你可以設置'href = True'並且使用一個css選擇器,正則表達式等等。沒有理由永遠需要使用.get,特別是調用它兩次。此外,字符串中的子字符串與以子字符串開頭的字符串不同。 –