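Parsing KML with Beautiful Soup
I can't parse a KML (XML) file with Beautiful Soup. For my sample, this snippet should give a nonzero number of iterations at every level, but with lxml it finds 2 tables, with the XML parser it finds 0, and the number should be 3.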
from bs4 import BeautifulSoup

url = "sample.kml"
with open(url, 'r') as page:
    soup = BeautifulSoup(page, "lxml")

tables = soup.find_all('table')
print(len(tables))  # 2 with "lxml"; 0 if "xml" is used instead
for table in tables:
    rows = table.find_all('tr')
    for row in rows:
        cols = row.find_all('td')
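This first script returns 2 tables instead of 3 with lxml, and 0 with the XML parser. The second version steps through the Placemark tags first: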
from bs4 import BeautifulSoup

with open("sample.kml", 'r') as page:
    soup = BeautifulSoup(page, "xml")

placemark = soup.find_all('Placemark')
print(len(placemark))  # correct count
for place in placemark:
    tables = place.find_all('table')
    print(len(tables))  # 0 for every Placemark
    for table in tables:
        rows = table.find_all('tr')
        for row in rows:
            cols = row.find_all('td')
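One likely explanation, offered as an assumption rather than a confirmed diagnosis: in the sample KML every table sits inside a `<![CDATA[ ... ]]>` section of a `<description>` element. To an XML parser, CDATA content is character data, not markup, so the tree contains no `table` elements at all, while a plain text search (re.search, Sublime Text) still finds the string `<table`. The sketch below uses the standard library's `xml.etree.ElementTree` as a stand-in for Beautiful Soup's "xml" parser so it runs without extra packages; `count_tables`, `KML` and `NS` are hypothetical names, and the KML string is an abridged, made-up Placemark.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abridged Placemark shaped like the sample KML: the HTML
# table lives inside a CDATA section of <description>.
KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
<Placemark id="ID_00000">
<description><![CDATA[<html><body><table><tr><td>AL</td></tr></table></body></html>]]></description>
</Placemark>
</kml>"""

NS = {"kml": "http://www.opengis.net/kml/2.2"}


def count_tables(kml_text):
    """Return (table elements in the XML tree, '<table' substrings in descriptions)."""
    root = ET.fromstring(kml_text)
    # Count elements whose local name is "table", ignoring any namespace.
    in_tree = sum(1 for el in root.iter() if el.tag.rsplit("}", 1)[-1] == "table")
    # The CDATA body comes back as the plain .text of <description>.
    in_text = sum(
        (d.text or "").count("<table")
        for d in root.findall(".//kml:description", NS)
    )
    return in_tree, in_text


print(count_tables(KML))  # (0, 1)
```

Zero elements but one textual match is the same pattern as `find_all('table')` returning 0 while re.search succeeds.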
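Going through it, I first noticed that len(tables) returned 2, which I know is wrong: there should be roughly 92,000 table trees. That led me to another tag (Placemark); stepping through the tree from there returns the correct count, but looking for the rows and columns inside each tag then returned zero for all of them, which surprised me. I played around with different parsers and eventually settled on xml as the appropriate one, but it still cannot find the right number of tables, even though I can find them with re.search or with a search in Sublime Text. That in turn led me to check whether they might somehow have been encapsulated, but to no avail. I'm stuck and can't find a way to reach the 92,000 tables with the find_all("TAG") method. Any suggestions?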
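If the tables really are CDATA text inside each `<description>` (an assumption based on the sample KML), one workaround is to parse twice: use the "xml" parser to walk the Placemark elements, as the second script already does, then hand each description's text to a separate BeautifulSoup instance so that `find_all('tr')` and `find_all('td')` have real tags to work with. This is a sketch, not a definitive implementation: the helper name `extract_tables` is hypothetical, "html.parser" for the inner parse is an illustrative choice, and it assumes the two-cell FIELD/value rows seen in the sample.

```python
from bs4 import BeautifulSoup


def extract_tables(kml_text):
    """Return one {FIELD: value} dict per Placemark (hypothetical helper).

    Assumes each Placemark's <description> holds an HTML document inside a
    CDATA section, with the attributes in two-cell <tr> rows of a nested
    <table>, as in the sample KML.
    """
    # Pass 1: the "xml" parser (needs lxml) sees the KML elements, but the
    # CDATA body of <description> is only a text string here.
    kml = BeautifulSoup(kml_text, "xml")
    records = []
    for placemark in kml.find_all("Placemark"):
        description = placemark.find("description")
        if description is None:
            continue
        # Pass 2: parse that string as HTML so table/tr/td become real tags.
        html = BeautifulSoup(description.get_text(), "html.parser")
        record = {}
        for row in html.find_all("tr"):
            # recursive=False keeps the wrapper row from collecting the
            # nested table's cells.
            cols = [td.get_text(strip=True) for td in row.find_all("td", recursive=False)]
            if len(cols) == 2:  # skips the one-cell title and wrapper rows
                record[cols[0]] = cols[1]
        records.append(record)
    return records
```

Against the full file this would be called as `extract_tables(open("sample.kml", encoding="utf-8").read())`. Each inner soup is small, but the outer parse still holds the whole 175 MB document in memory.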
Sample KML:
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2" xmlns:kml="http://www.opengis.net/kml/2.2" xmlns:atom="http://www.w3.org/2005/Atom">
<Document id="laaSECS" xsi:schemaLocation="http://www.opengis.net/kml/2.2 http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd http://www.google.com/kml/ext/2.2 http://code.google.com/apis/kml/schema/kml22gx.xsd">
<name>laaSECS</name>
<Snippet maxLines="0"></Snippet>
<Style id="PolyStyle00">
<LabelStyle>
<color>00000000</color>
<scale>0</scale>
</LabelStyle>
<LineStyle>
<color>ff7f5555</color>
<width>0.2</width>
</LineStyle>
<PolyStyle>
<color>ffc5d9fa</color>
<fill>0</fill>
</PolyStyle>
</Style>
<Style id="PolyStyle000">
<LabelStyle>
<color>00000000</color>
<scale>0</scale>
</LabelStyle>
<LineStyle>
<color>ff7f5555</color>
<width>0.2</width>
</LineStyle>
<PolyStyle>
<color>ffc5d9fa</color>
<fill>0</fill>
</PolyStyle>
</Style>
<StyleMap id="PolyStyle001">
<Pair>
<key>normal</key>
<styleUrl>#PolyStyle00</styleUrl>
</Pair>
<Pair>
<key>highlight</key>
<styleUrl>#PolyStyle000</styleUrl>
</Pair>
</StyleMap>
<Folder id="FeatureLayer0">
<name>laaSECS</name>
<Snippet maxLines="0"></Snippet>
<Placemark id="ID_00000">
<name>AL</name>
<Snippet maxLines="0"></Snippet>
<description><![CDATA[<html xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:msxsl="urn:schemas-microsoft-com:xslt">
<head>
<META http-equiv="Content-Type" content="text/html">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body style="margin:0px 0px 0px 0px;overflow:auto;background:#FFFFFF;">
<table style="font-family:Arial,Verdana,Times;font-size:12px;text-align:left;width:100%;border-collapse:collapse;padding:3px 3px 3px 3px">
<tr style="text-align:center;font-weight:bold;background:#9CBCE2">
<td>AL</td>
</tr>
<tr>
<td>
<table style="font-family:Arial,Verdana,Times;font-size:12px;text-align:left;width:100%;border-spacing:0px; padding:3px 3px 3px 3px">
<tr>
<td>FID</td>
<td>0</td>
</tr>
<tr bgcolor="#D4E4F3">
<td>STATE</td>
<td>AL</td>
</tr>
<tr>
<td>MER</td>
<td>25</td>
</tr>
<tr bgcolor="#D4E4F3">
<td>TWP</td>
<td>22</td>
</tr>
<tr>
<td>TDIR</td>
<td>N</td>
</tr>
<tr bgcolor="#D4E4F3">
<td>RNG</td>
<td>4</td>
</tr>
<tr>
<td>RDIR</td>
<td>W</td>
</tr>
<tr bgcolor="#D4E4F3">
<td>SEC</td>
<td>24</td>
</tr>
<tr>
<td>MODDATE</td>
<td>20050311</td>
</tr>
<tr bgcolor="#D4E4F3">
<td>DATUM</td>
<td>NAD27</td>
</tr>
<tr>
<td>SOURCE</td>
<td>WhiteStar</td>
</tr>
<tr bgcolor="#D4E4F3">
<td>MTR</td>
<td>25 22.0N 4.0W</td>
</tr>
</table>
</td>
</tr>
</table>
</body>
</html>]]></description>
<styleUrl>#PolyStyle001</styleUrl>
<MultiGeometry>
<Polygon>
<outerBoundaryIs>
<LinearRing>
<coordinates>
-88.35570867858526,32.86011073571817,0 -88.35570870147141,32.86253443065814,0 -88.35597594524225,32.86011537400984,0 -88.35570867858526,32.86011073571817,0
</coordinates>
</LinearRing>
</outerBoundaryIs>
</Polygon>
</MultiGeometry>
</Placemark>
<Placemark id="ID_00001">
<name>AL</name>
<Snippet maxLines="0"></Snippet>
<description><![CDATA[<html xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:msxsl="urn:schemas-microsoft-com:xslt">
<head>
<META http-equiv="Content-Type" content="text/html">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body style="margin:0px 0px 0px 0px;overflow:auto;background:#FFFFFF;">
<table style="font-family:Arial,Verdana,Times;font-size:12px;text-align:left;width:100%;border-collapse:collapse;padding:3px 3px 3px 3px">
<tr style="text-align:center;font-weight:bold;background:#9CBCE2">
<td>AL</td>
</tr>
<tr>
<td>
<table style="font-family:Arial,Verdana,Times;font-size:12px;text-align:left;width:100%;border-spacing:0px; padding:3px 3px 3px 3px">
<tr>
<td>FID</td>
<td>1</td>
</tr>
<tr bgcolor="#D4E4F3">
<td>STATE</td>
<td>AL</td>
</tr>
<tr>
<td>MER</td>
<td>25</td>
</tr>
<tr bgcolor="#D4E4F3">
<td>TWP</td>
<td>22</td>
</tr>
<tr>
<td>TDIR</td>
<td>N</td>
</tr>
<tr bgcolor="#D4E4F3">
<td>RNG</td>
<td>4</td>
</tr>
<tr>
<td>RDIR</td>
<td>W</td>
</tr>
<tr bgcolor="#D4E4F3">
<td>SEC</td>
<td>25</td>
</tr>
<tr>
<td>MODDATE</td>
<td>20050311</td>
</tr>
<tr bgcolor="#D4E4F3">
<td>DATUM</td>
<td>NAD27</td>
</tr>
<tr>
<td>SOURCE</td>
<td>WhiteStar</td>
</tr>
<tr bgcolor="#D4E4F3">
<td>MTR</td>
<td>25 22.0N 4.0W</td>
</tr>
</table>
</td>
</tr>
</table>
</body>
</html>]]></description>
<styleUrl>#PolyStyle001</styleUrl>
<MultiGeometry>
<Polygon>
<outerBoundaryIs>
<LinearRing>
<coordinates>
-88.35597594524225,32.86011537400984,0 -88.3567389068841,32.85292852502473,0 -88.35768486975799,32.84508568993779,0 -88.35570853700197,32.84511675513796,0 -88.35570867858526,32.86011073571817,0 -88.35597594524225,32.86011537400984,0
</coordinates>
</LinearRing>
</outerBoundaryIs>
</Polygon>
</MultiGeometry>
</Placemark>
<Placemark id="ID_00002">
<name>AL</name>
<Snippet maxLines="0"></Snippet>
<description><![CDATA[<html xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:msxsl="urn:schemas-microsoft-com:xslt">
<head>
<META http-equiv="Content-Type" content="text/html">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body style="margin:0px 0px 0px 0px;overflow:auto;background:#FFFFFF;">
<table style="font-family:Arial,Verdana,Times;font-size:12px;text-align:left;width:100%;border-collapse:collapse;padding:3px 3px 3px 3px">
<tr style="text-align:center;font-weight:bold;background:#9CBCE2">
<td>AL</td>
</tr>
<tr>
<td>
<table style="font-family:Arial,Verdana,Times;font-size:12px;text-align:left;width:100%;border-spacing:0px; padding:3px 3px 3px 3px">
<tr>
<td>FID</td>
<td>2</td>
</tr>
<tr bgcolor="#D4E4F3">
<td>STATE</td>
<td>AL</td>
</tr>
<tr>
<td>MER</td>
<td>25</td>
</tr>
<tr bgcolor="#D4E4F3">
<td>TWP</td>
<td>22</td>
</tr>
<tr>
<td>TDIR</td>
<td>N</td>
</tr>
<tr bgcolor="#D4E4F3">
<td>RNG</td>
<td>4</td>
</tr>
<tr>
<td>RDIR</td>
<td>W</td>
</tr>
<tr bgcolor="#D4E4F3">
<td>SEC</td>
<td>36</td>
</tr>
<tr>
<td>MODDATE</td>
<td>20050311</td>
</tr>
<tr bgcolor="#D4E4F3">
<td>DATUM</td>
<td>NAD27</td>
</tr>
<tr>
<td>SOURCE</td>
<td>WhiteStar</td>
</tr>
<tr bgcolor="#D4E4F3">
<td>MTR</td>
<td>25 22.0N 4.0W</td>
</tr>
</table>
</td>
</tr>
</table>
</body>
</html>]]></description>
<styleUrl>#PolyStyle001</styleUrl>
<MultiGeometry>
<Polygon>
<outerBoundaryIs>
<LinearRing>
<coordinates>
-88.35768486975799,32.84508568993779,0 -88.35843183642189,32.83843382961495,0 -88.35914980106479,32.83165897171819,0 -88.35908878782671,32.83049899571662,0 -88.35570839957039,32.83056244880483,0 -88.35570853700197,32.84511675513796,0 -88.35768486975799,32.84508568993779,0
</coordinates>
</LinearRing>
</outerBoundaryIs>
</Polygon>
</MultiGeometry>
</Placemark>
<Placemark id="ID_00003">
<name>AL</name>
<Snippet maxLines="0"></Snippet>
<description><![CDATA[<html xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:msxsl="urn:schemas-microsoft-com:xslt">
Link to the original file: KML FILE
Please provide a small part of the XML that demonstrates the problem. A 175 MB file does not qualify as a [mcve]! – miken32
OK @miken32, I cut it down to 500 lines. –