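Parsing HTML table data into a dictionary with BeautifulSoup

I want to use BeautifulSoup to parse information stored in an HTML table and store it in a dictionary. I have been able to access the table and iterate over its values, but the extracted text still contains a lot of junk that I don't know how to handle.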
import requests
from bs4 import BeautifulSoup

# load the HTML file
r = requests.get("http://www.ebay.com/itm/222378225962")
soup = BeautifulSoup(r.content, "html.parser")
# navigate to the item attributes table
table = soup.find('div', 'itemAttr')
# iterate through the attribute information
attr = []
for i in table.findAll("tr"):
    attr.append(i.text.strip().replace('\t', ''))
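With this approach, the data looks like the list below. As you can see, there is a lot of junk in it, and some rows contain more than one item, such as Year and VIN.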
[u'Condition:\nUsed',
u'Seller Notes:\n\u201cExcellent Condition\u201d',
u'Year: \n\n2015\n\n VIN (Vehicle Identification Number): \n\n2G1FJ1EW2F9192023',
u'Mileage: \n\n29,000\n\n Transmission: \n\nManual',
u'Make: \n\nChevrolet\n\n Body Type: \n\nCoupe',
u'Model: \n\nCamaro\n\n Warranty: \n\nVehicle has an existing warranty',
u'Trim: \n\nSS Coupe 2-Door\n\n Vehicle Title: \n\nClear',
u'Engine: \n\n6.2L 6162CC 376Cu. In. V8 GAS OHV Naturally Aspirated\n\n Options: \n\nLeather Seats',
u'Drive Type: \n\nRWD\n\n Safety Features: \n\nAnti-Lock Brakes, Driver Airbag, Passenger Airbag, Side Airbags',
u'Power Options: \n\nAir Conditioning, Cruise Control, Power Locks, Power Windows, Power Seats\n\n Sub Model: \n\n1LE',
u'Fuel Type: \n\nGasoline\n\n Color: \n\nWhite',
u'For Sale By: \n\nPrivate Seller\n\n Interior Color: \n\nBlack',
u'Disability Equipped: \n\nNo\n\n Number of Cylinders: \n\n8',
u'']
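Ultimately, I want to store the data in a dictionary like the one below. I know how to create a dictionary, but I don't know how to clean up the data that needs to go into it without resorting to brute-force find and replace.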
{'Condition' : 'Used',
'Seller Notes' : 'Excellent Condition',
'Year': '2015',
'VIN (Vehicle Identification Number)': '2G1FJ1EW2F9192023',
'Mileage': '29,000',
'Transmission': 'Manual',
'Make': 'Chevrolet',
'Body Type': 'Coupe',
'Model': 'Camaro',
'Warranty': 'Vehicle has an existing warranty',
'Trim': 'SS Coupe 2-Door',
'Vehicle Title' : 'Clear',
'Engine': '6.2L 6162CC 376Cu. In. V8 GAS OHV Naturally Aspirated',
'Options': 'Leather Seats',
'Drive Type': 'RWD',
'Safety Features' : 'Anti-Lock Brakes, Driver Airbag, Passenger Airbag, Side Airbags',
'Power Options' : 'Air Conditioning, Cruise Control, Power Locks, Power Windows, Power Seats',
'Sub Model' : '1LE',
'Fuel Type' : 'Gasoline',
'Exterior Color' : 'White',
'For Sale By' : 'Private Seller',
'Interior Color' : 'Black',
'Disability Equipped' : 'No',
'Number of Cylinders': '8'}
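
One possible way to get there, as a rough sketch and not a definitive solution: split each row string on newlines, drop the empty pieces, and pair what is left as label, value, label, value. This assumes that every label is followed by exactly one value and that no value contains a newline, which holds for the rows shown above but may not hold for other pages. The helper name `rows_to_dict` is hypothetical, and the sample list is a subset of the output above, so the snippet runs without a network request.

```python
# Sample rows copied from the output above (a subset of `attr`).
attr = [
    'Condition:\nUsed',
    'Seller Notes:\n\u201cExcellent Condition\u201d',
    'Year: \n\n2015\n\n VIN (Vehicle Identification Number): \n\n2G1FJ1EW2F9192023',
    'Mileage: \n\n29,000\n\n Transmission: \n\nManual',
    '',
]


def rows_to_dict(rows):
    """Turn 'Label: value' row strings into a dict (hypothetical helper)."""
    result = {}
    for row in rows:
        # Split on newlines, trim each piece, and drop the empty ones.
        parts = [p.strip() for p in row.split('\n') if p.strip()]
        # Assumption: the pieces alternate label, value, label, value, ...
        for label, value in zip(parts[0::2], parts[1::2]):
            key = label.rstrip(':').strip()
            # Remove the curly quotes that wrap the seller notes.
            result[key] = value.strip('\u201c\u201d')
    return result


item = rows_to_dict(attr)
print(item)
```

Applied to the full `attr` list, this would keep the keys exactly as they appear on the page, so the colour entry would be stored under 'Color' and not 'Exterior Color'. Iterating over `stripped_strings` on each `tr` in BeautifulSoup may give the same pieces without the manual split, although it separates text at tag boundaries, so values containing inline markup could be split differently.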