Using Beautiful Soup to get cell values from a table

Working with the HTML from http://coinmarketcap.com/, I want to create a Python dictionary from the table rows that contain the values, for example:

{Bitcoin: {Market_cap: '$11,247,442,728', Volume: '$64,668,900'}, Ethereum: .... etc}

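I'm not very familiar with how the HTML is structured, though. For some fields, such as market cap, the data is attached directly to the cell (td), i.e.: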

<td class="no-wrap market-cap text-right" data-usd="11247442728.0" data-btc="15963828.0"> 

         $11,247,442,728 

       </td>

But for cells such as trading volume, the value is inside a link, so the format is different, i.e.:

<td class="no-wrap text-right"> 
        <a href="/currencies/bitcoin/#markets" class="volume" data-usd="64668900.0" data-btc="91797.5">$64,668,900</a> 
       </td>
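The two cells above keep their numbers in different places: the market-cap `<td>` carries `data-usd` itself, while the volume cell puts it on the nested `<a>`. A minimal sketch that handles both shapes, assuming the markup matches these snippets (the `usd_value` helper is hypothetical, and the live page may have changed since):

```python
from bs4 import BeautifulSoup

# Sample markup copied from the two cells shown above, wrapped in a row.
# Assumption: the live page may no longer look like this.
SAMPLE_ROW = """
<table><tr>
<td class="no-wrap market-cap text-right" data-usd="11247442728.0" data-btc="15963828.0">
    $11,247,442,728
</td>
<td class="no-wrap text-right">
    <a href="/currencies/bitcoin/#markets" class="volume" data-usd="64668900.0" data-btc="91797.5">$64,668,900</a>
</td>
</tr></table>
"""


def usd_value(cell):
    """Return data-usd from the cell, or from the first descendant that has it."""
    if cell.has_attr("data-usd"):
        return float(cell["data-usd"])
    # The volume cell keeps its data on the nested <a> tag instead.
    inner = cell.find(attrs={"data-usd": True})
    return float(inner["data-usd"]) if inner is not None else None


soup = BeautifulSoup(SAMPLE_ROW, "html.parser")
market_cap_cell, volume_cell = soup.find_all("td")
print(usd_value(market_cap_cell))  # 11247442728.0
print(usd_value(volume_cell))      # 64668900.0
```

Reading the attribute gives the raw number without the `$` and commas, which is easier to compare or sort than the displayed text.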

This is my current code:

import requests
from bs4 import BeautifulSoup as bs

request = requests.get('http://coinmarketcap.com/')
content = request.content
soup = bs(content, 'html.parser')

table = soup.findChildren('table')[0]
rows = table.findChildren('tr')

for row in rows:
    cells = row.findChildren('td')
    for cell in cells:
        print cell.string

This gives results full of whitespace and missing data.

For each row, how can I get the name of the coin? For each cell, how can I access its value, whether it is a link (<a>) or a regular value?
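A minimal sketch for the coin-name part, assuming each row contains a link of the form `/currencies/<id>/`, like the volume link shown earlier. The `coin_id` helper is hypothetical and only reads the `href` attribute:

```python
from bs4 import BeautifulSoup

# Row markup modeled on the question's volume cell (assumption: every row
# contains at least one link of the form /currencies/<id>/...).
ROW_HTML = """<table><tr>
<td class="no-wrap text-right">
  <a href="/currencies/bitcoin/#markets" class="volume" data-usd="64668900.0">$64,668,900</a>
</td>
</tr></table>"""


def coin_id(row):
    """Return the <id> segment of the first /currencies/<id>/ link in the row."""
    for link in row.find_all("a", href=True):
        parts = link["href"].strip("/").split("/")
        if len(parts) >= 2 and parts[0] == "currencies":
            return parts[1]
    return None


row = BeautifulSoup(ROW_HTML, "html.parser").find("tr")
print(coin_id(row))  # bitcoin
```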

Edit:

By changing the for loop to:

for row in rows:
    cells = row.findChildren('td')
    for cell in cells:
        print cell.getText().strip().replace(" ", "")

I can get the data I want, i.e.:

1 
Bitcoin 
$11,254,003,178 
$704.95 
15,964,212 
BTC 
$63,057,100 
-0.11%

However, it would be nice to also have each cell's class name, i.e.:

id: bitcoin 
marketcap: 11,254,003,178 
etc......
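One way to get "name: value" pairs is to key each cell by its most specific CSS class. A minimal sketch, assuming layout classes such as `no-wrap` and `text-right` can be ignored (the `LAYOUT_CLASSES` set and `cell_key` helper are hypothetical), and falling back to a child tag's class when the `<td>` has none, as with the volume link:

```python
from bs4 import BeautifulSoup

# Layout-only classes to skip when naming a cell (hypothetical list,
# based on the two cells shown in the question).
LAYOUT_CLASSES = {"no-wrap", "text-right", "text-left", "text-center"}

SAMPLE_ROW = """
<table><tr>
<td class="no-wrap market-cap text-right" data-usd="11247442728.0">
    $11,247,442,728
</td>
<td class="no-wrap text-right">
    <a href="/currencies/bitcoin/#markets" class="volume">$64,668,900</a>
</td>
</tr></table>
"""


def cell_key(cell):
    """Pick a field name from the cell's own classes, else from a descendant's."""
    for tag in [cell] + cell.find_all(True):
        # BeautifulSoup returns the class attribute as a list of strings.
        names = [c for c in tag.get("class", []) if c not in LAYOUT_CLASSES]
        if names:
            return names[0]
    return None


row = BeautifulSoup(SAMPLE_ROW, "html.parser").find("tr")
fields = {}
for cell in row.find_all("td"):
    key = cell_key(cell)
    if key is not None:
        fields[key] = cell.get_text(" ", strip=True)
print(fields)  # {'market-cap': '$11,247,442,728', 'volume': '$64,668,900'}
```

Class names describe styling as much as meaning, so the table headers (see the answer below) are usually a more reliable source of field names.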


2016-11-08 David Hancock

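You're almost there. Instead of using cell.string, use cell.getText(). You may also need to clean up the output strings to remove the extra whitespace. I used a regular expression, but there are other options too, depending on what state your data is in. I've also added some Python 3 compatibility with the print function.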

from __future__ import print_function
import re

import requests
from bs4 import BeautifulSoup as bs

request = requests.get('http://coinmarketcap.com/')
content = request.content
soup = bs(content, 'html.parser')

table = soup.findChildren('table')[0]
rows = table.findChildren('tr')

for row in rows:
    cells = row.findChildren('td')
    for cell in cells:
        cell_content = cell.getText()
        # Collapse runs of whitespace (spaces, tabs, newlines) into single spaces.
        clean_content = re.sub(r'\s+', ' ', cell_content).strip()
        print(clean_content)
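To see why `cell.string` came back empty and how the cleanup options compare, here is a small sketch on a single cell modeled on the question's volume cell (the sample HTML is an assumption, not the live page). `.string` is `None` whenever a tag has more than one child, and the volume `<td>` has three: whitespace, the `<a>`, and more whitespace.

```python
import re

from bs4 import BeautifulSoup

# Sample cell modeled on the question's volume cell (assumption).
CELL_HTML = """<td class="no-wrap text-right">
        <a href="/currencies/bitcoin/#markets" class="volume">$64,668,900</a>
       </td>"""

cell = BeautifulSoup(CELL_HTML, "html.parser").td

# .string is None: the <td> has several children, not one.
print(cell.string)  # None

# get_text() joins every text node, whitespace included.
raw = cell.get_text()

# Option 1: collapse whitespace with a regex, as in the answer.
print(re.sub(r"\s+", " ", raw).strip())  # $64,668,900

# Option 2: let BeautifulSoup strip each text fragment.
print(cell.get_text(" ", strip=True))  # $64,668,900

# Option 3: plain str methods, no regex.
print(" ".join(raw.split()))  # $64,668,900
```

`get_text` is the newer spelling of `getText`; both names exist in bs4.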

The table headers are stored in the first row, so you can extract them like this:

headers = [x.getText() for x in rows[0].findChildren('th')]
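Combining those headers with each row's cells gives the nested dictionary the question asks for. A minimal sketch on a small stand-in table; the column names and markup here are assumptions for illustration, not the real coinmarketcap layout:

```python
from bs4 import BeautifulSoup

# Stand-in table: <th> header cells in the first row, data cells below.
SAMPLE_TABLE = """
<table>
  <tr><th>#</th><th>Name</th><th>Market Cap</th><th>Volume (24h)</th></tr>
  <tr><td>1</td><td> Bitcoin </td><td> $11,247,442,728 </td>
      <td><a class="volume">$64,668,900</a></td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_TABLE, "html.parser")
rows = soup.find("table").find_all("tr")

# Field names come from the <th> cells of the first row.
headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]

coins = {}
for row in rows[1:]:
    values = [td.get_text(" ", strip=True) for td in row.find_all("td")]
    record = dict(zip(headers, values))
    # Key each record by the coin's name.
    coins[record["Name"]] = record

print(coins["Bitcoin"]["Market Cap"])  # $11,247,442,728
```

`zip` stops at the shorter list, so a row with fewer cells than headers silently loses the trailing fields; check the lengths if that matters.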


2016-11-08 05:03:13

Great! It works perfectly, thank you very much! Any idea how to get the name of each field?

Added information on how to get each field's name (the table headers).
