2017-06-02 159 views

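Extracting text from HTML using Python

I hope someone can help me. I'm fairly new to Python, and I want to scrape data from a website that unfortunately requires an account. However, I can't extract the date (i.e. 2017-06-01) from the following HTML: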

<li class="latest-value-item"> 
    <div class="latest-value-label">Date</div> 
    <div class="latest-value">2017-06-01</div> 
</li> 
<li class="latest-value-item"> 
    <div class="latest-value-label">Index</div> 
    <div class="latest-value">1430</div> 
</li> 

This is my code:

import urllib3 
import urllib.request 
from bs4 import BeautifulSoup 
import pandas as pd 
import requests 
import csv 
from datetime import datetime 

url = 'https://www.quandl.com/data/LLOYDS/BCI-Baltic-Capesize-Index' 
r = requests.get(url) 
soup = BeautifulSoup(r.text, 'lxml') 

Baltic_Indices = [] 
New_Value = [] 

#new = soup.find_all('div', attrs={'class':'latest-value'}).get_text() 
date = soup.find_all(class_="latest value") 
text1 = date.text 

print(text1) 

Possible duplicate of [Extracting text from HTML file using Python](https://stackoverflow.com/questions/328356/extracting-text-from-html-file-using-python) – Umair

Answer


date = soup.find_all(class_="latest value")

You used the wrong CSS class name ('latest value' != 'latest-value'). With the correct name, find_all returns both matching elements:

print(soup.find_all(attrs={'class': 'latest-value'})) 
# [<div class="latest-value">2017-06-01</div>, <div class="latest-value">1430</div>] 

for element in soup.find_all(attrs={'class': 'latest-value'}): 
    print(element.text) 
# 2017-06-01 
# 1430 
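
One more detail worth noting: find_all returns a list-like ResultSet, so calling .text on the result directly (as in `text1 = date.text`) raises an AttributeError even once the class name is fixed. If only the first match (the date) is needed, find is one option. The following is a minimal sketch, assuming the HTML fragment from the question is parsed from a string; the variable names `html`, `matches`, `first` and `date_text` are illustrative and not from the original code.

```python
from bs4 import BeautifulSoup

# HTML fragment copied from the question, used here instead of a live request
html = """
<li class="latest-value-item">
    <div class="latest-value-label">Date</div>
    <div class="latest-value">2017-06-01</div>
</li>
<li class="latest-value-item">
    <div class="latest-value-label">Index</div>
    <div class="latest-value">1430</div>
</li>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all returns a collection of tags, not a single tag
matches = soup.find_all(class_="latest-value")
try:
    matches.text
except AttributeError:
    print("find_all result has no .text; iterate over it or index into it")

# find returns the first matching tag, or None if nothing matches
first = soup.find(class_="latest-value")
date_text = first.get_text(strip=True) if first is not None else None
print(date_text)
```

The None check matters here because the real page requires an account, so the element may be missing from the response.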

I prefer using the attrs kwarg, but your approach also works (given the correct CSS class name):

for element in soup.find_all(class_='latest-value'): 
    print(element.text) 
# 2017-06-01 
# 1430
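
If you want to know which value is the date and which is the index, one possible approach is to walk each `latest-value-item` and pair its label with its value. This is only a sketch under the assumption that every item contains one `latest-value-label` and one `latest-value` div, as in the HTML from the question; the dictionary name `latest` is hypothetical.

```python
from bs4 import BeautifulSoup

# HTML fragment copied from the question
html = """
<li class="latest-value-item">
    <div class="latest-value-label">Date</div>
    <div class="latest-value">2017-06-01</div>
</li>
<li class="latest-value-item">
    <div class="latest-value-label">Index</div>
    <div class="latest-value">1430</div>
</li>
"""
soup = BeautifulSoup(html, "html.parser")

latest = {}
for item in soup.find_all("li", class_="latest-value-item"):
    # class_ matches whole class tokens, so "latest-value" does not
    # match the "latest-value-label" div
    label = item.find(class_="latest-value-label")
    value = item.find(class_="latest-value")
    if label is not None and value is not None:
        latest[label.get_text(strip=True)] = value.get_text(strip=True)

print(latest)
```

The date is then available as `latest["Date"]` without relying on the order of the elements.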