2016-07-27 80 views
3

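My question is a follow-up to one of the questions here: Scraping a Yahoo Finance balance sheet with Python

The function:

periodic_figure_values()

seems to work fine except when the name of the line item being searched for appears twice on the page. The specific case I am referring to is trying to get the data for "Long Term Debt". The function from the link above returns the following error: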

Traceback (most recent call last): 
    File "test.py", line 31, in <module> 
    LongTermDebt=(periodic_figure_values(soup, "Long Term Debt")) 
    File "test.py", line 21, in periodic_figure_values 
    value = int(str_value) 
ValueError: invalid literal for int() with base 10: 'Short/Current Long Term Debt' 

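It seems to trip over "Short/Current Long Term Debt". You see, the page has both "Short/Current Long Term Debt" and "Long Term Debt". You can see an example of the source page using Apple's balance sheet here.

I am trying to find a way for the function to return the data for "Long Term Debt" without getting tripped up by "Short/Current Long Term Debt".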
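To make the collision concrete, here is a minimal sketch that uses only the two row labels mentioned above. It assumes that the lookup behaves like an unanchored `re.search`, which is how a pattern built with `re.compile(yahoo_figure)` matches when it is searched against a label. The variable names are illustrative only.

```python
import re

# The two row labels that appear on the balance sheet page.
labels = ["Short/Current Long Term Debt", "Long Term Debt"]

# Unanchored pattern, built the same way as in periodic_figure_values().
pattern = re.compile("Long Term Debt")

# search() succeeds anywhere inside the string, so both labels match.
matches = [label for label in labels if pattern.search(label)]

# A "first match wins" lookup therefore lands on the wrong row.
first_match = matches[0]

print(matches)      # ['Short/Current Long Term Debt', 'Long Term Debt']
print(first_match)  # Short/Current Long Term Debt
```

Since "Short/Current Long Term Debt" comes first, a lookup that stops at the first hit would pick that row rather than "Long Term Debt".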

Below are the function and an example that fetches "Cash And Cash Equivalents", which works fine, and "Long Term Debt", which does not:

import sys  # needed for sys.exit() below

import requests, bs4, re

def periodic_figure_values(soup, yahoo_figure):
    values = []
    pattern = re.compile(yahoo_figure)
    title = soup.find("strong", text=pattern)  # works for the figures printed in bold
    if title:
        row = title.parent.parent
    else:
        title = soup.find("td", text=pattern)  # works for any other available figure
        if title:
            row = title.parent
        else:
            sys.exit("Invalid figure '" + yahoo_figure + "' passed.")
    cells = row.find_all("td")[1:]  # exclude the <td> with figure name
    for cell in cells:
        if cell.text.strip() != yahoo_figure:  # needed because some figures are indented
            str_value = cell.text.strip().replace(",", "").replace("(", "-").replace(")", "")
            if str_value == "-":
                str_value = "0"
            value = int(str_value)  # this is the line that raises the ValueError
            values.append(value)
    return values

res = requests.get('https://ca.finance.yahoo.com/q/bs?s=AAPL')
res.raise_for_status()  # must be called, otherwise HTTP errors pass silently
soup = bs4.BeautifulSoup(res.text, 'html.parser')
Cash = periodic_figure_values(soup, "Cash And Cash Equivalents")
print(Cash)
LongTermDebt = periodic_figure_values(soup, "Long Term Debt")  # fails as shown above
print(LongTermDebt)

Answers

0

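You could change the function so that it accepts a regular expression instead of a plain string. Then you can search for ^Long Term Debt to make sure that no text comes before it. All you need to do is change

if cell.text.strip() != yahoo_figure: 

to

if not re.match(yahoo_figure, cell.text.strip()): 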
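As a rough sketch of the anchoring idea, using only the standard `re` module: the helper name `label_pattern` is hypothetical and not part of the original function, and allowing surrounding whitespace is an assumption made here to cope with indented labels.

```python
import re

def label_pattern(figure_name):
    """Build a pattern that matches the figure name as the whole label.

    Hypothetical helper: the name is escaped so it is treated literally,
    anchored at both ends, and surrounding whitespace is tolerated.
    """
    return re.compile(r"^\s*" + re.escape(figure_name) + r"\s*$")

pattern = label_pattern("Long Term Debt")

# The anchored pattern accepts the exact label (even when indented) ...
print(bool(pattern.search("  Long Term Debt ")))             # True
# ... and rejects a label with other text in front of it.
print(bool(pattern.search("Short/Current Long Term Debt")))  # False
```

A pattern like this could be passed wherever the function currently compiles `yahoo_figure`, but whether it matches the live page depends on how the labels are actually laid out in the HTML.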
1

The simplest approach is to catch the ValueError that is raised, using a try/except block:

import sys  # needed for sys.exit() below

import requests, bs4, re

def periodic_figure_values(soup, yahoo_figure):
    values = []
    pattern = re.compile(yahoo_figure)
    title = soup.find("strong", text=pattern)  # works for the figures printed in bold
    if title:
        row = title.parent.parent
    else:
        title = soup.find("td", text=pattern)  # works for any other available figure
        if title:
            row = title.parent
        else:
            sys.exit("Invalid figure '" + yahoo_figure + "' passed.")
    cells = row.find_all("td")[1:]  # exclude the <td> with figure name
    for cell in cells:
        if cell.text.strip() != yahoo_figure:  # needed because some figures are indented
            str_value = cell.text.strip().replace(",", "").replace("(", "-").replace(")", "")
            if str_value == "-":
                str_value = "0"
### from here
            try:
                value = int(str_value)
                values.append(value)
            except ValueError:
                continue  # skip cells that do not hold a number
### to here
    return values

res = requests.get('https://ca.finance.yahoo.com/q/bs?s=AAPL')
res.raise_for_status()  # must be called, otherwise HTTP errors pass silently
soup = bs4.BeautifulSoup(res.text, 'html.parser')
Cash = periodic_figure_values(soup, "Cash And Cash Equivalents")
print(Cash)
LongTermDebt = periodic_figure_values(soup, "Long Term Debt")
print(LongTermDebt)

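This one prints out your numbers just fine.
Note that in this case you do not really need the re module, since you are only checking for literal text (no wildcards, no boundaries, and so on).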
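A minimal sketch of what a literal comparison without `re` might look like. It assumes the rows have already been extracted as lists of cell strings (for example via `[td.text for td in row.find_all("td")]`); the names `parse_value` and `figure_values` are hypothetical, and the numbers are made up purely for illustration.

```python
def parse_value(cell_text):
    """Convert a cell such as '1,234', '(567)' or '-' into an int."""
    cleaned = cell_text.strip().replace(",", "").replace("(", "-").replace(")", "")
    if cleaned == "-":  # a lone dash stands for "no value"
        return 0
    return int(cleaned)

def figure_values(rows, figure_name):
    """Return the values of the row whose label equals figure_name exactly.

    rows is a list of rows; each row is a list of cell strings with the
    label in the first position (an assumed, simplified layout).
    """
    for cells in rows:
        if cells and cells[0].strip() == figure_name:
            return [parse_value(cell) for cell in cells[1:]]
    raise ValueError("Invalid figure %r passed." % figure_name)

# Made-up rows for illustration only; these are not real balance sheet figures.
rows = [
    ["Short/Current Long Term Debt", "1,000", "(2,000)", "-"],
    ["  Long Term Debt", "3,000", "4,000", "5,000"],
]

print(figure_values(rows, "Long Term Debt"))  # [3000, 4000, 5000]
```

Comparing the stripped label for equality means that "Short/Current Long Term Debt" can never be mistaken for "Long Term Debt", so no numeric conversion is attempted on the wrong row.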

+0

I am trying to get the function to return the data for "Long Term Debt" without getting tripped up by "Short/Current Long Term Debt". – Mike8