Making multiple if statements less verbose

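I scraped a web page that doesn't use any useful classes or IDs in its HTML markup, so I had to collect all the links and look for patterns in them. Here is a simplified example of what the HTML might look like: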

<span>Category</span><a href='example.com/link-about-a'>A</a>

On another page, we might have a different category:

<span>Category</span><a href='example.com/link-about-b'>B</a>

Using beautifulsoup4, my current solution looks like this:

def category(soup):
    for x in soup.find_all('a'):
        if 'link-about-a' in x['href']:
            return 'A'
        if 'link-about-b' in x['href']:
            return 'B'

and so on... but this is very ugly.

I'm wondering whether there is a way to make this less verbose.

Something like using a dictionary

categories = {'A': 'link-about-a', 'B': 'link-about-b'}

and reducing it to a single expression.

2014-01-13 yayu

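How predictable is the pattern in the links? If substring matching is the only way to find the pattern, Eric's solution is a good one. Personally I would probably use a tuple of tuples rather than a dictionary for something I only iterate over as key/value pairs, but that is a trivial difference. If, however, you can reliably extract the pattern with something like a regular expression, then a dictionary that maps the extracted pattern to a category is the best approach. –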

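@PeterDeGlopper The pattern is predictable and comes from a predefined list of categories (A, B, C, ...), so you are right: I found the regular expression implementation more useful. Thanks. – yayu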
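The approach described in the comments above (extract the pattern with a regular expression, then look it up in a dictionary) could be sketched roughly as follows. This is only an illustration under assumptions: the names `CATEGORY_BY_SLUG`, `LINK_PATTERN` and `category_from_hrefs` are hypothetical, the function takes plain href strings (which in practice would come from `soup.find_all('a')`), and the pattern assumes every category link contains `link-about-<slug>`.

```python
import re

# Hypothetical mapping from the slug captured in the URL to a category label.
CATEGORY_BY_SLUG = {'a': 'A', 'b': 'B'}

# Assumption: category links always contain "link-about-<slug>".
LINK_PATTERN = re.compile(r'link-about-([a-z0-9]+)')


def category_from_hrefs(hrefs):
    """Return the category of the first href with a known slug, else None."""
    for href in hrefs:
        match = LINK_PATTERN.search(href)
        # Skip links that don't match or whose slug is not a known category.
        if match and match.group(1) in CATEGORY_BY_SLUG:
            return CATEGORY_BY_SLUG[match.group(1)]
    return None
```

With this layout, adding a category means adding one dictionary entry, and each href is scanned once instead of once per category.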

All you need is another loop:

def category(soup):
    # categories is the dict from the question: {'A': 'link-about-a', ...}
    for x in soup.find_all('a'):
        for k, v in categories.items():
            if v in x['href']:
                return k

But if you want a single expression:

category = next((
    k
    for x in soup.find_all('a')
    for k, v in categories.items()
    if v in x['href']
), None)
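To try the single-expression form without fetching a page, here is a small self-contained sketch that works on plain href strings instead of a parsed document. The names `CATEGORIES` and `first_category` are hypothetical, and the tuple of pairs is just one possible layout, chosen so that the matching order is explicit.

```python
# Pairs of (category, substring); a tuple of tuples keeps a fixed match order.
CATEGORIES = (('A', 'link-about-a'), ('B', 'link-about-b'))


def first_category(hrefs, default=None):
    """Return the first category whose substring occurs in any href."""
    return next(
        (
            name
            for href in hrefs
            for name, needle in CATEGORIES
            if needle in href
        ),
        default,  # returned when no href matches any category
    )
```

The second argument to `next` is what avoids a `StopIteration` when nothing matches.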

2014-01-13 02:15:53 Eric

It might be a bit more flexible to use regular expressions and a list of categories:

import re

# Each entry pairs a compiled pattern with the category it identifies.
categories = [[re.compile('link-about-a'), 'A'],
              [re.compile('link-about-b'), 'B']]

def category(soup):
    for x in soup.find_all('a'):
        for expression, description in categories:
            if expression.search(x['href']):
                return description
    # No link matched any pattern.
    return None

2014-01-13 03:21:43 chthonicdaemon
