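Returning a dynamically created function

I'm trying to write a function that takes several arguments and returns a callable lambda function. I pass these lambda functions to BeautifulSoup's find_all method to parse HTML.

Here is the function I wrote to generate the lambda functions: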
    def tag_filter_function(self, name="", search_terms={}, attrs=[], **kwargs):
        # filter attrs that are in the search_terms keys out of attrs
        attrs = [attr for attr in attrs if attr not in search_terms.keys()]
        # array of strings to compile into a lambda function
        exec_strings = []
        # add name search into exec_strings
        if len(name) > 0:
            tag_search_name = "tag.name == \"{}\"".format(name)
            exec_strings.append(tag_search_name)
        # add generic search terms into exec_strings
        if len(search_terms) > 0:
            tag_search_terms = ' and '.join(["tag.has_attr(\"{}\") and tag[\"{}\"] == \"{}\"".format(k, k, v) for k, v in search_terms.items()])
            exec_strings.append(tag_search_terms)
        # add generic has_attr calls into exec_strings
        if len(attrs) > 0:
            tag_search_attrs = ' and '.join(["tag.has_attr(\"{}\")".format(item) for item in attrs])
            exec_strings.append(tag_search_attrs)
        # function string
        exec_string = "lambda tag: " + " and ".join(exec_strings)
        return exec(compile(exec_string, '<string>', 'exec'))
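
For the call

    tag_filter_function(name="article", search_terms={"id" : "article"})

the string the function builds is

    lambda tag: tag.name == "article" and tag.has_attr("id") and tag["id"] == "article"

but the function's return value is None. I suspect exec() is not what I want here, but I'm really not sure. Is it possible to turn that string into an executable lambda function, and if so, how? I also don't know whether I'm going about this the right way at all.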
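
For reference, a minimal sketch of the two usual ways out. exec() runs statements and always returns None, which explains the result above, whereas eval() evaluates one expression and returns its value. A closure avoids building source strings altogether. The names make_tag_filter and FakeTag are hypothetical and only illustrate the idea; FakeTag is a stand-in that mimics the few tag members the filter touches, so the snippet runs without BeautifulSoup.

```python
# exec() runs statements and always returns None; eval() evaluates a single
# expression and returns its value, so it can hand back the lambda object.
source = 'lambda tag: tag.name == "article"'
assert exec(source) is None
name_filter = eval(source)  # a real callable, but only safe for trusted strings


def make_tag_filter(name="", search_terms=None, attrs=()):
    """Build the same kind of predicate as a closure, with no source strings."""
    search_terms = dict(search_terms or {})
    # attrs already covered by search_terms need no separate has_attr check
    attrs = [attr for attr in attrs if attr not in search_terms]

    def tag_filter(tag):
        if name and tag.name != name:
            return False
        for key, value in search_terms.items():
            if not tag.has_attr(key) or tag[key] != value:
                return False
        return all(tag.has_attr(attr) for attr in attrs)

    return tag_filter


class FakeTag:
    """Hypothetical stand-in exposing only what the filter touches."""

    def __init__(self, name, **attributes):
        self.name = name
        self._attributes = attributes

    def has_attr(self, key):
        return key in self._attributes

    def __getitem__(self, key):
        return self._attributes[key]


article_filter = make_tag_filter(name="article", search_terms={"id": "article"})
matching = FakeTag("article", id="article")
other = FakeTag("div", id="article")
print(article_filter(matching), article_filter(other))  # True False
print(name_filter(matching))  # True
```

With a real document, the returned function would be passed to find_all in the same way as the lambda. The closure version also sidesteps quoting problems when a name or value contains a double quote, which would break the generated string.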

If you're calling `has_attr` on the tag, shouldn't you be looking up `tag.attr` rather than `tag[attr]`?
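
On the attribute lookup: as far as I know, BeautifulSoup treats dotted access such as tag.id as a search for a child tag named id, so tag[attr] is the form that reads an attribute. Tag.get(attr) returns None when the attribute is missing, which lets the has_attr check and the indexing collapse into one call. The sketch below assumes that behaviour; FakeTag and attr_equals are hypothetical names, and FakeTag models only get().

```python
class FakeTag:
    """Hypothetical stand-in for a bs4 Tag; only get() is modelled here."""

    def __init__(self, name, **attributes):
        self.name = name
        self._attributes = attributes

    def get(self, key, default=None):
        return self._attributes.get(key, default)


def attr_equals(key, value):
    """Return a predicate that is true when the tag's `key` attribute equals value."""
    return lambda tag: tag.get(key) == value


has_article_id = attr_equals("id", "article")
print(has_article_id(FakeTag("article", id="article")))  # True
print(has_article_id(FakeTag("article")))  # False
```

One caveat: comparing against None would also match tags that lack the attribute, so an explicit has_attr check is still the safer choice in that case.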