2017-03-07 44 views

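Using find_all in BS4 to get text as a list

I'll start by saying that I'm very new to Python. I've been building a Discord bot with discord.py and Beautiful Soup 4, and here's where I'm at: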

@commands.command(hidden=True)
async def roster(self):
    """Gets a list of CD's members"""
    url = "http://www.clandestine.pw/roster.html"
    async with aiohttp.get(url) as response:
        soupObject = BeautifulSoup(await response.text(), "html.parser")
    try:
        text = soupObject.find_all("font", attrs={'size': '4'})
        await self.bot.say(text)
    except:
        await self.bot.say("Not found!")

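Here's the output: http://puu.sh/uycBF/1efe173437.png

Now, I've tried using get_text() in several different ways to strip the brackets and HTML tags from this output, but it throws an error every time. How can I either do that, or put this data into an array or list and then print only the plain text?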


Which versions of Python and Beautiful Soup are you using? I'm assuming it's Python >= 3.5 given the async/await syntax.

Answers

0

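find_all returns a list of Tag objects from BeautifulSoup, and the brackets you're seeing come from that list being printed. get_text() is a method of each individual Tag, not of the list, which is why calling it on the result of find_all raises an error. Call it on each element instead.

Either return them as a list of strings: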

# Python 3: compare the encoded text against bytes (b"\xe2") and keep the results as str.
text = [
    member.get_text().strip()
    for member in soupObject.find_all("font", attrs={'size': '4'})
    if not member.get_text().encode("utf-8").startswith(b"\xe2")
]

Or as a single string:

# Same filter as above, joined into one comma-separated str.
text = ",".join(
    member.get_text()
    for member in soupObject.find_all("font", attrs={'size': '4'})
    if not member.get_text().encode("utf-8").startswith(b"\xe2")
)
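
The answer does not say what the `\xe2` check is for. One plausible reading, offered here as an assumption, is that it skips entries that begin with a bullet or similar symbol: every character from U+2000 to U+2FFF (which includes "•" and "–") encodes to UTF-8 with 0xE2 as its first byte. A minimal sketch with made-up sample values (the helper name `starts_with_e2` is hypothetical):

```python
def starts_with_e2(text):
    """Return True if the UTF-8 encoding of text begins with byte 0xE2."""
    return text.encode("utf-8").startswith(b"\xe2")


# Hypothetical sample values, not taken from the real roster page.
samples = ["Alice", "\u2022 Officers", "  Bob  "]

# Drop entries that start with a U+2000..U+2FFF character, then trim whitespace.
kept = [s.strip() for s in samples if not starts_with_e2(s)]
print(kept)
```

Note that in Python 3 the argument to `startswith` has to be a bytes literal (`b"\xe2"`) because `encode` returns bytes.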
0

Replace

text = soupObject.find_all("font", attrs={'size': '4'}) 

with this:

all_font_tags = soupObject.find_all("font", attrs={'size': '4'}) 
list_of_inner_text = [x.text for x in all_font_tags] 
# If you want to print the text as a comma separated string 
text = ', '.join(list_of_inner_text)
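
To try the approach without the bot or the network, here is a small self-contained sketch. The HTML string is a made-up stand-in for the roster page (the real markup may differ), and `get_text(strip=True)` is used here to trim surrounding whitespace, which the answers above do with `.strip()` or not at all.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the roster page; the real page may differ.
html = """
<html><body>
<font size="4">Alice</font>
<font size="2">not a member</font>
<font size="4"> Bob </font>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns a list-like ResultSet of Tag objects, not text.
tags = soup.find_all("font", attrs={"size": "4"})

# get_text() lives on each Tag, so call it per element.
names = [tag.get_text(strip=True) for tag in tags]

# Join into one plain string, which is what a message-sending call expects.
text = ", ".join(names)
print(text)
```

In the bot, the joined `text` string would then be passed to `self.bot.say(...)` in place of the raw `find_all` result.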