How to remove span from td with Beautiful Soup (Python 3.5)

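I am scraping the Yahoo Finance website to get company stock data. I use Beautiful Soup to extract the td tags, but I want to remove the span tags and have not been able to. Below are a few lines of the HTML from which I need to extract the text.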

[<td class="Py(10px) Ta(start)" data-reactid="53"><span data-reactid="54">31-Jul-2017</span></td>,
 <td class="Py(10px)" data-reactid="55"><span data-reactid="56">991.90</span></td>,
 <td class="Py(10px)" data-reactid="57"><span data-reactid="58">1,021.70</span></td>,
 <td class="Py(10px)" data-reactid="59"><span data-reactid="60">986.75</span></td>,
 <td class="Py(10px)" data-reactid="61"><span data-reactid="62">1,011.20</span></td>]

My code below gives me the output above.

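# imports inferred from the aliases used below (not shown in the original post)
import urllib.request as url
from bs4 import BeautifulSoup as soup

INFY = url.urlopen("https://in.finance.yahoo.com/quote/INFY.NS/history?p=INFY.NS")
INFYHis = INFY.read()
INFYSoup = soup(INFYHis, 'html.parser')
INFYtd = INFYSoup.findAll("td", {"class": "Py(10px)"})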

I am very new to Python and not sure how to remove the span tags or get the text for my analysis.

Source

2017-07-31 Keerthesh Kumar

So do you want to remove the span, or just get the text?

Yes, I need to get the text, in tabular form, so that I can use it as a pandas DataFrame.
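Since the goal stated here is the cell text in tabular form, one possible approach is to leave the span tags in place and read each cell's text with get_text(). The following is only a minimal sketch: the inline sample reuses the cells from the question, but the surrounding table and tr tags are an assumption about the page layout, added so the cells can be grouped by row.

```python
from bs4 import BeautifulSoup

# Sample cells copied from the question. The <table>/<tr> wrapper is an
# assumption about the page structure, not something shown in the question.
html = """
<table><tr>
<td class="Py(10px) Ta(start)" data-reactid="53"><span data-reactid="54">31-Jul-2017</span></td>
<td class="Py(10px)" data-reactid="55"><span data-reactid="56">991.90</span></td>
<td class="Py(10px)" data-reactid="57"><span data-reactid="58">1,021.70</span></td>
<td class="Py(10px)" data-reactid="59"><span data-reactid="60">986.75</span></td>
<td class="Py(10px)" data-reactid="61"><span data-reactid="62">1,011.20</span></td>
</tr></table>
"""

page = BeautifulSoup(html, "html.parser")

# get_text() returns the text of a tag and all its descendants,
# so the <span> wrapper never needs to be removed explicitly.
rows = []
for tr in page.find_all("tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td", class_="Py(10px)")]
    if cells:  # skip rows without matching cells
        rows.append(cells)

print(rows)
# [['31-Jul-2017', '991.90', '1,021.70', '986.75', '1,011.20']]
```

A list of rows like this can be passed to pandas.DataFrame(rows, columns=[...]) with column names of your choosing. The values are still strings, so the thousands separators would need to be stripped before converting them to numbers.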

You can use BeautifulSoup's unwrap() method.

INFYSoup = soup(INFYHis,'html.parser') 

for match in INFYSoup.find_all('span'):  # add these two extra lines
    match.unwrap()                       # to strip the <span> tags first, keeping their text

# then proceed as usual 
INFYtd=INFYSoup.findAll("td",{"class":"Py(10px)"}) 

for child in INFYtd: 
    print(child)

Demo:

<td class="Py(10px) Ta(start)" data-reactid="53">31-Jul-2017</td> 
<td class="Py(10px)" data-reactid="55">991.90</td> 
... 
... 
<td class="Py(10px)" data-reactid="1540">992.59</td> 
<td class="Py(10px)" data-reactid="1542">30,89,588</td>

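Implemented based on the answer in the linked duplicate.

Just add these two extra lines, marked in the comments, before extracting the Py(10px) cells, so that the <span> tags are filtered out of INFYSoup first (Removing span tags from soup BeautifulSoup/Python).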
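To try the effect of unwrap() without fetching the live page, here is a small offline sketch. It runs on a single cell copied from the question, and the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# One cell copied from the question, used as an offline sample.
html = '<td class="Py(10px)" data-reactid="55"><span data-reactid="56">991.90</span></td>'
page = BeautifulSoup(html, "html.parser")

# unwrap() replaces a tag with its own contents, so each <span>
# disappears while the text inside it stays in the parent <td>.
for span in page.find_all("span"):
    span.unwrap()

td = page.find("td")
print(td)             # the <td> no longer contains a <span>
print(td.get_text())  # 991.90
```

unwrap() modifies the parsed tree in place, so it has to run before the td cells are printed or otherwise serialized.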

Source

2017-07-31 22:18:16 davedwards

Thanks, I tried your code and it removed the spans, but I found another piece of code, which I used to get it working.

@KeertheshKumar, glad to hear you got it working! Well done! – davedwards
