2014-07-23 68 views
1

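Parsing HTML with BS4 in Python

I am trying to parse a website that contains the following HTML.

I am using Python and BeautifulSoup.

How do I extract the text "Texas Rangers" from this?

I am having trouble because the text is not inside an element with a class. Thanks,

Matt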

<div class="team"> 
      <span class="team-logo mlb tex"></span>Texas Rangers 
          <br /> 
       <a class="fancy" href="/split_stats/index/Baseball/Pitcher/107">BvP</a> 
       &middot; 


           <a class="fancy" href="/split_stats/index/Baseball/Righty/107">vs. R</a> 
       &middot; 

       <a class="fancy" href="/split_stats/index/Baseball/Away/107">Away</a> 
       &middot; 

           <a class="fancy" href="/split_stats/index/Baseball/Night/107">Night</a> 

        </div> 

Answers

2

Not necessarily the best solution, but this works.

>>> from bs4 import BeautifulSoup 
>>> soup = BeautifulSoup(htmlCode) 
>>> soup.div.contents[2].strip() 
u'Texas Rangers' 
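
For readers who want to run this outside a REPL, here is a minimal self-contained sketch of the same index-based idea. It assumes Python 3, the `bs4` package, and the built-in `html.parser` backend (the answer does not name a parser). The `html_code` string is a trimmed copy of the question's markup and the variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (assumption: one team <div>).
html_code = """<div class="team">
      <span class="team-logo mlb tex"></span>Texas Rangers
          <br />
       <a class="fancy" href="/split_stats/index/Baseball/Pitcher/107">BvP</a>
</div>"""

soup = BeautifulSoup(html_code, "html.parser")
div = soup.find("div", class_="team")

# div.contents[0] is the whitespace before <span>,
# div.contents[1] is the <span> itself,
# div.contents[2] is the bare text node that follows it.
team_name = div.contents[2].strip()
print(team_name)
```

The index 2 depends on the exact whitespace and element order inside the `<div>`, so this is likely to break if the page layout changes.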
0

I would use the following code, run in my IPython session:

In [28]: htmldoc = """<div class="team"> 
    ....: <span class="team-logo mlb tex"></span>Texas Rangers 
    ....: <br /> 
    ....: <a class="fancy" href="/split_stats/index/Baseball/Pitcher/107">BvP</a> 
    ....: &middot; 
    ....: <a class="fancy" href="/split_stats/index/Baseball/Righty/107">vs. R</a> 
    ....: &middot; 
    ....: <a class="fancy" href="/split_stats/index/Baseball/Away/107">Away</a> 
    ....: &middot; 
    ....: <a class="fancy" href="/split_stats/index/Baseball/Night/107">Night</a> 
    ....: </div> 
    ....: """ 

In [29]: from bs4 import BeautifulSoup 

In [30]: soup = BeautifulSoup(htmldoc) 

In [31]: import re 

In [32]: soup(text=re.compile('Texas Rangers')) 
Out[32]: [u'Texas Rangers\n'] 
+0

The whole point of Beautiful Soup is to parse HTML pages. If the text is different (anything other than "Texas Rangers"), your code will not work. – JRodDynamite
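
One way to address this concern is to locate the text by its position relative to the logo `<span>` instead of by its content. The sketch below is only an illustration, not something proposed in the thread: the helper name `extract_team_name` is hypothetical, and it assumes that the team name is always the text node immediately after a `<span>` carrying the `team-logo` class, as in the question's markup.

```python
from bs4 import BeautifulSoup, NavigableString


def extract_team_name(html):
    """Return the text right after the team-logo <span>, or None.

    Hypothetical helper: assumes the name directly follows the span.
    """
    soup = BeautifulSoup(html, "html.parser")
    # class_ matches any one of the element's classes,
    # so this finds <span class="team-logo mlb tex">.
    logo = soup.find("span", class_="team-logo")
    if logo is None:
        return None
    sibling = logo.next_sibling
    # The next sibling must be a text node, not another tag.
    if isinstance(sibling, NavigableString):
        return sibling.strip()
    return None


sample = """<div class="team">
  <span class="team-logo mlb tex"></span>Texas Rangers
  <br />
  <a class="fancy" href="/split_stats/index/Baseball/Away/107">Away</a>
</div>"""

print(extract_team_name(sample))
```

Because the lookup keys on the `team-logo` class and not on the words "Texas Rangers", the same function should work for other teams as long as the markup keeps this shape.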