2013-10-17 48 views

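I want to apply rich snippet markup to my web pages, following the http://schema.org/Article standard. One of its properties is articleBody, which I expect should contain the entire text that makes up the article. How can I exclude content from a rich snippet element?

Unfortunately, the HTML representation of the article occasionally contains buttons, ads, and other prompts whose text should not end up in articleBody.

For example: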

<div itemscope itemtype="http://schema.org/Article"> 
    <div itemtype="articleBody"> 
    <p>1st Paragraph</p> 
    <p>2nd paragraph</p> 
    <a>A few useful links for my users</a> 
    <p>3rd paragraph</p> 
    <div>A few text ads</div> 
    <p>4th paragraph</p> 
    </div> 
</div> 

Is there a way to exclude the ad and link text from the article itself?


Note that you have an error in your code: 'itemtype="articleBody"' should be 'itemprop="articleBody"'. – unor

Answer


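No, Microdata does not provide a way to exclude content.

The articleBody value will be the textContent of the element, so the text of every descendant (links and ads included) becomes part of the value.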
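To make the effect concrete, here is a minimal Python sketch of how a consumer might compute that value. It is not a conforming Microdata parser: the class ItempropText and the helper text_values are hypothetical names, only the standard library's html.parser is used, and the markup is the question's example with the itemprop fix from the comment applied.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they must not change the nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ItempropText(HTMLParser):
    """Collects the text content of every element carrying a given itemprop."""

    def __init__(self, prop):
        super().__init__()
        self.prop = prop
        self.depth = 0    # nesting depth inside a matching element (0 = outside)
        self.values = []  # one string per matching element, in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        if self.depth:
            self.depth += 1
        elif self.prop in (dict(attrs).get("itemprop") or "").split():
            self.depth = 1
            self.values.append("")

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.values[-1] += data


def text_values(html, prop="articleBody"):
    parser = ItempropText(prop)
    parser.feed(html)
    parser.close()
    # Whitespace is collapsed only to keep the output readable;
    # the real textContent keeps it as-is.
    return [" ".join(value.split()) for value in parser.values]


QUESTION_MARKUP = """
<div itemscope itemtype="http://schema.org/Article">
  <div itemprop="articleBody">
    <p>1st Paragraph</p>
    <p>2nd paragraph</p>
    <a>A few useful links for my users</a>
    <p>3rd paragraph</p>
    <div>A few text ads</div>
    <p>4th paragraph</p>
  </div>
</div>
"""

# Prints a single value that still contains the link text and the ad text.
print(text_values(QUESTION_MARKUP))
```

Nothing in the markup lets the inner a or div opt out, which is why the workarounds below move the itemprop attribute instead.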


An ugly "hack" would be to specify several articleBody properties for this item:

<div itemscope itemtype="http://schema.org/Article">
    <div itemprop="articleBody">
    <p>1st Paragraph</p>
    <p>2nd paragraph</p>
    </div>
    <a>A few useful links for my users</a>
    <p itemprop="articleBody">3rd paragraph</p>
    <div>A few text ads</div>
    <p itemprop="articleBody">4th paragraph</p>
</div>

But note that Microdata does not define how those values should be interpreted, so it is up to the consumer.
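As an illustration of why this matters, the sketch below shows two plausible but different consumer behaviours. Everything in it is an assumption made for the example: the item dictionary is a hypothetical parse result (its shape is loosely modelled on Microdata's JSON form, with whitespace normalized), and body_joined and body_first are made-up names, not the behaviour of any particular search engine or library.

```python
# Hypothetical result of parsing the markup with several articleBody
# properties: one Article item whose property has three values,
# in document order.
item = {
    "type": ["http://schema.org/Article"],
    "properties": {
        "articleBody": [
            "1st Paragraph 2nd paragraph",
            "3rd paragraph",
            "4th paragraph",
        ],
    },
}


def body_joined(item, separator="\n\n"):
    """Consumer A: treat every value as a fragment of one article body."""
    return separator.join(item["properties"].get("articleBody", []))


def body_first(item):
    """Consumer B: expect a single value and keep only the first one."""
    values = item["properties"].get("articleBody", [])
    return values[0] if values else ""


print(repr(body_joined(item)))  # all four paragraphs survive
print(repr(body_first(item)))   # the 3rd and 4th paragraphs are lost
```

A consumer that behaves like body_first would silently drop most of the article, which is the risk of this hack.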


An even uglier approach:

Duplicate the information and put the copy in a meta element:

<div itemscope itemtype="http://schema.org/Article"> 
    <div> 
    <p>1st Paragraph</p> 
    <p>2nd paragraph</p> 
    <a>A few useful links for my users</a> 
    <p>3rd paragraph</p> 
    <div>A few text ads</div> 
    <p>4th paragraph</p> 
    </div> 
    <meta itemprop="articleBody" content="1st Paragraph. 2nd paragraph. 3rd paragraph. 4th paragraph." />
</div>
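If the page is generated server-side, the duplicated value does not have to be maintained by hand. The following is a minimal sketch, assuming the template already has the clean paragraph texts available as a list; article_body_meta is a hypothetical helper name, and html.escape from the standard library keeps quotes and ampersands from breaking the attribute.

```python
from html import escape


def article_body_meta(paragraphs):
    """Build a meta element whose content joins the non-empty paragraphs."""
    content = " ".join(p.strip() for p in paragraphs if p.strip())
    return '<meta itemprop="articleBody" content="{}" />'.format(
        escape(content, quote=True)
    )


print(article_body_meta([
    "1st Paragraph.",
    "2nd paragraph.",
    "3rd paragraph.",
    "4th paragraph.",
]))
```

Generating the tag from the same source as the visible paragraphs at least keeps the two copies from drifting apart.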