2015-09-11 65 views

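Parsing microdata

When I try to use the Jsoup library to extract the itemprop values found inside every element that has an itemtype attribute in the HTML source, I get empty results.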

Below is the body of the sample HTML page:

<body class="single single-post postid-2334 single-format-standard custom-header header-image header-full-width full-width-content" itemscope="itemscope" itemtype="http://schema.org/WebPage"><div class="site-container"><header class="site-header" role="banner" itemscope="itemscope" itemtype="http://schema.org/WPHeader"><div class="wrap"><div class="title-area"><p class="site-title" itemprop="headline"><a href="http://blogbasics.com/">Blog Basics</a></p><div id="title_image"><a href="http://blogbasics.com/" title="Blog Basics"><img src="http://blogbasics.com/wp-content/uploads/cropped-cropped-Win-1.png" title="Blog Basics" /></a><style>#title { display:none; }</style></div><p class="site-description" itemprop="description">Starting a blog? Learn how to make it amazing.</p></div></div></header><nav class="nav-primary" role="navigation" itemscope="itemscope" itemtype="http://schema.org/SiteNavigationElement"><div class="wrap"><ul id="menu-primary-navigation" class="menu genesis-nav-menu menu-primary"><li id="menu-item-2590" class="menu-item menu-item-type-custom menu-item-object-custom menu-item-home menu-item-2590"><a title="Blog Basics" href="http://blogbasics.com">Home</a></li> 
<li id="menu-item-3187" class="menu-item menu-item-type-custom menu-item-object-custom menu-item-3187"><a href="http://blogbasics.com/blog">Blog</a></li> 
<li id="menu-item-3722" class="menu-item menu-item-type-custom menu-item-object-custom menu-item-3722"><a href="http://blogbasics.com/welcome">Free Updates</a></li> 
<li id="menu-item-2578" class="menu-item menu-item-type-post_type menu-item-object-page menu-item-2578"><a title="Blogging Tools" href="http://blogbasics.com/blogging-tools/">Blogging Tools</a></li> 
</ul></div></nav><div class="site-inner"><div class="feature-area widget-area"> 
<div id="spyr_tru_notifybar-2" class="widget notify_bar"><div class="widget-wrap">Starting a blog? Learn how to make it awesome!</div></div> 

<div id="spyr_tru_twocolumn-3" class="widget widget_spyr_tru_twocolumn"><div class="widget-wrap"> 
<div class="column one-half first original"><div align="middle"><a href="https://curlcentric.leadpages.net/leadbox/14581e773f72a2%3A12e927026b46dc/5758142528880640/" target="_blank"><img src="https://lh3.ggpht.com/wMcDasimpny0oKmsIxI0xowRIxbFpaa9Rjg3aAUMG8UUtp4XamG03gYcGlsXTRmvkFqgySaihhq2_KCSr8cN7Q=s0"></a><script data-leadbox="14581e773f72a2:12e927026b46dc" data-url="https://curlcentric.leadpages.net/leadbox/14581e773f72a2%3A12e927026b46dc/5758142528880640/" data-config="%7B%7D" type="text/javascript" src="https://curlcentric.leadpages.net/leadbox-910.js"></script></div> 
</div> 
<div class="column one-half last original"><p>Learn how to build a blog that generates traffic, revenue, & popularity in 30 days.</p> 
<p>Just enter your email address in the box below and click "Submit".</p> 
</div> 
<div class="clear"></div> 
</div></div> 
<div id="spyr_tru_subscribesocial-2" class="widget feature-area-bottom tru_subscribe_social"><div class="widget-wrap"> 
<div class="tru_subscribesocial_wrap"> 
    <form action="http://www.aweber.com/scripts/addlead.pl" method="post" target="_blank"> 
     <div class="hidden_fields"><input type="hidden" name="meta_web_form_id" value="276964962" /> 
<input type="hidden" name="meta_split_id" value="" /> 
<input type="hidden" name="listname" value="awlist3567293" /> 
<input type="hidden" name="redirect" value="http://www.aweber.com/thankyou-coi.htm?m=text" id="redirect_f956eccce03104dc62dec5f8c897285e" /> 

<input type="hidden" name="meta_adtracking" value="Blog_Basics" /> 
<input type="hidden" name="meta_message" value="1" /> 
<input type="hidden" name="meta_required" value="email" /> 

<input type="hidden" name="meta_tooltip" value="" /></div> 
     <input type="email" class="default_value" name="email" value="Enter email to get updates" /></span> 
     <input type="submit" value="Submit" /> 
     </form> 
    <div class="social_menu"> 
     <ul id="menu-social" class="menu superfish"> 

      </ul> 
     </div> 
    <div class="clear"></div> 
    </div> 
</div></div> 
</div><div class="content-sidebar-wrap"><main class="content" role="main" itemprop="mainContentOfPage" itemscope="itemscope" itemtype="http://schema.org/Blog"><article class="post-2334 post type-post status-publish format-standard has-post-thumbnail category-blog-basics entry" itemscope="itemscope" itemtype="http://schema.org/BlogPosting" itemprop="blogPost"><header class="entry-header"><h1 class="entry-title" itemprop="headline">Examples of Blogs</h1> 
<p class="entry-meta">by <span class="entry-author" itemprop="author" itemscope="itemscope" itemtype="http://schema.org/Person"><span class="entry-author-name" itemprop="name">Kenneth Byrd</span></span> | Go from 0 to 5,000 blog subscribers in 60 days <a href="https://curlcentric.leadpages.net/leadbox/14581e773f72a2%3A12e927026b46dc/5758142528880640/" rel="nofollow">(Click Here)</a></p></header><img src="http://blogbasics.com/wp-content/uploads/Examples-of-Blogs.jpg" width="5315" height="3543" alt="examples of blogs" title="" class="attachment-tru-post wp-post-image" /><div class="entry-content" itemprop="text"><h3>Overview</h3> 
<p>This article includes examples of blogs from various niches. There are millions of example blogs out there in all different shapes and sizes. A good place to start is <a href="http://technorati.com/" target="_blank">Technorati</a>, a directory of blogs, or <a href="http://alltop.com/" target="_blank">Alltop</a>. Search these websites and then come back and tell us about the good blogs and the bad blogs that you found. Below are also more examples of blogs that you should look at:</p> 
<h3><strong>Personal blogs</strong></h3> 
<p><a title="Curl Centric" href="http://www.curlcentric.com/natural-hair-101/" target="_blank">Curl Centric</a>: Dedicated to providing healthy hair care information.</p> 
<h3>Travel</h3> 
<p><a href="http://boardingarea.com/" target="_blank">Boarding Area</a>:  A collection of bloggers on travel.  Range from personal stories to specific advice on airlines, hotels and places.</p> 
<div class="content-box-yellow"><a href="https://curlcentric.leadpages.net/leadbox/14581e773f72a2%3A12e927026b46dc/5758142528880640/">Learn how to start a blog that generates traffic, revenue, &amp; popularity.</a></div> 
<p><a href="http://vivisrandomramblings.blogspot.com/" target="_blank">Vivi&#8217;s Random Ramblings</a>: A nice collection of random posts mostly demonstrating that Violy is a well-travelled, excellent photographer.</p> 
<!-- Quick Adsense WordPress Plugin: http://quicksense.net/ --> 
<div style="float:none;margin:5px 0 5px 0;text-align:center;"> 
<script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script> 
<!-- Blog Basics - 300 x 250 --> 
<ins class="adsbygoogle" 
    style="display:inline-block;width:300px;height:250px" 
    data-ad-client="ca-pub-5556427932737077" 
    data-ad-slot="6553509385"></ins> 
<script> 
(adsbygoogle = window.adsbygoogle || []).push({}); 
</script> 
</div> 

<p><a href="http://www.whygo.com/" target="_blank">Why go network of blogs</a>: Another group of travel bloggers.  Each blogger has their own patch, which range from Portland, which looks a nice city, to Iceland and France.</p> 
<h3>Technical</h3> 
<p><a href="http://techcrunch.com/" target="_blank">Techcrunch</a>:  This is the one to learn all about technology and in particular technology business, technology start-ups and gadgets.  You&#8217;ll usually hear the techie gossip here first.</p> 
<p><a href="http://speckyboy.com/2010/02/25/50-amazing-personal-blog-web-designs/" target="_blank">Speckyboy.com</a>: Great blog on the design of websites.  Good on lists, (usually 50) of well researched examples of good or unusual design.  Gives even the least technical good ideas to discuss with their own designers.</p> 
<h3>On Blogging</h3> 
<p><a href="http://www.trafficgenerationcafe.com/" target="_blank">Traffic Generation Cafe</a>: Ana Hoffman&#8217;s very friendly, very knowledgeable blog on building traffic for your blog.</p> 
<div class="content-box-yellow"><a href="https://curlcentric.leadpages.net/leadbox/14581e773f72a2%3A12e927026b46dc/5758142528880640/">Learn how to start a blog that generates traffic, revenue, &amp; popularity.</a></div> 
<p><a href="http://blogbasics.com/blog/" target="_blank">Blog Basics</a>: This website is a blog that is focused on topics like &#8216;how to blog&#8217; and &#8216;how to make money blogging&#8217;.</p> 
<h3>Over to you</h3> 
<p>Which blogs do you like?  Are you writing a blog?  Then tell us about it.</p> 

<!-- Quick Adsense WordPress Plugin: http://quicksense.net/ --> 
<div style="float:none;margin:5px 0 5px 0;text-align:center;"> 
<script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script> 
<!-- Banner --> 
<ins class="adsbygoogle" 
    style="display:inline-block;width:468px;height:60px" 
    data-ad-client="ca-pub-5556427932737077" 
    data-ad-slot="1983708988"></ins> 
<script> 
(adsbygoogle = window.adsbygoogle || []).push({}); 
</script> 
</div> 

<div style="font-size:0px;height:0px;line-height:0px;margin:0;padding:0;clear:both"></div><div style="clear:both;"></div><div id='ois-1' class='ois-design' ><div class="ois-outer ois-8-outer"> 
    <div class="ois-8-call-top"></div> 
    <div class="ois-8-inner ois-inner"> 
     <div class="col-md-7 ois-8-left"> 
      <div class="ois-8-title">Get Exclusive Tips</div> 
      <div class="ois-8-subtitle">Instantly discover how you can start a blog that generates traffic and income when you join the Blog Basics Tribe (It’s Free). Here's your chance. Just type in your email address.</div> 
     </div> <!-- .span7 left side -->  
     <div class="col-md-5 ois-8-right"> 
      <div class="ois-8-img-wrapper"> 
       <img src="data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==" data-src="https://lh3.ggpht.com/wMcDasimpny0oKmsIxI0xowRIxbFpaa9Rjg3aAUMG8UUtp4XamG03gYcGlsXTRmvkFqgySaihhq2_KCSr8cN7Q=s0" class="ois-img ois-8-img" /><noscript><img src="https://lh3.ggpht.com/wMcDasimpny0oKmsIxI0xowRIxbFpaa9Rjg3aAUMG8UUtp4XamG03gYcGlsXTRmvkFqgySaihhq2_KCSr8cN7Q=s0" class="ois-img ois-8-img" /></noscript> 
      </div> 
      <div class="ois-8-form"> 
       <form action="http://www.aweber.com/scripts/addlead.pl" method="post" id="ois-form-1" data-service="aweber" ><div id="ois-8-email-input-wrapper"> 
    <input type="text" name="email" class="ois-8-email-input ois-email-input ois-form-control" placeholder="Your Email"/> 
</div> 
<div id="ois-8-button-wrapper"> 
    <input type="submit" class="ois-btn ois-8-button" value="Submit"/> 
</div><input type='hidden' name='listname' value='awlist3567293'/> 
<input type='hidden' name='meta_message' value='1'/> 
<input type='hidden' name='redirect' value='http://www.aweber.com/thankyou-coi.htm?m=video&e=example%40example.com&name=Example%20Subscriber&l=awlist3567293'/> 
</form> 
      </div> <!-- #ois-8-form --> 
     </div><!-- .right .col-md-5 right side--> 
     <div style="clear:both"></div> 
    </div> <!-- inner --> 
</div> <!-- outer --></div></div> 
<div class="spyr_sliding_share"> 
    <div class="spyr_sliding_share_text">Share this article</div> 
    <div class="spyr_sliding_share_wrap"> 
      <div class="spyr_sliding_share_button spyr_sb_facebook"> 
       <a href="#" class="icon icon-facebook"><span>Facebook</span></a> 
       <div class="spyr_sb_inner"><div class="fb-like" data-href="http://blogbasics.com/examples-of-blogs/" data-send="false" data-layout="button_count" data-width="100" data-show-faces="false"></div></div> 
       </div> 
      <div class="spyr_sliding_share_button spyr_sb_twitter"> 
       <a href="#" class="icon icon-twitter"><span>Twitter</span></a> 
       <div class="spyr_sb_inner"><a href="https://twitter.com/share" class="twitter-share-button" data-url="http://blogbasics.com/examples-of-blogs/" data-text="Examples of Blogs | Blog Basics" data-via="kbyrdjr">Tweet</a></div> 
       </div> 
      <div class="spyr_sliding_share_button spyr_sb_gplus"> 
       <a href="#" class="icon icon-gplus"><span>Google+</span></a> 
       <div class="spyr_sb_inner"><div class="g-plusone" data-size="medium" data-href="http://blogbasics.com/examples-of-blogs/"></div></div> 
       </div> 
      <div class="spyr_sliding_share_button spyr_sb_pinterest"> 
       <a href="#" class="icon icon-pinterest"><span>Pinterest</span></a> 
       <div class="spyr_sb_inner"><a href="http://pinterest.com/pin/create/button/?url=http://blogbasics.com/examples-of-blogs/&media=http://blogbasics.com/wp-content/uploads/Examples-of-Blogs-550x367.jpg&description=Examples of Blogs" class="pin-it-button" count-layout="horizontal"><img border="0" src="//assets.pinterest.com/images/PinExt.png" title="Pin It" /></a></div> 
       </div> 
      <div class="spyr_sliding_share_button spyr_sb_mail"> 
       <a href="#" class="icon icon-mail"><span>Email a Friend</span></a> 
       <div class="spyr_sb_inner"><a href="mailto:?subject=Examples of Blogs&body=I found value in this and I think you will too.%0A%0AExamples of Blogs: http://blogbasics.com/examples-of-blogs/">Email a Friend</a></div> 
       </div> 
     </div> 
    <div class="clear"></div> 
    </div><footer class="entry-footer"></footer></article><div class="entry-comments" id="comments"><h3>Comments</h3><ol class="comment-list"> 
    <li class="comment even thread-even depth-1" id="comment-261"> 
    <article itemprop="comment" itemscope="itemscope" itemtype="http://schema.org/UserComments"> 


     <header class="comment-header"> 
      <p class="comment-author" itemprop="creator" itemscope="itemscope" itemtype="http://schema.org/Person"> 
       <img alt='' src="data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==" data-src="http://0.gravatar.com/avatar/9e0fce9a7b5f7de7f6c528f448007f08?s=48&#038;d=mm&#038;r=g" srcset='http://0.gravatar.com/avatar/9e0fce9a7b5f7de7f6c528f448007f08?s=96&amp;d=mm&amp;r=g 2x' class='avatar avatar-48 photo' height='48' width='48' /><noscript><img alt='' src="http://0.gravatar.com/avatar/9e0fce9a7b5f7de7f6c528f448007f08?s=48&#038;d=mm&#038;r=g" srcset='http://0.gravatar.com/avatar/9e0fce9a7b5f7de7f6c528f448007f08?s=96&amp;d=mm&amp;r=g 2x' class='avatar avatar-48 photo' height='48' width='48' /></noscript><span itemprop="name">violy</span> <span class="says">says</span>   </p> 

      <p class="comment-meta"> 
       <time class="comment-time" datetime="2012-01-09T04:42:00+00:00" itemprop="commentTime"><a href="http://blogbasics.com/examples-of-blogs/#comment-261" class="comment-time-link" itemprop="url">January 9, 2012 at 4:42 am</a></time>   </p> 
     </header> 

     <div class="comment-content" itemprop="commentText"> 

      <p>Hi sir thank you so much for the nice compliment about my blog (Vivi&#8217;s Random Ramblings&#8221;), I&#8217;m blogging for not even 2 months now and it&#8217;s really overwhelming to see this compliment and getting a lot of good feedback  too and traffic which is a real surprise .. thank you so much!! &#8211; violy</p> 
     </div> 

     <div class="comment-reply"><a rel='nofollow' class='comment-reply-link' href='#comment-261' onclick='return addComment.moveForm("comment-261", "261", "respond", "2334")' aria-label='Reply to violy'>Reply</a></div> 

    </article> 
    <ul class="children"> 

    <li class="comment odd alt depth-2" id="comment-262"> 
    <article itemprop="comment" itemscope="itemscope" itemtype="http://schema.org/UserComments"> 


     <header class="comment-header"> 
      <p class="comment-author" itemprop="creator" itemscope="itemscope" itemtype="http://schema.org/Person"> 
       <img alt='' src="data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==" data-src="http://1.gravatar.com/avatar/4383ac0510061b683651a0eca3d58e42?s=48&#038;d=mm&#038;r=g" srcset='http://1.gravatar.com/avatar/4383ac0510061b683651a0eca3d58e42?s=96&amp;d=mm&amp;r=g 2x' class='avatar avatar-48 photo' height='48' width='48' /><noscript><img alt='' src="http://1.gravatar.com/avatar/4383ac0510061b683651a0eca3d58e42?s=48&#038;d=mm&#038;r=g" srcset='http://1.gravatar.com/avatar/4383ac0510061b683651a0eca3d58e42?s=96&amp;d=mm&amp;r=g 2x' class='avatar avatar-48 photo' height='48' width='48' /></noscript><span itemprop="name"><a href="http://blogbasics.com" class="comment-author-link" rel="external nofollow" itemprop="url">Paul Odtaa</a></span> <span class="says">says</span>   </p> 

      <p class="comment-meta"> 
       <time class="comment-time" datetime="2012-01-09T09:44:00+00:00" itemprop="commentTime"><a href="http://blogbasics.com/examples-of-blogs/#comment-262" class="comment-time-link" itemprop="url">January 9, 2012 at 9:44 am</a></time>   </p> 
     </header> 

     <div class="comment-content" itemprop="commentText"> 

      <p>Hi Violy, </p> 
<p>I really like your blog and your photography is great. </p> 
     </div> 

     <div class="comment-reply"><a rel='nofollow' class='comment-reply-link' href='#comment-262' onclick='return addComment.moveForm("comment-262", "262", "respond", "2334")' aria-label='Reply to Paul Odtaa'>Reply</a></div> 

    </article> 
    </li><!-- #comment-## --> 
</ul><!-- .children --> 
</li><!-- #comment-## --> 

    <li class="comment even thread-odd thread-alt depth-1" id="comment-270"> 
    <article itemprop="comment" itemscope="itemscope" itemtype="http://schema.org/UserComments"> 


     <header class="comment-header"> 
      <p class="comment-author" itemprop="creator" itemscope="itemscope" itemtype="http://schema.org/Person"> 
       <img alt='' src="data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==" data-src="http://1.gravatar.com/avatar/ad28f49a806d0e3abd1a94013a259b9b?s=48&#038;d=mm&#038;r=g" srcset='http://1.gravatar.com/avatar/ad28f49a806d0e3abd1a94013a259b9b?s=96&amp;d=mm&amp;r=g 2x' class='avatar avatar-48 photo' height='48' width='48' /><noscript><img alt='' src="http://1.gravatar.com/avatar/ad28f49a806d0e3abd1a94013a259b9b?s=48&#038;d=mm&#038;r=g" srcset='http://1.gravatar.com/avatar/ad28f49a806d0e3abd1a94013a259b9b?s=96&amp;d=mm&amp;r=g 2x' class='avatar avatar-48 photo' height='48' width='48' /></noscript><span itemprop="name"><a href="http://allisondduncan.com" class="comment-author-link" rel="external nofollow" itemprop="url">Allison Duncan</a></span> <span class="says">says</span>   </p> 

      <p class="comment-meta"> 
       <time class="comment-time" datetime="2012-01-20T21:17:00+00:00" itemprop="commentTime"><a href="http://blogbasics.com/examples-of-blogs/#comment-270" class="comment-time-link" itemprop="url">January 20, 2012 at 9:17 pm</a></time>   </p> 
     </header> 

     <div class="comment-content" itemprop="commentText"> 

      <p>Hi there,</p> 
<p>Thanks for featuring my blog on your site. It&#8217;s always nice to see your work being appreciated and linked to.</p> 
<p>I look forward to seeing what your site has coming down the pike.</p> 
<p>Thanks for reading!</p> 
<p>Allison</p> 
     </div> 

     <div class="comment-reply"><a rel='nofollow' class='comment-reply-link' href='#comment-270' onclick='return addComment.moveForm("comment-270", "270", "respond", "2334")' aria-label='Reply to Allison Duncan'>Reply</a></div> 

    </article> 
    </li><!-- #comment-## --> 

I am using the Jsoup library to parse the HTML and extract this data. I tried the following code:

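import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class MicrodataQuestion {
    public static void main(String[] args) {
        try {
            Document doc = Jsoup.connect("http://blogbasics.com/examples-of-blogs/").get();
            // Note: "itemtype" here is parsed as a tag name, not an attribute,
            // so this selector matches no elements on the page.
            Elements links = doc.select("itemtype > [itemprop]");
            for (Element element : links) {
                System.out.println(" itemprop :" + element.attr("itemprop"));
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}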

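But I get nothing back. I'm keen to get this working, so please let me know the correct code. If there is any other way to extract itemtype and itemprop from the HTML, please share it; that would be a great help.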

<div class="content-sidebar-wrap"> 
<main class="content" role="main" itemprop="mainContentOfPage" itemscope="itemscope" 
itemtype="http://schema.org/Blog"><article class="post-2334 post type-post status-publish 
format-standard has-post-thumbnail category-blog-basics entry" itemscope="itemscope" 
itemtype="http://schema.org/BlogPosting" itemprop="blogPost"><header class="entry-header"> 
<h1 class="entry-title" itemprop="headline">Examples of Blogs</h1> 
<p class="entry-meta">by <span class="entry-author" itemprop="author" itemscope="itemscope" 
itemtype="http://schema.org/Person"><span class="entry-author-name" itemprop="name">Kenneth Byrd</span></span> | 
Go from 0 to 5,000 blog subscribers in 60 days 
<a href="https://curlcentric.leadpages.net/leadbox/14581e773f72a2%3A12e927026b46dc/5758142528880640/" rel="nofollow">(Click Here)</a> 
</p></header><img src="http://blogbasics.com/wp-content/uploads/Examples-of-Blogs.jpg" width="5315" height="3543" 
alt="examples of blogs" title="" class="attachment-tru-post wp-post-image" /><div class="entry-content" 
itemprop="text"><h3>Overview</h3><p>This article includes examples of blogs 
from various niches. There are millions of example blogs out there in all 
different shapes and sizes. A good place to start is 
</p> 

Expected output

itemtype="http://schema.org/Blog"
itemprop="mainContentOfPage"

itemtype="http://schema.org/BlogPosting"
itemprop="blogPost"

itemtype="http://schema.org/Person"
itemprop="author"
itemprop="name"
itemprop="text"

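The reason your code doesn't work is that <main class="content" role="main" itemprop="mainContentOfPage" itemscope="itemscope" itemtype="http://schema.org/Blog"> is not inside an itemtype: the itemtype and the itemprop sit on the same node, not one nested inside the other. I'm not experienced with Jsoup, so I'll delete my answer. Someone else will help you. –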
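The comment's point can be checked in isolation. Below is a minimal sketch, assuming jsoup is on the classpath; the class name SelectorCheck and the one-element HTML string are hypothetical stand-ins for the page's main element, not part of the original question. In jsoup's selector syntax a bare word such as itemtype is a tag name, while [itemtype] matches an attribute:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SelectorCheck {
    // Hypothetical stand-in for the page's <main>: itemtype and itemprop
    // are on the same node, as the comment points out.
    static final String HTML =
        "<main itemprop=\"mainContentOfPage\" itemscope itemtype=\"http://schema.org/Blog\">Blog</main>";

    // Number of elements in HTML matched by a jsoup CSS query.
    static int count(String cssQuery) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(cssQuery).size();
    }

    public static void main(String[] args) {
        // "itemtype" without brackets is a tag name, and there is no <itemtype> tag.
        System.out.println(count("itemtype > [itemprop]"));   // 0
        // Square brackets select by attribute; both attributes are on <main>.
        System.out.println(count("[itemtype][itemprop]"));    // 1
    }
}
```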


Thanks a lot for the quick help. –


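Your question contains a lot of content that seems irrelevant. If you condense the HTML you are trying to parse into a minimal example, you will improve your chances of getting a response. – luksch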

Answer


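I'm not sure exactly what you want, but it seems you need all elements that carry both the itemtype and itemprop attributes, plus the elements that carry only itemprop but are direct children of an element with itemtype. If that's the case, you can use this: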

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ItemtypeItemprop {
    public static void main(String[] args) {
        String html = ""
             +"<div class=\"content-sidebar-wrap\">"
             +"<main class=\"content\" role=\"main\" itemprop=\"mainContentOfPage\" itemscope=\"itemscope\" "
             +"itemtype=\"http://schema.org/Blog\"><article class=\"post-2334 post type-post status-publish "
             +"format-standard has-post-thumbnail category-blog-basics entry\" itemscope=\"itemscope\" "
             +"itemtype=\"http://schema.org/BlogPosting\" itemprop=\"blogPost\"><header class=\"entry-header\">"
             +"<h1 class=\"entry-title\" itemprop=\"headline\">Examples of Blogs</h1> "
             +"<p class=\"entry-meta\">by <span class=\"entry-author\" itemprop=\"author\" itemscope=\"itemscope\" "
             +"itemtype=\"http://schema.org/Person\"><span class=\"entry-author-name\" itemprop=\"name\">Kenneth Byrd</span></span> |"
             +" Go from 0 to 5,000 blog subscribers in 60 days"
             +" <a href=\"https://curlcentric.leadpages.net/leadbox/14581e773f72a2%3A12e927026b46dc/5758142528880640/\" rel=\"nofollow\">(Click Here)</a>"
             +" </p></header><img src=\"http://blogbasics.com/wp-content/uploads/Examples-of-Blogs.jpg\" width=\"5315\" height=\"3543\" "
             +" alt=\"examples of blogs\" title=\"\" class=\"attachment-tru-post wp-post-image\" /><div class=\"entry-content\""
             +" itemprop=\"text\"><h3>Overview</h3><p>This article includes examples of blogs"
             +" from various niches. There are millions of example blogs out there in all "
             +" different shapes and sizes. A good place to start is "
             +" </p>"
             ;

        // Parse the fragment; "" is the base URI used to resolve relative links.
        Document doc = Jsoup.parse(html, "");

        // Match elements that carry both attributes, OR elements with itemprop
        // whose direct parent carries itemtype.
        Elements els = doc.select("*[itemtype][itemprop], *[itemtype] > *[itemprop]");
        for (Element el : els) {
            // Print the itemtype (when the element has one) on its own line, then the itemprop.
            System.out.print(el.attr("itemtype").isEmpty() ? "" : ("\n" + el.attr("itemtype") + "\n"));
            System.out.println(el.attr("itemprop"));
        }
    }
}

The important part is the selector *[itemtype][itemprop], *[itemtype] > *[itemprop], which has two main parts:

  1. *[itemtype][itemprop] selects elements that have both attributes.

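  2. *[itemtype] > *[itemprop] selects elements that have the itemprop attribute and are direct children of an element with the itemtype attribute. If you want to allow all descendants, not only direct children, just leave out the >.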
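The difference between the child combinator and the descendant (space) combinator can be seen on a trimmed-down copy of the question's HTML. This is a sketch assuming jsoup is on the classpath; the class ChildVsDescendant and its helper itemprops are hypothetical names used only for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ChildVsDescendant {
    // Trimmed-down version of the HTML from the question.
    static final String HTML =
        "<main itemprop=\"mainContentOfPage\" itemscope itemtype=\"http://schema.org/Blog\">"
      + "<article itemscope itemtype=\"http://schema.org/BlogPosting\" itemprop=\"blogPost\">"
      + "<header><h1 itemprop=\"headline\">Examples of Blogs</h1></header>"
      + "<div itemprop=\"text\"><p>Overview</p></div>"
      + "</article></main>";

    // Returns the itemprop value of every element matched by the query, in document order.
    static List<String> itemprops(String cssQuery) {
        Document doc = Jsoup.parse(HTML);
        List<String> props = new ArrayList<>();
        for (Element el : doc.select(cssQuery)) {
            props.add(el.attr("itemprop"));
        }
        return props;
    }

    public static void main(String[] args) {
        // Direct children only: the <h1> is skipped because its parent is <header>.
        System.out.println(itemprops("*[itemtype] > *[itemprop]"));  // [blogPost, text]
        // Any descendant: the <h1> inside <header> is now included.
        System.out.println(itemprops("*[itemtype] *[itemprop]"));    // [blogPost, headline, text]
    }
}
```

This is why the expected output in the question has no headline entry: the h1 sits inside header, which has no itemtype of its own.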

The comma between the selectors acts as an "OR", so every element that matches any of the listed selectors is returned.
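A small sketch of the "OR" behaviour, under the same assumption that jsoup is on the classpath (SelectorUnion and its size helper are hypothetical). Here the article element matches both branches, because it has both attributes and its parent has itemtype, yet the combined query still returns it only once:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SelectorUnion {
    // Trimmed-down version of the question's HTML.
    static final String HTML =
        "<main itemprop=\"mainContentOfPage\" itemscope itemtype=\"http://schema.org/Blog\">"
      + "<article itemscope itemtype=\"http://schema.org/BlogPosting\" itemprop=\"blogPost\">"
      + "<div itemprop=\"text\">Overview</div>"
      + "</article></main>";

    // Number of elements matched by a jsoup CSS query.
    static int size(String cssQuery) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(cssQuery).size();
    }

    public static void main(String[] args) {
        // <main> and <article> carry both attributes.
        System.out.println(size("*[itemtype][itemprop]"));       // 2
        // <article> (child of <main>) and <div> (child of <article>).
        System.out.println(size("*[itemtype] > *[itemprop]"));   // 2
        // Union: main, article and div; article matched both branches but appears once.
        System.out.println(size("*[itemtype][itemprop], *[itemtype] > *[itemprop]")); // 3
    }
}
```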


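This is exactly what I was looking for. I actually want to extract Schema.org microdata metadata from HTML, so I need each itemtype together with its corresponding itemprops, and based on that I am trying to parse the itemprop contents. It works like key-value pairs, for example in the format itemtype [itemprop:value]. Now I can finally get all the itemtypes and their corresponding itemprops from the code. Thanks a lot. –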
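For the itemtype [itemprop:value] format described in this comment, one possible approach (a sketch, not a full microdata parser) is to visit each element with itemscope and itemtype, and keep only the itemprop elements whose nearest enclosing itemscope is that element, so properties of a nested item (such as the Person inside BlogPosting) are not attributed to its parent. Assumptions: jsoup is on the classpath, the names MicrodataGroups, valueOf, owningScope and describe are hypothetical, the HTML is a trimmed copy of the question's markup, and the value rules are only loosely based on those in the HTML microdata specification. For the live page you would pass Jsoup.connect(url).get() to describe instead of a parsed string.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class MicrodataGroups {
    // Trimmed-down version of the question's markup.
    static final String HTML =
        "<main itemprop=\"mainContentOfPage\" itemscope itemtype=\"http://schema.org/Blog\">"
      + "<article itemscope itemtype=\"http://schema.org/BlogPosting\" itemprop=\"blogPost\">"
      + "<h1 itemprop=\"headline\">Examples of Blogs</h1>"
      + "<span itemprop=\"author\" itemscope itemtype=\"http://schema.org/Person\">"
      + "<span itemprop=\"name\">Kenneth Byrd</span></span>"
      + "</article></main>";

    // Simplified value rules, loosely based on the HTML microdata spec:
    // a nested item reports its itemtype, a few tags use an attribute,
    // and everything else uses its text content.
    static String valueOf(Element prop) {
        if (prop.hasAttr("itemscope")) return prop.attr("itemtype");
        if (prop.hasAttr("content")) return prop.attr("content");
        switch (prop.tagName()) {
            case "a": case "link": return prop.attr("href");
            case "img": return prop.attr("src");
            case "time": return prop.attr("datetime");
            default: return prop.text();
        }
    }

    // Nearest ancestor that opens an item scope, or null if there is none.
    static Element owningScope(Element prop) {
        Element p = prop.parent();
        while (p != null && !p.hasAttr("itemscope")) {
            p = p.parent();
        }
        return p;
    }

    // One line per item, in the format "itemtype [itemprop:value, ...]".
    static List<String> describe(Document doc) {
        List<String> lines = new ArrayList<>();
        for (Element scope : doc.select("[itemscope][itemtype]")) {
            List<String> props = new ArrayList<>();
            // select() also returns the scope element itself; the owningScope
            // check drops it, along with properties of nested items.
            for (Element prop : scope.select("[itemprop]")) {
                if (owningScope(prop) == scope) {
                    props.add(prop.attr("itemprop") + ":" + valueOf(prop));
                }
            }
            lines.add(scope.attr("itemtype") + " " + props);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describe(Jsoup.parse(HTML))) {
            System.out.println(line);
        }
    }
}
```

With the sample HTML this prints one line per item: http://schema.org/Blog [blogPost:http://schema.org/BlogPosting], then http://schema.org/BlogPosting [headline:Examples of Blogs, author:http://schema.org/Person], then http://schema.org/Person [name:Kenneth Byrd]. Unlike the child-selector approach, grouping by nearest itemscope also picks up properties nested deeper than one level, such as the headline inside header.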