2012-10-14 23 views
0

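How to get note content using Python

I shared a note from Evernote, which generated an HTML page. I want to extract the note's title and content, so I wrote the following code: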

import re 
resource = '<!DOCTYPE html>\n<!--[if lt IE 7 ]> <html class="ie6">; <![endif]--><!--[if IE 7 ]> <html class="ie7"> <![endif]--><!--[if IE 8 ]> <html class="ie8"> <![endif]--><!--[if IE 9 ]> <html class="ie9"> <![endif]--><!--[if gt IE 9]> <html>    <![endif]--><!--[if !IE]><!--> <html>   <!--<![endif]--><head><meta name="en:locale" content="en" />\n <meta charset="utf-8" />\n <meta http-equiv="X-UA-Compatible" content="IE=9,chrome=1" />\n <meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1,minimum-scale=1,user-scalable=0" />\n\n <meta property="og:title" content="python re"/>\n <meta property="og:type" content="article"/>\n  <meta property="og:description" content="a question about python re\n "/>\n  <meta property="og:url" content="https://www.evernote.com/shard/s61/sh/396b4a1f-ae9c-40aa-b740-5aa19e301489/3de6deff539dec4772bdc4f1057a437d"/>\n <meta property="og:image"\n  content="https://www.evernote.com/shard/s61/sh/396b4a1f-ae9c-40aa-b740-5aa19e301489/3de6deff539dec4772bdc4f1057a437d/thm/note/396b4a1f-ae9c-40aa-b740-5aa19e301489"/>\n <meta property="og:site_name" content="Evernote"/>\n <meta property="og:created_time" content="1350193749000"/>\n <meta property="og:updated_time" content="1350193786000"/>\n <link rel="Shortcut Icon" href="/favicon.ico" type="image/x-icon" />\n\n <link rel="stylesheet" href="/redesign/global/css/fonts.css" />\n <link rel="stylesheet" href="/redesign/global/css/header.css" />\n\n <link rel="stylesheet" href="/redesign/sharing/css/sharedNote.css" />\n <title>python re</title>\n <link rel="stylesheet" href="/redesign/modules/SharingMenu/SharingMenu.css"><link rel="stylesheet" href="/redesign/modules/LinkUrlDialog/LinkUrlDialog.css"></head><body class="wrapper"><div class="logo-bar">\n  <a href="http://evernote.com/" target="_blank" class="evernote-logo"></a>\n  <a class="save-button save-button-desktop" href="/saveNote/s61/396b4a1f-ae9c-40aa-b740-5aa19e301489/3de6deff539dec4772bdc4f1057a437d">\n   Save to 
Evernote</a>\n\n  <div class="switch-account-div">\n   <div class="switch-account-icon"></div>\n   <span class="switch-account-name"></span>\n   <div class="switch-account-arrow"></div>\n   <div class="switch-account-dropdown">\n   <div class="switch-dropdown-arrow"></div>\n   <div class="switch-account-menuitem">\n    Switch Account</div>\n   <div class="switch-account-logout">\n    Sign Out</div>\n   </div>\n  </div>\n\n  </div>\n\n <div id="message-container">\n  <div id="message">\n  <div id="message-checkmark"></div>\n  <span></span>\n  </div>\n </div>\n\n <div id="container-boundingbox" class="wrapper">\n  <div id="container" class="wrapper">\n  <div class="sharing-imagegallery">\n  <div class="SharingMenu"><div class="sharing-menu">\n <div class="share-button-container">\n  <div class="label-container">\n  <span class="label">\n   Share</span>\n  <div class="label-icon facebook-icon">\n  </div>\n  </div>\n  <div class="icon-container"\n   title="Share">\n  <div class="icon">\n  </div>\n  </div>\n </div>\n <div class="menu-bar">\n  <div class="menu-bar-div">\n  <div class="menu-bar-icon facebook-icon"></div>\n  <span class="menu-bar-label">\n   Facebook</span>\n  </div>\n  <div class="menu-bar-div">\n  <div class="menu-bar-icon twitter-icon"></div>\n  <span class="menu-bar-label">\n   Twitter</span>\n  </div>\n  <div class="menu-bar-div">\n  <div class="menu-bar-icon linkedin-icon"></div>\n  <span class="menu-bar-label">\n   LinkedIn</span>\n  </div>\n  <div class="menu-bar-div">\n  <div class="menu-bar-icon link-icon"></div>\n  <span class="menu-bar-label">\n   Link</span>\n  </div>\n </div>\n </div>\n</div></div>\n  <div class="shared-by-mobile">\n  Shared by flowerszhong</div>\n  <div class="shared-by shared-by-desktop">\n  <div class="shared-by-left"></div>\n  Shared by flowerszhong<div class="shared-by-right"></div>\n  </div>\n  <h2 class="note-title">python re</h2>\n  <div class="vtop">\n  <div class="note-updated">\n   <span>\n   Updated Today</span>\n 
 </div>\n  </div>\n  <div class="divider"></div>\n  <div class="note-content">\n  <div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="ennote">\na question about python re\n<div><br/></div></div></div>\n  <a class="save-button save-button-mobile" href="/saveNote/s61/396b4a1f-ae9c-40aa-b740-5aa19e301489/3de6deff539dec4772bdc4f1057a437d">\n  Save to Evernote</a>\n <div class="clearfix" style="clear: both;"></div>\n</div>\n </div>\n\n\n <div class="footer">\n  <div>\n   Evernote makes it easy to remember things big and small from your everyday life using your computer, tablet, phone and the web.</div>\n   <div class="footer-logo"></div>\n  </div>\n\n <div class="LinkUrlDialog"><script id="linkUrlDialog" type="text/html">\n <div class="link-url-dialog">\n  <div class="dialog-head">\n  Link to Note</div>\n  <div class="dialog-body">\n  <p>Paste this link into an email or IM to share it.</p>\n  <p>Anyone with the link will be able to view the note.</p>\n  </div>\n  <div class="url-container">\n  <div class="url-title">\n   Note URL:</div>\n  <input type="text" class="url-input" value="{{url}}" readonly>\n  <div class="copy-container">\n   <button type="button" class="copy-button">\n   Copy to Clipboard</button>\n  </div>\n  </div>\n </div>\n </script>\n</div><script src="/redesign/global/js/respond.min.js"></script>\n <script src="/redesign/global/js/require.min.js"></script>\n <script src="/redesign/global/js/config-require.js"></script>\n <script type="text/javascript">\n  define("actionBean", [], function() {return 
{"shareNoteUri":"/shard/s61/sh/396b4a1f-ae9c-40aa-b740-5aa19e301489/3de6deff539dec4772bdc4f1057a437d?shareNote&service=","foodNote":false,"skitchNote":false,"userName":"","switchAccountUri":"/saveNote/s61/396b4a1f-ae9c-40aa-b740-5aa19e301489/3de6deff539dec4772bdc4f1057a437d?switch","logoutUri":"/saveNote/s61/396b4a1f-ae9c-40aa-b740-5aa19e301489/3de6deff539dec4772bdc4f1057a437d?logout","userStatus":"","images":false,"userLoggedIn":false};});\n </script>\n <!-- Google Analytics -->\n<script type="text/javascript">\nvar _gaq = _gaq || [];\n_gaq.push([\'_setAccount\', \'UA-285778-5\']);\n\n\n _gaq.push([\'_trackPageview\', \'/sh/{noteGuid}/{noteKey}/{suffix}\']);\n \n\n(function() {\n var ga = document.createElement(\'script\'); ga.type = \'text/javascript\'; ga.async = true;\n ga.src = (\'https:\' == document.location.protocol ? \'https://ssl\' : \'http://www\') + \'.google-analytics.com/ga.js\';\n var s = document.getElementsByTagName(\'script\')[0]; s.parentNode.insertBefore(ga, s);\n})();\n</script>\n<!-- End of Google Analytics -->\n<script type="text/javascript">\n  var _gaq = _gaq || [];\n  _gaq.push([\'_setCustomVar\',\n     4,         // Slot 4 - required\n     \'contentClass\',     // Category - required\n     \'\', // Value - required\n     3         // Page-level scope\n    ]);\n\n  _gaq.push([\'_setCustomVar\',\n     5,          // Slot 5 - required\n     \'sourceApplication\',     // Category - required\n     \'\', // Value - required\n     3          // Page-level scope\n    ]);\n  _gaq.push([\'_trackPageview\', \'/singleNote\']);\n </script>\n <script type="text/javascript" src="/redesign/modules/SharingMenu/SharingMenu.js"></script><script type="text/javascript" src="/redesign/modules/LinkUrlDialog/LinkUrlDialog.js"></script><script type="text/javascript" src="/redesign/sharing/SharedNoteViewAction/SharedNoteViewAction.js"></script></body></html>' 
title_pattern = re.compile(r'(?<=<title>).+(?=</title>)')
content_pattern = re.compile(r'(?<=class="divider"></div>).+(?=<a class="save-button)')
title = title_pattern.search(resource)
content = content_pattern.search(resource)

if title:
    print(title.group())

if content:
    print(content.group())
# if __name__=='__main__':main() 

Output:

python re

Why do I only get the title? And how can I get the content of this note?

Answers

2

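Your problem is that the content contains newlines. By default, . does not match a newline character, so .+ cannot get past the first line break after the divider.

Therefore, you should use re.DOTALL: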

content_pattern = re.compile('(?<=class=\"divider\"></div>).+(?=<a class=\"save-button)', re.DOTALL) 

This makes . match newlines as well. Then it works.
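
To see the difference in isolation, here is a minimal sketch. The `html` string is a shortened, hypothetical stand-in for the shared-note page, not the full markup from the question:

import re

# Hypothetical, shortened stand-in for the note markup (an assumption for illustration).
html = ('<div class="divider"></div>\n'
        '<div class="note-content">\nsome text\n</div>\n'
        '<a class="save-button" href="#">Save</a>')

pattern = r'(?<=class="divider"></div>).+(?=<a class="save-button)'

# Without re.DOTALL, "." stops at the newline right after the divider, so nothing matches.
no_flag = re.search(pattern, html)

# With re.DOTALL, "." also matches "\n", so the match can span several lines.
with_flag = re.search(pattern, html, re.DOTALL)

print(no_flag)
print(with_flag.group())

The title pattern in the question needs no flag only because `<title>python re</title>` sits on a single line.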

+0

However (to be safe), you should also use a lazy quantifier, in case there are multiple possible matches.
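
A small sketch of what the lazy quantifier changes, assuming a hypothetical page where two save-button links follow the divider (the real note page has only one after it):

import re

# Hypothetical markup with two save-button links after the divider (an assumption).
html = ('<div class="divider"></div>\nnote body\n'
        '<a class="save-button" href="#">A</a>\nfooter\n'
        '<a class="save-button" href="#">B</a>')

# Greedy ".+" runs up to the LAST save-button link.
greedy = re.search(r'(?<=class="divider"></div>).+(?=<a class="save-button)',
                   html, re.DOTALL)

# Lazy ".+?" stops at the FIRST save-button link.
lazy = re.search(r'(?<=class="divider"></div>).+?(?=<a class="save-button)',
                 html, re.DOTALL)

print(repr(greedy.group()))
print(repr(lazy.group()))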

+0

Normally I would, but in this case I think the page will only ever contain one note (since it is a note page). – nneonneo

+0

@nneonneo thx, great! – flowerszhong
