2015-10-14 53 views
0

Using Python to extract a URL from an email body?

Given this raw email:

[('96 (RFC822 {17888}', 
'Delivered-To: [email protected]\r\nReceived: by 10.182.129.229 with SMTP id nz5csp2388417obb;\r\n  Tue, 13 Oct 2015 14:57:14 -0700 (PDT)\r\nX-Received: by 10.68.136.103 with SMTP id pz7mr5507255pbb.114.1444773434163;\r\n  Tue, 13 Oct 2015 14:57:14 -0700 (PDT)\r\nReturn-Path: <[email protected]itter.com>\r\nReceived: from spruce-goose-bc.twitter.com (spruce-goose-bc.twitter.com. [199.59.150.98])\r\n  by mx.google.com with ESMTPS id xm2si7949727pbb.66.2015.10.13.14.57.13\r\n  for <[email protected]>\r\n  (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);\r\n  Tue, 13 Oct 2015 14:57:14 -0700 (PDT)\r\nReceived-SPF: pass (google.com: domain of [email protected]itter.com designates 199.59.150.98 as permitted sender) client-ip=199.59.150.98;\r\nAuthentication-Results: mx.google.com;\r\n  spf=pass (google.com: domain of [email protected]itter.com designates 199.59.150.98 as permitted sender) smtp.[email protected]bounce.twitter.com;\r\n  dkim=pass [email protected];\r\n  dmarc=pass (p=REJECT dis=NONE) header.from=twitter.com\r\nDKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=twitter.com;\r\n\ts=dkim-201406; t=1444773433;\r\n\tbh=WBJ/04fcxapn9W2moQ6bGL1p7salO/SDhe2f3COz1us=;\r\n\th=Date:From:To:Subject:MIME-Version:Content-Type:Message-ID;\r\n\tb=tvyrM/Sz+g0WemkLWTYoarsftOM0Y4jQAWCNdqRm6W+5kBG43CP2q6woxrtDqgYHg\r\n\t o/zPvMa5nIPjoOfslv0YCUlhfuVjr0V/6InNMl65s3/zGRMlCQxQjS+UGsQrF2zH6Z\r\n\t G7pWHMTml1NxI2r77nuOhSyhknNFCA9pl0SkeNfoyK8jcIo6rNS2uugFBw5Ta/fS8i\r\n\t RMXcNpLA35k4Znvboe2aiZQg7ZY6NjbtNT3X6Ln4xuAgLkjeS/BfDBvd6M8CZ8yIT8\r\n\t 7xStI8xTfT/zKqcK+35yqnAqQ3QD5oll/DWxQatFUIYzLsgw2DV39XRo11y6OTdDim\r\n\t KNS2DTEjaOsBg==\r\nX-MSFBL: eyJ1IjoiaW5nbGVzbWFuYWd1YUBnbWFpbC5jb21AMTRAMzgxNjkwOTc5M0AwQDJj\r\n\tMjQ4NDVjZTJjOGMyNjI0NDMxY2MzZDBlOGY3NTZhNDVjNGI4MzQiLCJnIjoiRXZl\r\n\tcnl0aGluZyIsImIiOiJzbWYxLWJkcC0yMy1zcjEtRXZlcnl0aGluZy4xOTgiLCJy\r\n\tIjoiaW5nbGVzbWFuYWd1YUBnbWFpbC5jb20ifQ==\r\nDate: Tue, 13 Oct 2015 21:57:13 +0000\r\nFrom: Twitter <[email 
protected]>\r\nTo: example <[email protected]>\r\nSubject: Confirm your Twitter account, example\r\nMIME-Version: 1.0\r\nContent-Type: multipart/alternative; \r\n\tboundary="----=_Part_44683898_1221426234.1444773433942"\r\nFeedback-ID: 16481b2a2bd9895bc6fbf92980687bb5fdd96d63782c26cd:16481b2a2bd9895bc6fbf92980687bb5fdd96d63782c26cd:none:twitterESP\r\nMessage-ID: <[email protected]>\r\n\r\n------=_Part_44683898_1221426234.1444773433942\r\nContent-Type: text/plain; charset=UTF-8\r\nContent-Transfer-Encoding: 7bit\r\n\r\nexample,\r\n\r\nConfirm your email address to complete your Twitter account. It\'s easy - just click on the button below.\r\n\r\nClick on the link below or copy and paste it into a browser:\r\n\r\nhttps://twitter.com/i/redirect?url=https%3A%2F%2Ftwitter.com%2Faccount%2Fconfirm_user_email%2F3816909793%2F9CE5D-H4F5D-144477%3Ft%3D1%26cn%3DZW1haWxfY2hhbmdlX25vdGljZV9uZXh0%26sig%3Da6878f323b83b61ceb5eaa8fbdb2214d25fc65e7%26al%3D1%26iid%3D9df2edd3ab1d4c49a5c9ac3a0569baab%26ac%3D1%26autoactions%3D1444773433%26uid%3D3816909793%26nid%3D14%2B309&amp;t=1&amp;cn=ZW1haWxfY2hhbmdlX25vdGljZV9uZXh0&amp;sig=2b56e3a59dd6b182afaf3a0030a96b26ccc67d73&amp;iid=9df2edd3ab1d4c49a5c9ac3a0569baab&amp;uid=3816909793&amp;nid=14+309\r\n------=_Part_44683898_1221426234.1444773433942\r\nContent-Type: text/html; charset=UTF-8\r\nContent-Transfer-Encoding: quoted-printable\r\n\r\n<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/htm=\r\nl4/strict.dtd">\r\n<html>\r\n<head>\r\n<meta http-equiv=3D"Content-Type" content=3D"text/html; charset=3Dutf-8" />\r\n<meta name=3D"viewport" content=3D"width=3Ddevice-width, minimum-scale=3D1.=\r\n0, maximum-scale=3D1.0, user-scalable=3D0" />\r\n<meta name=3D"apple-mobile-web-app-capable" content=3D"yes" />\r\n<style type=3D"text/css">\r\n\r\[email protected] only screen and (max-device-width: 420px) {\r\ntd[class=3D"spacer"]{\r\nfont-size:4px !important;\r\n\r\n}\r\n\r\nspan[class=3D"address"] a {\r\n\r\nline-height:18px 
!important;\r\n}\r\n\r\n\r\ntd[class=3D"margins"]{\r\nwidth:18px !important;\r\n}\r\ntd[class=3D"logo_space"]{\r\nheight:12px !important;\r\n}\r\n}\r\n\r\[email protected] only screen and (max-device-width: 480px) {\r\n\r\ntable[class=3D"collapse"]{\r\nwidth:100% !important;\r\n}\r\n\r\ndiv[class=3D"collapse"]{\r\nwidth:100% !important;\r\n}\r\n\r\n\r\ntd[class=3D"body_text"] {\r\nfont-size:14px !important;\r\nline-height:22px !important;\r\n\r\n\r\n}\r\n\r\ntd[class=3D"greeting"]{\r\nfont-size:14px !important;\r\n\r\n}\r\n\r\n\r\ntd[class=3D"v_space"]{\r\nheight:8px !important;\r\n\r\n}\r\n\r\n\r\nspan[class=3D"address"]{\r\ndisplay:block !important;\r\nwidth:240px !important;\r\n}\r\ntd[class=3D"cut"]{\r\ndisplay:none !important;\r\n}\r\n\r\n}\r\n</style>\r\n</head>\r\n<body bgcolor=3D"#e1e8ed" style=3D"margin:0;padding:0;-webkit-text-size-adj=\r\nust:100%;-ms-text-size-adjust:100%;">\r\n<table cellpadding=3D"0" cellspacing=3D"0" border=3D"0" width=3D"100%" bgco=\r\nlor=3D"#e1e8ed" style=3D"background-color:#e1e8ed;padding:0;margin:0;line-h=\r\neight:1px;font-size:1px;" class=3D"body_wrapper">\r\n<tbody>\r\n<tr>\r\n<td align=3D"center" style=3D"padding:0;margin:0;line-height:1px;font-size:=\r\n1px;">\r\n<table class=3D"collapse" id=3D"header" align=3D"center" width=3D"500" styl=\r\ne=3D"width: 500px;padding:0;margin:0;line-height:1px;font-size:1px;" bgcolo=\r\nr=3D"#ffffff" cellpadding=3D"0" cellspacing=3D"0" border=3D"0">\r\n<tbody>\r\n<tr>\r\n<td style=3D"min-width: 500px;height:1px;padding:0;margin:0;line-height:1px=\r\n;font-size:1px;" class=3D"cut"> <img src=3D"https://ea.twimg.com/email/self=\r\n_serve/media/spacer-1402696023930.png" style=3D"min-width: 500px;height:1px=\r\n;margin:0;padding:0;display:block;-ms-interpolation-mode:bicubic;border:non=\r\ne;outline:none;" /> </td>\r\n</tr>\r\n</tbody>\r\n</table> </td>\r\n</tr>\r\n<tr>\r\n<td align=3D"center" 
style=3D"padding:0;margin:0;line-height:1px;font-size:=\r\n1px;">\r\n<!--///////////////////header///////////////////////////-->\r\n<table class=3D"collapse" id=3D"header" align=3D"center" width=3D"500" styl=\r\ne=3D"width:500px;background-color:#ffffff;padding:0;margin:0;line-height:1p=\r\nx;font-size:1px;" bgcolor=3D"#ffffff" cellpadding=3D"0" cellspacing=3D"0" b=\r\norder=3D"0">\r\n<tbody>\r\n<tr>\r\n<td height=3D"15" style=3D"height:15px;padding:0;margin:0;line-height:1px;f=\r\nont-size:1px;" class=3D"logo_space"> &nbsp; </td>\r\n</tr>\r\n<tr>\r\n<td style=3D"padding:0;margin:0;line-height:1px;font-size:1px;">\r\n<table cellpadding=3D"0" cellspacing=3D"0" border=3D"0" width=3D"100%" styl=\r\ne=3D"width:100%;padding:0;margin:0;line-height:1px;font-size:1px;">\r\n<tbody>\r\n<tr>\r\n<td align=3D"left" width=3D"15" style=3D"width:15px;padding:0;margin:0;line=\r\n-height:1px;font-size:1px;"></td>\r\n<td align=3D"left" width=3D"28" style=3D"padding:0;margin:0;line-height:1px=\r\n;font-size:1px;"> <a href=3D"https://twitter.com/i/redirect?url=3Dhttps%3A%=\r\n2F%2Ftwitter.com%3Fcn%3DZW1haWxfY2hhbmdlX25vdGljZV9uZXh0%26refsrc%3Demail&a=\r\nmp;t=3D1&amp;cn=3DZW1haWxfY2hhbmdlX25vdGljZV9uZXh0&amp;sig=3Dfe1cdb1344cee3=\r\nb9db0674bd2ce2f22397f739d7&amp;iid=3D9df2edd3ab1d4c49a5c9ac3a0569baab&amp;u=\r\nid=3D3816909793&amp;nid=3D14+21" style=3D"text-decoration:none;border-style=\r\n:none;border:0;padding:0;margin:0;"><img align=3D"left" width=3D"28" src=3D=\r\n"https://ea.twimg.com/email/self_serve/media/logo-1400528502322.png" style=\r\n=3D"width:28px;padding-bottom:2px;margin:0;padding:0;display:block;-ms-inte=\r\nrpolation-mode:bicubic;border:none;outline:none;" /></a> </td>\r\n<td align=3D"left" width=3D"10" style=3D"width:10px;padding:0;margin:0;line=\r\n-height:1px;font-size:1px;"></td>\r\n<td align=3D"left" class=3D"greeting" style=3D"padding:0;margin:0;line-heig=\r\nht:1px;font-size:1px;font-family:\'Helvetica Neue Light\', Helvetica, Arial, 
=\r\nsans-serif;-webkit-font-smoothing:antialiased;-webkit-text-size-adjust:none=\r\n;color:#66757f;font-size:16px;padding:0px;margin:0px;font-weight:300;line-h=\r\neight:100%;text-align:left;"> example, </td>\r\n</tr>\r\n</tbody>\r\n</table> </td>\r\n</tr>\r\n<tr>\r\n<td height=3D"14" style=3D"height:14px;padding:0;margin:0;line-height:1px;f=\r\nont-size:1px;" class=3D"logo_space"> &nbsp; </td>\r\n</tr>\r\n</tbody>\r\n</table>\r\n<!--////////////////////border//////////////////////////-->\r\n<table class=3D"collapse" align=3D"center" width=3D"500" style=3D"width:500=\r\npx;background-color:#ffffff;padding:0;margin:0;line-height:1px;font-size:1p=\r\nx;" cellpadding=3D"0" cellspacing=3D"0" border=3D"0">\r\n<tbody>\r\n<tr id=3D"border">\r\n<td colspan=3D"2" height=3D"1" style=3D"line-height:1px;display:block;heigh=\r\nt:1px;background-color:#e1e8ed;padding:0;margin:0;line-height:1px;font-size=\r\n:1px;"></td>\r\n</tr>\r\n</tbody>\r\n</table>\r\n<!--//////////////////////////////////////////////-->\r\n<table class=3D"collapse" align=3D"center" width=3D"500" style=3D"width:500=\r\npx;background-color:#ffffff;padding:0;margin:0;line-height:1px;font-size:1p=\r\nx;" cellpadding=3D"0" cellspacing=3D"0" border=3D"0">\r\n<tbody>\r\n<tr>\r\n<td width=3D"50" style=3D"width:50px;padding:0;margin:0;line-height:1px;fon=\r\nt-size:1px;" class=3D"margins"></td>\r\n<td align=3D"center" style=3D"padding:0;margin:0;line-height:1px;font-size:=\r\n1px;">\r\n<table width=3D"100%" align=3D"center" cellpadding=3D"0" cellspacing=3D"0" =\r\nborder=3D"0" class=3D"collapse" style=3D"padding:0;margin:0;line-height:1px=\r\n;font-size:1px;">\r\n<tbody>\r\n<tr>\r\n<td height=3D"30" style=3D"height:30px;padding:0;margin:0;line-height:1px;f=\r\nont-size:1px;"></td>\r\n</tr>\r\n<tr>\r\n<td align=3D"left" style=3D"padding:0;margin:0;line-height:1px;font-size:1p=\r\nx;"> <span class=3D"headline_1" style=3D"font-family:\'Helvetica Neue Light\'=\r\n, Helvetica, Arial, 
sans-serif;-webkit-font-smoothing:antialiased;-webkit-t=\r\next-size-adjust:none;color:#66757f;font-size:28px;padding:0px;margin:0px;fo=\r\nnt-weight:300;line-height:100%;text-align:left;">Final step...</span> </td>\r\n</tr>\r\n<tr>\r\n<td height=3D"12" style=3D"height:12px;padding:0;margin:0;line-height:1px;f=\r\nont-size:1px;" class=3D"v_space"></td>\r\n</tr>\r\n<tr>\r\n<td align=3D"left" class=3D"body_text" style=3D"padding:0;margin:0;line-hei=\r\nght:1px;font-size:1px;font-family:\'Helvetica Neue Light\', Helvetica, Arial,=\r\n sans-serif;-webkit-font-smoothing:antialiased;-webkit-text-size-adjust:non=\r\ne;color:#66757f;font-size:16px;padding:0px;margin:0px;font-weight:300;line-=\r\nheight:23px;text-align:left;"> Confirm your email address to complete your =\r\nTwitter account. It\'s easy =E2=80=94 just click on the button below. </td>\r\n</tr>\r\n<!--*********** button ************-->\r\n<tr>\r\n<td height=3D"22" style=3D"height:22px;padding:0;margin:0;line-height:1px;f=\r\nont-size:1px;"></td>\r\n</tr>\r\n<tr>\r\n<td align=3D"left" class=3D"button" style=3D"padding:0;margin:0;line-height=\r\n:1px;font-size:1px;">\r\n<table bgcolor=3D"#55acee" height=3D"40" border=3D"0" cellspacing=3D"0" cel=\r\nlpadding=3D"0" align=3D"left" style=3D"white-space:nowrap;border-radius:5px=\r\n;border-style:none;text-align:center;padding:0;margin:0;line-height:1px;fon=\r\nt-size:1px;">\r\n<tbody>\r\n<tr>\r\n<td class=3D"spacer" width=3D"30" style=3D"font-size:1px;font-size:1px;line=\r\n-height:1px;font-size:1px;padding:0;margin:0;line-height:1px;font-size:1px;=\r\n">&nbsp;</td>\r\n<td height=3D"40" align=3D"center" style=3D"padding:0;margin:0;line-height:=\r\n1px;font-size:1px;"> <a 
href=3D"https://twitter.com/i/redirect?url=3Dhttps%=\r\n3A%2F%2Ftwitter.com%2Faccount%2Fconfirm_user_email%2F3816909793%2F9CE5D-H4F=\r\n5D-144477%3Ft%3D1%26cn%3DZW1haWxfY2hhbmdlX25vdGljZV9uZXh0%26sig%3D69386bec1=\r\n102903b8e56a388d035a97f9d8e69f9%26al%3D1%26iid%3D9df2edd3ab1d4c49a5c9ac3a05=\r\n69baab%26ac%3D1%26autoactions%3D1444773433%26uid%3D3816909793%26nid%3D14%2B=\r\n308&amp;t=3D1&amp;cn=3DZW1haWxfY2hhbmdlX25vdGljZV9uZXh0&amp;sig=3D256cbf355=\r\n6df8db1580c37c1e032d1178f4d23a3&amp;iid=3D9df2edd3ab1d4c49a5c9ac3a0569baab&=\r\namp;uid=3D3816909793&amp;nid=3D14+308" style=3D"border-style:none;text-deco=\r\nration:none;color:#ffffff;-webkit-font-smoothing: antialiased;font-size:14p=\r\nx;letter-spacing:0.02em;font-weight:bold;white-space:nowrap;overflow:hidden=\r\n;padding:0px;margin:0px;font-family:\'Helvetica Neue\', Helvetica, Arial, san=\r\ns-serif;line-height:14px;text-decoration:none;border-style:none;border:0;pa=\r\ndding:0;margin:0;"> <span class=3D"" style=3D"border-style:none;text-decora=\r\ntion:none;color:#ffffff;line-height:100%">Confirm now</span> </a> </td>\r\n<td class=3D"spacer" width=3D"30" style=3D"font-size:1px;font-size:1px;line=\r\n-height:1px;font-size:1px;padding:0;margin:0;line-height:1px;font-size:1px;=\r\n">&nbsp;</td>\r\n</tr>\r\n</tbody>\r\n</table> </td>\r\n</tr>\r\n<!--*********** end button ************-->\r\n<tr>\r\n<td height=3D"44" style=3D"height:44px;padding:0;margin:0;line-height:1px;f=\r\nont-size:1px;"></td>\r\n</tr>\r\n</tbody>\r\n</table> </td>\r\n<td width=3D"50" style=3D"width:50px;padding:0;margin:0;line-height:1px;fon=\r\nt-size:1px;" class=3D"margins"></td>\r\n</tr>\r\n</tbody>\r\n</table>\r\n<!--//////////////////////////////////////////////-->\r\n<table class=3D"collapse" id=3D"footer" align=3D"center" width=3D"500" styl=\r\ne=3D"width:500px;background-color:#ffffff;padding:0;margin:0;line-height:1p=\r\nx;font-size:1px;" cellpadding=3D"0" cellspacing=3D"0" border=3D"0">\r\n<tbody>\r\n<tr>\r\n<td height=3D"1" 
style=3D"line-height:1px;display:block;height:1px;backgrou=\r\nnd-color:#e1e8ed;padding:0;margin:0;line-height:1px;font-size:1px;"></td>\r\n</tr>\r\n<tr>\r\n<td height=3D"20" style=3D"height:20;padding:0;margin:0;line-height:1px;fon=\r\nt-size:1px;"></td>\r\n</tr>\r\n<tr>\r\n<td align=3D"center" style=3D"padding:0;margin:0;line-height:1px;font-size:=\r\n1px;"> <span class=3D"footer_type" style=3D"font-family:\'Helvetica Neue Lig=\r\nht\', Helvetica, Arial, sans-serif;-webkit-font-smoothing:antialiased;color:=\r\n#8899a6;font-size:12px;padding:0px;margin:0px;font-weight:normal;line-heigh=\r\nt:12px;"> <a href=3D"https://twitter.com/i/redirect?url=3Dhttps%3A%2F%2Ftwi=\r\ntter.com%2Fi%2Fredirect%3Furl%3Dhttps%253A%252F%252Ftwitter.com%252Fsetting=\r\ns%252Fnotifications%253Fcn%253DZW1haWxfY2hhbmdlX25vdGljZV9uZXh0%26t%3D1%26c=\r\nn%3DZW1haWxfY2hhbmdlX25vdGljZV9uZXh0%26sig%3D3084a7eb53ea988c00b18e060fa6a6=\r\n023b0f5c36%26iid%3D9df2edd3ab1d4c49a5c9ac3a0569baab%26uid%3D3816909793%26ni=\r\nd%3D14%2B27&amp;t=3D1&amp;cn=3DZW1haWxfY2hhbmdlX25vdGljZV9uZXh0&amp;sig=3Da=\r\n53a86b7487b15c908170e0d06203350ad2e0745&amp;iid=3D9df2edd3ab1d4c49a5c9ac3a0=\r\n569baab&amp;uid=3D3816909793&amp;nid=3D14+1555" class=3D"footer_link" style=\r\n=3D"text-decoration:none;border-style:none;border:0;padding:0;margin:0;font=\r\n-family:\'Helvetica Neue Light\', Helvetica, Arial, sans-serif;-webkit-font-s=\r\nmoothing:antialiased;-webkit-text-size-adjust:none;color:#55acee;font-size:=\r\n12px;padding:0px;margin:0px;font-weight:600;line-height:12px;">Settings</a>=\r\n | <a href=3D"https://twitter.com/i/redirect?url=3Dhttps%3A%2F%2Fsupport.tw=\r\nitter.com%2F&amp;t=3D1&amp;cn=3DZW1haWxfY2hhbmdlX25vdGljZV9uZXh0&amp;sig=3D=\r\n1dfdf7cecb06258c7e6a41ca318ec4370f621673&amp;iid=3D9df2edd3ab1d4c49a5c9ac3a=\r\n0569baab&amp;uid=3D3816909793&amp;nid=3D14+1557" class=3D"footer_link" styl=\r\ne=3D"text-decoration:none;border-style:none;border:0;padding:0;margin:0;fon=\r\nt-family:\'Helvetica Neue Light\', 
Helvetica, Arial, sans-serif;-webkit-font-=\r\nsmoothing:antialiased;-webkit-text-size-adjust:none;color:#55acee;font-size=\r\n:12px;padding:0px;margin:0px;font-weight:600;line-height:12px;">Help</a> | =\r\n<a href=3D"https://twitter.com/i/u?t=3D1&amp;cn=3DZW1haWxfY2hhbmdlX25vdGljZ=\r\nV9uZXh0&amp;sig=3D638d06973cb368d673778db5c8414b594d5c6ed2&amp;iid=3D9df2ed=\r\nd3ab1d4c49a5c9ac3a0569baab&amp;uid=3D3816909793&amp;nid=3D14+26" class=3D"f=\r\nooter_link" style=3D"text-decoration:none;border-style:none;border:0;paddin=\r\ng:0;margin:0;font-family:\'Helvetica Neue Light\', Helvetica, Arial, sans-ser=\r\nif;-webkit-font-smoothing:antialiased;-webkit-text-size-adjust:none;color:#=\r\n55acee;font-size:12px;padding:0px;margin:0px;font-weight:600;line-height:12=\r\npx;">Opt-out</a> | <a href=3D"https://twitter.com/i/redirect?url=3Dhttps%3A=\r\n%2F%2Ftwitter.com%2Faccount%2Fnot_my_account%2F3816909793%2F9CE5D-H4F5D-144=\r\n477%3Fut%3D1%26cn%3DZW1haWxfY2hhbmdlX25vdGljZV9uZXh0&amp;t=3D1&amp;cn=3DZW1=\r\nhaWxfY2hhbmdlX25vdGljZV9uZXh0&amp;sig=3D0e2b07faf8b7cab119459e512ea58097f5b=\r\n8e82b&amp;iid=3D9df2edd3ab1d4c49a5c9ac3a0569baab&amp;uid=3D3816909793&amp;n=\r\nid=3D14+25" class=3D"footer_link" style=3D"text-decoration:none;border-styl=\r\ne:none;border:0;padding:0;margin:0;font-family:\'Helvetica Neue Light\', Helv=\r\netica, Arial, sans-serif;-webkit-font-smoothing:antialiased;-webkit-text-si=\r\nze-adjust:none;color:#55acee;font-size:12px;padding:0px;margin:0px;font-wei=\r\nght:600;line-height:12px;">Not my account</a> </span> </td>\r\n</tr>\r\n<tr>\r\n<td height=3D"10" style=3D"height:10px;line-height:1px;font-size:1px;paddin=\r\ng:0;margin:0;line-height:1px;font-size:1px;"></td>\r\n</tr>\r\n<tr>\r\n<td align=3D"center" style=3D"padding:0;margin:0;line-height:1px;font-size:=\r\n1px;"> <span class=3D"address"> <a href=3D"" style=3D"text-decoration:none;=\r\nborder-style:none;border:0;padding:0;margin:0;font-family:\'Helvetica Neue L=\r\night\', Helvetica, Arial, 
sans-serif;-webkit-font-smoothing:antialiased;colo=\r\nr:#8899a6;font-size:12px;padding:0px;margin:0px;font-weight:normal;line-hei=\r\nght:12px;cursor:default;">Twitter, Inc. 1355 Market Street, Suite 900 San F=\r\nrancisco, CA 94103</a> </span> </td>\r\n</tr>\r\n<tr>\r\n<td height=3D"26" style=3D"height:26;padding:0;margin:0;line-height:1px;fon=\r\nt-size:1px;"></td>\r\n</tr>\r\n</tbody>\r\n</table> <img width=3D"1" height=3D"1" style=3D"display: block;margin:0;pad=\r\nding:0;display:block;-ms-interpolation-mode:bicubic;border:none;outline:non=\r\ne;" src=3D"https://twitter.com/scribe/ibis?t=3D1&amp;cn=3DZW1haWxfY2hhbmdlX=\r\n25vdGljZV9uZXh0&amp;iid=3D9df2edd3ab1d4c49a5c9ac3a0569baab&amp;uid=3D381690=\r\n9793&amp;nid=3D14+20" />\r\n<!--//////////////////////////////////////////////--> </td>\r\n</tr>\r\n</tbody>\r\n</table>\r\n</body>\r\n</html>\r\n\r\n------=_Part_44683898_1221426234.1444773433942--\r\n') 

I am trying to extract the link that has to be clicked to confirm the email address:

https://twitter.com/i/redirect?url=https%3A%2F%2Ftwitter.com%2Faccount%2Fconfirm_user_email%2F3816909793%2F9CE5D-H4F5D-144477%3Ft%3D1%26cn%3DZW1haWxfY2hhbmdlX25vdGljZV9uZXh0%26sig%3Da6878f323b83b61ceb5eaa8fbdb2214d25fc65ahdgdga33%3D9df2edd3ab1d4c49a5c9ac3a0569baab%26ac%3D1%26autoactions%3D1444773433%26uid%3D3816909793%26nid%3D14%2B309&amp;t=1&amp;cn=ZW1haWxfY2hhbmdlX25vdGljZV9uZXh0&amp;sig=2b56e3a59dd6b182afaf3abxcc67d73&amp;iid=9df2edd3ab1d4c49a5c9ac3a0569baab&amp;uid=3816909793&amp;nid=14+309 

Using regex101, I built this regular expression, which seems to work well there. However, when I take the generated Python code:

import re 
p = re.compile(ur'(https.+)(\\r|\\n)') 
test_str = (the full email text) 

then re.search(p, test_str) returns nothing. The same goes for re.findall().

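Why doesn't the generated Python code work, and/or is there a better regex? Note: the text contains several Twitter URLs; I want to match only the one tied to the 'Confirm now' button.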

Python: 2.7

+2

It looks like the original message is in HTML format. Why not use some kind of XML parser? – SiKing

+0

What exactly are you trying to extract? –

+0

@SiKing Yes, I'm looking at BeautifulSoup, thanks. – Pyderman

Answers

2

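Before using a regex, or another more suitable tool, to extract data from an email, you should first process the email properly with an email parser. Python ships with email.parser out of the box: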

raw_content = 'Delivered-To: [email protected]' 

import email.parser 
email_parser = email.parser.Parser() 
email_content = email_parser.parsestr(raw_content) 

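def get_all_messages(email_message): 
    # Walk the MIME tree iteratively and collect every non-multipart part 
    stack = [email_message] 
    messages = [] 
    while len(stack): 
        msg = stack.pop() 
        if msg.is_multipart(): 
            # For a multipart message, get_payload() returns the list of sub-parts 
            stack += msg.get_payload() 
        else: 
            messages.append(msg) 
    return messages 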

messages = get_all_messages(email_content) 

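The messages variable contains the individual parts of the email. You can either use a regex to extract the link from the text/plain part, or use an HTML parser (such as BeautifulSoup) to extract links from the text/html part.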
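To make the result of the traversal concrete, here is a minimal sketch on a made-up two-part message (the addresses, boundary and URLs are invented for the example, not taken from the Twitter email). It uses email.message_from_string and Message.walk() from the standard library instead of the hand-written stack, and is written to run on Python 3:

```python
import email

# A made-up multipart/alternative message standing in for the real email.
RAW = "\r\n".join([
    "From: Sender <sender@example.com>",
    "Subject: demo",
    "MIME-Version: 1.0",
    'Content-Type: multipart/alternative; boundary="BOUNDARY"',
    "",
    "--BOUNDARY",
    "Content-Type: text/plain; charset=UTF-8",
    "",
    "Visit https://example.com/confirm?id=1",
    "--BOUNDARY",
    "Content-Type: text/html; charset=UTF-8",
    "",
    '<a href="https://example.com/confirm?id=1">Confirm</a>',
    "--BOUNDARY--",
    "",
])

message = email.message_from_string(RAW)

# walk() visits the container and every nested part, depth first;
# keeping only the non-multipart parts mirrors get_all_messages().
leaf_types = [part.get_content_type()
              for part in message.walk()
              if not part.is_multipart()]
print(leaf_types)
```

walk() yields the parts in document order, whereas the stack-based function pops from the end of the list, so the two approaches may return the same parts in a different order.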

Here is sample code that extracts the link from the text/plain part:

import re 

for msg in messages: 
    if msg.get_content_type() == 'text/plain': 
        # get_payload(decode=True) undoes the Content-Transfer-Encoding; 
        # .decode() then applies the charset from the Content-Type header, 
        # falling back to UTF-8 if none is specified. 
        payload = msg.get_payload(decode=True).decode(msg.get_content_charset('utf-8')) 
        # \S+ stops at the first whitespace, so the trailing \r is not captured. 
        # findall returns a list of all matches. 
        link = re.findall(ur'https?://\S+', payload) 

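Take note of the call to .get_payload(decode=True). The decode argument must be given so that the payload is decoded according to the Content-Transfer-Encoding header. It makes no difference for the text/plain part here (which is 7bit), but it affects the correctness of the text/html part, whose payload is quoted-printable.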
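The effect of decode=True can be seen on a tiny quoted-printable part. This is only a sketch: the message below is invented, although it imitates the =3D escapes and the soft line breaks (a trailing = at the end of a line) visible in the real email, and it targets Python 3:

```python
import email

# Invented single-part message with a quoted-printable body.
RAW = "\r\n".join([
    "Content-Type: text/html; charset=UTF-8",
    "Content-Transfer-Encoding: quoted-printable",
    "",
    '<a href=3D"https://example.com/confirm?id=3D1&amp;n=',
    'id=3D14">Confirm =E2=80=94 now</a>',
    "",
])

part = email.message_from_string(RAW)

# Without decode=True the payload is still quoted-printable text.
undecoded = part.get_payload()

# With decode=True the transfer encoding is undone and bytes are returned;
# the charset from the Content-Type header turns them into text.
decoded_bytes = part.get_payload(decode=True)
html = decoded_bytes.decode(part.get_content_charset("utf-8"))

print(undecoded)
print(html)
```

In the undecoded text the URL is split across two lines and littered with =3D, so neither a regex nor an HTML parser would see the real href. After decoding, the attribute is a single, intact URL.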

Since the text/plain part contains only one link, the simple regex above is sufficient.

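Similar code can be used to decode the payload of the text/html part before parsing it with an HTML parser. Once the HTML is parsed, you can select all <a> tags and keep only those whose link contains confirm_user_email.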
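One possible shape for that last step is sketched below. It uses the standard library's html.parser.HTMLParser rather than BeautifulSoup, so that it has no third-party dependency (this is the Python 3 module path). The LinkCollector class and the sample HTML are invented for the example:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag fed to the parser."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; entities such as &amp;
        # in attribute values are already unescaped by the parser.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Stand-in for the decoded text/html payload.
html = (
    '<a href="https://example.com/?refsrc=email">Logo</a>'
    '<a href="https://example.com/i/redirect?url=https%3A%2F%2Fexample.com'
    '%2Faccount%2Fconfirm_user_email%2F42&amp;t=1">Confirm now</a>'
    '<a href="https://example.com/settings">Settings</a>'
)

collector = LinkCollector()
collector.feed(html)

# Keep only the links that point at the confirmation endpoint.
confirm_links = [link for link in collector.links
                 if "confirm_user_email" in link]
print(confirm_links)
```

Filtering on the href rather than on the link text keeps the selection independent of the wording of the button.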

+0

Thanks. Should the first line of the function get_all_messages read 'stack = [email_message]'? 'email_message' is currently passed as an argument but doesn't seem to be used. – Pyderman

+0

@Pyderman: You're right. It was a copy-paste error. – nhahtdh

1

If you use a raw string literal, don't also escape the \ character. So either remove the r prefix:

p = re.compile(u'(https.+)(\\r|\\n)') 

Or don't use double backslashes:

p = re.compile(ur'(https.+)(\r|\n)') 

Hope it helps!
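The difference between the two spellings can be checked on a short string. The sample text below is made up; the snippet uses plain r'' prefixes so that it also runs on Python 3, where the ur'' prefix no longer exists:

```python
import re

# Made-up text with a real CR LF after the URL, as in the email body.
text = "see https://example.com/confirm\r\nnext line"

# Raw string + doubled backslash: the regex engine receives \\r, meaning a
# literal backslash followed by the letter "r", which is not in the text.
escaped_twice = re.compile(r'(https.+)(\\r|\\n)')

# Raw string + single backslash: the regex engine receives \r and \n and
# interprets them as carriage return and line feed.
escaped_once = re.compile(r'(https.+)(\r|\n)')

no_match = escaped_twice.search(text)
match = escaped_once.search(text)

print(no_match)               # None
print(repr(match.group(1)))   # the URL, including a trailing '\r'
```

Note that the greedy .+ also consumes the carriage return, because . matches every character except the line feed. The captured group therefore ends in \r, which is one reason to prefer a lazy quantifier or \S+ here.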

+0

Reason for the downvote? – cdonts

1

Try removing the "ur" from the start of your regex. You can also call search directly on the compiled regex object.

Try this:

import re 
p = re.compile('(https.+)(\\r|\\n)') 
test_str = (the full email text) 
desired_string = p.search(test_str) 
print desired_string.group(0) 

1

I would use a slightly different regex:

import re 

with open('out') as f: # out contains the page content 
    content = f.read() 

p = re.compile(u'"(https:.*?)"') 

for m in re.findall(p, content): 
    print m 

.*? is a non-greedy match, so it stops at the first double quote.
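A small comparison shows what the ? changes. The HTML fragment is invented for the example:

```python
import re

# Invented fragment with two quoted https URLs on one line.
html = ('<a href="https://example.com/a" class="x">'
        '<img src="https://example.com/b.png"></a>')

# Greedy: .* runs to the LAST double quote on the line.
greedy = re.findall(r'"(https:.*)"', html)

# Lazy: .*? stops at the FIRST double quote after each URL.
lazy = re.findall(r'"(https:.*?)"', html)

print(greedy)
print(lazy)
```

The greedy pattern returns one long string that swallows everything between the first URL and the final quote, while the lazy pattern returns the two URLs separately.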

1
result = re.findall(r"(https.*?)(?:\r|\n)", email, re.MULTILINE) 
link = result[0] 

Live Python demo

http://ideone.com/9R62Ug


Regex explanation

(https.*?)(?:\r|\n) 

Match the regex below and capture its match into backreference number 1 «(https.*?)» 
    Match the character string "https" literally «https» 
    Match any single character that is NOT a line break character «.*?» 
     Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?» 
Match the regular expression below «(?:\r|\n)» 
    Match this alternative «\r» 
     Match the carriage return character «\r» 
    Or match this alternative «\n» 
     Match the line feed character «\n» 
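
As a quick check of the explanation above, the pattern can be run on a short made-up string (the URL and surrounding lines are invented). The re.MULTILINE flag is left out in this sketch because it only changes the meaning of ^ and $, which the pattern does not use:

```python
import re

# Made-up stand-in for the decoded plain-text body.
email_text = ("Click below:\r\n\r\n"
              "https://example.com/confirm?id=1\r\n"
              "--boundary\r\n")

# The lazy .*? grows one character at a time until the next character is
# \r or \n; the (?:...) group is non-capturing, so findall returns only
# the contents of the first group.
result = re.findall(r"(https.*?)(?:\r|\n)", email_text)
link = result[0]

print(repr(link))
```

Because the quantifier is lazy, the match stops before the carriage return, so the captured link carries no trailing \r.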
+0

Nicely done, and thanks for going the extra mile with the Ideone demo. – Pyderman

+0

You're welcome, glad it solved the problem. –
