2014-11-02 70 views
0

Find a specific line in a log file based on a substring - Python

I have the following piece of a Hadoop cluster log:

==> namenode_32: 14/11/02 02:19:32 INFO namenode.NNStorage: Storage directory /data/1/dfs/nn has been successfully formatted. 
==> namenode_32: 14/11/02 02:19:32 INFO namenode.NNStorage: Storage directory /nfsmount/dfs/nn has been successfully formatted. 
==> namenode_32: 14/11/02 02:19:32 INFO namenode.FSImage: Saving image file /nfsmount/dfs/nn/current/fsimage.ckpt_0000000000000000000 using no compression 
==> namenode_32: 14/11/02 02:19:32 INFO namenode.FSImage: Saving image file /data/1/dfs/nn/current/fsimage.ckpt_0000000000000000000 using no compression 
==> namenode_32: 14/11/02 02:19:32 INFO namenode.FSImage: Image file of size 115 saved in 0 seconds. 
==> namenode_32: 14/11/02 02:19:32 INFO namenode.FSImage: Image file of size 115 saved in 0 seconds. 
==> namenode_32: 14/11/02 02:19:32 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0 
==> namenode_32: 14/11/02 02:19:32 INFO util.ExitUtil: Exiting with status 0 
==> namenode_32: 14/11/02 02:19:32 INFO namenode.NameNode: SHUTDOWN_MSG: 
==> namenode_32: /************************************************************ 
==> namenode_32: SHUTDOWN_MSG: Shutting down NameNode at ip-10-45-129-157.ec2.internal/10.45.129.157 
==> namenode_32: ************************************************************/ 
==> namenode_32: * Starting Hadoop namenode: 
==> namenode_32: starting namenode, logging to /var/log/hadoop-hdfs/hadoop-hdfs-namenode-ip-10-45-129-157.out 
==> namenode_32: * Starting Hadoop secondarynamenode: 
==> namenode_32: starting secondarynamenode, logging to /var/log/hadoop-hdfs/hadoop-hdfs-secondarynamenode-ip-10-45-129-157.out 
==> namenode_32: * Starting Hadoop jobtracker: 
==> namenode_32: starting jobtracker, logging to /var/log/hadoop-0.20-mapreduce/hadoop-hadoop-jobtracker-ip-10-45-129-157.out 

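I am trying to find the IP address of this cluster. I know it appears on the line SHUTDOWN_MSG: Shutting down NameNode ... What I am looking for is a tuple of (private DNS, private IP). For this particular example, I would get: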

(ip-10-45-129-157.ec2.internal, 10.45.129.157) 

So I tried:

>>> import re
>>> expr = "SHUTDOWN_MSG: Shutting down NameNode at"
>>> s = re.search(expr, log)
>>> print(s.group())
SHUTDOWN_MSG: Shutting down NameNode at

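This is not what I want: it only returns the literal text I searched for. How can I produce such a tuple using a regular expression?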

Answers

1

You can use multiple capture groups to capture the text that follows the fixed context.

>>> re.search(r'SHUTDOWN_MSG: Shutting down NameNode at (.+)/(.+)', log).groups() 
('ip-10-45-129-157.ec2.internal', '10.45.129.157') 

You can also shorten the expression to:

>>> re.search(r'SHUTDOWN_MSG:.+at (.+)/(.+)', log).groups() 
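If the log line might be missing, re.search() returns None and calling .groups() on it raises AttributeError. Here is a minimal sketch that guards against that. The helper name find_namenode_address is made up for illustration, and swapping .+ for \S+ is my own assumption so that trailing whitespace on the log line does not end up in the captured IP:

```python
import re

# Hypothetical helper, not part of any library.
# \S+ matches only non-whitespace, so trailing spaces are not captured.
SHUTDOWN_RE = re.compile(r"SHUTDOWN_MSG: Shutting down NameNode at (\S+)/(\S+)")

def find_namenode_address(log):
    """Return (private_dns, private_ip), or None if the line is absent."""
    match = SHUTDOWN_RE.search(log)
    if match is None:
        return None
    return match.groups()

# Sample line copied from the question, including its trailing space.
sample = (
    "==> namenode_32: SHUTDOWN_MSG: Shutting down NameNode at "
    "ip-10-45-129-157.ec2.internal/10.45.129.157 \n"
)
print(find_namenode_address(sample))
print(find_namenode_address("no shutdown line here"))
```

The first call prints the tuple from the question and the second prints None instead of raising.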
2

Use multiple capture groups after the search string:

>>> expr = 'SHUTDOWN_MSG:.+at (.+)/(.+)' 
>>> re.search(expr, log).groups() 
('ip-10-45-129-157.ec2.internal', '10.45.129.157') 
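To see why the original attempt printed only the search text, it may help to compare group() with groups(). A small sketch, using a single line taken from the question's log (variable names are just for illustration):

```python
import re

line = ("SHUTDOWN_MSG: Shutting down NameNode at "
        "ip-10-45-129-157.ec2.internal/10.45.129.157")
m = re.search(r"SHUTDOWN_MSG:.+at (.+)/(.+)", line)

whole = m.group()    # entire matched text, same as m.group(0)
dns = m.group(1)     # text captured by the first pair of parentheses
ip = m.group(2)      # text captured by the second pair of parentheses
pair = m.groups()    # all captured groups as a tuple

print(whole)
print(pair)
```

Without parentheses in the pattern there is nothing to capture, so group() can only give back the matched text itself.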
0

Use capture groups, ():

import re
# the with-block closes the file once its contents have been read
with open('log_file') as f:
    log = f.read()
re.findall(r"SHUTDOWN_MSG:.+at (.+)/(.+)", log)

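re.findall() does not stop at the first match. It keeps scanning until it reaches the end of the file, so it gives you every match, as a list of (private DNS, private IP) tuples.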
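For a large log you may not want to read the whole file into memory. One possible alternative, sketched below, is to scan line by line. The function name iter_addresses and the second address in the sample data are invented for illustration, and \S+ is again my assumption to avoid capturing trailing spaces:

```python
import re

PATTERN = re.compile(r"SHUTDOWN_MSG:.+at (\S+)/(\S+)")

def iter_addresses(lines):
    """Yield one (private_dns, private_ip) tuple per matching line."""
    for line in lines:
        match = PATTERN.search(line)
        if match:
            yield match.groups()

# First two lines follow the question's log; the third is made-up sample data.
sample_lines = [
    "==> namenode_32: SHUTDOWN_MSG: Shutting down NameNode at "
    "ip-10-45-129-157.ec2.internal/10.45.129.157 ",
    "==> namenode_32: * Starting Hadoop namenode: ",
    "==> namenode_33: SHUTDOWN_MSG: Shutting down NameNode at "
    "ip-10-0-0-1.ec2.internal/10.0.0.1 ",
]
addresses = list(iter_addresses(sample_lines))
print(addresses)
```

An open file object is itself an iterable of lines, so iter_addresses(f) inside a with open(...) as f: block should work the same way.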
