2012-12-24 142 views
2

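Python: parsing a file for IP addresses

I have a file containing a number of IP addresses: about 900 IPs spread over 4 lines of a txt file. I want the output to have 1 IP per line. How can I do that? Based on other code, I came up with the following, but it fails because there are multiple IPs on a single line: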

import sys
import re

try:
    if sys.argv[1:]:
        print "File: %s" % (sys.argv[1])
        logfile = sys.argv[1]
    else:
        logfile = raw_input("Please enter a log file to parse, e.g /var/log/secure: ")
    try:
        file = open(logfile, "r")
        ips = []
        for text in file.readlines():
            text = text.rstrip()
            regex = re.findall(r'(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})$',text)
            if regex is not None and regex not in ips:
                ips.append(regex)

        for ip in ips:
            outfile = open("/tmp/list.txt", "a")
            addy = "".join(ip)
            if addy is not '':
                print "IP: %s" % (addy)
                outfile.write(addy)
                outfile.write("\n")
    finally:
        file.close()
        outfile.close()
except IOError, (errno, strerror):
    print "I/O Error(%s) : %s" % (errno, strerror)
+2

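You are looking for the canonical form of IPv4 addresses. Note that even IPv4 addresses have other acceptable forms. For example, try http://2130706433/ if you are running an HTTP server on localhost port 80 (2130706433 == 0x7f000001 == 127.0.0.1). Of course, if you control the format of the file, you don't need to worry about such things... but if you can feasibly support IPv6 as well, it will future-proof your script. –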
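The numeric equivalence mentioned in this comment can be checked with Python's standard `ipaddress` module (available from Python 3.3). This is only an illustrative sketch; the variable names are made up for the example.

```python
import ipaddress

# The same IPv4 address expressed as a plain 32-bit integer.
as_int = 2130706433
addr = ipaddress.IPv4Address(as_int)

dotted = str(addr)        # dotted-quad form of the address
as_hex = hex(int(addr))   # hexadecimal form of the same value

print(dotted, as_hex)
```

A regex that looks for four dot-separated numbers will only ever find the dotted-quad form, which is usually what a log file contains.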

+0

`re.findall()` always returns a list. It is never `None`. – jfs

Answers

2

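The $ anchor in your expression prevents you from finding anything but the last entry on each line. Remove it, then use the list returned by .findall():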

found = re.findall(r'(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})',text)
if found:
    ips.extend(found)
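To see the effect of the anchor, here is a small sketch that runs both versions of the pattern against one line holding several addresses. The sample line and the constant names are invented for the illustration.

```python
import re

# The question's pattern (with the trailing $) and the same pattern without it.
IP_ANCHORED = r'(?:\d{1,3})\.(?:\d{1,3})\.(?:\d{1,3})\.(?:\d{1,3})$'
IP_UNANCHORED = r'(?:\d{1,3})\.(?:\d{1,3})\.(?:\d{1,3})\.(?:\d{1,3})'

# Hypothetical input line containing three addresses.
line = "10.0.0.1 10.0.0.2 192.168.1.5"

anchored = re.findall(IP_ANCHORED, line)      # only the address at the end of the line
unanchored = re.findall(IP_UNANCHORED, line)  # every address on the line

print(anchored)
print(unanchored)
```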
1

The findall function returns a list of matches, and you are not iterating over each match.

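# Note: the trailing $ is kept from the question, so this still matches
# only an address at the end of each line.
regex = re.findall(r'(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})$',text)
if regex is not None:
    for match in regex:
        if match not in ips:
            ips.append(match)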
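The difference between appending the whole result and iterating over it can be shown with a short sketch. The sample lines are hypothetical, and the pattern is a shortened, unanchored equivalent of the one in the question.

```python
import re

PATTERN = r'\d{1,3}(?:\.\d{1,3}){3}'

# findall() returns an empty list when nothing matches, never None.
no_match = re.findall(PATTERN, "no addresses here")

lines = ["10.0.0.1 10.0.0.2", "10.0.0.2 10.0.0.3"]

# Appending the result itself stores one list per line (a list of lists).
nested = []
for line in lines:
    nested.append(re.findall(PATTERN, line))

# Iterating over each match gives a flat list of unique addresses.
ips = []
for line in lines:
    for match in re.findall(PATTERN, line):
        if match not in ips:
            ips.append(match)

print(no_match, nested, ips)
```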
0

Without the re.MULTILINE flag, $ matches only at the end of the string.
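A minimal sketch of that behaviour, using an invented three-line string: the same anchored pattern finds one address without the flag and one per line with it.

```python
import re

# Anchored pattern: an IPv4-shaped token followed by $.
PATTERN = r'\d{1,3}(?:\.\d{1,3}){3}$'

# Hypothetical multi-line input, one address at the end of each line.
data = "host a 10.0.0.1\nhost b 10.0.0.2\nhost c 10.0.0.3"

end_only = re.findall(PATTERN, data)                # $ matches at the end of the string
per_line = re.findall(PATTERN, data, re.MULTILINE)  # $ matches at the end of every line

print(end_only)
print(per_line)
```

Even with `re.MULTILINE`, an anchored pattern still misses addresses that are not the last token on their line, so dropping the anchor is the simpler fix when a line holds several IPs.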

To make debugging easier, split the code into several parts that can be tested independently.

def extract_ips(data): 
    return re.findall(r"\d{1,3}(?:\.\d{1,3}){3}", data) 
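As one way to test that function on its own, here is a sketch with a few plain assertions. The `test_extract_ips` name and the sample strings are assumptions made for the illustration, and `extract_ips` is repeated so the snippet runs by itself.

```python
import re


def extract_ips(data):
    return re.findall(r"\d{1,3}(?:\.\d{1,3}){3}", data)


def test_extract_ips():
    # No addresses in the input gives an empty list.
    assert extract_ips("") == []
    # Several addresses on one line are all returned, in order.
    assert extract_ips("10.0.0.1 10.0.0.2") == ["10.0.0.1", "10.0.0.2"]
    # The pattern checks only the shape, not that each number is 0-255.
    assert extract_ips("999.1.1.1") == ["999.1.1.1"]


test_extract_ips()
```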

If the input file is small and you don't need to preserve the original order of the IPs:

with open(filename) as infile, open(outfilename, "w") as outfile: 
    outfile.write("\n".join(set(extract_ips(infile.read())))) 

Otherwise:

with open(filename) as infile, open(outfilename, "w") as outfile:
    seen = set()
    for line in infile:
        for ip in extract_ips(line):
            if ip not in seen:
                seen.add(ip)
                # write() works the same way on Python 2 and Python 3
                outfile.write(ip + "\n")
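If the whole file fits in memory but the order still matters, a middle ground is possible on Python 3.7+, where `dict` keeps insertion order. The `unique_in_order` helper below is a hypothetical name for this sketch, not part of the answer.

```python
import re


def extract_ips(data):
    return re.findall(r"\d{1,3}(?:\.\d{1,3}){3}", data)


def unique_in_order(items):
    # dict keys are unique and keep first-seen order (Python 3.7+).
    return list(dict.fromkeys(items))


# Hypothetical input with one repeated address.
sample = "10.0.0.2 10.0.0.1 10.0.0.2"
ordered = unique_in_order(extract_ips(sample))
print("\n".join(ordered))
```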
1

Extract IP addresses from a file

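I answered a similar question in this discussion. In short, the approach is based on one of my ongoing projects, which extracts network-based and host-based indicators from different types of input data (strings, files, blog posts, etc.): https://github.com/JohnnyWachter/intel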


I would import the classes from the IPAddresses and Data modules, then use them to accomplish your task in the following way:

#!/usr/bin/env python

"""Extract IPv4 Addresses From Input File."""

from Data import CleanData  # Format and Clean the Input Data.
from IPAddresses import ExtractIPs  # Extract IPs From Input Data.


def get_ip_addresses(input_file_path):
    """
    Read contents of input file and extract IPv4 Addresses.
    :param input_file_path: fully qualified path to input file. Expecting str
    :returns: dictionary of IPv4 and IPv4-like Address lists
    :rtype: dict
    """

    input_data = []  # Empty list to house formatted input data.

    input_data.extend(CleanData(input_file_path).to_list())

    results = ExtractIPs(input_data).get_ipv4_results()

    return results
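  • Now that you have a dictionary of lists, you can easily access the data you want and output it in whatever way you like. The example below uses the function above; it prints the results to the console and writes them to a specified output file: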

    # Extract the desired data using the aforementioned function.
    ipv4_list = get_ip_addresses('/path/to/input/file')

    # Open your output file in 'append' mode.
    with open('/path/to/output/file', 'a') as outfile:

        # Ensure that the list of valid IPv4 Addresses is not empty.
        if ipv4_list['valid_ips']:

            for ip_address in ipv4_list['valid_ips']:

                # Print to console
                print(ip_address)

                # Write to output file, one address per line.
                outfile.write(ip_address + '\n')
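For comparison, a similar result can be sketched with only the standard library (Python 3). This is not the project's implementation: `valid_ipv4s` and `write_one_per_line` are hypothetical helpers, and validity here simply means that `ipaddress.IPv4Address` accepts the string.

```python
import ipaddress
import re

# Shape-only pattern: four dot-separated groups of 1-3 digits.
CANDIDATE = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")


def valid_ipv4s(text):
    """Return unique, valid IPv4 addresses found in text, in first-seen order."""
    seen = set()
    result = []
    for candidate in CANDIDATE.findall(text):
        try:
            # Rejects out-of-range octets such as 999.1.1.1.
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue
        if candidate not in seen:
            seen.add(candidate)
            result.append(candidate)
    return result


def write_one_per_line(ips, output_path):
    """Write each address on its own line."""
    with open(output_path, "w") as outfile:
        for ip in ips:
            outfile.write(ip + "\n")
```

A call such as `write_one_per_line(valid_ipv4s(open(path).read()), out_path)` would then produce the one-IP-per-line file the question asks for, with `path` and `out_path` supplied by the caller.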