Extracting information from a file

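I want to use Python 3.4 to extract a system's IP address from a file that holds about 40,000 lines of information. The file is divided into blocks, each starting with "lease" and ending with "}". I want to search for "SYSTEM123456789" and extract the IP address "10.0.0.2". How should I go about it, and what is the preferred approach?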

1) Read the file in, break it up into a list, and then search?
2) Copy the file and then search within the copy?

lease 10.0.0.1 { 
    starts 1 2015/06/29 07:22:01; 
    ends 2 2015/06/30 07:22:01; 
    tstp 2 2015/06/30 07:22:01; 
    cltt 1 2015/06/29 07:22:01; 
    binding state active; 
    next binding state free; 
    hardware ethernet 08:2e:5f:f0:8b:a1; 
} 
lease 10.0.0.2{ 
    starts 1 2015/06/29 07:31:20; 
    ends 2 2015/06/30 07:31:20; 
    tstp 2 2015/06/30 07:31:20; 
    cltt 1 2015/06/29 07:31:20; 
    binding state active; 
    next binding state free; 
    hardware ethernet ec:b1:d7:87:6f:7a; 
    uid "\001\354\261\327\207oz"; 
    client-hostname "SYSTEM123456789"; 
}

2015-06-29 dreamzboy

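Is each 'lease ... }' block stored across separate lines, and should the output be the IP? Also, tell us what you have tried. – ssundarraj

I haven't tried anything yet because I don't know where to start. I would break out each block and store it in a list. Next I would split it on the ';' delimiter, search for SYSTEM123456789, and then look at list[0], using startswith("lease"), to find the IP. – dreamzboy

That looks fine. Why don't you try writing the code for it? – ssundarraj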

You can group the sections with groupby, using the lease line as the delimiter:

from itertools import groupby

def find_ip(s, f):
    with open(f) as fh:
        grouped = groupby(fh, key=lambda x: x.startswith("lease "))
        for k, v in grouped:
            if k:  # v is the lease line
                # get ip from lease line
                ip = next(v).rstrip().split()[1]
                # call next to get next element from our groupby object
                # which is each section after lease
                val = list(next(grouped)[1])[-2]
                # check for substring
                if val.find(s) != -1:
                    return ip.rstrip("{")
    return "No match"

Using the input file:

In [5]: find_ip('"SYSTEM123456789"',"in.txt") 
Out[5]: '10.0.0.2'

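x.startswith("lease ") is the key that groupby uses to split the file into sections. If k is True we are on a lease line, so we extract the IP and then check the second-to-last line of the lease section that follows; if we find the substring there, we return the IP.

The sections the file is split into look like this: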

[' starts 1 2015/06/29 07:22:01;\r\n', ' ends 2 2015/06/30 07:22:01;\r\n', ' tstp 2 2015/06/30 07:22:01;\r\n', ' cltt 1 2015/06/29 07:22:01;\r\n', ' binding state active; \r\n', ' next binding state free;\r\n', ' hardware ethernet 08:2e:5f:f0:8b:a1;\r\n', '}\r\n'] 
[' starts 1 2015/06/29 07:31:20;\r\n', ' ends 2 2015/06/30 07:31:20;\r\n', ' tstp 2 2015/06/30 07:31:20;\r\n', ' cltt 1 2015/06/29 07:31:20;\r\n', ' binding state active; \r\n', ' next binding state free;\r\n', ' hardware ethernet ec:b1:d7:87:6f:7a;\r\n', ' uid "\\001\\354\\261\\327\\207oz";\r\n', ' client-hostname "SYSTEM123456789";\r\n', '}']

You can see that the second-to-last element is client-hostname, so we pull that element out each time and search it for the string.
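To make the grouping concrete, here is a small sketch of what groupby yields for lines like these. The sample lines and the names `lines` and `groups` are made up for illustration; nothing here is read from the real lease file.

```python
from itertools import groupby

# Hypothetical sample lines standing in for the lease file.
lines = [
    "lease 10.0.0.1 {\n",
    "    binding state active;\n",
    "}\n",
    "lease 10.0.0.2 {\n",
    '    client-hostname "SYSTEM123456789";\n',
    "}\n",
]

# Consecutive lines with the same key value land in the same group,
# so the groups alternate between a lease line and the body after it.
groups = [
    (is_lease, list(chunk))
    for is_lease, chunk in groupby(lines, key=lambda x: x.startswith("lease "))
]

for is_lease, chunk in groups:
    print(is_lease, chunk)
```

Each True group holds a single lease line and each False group holds the lines up to and including the closing }, which is why the code above calls next(grouped) right after reading a lease line.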

If the string can appear anywhere in the section, you can use any to check every line:

def find_ip(s, f):
    with open(f) as fh:
        grouped = groupby(fh, key=lambda x: x.startswith("lease "))
        for k, v in grouped:
            if k:  # v is the lease line
                # get ip from lease line
                ip = next(v).rstrip().split()[1]
                # call next to get next element from our groupby object
                # which is each section after lease
                val = next(grouped)[1]
                # check for substring
                if any(sub.find(s) != -1 for sub in val):
                    return ip.rstrip("{")
    return "No match"

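You can apply the same logic by simply iterating over the file object with an outer and an inner loop. When you find a line that starts with "lease", start the inner loop; it runs until you find the substring and return the IP, or until you hit a } marking the end of the section, at which point you break out of the inner loop.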

def find_ip(s, f):
    with open(f) as fh:
        for line in fh:
            if line.startswith("lease "):
                ip = line.rstrip().split()[1]
                # the inner loop continues from where the outer loop stopped
                for n_line in fh:
                    if n_line.find(s) != -1:
                        return ip.rstrip("{")
                    if n_line.startswith("}"):
                        break
    return "No match"

Output:

In [9]: find_ip('"SYSTEM123456789"',"in.txt") 
Out[9]: '10.0.0.2'

Neither approach stores more than one section's worth of lines in memory at any one time.
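If you want to run the same single-pass idea against something other than a file on disk, a variant that accepts any iterable of lines might look like the sketch below. The name `find_ip_in_lines` and returning None instead of "No match" are my own choices for this example, not part of the code above.

```python
def find_ip_in_lines(s, lines):
    """Return the IP of the first lease block containing s, or None."""
    it = iter(lines)  # one shared iterator drives both loops
    for line in it:
        if line.startswith("lease "):
            # "lease 10.0.0.2{" and "lease 10.0.0.2 {" both give the bare IP
            ip = line.split()[1].rstrip("{")
            for n_line in it:
                if s in n_line:
                    return ip
                if n_line.startswith("}"):
                    break  # end of this block, go back to the outer loop
    return None
```

Passing an open file object as `lines` keeps the line-at-a-time behaviour, while passing a plain list of strings makes the function easy to try out on a few sample lines.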

2015-06-29 19:10:43

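This itertools black magic works. Your post is very detailed and easy to follow. I would think itertools is more efficient than two nested for loops. – dreamzboy

Either way, the itertools approach is more concise and nicer to look at! –

Going off what @Ijk mentioned, I came up with this.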

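import re

path = "in.txt"  # assumed file name, same as in the other answer
ip = None

with open(path) as f:
    for line in f:
        # remember the IP from the most recent lease line
        mat = re.match(r'lease (\d+\.\d+\.\d+\.\d+)', line)
        if mat:
            ip = mat.group(1)
        # print that IP once the system name shows up in the block
        mat = re.search(r'"SYSTEM123456789"', line)
        if mat and ip is not None:
            print(ip)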

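The OP asked for a preferred method, and this is mine, although I'm not the best at regular expressions. Still, I think this is what the OP is looking for.

I changed the regex for the IP address so that it matches arbitrary IPs, and the IP is printed only when the system name is found.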
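If several system names need to be looked up, one option is to build the whole hostname-to-IP mapping in a single pass and query it afterwards. This is only a sketch: the function name `hostname_to_ip` and both patterns are my assumptions about the format, based on the sample in the question.

```python
import re

# Assumed patterns, derived from the sample lease blocks in the question.
LEASE_RE = re.compile(r'lease\s+(\d+\.\d+\.\d+\.\d+)')
HOST_RE = re.compile(r'client-hostname\s+"([^"]+)"')

def hostname_to_ip(lines):
    """Map each client-hostname to the IP of the lease block it appears in."""
    mapping = {}
    ip = None
    for line in lines:
        m = LEASE_RE.match(line)
        if m:
            ip = m.group(1)  # IP of the block we are now inside
            continue
        m = HOST_RE.search(line)
        if m and ip is not None:
            # a later lease for the same hostname overwrites an earlier one
            mapping[m.group(1)] = ip
    return mapping
```

With an open file object passed as `lines`, a lookup such as `hostname_to_ip(fh).get("SYSTEM123456789")` returns the matching IP, or None when the name is not in the file.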

2015-06-29 19:48:23 SirParselot

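This may be the approach I end up with, but the IP is random. The key here is to search for the system's name, not the IP. Thanks for your contribution. – dreamzboy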
