2014-02-19 56 views
0

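Extracting IP addresses after a specific delimiter

I want to extract only the IP addresses from a file, sort them numerically, and write the result to another file.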

The data looks like this:

The Spammer (and all his/her info): 
Username: user 
User ID Number: 0 
User Registration IP Address: 77.123.134.132 
User IP Address for Selected Post: 177.43.168.35 
User Email: [email protected] 

Here is my code. It does not sort the IP addresses correctly (that is, it lists 177.43.168.35 before 77.123.134.132):

import re 

spammers = open('spammers.txt', "r") 
ips = [] 
for text in spammers.readlines(): 
    text = text.rstrip() 
    print text 
    regex = re.findall(r'(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})$',text) 
    if regex is not None and regex not in ips: 
        ips.append(regex)

for ip in ips: 
    OrganizedIPs = open("Organized IPs.txt", "a") 
    addy = "".join(ip) 
    if addy is not '': 
        print "IP: %s" % (addy)
        OrganizedIPs.write(addy)
        OrganizedIPs.write("\n")
        spammers.close()
        OrganizedIPs.close()

organize = open("Organized IPs.txt", "r") 
ips = organize.readlines(); 
ips = list(set(ips)) 
print ips 
for i in range(len(ips)): 
    ips[i] = ips[i].replace('\n', '') 
print ips 
ips.sort() 
finish = open('organized IPs.txt', 'w') 
finish.write('\n'.join(ips)) 
finish.close() 
clean = open('spammers.txt', 'w') 
clean.close() 

I tried using this IP sorter code, but it expects a string, whereas the regular expression returns a list.
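
To illustrate the two issues described above, here is a minimal sketch. The sample lines, the simplified pattern, and the variable names are assumptions for illustration, and the addresses come from a documentation range. It shows that `re.findall` returns a list of matches for each line (so `extend` is used to collect the strings themselves), and that plain string sorting compares character by character, which is why an address ending in 177 lands before one ending in 77.

```python
import re

# Hypothetical sample lines in the same layout as the data above.
lines = [
    "User Registration IP Address: 203.0.113.77",
    "User IP Address for Selected Post: 203.0.113.177",
    "Username: user",
]

# Four dot-separated groups of 1-3 digits at the end of the line.
ipv4 = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}$")

found = []
for line in lines:
    # findall returns a list (possibly empty), never None,
    # so extend() adds the matched strings rather than a nested list.
    found.extend(ipv4.findall(line.rstrip()))

# Strings compare character by character: "1" < "7", so ...177 sorts first.
text_sorted = sorted(found)

# Comparing lists of ints instead gives numeric order.
numeric_sorted = sorted(found, key=lambda ip: [int(p) for p in ip.split(".")])

print(text_sorted)
print(numeric_sorted)
```

With a flat list of strings, any key function that takes a single address string can be passed to `sorted`.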

+0

Maybe there is a cleverer way, but why not just split on '.', map int over the resulting list, and sort the lists of ints? – deinonychusaur

+0

@deinonychusaur That is exactly what I was about to do! –

+0

Don't use real IP addresses in your examples. –

Answers

0
import re

LOG = "spammers.txt"
# Matches four dot-separated groups of 1-3 digits (the 0-255 range is not checked).
IPV4 = re.compile(r"(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})")
RESULT = "organized_ips.txt"

def get_ips(fname):
    # Return every IPv4-looking string found in the file.
    with open(fname) as inf:
        return IPV4.findall(inf.read())

def numeric_ip(ip):
    # Sort key: "77.123.134.132" -> [77, 123, 134, 132]
    return [int(i) for i in ip.split(".")]

def write_to(fname, iterable, fmt):
    # Write each item to fname using the given format string.
    with open(fname, "w") as outf:
        for i in iterable:
            outf.write(fmt.format(i))

def main():
    ips = get_ips(LOG)
    ips = list(set(ips))  # uniquify
    ips.sort(key=numeric_ip)
    write_to(RESULT, ips, "IP: {}\n")

if __name__=="__main__":
    main()
0

Try this:

sorted_ips = sorted(ips, key=lambda x: '.'.join(["{:>03}".format(octet) for octet in x.split(".")]))
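
As a quick check of the zero-padding idea, here is a small sketch. The helper name `padded_key` and the sample addresses are assumptions for illustration, and it uses `str.zfill` in place of the format string. Padding each octet to three digits gives every key the same shape, so ordinary string comparison agrees with numeric order.

```python
def padded_key(ip):
    # Hypothetical helper: "62.1.2.3" -> "062.001.002.003"
    return ".".join(octet.zfill(3) for octet in ip.split("."))

# Sample addresses chosen for illustration only.
ips = ["192.0.2.10", "192.0.2.9", "198.51.100.1", "62.1.2.3"]

padded = padded_key("62.1.2.3")
sorted_ips = sorted(ips, key=padded_key)

print(padded)
print(sorted_ips)
```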
3

Or this (which saves you the cost of string formatting):

def ipsort (ip): 
    return tuple (int (t) for t in ip.split ('.')) 

ips = ['1.2.3.4', '100.2.3.4', '62.1.2.3', '62.1.22.4'] 
print (sorted (ips, key = ipsort)) 
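
Another option, not mentioned in this thread, is the standard library's `ipaddress` module (Python 3.3 or later), whose address objects compare numerically. This is a minimal sketch, assuming every string is a valid IPv4 address; `ipaddress.ip_address` raises `ValueError` otherwise.

```python
import ipaddress

ips = ['1.2.3.4', '100.2.3.4', '62.1.2.3', '62.1.22.4']

# Each string is parsed into an IPv4Address object, which orders numerically.
ordered = sorted(ips, key=ipaddress.ip_address)
print(ordered)
```

Unlike the tuple-of-ints key, this also rejects malformed input such as octets above 255.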
+0

Very nice! I tend to default to the "obvious but slow" answer :) –

+0

Personally, I would have gone with a `lambda`: `sorted_list = sorted(ips, key=lambda ip: tuple(int(t) for t in ip.split('.')))`, but your answer is good (and does exactly the same thing =P). – That1Guy

+0

Or `lambda ip: tuple(map(int, ip.split('.')))`, but we are really just rehashing what Hyperboreus already did very well! –