Converting a .txt file to a dictionary (Python v2.7)

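I am currently trying to process and parse the information out of this .txt file. The file appears to be tab-delimited. I want to extract the base 16 value (e.g. 000000) as the dictionary key and the company name (e.g. Xerox Corporation) as the dictionary value. So, for example, if I look up the key 000001 in my dictionary, Xerox Corporation would be returned as the corresponding value.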

I tried parsing the .txt file as a CSV and reading the entry on every nth line, but unfortunately there is no fixed pattern and n varies.

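Is there any way to capture, for example, the value before the term "base 16" and the text after that term, and use them as a dictionary entry?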

Many thanks.

Source

2011-11-09 thefragileomen

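The entries are separated by two newlines. The second line of each entry is always the base16 one. The data before the first tab on that line is the base16 key, and the last tab-separated field is the company name.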

import urllib 

inputfile = urllib.urlopen("http://standards.ieee.org/develop/regauth/oui/oui.txt") 
data = inputfile.read() 

entries = data.split("\n\n")[1:-1] #ignore first and last entries, they're not real entries 

d = {} 
for entry in entries: 
    parts = entry.split("\n")[1].split("\t") 
    company_id = parts[0].split()[0] 
    company_name = parts[-1] 
    d[company_id] = company_name

Some results:

40F52E: Leica Microsystems (Schweiz) AG 
3831AC: WEG 
00B0F0: CALY NETWORKS 
9CC077: PrintCounts, LLC 
000099: MTX, INC. 
000098: CROSSCOMM CORPORATION 
000095: SONY TEKTRONIX CORP. 
000094: ASANTE TECHNOLOGIES 
000097: EMC Corporation 
000096: MARCONI ELECTRONICS LTD. 
000091: ANRITSU CORPORATION 
000090: MICROCOM 
000093: PROTEON INC. 
000092: COGENT DATA TECHNOLOGIES 
002192: Baoding Galaxy Electronic Technology Co.,Ltd 
90004E: Hon Hai Precision Ind. Co.,Ltd. 
002193: Videofon MV 
00A0D4: RADIOLAN, INC. 
E0F379: Vaddio 
002190: Goliath Solutions

Source

2011-11-09 16:37:27 orlp

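Many thanks @nightcracker. How would I link this to the .txt file I have downloaded? It is named, for example, "oui.txt". How would I first open and read the entries in this file? Thanks – thefragileomen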

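@thefragileomen: I have updated my answer with this information. If my answer solved your problem, please consider clicking the "accept answer" button to the left of the answer. – orlp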
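For the local-file case raised in the comments above, a minimal sketch follows. It assumes the download was saved as oui.txt in the working directory. The names parse_oui_text, load_oui_file and SAMPLE are hypothetical and introduced here only for illustration; they are not taken from the answer. The sketch uses the same idea (entries separated by blank lines, key before the first tab on the "(base 16)" line) and is meant to run on both Python 2.7 and Python 3.

```python
import io


def parse_oui_text(data):
    """Map each base-16 company id to its company name."""
    d = {}
    for entry in data.split("\n\n"):
        lines = entry.split("\n")
        # Real entries carry the "(base 16)" marker on their second line;
        # the file header and any other block without it are skipped.
        if len(lines) < 2 or "(base 16)" not in lines[1]:
            continue
        parts = lines[1].split("\t")
        company_id = parts[0].split()[0]   # text before the first tab
        company_name = parts[-1].strip()   # last tab-separated field
        d[company_id] = company_name
    return d


def load_oui_file(path="oui.txt"):
    # io.open accepts an encoding on both Python 2.7 and Python 3 and
    # translates "\r\n" line endings to "\n" while reading.
    with io.open(path, encoding="utf-8", errors="replace") as f:
        return parse_oui_text(f.read())


# Made-up sample shaped like the entries described in this thread.
SAMPLE = (
    "header line\n"
    "\n"
    "00-00-00   (hex)\t\tXEROX CORPORATION\n"
    "000000     (base 16)\t\tXEROX CORPORATION\n"
    "\n"
    "00-00-01   (hex)\t\tXEROX CORPORATION\n"
    "000001     (base 16)\t\tXEROX CORPORATION\n"
)

sample_dict = parse_oui_text(SAMPLE)
print(sample_dict["000001"])
```

With the real file on disk, load_oui_file("oui.txt") would return the full dictionary in the same way.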

result = dict()
for lig in open('oui.txt'):
    if 'base 16' in lig:
        # Text before the marker is the key, text after it is the company name
        num, sep, txt = lig.strip().partition('(base 16)')
        result[num.strip()] = txt.strip()

Source

2011-11-09 16:36:29 dugres

def oui_parse(fn='oui.txt'):
    with open(fn) as ouif:
        content = ouif.read()
    for block in content.split('\n\n'):
        lines = block.split('\n')

        if not lines or '(hex)' not in lines[0]:  # First block
            continue

        assert '(base 16)' in lines[1]
        d = {}
        d['oui'] = lines[1].split()[0]
        d['company'] = lines[1].split('\t')[-1]
        if len(lines) == 6:
            d['division'] = lines[2].strip()
        d['street'] = lines[-3].strip()
        d['city'] = lines[-2].strip()
        d['country'] = lines[-1].strip()
        yield d

oui_info = list(oui_parse())
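
The question asks for a dictionary keyed by the base-16 value, whereas the code above produces a list of records. A small sketch of the extra step is shown here. The records list is hypothetical sample data shaped like the dicts that oui_parse yields, and oui_to_company is a name made up for this example.

```python
# Hypothetical records, shaped like the dicts that oui_parse() yields.
records = [
    {"oui": "000000", "company": "XEROX CORPORATION"},
    {"oui": "000001", "company": "XEROX CORPORATION"},
]

# Build the lookup table: base-16 id -> company name.
# dict() over a generator of pairs works on Python 2.7 and Python 3.
oui_to_company = dict((r["oui"], r["company"]) for r in records)

print(oui_to_company["000001"])
```

With the real data, records would simply be replaced by oui_info.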

Source

2011-11-09 16:42:35 phihag

>>> import urllib 
... 
... f = urllib.urlopen('http://standards.ieee.org/develop/regauth/oui/oui.txt') 
... d = dict([(s[:6], s[22:].strip()) for s in f if 'base 16' in s]) 
... print d['000001'] 
XEROX CORPORATION
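
The slices s[:6] and s[22:] above rely on fixed column positions. If the spacing in the file were to differ, a regular expression that captures the text on either side of "(base 16)" is one possible alternative. This is only a sketch: BASE16_LINE, parse_lines and sample_lines are hypothetical names, and the sample lines are made up to resemble the format discussed in this thread.

```python
import re

# Group 1: the six hex digits; group 2: the text after "(base 16)".
BASE16_LINE = re.compile(r"^\s*([0-9A-Fa-f]{6})\s+\(base 16\)\s+(.*\S)\s*$")


def parse_lines(lines):
    """Build {base-16 id: company name} from an iterable of lines."""
    d = {}
    for line in lines:
        m = BASE16_LINE.match(line)
        if m:  # lines without the "(base 16)" marker are ignored
            d[m.group(1)] = m.group(2)
    return d


# Made-up sample lines in the shape described in this thread.
sample_lines = [
    "00-00-01   (hex)\t\tXEROX CORPORATION\n",
    "000001     (base 16)\t\tXEROX CORPORATION\n",
]

result = parse_lines(sample_lines)
print(result["000001"])
```

Because parse_lines accepts any iterable of lines, an open file object could be passed to it directly.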

Source

2011-11-09 16:47:39
