2012-12-27 66 views
6

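Parsing a Lisp file with Python

I have the following Lisp file, which comes from the UCI machine learning database. I want to convert it into a flat text file using Python. A typical line looks like this: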

(1 ((st 8) (pitch 67) (dur 4) (keysig 1) (timesig 12) (fermata 0))((st 12) (pitch 67) (dur 8) (keysig 1) (timesig 12) (fermata 0))) 

I want to parse it into a text file like this:

time pitch duration keysig timesig fermata
8    67    4        1      12      0
12   67    8        1      12      0

Is there a Python module that parses this intelligently? This is the first time I have seen Lisp.

+0

Does [Parsing S-Expressions in Python](http://stackoverflow.com/q/3182594) help? –

+0

Thanks, I'll take a look. – qua

+1

Why not use Lisp to convert it into another format? – Marcin

Answers

20

As shown in this answer, pyparsing appears to be the right tool for this:

inputdata = '(1 ((st 8) (pitch 67) (dur 4) (keysig 1) (timesig 12) (fermata 0))((st 12) (pitch 67) (dur 8) (keysig 1) (timesig 12) (fermata 0)))' 

from pyparsing import OneOrMore, nestedExpr 

data = OneOrMore(nestedExpr()).parseString(inputdata) 
print(data)

# [['1', [['st', '8'], ['pitch', '67'], ['dur', '4'], ['keysig', '1'], ['timesig', '12'], ['fermata', '0']], [['st', '12'], ['pitch', '67'], ['dur', '8'], ['keysig', '1'], ['timesig', '12'], ['fermata', '0']]]] 
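
If installing pyparsing is not an option, a similar nested list can be built with the standard library alone. The following is a rough sketch of a different technique (a token stack), not a description of how pyparsing works internally. `parse_sexp` is a hypothetical helper name, and it assumes the input contains only parentheses and whitespace-separated atoms (no Lisp strings, quotes, or comments), which seems to hold for the sample line.

```python
import re

def parse_sexp(text):
    """Parse parenthesised atoms into nested Python lists of strings."""
    # A token is an opening paren, a closing paren, or a run of other non-space characters.
    tokens = re.findall(r'\(|\)|[^\s()]+', text)
    stack = [[]]                      # stack[0] collects the top-level expressions
    for tok in tokens:
        if tok == '(':
            stack.append([])          # start a new nested list
        elif tok == ')':
            if len(stack) == 1:
                raise ValueError('unbalanced closing parenthesis')
            finished = stack.pop()
            stack[-1].append(finished)
        else:
            stack[-1].append(tok)     # atoms are kept as strings
    if len(stack) != 1:
        raise ValueError('unbalanced opening parenthesis')
    return stack[0]

inputdata = ('(1 ((st 8) (pitch 67) (dur 4) (keysig 1) (timesig 12) (fermata 0))'
             '((st 12) (pitch 67) (dur 8) (keysig 1) (timesig 12) (fermata 0)))')
result = parse_sexp(inputdata)
print(result)
```

Atoms stay as strings here, as in the pyparsing output, so numeric fields still need an explicit `int()` if they are to be used as numbers.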

For completeness, here is how to format the result (using texttable):

from texttable import Texttable 

tab = Texttable() 
for row in data.asList()[0][1:]: 
    row = dict(row) 
    tab.header(row.keys()) 
    tab.add_row(row.values()) 
print(tab.draw())
 
+---------+--------+----+-------+-----+---------+
| timesig | keysig | st | pitch | dur | fermata |
+=========+========+====+=======+=====+=========+
| 12      | 1      | 8  | 67    | 4   | 0       |
+---------+--------+----+-------+-----+---------+
| 12      | 1      | 12 | 67    | 8   | 0       |
+---------+--------+----+-------+-----+---------+
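
If the goal is the flat text layout from the question rather than a drawn table, the nested list can also be walked directly. A minimal sketch, assuming the parsed structure has the shape shown earlier (chorale number first, then one list of `[key, value]` pairs per note). `to_flat_lines`, `COLUMNS`, and `HEADER` are illustrative names, not part of pyparsing or texttable, and `parsed` is written out by hand so the snippet runs on its own.

```python
# Column order requested in the question; 'st' is printed under the heading "time".
COLUMNS = ['st', 'pitch', 'dur', 'keysig', 'timesig', 'fermata']
HEADER = 'time pitch duration keysig timesig fermata'

def to_flat_lines(chorale):
    """Turn one parsed chorale (nested lists) into a header plus one line per note."""
    lines = [HEADER]
    for note in chorale[1:]:          # chorale[0] is the chorale number
        fields = dict(note)           # [['st', '8'], ...] -> {'st': '8', ...}
        lines.append(' '.join(fields[name] for name in COLUMNS))
    return lines

# Same shape as data.asList()[0], written out by hand for this sketch.
parsed = ['1',
          [['st', '8'], ['pitch', '67'], ['dur', '4'],
           ['keysig', '1'], ['timesig', '12'], ['fermata', '0']],
          [['st', '12'], ['pitch', '67'], ['dur', '8'],
           ['keysig', '1'], ['timesig', '12'], ['fermata', '0']]]

print('\n'.join(to_flat_lines(parsed)))
```

Looking fields up by name keeps the column order fixed regardless of how the dictionary happens to order its keys.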

To convert this data back to Lisp notation:

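def lisp(x):
    # Lists become "(a b c)"; anything else is assumed to be a string atom.
    return '(%s)' % ' '.join(lisp(y) for y in x) if isinstance(x, list) else x

# asList() turns the pyparsing result into plain nested lists first.
d = lisp(data.asList()[0])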
+0

This is definitely the answer, since the OP asked for "a Python *module* that *intelligently* parses this". – Bakuriu

+0

Thanks! Really helpful. – qua

+0

So how would you convert the data back into Lisp code? –

1

Use a regular expression to split the text into pairs:

In [1]: import re 

In [2]: txt = '(((st 8) (pitch 67) (dur 4) (keysig 1) (timesig 12) (fermata 0))((st 12) (pitch 67) (dur 8) (keysig 1) (timesig 12) (fermata 0)))' 

In [3]: [p.split() for p in re.findall(r'\w+\s+\d+', txt)]
Out[3]: [['st', '8'], ['pitch', '67'], ['dur', '4'], ['keysig', '1'], ['timesig', '12'], ['fermata', '0'], ['st', '12'], ['pitch', '67'], ['dur', '8'], ['keysig', '1'], ['timesig', '12'], ['fermata', '0']] 

Then turn it into a dictionary:

# data is the list of [key, value] pairs produced in the previous step
dct = {}
for p in data:
    if p[0] not in dct:
        dct[p[0]] = [p[1]]
    else:
        dct[p[0]].append(p[1])

The result is:

In [10]: dct 
Out[10]: {'timesig': ['12', '12'], 'keysig': ['1', '1'], 'st': ['8', '12'], 'pitch': ['67', '67'], 'dur': ['4', '8'], 'fermata': ['0', '0']} 

Printing:

print('time pitch duration keysig timesig fermata')
for t in range(len(dct['st'])):
    # One output line per note, fields in the header's order
    print(' '.join(dct[k][t] for k in ('st', 'pitch', 'dur', 'keysig', 'timesig', 'fermata')))

Proper formatting is left as an exercise for the reader...
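
One possible way to do that exercise, as a sketch: transpose the per-key lists back into rows with `zip` and pad each field to a fixed width with `str.format`. The column width of 8 and the names `keys`, `titles`, `row_format`, and `table` are arbitrary choices for illustration.

```python
import re

txt = ('(((st 8) (pitch 67) (dur 4) (keysig 1) (timesig 12) (fermata 0))'
       '((st 12) (pitch 67) (dur 8) (keysig 1) (timesig 12) (fermata 0)))')

# Collect the values per key, as in the dictionary step above.
dct = {}
for key, value in (p.split() for p in re.findall(r'\w+\s+\d+', txt)):
    dct.setdefault(key, []).append(value)

keys = ['st', 'pitch', 'dur', 'keysig', 'timesig', 'fermata']
titles = ['time', 'pitch', 'duration', 'keysig', 'timesig', 'fermata']
row_format = ' '.join(['{:<8}'] * len(keys))   # left-align, 8 characters per column

table = [row_format.format(*titles).rstrip()]
for row in zip(*(dct[k] for k in keys)):       # one tuple of values per note
    table.append(row_format.format(*row).rstrip())

print('\n'.join(table))
```

This relies on every key occurring once per note; if a field were missing from some note, the per-key lists would fall out of step and `zip` would silently truncate.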

2

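If you know the data is correct and uniformly formatted (at first glance it appears so), and if you only need this data and don't need to solve the general problem... then why not just replace every non-digit with a space and then use split?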

import re

# Replace every run of non-numeric characters with a space, then split.
with open("chorales.lisp") as f:
    lines = f.read().split("\n")
lines = [re.sub("[^-0-9]+", " ", x) for x in lines]
for line in lines:
    nums = [int(tok) for tok in line.split()]
    i = 1  # first element is chorale number
    while i < len(nums):
        st, pitch, dur, keysig, timesig, fermata = nums[i:i+6]
        i += 6
        # ... your processing goes here ...
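
To try this idea without the data file, here is a sketch that applies the same substitution to the sample line from the question and prints flat rows. It assumes every note has exactly six values in the order st, pitch, dur, keysig, timesig, fermata. `flatten_line` is an illustrative name, and the length check is an addition that is not in the snippet above.

```python
import re

def flatten_line(line):
    """Return (chorale_number, list of 6-tuples) for one line of the file."""
    nums = [int(tok) for tok in re.sub(r"[^-0-9]+", " ", line).split()]
    if not nums:                      # blank line: nothing to report
        return None, []
    number, fields = nums[0], nums[1:]
    if len(fields) % 6 != 0:
        raise ValueError("expected six values per note, got %d values" % len(fields))
    notes = [tuple(fields[i:i + 6]) for i in range(0, len(fields), 6)]
    return number, notes

sample = ("(1 ((st 8) (pitch 67) (dur 4) (keysig 1) (timesig 12) (fermata 0))"
          "((st 12) (pitch 67) (dur 8) (keysig 1) (timesig 12) (fermata 0)))")

number, notes = flatten_line(sample)
print("time pitch duration keysig timesig fermata")
for note in notes:
    print(" ".join(str(v) for v in note))
```

Because the field names are discarded, this only works while the order of the fields never changes from note to note.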
1

Since the data is already in Lisp, use Lisp itself:

(let ((input '(1 ((ST 8) (PITCH 67) (DUR 4) (KEYSIG 1) (TIMESIG 12) (FERMATA 0))
                 ((ST 12) (PITCH 67) (DUR 8) (KEYSIG 1) (TIMESIG 12) (FERMATA 0)))))
  (let ((row-headers (mapcar 'car (second input)))
        (row-data (mapcar (lambda (row) (mapcar 'second row)) (cdr input))))
    (format t "~{~A~^ ~}~%" row-headers)
    (format t "~{~{~A~^ ~}~^ ~%~}" row-data)))