2013-07-07 43 views
2

Python: remove characters, then join strings

I'm writing a program to convert standard SVG paths into a Raphael.js-friendly format.

The path data has this format:

d="M 62.678745, 
    259.31235 L 63.560745, 
    258.43135 L 64.220745, 
    257.99135 L 64.439745, 
    258.43135 L 64.000745 
    ... 
    ... 
    " 

What I want to do is first strip the decimal part of each number, then remove the whitespace. The final result should look like this:

d="M62, 
    259L63, 
    258L64, 
    257L64, 
    258L64 
    ... 
    ... 
    " 

I have around 2000 of these paths to parse and convert into a JSON file.

This is what I have so far:

from bs4 import BeautifulSoup 

svg = open("/path/to/file.svg", "r").read() 
soup = BeautifulSoup(svg) 
paths = soup.findAll("path") 

raphael = [] 

for p in paths: 
    splitData = p['d'].split(",") 
    tempList = [] 

    for s in splitData: 
     #strip decimals from string 
     #don't know how to do this 

     #remove whitespace 
     s.replace(" ", "") 

     #add to templist 
     tempList.append(s + ", ") 

    tempList[-1].replace(", ", "") 
    raphael.append(tempList) 

Answers

1

Try this:

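import re
from bs4 import BeautifulSoup

with open("/path/to/file.svg", "r") as f:
    svg = f.read()
soup = BeautifulSoup(svg)
paths = soup.findAll("path")

raphael = []

for p in paths:
    splitData = p['d'].split(",")
    for line in splitData:
        # Remove the fractional part, e.g. ".678745"
        line = re.sub(r"\.\d*", "", line)
        # Remove all whitespace, including any newlines left in the attribute
        line = re.sub(r"\s+", "", line)
        raphael.append(line)

# Join the cleaned pieces back together, one coordinate pair per line
d = ",\n".join(raphael)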
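As background on why the loop in the question left its strings unchanged: Python strings are immutable, so `str.replace` returns a new string instead of modifying the one it is called on. The snippet below is only a small illustration of that behaviour; the variable names `unchanged` and `stripped` are made up for the example.

```python
s = " 259.31235 L 63.560745"

# str.replace returns a new string; calling it without using the
# result leaves s exactly as it was.
s.replace(" ", "")
unchanged = s

# Rebinding a name to the return value keeps the result.
stripped = s.replace(" ", "")

print(repr(unchanged))
print(repr(stripped))
```

The same applies to `tempList[-1].replace(", ", "")` in the question: the result has to be assigned back for it to have any effect.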
3

You can use a regex:

>>> import re
>>> d = """M 62.678745,
    259.31235 L 63.560745,
    258.43135 L 64.220745,
    257.99135 L 64.439745,
    258.43135 L 64.000745"""
>>> for strs in d.splitlines():
...     print(re.sub(r'(\s+)|(\.\d+)', '', strs))
...
M62, 
259L63, 
258L64, 
257L64, 
258L64 
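Since the question mentions roughly 2000 paths headed for a JSON file, the same substitution could be wrapped in a small helper. This is only a sketch under a few assumptions: the names `clean_path` and `paths_to_json` are hypothetical, every number is assumed to have an integer part (a value written as `.5` would be removed entirely), and the whole `d` string is cleaned at once, so newlines are stripped along with the other whitespace.

```python
import json
import re

# Matches any run of whitespace, or a fractional part such as ".678745".
_STRIP = re.compile(r"\s+|\.\d+")


def clean_path(d):
    """Return path data with whitespace and fractional parts removed."""
    return _STRIP.sub("", d)


def paths_to_json(d_values):
    """Serialize a list of cleaned path strings as a JSON array."""
    return json.dumps([clean_path(d) for d in d_values])


sample = "M 62.678745,\n    259.31235 L 63.560745,\n    258.43135 L 64.000745"
print(clean_path(sample))
print(paths_to_json([sample, "M 1.5, 2.5"]))
```

Compiling the pattern once avoids repeating the pattern string on every call; the list passed to `paths_to_json` would typically come from the `d` attributes collected with BeautifulSoup.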
+0

+1, a simpler solution...

1

You can build a brute-force parser:

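def isint(x):
    # True if x parses as a number that can be truncated to an int
    try:
        int(float(x))
        return True
    except (ValueError, OverflowError):
        return False

def parser(s):
    mystr = lambda x: str(int(float(x)))
    # Protect newlines with a placeholder so that split() does not discard them
    s = s.replace('\n', '##')
    tmp = ','.join([''.join([mystr(x) if isint(x) else x
                             for x in j.split()])
                    for j in s.split(',')])
    return tmp.replace('##', '\n')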

Test:

d="M 62.678745,\n 259.31235 L 63.560745,\n 258.43135 L 64.220745, \n 257.99135 L 64.439745, \n 258.43135 L 64.000745 " 
print(parser(d))
# M62, 
# 259L63, 
# 258L64, 
# 257L64, 
# 258L64 
+0

Don't you think this is overkill? This can be done easily with `re.sub(r'(\s+)|(\.\d+)','',line)`

+0

I definitely have to learn regular expressions...