2015-06-30 39 views
1

Replacing certain text in an SQL file using Python

I am using Python to replace certain strings in an SQL file. The string looks like this:

<img title="\frac{3}{8}" src="http://latex.codecogs.com/gif.latex?\dpi{50}&amp;space;\fn_phv&amp;space;\frac{3}{8}" alt="" /> 

Basically it contains a small piece of HTML code. Now I want to replace it with:

<sup>3</sup>&frasl;<sub>8</sub> 

To replace it in the SQL file, I am using this Python code:

for line in filedata: 
    re.sub(r'<img\b[^<]*(?<=title=")\\frac\{(\d+)\}\{(\d+)\}"[^<]*>', "<sup>\g<1></sup>&frasl;<sub>\g<2></sub>", line) 

This does not change the data, so I also tried this:

filedata1 = re.sub(r'<img\b[^<]*(?<=title=")\\frac\{(\d+)\}\{(\d+)\}"[^<]*>', "<sup>\g<1></sup>&frasl;<sub>\g<2></sub>", filedata) 

That did not help either. I need some help.

My full code:

import re 
with open('/Users/cnnlakshmen/Downloads/qz_question.sql', 'r') as fin: 
    filedata = fin.read() 

for line in filedata: 
    re.sub(r'<img\b[^<]*(?<=title=")\\frac\{(\d+)\}\{(\d+)\}"[^<]*>', "<sup>\g<1></sup>&frasl;<sub>\g<2></sub>", line) 

filedata1 = re.sub(r'<img\b[^<]*(?<=title=")\\frac\{(\d+)\}\{(\d+)\}"[^<]*>', "<sup>\g<1></sup>&frasl;<sub>\g<2></sub>", filedata) 
print filedata1 

# Write the file out again 
with open('/Users/cnnlakshmen/Downloads/qz_question1.sql', 'w') as fin: 
    fin.write(filedata1) 

Each data row looks like this:

(163, 'S001', 'T005', 'ST015', 'Medium', '1', 9, '1', '<p>The ratio of the number of children to the number of adults at a funfair was 2 : 5.​&nbsp;&nbsp;<sup>1</sup>&frasl;<sub>5</sub>of the children were boys. If there were 120 more adults than children, how many girls were there at the funfair?</p>\n<p>&nbsp;</p>', 'without_image', '[{"value":"16","answer":"0"},{"value":"40","answer":"0"},{"value":"64","answer":"1"},{"value":"120","answer":"0"}]', '<p>5 -2 = 3</p>\n<p>3 units --&gt; 120</p>\n<p>1 unit --&gt; 120 &divide; 3 = 40</p>\n<p>2 units --&gt; 40 x 2 = 80</p>\n<p>1 - <img title="\\small \\frac{1}{5}" src="http://latex.codecogs.com/gif.latex?\\small&amp;space;\\frac{1}{5}" alt="" width="5" height="20" />&nbsp;=&nbsp;<img title="\\small \\frac{4}{5}" src="http://latex.codecogs.com/gif.latex?\\small&amp;space;\\frac{4}{5}" alt="" width="4" height="16" /></p>\n<p><img title="\\small \\frac{4}{5}" src="http://latex.codecogs.com/gif.latex?\\small&amp;space;\\frac{4}{5}" alt="" width="4" height="16" />&nbsp;x 80 = 64</p>', 'lakshmen K', NULL, '1', '0', '2015-05-03 15:54:19', '0000-00-00 00:00:00'), 

Answers

0

Your regex does not work the way you might think it does.

>>> a = '<img title="\\frac{3}{8}" src="http://latex.codecogs.com/gif.latex?\\dpi{50}&amp;space;\\fn_phv&amp;space;\\frac{3}{8}" alt="" />' 
>>> pattern = r'<img\b[^<]*(?<=title=")\\frac\{(\d+)\}\{(\d+)\}"[^<]*>' 
>>> re.findall(pattern, a) 
[('3', '8')] 

This extracts the numbers of your fraction. The same pattern without the capture groups finds the whole string:

>>> pattern = r'<img\b[^<]*(?<=title=")\\frac\{\d+\}\{\d+\}"[^<]*>' 
>>> re.findall(pattern, a) 
['<img title="\\frac{3}{8}" src="http://latex.codecogs.com/gif.latex?\\dpi{50}&amp;space;\\fn_phv&amp;space;\\frac{3}{8}" alt="" />'] 

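So change your replacement string to make your approach work. Use the pattern with the capture groups and refer to them in the replacement:

>>> pattern = r'<img\b[^<]*(?<=title=")\\frac\{(\d+)\}\{(\d+)\}"[^<]*>' 
>>> sub = r"<sup>\1</sup>&frasl;<sub>\2</sub>" 
>>> re.sub(pattern, sub, a) 
'<sup>3</sup>&frasl;<sub>8</sub>' 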
0

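Your approach with r'<img\b[^<]*(?<=title=")\\frac\{(\d+)\}\{(\d+)\}"[^<]*>' fails for two reasons:

  1. To match a \, the pattern must contain an escaped \, i.e. the data line portion \\frac is matched by the pattern r'\\\\frac'.

  2. Unlike the string you wrote at the top (title="\frac{3}{8}"), the data line at the bottom of your question has title="\\small \\frac{1}{5}"; you did not account for the \\small in the pattern either.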
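The first point can be checked in isolation. The following is a minimal sketch, assuming the SQL file really contains doubled backslashes as shown in the data row; the variable names `data`, `single` and `double` are only illustrative.

```python
import re

# Raw literal: this string really contains two backslashes before "frac",
# as in the data row from the question.
data = r'title="\\frac{1}{5}"'

# r'"\\frac' matches a quote, ONE literal backslash, then "frac".
single = re.search(r'"\\frac', data)

# r'"\\\\frac' matches a quote, TWO literal backslashes, then "frac".
double = re.search(r'"\\\\frac', data)

print(single)          # None: the quote is followed by two backslashes, not one
print(double.group())  # "\\frac
```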

Incorporating both points into your pattern yields

r'<img\b[^<]*(?<=title=")(?:\\\\small\s*)?\\\\frac\{(\d+)}\{(\d+)}"[^<]*>' 

which matches your data.
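As a rough end-to-end sketch, the substitution could be applied to the whole file contents in one call rather than in a loop. The names `PATTERN`, `REPLACEMENT` and `replace_fractions` are hypothetical, and the sample is a shortened piece of the data row from the question. The pattern allows optional whitespace after \\small because the sample row has a space there.

```python
import re

# Doubled backslashes in the pattern match the doubled backslashes in the file.
PATTERN = re.compile(
    r'<img\b[^<]*(?<=title=")(?:\\\\small\s*)?\\\\frac\{(\d+)\}\{(\d+)\}"[^<]*>'
)
# \1 and \2 refer to the numerator and denominator captured above.
REPLACEMENT = r'<sup>\1</sup>&frasl;<sub>\2</sub>'


def replace_fractions(sql_text):
    """Return sql_text with the LaTeX fraction images replaced by HTML fractions."""
    # re.sub does not modify its input; the returned string must be kept.
    return PATTERN.sub(REPLACEMENT, sql_text)


# Shortened sample taken from the data row in the question.
sample = (
    r'<p>1 - <img title="\\small \\frac{1}{5}" '
    r'src="http://latex.codecogs.com/gif.latex?\\small&amp;space;\\frac{1}{5}" '
    r'alt="" width="5" height="20" />&nbsp;=&nbsp;'
    r'<img title="\\small \\frac{4}{5}" '
    r'src="http://latex.codecogs.com/gif.latex?\\small&amp;space;\\frac{4}{5}" '
    r'alt="" width="4" height="16" /></p>'
)

result = replace_fractions(sample)
print(result)
```

In the script from the question, the same function would be called once on `filedata` (the string returned by `fin.read()`) and its return value written to the output file.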
