2012-05-06

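Splitting a diff file using regular expressions in Python

I'd like to use Python's re module to split each section of a diff (unified format) into a separate piece. The diff looks like this...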

diff --git a/src/core.js b/src/core.js 
index 9c8314c..4242903 100644 
--- a/src/core.js 
+++ b/src/core.js 
@@ -801,7 +801,7 @@ jQuery.extend({ 
     return proxy; 
    }, 

- // Mutifunctional method to get and set values to a collection 
+ // Multifunctional method to get and set values of a collection 
    // The value/s can optionally be executed if it's a function 
    access: function(elems, fn, key, value, chainable, emptyGet, pass) { 
     var exec, 
diff --git a/src/sizzle b/src/sizzle 
index fe2f618..feebbd7 160000 
--- a/src/sizzle 
+++ b/src/sizzle 
@@ -1 +1 @@ 
-Subproject commit fe2f618106bb76857b229113d6d11653707d0b22 
+Subproject commit feebbd7e053bff426444c7b348c776c99c7490ee 
diff --git a/test/unit/manipulation.js b/test/unit/manipulation.js 
index 18e1b8d..ff31c4d 100644 
--- a/test/unit/manipulation.js 
+++ b/test/unit/manipulation.js 
@@ -7,7 +7,7 @@ var bareObj = function(value) { return value; }; 
var functionReturningObj = function(value) { return (function() { return value; }); }; 

test("text()", function() { 
- expect(4); 
+ expect(5); 
    var expected = "This link has class=\"blog\": Simon Willison's Weblog"; 
    equal(jQuery("#sap").text(), expected, "Check for merged text of more then one element."); 

@@ -20,6 +20,10 @@ test("text()", function() { 
     frag.appendChild(document.createTextNode("foo")); 

    equal(jQuery(frag).text(), "foo", "Document Fragment Text node was retreived from .text()."); 
+ 
+ var $newLineTest = jQuery("<div>test<br/>testy</div>").appendTo("#moretests"); 
+ $newLineTest.find("br").replaceWith("\n"); 
+ equal($newLineTest.text(), "test\ntesty", "text() does not remove new lines (#11153)"); 
}); 

test("text(undefined)", function() { 
diff --git a/version.txt b/version.txt 
index 0a182f2..0330b0e 100644 
--- a/version.txt 
+++ b/version.txt 
@@ -1 +1 @@ 
-1.7.2 
\ No newline at end of file 
+1.7.3pre 
\ No newline at end of file 

I've tried various combinations of the pattern below, but I can't get it quite right. This is the closest I've come so far...

re.compile(r'(diff.*?[^\rdiff])', flags=re.S|re.M) 

But this yields:

['diff ', 'diff ', 'diff ', 'diff '] 

How can I match all of the sections in this diff?
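Why the attempt returns only `'diff '`: inside square brackets, `[^\rdiff]` is a character class that matches a single character other than `\r`, `d`, `i` or `f`; it does not mean "anything except the word diff". The lazy `.*?` therefore matches nothing, the class consumes the space after `diff`, and the match ends there. A minimal check, using a short made-up `sample` string rather than the full diff:

```python
import re

# A shortened, hypothetical diff used only for this demonstration.
sample = (
    "diff --git a/x b/x\n"
    "+1\n"
    "diff --git a/y b/y\n"
    "-2\n"
)

# [^\rdiff] is a character class: it matches ONE character that is not
# '\r', 'd', 'i' or 'f'. It cannot express "not followed by the word diff".
attempt = re.compile(r'(diff.*?[^\rdiff])', flags=re.S | re.M)
print(attempt.findall(sample))  # ['diff ', 'diff ']

# The class rejects a lone 'd' but accepts a space.
print(re.fullmatch(r'[^\rdiff]', 'd'))  # None
print(re.fullmatch(r'[^\rdiff]', ' ') is not None)  # True
```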

Answers


This should do it:

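import re

# s holds the full diff text. With re.S, the lazy .*? spans lines and stops
# right before the next line that starts with "diff" (re.M) or at the end (\Z).
r = re.compile(r'^(diff.*?)(?=^diff|\Z)', re.M | re.S)
for m in r.findall(s):
    print('====')
    print(m)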

Yes, this works perfectly for me, thanks. – Kevin
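A self-contained version of the pattern from this answer, written for Python 3. The two-section `sample` string and the `split_sections` helper are illustrative additions, not part of the original answer:

```python
import re

# One section: "diff" at the start of a line, then as little as possible
# (re.S lets . cross newlines) up to the next line-initial "diff"
# (^ matches per line because of re.M) or the very end of the string (\Z).
SECTION = re.compile(r'^(diff.*?)(?=^diff|\Z)', re.M | re.S)


def split_sections(text):
    """Return the list of per-file sections in a git-style diff."""
    return SECTION.findall(text)


sample = (
    "diff --git a/version.txt b/version.txt\n"
    "@@ -1 +1 @@\n"
    "-1.7.2\n"
    "+1.7.3pre\n"
    "diff --git a/src/sizzle b/src/sizzle\n"
    "@@ -1 +1 @@\n"
)

for section in split_sections(sample):
    print('====')
    print(section, end='')
```

Because the lookahead `(?=^diff|\Z)` is zero-width, each section keeps its trailing newline, so joining the results reproduces the input. Any text before the first `diff` line (for example a commit message) is not returned at all.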


Why use a regular expression at all? Just iterate over the lines and start a new section whenever a line begins with diff:

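list_of_diffs = []
temp_diff = []  # lines of the section currently being collected
for line in patch:  # patch: any iterable of lines, e.g. an open file
    if line.startswith('diff') and temp_diff:
        # a new section starts here, so store the finished one
        list_of_diffs.append(''.join(temp_diff))
        temp_diff = []
    temp_diff.append(line)  # the 'diff' header line opens the new section
if temp_diff:  # the last section is not followed by a 'diff' line
    list_of_diffs.append(''.join(temp_diff))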

Disclaimer: treat the code above as an illustrative sketch of the approach rather than a tested, drop-in solution.

A regex is a hammer, but your problem isn't a nail.
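The line-by-line idea can also be written as a generator that yields each section as soon as the next `diff` line arrives, so only the current section is held in memory rather than the whole patch. A minimal sketch; `iter_diff_sections` is a hypothetical name, and `io.StringIO` stands in for an open patch file:

```python
import io


def iter_diff_sections(lines):
    """Yield one string per section, starting a new section at each line
    that begins with 'diff'. `lines` may be any iterable of lines, such as
    an open file object."""
    current = []
    for line in lines:
        if line.startswith('diff') and current:
            yield ''.join(current)
            current = []
        current.append(line)
    if current:  # flush the last section, which has no 'diff' line after it
        yield ''.join(current)


# Hypothetical two-file patch; a real caller would pass open('diff.txt').
patch = io.StringIO(
    "diff --git a/a.txt b/a.txt\n"
    "-old\n"
    "+new\n"
    "diff --git a/b.txt b/b.txt\n"
    "-1.7.2\n"
)
for section in iter_diff_sections(patch):
    print(repr(section))
```

Unlike the regex answer, any preamble before the first `diff` line comes out as its own first section instead of being dropped.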


You don't need a regex; just split the file contents:

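with open('diff.txt', 'r') as diff_file:
    diff_str = diff_file.read()
# str.split() drops the 'diff --git' separator, so add it back to each piece;
# the empty piece before the first separator is filtered out.
diff_split = ['diff --git%s' % x for x in diff_str.split('diff --git')
              if x.strip()]
print(diff_split)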
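A quick check of the split-and-rejoin approach on in-memory strings, including one limitation: `str.split()` matches `diff --git` anywhere, not only at the start of a line. The `split_diff` helper and both sample strings are hypothetical, written only to illustrate this:

```python
def split_diff(diff_str, marker='diff --git'):
    """Split-and-rejoin as in the answer: str.split() removes the marker,
    so it is prepended again; the empty piece before the first marker is
    filtered out by the `if x.strip()` test."""
    return [marker + x for x in diff_str.split(marker) if x.strip()]


normal = (
    "diff --git a/a.txt b/a.txt\n"
    "-old\n"
    "+new\n"
    "diff --git a/v.txt b/v.txt\n"
    "-1.7.2\n"
)
print(len(split_diff(normal)))  # 2 sections, as expected

# Hypothetical patch whose added line mentions the marker mid-line.
tricky = (
    "diff --git a/notes.txt b/notes.txt\n"
    "+Run `diff --git` to compare.\n"
    "diff --git a/v.txt b/v.txt\n"
    "-1.7.2\n"
)
print(len(split_diff(tricky)))  # 3: the added line was cut in two
```

If patch contents can mention the marker, splitting only where it begins a line, as the line-based and lookahead-based answers do, avoids the problem.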

Just split on any newline that is followed by the word diff:

result = re.split(r"\n(?=diff\b)", subject) 

To be safe, though, you should probably also match \r and \r\n:

result = re.split(r"(?:\r\n|[\r\n])(?=diff\b)", subject) 
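A small demonstration of the second pattern on a patch with Windows line endings. The `subject` string is made up for illustration, and the pattern is precompiled as `SPLIT_AT_DIFF` (a hypothetical name) so it can be reused:

```python
import re

# Split on a line break (\r\n, \r or \n) immediately followed by the word
# "diff". The lookahead (?=diff\b) is zero-width, so "diff" stays at the
# start of the next piece; only the line break itself is consumed.
SPLIT_AT_DIFF = re.compile(r"(?:\r\n|[\r\n])(?=diff\b)")

# Hypothetical patch saved with Windows (\r\n) line endings.
subject = (
    "diff --git a/a.txt b/a.txt\r\n"
    "-old\r\n"
    "+new\r\n"
    "diff --git a/v.txt b/v.txt\r\n"
    "-1.7.2\r\n"
)

result = SPLIT_AT_DIFF.split(subject)
for section in result:
    print(repr(section))
```

Every piece except the last loses the line break that preceded the next `diff`, because the split consumes it. The `\b` also keeps a line such as `diffstat` from being mistaken for a section header.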