2017-04-18 17 views

Python regex: split text on lines that contain only a number

I want to split a text on the numbers that stand alone on their own line.

1 
root -0.307087 17.6356 -28.2214 2.36076 1.44212 -4.54601 
lowerback 15.4094 -0.182495 1.65268 
upperback 1.54579 0.0318172 -0.110122 
thorax -6.9977 -0.0335751 -1.06068 
lowerneck -3.24163 -0.676991 -1.34632 
upperneck -9.28199 -0.818331 1.08102 
head -2.3551 -0.388697 0.578143 
rclavicle 1.74931e-014 -4.77083e-015 
rhumerus -42.2757 19.3184 -90.6312 
rradius 79.2191 
rwrist 2.46902 
rhand -35.8906 32.487 
rfingers 7.12502 
rthumb -9.00425 2.69918 
lclavicle 1.74931e-014 -4.77083e-015 
lhumerus -46.581 -10.5126 91.072 
lradius 108.082 
lwrist 30.7395 
lhand -39.5085 13.512 
lfingers 7.12502 
lthumb -12.4939 43.1185 
rfemur 4.30283 -1.72433 25.7796 
rtibia 82.7602 
rfoot 27.83 -8.73877 
rtoes 20.2614 
lfemur -27.49 -2.09007 -20.1015 
ltibia 38.398 
lfoot -7.19848 -5.78026 
ltoes 5.97973 
2 
root -0.303728 17.5624 -27.7253 2.02549 1.77071 -4.33872 
lowerback 16.0608 -0.380636 1.35189 
upperback 1.68665 -0.267024 -0.0539964 
thorax -7.21419 -0.169571 -0.765959 
lowerneck -2.88855 -0.493739 -1.55908 
upperneck -9.88628 -0.567977 1.15901 
head -2.623 -0.258251 0.642519 
rclavicle -7.65321e-015 -2.38542e-015 
rhumerus -42.619 18.2084 -90.2387 
rradius 76.8375 
rwrist 5.33346 
rhand -37.643 32.4997 
rfingers 7.12502 
rthumb -10.695 2.7919 
lclavicle -7.65321e-015 -2.38542e-015 
lhumerus -43.8177 -11.0502 91.3641 
lradius 108.431 
lwrist 30.2025 
lhand -38.9758 12.3082 
lfingers 7.12502 
lthumb -11.9803 41.9454 
rfemur 1.76685 -3.0026 24.5235 
rtibia 87.0878 
rfoot 27.0955 -9.32294 
rtoes 22.2194 
lfemur -26.5572 -2.78834 -20.4876 
ltibia 40.7855 
lfoot -10.1476 -3.85256 
ltoes 0.48001 
3 
root -0.294208 17.4728 -27.2384 1.62853 1.94279 -4.06517 
lowerback 16.9292 -0.51999 1.14183 
upperback 1.81465 -0.483798 -0.143209 
thorax -7.55951 -0.270454 -0.690263 
lowerneck -2.59928 -0.313935 -1.56078 
upperneck -10.5834 -0.320817 1.24057 
head -2.91503 -0.136576 0.671345 
rclavicle -1.54058e-014 -3.97569e-015 
rhumerus -42.9367 16.607 -89.7942 
rradius 74.9122 
rwrist 7.29535 
rhand -38.4744 33.0964 
rfingers 7.12502 
rthumb -11.4968 3.43167 
lclavicle -1.54058e-014 -3.97569e-015 
lhumerus -40.8446 -11.9999 91.445 
lradius 108.671 
lwrist 29.7854 
lhand -38.5919 11.658 
lfingers 7.12502 
lthumb -11.6101 41.3163 
rfemur -0.94671 -4.033 23.2605 
rtibia 91.2781 
rfoot 26.5333 -9.15277 
rtoes 23.1538 
lfemur -25.0499 -3.27418 -20.9658 
ltibia 42.1017 
lfoot -12.067 -2.99804 
ltoes -2.17676 

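Ideally, I would like to get the content between the standalone numbers, excluding the numbers themselves. I have already tried this pattern:

r"[0-9]+(?<=)[\r\n]"

where I wanted to find a number with nothing before it and a line break after it.

What is the correct pattern for this?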

What is the expected output? –

So you need all the lines between '1' and '2'? – Harvey

Yes. All lines between 1 and 2, between 2 and 3, and so on. – terminix00

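Answer

Your regex attempt cannot work for several reasons. For example, it also consumes the trailing digits of a decimal number at the end of a line, because it does not require the match to start right after a line break. In addition, the lookbehind (?<=) is pointless (it is empty), so you don't need it.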
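To make the failure concrete, here is a minimal sketch that runs the pattern from the question against a short made-up sample (the sample string is an assumption, written without trailing spaces). It shows that the pattern matches the tail of a decimal number as well as the standalone number:

```python
import re

# Pattern from the question: digits followed by a line break.
pattern = r"[0-9]+(?<=)[\r\n]"

# Hypothetical sample: two data lines around one number-only line.
sample = "rwrist 7.29535\n3\nrhand -38.4744 33.0964\n"

# Every run of digits that ends a line matches, not only the lone "3".
matches = re.findall(pattern, sample)
print(matches)  # ['29535\n', '3\n', '0964\n']
```

Nothing in the pattern ties the digits to the start of a line, which is why the fractional parts are picked up too.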

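I would split with a regex that matches the number enclosed between two line breaks (with an optional carriage return before each line feed, just in case).

Test: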

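import re

# Sample with number-only lines ("3" and "5") between the data blocks.
text = """rfoot 27.0955 -9.32294
lfoot -10.1476 -3.85256
ltoes 0.48001
3
root -0.294208 17.4728 -27.2384 1.62853 1.94279 -4.06517
rwrist 7.29535
5
rhand -38.4744 33.0964
lradius 108.671
lwrist 29.7854"""

# Split on a number that sits alone between two line breaks (\r is optional).
print(re.split(r"\r?\n\d+\r?\n", text))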

result: ['rfoot 27.0955 -9.32294\nlfoot -10.1476 -3.85256\nltoes 0.48001', 'root -0.294208 17.4728 -27.2384 1.62853 1.94279 -4.06517\nrwrist 7.29535', 'rhand -38.4744 33.0964\nlradius 108.671\nlwrist 29.7854'] 

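Note that this simple approach does not handle the case where a number-only line is the very first or very last line of the text. We have to complicate the pattern slightly by adding the ^ and $ alternatives. Because those alternatives sit in capturing groups, re.split also returns what the groups matched, so we are left with lone line breaks and empty fields in the result. We can therefore apply a list comprehension afterwards to filter out the "blank" fields (this could perhaps be done with a pure regex, though):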

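import re

# Same sample, now with number-only lines at the very start and the very end.
text = """1
rfoot 27.0955 -9.32294
lfoot -10.1476 -3.85256
ltoes 0.48001
3
root -0.294208 17.4728 -27.2384 1.62853 1.94279 -4.06517
rwrist 7.29535
5
rhand -38.4744 33.0964
lradius 108.671
lwrist 29.7854
4"""

# The captured line breaks and empty strings are dropped by the x.strip() filter.
print([x for x in re.split(r"(^|\r?\n)\d+(\r?\n|$)", text) if x.strip()])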

Result:

['rfoot 27.0955 -9.32294\nlfoot -10.1476 -3.85256\nltoes 0.48001', 'root -0.294208 17.4728 -27.2384 1.62853 1.94279 -4.06517\nrwrist 7.29535', 'rhand -38.4744 33.0964\nlradius 108.671\nlwrist 29.7854'] 
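As one possible variant (not the pattern used above), the sketch below anchors the number to a whole line with re.MULTILINE and tolerates spaces or tabs around it, which may matter if the number lines carry trailing spaces. The helper name split_on_number_lines and the sample string are assumptions made for illustration:

```python
import re

# A line holding only an integer, optionally padded with spaces or tabs.
# With re.MULTILINE, ^ matches at the start of every line.
NUMBER_LINE = re.compile(r"^[ \t]*\d+[ \t]*(?:\r?\n|\Z)", re.MULTILINE)

def split_on_number_lines(text):
    """Return the blocks of text found between number-only lines."""
    blocks = NUMBER_LINE.split(text)
    # Drop empty pieces and the line break left at the end of each block.
    return [block.strip() for block in blocks if block.strip()]

# Hypothetical sample: number lines at the start, middle, and end,
# the first one followed by a trailing space.
sample = "1 \nrfoot 27.0955 -9.32294\nltoes 0.48001\n3\nroot -0.294208 17.4728\n4"
print(split_on_number_lines(sample))
# ['rfoot 27.0955 -9.32294\nltoes 0.48001', 'root -0.294208 17.4728']
```

The pattern has no capturing groups, so re.split returns only the text between the number lines. The filter is still needed for the empty strings produced when a number line opens or closes the text.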

Thank you very much! – terminix00

It does not handle the case where the number is at the start or the end. I will edit. –