Extracting decimal numbers from a Python string with Python regular expressions

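I tried this using Python's re library. From a file, I get several lines containing elements separated by bars ('|'). I put them in a list, and what I need is to extract the numbers inside so that I can operate on them.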

This is one of the strings I want to split:

>>print(line_input) 
>>[240, 7821, 0, 12, 605, 0, 3]|[1.5, 7881.25, 0, 543, 876, 0, 121]|[237, 761, 0, 61, 7, 605, 605]

My intention is to form a vector from the elements between each pair of square brackets.

I created this regular expression:

>>test_pattern="\|\[(\d*(\.\d+)?), (\d*(\.\d+)?), (\d*(\.\d+)?)]"

But the result is somewhat messy. Specifically, the result is:

>>vectors = re.findall(test_pattern, line_input) 

>>print(vectors) 
>>[('240', '', '7821', '', '0', '', '12', '', '605', '', '0', '', '3', ''), ('1.5', '.5', '7881.25', '.25', '0', '', '0', '', '0', '', '0', '', '0', ''), ('23437', '', '76611', '', '0', '', '0', '', '0', '', '605', '', '605', '')]

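I don't understand where the blanks (empty strings) come from, or why the decimal part is repeated. I know I'm almost there; I'm sure it's just a simple detail, but I can't see it.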

Thanks very much in advance.

2017-09-13 VictorHMartin

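The blanks probably come from your number regex: '(\d*(\.\d+)?)' matches an empty string. ('\d*' matches zero or more digits, and '(\.\d+)?' *optionally* matches a '.' followed by one or more digits, so both parts can match ''.) – 0x5453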
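To see both effects described above in isolation, here is a small stdlib-only check. The sample string "[1.5, 2]" is made up for illustration. It shows that the number pattern can match an empty string, and that with several capturing groups re.findall returns one tuple per match containing every group, with '' for any group that did not take part.

```python
import re

# Both pieces of the asker's number pattern are optional, so it can match "".
number_with_groups = r"(\d*(\.\d+)?)"
print(re.fullmatch(number_with_groups, "") is not None)  # True

# With more than one capturing group, re.findall returns a tuple of all groups
# per match; the inner decimal group yields '.5' for 1.5 and '' for 2.
two_numbers = r"\[(\d*(\.\d+)?), (\d*(\.\d+)?)\]"
print(re.findall(two_numbers, "[1.5, 2]"))  # [('1.5', '.5', '2', '')]
```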

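Yes, as @0x5453 said, those blanks are the empty optional decimal parts. Your vectors variable contains every matched group, empty or not. So when there is a decimal, you get one match for the outer group (\d*(\.\d+)?) and another for the inner group (\.\d+)?. Make them non-capturing groups.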

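Those blanks are the empty optional decimal parts. Your vectors variable contains every capturing group, empty or not. So when there is a decimal point, you get one match for the outer group (\d*(\.\d+)?) and one for the inner group (\.\d+)?. Make the inner group non-capturing: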

(\d+(?:\.\d+)?)

Note: I also changed it to require at least one digit before the decimal point (if there is one).
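A minimal sketch of this pattern applied to the question's line, assuming the goal is one list of floats per bracketed group. Splitting on '|' first is an assumption of this sketch for keeping the vectors separate; it is not part of the answer itself. Because the pattern now has exactly one capturing group, re.findall returns plain strings rather than tuples.

```python
import re

line_input = "[240, 7821, 0, 12, 605, 0, 3]|[1.5, 7881.25, 0, 543, 876, 0, 121]|[237, 761, 0, 61, 7, 605, 605]"

# One capturing group for the whole number; the fractional part is non-capturing,
# so findall returns one string per number instead of a tuple of groups.
NUMBER = re.compile(r"(\d+(?:\.\d+)?)")

# Split on '|' so each bracketed list becomes its own vector of floats.
vectors = [[float(n) for n in NUMBER.findall(chunk)] for chunk in line_input.split("|")]
print(vectors)
```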

2017-09-13 20:02:21

You can try this:

import re
s = "[240, 7821, 0, 12, 605, 0, 3]|[1.5, 7881.25, 0, 543, 876, 0, 121]|[237, 761, 0, 61, 7, 605, 605]"
# raw string so the backslashes reach the regex engine unchanged
data = re.findall(r"\d+\.*\d+", s)
print(data)

Output:

['240', '7821', '12', '605', '1.5', '7881.25', '543', '876', '121', '237', '761', '61', '605', '605']

2017-09-13 19:56:16 Ajax1234

'\d+\.*\d+' only matches numbers with at least two digits. You probably want '\d+(\.\d+)?' – 0x5453

Another way (possibly not robust if the input format varies) is to split the string on ']|[' to get the lists, and then split on ', ' to get the values:

from decimal import Decimal
input_str = '[240, 7821, 0, 12, 605, 0, 3]|[1.5, 7881.25, 0, 543, 876, 0, 121]|[237, 761, 0, 61, 7, 605, 605]'

# ignore the first '[' and last ']' chars, then split on the ']|[' list separators
list_strs = input_str[1:-1].split(']|[')

# Split on ', ' to get individual decimal values
decimal_lists = [[Decimal(i) for i in s.split(', ')] for s in list_strs]

# decimal_lists contains a list of lists of Decimal values, like the input format

for l in decimal_lists:
    print(', '.join(str(d) for d in l))

Result:

240, 7821, 0, 12, 605, 0, 3 
1.5, 7881.25, 0, 543, 876, 0, 121 
237, 761, 0, 61, 7, 605, 605
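If the spacing around the commas or bars might vary, a slightly more tolerant version of the same split-based idea is sketched below. parse_vectors is a hypothetical helper name, and the sample line with irregular spacing is invented for illustration; it still assumes every element is a plain decimal number.

```python
from decimal import Decimal


def parse_vectors(line):
    """Hypothetical helper: split on '|', drop the brackets, split on ','."""
    vectors = []
    for chunk in line.split("|"):
        # Remove surrounding whitespace first, then the '[' and ']' characters.
        inner = chunk.strip().strip("[]")
        # Strip each token so ' 0' and '1.5 ' both parse; skip empty tokens.
        vectors.append([Decimal(tok.strip()) for tok in inner.split(",") if tok.strip()])
    return vectors


print(parse_vectors("[240,7821, 0]| [1.5 , 7881.25]"))
```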

2017-09-13 20:27:10 Darthfett

Regular expressions have their place. However, a grammar written with pyparsing is often easier to write, and easier to read.

>>> import pyparsing as pp

The numbers are like words made of digits and the period (full stop) character. They can be followed by a comma, which we can simply suppress.

>>> number = pp.Word(pp.nums+'.') + pp.Optional(',').suppress()

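Each list consists of a left bracket, which we suppress, followed by one or more numbers (as just defined), followed by a right bracket, which we also suppress, followed by an optional bar character, again suppressed. (Incidentally, the bar is somewhat redundant, since the right bracket already closes the list.)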

We apply Group to the whole construct so that pyparsing collects the unsuppressed items of each list into a separate Python list.

>>> one_list = pp.Group(pp.Suppress('[') + pp.OneOrMore(number) + pp.Suppress(']') + pp.Suppress(pp.Optional('|')))

The whole line is just one or more of these lists.

>>> whole = pp.OneOrMore(one_list)

Here's the input,

>>> line_input = '[240, 7821, 0, 12, 605, 0, 3]|[1.5, 7881.25, 0, 543, 876, 0, 121]|[237, 761, 0, 61, 7, 605, 605]'

...which we parse into the result r.

>>> r = whole.parseString(line_input)

We can display the resulting lists.

>>> r[0] 
(['240', '7821', '0', '12', '605', '0', '3'], {}) 
>>> r[1] 
(['1.5', '7881.25', '0', '543', '876', '0', '121'], {}) 
>>> r[2] 
(['237', '761', '0', '61', '7', '605', '605'], {})

More likely, we want the numbers as actual numbers. In this case, we know the strings in the lists represent either floats or integers.

>>> for l in r.asList(): 
...  [int(_) if _.isnumeric() else float(_) for _ in l] 
... 
[240, 7821, 0, 12, 605, 0, 3] 
[1.5, 7881.25, 0, 543, 876, 0, 121] 
[237, 761, 0, 61, 7, 605, 605]
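The isnumeric() check works for this input, but a string such as '-3' fails it (the '-' is not a numeric character) and would become -3.0. A sketch that tries int() first and falls back to float() avoids relying on isnumeric(); to_number is a hypothetical helper name, and the rows are sample strings taken from the output above plus an invented '-3'.

```python
def to_number(text):
    """Hypothetical helper: return an int for whole numbers, else a float."""
    try:
        return int(text)
    except ValueError:
        # int() rejects strings like '1.5' and '7881.25', so fall back to float()
        return float(text)


rows = [['240', '7821', '0', '12'], ['1.5', '7881.25', '121'], ['-3']]
for row in rows:
    print([to_number(t) for t in row])
```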

2017-09-13 21:04:25
