0
我想從正則表達式產生的結果中創建一個熊貓數據框中的新列。pandas函數中的正則表達式
我期待的結果是:
In[1]: df
Out[1]:
valueProduct valueService totValue
0 $465580.99 $322532.34 $788113.33
我的數據框dtypes是:
df.dtypes
Contracting Office Name object
Contracting Office Region object
PIID object
PIID Agency ID object
Major Program object
Description of Requirement object
Referenced IDV PIID object
Completion Date datetime64[ns]
Prepared By object
Funding Office Name object
Funding Agency ID object
Funding Agency Name object
Funding Office ID object
Effective Date datetime64[ns]
Fiscal Year int64
Ultimate Contract Value float64
Count int64
1行中題爲「要求的說明」一欄有如下的長字符串值(在這一列中的相似字符串值通過數據集):
管理員添加額外的體積和道路工作變化銀滑道監護項目 - ALLEGHENY國家產品的森林VALUE =服務$ 465580.99 VALUE =合同的$ 322532.34總額= $ 788113.33
我想成功地寫一個正則表達式從這個字符串中提取3項,但僅產生新列的美元價值:
VALUE OF PRODUCT = $465580.99
VALUE OF SERVICE = $322532.34
TOTAL VALUE OF CONTRACT = $788113.33
下面的代碼做這個假設在數據幀的字符串進行一個簡單的字符串值數據框之外:
text = "STEWARDSHIP ADD ADDITIONAL VOLUME AND ROAD WORK CHANGES SILVER SLIDE STEWARDSHIP PROJECT - ALLEGHENY NATIONAL FOREST VALUE OF PRODUCT = $465580.99 VALUE OF SERVICE = $322532.34 TOTAL VALUE OF CONTRACT = $788113.33"
pattern = re.compile('(VALUE OF PRODUCT).{1,3}\$\d*\.\d*', re.IGNORECASE)
getPattern = re.search(pattern, text)
print (getPattern.group())
將產生:
VALUE OF PRODUCT = $465580.99
我可以爲其他兩個項目重複此操作。
現在,感覺我在一個數據幀的工作我試圖做類似如下:
def valProduct(row):
pattern = re.compile('(VALUE OF PRODUCT).{1,3}\$\d*\.\d*', re.IGNORECASE)
findPattern = re.search(pattern, row['Description of Requirement'])
return findPatter
df['valueProduct'] = df.apply(lambda row: valProduct(row), axis=1)
In[2]: sf[['valueProduct']][:1]
Out[2]: None
這將產生一個新的列,但其空,但應該至少是表明:
VALUE OF PRODUCT = $465580.99
任何幫助,非常感謝!