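String split works on a single string, but not on the strings in a pandas Series

I am very new to Python and pandas and have a problem. I have a Series of 45398 strings that need editing. I imported them from an Excel file.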

import pandas as pd 
import numpy as np 
import xlrd 

file_location = "#mypath/leistungen_2017.xlsx" 
workbook = xlrd.open_workbook(file_location) 
sheet = workbook.sheet_by_index(0)

df = pd.read_excel("leistungen_2017.xlsx")

Here are the first few rows, just as an example.

>>> df
   Leistungserbringer  Anzahl                                           Leistung  Code  Rechnungsnummer
0              Albert       1  15.0160 Vollständige Spirometrie und Resistanc...     1             8957
1              Albert       1                 15.0200 CO-Diffusion, jede Methode     1             8957
2              Albert       1  15.0285 Messung ausgeatmetes Stickstoffmonoxid...     1             8957
3              Albert       1          AMC-30864 Spirometriefilter mit Mundstück     1             8957
4              Albert       1  5889797 RELVAR ELLIPTA Inh Plv 92mcg/22mcg 30 Dos     1             8957
5              Albert       1  00.0010 Konsultation, erste 5 Min. (Grundkonsu...     1             8957

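In the fourth column there is a code made of digits in front of the text, and I want to remove it across the whole Series.

I tested this on a single string and it works fine: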

>>> str("15.0200 CO-Diffusion, jede Methode".split(' ', 1)[1:]).strip('[]')
"'CO-Diffusion, jede Methode'"

I want to apply this to the whole Series:

for entry in df.Leistung: 
    df.Leistung.replace({entry : str(entry.split(' ', 1)[1:]).strip('[]')}, inplace=True)

The result for df.Leistung should look like this:

0  Vollständige Spirometrie und Resistance (Plet... 
1        CO-Diffusion, jede Methode 
2   Messung ausgeatmetes Stickstoffmonoxid ({eNO}) 
3      Spirometriefilter mit Mundstück 
4    RELVAR ELLIPTA Inh Plv 92mcg/22mcg 30 Dos 
5   Konsultation, erste 5 Min. (Grundkonsultation)

Instead, one of the rows comes out like this:

45384 'Dos\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\'"\\\\\\\\\...

I need to replace the old Series with the new one in the same column. I hope this is understandable, and thanks in advance for any help.

2017-07-12 Jari Klingler

Thanks, @stephenmuss.

You don't need to loop in pandas; it is all vectorized. The replace function you are after lives in the .str namespace, so all you need to do is:

df.Leistung.str.replace(r'\d+', '', regex=True)
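
Note that the pattern \d+ removes every run of digits, including ones later in the text such as the "5" in "5 Min.". If only the leading code should go, the single-string split from the question can also be applied to the whole Series through the .str accessor. This is a minimal sketch, assuming every entry contains at least one space; the small DataFrame is built by hand from the sample rows in the question and stands in for the real Excel data:

```python
import pandas as pd

# Sample rows copied from the question; the real data comes from the Excel file.
df = pd.DataFrame({
    "Leistung": [
        "15.0200 CO-Diffusion, jede Methode",
        "AMC-30864 Spirometriefilter mit Mundstück",
        "00.0010 Konsultation, erste 5 Min. (Grundkonsultation)",
    ]
})

# Split each string once at the first space and keep the second piece.
# Entries that contain no space would end up as NaN.
df["Leistung"] = df["Leistung"].str.split(" ", n=1).str[1]

print(df["Leistung"].tolist())
```

Assigning the result back to df["Leistung"] updates the column in place, which is what the question asks for.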

2017-07-12 12:41:45 Meitham

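Thanks for the tip, it works perfectly! There is still a "." at the beginning of each entry, but that can be removed as well. I would upvote your answer, but my score is too low.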
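
The leftover "." comes from codes such as "15.0200": \d+ matches the digits but not the dot between them. One possible way to handle both at once is a pattern anchored at the start of the string. This is a sketch under the assumption that the code consists only of digits and dots; an entry such as "AMC-30864 ..." would not match and would stay unchanged. The Series name s is just for illustration:

```python
import pandas as pd

# Two sample values taken from the question.
s = pd.Series([
    "15.0200 CO-Diffusion, jede Methode",
    "5889797 RELVAR ELLIPTA Inh Plv 92mcg/22mcg 30 Dos",
])

# ^ anchors the match at the start, so digits later in the text are kept.
# [\d.]+ matches the leading run of digits and dots, \s+ the space after it.
cleaned = s.str.replace(r"^[\d.]+\s+", "", regex=True)

print(cleaned.tolist())
```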
