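I want to replace long strings in my DataFrame with shorter ones. I have a small dictionary of the replacements I'd like to make: each key is a substring to look for, and each value is the short string that should replace the whole cell. How can I replace the various long strings in my DataFrame with the shorter strings?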
import pandas as pd
from StringIO import StringIO
replacement_dict = {
    "substring1": "substring1",
    "substring2": "substring2",
    "a short substring": "substring3",
}
exampledata = StringIO("""id;Long String
1;This is a long substring1 of text that has lots of words
2;This is substring2 and also contains more text than needed
3;This is a long substring1 of text that has lots of words
4;This is substring2 and also contains more text than needed
5;This is substring2 and also contains more text than needed
6;This is substring2 and also contains more text than needed
7;Within this string is a short substring that is unique
8;This is a long substring1 of text that has lots of words
9;Within this string is a short substring that is unique
10;Within this string is a short substring that is unique
""")
df = pd.read_csv(exampledata, sep=";")
print df
for s in replacement_dict.keys():
    if df['Long String'].str.contains(s):
        df['Long String'] = replacement_dict[df['Long String'].str.contains(s)]
The expected DataFrame looks like this:
   id  Long String
0   1   substring1
1   2   substring2
2   3   substring1
3   4   substring2
4   5   substring2
5   6   substring2
6   7   substring3
7   8   substring1
8   9   substring3
9  10   substring3
When I run the code above, I get this error:
Traceback (most recent call last):
  File "test.py", line 27, in <module>
    if df['Long String'].str.contains(s):
  File "h:\Anaconda\lib\site-packages\pandas\core\generic.py", line 731, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
How can I replace each of the long strings in my DataFrame with its shorter string?
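
For reference, the error arises because `str.contains` returns a boolean Series with one flag per row, and `if` cannot reduce that to a single True/False. One possible approach, sketched here under the assumption of Python 3 and a reasonably recent pandas (so `io.StringIO` replaces the `StringIO` module, and the sample data is shortened to three rows), is to use that Series as a row mask with `.loc`. The names `needle`, `short`, and `mask` are illustrative only.

```python
import io

import pandas as pd

replacement_dict = {
    "substring1": "substring1",
    "substring2": "substring2",
    "a short substring": "substring3",
}

# Shortened copy of the sample data from the question.
exampledata = io.StringIO("""id;Long String
1;This is a long substring1 of text that has lots of words
2;This is substring2 and also contains more text than needed
7;Within this string is a short substring that is unique
""")
df = pd.read_csv(exampledata, sep=";")

for needle, short in replacement_dict.items():
    # str.contains yields one boolean per row; use it to select rows
    # instead of testing it in an `if` statement.
    # regex=False treats the key as plain text, not a regular expression.
    mask = df["Long String"].str.contains(needle, regex=False)
    # Overwrite the whole cell for every matching row.
    df.loc[mask, "Long String"] = short

print(df)
```

Because the replacements are applied one key at a time, a short value that itself contains a later key would be replaced again; that does not happen with the dictionary in the question, but it is worth checking for other data.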