2017-06-12 71 views
1
d_hsp={"1":"I","2":"II","3":"III","4":"IV","5":"V","6":"VI","7":"VII","8":"VIII", 
     "9":"IX","10":"X","11":"XI","12":"XII","13":"XIII","14":"XIV","15":"XV", 
     "16":"XVI","17":"XVII","18":"XVIII","19":"XIX","20":"XX","21":"XXI", 
     "22":"XXII","23":"XXIII","24":"XXIV","25":"XXV"} 
HSP_OLD['tryl'] = HSP_OLD['tryl'].replace(d_hsp, regex=True) 

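HSP_OLD is a DataFrame and tryl is one of its columns. Here are some example values in tryl; the goal is to convert the decimal numbers to Roman numerals: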

SAF/HSP: Secondary diagnosis E code 1

SAF/HSP: Secondary diagnosis E code 11

I am using a dictionary to do the replacement. It works for 1-10, but 11 becomes "II" and 12 becomes "III".
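The symptom can be reproduced without pandas. This is only a sketch, and it assumes the replacements behave like plain substring substitutions applied one after another in dictionary order; `naive_replace` is a hypothetical helper, not part of pandas.

```python
import re

# Small subset of the mapping above, for illustration only.
d_hsp = {"1": "I", "2": "II", "11": "XI", "12": "XII"}

def naive_replace(text, mapping):
    # Apply each pattern in insertion order. The short keys come first,
    # so they consume the digits before "11" or "12" can ever match.
    for pattern, roman in mapping.items():
        text = re.sub(pattern, roman, text)
    return text

print(naive_replace("SAF/HSP: Secondary diagnosis E code 11", d_hsp))
print(naive_replace("SAF/HSP: Secondary diagnosis E code 12", d_hsp))
```

Under that assumption, "11" is rewritten digit by digit into "II", and "12" into "I" followed by "II".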

+2

The title seems to be the opposite of the code. – stark

Answers

2

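You need to preserve the order of the items and start the search with the longest substrings, so that "11" is tried before "1".

You can use an OrderedDict here. To initialize it, use a list of tuples. You could already reverse the list at initialization, but you can also do that afterwards, as shown here.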

import collections 
import pandas as pd 
# My test data  
HSP_OLD = pd.DataFrame({'tryl':['1. Text', '11. New Text', '25. More here']}) 

d_hsp_lst=[("1","I"),("2","II"),("3","III"),("4","IV"),("5","V"),("6","VI"),("7","VII"),("8","VIII"), ("9","IX"),("10","X"),("11","XI"),("12","XII"),("13","XIII"),("14","XIV"),("15","XV"), ("16","XVI"),("17","XVII"),("18","XVIII"),("19","XIX"),("20","XX"),("21","XXI"), ("22","XXII"),("23","XXIII"),("24","XXIV"),("25","XXV")] 
d_hsp = collections.OrderedDict(d_hsp_lst) # Creating the OrderedDict 
d_hsp = collections.OrderedDict(reversed(d_hsp.items())) # Here, reversing 

>>> HSP_OLD['tryl'] = HSP_OLD['tryl'].replace(d_hsp, regex=True) 
>>> HSP_OLD 
             tryl
0         I. Text
1    XI. New Text
2  XXV. More here
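The "longest substring first" idea can also be sketched with the standard `re` module alone. This is an illustrative alternative, not the approach used above: it builds one alternation pattern from the keys, sorted longest first, and looks each match up once. The shortened mapping and the name `replace_numbers` are assumptions made for the example.

```python
import re

# Hypothetical subset of the mapping; the full 1-25 table works the same way.
d_hsp = {"1": "I", "2": "II", "5": "V", "11": "XI", "12": "XII", "25": "XXV"}

# Sort keys longest first: in an alternation the first alternative that
# matches wins, so "11" must appear before "1".
keys = sorted(d_hsp, key=len, reverse=True)
pattern = re.compile("|".join(re.escape(k) for k in keys))

def replace_numbers(text):
    # Every match is replaced in a single pass, so text that has already
    # been replaced is never scanned again.
    return pattern.sub(lambda m: d_hsp[m.group(0)], text)

print(replace_numbers("11. New Text"))
```

A function like this could be applied to the column with `HSP_OLD['tryl'].apply(replace_numbers)`, provided every value in the column is a string.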
2

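Sorry, I did not notice at first that you are not just updating the field but actually want to replace the number at the end. Even so, it is much better to convert the number properly into Roman numerals than to map every value that might occur (what happens to your code if the number is greater than 25?). Here is one way to do it: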

ROMAN_MAP = [(1000, 'M'), (900, 'CM'), (500, 'D'), (400, 'CD'), (100, 'C'), (90, 'XC'), 
      (50, 'L'), (40, 'XL'), (10, 'X'), (9, 'IX'), (5, 'V'), (4, 'IV'), (1, 'I')] 

def romanize(data):
    if not data or not isinstance(data, str):  # we only know how to work with strings
        return data
    data = data.rstrip()  # remove potential extra whitespace at the end
    space_pos = data.rfind(" ")  # find the last space before the number
    if space_pos != -1:
        try:
            number = int(data[space_pos + 1:])  # get the number at the end
            if number <= 0:  # zero and negatives have no Roman numeral, leave them alone
                return data
            roman_number = ""
            for i, r in ROMAN_MAP:  # greedy substitution, largest value first
                while number >= i:
                    roman_number += r
                    number -= i
            return data[:space_pos + 1] + roman_number  # put everything back together
        except (TypeError, ValueError):
            pass  # couldn't extract a number
    return data

So, if we create our own DataFrame:

HSP_OLD = pd.DataFrame({"tryl": ["SAF/HSP: Secondary diagnosis E code 1",
                                 None,
                                 "SAF/HSP: Secondary diagnosis E code 11",
                                 "Something else without a number at the end"]})

we can now easily apply our function to the whole column:

HSP_OLD['tryl'] = HSP_OLD['tryl'].apply(romanize)

which results in:

                                         tryl
0       SAF/HSP: Secondary diagnosis E code I
1                                        None
2      SAF/HSP: Secondary diagnosis E code XI
3  Something else without a number at the end

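Of course, you can adapt the romanize() function to your needs so that it searches for any number within the string and converts it to Roman numerals. This is just an example of how to quickly find a number at the end of a string.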
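One possible adaptation along those lines is sketched below. It uses `re.sub` with a callback to convert every standalone run of digits, wherever it appears in the string. The names `to_roman` and `romanize_all` are hypothetical, and treating only whole-word digit runs (`\b\d+\b`) as numbers is an assumption that may not suit every data set.

```python
import re

ROMAN_MAP = [(1000, 'M'), (900, 'CM'), (500, 'D'), (400, 'CD'), (100, 'C'), (90, 'XC'),
             (50, 'L'), (40, 'XL'), (10, 'X'), (9, 'IX'), (5, 'V'), (4, 'IV'), (1, 'I')]

def to_roman(number):
    """Convert a positive integer to a Roman numeral string."""
    result = ""
    for value, numeral in ROMAN_MAP:  # greedy, largest value first
        while number >= value:
            result += numeral
            number -= value
    return result

def romanize_all(text):
    """Replace every standalone run of digits in text with a Roman numeral."""
    def convert(match):
        number = int(match.group(0))
        # Zero has no Roman numeral, so leave it unchanged.
        return to_roman(number) if number > 0 else match.group(0)
    return re.sub(r"\b\d+\b", convert, text)

print(romanize_all("E code 11 and E code 4"))
```

Like romanize(), this only handles strings, so missing values such as None would need to be filtered out before calling it.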