2013-01-19 144 views
2

Changing the type of rows in a pandas DataFrame

I am working with a balance sheet that I have parsed into pandas using:

table = xls_file.parse('Consolidated_Balance_Sheet')
table.ix[:, 1]

0        None
1        None
2     $ 3,029
3        1989
5        None
6    $ 34,479

I am trying to identify the rows that hold unicode strings, strip the $ sign and the commas, and convert the result to a float.

for row in table.ix[:, 1]:
    if isinstance(row, unicode):
        print type(row), row
        num = float(row.lstrip('$').replace(',',''))
        print num
        row = num
        print type(row), row

This produces the following output:

<type 'unicode'> $ 3,029
3029.0
<type 'float'> 3029.0
<type 'unicode'> $ 34,479
34479.0
<type 'float'> 34479.0

However, the values are unchanged when I inspect the table:

table.ix[2, 1]
u'$ 3,029'

How can I correctly change the values to floats?

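EDIT: Thanks for both replies; I can reproduce those without any problem. However, when I use the apply function on my own data, I get an 'unhashable type' error.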

In [167]: thead = table.head()
In [168]: thead

Out[168]:
        Consolidated Balance Sheet (USD $)  Sep. 30, 2012  Dec. 31, 2011
0  In Millions, unless otherwise specified           None           None
1                           Current assets           None           None
2                Cash and cash equivalents        $ 3,029        $ 2,219
3          Marketable securities - current           1989           1461
4                Accounts receivable - net           4409           3867

In [170]: def no_comma_or_dollar(num):
              if isinstance(num, unicode):
                  return float(num.lstrip('$').replace(',',''))
              else:
                  return num

          thead[:, 1] = thead[:, 1].apply(no_comma_or_dollar)

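This produces the following:

I can't get my head around why, because I am not changing the keys, only the values. Is there another way to change the values in a DataFrame?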

EDIT2

In [171]: thead.to_dict()
Out[171]:
{u'Consolidated Balance Sheet (USD $)': {0: u'In Millions, unless otherwise specified',
  1: u'Current assets',
  2: u'Cash and cash equivalents',
  3: u'Marketable securities - current',
  4: u'Accounts receivable - net'},
 u'Dec. 31, 2011': {0: None, 1: None, 2: u'$ 2,219', 3: 1461.0, 4: 3867.0},
 u'Sep. 30, 2012': {0: None, 1: None, 2: u'$ 3,029', 3: 1989.0, 4: 4409.0}}
+0

Can you post `thead.to_dict()` so that we can (maybe) reproduce the problem? – unutbu

+0

`thead.ix[:, 1] = thead.ix[:, 1].apply(no_comma_or_dollar)` or `thead.ix[:, 1:] = thead.ix[:, 1:].applymap(no_comma_or_dollar)` (to convert both columns) should work. – DSM

+0

Thanks very much, I'm now going to put my head down and read up on applymap! – Ralph

Answers

3

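You are only printing the converted values, not `apply`-ing them to the DataFrame: assigning to the loop variable `row` just rebinds a local name and never writes anything back to `table`. Here is one way to do it:

Create a function that does the stripping if the value is unicode, or leaves it alone if it is already a number: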

def no_comma_or_dollar(num):
    if isinstance(num, unicode):
        return float(num.lstrip('$').replace(',',''))
    else:
        return num

table[col_name] = table[col_name].apply(no_comma_or_dollar)

For example:

df = pd.DataFrame([[u'$1,000'], [200.]]) 

In [3]: df[0].apply(no_comma_or_dollar) 
Out[3]: 
0 1000 
1  200 
Name: 0 
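On Python 3 the `unicode` type no longer exists, so the same idea can be sketched with `str` instead. The snippet below is only an illustration under stated assumptions: the helper name `to_number`, the column name `amount`, and the sample values are made up for this example and are not part of the original answer. It also shows why the loop in the question had no effect: rebinding the loop variable leaves the DataFrame untouched, whereas assigning the result of `apply` back replaces the column.

```python
import pandas as pd

def to_number(value):
    """Strip '$' and ',' from strings and convert to float; pass other values through."""
    if isinstance(value, str):
        return float(value.replace('$', '').replace(',', ''))
    return value

# Hypothetical sample data mirroring the mixed column in the question.
df = pd.DataFrame({'amount': ['$ 3,029', 1989.0, None, '$ 34,479']})

# Rebinding the loop variable does not write anything back into the DataFrame.
for cell in df['amount']:
    cell = to_number(cell)
unchanged = df['amount'].tolist()  # still contains the original strings

# apply returns a new Series; assigning it back replaces the column.
df['amount'] = df['amount'].apply(to_number)
```

Note that `float()` tolerates the leading space left behind after the `$` is removed, so `'$ 3,029'` converts cleanly.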

Update:

With the `thead` you gave, I would be tempted to `applymap` a slightly lazier version of `no_comma_or_dollar` over the whole frame:

def no_comma_or_dollar2(num):
    try:
        return float(num.lstrip('$').replace(',',''))
    except (AttributeError, ValueError):  # if you can't strip/replace/convert just leave it
        return num

In [5]: thead.applymap(no_comma_or_dollar2)
Out[5]:
        Consolidated Balance Sheet (USD $)  Dec. 31, 2011  Sep. 30, 2012
0  In Millions, unless otherwise specified            NaN            NaN
1                           Current assets            NaN            NaN
2                Cash and cash equivalents           2219           3029
3          Marketable securities - current           1461           1989
4                Accounts receivable - net           3867           4409
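
As far as I know, recent pandas releases deprecate `applymap` in favour of `DataFrame.map`, so a version-independent way to get the same element-wise behaviour is to call `Series.map` on each column. The following is a hedged sketch, not the answer's original code: the function name `lenient_float` is an assumption, and the frame is a shortened copy of the `thead.to_dict()` output from the question.

```python
import pandas as pd

def lenient_float(value):
    """Return value as a float when it looks like a dollar amount, else unchanged."""
    try:
        return float(value.replace('$', '').replace(',', ''))
    except (AttributeError, ValueError):
        # Not a string, or a string that is not numeric: leave it alone.
        return value

# Shortened sample rebuilt from the thead.to_dict() output in the question.
thead = pd.DataFrame({
    'Consolidated Balance Sheet (USD $)': [
        'Current assets',
        'Cash and cash equivalents',
        'Marketable securities - current',
    ],
    'Sep. 30, 2012': [None, '$ 3,029', 1989.0],
    'Dec. 31, 2011': [None, '$ 2,219', 1461.0],
})

# Map the function over every element, one column at a time.
# The original frame is left untouched; a new frame is returned.
cleaned = thead.apply(lambda column: column.map(lenient_float))
```

Catching only `AttributeError` and `ValueError` keeps the lazy behaviour for labels, numbers, and `None` while still letting unrelated errors surface.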
3

If I understand you correctly, you are looking for the apply method:

In [33]: import pandas as pd 

In [34]: table = pd.Series([None, u'$ 3,12', u'$ 4,5']) 

In [35]: table 
Out[35]: 
0  None 
1 $ 3,12 
2  $ 4,5 

In [36]: def f(cell):
   ....:     if isinstance(cell, unicode):
   ....:         return float(cell.lstrip('$').replace(',',''))
   ....:     else:
   ....:         return cell
   ....:

In [37]: table.apply(f) 
Out[37]: 
0 NaN 
1 312 
2  45 

This creates a new object rather than modifying the old one. To store the new object in place of the old one, do this:

In [42]: table = table.apply(f) 

In [43]: table 
Out[43]: 
0 NaN 
1 312 
2  45
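
The `.ix` indexer used in the question has since been removed from pandas, so on a current installation the positional selection would have to be written with `.iloc`. The sketch below shows one way to convert a column chosen by position and store the result back; the frame, its column names, and the Python 3 `str` check are assumptions made for the example.

```python
import pandas as pd

def f(cell):
    """Convert dollar-formatted strings to float; leave other cells as they are."""
    if isinstance(cell, str):
        return float(cell.replace('$', '').replace(',', ''))
    return cell

# Hypothetical two-column frame standing in for the parsed balance sheet.
table = pd.DataFrame({
    'label': ['Cash', 'Securities', 'Receivables'],
    'value': ['$ 3,029', 1989.0, '$ 34,479'],
})

# Select the second column by position, convert it,
# and store the new Series under the same column label.
converted = table.iloc[:, 1].apply(f)
table[table.columns[1]] = converted
```

Assigning through the column label replaces the whole column, so the stored values are the converted floats rather than the original strings.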