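The difference between double bracket `[[...]]` and single bracket `[..]` indexing in pandas

I'm confused about the syntax of the following line of code: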

x_values = dataframe[['Brains']]

The dataframe object consists of 2 columns (Brains and Bodies):

Brains Bodies 
42  34 
32  23

When I print x_values I get something like this:

Brains 
0 42 
1 32

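I'm familiar with the pandas documentation as far as the attributes and methods of the dataframe object go, but the double bracket syntax is confusing me.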


2017-07-19 Mike Fellner

Consider this:

Source DF:

In [79]: df 
Out[79]: 
    Brains Bodies 
0  42  34 
1  32  23

Selecting a single column, which results in a pandas.Series:

In [80]: df['Brains'] 
Out[80]: 
0 42 
1 32 
Name: Brains, dtype: int64 

In [81]: type(df['Brains']) 
Out[81]: pandas.core.series.Series

Selecting a subset of the DataFrame, which results in a DataFrame:

In [82]: df[['Brains']] 
Out[82]: 
    Brains 
0  42 
1  32 

In [83]: type(df[['Brains']]) 
Out[83]: pandas.core.frame.DataFrame

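Conclusion: the second form, which takes a list, lets us select multiple columns from the DataFrame and always returns a DataFrame. The first form selects exactly one column and returns it as a Series.

Demo: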

In [84]: df = pd.DataFrame(np.random.rand(5,6), columns=list('abcdef')) 

In [85]: df 
Out[85]: 
      a   b   c   d   e   f 
0 0.065196 0.257422 0.273534 0.831993 0.487693 0.660252 
1 0.641677 0.462979 0.207757 0.597599 0.117029 0.429324 
2 0.345314 0.053551 0.634602 0.143417 0.946373 0.770590 
3 0.860276 0.223166 0.001615 0.212880 0.907163 0.437295 
4 0.670969 0.218909 0.382810 0.275696 0.012626 0.347549 

In [86]: df[['e','a','c']] 
Out[86]: 
      e   a   c 
0 0.487693 0.065196 0.273534 
1 0.117029 0.641677 0.207757 
2 0.946373 0.345314 0.634602 
3 0.907163 0.860276 0.001615 
4 0.012626 0.670969 0.382810

And if we specify a list containing only one column, we get a DataFrame with one column:

In [87]: df[['e']] 
Out[87]: 
      e 
0 0.487693 
1 0.117029 
2 0.946373 
3 0.907163 
4 0.012626
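The practical difference between the two results shows up in their dimensionality. The following is a minimal sketch that rebuilds the two-column frame from the question (the variable names `as_series` and `as_frame` are only illustrative) and compares the two forms:

```python
import pandas as pd

# Rebuild the small frame from the question.
df = pd.DataFrame({"Brains": [42, 32], "Bodies": [34, 23]})

as_series = df["Brains"]    # string key: one-dimensional Series
as_frame = df[["Brains"]]   # list key: two-dimensional DataFrame with one column

# Print the type name, number of dimensions, and shape of each result.
print(type(as_series).__name__, as_series.ndim, as_series.shape)
print(type(as_frame).__name__, as_frame.ndim, as_frame.shape)
```

The Series has shape `(2,)` while the one-column DataFrame has shape `(2, 1)`, which matters when the next step expects a two-dimensional input.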


2017-07-19 21:16:18 MaxU

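Just to head off any possible confusion: the first form is equivalent to `column = 'Brains'; df[column]`, and the second is equivalent to `subset = ['Brains']; df[subset]`. The first passes a string, the second passes a list. It is not that using `[[` and `]]` performs a special form of indexing, but rather that the object being passed is of a different type. – SethMMorton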

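Thanks, that makes sense. Do you know whether the double bracket is Python syntax or something specific to the dataframe object? I tried to think back to Python's syntax for arrays and objects but couldn't find anything. –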

@SethMMorton, great example, thank you! – MaxU

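There is no special syntax in Python for [[ and ]]. Rather, a list is created, and then that list is passed as the argument to the DataFrame indexing function.

As per @MaxU's answer, if you pass a single string to a DataFrame, a Series representing that one column is returned. If you pass a list of strings, a DataFrame containing the given columns is returned.

So, when you do the following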

# Print "Brains" column as Series 
print(df['Brains']) 
# Return a DataFrame with only one column called "Brains" 
print(df[['Brains']])

it is equivalent to the following

# Print "Brains" column as Series 
column_to_get = 'Brains' 
print(df[column_to_get]) 
# Return a DataFrame with only one column called "Brains" 
subset_of_columns_to_get = ['Brains'] 
print(df[subset_of_columns_to_get])

In both cases, the DataFrame is being indexed with the [] operator.
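The equivalence can be checked directly rather than by eye. This is a small sketch, assuming the same two-column frame as in the question; the `same_*` names are illustrative only. It also calls `__getitem__` explicitly, which is the method Python invokes for the `[]` operator on an object:

```python
import pandas as pd

df = pd.DataFrame({"Brains": [42, 32], "Bodies": [34, 23]})

column_to_get = "Brains"
subset_of_columns_to_get = ["Brains"]

# A literal key and a variable holding the same key give identical results.
same_series = df["Brains"].equals(df[column_to_get])
same_frame = df[["Brains"]].equals(df[subset_of_columns_to_get])

# obj[key] is shorthand for obj.__getitem__(key), whatever type the key has.
same_call = df[["Brains"]].equals(df.__getitem__(["Brains"]))

print(same_series, same_frame, same_call)
```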

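Python uses [] both for indexing and for constructing list literals, and ultimately I believe this is the source of your confusion. The outer [ and ] in df[['Brains']] perform the indexing, and the inner ones create a list.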

>>> some_list = ['Brains'] 
>>> some_list_of_lists = [['Brains']] 
>>> ['Brains'] == [['Brains']][0] 
True 
>>> 'Brains' == [['Brains']][0][0] == [['Brains'][0]][0] 
True

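What I am illustrating above is that at no point does Python ever see [[ and interpret it specially. In the last convoluted example ([['Brains'][0]][0]) there is no special ][ operator or ]][ operator... what happens is

A single-element list is created (['Brains'])
The first element of that list is indexed (['Brains'][0] => 'Brains')
That element is placed into another list ([['Brains'][0]] => ['Brains'])
And then the first element of that list is indexed ([['Brains'][0]][0] => 'Brains')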
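One way to see that the brackets carry no special meaning is to index a toy object that simply reports the key it receives. The class `KeyInspector` below is hypothetical and has nothing to do with pandas; it is only a sketch of how any object with a `__getitem__` method sees the inner list as an ordinary argument:

```python
class KeyInspector:
    """Hypothetical stand-in for a DataFrame: reports the key it was indexed with."""

    def __getitem__(self, key):
        # Return the key's type name together with the key itself.
        return type(key).__name__, key


probe = KeyInspector()

print(probe["Brains"])          # the key arrives as a str
print(probe[["Brains"]])        # the key arrives as a list with one str
print(probe[["e", "a", "c"]])   # the key arrives as a list with three strs
```

pandas inspects the key's type in a similar spirit and decides whether to return a Series or a DataFrame.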


2017-07-19 21:42:20 SethMMorton
