2015-05-06 58 views
1

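Adding multiple rows to an existing DataFrame

Hi, I am learning data science and trying to build a list of big data companies from a list of companies across many industries.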

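I have a list of row numbers for the big data companies, named comp_rows. Now I am trying to build a new DataFrame of the filtered companies based on those row numbers. To do that I need to add rows to an existing DataFrame, but I get an error. Can anyone help?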

My DataFrame looks like this.

company_url company tag_line product data 
0 https://angel.co/billguard BillGuard The fastest smartest way to track your spendin... BillGuard is a personal finance security app t... New York City · Financial Services · Security ... 
1 https://angel.co/tradesparq Tradesparq The world's largest social network for global ... Tradesparq is Alibaba.com meets LinkedIn. Trad... Shanghai · B2B · Marketplaces · Big Data · Soc... 
2 https://angel.co/sidewalk Sidewalk Hoovers (D&B) for the social era Sidewalk helps companies close more sales to s... New York City · Lead Generation · Big Data · S... 
3 https://angel.co/pangia Pangia The Internet of Things Platform: Big data mana... We collect and manage data from sensors embedd... San Francisco · SaaS · Clean Technology · Big ... 
4 https://angel.co/thinknum Thinknum Financial Data Analysis Thinknum is a powerful web platform to value c... New York City · Enterprise Software · Financia... 

My code is below:

bigdata_comp = DataFrame(data=None,columns=['company_url','company','tag_line','product','data']) 

for count, item in enumerate(data.iterrows()): 
    for number in comp_rows: 
     if int(count) == int(number): 
      bigdata_comp.append(item) 

Error:

--------------------------------------------------------------------------- 
TypeError         Traceback (most recent call last) 
<ipython-input-234-1e4ea9bd9faa> in <module>() 
     4  for number in comp_rows: 
     5   if int(count) == int(number): 
----> 6    bigdata_comp.append(item) 
     7 

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/frame.pyc in append(self, other, ignore_index, verify_integrity) 
    3814   from pandas.tools.merge import concat 
    3815   if isinstance(other, (list, tuple)): 
-> 3816    to_concat = [self] + other 
    3817   else: 
    3818    to_concat = [self, other] 

TypeError: can only concatenate list (not "tuple") to list 
+0

There is probably a way to do this without a loop, using index-based or boolean indexing. Please post your expected output to clarify –

+0

Thanks! fixxxer explained it very well for me. – pythonlearner

Answers

1

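It looks like you are trying to filter an existing DataFrame based on index values (stored in your variable comp_rows). You can do that with loc, without a loop, as shown here: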

In [1161]: df1.head() 
Out[1161]: 
      A   B   C   D 
a 1.935094 -0.160579 -0.173458 0.433267 
b 1.669632 -1.130893 -1.210353 0.822138 
c 0.494622 1.014013 0.215655 1.045139 
d -0.628889 0.223170 -0.616019 -0.264982 
e -0.823133 0.385790 -0.654533 0.582255 

We select the rows with index labels 'a', 'b' and 'c', for all columns:

In [1162]: df1.loc[['a','b','c'],:] 
Out[1162]: 
      A   B   C   D 
a 1.935094 -0.160579 -0.173458 0.433267 
b 1.669632 -1.130893 -1.210353 0.822138 
c 0.494622 1.014013 0.215655 1.045139 

You can read more about it here.
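Applied to the question, a minimal sketch might look like the following. It assumes comp_rows holds positional row numbers (0, 1, 2, ...), so it uses iloc, the positional counterpart of loc. The shortened frame and the values in comp_rows are made up for illustration.

```python
import pandas as pd

# Hypothetical, shortened stand-in for the question's `data` frame.
data = pd.DataFrame({
    "company_url": [
        "https://angel.co/billguard",
        "https://angel.co/tradesparq",
        "https://angel.co/sidewalk",
        "https://angel.co/pangia",
        "https://angel.co/thinknum",
    ],
    "company": ["BillGuard", "Tradesparq", "Sidewalk", "Pangia", "Thinknum"],
})

# Assumed contents of comp_rows: positional row numbers to keep.
comp_rows = [1, 2, 3]

# iloc selects by position; loc would select by index label instead.
bigdata_comp = data.iloc[comp_rows]

print(bigdata_comp)
```

With the default 0..n-1 index, `data.loc[comp_rows]` selects the same rows, because labels and positions coincide.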

About your code:

1. You do not need to iterate through a list to check whether an item is in it: use the in operator. For example:

In [1199]: 1 in [1,2,3,4,5] 
Out[1199]: True 

So, instead of

for number in comp_rows: 
     if int(count) == int(number): 

do this

if count in comp_rows:

2. pandas append does not operate in place. You have to store the result in another variable. See here

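3. Appending one row at a time is a slow way to do what you want. Instead, collect each row you want to add in a list of lists, build a DataFrame from it, and append that to the target DataFrame. Something like this: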

temp = [] 
for count, item in enumerate(df1.loc[['a','b','c'],:].iterrows()): 
    # if count in comp_rows: 
    temp.append(list(item[1])) 


In [1233]: temp 
Out[1233]: 
[[1.9350940285526077, 
    -0.16057932637141861, 
    -0.17345827000000605, 
    0.43326722021644282], 
[1.66963201034217, 
    -1.1308932586268696, 
    -1.2103527446031515, 
    0.82213753819050794], 
[0.49462218161377397, 
    1.0140133740187862, 
    0.2156547595968879, 
    1.0451391564351897]] 

In [1236]: df2 = df1.append(pd.DataFrame(temp, columns=['A','B','C','D'])) 

In [1237]: df2 
Out[1237]: 
      A   B   C   D 
a 1.935094 -0.160579 -0.173458 0.433267 
b 1.669632 -1.130893 -1.210353 0.822138 
c 0.494622 1.014013 0.215655 1.045139 
d -0.628889 0.223170 -0.616019 -0.264982 
e -0.823133 0.385790 -0.654533 0.582255 
f -0.872135 2.938475 -0.099367 -1.472519 
0 1.935094 -0.160579 -0.173458 0.433267 
1 1.669632 -1.130893 -1.210353 0.822138 
2 0.494622 1.014013 0.215655 1.045139 
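The three points above can be combined into one sketch. It assumes a recent pandas, where DataFrame.append is no longer available (it was removed in pandas 2.0) and pd.concat takes its place. Like append, concat returns a new object and leaves its inputs unchanged. The frame contents, the starting row in bigdata_comp, and the comp_rows values are invented for illustration.

```python
import pandas as pd

# Hypothetical, shortened stand-in for the question's `data` frame.
data = pd.DataFrame({
    "company_url": [
        "https://angel.co/billguard",
        "https://angel.co/tradesparq",
        "https://angel.co/sidewalk",
        "https://angel.co/pangia",
        "https://angel.co/thinknum",
    ],
    "company": ["BillGuard", "Tradesparq", "Sidewalk", "Pangia", "Thinknum"],
})

# Assumed row numbers; a set keeps the `in` test cheap.
comp_rows = {1, 2, 3}

# Target frame that already holds one row (arbitrary choice for the demo).
bigdata_comp = data.iloc[[0]]

# Point 1: test membership with `in` instead of an inner loop.
# Point 3: collect the rows in a plain list first.
rows = []
for count, (index, row) in enumerate(data.iterrows()):
    if count in comp_rows:
        rows.append(list(row))

# Build one frame from the collected rows and concatenate once.
new_rows = pd.DataFrame(rows, columns=data.columns)

# Point 2: the result must be assigned; bigdata_comp is not modified.
result = pd.concat([bigdata_comp, new_rows], ignore_index=True)

print(len(bigdata_comp), len(result))
print(result)
```

Passing ignore_index=True renumbers the combined rows; leaving it out keeps the original labels, as in the output shown above where the appended rows carry the labels 0, 1 and 2.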
+1

Thanks for the detailed explanation! I learned a lot. – pythonlearner

0

Replace the following line:

for count, item in enumerate(data.iterrows()): 

with

for count, (index, item) in enumerate(data.iterrows()): 

or even simply

for count, item in data.iterrows(): 
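
The reason for the change is that iterrows() yields (index, row) tuples, so in the original loop the whole tuple was passed to append, which is what the TypeError complains about. A small sketch with a made-up two-row frame (the index labels 10 and 11 are chosen only to make labels and positions differ):

```python
import pandas as pd

# Made-up frame for illustration only.
data = pd.DataFrame({"company": ["BillGuard", "Tradesparq"]}, index=[10, 11])

# Each item from iterrows() is a 2-tuple: (index label, row as a Series).
first = next(data.iterrows())
print(type(first))
print(first[0], first[1]["company"])

# enumerate() adds a positional counter in front of that tuple,
# so the tuple has to be unpacked separately.
seen = []
for count, (index, row) in enumerate(data.iterrows()):
    seen.append((count, index, row["company"]))

print(seen)
```

In the shorter form, `for count, item in data.iterrows()`, the first name receives the index label. That equals the row number only when the frame has the default 0..n-1 index.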