Adding multiple rows to an existing DataFrame

Hi, I'm learning data science and am trying to build a list of big-data companies from a list of companies across all industries.

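I have a list of row numbers for the big-data companies, called comp_rows. I'm trying to build a new DataFrame containing only those filtered companies, based on the row numbers. To do that I need to add rows to an existing DataFrame, but I get an error. Can anyone help?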

My DataFrame looks like this:

company_url company tag_line product data 
0 https://angel.co/billguard BillGuard The fastest smartest way to track your spendin... BillGuard is a personal finance security app t... New York City · Financial Services · Security ... 
1 https://angel.co/tradesparq Tradesparq The world's largest social network for global ... Tradesparq is Alibaba.com meets LinkedIn. Trad... Shanghai · B2B · Marketplaces · Big Data · Soc... 
2 https://angel.co/sidewalk Sidewalk Hoovers (D&B) for the social era Sidewalk helps companies close more sales to s... New York City · Lead Generation · Big Data · S... 
3 https://angel.co/pangia Pangia The Internet of Things Platform: Big data mana... We collect and manage data from sensors embedd... San Francisco · SaaS · Clean Technology · Big ... 
4 https://angel.co/thinknum Thinknum Financial Data Analysis Thinknum is a powerful web platform to value c... New York City · Enterprise Software · Financia...

My code is below:

bigdata_comp = DataFrame(data=None,columns=['company_url','company','tag_line','product','data'])

for count, item in enumerate(data.iterrows()):
    for number in comp_rows:
        if int(count) == int(number):
            bigdata_comp.append(item)

Error:

--------------------------------------------------------------------------- 
TypeError         Traceback (most recent call last) 
<ipython-input-234-1e4ea9bd9faa> in <module>() 
     4  for number in comp_rows: 
     5   if int(count) == int(number): 
----> 6    bigdata_comp.append(item) 
     7 

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/frame.pyc in append(self, other, ignore_index, verify_integrity) 
    3814   from pandas.tools.merge import concat 
    3815   if isinstance(other, (list, tuple)): 
-> 3816    to_concat = [self] + other 
    3817   else: 
    3818    to_concat = [self, other] 

TypeError: can only concatenate list (not "tuple") to list


2015-05-06 pythonlearner

There may be a way to do this without a loop, using index-based or boolean indexing. Please post your expected output to clarify. –

Thanks! fixxxer explained it very well for me. – pythonlearner

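It seems you are trying to filter the existing DataFrame by index (the values stored in your variable called comp_rows). You can do that without a loop by using loc, as shown here: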

In [1161]: df1.head() 
Out[1161]: 
      A   B   C   D 
a 1.935094 -0.160579 -0.173458 0.433267 
b 1.669632 -1.130893 -1.210353 0.822138 
c 0.494622 1.014013 0.215655 1.045139 
d -0.628889 0.223170 -0.616019 -0.264982 
e -0.823133 0.385790 -0.654533 0.582255

Here we get the rows with index 'a', 'b' and 'c', for all columns:

In [1162]: df1.loc[['a','b','c'],:] 
Out[1162]: 
      A   B   C   D 
a 1.935094 -0.160579 -0.173458 0.433267 
b 1.669632 -1.130893 -1.210353 0.822138 
c 0.494622 1.014013 0.215655 1.045139

You can read more about it here.
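Applied to the question, a minimal sketch might look like the following. It assumes comp_rows holds integer row positions and that data has the default 0..n-1 index, as the printed frame suggests; the one-column frame below is made-up stand-in data, not the real company list. iloc is the positional counterpart of loc; with the default index both select the same rows.

```python
import pandas as pd

# Hypothetical stand-in for the question's `data` frame (default 0..n-1 index).
data = pd.DataFrame(
    {"company": ["BillGuard", "Tradesparq", "Sidewalk", "Pangia", "Thinknum"]}
)

# Hypothetical row numbers of the big-data companies.
comp_rows = [1, 2, 3]

# iloc selects by integer position; .copy() gives an independent frame.
bigdata_comp = data.iloc[comp_rows].copy()

# loc selects by index label; with the default index, labels equal positions.
same_rows = data.loc[comp_rows]

print(bigdata_comp["company"].tolist())  # ['Tradesparq', 'Sidewalk', 'Pangia']
```

If the frame ever gets a non-default index (for example after sorting or filtering), loc and iloc no longer agree, so iloc is the safer choice for "row numbers".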

About your code:

1. You don't need to iterate through a list to check whether an item is in it: use the in operator. For example:

In [1199]: 1 in [1,2,3,4,5] 
Out[1199]: True

So instead of

for number in comp_rows:
    if int(count) == int(number):

do this:

if count in comp_rows:
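Put together, the question's loop might look like this sketch, with a single in check replacing the inner loop. It also unpacks the (index, row) pair that iterrows() yields, which is the fix suggested in the other answer. Turning comp_rows into a set is an optional extra, not part of this answer: it makes each membership test O(1) on average instead of a scan of the list. The frame and comp_rows are hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical stand-ins for the question's `data` and `comp_rows`.
data = pd.DataFrame(
    {"company": ["BillGuard", "Tradesparq", "Sidewalk", "Pangia", "Thinknum"]}
)
comp_rows = [1, 2, 3]

wanted = set(comp_rows)  # set lookups are O(1) on average
selected = []
for count, (index, row) in enumerate(data.iterrows()):
    # One membership test replaces the inner `for number in comp_rows` loop.
    if count in wanted:
        selected.append(row)

print([row["company"] for row in selected])  # ['Tradesparq', 'Sidewalk', 'Pangia']
```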

2. pandas append does not work in place: it returns a new DataFrame, so you have to store the result in a variable. See here.
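A small sketch of the not-in-place behaviour follows. Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so the sketch uses pd.concat, which likewise returns a new frame; the two one-row frames are hypothetical.

```python
import pandas as pd

target = pd.DataFrame({"company": ["BillGuard"]})
extra = pd.DataFrame({"company": ["Tradesparq"]})

# Calling concat without keeping the result leaves `target` unchanged.
pd.concat([target, extra], ignore_index=True)
print(len(target))  # 1

# The returned frame has to be stored, here by rebinding the name.
target = pd.concat([target, extra], ignore_index=True)
print(len(target))  # 2
```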

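Appending one row at a time is a slow way to do this. Instead, save each row you want to add into a list of lists, build a DataFrame from it, and append that to the target DataFrame in one go. Something like this: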

temp = [] 
for count, item in enumerate(df1.loc[['a','b','c'],:].iterrows()): 
    # if count in comp_rows: 
    temp.append(list(item[1])) 


In [1233]: temp 
Out[1233]: 
[[1.9350940285526077, 
    -0.16057932637141861, 
    -0.17345827000000605, 
    0.43326722021644282], 
[1.66963201034217, 
    -1.1308932586268696, 
    -1.2103527446031515, 
    0.82213753819050794], 
[0.49462218161377397, 
    1.0140133740187862, 
    0.2156547595968879, 
    1.0451391564351897]] 

In [1236]: df2 = df1.append(pd.DataFrame(temp, columns=['A','B','C','D'])) 

In [1237]: df2 
Out[1237]: 
      A   B   C   D 
a 1.935094 -0.160579 -0.173458 0.433267 
b 1.669632 -1.130893 -1.210353 0.822138 
c 0.494622 1.014013 0.215655 1.045139 
d -0.628889 0.223170 -0.616019 -0.264982 
e -0.823133 0.385790 -0.654533 0.582255 
f -0.872135 2.938475 -0.099367 -1.472519 
0 1.935094 -0.160579 -0.173458 0.433267 
1 1.669632 -1.130893 -1.210353 0.822138 
2 0.494622 1.014013 0.215655 1.045139
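On pandas 2.x, where df1.append no longer exists, the same collect-then-append-once pattern can be sketched with pd.concat. The small df1 below is a hypothetical stand-in with made-up values, not the frame from the session above.

```python
import pandas as pd

# Hypothetical stand-in for df1, with a labelled index.
df1 = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    index=["a", "b", "c"],
    columns=["A", "B"],
)

# Collect the rows to add as a list of lists (one inner list per row).
temp = [list(row) for _, row in df1.loc[["a", "b"], :].iterrows()]

# Build one DataFrame from all collected rows and concatenate once,
# instead of growing the target frame row by row.
df2 = pd.concat([df1, pd.DataFrame(temp, columns=df1.columns)])
print(df2)
```

As in the session above, the new rows get index labels 0, 1, ...; pass ignore_index=True to pd.concat if you want a fresh 0..n-1 index for the whole result.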


2015-05-06 20:08:07 fixxxer

Thanks for the detailed explanation! I learned a lot. – pythonlearner

Replace the following line:

for count, item in enumerate(data.iterrows()):

with

for count, (index, item) in enumerate(data.iterrows()):

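or even simply as the following (here count is the row's index label rather than a running counter; the two coincide only because data has the default 0..n-1 index):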

for count, item in data.iterrows():
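To see why the unpacking matters, this sketch inspects what the loop variables hold: iterrows() yields (index, row) tuples, so in the original loop item was a tuple, which is what append choked on. The two-row frame is a hypothetical stand-in.

```python
import pandas as pd

data = pd.DataFrame({"company": ["BillGuard", "Tradesparq"]})

# enumerate(data.iterrows()) yields (count, (index, row)) pairs, so without
# unpacking, `item` is an (index, Series) tuple.
count, item = next(enumerate(data.iterrows()))
print(type(item).__name__)  # tuple

# Unpacking the inner pair gives the row itself as a Series.
count, (index, row) = next(enumerate(data.iterrows()))
print(type(row).__name__, row["company"])  # Series BillGuard
```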


2015-05-06 16:32:28
