2017-08-14 99 views

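Pandas: how to add column names to a DataFrame read from a CSV file

I'm new to Python but excited about it, and I'd like your advice. I came up with the following code to compare two CSV files produced by nmap scans: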

import pandas as pd 
from pandas import DataFrame 
import os 
file = raw_input('\nEnter the Old CSV file: ') 
file1 = raw_input('\nEnter the New CSV file: ') 
A=set(pd.read_csv(file, index_col=False, header=None)[0]) 
B=set(pd.read_csv(file1, index_col=False, header=None)[0]) 
final=list(A-B) 
df = pd.DataFrame(final, columns=["host"]) 
df.to_csv('DIFF_'+file) 

print "Completed!" 

When I run it, I get the following result:

,host
0,82.214.228.71;dsl-radius-02.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3;
1,82.214.228.70;dsl-radius-01.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3;

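My question is: how can I add labels (names) to columns 2, 3 and so on, for example hostname, port, port name, state, etc.? I tried df['hostname'] = range(1, len(df) + 1), but when I open the file in Excel, a hostname column is simply added next to host in the first column.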
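A minimal sketch of why that attempt does not label anything, assuming two of the rows shown above are held in an in-memory string instead of the real nmap file: with the default separator `,`, each semicolon-delimited line is read as one single field in column 0, and assigning `df['hostname']` only appends a new column of integers.

```python
from io import StringIO

import pandas as pd

# Two rows from the output above; an in-memory string stands in for the real CSV file
data = (
    "82.214.228.71;dsl-radius-02.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3;\n"
    "82.214.228.70;dsl-radius-01.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3;\n"
)

# Default sep=',': the lines contain no commas, so each whole line becomes one field
raw = pd.read_csv(StringIO(data), header=None)
print(raw.shape)

# Assigning a column adds new data; it does not split or name the existing fields
df = pd.DataFrame(raw[0].tolist(), columns=["host"])
df["hostname"] = range(1, len(df) + 1)
print(df.columns.tolist())
```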


Do you want to compare all columns, or only the first one? – jezrael

Answers


I think you need read_csv with the parameters sep=';' and names, to define the column names first:

file = raw_input('\nEnter the Old CSV file: ') 
file1 = raw_input('\nEnter the New CSV file: ') 

cols = ['hostname','port','portname', ...] 
A= pd.read_csv(file, index_col=False, header=None, sep=';', names=cols) 
B= pd.read_csv(file1, index_col=False, header=None, sep=';', names=cols) 

Then, if you need to compare all columns, use merge with indicator=True and boolean indexing:

df = pd.merge(A, B, how='outer', indicator=True) 
df = df[df['_merge']=='left_only'].drop('_merge',axis=1) 

df.to_csv('DIFF_'+file) 

print "Completed!" 
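For Python 3, the same all-columns comparison could be packaged as a small function, sketched below. Assumptions: the helper name `rows_only_in_old` and the column labels after `port` are hypothetical (the first five follow the labels in the asker's later comment), and `StringIO` buffers stand in for the two files. The output is written with `sep=';'`, matching the asker's later fix, and with `index=False` to drop the numeric index column.

```python
from io import StringIO

import pandas as pd

# Hypothetical labels for the 13 fields; the trailing ';' on each line adds an empty 13th field.
# The first five follow the asker's comment; the rest are placeholders.
COLS = ["host", "hostname", "hostname_type", "protocol", "port", "service", "state",
        "col8", "col9", "reason", "col11", "col12", "col13"]


def rows_only_in_old(old_src, new_src, cols=COLS):
    """Return the rows of old_src that have no identical row in new_src (all columns compared)."""
    old = pd.read_csv(old_src, sep=";", header=None, names=cols)
    new = pd.read_csv(new_src, sep=";", header=None, names=cols)
    merged = pd.merge(old, new, how="outer", indicator=True)
    return merged[merged["_merge"] == "left_only"].drop("_merge", axis=1)


old_csv = StringIO(
    "82.214.228.71;dsl-radius-02.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3;\n"
    "82.214.228.70;dsl-radius-01.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3;\n"
)
new_csv = StringIO(
    "82.214.228.70;dsl-radius-01.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3;\n"
)

diff = rows_only_in_old(old_csv, new_csv)

# Write semicolon-separated output without the numeric index column
out = StringIO()  # with real files, pass a file name such as 'DIFF_' + file instead
diff.to_csv(out, sep=";", index=False)
print(out.getvalue())
```

`merge` matches NaN keys with each other (unlike a SQL join), so rows whose empty fields line up still compare as equal.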

Sample:

import pandas as pd 
from io import StringIO  # pandas.compat.StringIO no longer exists in recent pandas

temp=u"""82.214.228.71;dsl-radius-02.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3; 
82.214.228.70;dsl-radius-01.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3; 
82.214.228.74;dsl-radius-02.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3; 
82.214.228.75;dsl-radius-01.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3;""" 
#after testing replace 'StringIO(temp)' to 'filename.csv' 
cols = ['hostname','port','portname', 'a','b','c','d','e','f','g','h','i', 'j'] 
A = pd.read_csv(StringIO(temp), sep=";", names=cols) 
print (A) 
     hostname       port portname a b  c \ 
0 82.214.228.71 dsl-radius-02.direcpceu.com  PTR tcp 111 rpcbind 
1 82.214.228.70 dsl-radius-01.direcpceu.com  PTR tcp 111 rpcbind 
2 82.214.228.74 dsl-radius-02.direcpceu.com  PTR tcp 111 rpcbind 
3 82.214.228.75 dsl-radius-01.direcpceu.com  PTR tcp 111 rpcbind 

     d e f  g h i j 
0 open NaN NaN syn-ack NaN 3 NaN 
1 open NaN NaN syn-ack NaN 3 NaN 
2 open NaN NaN syn-ack NaN 3 NaN 
3 open NaN NaN syn-ack NaN 3 NaN 

temp=u"""82.214.228.75;dsl-radius-02.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3; 
82.214.228.70;dsl-radius-01.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3; 
82.214.228.77;dsl-radius-02.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3; 
""" 
#after testing replace 'StringIO(temp)' to 'filename.csv' 
cols = ['hostname','port','portname', 'a','b','c','d','e','f','g','h','i', 'j'] 
B = pd.read_csv(StringIO(temp), sep=";", names=cols) 
print (B) 
     hostname       port portname a b  c \ 
0 82.214.228.75 dsl-radius-02.direcpceu.com  PTR tcp 111 rpcbind 
1 82.214.228.70 dsl-radius-01.direcpceu.com  PTR tcp 111 rpcbind 
2 82.214.228.77 dsl-radius-02.direcpceu.com  PTR tcp 111 rpcbind 

     d e f  g h i j 
0 open NaN NaN syn-ack NaN 3 NaN 
1 open NaN NaN syn-ack NaN 3 NaN 
2 open NaN NaN syn-ack NaN 3 NaN 

df1 = pd.merge(A, B, how='outer', indicator=True) 

print (df1) 

     hostname       port portname a b  c \ 
0 82.214.228.71 dsl-radius-02.direcpceu.com  PTR tcp 111 rpcbind 
1 82.214.228.70 dsl-radius-01.direcpceu.com  PTR tcp 111 rpcbind 
2 82.214.228.74 dsl-radius-02.direcpceu.com  PTR tcp 111 rpcbind 
3 82.214.228.75 dsl-radius-01.direcpceu.com  PTR tcp 111 rpcbind 
4 82.214.228.75 dsl-radius-02.direcpceu.com  PTR tcp 111 rpcbind 
5 82.214.228.77 dsl-radius-02.direcpceu.com  PTR tcp 111 rpcbind 

     d e f  g h i j  _merge 
0 open NaN NaN syn-ack NaN 3 NaN left_only 
1 open NaN NaN syn-ack NaN 3 NaN  both 
2 open NaN NaN syn-ack NaN 3 NaN left_only 
3 open NaN NaN syn-ack NaN 3 NaN left_only 
4 open NaN NaN syn-ack NaN 3 NaN right_only 
5 open NaN NaN syn-ack NaN 3 NaN right_only 
#only values in A 
df1 = df1[df1['_merge']=='left_only'].drop('_merge',axis=1) 
print (df1) 
     hostname       port portname a b  c \ 
0 82.214.228.71 dsl-radius-02.direcpceu.com  PTR tcp 111 rpcbind 
2 82.214.228.74 dsl-radius-02.direcpceu.com  PTR tcp 111 rpcbind 
3 82.214.228.75 dsl-radius-01.direcpceu.com  PTR tcp 111 rpcbind 

     d e f  g h i j 
0 open NaN NaN syn-ack NaN 3 NaN 
2 open NaN NaN syn-ack NaN 3 NaN 
3 open NaN NaN syn-ack NaN 3 NaN 
#only values in B 
df1 = pd.merge(A, B, how='outer', indicator=True) 
df11 = df1[df1['_merge']=='right_only'].drop('_merge',axis=1) 
print (df11) 
     hostname       port portname a b  c \ 
4 82.214.228.75 dsl-radius-02.direcpceu.com  PTR tcp 111 rpcbind 
5 82.214.228.77 dsl-radius-02.direcpceu.com  PTR tcp 111 rpcbind 

     d e f  g h i j 
4 open NaN NaN syn-ack NaN 3 NaN 
5 open NaN NaN syn-ack NaN 3 NaN 
#same values in both dataframes 
df12 = df1[df1['_merge']=='both'].drop('_merge',axis=1) 
print (df12) 
     hostname       port portname a b  c \ 
1 82.214.228.70 dsl-radius-01.direcpceu.com  PTR tcp 111 rpcbind 

     d e f  g h i j 
1 open NaN NaN syn-ack NaN 3 NaN 

But if you only need to compare the first column, hostname, build a mask with isin, invert it with ~, and filter with boolean indexing:

df2 = A[~A['hostname'].isin(B['hostname'])] 
print (df2) 
     hostname       port portname a b  c \ 
0 82.214.228.71 dsl-radius-02.direcpceu.com  PTR tcp 111 rpcbind 
2 82.214.228.74 dsl-radius-02.direcpceu.com  PTR tcp 111 rpcbind 

     d e f  g h i j 
0 open NaN NaN syn-ack NaN 3 NaN 
2 open NaN NaN syn-ack NaN 3 NaN 
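The `isin` filter returns the same hosts as the question's set difference `A - B`, but keeps every column of `A`. A small check on hypothetical two-column frames:

```python
import pandas as pd

# Hypothetical frames with already-named columns (only two columns for brevity)
A = pd.DataFrame({"hostname": ["82.214.228.71", "82.214.228.70", "82.214.228.74"],
                  "port": [111, 111, 111]})
B = pd.DataFrame({"hostname": ["82.214.228.70", "82.214.228.77"],
                  "port": [111, 111]})

# The question's approach: set difference on the first column (other columns are lost)
set_diff = set(A["hostname"]) - set(B["hostname"])

# isin-based filter: same hosts, but all columns of A are kept
only_in_a = A[~A["hostname"].isin(B["hostname"])]

print(sorted(set_diff))
print(only_in_a)
```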

Hey Jez, thanks! I'll try it and get back to you. –


Yes, of course. A small note: if the CSV also has a header row, remove the parameters header=None and names. – jezrael
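A short sketch of that note, using a hypothetical three-column file that already starts with a header row: without `header=None` and `names`, pandas takes the labels from the first line, whereas keeping them reads the header line as an ordinary data row.

```python
from io import StringIO

import pandas as pd

# Hypothetical file content that already starts with a header row
text = (
    "host;hostname;port\n"
    "82.214.228.71;dsl-radius-02.direcpceu.com;111\n"
)

# Header present: no header=None, no names; labels come from the first line
df = pd.read_csv(StringIO(text), sep=";")
print(df.columns.tolist())

# Keeping header=None and names here turns the header line into a data row
wrong = pd.read_csv(StringIO(text), sep=";", header=None, names=["host", "hostname", "port"])
print(len(wrong))
```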


Perfect Jez! Works like a charm! I just had to add sep=';' in the write statement, df.to_csv('DIFF_' + file, sep=';'), and I got what I wanted :). I'm accepting this answer. If you don't mind, just one more thing. I get the following: host hostname hostname_type protocol port \ 24 82.214.228.70 dsl-radius-01.direcpceu.com PTR tcp 111 32 82.214.228.71 dsl-radius-02.direcpceu.com PTR tcp 111 –


You can add the labels where you define the DataFrame. For example, the following should work:

cols = ["host"] + list(range(1, len(final[0].split(';'))))  # list.append() returns None, so concatenate instead
df = pd.DataFrame([row.split(';') for row in final], columns=cols)
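If the file has already been read with `header=None` (and no `names`), the default integer labels can also be replaced afterwards. A sketch with hypothetical labels and an in-memory stand-in for the file, showing either a full list assigned to `columns` or `rename` with a partial mapping:

```python
from io import StringIO

import pandas as pd

# In-memory stand-in for a semicolon-separated scan file (five fields per line)
data = StringIO(
    "82.214.228.71;dsl-radius-02.direcpceu.com;PTR;tcp;111\n"
    "82.214.228.70;dsl-radius-01.direcpceu.com;PTR;tcp;111\n"
)

# Without names, header=None yields integer column labels 0..4
raw = pd.read_csv(data, sep=";", header=None)

# Option 1: replace all labels at once (list length must equal the number of columns)
labelled = raw.copy()
labelled.columns = ["host", "hostname", "hostname_type", "protocol", "port"]

# Option 2: rename only selected columns via a {old: new} mapping
partial = raw.rename(columns={0: "host", 4: "port"})

print(labelled.columns.tolist())
print(partial.columns.tolist())
```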

Thanks Amit! I'll try it and reply. –


Thanks Amit. This works too! –


@IvanMadolev Thanks for the feedback. – Amit