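Splitting a DataFrame column in Python

I want to remove some rows from my pandas DataFrame df. It looks like this and has 180 rows and 2745 columns. I want to get rid of the rows that have a curv_typ of PYC_RT or YCIF_RT. I also want to get rid of the geo\time column. I am extracting this data from a CSV file and have realized that curv_typ,maturity,bonds,geo\time and the values below it, such as PYC_RT,Y1,GBAAA,EA, are all in one column: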
curv_typ,maturity,bonds,geo\time 2015M06D16 2015M06D15 2015M06D11 \
0 PYC_RT,Y1,GBAAA,EA -0.24 -0.24 -0.24
1 PYC_RT,Y1,GBA_AAA,EA -0.02 -0.03 -0.10
2 PYC_RT,Y10,GBAAA,EA 0.94 0.92 0.99
3 PYC_RT,Y10,GBA_AAA,EA 1.67 1.70 1.60
4 PYC_RT,Y11,GBAAA,EA 1.03 1.01 1.09
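I decided to try splitting this column and then dropping the resulting columns, but I get an error in the last line of the code, df_new = pd.DataFrame(df['curv_typ,maturity,bonds,geo\time'].str.split(',').tolist(), df[1:]).stack():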
import os
import urllib2
import gzip
import StringIO
import pandas as pd
baseURL = "http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?file="
filename = "data/irt_euryld_d.tsv.gz"
outFilePath = filename.split('/')[1][:-3]
response = urllib2.urlopen(baseURL + filename)
compressedFile = StringIO.StringIO()
compressedFile.write(response.read())
compressedFile.seek(0)
decompressedFile = gzip.GzipFile(fileobj=compressedFile, mode='rb')
with open(outFilePath, 'w') as outfile:
    outfile.write(decompressedFile.read())
#Now have to deal with tsv file
import csv
with open(outFilePath,'rb') as tsvin, open('ECB.csv', 'wb') as csvout:
    tsvin = csv.reader(tsvin, delimiter='\t')
    writer = csv.writer(csvout)
    for data in tsvin:
        writer.writerow(data)
csvout = 'C:\Users\Sidney\ECB.csv'
#df = pd.DataFrame.from_csv(csvout)
df = pd.read_csv('C:\Users\Sidney\ECB.csv', delimiter=',', encoding="utf-8-sig")
print df
df_new = pd.DataFrame(df['curv_typ,maturity,bonds,geo\time'].str.split(',').tolist(), df[1:]).stack()
The last line raises the error KeyError: 'curv_typ,maturity,bonds,geo\time'.
Edit: Following reptilicus's answer, I used the code below:
#Now have to deal with tsv file
import csv
outFilePath = filename.split('/')[1][:-3] #As in the code above, just put here for reference
csvout = 'C:\Users\Sidney\ECB.tsv'
outfile = open(csvout, "w")
with open(outFilePath, "rb") as f:
    for line in f.read():
        line.replace(",", "\t")
        outfile.write(line)
outfile.close()
df = pd.DataFrame.from_csv("ECB.tsv", sep="\t", index_col=False)
I still get exactly the same output as before.
Thank you
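One way to reach the goal described in the question, as a minimal sketch written for Python 3 and a recent pandas: split the combined column with str.split(expand=True), filter on curv_typ, then drop geo\time. The frame is typed in by hand from the sample output in the question. The SPOT_RT row and its values are a hypothetical addition so that something survives the filter, and the names combined, parts, df_split, and result are illustrative only.

```python
import pandas as pd

# Raw string: keeps the backslash in "geo\time" literal instead of reading "\t" as a tab
combined = r'curv_typ,maturity,bonds,geo\time'

# First three rows are copied from the question's sample output;
# the last row (SPOT_RT) is HYPOTHETICAL and exists only for illustration
df = pd.DataFrame({
    combined: ['PYC_RT,Y1,GBAAA,EA', 'PYC_RT,Y1,GBA_AAA,EA',
               'PYC_RT,Y10,GBAAA,EA', 'SPOT_RT,Y1,GBAAA,EA'],
    '2015M06D16': [-0.24, -0.02, 0.94, 0.10],
    '2015M06D15': [-0.24, -0.03, 0.92, 0.11],
})

# Split the single combined column into four columns named after its parts
parts = df[combined].str.split(',', expand=True)
parts.columns = combined.split(',')

# Put the four new columns in place of the combined one
df_split = pd.concat([parts, df.drop(columns=[combined])], axis=1)

# Remove the unwanted curve types, then remove the geo\time column
result = df_split[~df_split['curv_typ'].isin(['PYC_RT', 'YCIF_RT'])]
result = result.drop(columns=[r'geo\time'])

print(result)
```

The filter uses isin so that both curve types are removed in one step; with the real data, only the construction of df would change.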
It looks like you need to read it in differently. It looks like curve_type, maturity, bonds, and geo\time should each have their own column. Try DataFrame.from_csv() as well – reptilicus
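A sketch of what reading it differently could look like, assuming the downloaded file is tab-separated with a comma-separated first field, as the header in the question suggests. It is written for Python 3 and uses pd.read_csv, since DataFrame.from_csv was removed in pandas 1.0. The text in sample is a hand-typed stand-in built from the question's sample rows, not the actual file contents.

```python
import io
import pandas as pd

# Hand-typed stand-in for the start of the .tsv file (ASSUMED layout):
# the first field is comma-separated, the remaining fields are tab-separated
sample = (
    "curv_typ,maturity,bonds,geo\\time\t2015M06D16\t2015M06D15\t2015M06D11\n"
    "PYC_RT,Y1,GBAAA,EA\t-0.24\t-0.24\t-0.24\n"
    "PYC_RT,Y1,GBA_AAA,EA\t-0.02\t-0.03\t-0.10\n"
)

# A regular-expression separator that matches either a tab or a comma gives each
# key its own column; regex separators require the Python parsing engine
df = pd.read_csv(io.StringIO(sample), sep=r'[\t,]', engine='python')

print(df.columns.tolist())
```

With a real file, the StringIO object would be replaced by the path to the decompressed .tsv file, which would make the intermediate CSV conversion unnecessary.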
@reptilicus Thank you. However, I get the same error when using 'df = pd.DataFrame.from_csv(csvout)' instead of 'pd.read_csv'. I am lost as to how to handle this. – user131983
Oh, I think the problem may be in geo\time; when you read it in, that column may be getting messed up – reptilicus
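To illustrate the point about geo\time: in an ordinary Python string literal, \t is the escape sequence for a tab, so 'geo\time' written in code is not the same text as a header that contains a real backslash followed by t. Whether this is the only cause of the KeyError here is an assumption, but it is easy to check. A minimal sketch for Python 3, using a one-row frame typed in from the question's sample:

```python
import pandas as pd

plain = 'curv_typ,maturity,bonds,geo\time'   # "\t" is read as a tab character
raw = r'curv_typ,maturity,bonds,geo\time'    # raw string: the backslash stays literal

# One-row frame whose column name contains a literal backslash, as in the file header
df = pd.DataFrame({raw: ['PYC_RT,Y1,GBAAA,EA'], '2015M06D16': [-0.24]})

print(plain == raw)          # False: the two keys are different strings
print(plain in df.columns)   # False: the tab version is not a column name
print(raw in df.columns)     # True: the raw string matches the header
```

Writing the key as 'curv_typ,maturity,bonds,geo\\time' with a doubled backslash is equivalent to the raw string.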