2016-02-03 61 views
1

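Python - Remove rows from a CSV file based on column criteria

Each time the status changes, I only need to return the first row of each "status" run. Here is an excerpt; for example, from this data set I only need rows 1, 2, 5, 6, 8, 10, 11, 12, 13, 15, 18, 19, 21.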

Row,Serial Number,Time,Status 
1,1400004,3/10/2014 11:52,GREEN 
2,1400004,3/15/2014 11:45,YELLOW 
3,1400004,3/29/2014 7:59,YELLOW 
4,1400004,4/16/2014 15:59,YELLOW 
5,1400004,5/10/2014 8:18,GREEN 
6,1400004,5/11/2014 15:28,YELLOW 
7,1400004,5/24/2014 7:56,YELLOW 
8,1400004,5/26/2014 7:59,GREEN 
9,1400004,5/28/2014 8:26,GREEN 
10,1400004,6/13/2014 17:29,YELLOW 
11,1400004,6/15/2014 15:12,GREEN 
12,1400004,6/17/2014 8:57,YELLOW 
13,1400007,1/3/2014 11:55,GREEN 
14,1400007,1/18/2014 5:35,GREEN 
15,1400007,1/18/2014 18:32,YELLOW 
16,1400007,1/19/2014 21:50,YELLOW 
17,1400007,1/21/2014 10:56,YELLOW 
18,1400007,1/27/2014 8:15,GREEN 
19,1400007,2/6/2014 9:47,YELLOW 
20,1400007,2/12/2014 12:44,YELLOW 
21,1400007,2/18/2014 12:40,GREEN 
22,1400007,2/24/2014 12:08,YELLOW 

Here is my code. I'm close, but it's a bit off.

import csv
with open('NEW2.csv', 'rb') as f:
    csv_input = csv.reader(f)
    entries = []
    for x in csv_input:
        if x[3] == csv_input.next()[3]:
            pass
        else:
            entries.append(x)
    print entries
+0

Are those lines tab-separated? – inspectorG4dget

+0

It's a CSV file, comma-separated. –

+1

Where are the commas? Please show us the real file. – inspectorG4dget

Answers

0

This should do it. Keep a flag holding the most recently seen status. If the current status differs from the previous one, add the row to entries.

with open('so_data.txt', 'r') as f:
    prev = None
    f.readline()  # skip the header line
    entries = []
    for i, line in enumerate(f):
        # the status is the last comma-separated field; strip trailing whitespace
        curStat = line.strip().split(',')[-1]
        if not prev or curStat != prev:
            entries.append(i + 1)
            # entries.append(line)  # for the line instead of the line number
            prev = curStat
    print(entries)
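
The attempt in the question calls csv_input.next() inside the loop, which consumes an extra row on every comparison, so half of the rows are never examined. As a minimal sketch of the same previous-value idea using the csv module (the function name first_of_each_run and the inline sample are hypothetical, chosen only for illustration):

```python
import csv
import io

# Hypothetical inline sample standing in for the real file (rows 1-5 of the data above).
SAMPLE = """Row,Serial Number,Time,Status
1,1400004,3/10/2014 11:52,GREEN
2,1400004,3/15/2014 11:45,YELLOW
3,1400004,3/29/2014 7:59,YELLOW
4,1400004,4/16/2014 15:59,YELLOW
5,1400004,5/10/2014 8:18,GREEN
"""

def first_of_each_run(lines, status_col=3):
    """Return the rows whose status differs from the previous row's status."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    kept = []
    prev = None
    for row in reader:
        if not row:
            continue  # ignore blank lines
        status = row[status_col].strip()  # tolerate trailing whitespace
        if status != prev:
            kept.append(row)
        prev = status
    return kept

rows = first_of_each_run(io.StringIO(SAMPLE))
print([row[0] for row in rows])
```

Each row is read exactly once, and only the remembered status is compared, so no rows are skipped.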
+0

Using the "previous flag" approach, I was able to retrieve what I needed. @GarrettR –

0

Try this on for size:

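import csv
import itertools

answer = []
with open('path/to/file', newline='') as infile:
    reader = csv.reader(infile)  # the file is comma-separated
    next(reader)                 # skip the header row
    # group consecutive rows that share the same status (4th column)
    for k, group in itertools.groupby(reader, key=lambda row: row[3].strip()):
        answer.append(next(group))  # keep only the first row of each run

for row in answer:
    print('\t'.join(row))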
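
This works because itertools.groupby starts a new group every time the key changes, rather than collecting all equal keys together. A small sketch of that behaviour, using a hypothetical status list copied from rows 1-6 of the question's data:

```python
from itertools import groupby

# Hypothetical status sequence, taken from rows 1-6 of the data above.
statuses = ['GREEN', 'YELLOW', 'YELLOW', 'YELLOW', 'GREEN', 'YELLOW']

# Number from 1 so the values line up with the Row column.
numbered = list(enumerate(statuses, start=1))

# groupby opens a new group whenever the key changes, so the first
# item of every group is the first row of a run of equal statuses.
first_rows = [next(group)[0] for _, group in groupby(numbered, key=lambda pair: pair[1])]
print(first_rows)
```

Note that GREEN appears as a key twice here, which is exactly what is wanted when each run should be reported separately.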
0

I would use pandas for this task. We just need to add another column, prev_status, and print the rows where prev_status != status.

import pandas as pd

def strip(text):
    try:
        return text.strip()
    except AttributeError:
        return text

df = pd.read_csv(
    'status.csv',
    index_col=['row'],                                 # use the "row" column as the index
    parse_dates=['time'],                              # parse time as date/time
    names=['row', 'serial_number', 'time', 'status'],  # define the column names
    skiprows=1,                                        # skip the header row
    converters={'status': strip},                      # get rid of trailing whitespace
)

# create a new column [prev_status]
# and fill it with the previous row's status
df['prev_status'] = df.status.shift(1)

#print(df)
# keep the rows whose status differs from the previous row's status
print(df.loc[df['status'] != df['prev_status']])

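I used the strip converter because there is trailing whitespace in the provided CSV. If your CSV file has no trailing whitespace, you don't need the converters parameter and you can remove the strip function.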

Output:

     serial_number                time  status prev_status
row
1          1400004 2014-03-10 11:52:00   GREEN         NaN
2          1400004 2014-03-15 11:45:00  YELLOW       GREEN
5          1400004 2014-05-10 08:18:00   GREEN      YELLOW
6          1400004 2014-05-11 15:28:00  YELLOW       GREEN
8          1400004 2014-05-26 07:59:00   GREEN      YELLOW
10         1400004 2014-06-13 17:29:00  YELLOW       GREEN
11         1400004 2014-06-15 15:12:00   GREEN      YELLOW
12         1400004 2014-06-17 08:57:00  YELLOW       GREEN
13         1400007 2014-01-03 11:55:00   GREEN      YELLOW
15         1400007 2014-01-18 18:32:00  YELLOW       GREEN
18         1400007 2014-01-27 08:15:00   GREEN      YELLOW
19         1400007 2014-02-06 09:47:00  YELLOW       GREEN
21         1400007 2014-02-18 12:40:00   GREEN      YELLOW
22         1400007 2014-02-24 12:08:00  YELLOW       GREEN
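
If the helper column is not wanted in the result, the same comparison can probably be done on the fly. This is only a sketch: the inline sample is a hypothetical stand-in for status.csv, and it keeps the original header names instead of renaming the columns.

```python
import io
import pandas as pd

# Hypothetical inline sample standing in for status.csv (rows 1-5 of the data above).
sample = io.StringIO(
    "Row,Serial Number,Time,Status\n"
    "1,1400004,3/10/2014 11:52,GREEN \n"
    "2,1400004,3/15/2014 11:45,YELLOW \n"
    "3,1400004,3/29/2014 7:59,YELLOW \n"
    "4,1400004,4/16/2014 15:59,YELLOW \n"
    "5,1400004,5/10/2014 8:18,GREEN \n"
)

df = pd.read_csv(sample)
status = df['Status'].str.strip()   # drop trailing whitespace
changed = status != status.shift()  # the first row is compared against NaN, so it is kept
result = df.loc[changed, 'Row'].tolist()
print(result)
```

Here shift() moves every status down by one row, so each row is compared with the one before it without storing prev_status in the frame.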