Extract columns whose name contains a given string

I am trying to use Python to process data from a large txt file.

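I have a txt file with more than 2000 columns, and about a third of them have a title that contains the word "Net". I want to extract only these columns and write them to a new txt file. Any suggestions on how I can do that?

I have searched around a bit but haven't been able to find anything that helps me. I apologize if a similar question has been asked and solved before.

Edit 1: Thank you all! At the time of writing, three users have suggested solutions, and they all work really well. I honestly didn't think people would answer, so I didn't check for a day or two, and was happily surprised by this. I'm very impressed.

Edit 2: I have added a picture showing what part of the original txt file can look like, in case it helps anyone in the future: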

Sample from original txt-file

2015-05-04 Rickyboy

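Could you please attach a small sample of your file to the question, to make the problem statement a little clearer? – ZdaR

Of course! I have already gotten help, but I have now included a small picture of a sample, in case it helps anyone in the future – Rickyboy

One way of doing this, without installing third-party modules like numpy/pandas, is as follows. Given an input file named "input.csv" that looks like this: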

A,B,c_net,d,e_net

0,0,1,0,1

0,0,1,0,1

(remove the blank lines in between; they are only there because of the formatting of this post)

The following code does what you want.

import csv

input_filename = 'input.csv'
output_filename = 'output.csv'

# newline='' lets the csv module handle line endings itself (Python 3)
with open(input_filename, newline='') as infile, \
        open(output_filename, 'w', newline='') as outfile:
    # Instantiate a CSV reader; check that the delimiter matches your file
    reader = csv.reader(infile, delimiter=',')

    # Get the first row (assuming this row contains the header)
    input_header = next(reader)

    # Store the index of every column you want to keep.
    # The match is case-sensitive: 'net' does not match 'Net'.
    columns_to_keep = []
    for i, name in enumerate(input_header):
        if 'net' in name:
            columns_to_keep.append(i)

    # Create a CSV writer to store the columns you want to keep
    writer = csv.writer(outfile, delimiter=',')

    # Construct the header of the output file
    output_header = []
    for column_index in columns_to_keep:
        output_header.append(input_header[column_index])

    # Write the header to the output file
    writer.writerow(output_header)

    # Iterate over the remainder of the input file, construct a row
    # with the columns you want to keep and write it to the output file
    for row in reader:
        new_row = []
        for column_index in columns_to_keep:
            new_row.append(row[column_index])
        writer.writerow(new_row)

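Note that there is no error handling. There are at least two cases that should be handled. The first is checking that the input file exists (hint: look at the functionality provided by the os and os.path modules). The second is handling blank lines and lines with an inconsistent number of columns.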
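The two checks mentioned above could look roughly like the sketch below. The function name `extract_columns`, its parameters, and the choice to skip bad rows (rather than abort or pad them) are assumptions made for illustration, not part of the original answer.

```python
import csv
import os


def extract_columns(input_filename, output_filename, keyword='net', delimiter=','):
    """Copy only the columns whose header contains `keyword`.

    Returns the number of data rows skipped because they were blank
    or had a different number of fields than the header.
    """
    # Error case 1: the input file does not exist
    if not os.path.isfile(input_filename):
        raise FileNotFoundError('Input file not found: ' + input_filename)

    skipped = 0
    with open(input_filename, newline='') as infile, \
            open(output_filename, 'w', newline='') as outfile:
        reader = csv.reader(infile, delimiter=delimiter)
        writer = csv.writer(outfile, delimiter=delimiter)

        # next() with a default avoids StopIteration on an empty file
        header = next(reader, None)
        if header is None:
            raise ValueError('Input file is empty: ' + input_filename)

        columns_to_keep = [i for i, name in enumerate(header) if keyword in name]
        writer.writerow([header[i] for i in columns_to_keep])

        for row in reader:
            # Error case 2: blank rows or rows with an inconsistent column count
            if len(row) != len(header):
                skipped += 1
                continue
            writer.writerow([row[i] for i in columns_to_keep])
    return skipped
```

Calling `extract_columns('input.csv', 'output.csv')` would then write the filtered file and report how many rows were dropped, which makes it easy to notice a malformed input.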

2015-05-04 12:08:43

Wow, thank you so much, works like a charm! Very impressed :) – Rickyboy

This can be done, for instance, with pandas:

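import pandas as pd

# Raw string so that \s reaches the regex engine unchanged
df = pd.read_csv('path_to_file.txt', sep=r'\s+')
print(df.columns)  # check that the columns are parsed correctly
# Case-sensitive substring match on the column names
selected_columns = [col for col in df.columns if "net" in col]
df_filtered = df[selected_columns]
# index=False keeps the row index out of the output file
df_filtered.to_csv('new_file.txt', index=False)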

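Of course, since we don't know the structure of your text file, you will have to adapt the arguments of read_csv to make this work in your case (see the corresponding documentation).

This loads the whole file into memory and then filters out the unnecessary columns. If your file is so large that it cannot be loaded into RAM at once, there is a way to load only specific columns, using the usecols argument.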
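As a rough sketch of the usecols route: read_csv accepts a callable for usecols, which is evaluated against each column name so that only matching columns are parsed. The in-memory sample and its whitespace-separated layout are assumptions standing in for the real file.

```python
import io

import pandas as pd

# Stand-in for the real file; a whitespace-separated layout is an assumption
sample = io.StringIO(
    "a b c_net d e_net\n"
    "0 0 1 0 1\n"
    "0 0 2 0 3\n"
)

# usecols accepts a callable that is evaluated against each column name;
# only columns for which it returns True are parsed and kept
df_net = pd.read_csv(sample, sep=r'\s+', usecols=lambda name: 'net' in name)
```

With a real file, `sample` would be replaced by the file path, and the other columns never end up in the DataFrame.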

2015-05-04 12:05:29 rth

Neat! Works perfectly – Rickyboy

You can use the pandas filter function to select columns based on a regular expression:

data_filtered = data.filter(regex='net')
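Because the pattern is a regular expression, it can be tightened or loosened as needed. The sketch below uses made-up column names to show three variants; this matters here since the question mentions headers containing "Net" while the pattern 'net' is case-sensitive.

```python
import pandas as pd

# Hypothetical column names, chosen only to show how the patterns differ
data = pd.DataFrame(
    [[0, 1, 2, 3]],
    columns=['a', 'c_net', 'Net_total', 'network_id'],
)

# Plain pattern: case-sensitive, matches 'net' anywhere in the name
loose = data.filter(regex='net')

# (?i) makes the match case-insensitive, so 'Net_total' is included too
any_case = data.filter(regex='(?i)net')

# Anchoring the pattern keeps only names that end in '_net'
suffix_only = data.filter(regex='_net$')
```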

2015-05-04 16:48:44

Nice! Once the file has been read, this simple line works well for extracting the columns. Thanks! – Rickyboy