Convert a CSV file to a LIBSVM-compatible data file using Python

I'm working on a project that uses libsvm, and I'm preparing my data for the library. How can I convert a CSV file to LIBSVM-compatible data?

CSV file: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/datasets/data/iris.csv

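From the LIBSVM FAQ:

How do I convert other data formats to LIBSVM format?

It depends on your data format. A simple way is to use libsvmwrite in the libsvm matlab/octave interface. Take a CSV (comma-separated values) file from the UCI machine learning repository as an example. We download SPECTF.train. Labels are in the first column. The following steps produce a file in the libsvm format.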

matlab> SPECTF = csvread('SPECTF.train'); % read a csv file 
matlab> labels = SPECTF(:, 1); % labels from the 1st column 
matlab> features = SPECTF(:, 2:end); 
matlab> features_sparse = sparse(features); % features must be in a sparse matrix 
matlab> libsvmwrite('SPECTFlibsvm.train', labels, features_sparse); 
The transformed data are stored in SPECTFlibsvm.train.
Alternatively, you can use convert.c to convert CSV format to libsvm format.

But I don't want to use MATLAB; I use Python.

I also found this solution, which uses Java.

Can anyone recommend a way to tackle this problem?


2014-04-19 user3378649

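Do you plan to use the 'libsvm' executables, or the Python bindings? – emeth

If you use 'libsvm', you need to convert the 'csv' to 'libsvm' data. If you use the Python bindings, you need to load the 'csv' into Python. – emeth

I plan to use the libsvm executables. I found this (https://github.com/seamusabshere/vector_embed), in case it helps; I'm figuring it out now. But I want to separate the predictors from the target (which is a single column). Does that affect anything? – user3378649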

You can use csv2libsvm.py to convert CSV to libsvm data:

python csv2libsvm.py iris.csv libsvm.data 4 True

where 4 is the target index and True means the CSV has a header.

You then get libsvm.data as

0 1:5.1 2:3.5 3:1.4 4:0.2 
0 1:4.9 2:3.0 3:1.4 4:0.2 
0 1:4.7 2:3.2 3:1.3 4:0.2 
0 1:4.6 2:3.1 3:1.5 4:0.2 
...

from iris.csv

150,4,setosa,versicolor,virginica 
5.1,3.5,1.4,0.2,0 
4.9,3.0,1.4,0.2,0 
4.7,3.2,1.3,0.2,0 
4.6,3.1,1.5,0.2,0 
...
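
The mapping from the iris.csv rows to the libsvm.data lines above can be sketched for a single row. This is only an illustration, assuming the target index is zero-based (so 4 is the fifth column) and features are numbered from 1; `row_to_libsvm` is a hypothetical helper, not a function taken from csv2libsvm.py. It keeps every feature, whereas a converter may omit zero values because the LIBSVM format is sparse.

```python
def row_to_libsvm(row, target_index):
    """Format one CSV row (a list of strings) as a LIBSVM line."""
    features = list(row)                 # copy so the caller's row is untouched
    label = features.pop(target_index)   # the target column becomes the label
    # Remaining columns become index:value pairs, with indices starting at 1.
    pairs = ["%d:%s" % (i, value) for i, value in enumerate(features, start=1)]
    return " ".join([label] + pairs)


print(row_to_libsvm(["5.1", "3.5", "1.4", "0.2", "0"], 4))
# prints: 0 1:5.1 2:3.5 3:1.4 4:0.2
```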


2014-04-19 13:53:47 emeth

I have 16 features in total, the 16th is the class attribute, and my file has no header. How do I use the csv2libsvm script above to convert it? – nifCody
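
For a headerless file with 16 columns where the last one is the class, the zero-based target index would presumably be 15, with no header to skip. Rather than guess at the script's exact flag parsing, here is a stand-alone sketch that uses only the standard library; `csv_to_libsvm` and the three-column sample data are made up for illustration.

```python
import csv
import io


def csv_to_libsvm(src, dst, label_index, has_header=False):
    """Copy CSV rows from file object src to dst in LIBSVM format."""
    reader = csv.reader(src)
    if has_header:
        next(reader, None)            # drop the header row if there is one
    for row in reader:
        if not row:                   # ignore blank lines
            continue
        label = row.pop(label_index)
        # Sparse output: omit empty and zero-valued features.
        pairs = ["%d:%s" % (i, v) for i, v in enumerate(row, start=1)
                 if v != "" and float(v) != 0.0]
        dst.write(" ".join([label] + pairs) + "\n")


# Tiny stand-in for the real file: two features, then the class column.
sample = io.StringIO("0.5,0,1\n0,2.5,2\n")
out = io.StringIO()
csv_to_libsvm(sample, out, label_index=2)
print(out.getvalue(), end="")
# prints:
# 1 1:0.5
# 2 2:2.5
```

With 16 columns you would pass label_index=15, and for real files you would open the input with open(path, newline='') instead of io.StringIO.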

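csv2libsvm.py does not work with Python 3, and it does not support label targets (string objects), so I modified it slightly. It should now work with Python 3 as well as with label targets. I'm very new to Python, so my code may not follow best practice, but I hope it helps someone.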

#!/usr/bin/env python3

"""
Convert CSV file to libsvm format. Works only with numeric features.
Put -1 as label index (argv[3]) if there are no labels in your file.
Expecting no headers. If present, headers can be skipped with argv[4] == 1.
Non-numeric labels are mapped to integer ids in order of first appearance.
"""

import sys
import csv


def construct_line(label, line, labels_dict):
    new_line = []
    try:
        # Numeric label: keep it as is, writing zero as a plain "0".
        if float(label) == 0.0:
            label = "0"
        new_line.append(label)
    except ValueError:
        # String label: reuse its id, or assign the next free one.
        if label not in labels_dict:
            labels_dict[label] = str(len(labels_dict))
        new_line.append(labels_dict[label])

    for i, item in enumerate(line):
        if item == 'NaN':
            item = "0.0"  # NaN is written out explicitly as 0.0
        elif item == '' or float(item) == 0.0:
            continue  # sparse format: skip empty and zero-valued features
        new_line.append("%s:%s" % (i + 1, item))
    return " ".join(new_line) + "\n"

# ---

input_file = sys.argv[1]
output_file = sys.argv[2] if len(sys.argv) > 2 else input_file + ".out"
label_index = int(sys.argv[3]) if len(sys.argv) > 3 else 0
# "1", "true" or "yes" (any case) means the first row is a header;
# a bare string such as "0" would otherwise count as true.
skip_headers = len(sys.argv) > 4 and sys.argv[4].lower() in ("1", "true", "yes")

with open(input_file, newline='') as i, open(output_file, 'w') as o:
    reader = csv.reader(i)

    if skip_headers:
        next(reader)

    labels_dict = {}
    for line in reader:
        if not line:
            continue  # ignore blank lines
        if label_index == -1:
            label = '1'
        else:
            label = line.pop(label_index)

        o.write(construct_line(label, line, labels_dict))
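
The string-label handling is the main addition in this version: each distinct label gets an integer id in order of first appearance. A minimal sketch of just that idea, using a hypothetical helper `encode_label` and the iris class names as sample input:

```python
def encode_label(label, labels_dict):
    """Return the id for label, assigning the next free id if it is new."""
    if label not in labels_dict:
        labels_dict[label] = str(len(labels_dict))
    return labels_dict[label]


labels_dict = {}
ids = [encode_label(name, labels_dict)
       for name in ["setosa", "versicolor", "setosa", "virginica"]]
print(ids)          # prints: ['0', '1', '0', '2']
print(labels_dict)  # prints: {'setosa': '0', 'versicolor': '1', 'virginica': '2'}
```

Keeping the dictionary around is useful if predicted ids later need to be translated back to class names.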


2016-06-29 09:26:04 Memin
