Convert a CSV file to a LIBSVM-compatible data file using Python

I am working on a project that uses libsvm, and I am preparing my data for the library. How can I convert a CSV file to LIBSVM-compatible data?

CSV file: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/datasets/data/iris.csv

From the LIBSVM FAQ:

How do I convert other data formats to LIBSVM format?

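It depends on your data format. A simple way is to use libsvmwrite in the libsvm matlab/octave interface. Take a CSV (comma-separated values) file from the UCI machine learning repository as an example. We download SPECTF.train. The labels are in the first column. The following steps produce a file in libsvm format.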

    matlab> SPECTF = csvread('SPECTF.train'); % read a csv file
    matlab> labels = SPECTF(:, 1); % labels from the 1st column
    matlab> features = SPECTF(:, 2:end);
    matlab> features_sparse = sparse(features); % features must be in a sparse matrix
    matlab> libsvmwrite('SPECTFlibsvm.train', labels, features_sparse);

The transformed data are stored in SPECTFlibsvm.train. Alternatively, you can use convert.c to convert CSV format to libsvm format.

But I don't want to use MATLAB; I use Python.

I also found this solution, which uses Java.

Can anyone recommend a way to solve this problem?

You can use csv2libsvm.py to convert the CSV file to libsvm data:

 python csv2libsvm.py iris.csv libsvm.data 4 True

where 4 is the target index (the column that holds the label) and True indicates that the CSV file has a header row to skip.

Finally, you get libsvm.data as

    0 1:5.1 2:3.5 3:1.4 4:0.2
    0 1:4.9 2:3.0 3:1.4 4:0.2
    0 1:4.7 2:3.2 3:1.3 4:0.2
    0 1:4.6 2:3.1 3:1.5 4:0.2
    ...

from iris.csv

    150,4,setosa,versicolor,virginica
    5.1,3.5,1.4,0.2,0
    4.9,3.0,1.4,0.2,0
    4.7,3.2,1.3,0.2,0
    4.6,3.1,1.5,0.2,0
    ...
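To make the mapping from CSV rows to libsvm lines concrete, here is a minimal standard-library sketch of the per-row transformation. It is not the code of csv2libsvm.py; the function name `rows_to_libsvm` and its parameters are made up for this illustration, and it assumes that all feature columns are numeric and that zero or empty values are simply left out of the sparse output.

```python
import csv
import io


def rows_to_libsvm(csv_text, target_index, has_header):
    """Turn CSV text into a list of libsvm-formatted lines (illustrative helper)."""
    reader = csv.reader(io.StringIO(csv_text))
    if has_header:
        next(reader, None)  # drop the header row
    lines = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        row = list(row)
        label = row.pop(target_index)
        # Feature indices are 1-based; zero or empty values are omitted (sparse format).
        features = [
            "%d:%s" % (i, value)
            for i, value in enumerate(row, start=1)
            if value != "" and float(value) != 0.0
        ]
        lines.append(" ".join([label] + features))
    return lines


# First rows of iris.csv, including its header line.
sample = """150,4,setosa,versicolor,virginica
5.1,3.5,1.4,0.2,0
4.9,3.0,1.4,0.2,0
"""

for line in rows_to_libsvm(sample, target_index=4, has_header=True):
    print(line)
```

For the two sample rows this prints the same first two lines as the libsvm.data excerpt shown above.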

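csv2libsvm.py does not work with Python 3, and it does not support label targets (string targets), so I modified it slightly. It should now work with Python 3 and with string targets. I am new to Python, so my code may not follow best practice, but I hope it helps someone.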

    #!/usr/bin/env python3
    """
    Convert CSV file to libsvm format. Feature columns must be numeric;
    labels may be numeric or strings.
    Put -1 as label index (argv[3]) if there are no labels in your file.
    Expecting no headers. If present, headers can be skipped by passing
    1 or True as argv[4].
    """

    import csv
    import sys


    def construct_line(label, line, labels_dict):
        new_line = []
        try:
            # Numeric label: keep it as is, writing zero as a plain "0".
            if float(label) == 0.0:
                label = "0"
            new_line.append(label)
        except ValueError:
            # String label: map each distinct label to the next integer id.
            if label not in labels_dict:
                labels_dict[label] = str(len(labels_dict))
            new_line.append(labels_dict[label])

        for i, item in enumerate(line):
            # libsvm is a sparse format, so empty and zero features are skipped.
            if item == '' or float(item) == 0.0:
                continue
            elif item == 'NaN':
                item = "0.0"
            new_line.append("%s:%s" % (i + 1, item))
        return " ".join(new_line) + "\n"

    # ---

    input_file = sys.argv[1]
    output_file = sys.argv[2] if len(sys.argv) > 2 else input_file + ".out"
    label_index = int(sys.argv[3]) if len(sys.argv) > 3 else 0
    # Only "1" or "True" (any case) requests skipping the header row.
    skip_headers = len(sys.argv) > 4 and sys.argv[4].lower() in ("1", "true")

    with open(input_file, 'rt', newline='') as infile, open(output_file, 'wb') as outfile:
        reader = csv.reader(infile)

        if skip_headers:
            next(reader)

        labels_dict = {}
        for line in reader:
            if not line:
                continue  # ignore blank lines
            if label_index == -1:
                label = '1'
            else:
                label = line.pop(label_index)

            new_line = construct_line(label, line, labels_dict)
            outfile.write(new_line.encode('utf-8'))
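To show the string-label handling on its own, the sketch below isolates just that step: each distinct label gets the next unused integer id, in order of first appearance. The helper name `encode_label` and the sample label list are illustrative only and are not part of the script.

```python
def encode_label(label, labels_dict):
    """Return a stable integer id (as a string) for a string label."""
    if label not in labels_dict:
        # The next free id is simply the number of labels seen so far.
        labels_dict[label] = str(len(labels_dict))
    return labels_dict[label]


labels_dict = {}
species = ["setosa", "versicolor", "setosa", "virginica"]
ids = [encode_label(name, labels_dict) for name in species]
print(ids)          # ['0', '1', '0', '2']
print(labels_dict)  # {'setosa': '0', 'versicolor': '1', 'virginica': '2'}
```

Because the ids depend on the order in which labels first occur, converting a training file and a test file in separate runs can give the same label different ids. Reusing one mapping for both files avoids that.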
