python - Understanding the format of data in scikit-learn
I am trying to do multi-label text classification using scikit-learn in Python 3.x. I have the data in libsvm format and am loading it with the load_svmlight_file function. The data format is like this:
Each of these lines corresponds to one document. The first 3 numbers are the labels, and the following entries are feature numbers and their values. Each feature corresponds to a word.
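To make that layout concrete, here is a small hand-rolled parser for a single line. It is only a sketch: it assumes the comma-separated label convention that load_svmlight_file expects with multilabel=True, the helper name parse_multilabel_libsvm_line is hypothetical, and the example line is made up. In practice the loader does this parsing for you.

```python
def parse_multilabel_libsvm_line(line):
    """Split one multi-label libsvm line into (labels, {feature: value}).

    Assumes the labels come first, separated by commas, followed by
    space-separated feature:value pairs.
    """
    parts = line.split()
    # First token: the labels of this document.
    labels = tuple(int(label) for label in parts[0].split(","))
    # Remaining tokens: feature number (a word) and its value (e.g. a count).
    features = {}
    for pair in parts[1:]:
        index, value = pair.split(":")
        features[int(index)] = float(value)
    return labels, features


# Hypothetical document with labels 1, 2 and 3, where word 12 occurs twice.
labels, features = parse_multilabel_libsvm_line("1,2,3 1:1 2:1 12:2")
print(labels)
print(features)
```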
I am loading the data using this script:

from sklearn.datasets import load_svmlight_file
x, y = load_svmlight_file("train.csv", multilabel=True, zero_based=True)
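To see what the loader returns without the real train.csv, the same call can be tried on a couple of made-up lines held in memory, since load_svmlight_file accepts a binary file-like object as well as a path. This is a sketch that assumes comma-separated labels; the two documents are hypothetical.

```python
import io

import scipy.sparse
from sklearn.datasets import load_svmlight_file

# Two hypothetical documents standing in for train.csv.
data = b"1,2,3 1:1 2:1 12:2\n0 3:1 5:4\n"

X, y = load_svmlight_file(io.BytesIO(data), multilabel=True, zero_based=True)

# X is a sparse matrix in CSR format, not a dense array.
print(scipy.sparse.issparse(X), X.format)
# 2 documents; feature indices 0..12 give 13 columns.
print(X.shape)
# y holds one tuple of labels per document.
print(y)
```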
My question is about the format of the data. When I look at it, for example with print(x[0]), I get this output:
(0, 1) 1.0
(0, 2) 1.0
(0, 3) 1.0
(0, 4) 1.0
(0, 5) 1.0
(0, 6) 1.0
(0, 7) 1.0
(0, 8) 1.0
(0, 9) 1.0
(0, 10) 1.0
(0, 11) 1.0
(0, 12) 2.0
(0, 13) 1.0
I don't understand the meaning of this format. Shouldn't the format be like this?
> 1 2 3 4 5 6 7 8 9 10 11 12 13
> 1 1 1 1 1 1 1 1 1 1 1 2 1

I'm new to scikit-learn, so I'd appreciate any help in this regard.
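The two rows expected above are exactly what a CSR row stores internally: its indices (the feature numbers) and its data (the values). A minimal sketch, using a hypothetical one-row matrix rebuilt from the printed output rather than the real train.csv:

```python
from scipy.sparse import csr_matrix

# Hypothetical row rebuilt from the printed output: features 1..13
# are present with value 1.0, except feature 12, which has 2.0.
dense = [0.0] * 14
for feature in range(1, 14):
    dense[feature] = 1.0
dense[12] = 2.0
X = csr_matrix([dense])

row = X[0]
# First line of the expected layout: the stored feature numbers.
print(row.indices)
# Second line: the value stored for each of those features.
print(row.data)
# Both together as {feature: value}.
print(dict(zip(row.indices.tolist(), row.data.tolist())))
# The full dense row, zeros included. Fine for one row, but calling
# toarray() on the whole matrix can use a lot of memory.
print(row.toarray())
```

Feature 0 does not show up in indices here because its value is zero, and CSR does not store zeros.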
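This has nothing to do with multilabel classification per se. The feature matrix x returned by load_svmlight_file is a SciPy CSR matrix, as explained in the docs, and such matrices print in a rather unfortunate format: one line per stored non-zero entry, showing (row, column) followed by the value. So (0, 12) 2.0 means that row 0 (the first document) has the value 2.0 for feature 12. For example: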
>>> from scipy.sparse import csr_matrix
>>> x = csr_matrix([[0, 0, 1], [2, 3, 0]])
>>> x
<2x3 sparse matrix of type '<type 'numpy.int64'>'
    with 3 stored elements in Compressed Sparse Row format>
>>> x.toarray()
array([[0, 0, 1],
       [2, 3, 0]])
>>> print(x)
  (0, 2)    1
  (1, 0)    2
  (1, 1)    3
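Under the hood, CSR keeps only the non-zero entries in three flat arrays, and print walks them to emit one (row, column) value line per stored entry. A short sketch with the same toy matrix, listing the triples explicitly through the COO view:

```python
from scipy.sparse import csr_matrix

X = csr_matrix([[0, 0, 1], [2, 3, 0]])

# The three arrays CSR is built from.
print(X.data)     # the non-zero values, row by row
print(X.indices)  # the column of each value
print(X.indptr)   # row i occupies data[indptr[i]:indptr[i + 1]]

# The same (row, column) value triples that print(X) shows.
coo = X.tocoo()
for r, c, v in zip(coo.row, coo.col, coo.data):
    print((int(r), int(c)), int(v))
```

This prints (0, 2) 1, (1, 0) 2 and (1, 1) 3, matching the print(x) output above.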