python - Understanding format of data in scikit-learn

I am trying to work on multi-label text classification using scikit-learn in Python 3.x. I have data in LibSVM format, which I am loading using the load_svmlight_file function. The data format looks like this.

314523,165538,76255 1:1 2:1 3:1 4:1 5:1 6:1 7:1 8:1 9:1 10:1 11:1 12:2 13:1
410523,230296,368303,75145 8:1 19:2 22:1 24:1 29:1 63:1 68:1 69:3 76:1 82:1 83:1 84:1

Each of these lines corresponds to one document. The leading comma-separated numbers (three in the first line) are the labels, and the remaining entries are feature numbers with their values. Each feature corresponds to a word.
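To make the layout concrete, here is a minimal sketch that splits one such line by hand. The helper name `parse_line` is hypothetical and only for illustration; it is not part of scikit-learn, and the sample line is a shortened version of the first document above.

```python
def parse_line(line):
    """Split one multi-label LibSVM line into (labels, {feature_index: value})."""
    # The first whitespace-separated token holds the comma-separated labels.
    label_part, *feature_parts = line.split()
    labels = [int(label) for label in label_part.split(",")]

    # Every remaining token has the form "index:value".
    features = {}
    for item in feature_parts:
        index, value = item.split(":")
        features[int(index)] = float(value)
    return labels, features


labels, features = parse_line("314523,165538,76255 1:1 2:1 3:1 12:2 13:1")
print(labels)    # [314523, 165538, 76255]
print(features)  # {1: 1.0, 2: 1.0, 3: 1.0, 12: 2.0, 13: 1.0}
```

Only the features that actually occur in a document are listed, which is why this format is naturally loaded into a sparse matrix.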

I am loading the data using this script.

from sklearn.datasets import load_svmlight_file

x, y = load_svmlight_file("train.csv", multilabel=True, zero_based=True)
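For anyone who wants to reproduce this without the original file, here is a small sketch that loads the two sample lines from an in-memory buffer instead of "train.csv". It assumes load_svmlight_file accepts a binary file-like object, which its documentation describes; the variable name `sample` is just for illustration.

```python
from io import BytesIO

from sklearn.datasets import load_svmlight_file

# The two sample documents from above, as bytes (file-like input must be binary).
sample = BytesIO(
    b"314523,165538,76255 1:1 2:1 3:1 4:1 5:1 6:1 7:1 8:1 9:1 10:1 11:1 12:2 13:1\n"
    b"410523,230296,368303,75145 8:1 19:2 22:1 24:1 29:1 63:1 68:1 69:3 76:1 82:1 83:1 84:1\n"
)

x, y = load_svmlight_file(sample, multilabel=True, zero_based=True)

print(type(x))   # a SciPy sparse matrix, not a dense array
print(x.shape)   # one row per document, one column per feature index
print(y[0])      # the labels of the first document, as a tuple
```

With zero_based=True the feature numbers in the file are used directly as column indices, so the number of columns is the largest feature number plus one.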

My question is: when I look at the format of the data, for example with print(x[0]), this is the output.

(0, 1) 1.0
(0, 2) 1.0
(0, 3) 1.0
(0, 4) 1.0
(0, 5) 1.0
(0, 6) 1.0
(0, 7) 1.0
(0, 8) 1.0
(0, 9) 1.0
(0, 10) 1.0
(0, 11) 1.0
(0, 12) 2.0
(0, 13) 1.0

I don't understand the meaning of this format. Shouldn't the format be something like this?

1 2 3 4 5 6 7 8 9 10 11 12 13
1 1 1 1 1 1 1 1 1 1 1 2 1

I am new to scikit-learn. I would appreciate any help in this regard.

This has nothing to do with multilabel classification per se. The feature matrix x returned by load_svmlight_file is a SciPy CSR matrix, as explained in the docs, and those print in a rather unfortunate format:

>>> from scipy.sparse import csr_matrix
>>> x = csr_matrix([[0, 0, 1], [2, 3, 0]])
>>> x
<2x3 sparse matrix of type '<type 'numpy.int64'>'
    with 3 stored elements in Compressed Sparse Row format>
>>> x.toarray()
array([[0, 0, 1],
       [2, 3, 0]])
>>> print(x)
  (0, 2)    1
  (1, 0)    2
  (1, 1)    3
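Each printed line is therefore a (row, column) pair followed by the stored value; anything not listed is zero. As a rough sketch of how to get the layout the question expected, the snippet below rebuilds the first document by hand (feature numbers 1 to 13, with a count of 2 for feature 12) and reads it back as column indices, values, and a dense row. The variable names are illustrative only.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Rebuild the first document from the question as a 1 x 14 sparse row.
cols = np.arange(1, 14)            # feature numbers 1..13 (column 0 stays empty)
vals = np.ones(13)
vals[11] = 2.0                     # feature 12 occurs twice
rows = np.zeros(13, dtype=int)     # everything belongs to row 0
row = csr_matrix((vals, (rows, cols)), shape=(1, 14))

print(row.indices)   # column numbers of the stored entries
print(row.data)      # the corresponding values
dense = row.toarray()
print(dense)         # dense 1 x 14 array, zeros filled in
```

The `indices` and `data` attributes correspond to the two rows of numbers sketched in the question. Calling toarray() is fine for inspecting a single row, but converting a whole text-classification matrix to dense form can use a great deal of memory, so it is usually better to keep the matrix sparse when passing it to an estimator.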

python numpy machine-learning scipy scikit-learn
