python - Adding words to scikit-learn's CountVectorizer's stop list
scikit-learn's CountVectorizer class lets you pass the string 'english' as the stop_words argument. I want to add some things to this predefined list. Can anyone tell me how to do this?
According to the source code for sklearn.feature_extraction.text, the full list (actually a frozenset, from stop_words) of ENGLISH_STOP_WORDS is exposed through __all__. Hence, if you want to use that list plus some more items, you could do something like:
    from sklearn.feature_extraction import text
    stop_words = text.ENGLISH_STOP_WORDS.union(my_additional_stop_words)
(where my_additional_stop_words is any sequence of strings) and use the result as the stop_words argument. This input to CountVectorizer.__init__ is parsed by _check_stop_list, which will pass the new frozenset straight through.
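For a fuller picture, here is a minimal end-to-end sketch of the same idea. The extra stop words and the two toy documents are made up purely for illustration. The union is converted to a list before being passed in, on the assumption that newer scikit-learn releases may validate stop_words more strictly and reject a bare frozenset; on older versions the frozenset itself should work as described above.

```python
from sklearn.feature_extraction import text
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical extra stop words, chosen only for this illustration.
my_additional_stop_words = ["python", "scikit"]

# Combine the built-in English list with the custom words.
stop_words = text.ENGLISH_STOP_WORDS.union(my_additional_stop_words)

# Toy documents, made up for the example.
docs = [
    "the python library is useful",
    "scikit tools and the python language",
]

# Assumption: pass a list rather than the frozenset, since newer
# scikit-learn versions may require stop_words to be a list.
vectorizer = CountVectorizer(stop_words=list(stop_words))
vectorizer.fit(docs)

# Sorted vocabulary that survived stop-word filtering.
vocabulary = sorted(vectorizer.vocabulary_)
print(vocabulary)
```

Both the built-in words (such as "the") and the custom ones (such as "python") should be missing from the printed vocabulary, while ordinary words like "library" remain.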
python scikit-learn stop-words