python - Adding words to scikit-learn's CountVectorizer's stop list
scikit-learn's CountVectorizer class lets you pass the string 'english' as the stop_words argument. I want to add some words of my own to this predefined list. Can anyone tell me how to do this?
According to the source code for sklearn.feature_extraction.text, the full list of English stop words, ENGLISH_STOP_WORDS (actually a frozenset, imported from the stop_words module), is exposed through __all__. Hence, if you want to use that list plus some more items, you could do something like:

    from sklearn.feature_extraction import text
    stop_words = text.ENGLISH_STOP_WORDS.union(my_additional_stop_words)

(where my_additional_stop_words is any sequence of strings) and use the result as the stop_words argument. This input to CountVectorizer.__init__ is parsed by _check_stop_list, which will pass the new frozenset straight through.
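For completeness, here is a minimal end-to-end sketch of the approach above. The extra words ("foo", "bar") and the two sample documents are made up purely for illustration. One assumption to flag: as far as I can tell, recent scikit-learn releases validate that stop_words is a list, so the sketch converts the frozenset to a list before passing it in, which should work on older versions as well.

```python
from sklearn.feature_extraction.text import CountVectorizer, ENGLISH_STOP_WORDS

# Hypothetical extra stop words, chosen only for illustration.
my_additional_stop_words = ["foo", "bar"]

# union() returns a new frozenset; the built-in list is left unchanged.
# Converting to a list is an assumption made for compatibility with
# scikit-learn versions that only accept 'english', a list, or None.
stop_words = list(ENGLISH_STOP_WORDS.union(my_additional_stop_words))

vectorizer = CountVectorizer(stop_words=stop_words)

# Made-up sample documents.
docs = ["the foo jumped over the bar", "a quick brown fox"]
X = vectorizer.fit_transform(docs)

# Terms that survived stop-word filtering.
vocabulary = sorted(vectorizer.vocabulary_)
print(vocabulary)
```

Neither the custom words nor the built-in English stop words (such as "the") should show up in the printed vocabulary, while ordinary words like "jumped" are kept.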
python scikit-learn stop-words