Python 2.7 - Naive Bayes classifier loaded from a saved pickle gives different results from one trained and tested immediately




I encountered the same problem shown here, but the solution doesn't seem to work for me. I'm not sure if anyone can help me with it. Thanks.

from sentimentanalyzer import tweettokenizer
from sentimentanalyzer import dataset
import json
import re
import collections
import nltk.metrics
import nltk.classify
import pickle

tweetstokenizer = tweettokenizer()
featurelist = []
tweets = []
dataset = dataset()
train_data = dataset.gettraindata()
test_data = dataset.gettestdata()

def extract_features(tweet):
    # one boolean feature per word in the global featurelist
    tweet_words = set(tweet)
    features = {}
    for word in featurelist:
        features['contains(%s)' % word] = (word in tweet_words)
    return features

trainsets = collections.defaultdict(set)
testsets = collections.defaultdict(set)
nbclassifier = None
train = True

if train:
    # ... preprocessing codes above ...

    # generate training set
    print 'extracting features...'
    training_set = nltk.classify.util.apply_features(extract_features, tweets)

    # train naive bayes classifier
    print 'training dataset...'
    nbclassifier = nltk.NaiveBayesClassifier.train(training_set)

    print 'saving model...'
    f = open('naivebayesclassifier.pickle', 'wb')
    pickle.dump(nbclassifier, f)
    f.close()
else:
    f = open('naivebayesclassifier.pickle', 'rb')
    nbclassifier = pickle.load(f)
    f.close()

# test classifier
print 'testing model...'
for i, line in enumerate(test_data):
    tweetjson = json.loads(line)
    labelledsentiment = dataset.gettestsentiment(tweetjson['id_str']).encode('utf-8')
    trainsets[labelledsentiment].add(i)
    testtweet = tweetjson['text'].encode('utf-8')
    processedtesttweet = tweetstokenizer.preprocess(testtweet)
    sentiment = nbclassifier.classify(extract_features(tweetstokenizer.getfeaturevector(processedtesttweet)))
    testsets[sentiment].add(i)
    print "testtweet = %s, classified sentiment = %s, labelled sentiment = %s\n" % (testtweet, sentiment, labelledsentiment)
    # print "testtweet = %s, classified sentiment = %s, labelled sentiment = %s\n" % (testtweet, sentiment, labelledsentiment)

print 'positive precision:', nltk.metrics.precision(trainsets['positive'], testsets['positive'])
print 'positive recall:', nltk.metrics.recall(trainsets['positive'], testsets['positive'])
print 'positive f-measure:', nltk.metrics.f_measure(trainsets['positive'], testsets['positive'])
print 'negative precision:', nltk.metrics.precision(trainsets['negative'], testsets['negative'])
print 'negative recall:', nltk.metrics.recall(trainsets['negative'], testsets['negative'])
print 'negative f-measure:', nltk.metrics.f_measure(trainsets['negative'], testsets['negative'])
print 'neutral precision:', nltk.metrics.precision(trainsets['neutral'], testsets['neutral'])
print 'neutral recall:', nltk.metrics.recall(trainsets['neutral'], testsets['neutral'])
print 'neutral f-measure:', nltk.metrics.f_measure(trainsets['neutral'], testsets['neutral'])
print 'done'

The classifier, when trained and tested immediately, gives different results compared to the classifier loaded straight from the pickle without training. I cannot figure out why. Thanks.
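One thing that may be worth checking, sketched below under an assumption the post does not confirm: if the elided preprocessing inside the `if train:` branch is what fills `featurelist`, then a run with `train = False` leaves `featurelist` empty, `extract_features` returns an empty dict for every tweet, and the loaded classifier sees no features at all. A possible remedy is to pickle the feature list together with the classifier and restore both. The helper names `save_model` and `load_model`, the sample word list, and the placeholder classifier object are hypothetical and only stand in for the real objects; the sketch uses the standard library only and runs on Python 2.7 as well as Python 3.

```python
import os
import pickle
import tempfile


def extract_features(tweet, feature_list):
    # Same 'contains(word)' features as in the question, but the feature
    # list is passed in explicitly instead of being read from a global.
    tweet_words = set(tweet)
    return dict(('contains(%s)' % w, w in tweet_words) for w in feature_list)


def save_model(path, classifier, feature_list):
    # Hypothetical helper: store the classifier and its vocabulary together.
    with open(path, 'wb') as f:
        pickle.dump({'classifier': classifier, 'featurelist': feature_list}, f)


def load_model(path):
    # Hypothetical helper: restore both objects written by save_model.
    with open(path, 'rb') as f:
        bundle = pickle.load(f)
    return bundle['classifier'], bundle['featurelist']


# Stand-ins for illustration only: in the real script these would be the
# trained nltk classifier and the feature list built during preprocessing.
trained_classifier = {'placeholder': 'trained model'}
feature_list = ['good', 'bad', 'happy']

model_path = os.path.join(tempfile.mkdtemp(), 'naivebayesclassifier.pickle')
save_model(model_path, trained_classifier, feature_list)
loaded_classifier, loaded_features = load_model(model_path)

tokens = ['so', 'happy', 'today']
# With an empty feature list (the suspected load-only situation) nothing is extracted.
features_without_list = extract_features(tokens, [])
# With the restored list, the features match what the training run would produce.
features_with_list = extract_features(tokens, loaded_features)
```

If the restored feature list turns out to be identical in both runs, the cause lies elsewhere, for example in preprocessing that differs between the two code paths.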

python-2.7 nltk pickle
