python - Group data by month and year -
I have a .json file containing a lot of articles, each article formatted like this:
    {
        "source": "....",
        "title": ".......",
        "original_time": "ora: 20:03, 06 dec 2006",
        "datetime": "2006-12-06T20:03:00+00:00",
        "views": 398,
        "comments": 1,
        "content": "...",
        "id": "13"
    }
Now I have to sum the number of views of the articles for each month and year and plot the results, but I don't know how because I'm new to Python. This is what I have done:
    import json
    #from pprint import pprint
    import csv
    import time
    import datetime

    views = []
    time = []  # note: this name shadows the imported time module
    art_timpul = 0
    art_unimedia = 0
    total_articles = 0

    json_data = open('all.json')
    data = json.load(json_data)
    #pprint(data)
    json_data.close()

    for i in data:
        if i["source"] == 'unimedia':
            art_unimedia += 1
            x = i["views"]
            views.append(int(x))
            y = i["original_time"]
            time.append(y)
        if i["source"] == 'timpul':
            art_timpul += 1
        total_articles += 1

    myfile = open('output.csv', 'wb')
    wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
    wr.writerow(views)

    print time
    #print views
    print "articles unimedia", art_unimedia
    print "articles timpul", art_timpul
    print "total articles", total_articles
Edit: I have to group the info by month and year, that is, sum the number of views of the articles written in each month and year, and export them to a file.
It is not entirely clear what your question is, so I'll assume you do not have a problem with reading and writing files, but with parsing the date string and grouping the data.
First, parsing the date. Here you can use e.g. dateutil.parser.parse or time.strptime. dateutil.parser seems to expect a date format like yours by default, so we'll use that instead of configuring the format for strptime.
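For comparison, here is a minimal sketch of the strptime route using only the standard library. It assumes Python 3.7 or later and uses datetime.datetime.strptime rather than time.strptime, because the former returns a datetime object directly. The format string is my reading of the "datetime" field in the question, written with an uppercase T, and not something the question spells out.

```python
from datetime import datetime

# Timestamp in the shape of the question's "datetime" field (ISO 8601).
raw = "2006-12-06T20:03:00+00:00"

# %z accepts a UTC offset written with a colon on Python 3.7+.
parsed = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S%z")

# The year and month attributes are what the grouping step needs.
print(parsed.year, parsed.month)
```

If the timestamps in the file deviate from this layout, strptime raises a ValueError, which is the extra configuration burden that dateutil.parser avoids.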
Next, the grouping: the easiest way would be to use a number of dictionaries mapping months or years to views. You could also use a dictionary for the different sources, instead of the two variables you have now. Use the month or year as the key to the dictionary and update the value accordingly. To make life a bit easier, you can use collections.defaultdict, so you don't have to check whether the key exists.
Example for grouping by month (similar for year, source etc. in the same loop):
    import collections
    import dateutil.parser

    views_by_month = collections.defaultdict(int)
    for item in data:
        views = item["views"]
        date = dateutil.parser.parse(item["datetime"])
        views_by_month[date.month] += views
    print views_by_month
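Note that keying on the month alone adds December 2006 and December 2007 together. Since the question asks for totals per month and year plus an export to a file, here is a rough sketch of that variant. It is written for Python 3.7+ and uses datetime.fromisoformat from the standard library in place of dateutil. The function names views_by_year_month and write_totals are hypothetical, and apart from the first entry the sample records are made up for illustration.

```python
import collections
import csv
import io
from datetime import datetime


def views_by_year_month(articles):
    """Sum the views of all articles per (year, month) pair."""
    totals = collections.defaultdict(int)
    for item in articles:
        date = datetime.fromisoformat(item["datetime"])
        # A (year, month) tuple keeps the same month of different years apart.
        totals[(date.year, date.month)] += int(item["views"])
    return totals


def write_totals(totals, handle):
    """Write one CSV row per (year, month), in chronological order."""
    writer = csv.writer(handle)
    writer.writerow(["year", "month", "views"])
    for (year, month), views in sorted(totals.items()):
        writer.writerow([year, month, views])


# Illustrative records; only the first one mirrors the question.
sample = [
    {"datetime": "2006-12-06T20:03:00+00:00", "views": 398},
    {"datetime": "2006-12-21T09:15:00+00:00", "views": 2},
    {"datetime": "2007-12-01T08:00:00+00:00", "views": 10},
]

totals = views_by_year_month(sample)

# An in-memory buffer stands in for a real file here.
buffer = io.StringIO()
write_totals(totals, buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with open("output.csv", "w", newline="") to write_totals. The sorted (year, month) keys can also serve as the x-axis if you plot the totals afterwards.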