ruby - Why does this JSON file get filled with 1747 times the last Hash data?
I'm using the following code to generate a JSON file containing all category information for a particular website.

    require 'mechanize'
    require 'json'

    @hashes = []

    @categories_hash = {}
    @categories_hash['category'] ||= {}
    @categories_hash['category']['id'] ||= {}
    @categories_hash['category']['name'] ||= {}
    @categories_hash['category']['group'] ||= {}

    # Initialize Mechanize object
    a = Mechanize.new

    # Begin scraping
    a.get('http://www.marktplaats.nl/') do |page|
      groups = page.search('//*[(@id = "navigation-categories")]//a')

      groups.each_with_index do |group, index_1|
        a.get(group[:href]) do |page_2|
          categories = page_2.search('//*[(@id = "category-browser")]//a')

          categories.each_with_index do |category, index_2|
            @categories_hash['category']['id'] = "#{index_1}_#{index_2}"
            @categories_hash['category']['name'] = category.text
            @categories_hash['category']['group'] = group.text

            @hashes << @categories_hash['category']

            # Uncomment if you want to see what's being written
            puts @categories_hash['category'].to_json
          end
        end
      end
    end

    File.open("json/magic/#{Time.now.strftime '%Y%m%d%H%M%S'}_magic_categories.json", 'w') do |f|
      puts '# Writing category data to JSON file'
      f.write(@hashes.to_json)
      puts "|-----------> Done. #{@hashes.length} written."
    end

    puts '# Finished.'

But this code returns a JSON file filled only with the last category's data. Here is a sample of the resulting file:

    [
      {
        "id":"36_17",
        "name":"overige diversen",
        "group":"diversen"
      },
      {
        "id":"36_17",
        "name":"overige diversen",
        "group":"diversen"
      },
      {
        "id":"36_17",
        "name":"overige diversen",
        "group":"diversen"
      },
      {...}
    ]

The question is: what's causing this, and how can I solve it?
The same object, the result of @categories_hash['category'], is being updated on each loop.
Thus the array is filled with the same object 1747 times, and that object reflects the mutations done on the last loop when it is viewed later.
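The effect can be reproduced without any scraping. The following is a minimal sketch, not the original script: the method name `collect_with_shared_hash` and the sample names are made up for illustration, standing in for the scraped categories.

```ruby
# Hypothetical stand-in data; the real script gets these from scraped pages.
SAMPLE_NAMES = %w[first second third].freeze

# Reproduces the bug: a single Hash is mutated and appended on every pass.
def collect_with_shared_hash(names)
  shared = {}
  result = []
  names.each_with_index do |name, index|
    shared['id'] = index.to_s
    shared['name'] = name
    result << shared # appends a reference to the same Hash, not a copy
  end
  result
end

buggy = collect_with_shared_hash(SAMPLE_NAMES)

# Every element shows the values written on the last pass.
puts buggy.map { |h| h['name'] }.inspect # prints ["third", "third", "third"]

# All three slots hold the very same object.
puts buggy.all? { |h| h.equal?(buggy.first) } # prints true
```

`equal?` compares object identity rather than content, which is why it shows that the array holds one Hash three times instead of three Hashes that happen to look alike.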
While one fix might be to use @categories_hash[category_name] or similar (i.e. fetch or ensure a different object on each loop), the following avoids both the problem described and the unused/misused hash of 'category' keys.

    categories.each_with_index do |category, index_2|
      # Creates a new Hash object
      item = {
        id: "#{index_1}_#{index_2}",
        name: category.text,
        group: group.text
      }
      # Adds the new (per yield) object
      @hashes << item
    end

Alternatively, a more "functional" approach might be to use map, which solves the problem in the same way: by creating new [Hash] objects. (This could be expanded to also include the outer loop, but it's here for a taste.)

    h = categories.each_with_index.map do |category, index_2|
      {
        id: "#{index_1}_#{index_2}",
        name: category.text,
        group: group.text
      }
    end
    @hashes.concat(h)
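As a rough sketch of what expanding the map approach to the outer loop could look like, the example below uses `flat_map` over both levels. It assumes the scraped data has already been reduced to a plain Hash of group name to category names; `SAMPLE_GROUPS` and `build_category_hashes` are hypothetical names, not part of the original script or of Mechanize.

```ruby
require 'json'

# Hypothetical stand-in for the scraped pages: group name => category names.
SAMPLE_GROUPS = {
  'group a' => ['cat 1', 'cat 2'],
  'group b' => ['cat 3']
}.freeze

# Builds one fresh Hash per category across all groups.
def build_category_hashes(groups)
  groups.each_with_index.flat_map do |(group_name, category_names), index_1|
    category_names.each_with_index.map do |category_name, index_2|
      # A new Hash literal is evaluated on every pass, so no object is shared.
      { id: "#{index_1}_#{index_2}", name: category_name, group: group_name }
    end
  end
end

hashes = build_category_hashes(SAMPLE_GROUPS)
puts hashes.to_json
```

Because `flat_map` concatenates the per-group arrays, there is no need for an outer accumulator such as @hashes or a call to concat.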