ruby - Why does this JSON file get filled with 1747 times the last Hash data? -




I'm using the following code to generate a JSON file containing category information for a particular website.

```ruby
require 'mechanize'
require 'json'

@hashes = []
@categories_hash = {}
@categories_hash['category'] ||= {}
@categories_hash['category']['id'] ||= {}
@categories_hash['category']['name'] ||= {}
@categories_hash['category']['group'] ||= {}

# Initialize Mechanize object
a = Mechanize.new

# Begin scraping
a.get('http://www.marktplaats.nl/') do |page|
  groups = page.search('//*[(@id = "navigation-categories")]//a')

  groups.each_with_index do |group, index_1|
    a.get(group[:href]) do |page_2|
      categories = page_2.search('//*[(@id = "category-browser")]//a')

      categories.each_with_index do |category, index_2|
        @categories_hash['category']['id'] = "#{index_1}_#{index_2}"
        @categories_hash['category']['name'] = category.text
        @categories_hash['category']['group'] = group.text

        @hashes << @categories_hash['category']

        # Uncomment if you want to see what's being written
        # puts @categories_hash['category'].to_json
      end
    end
  end
end

File.open("json/magic/#{Time.now.strftime '%Y%m%d%H%M%S'}_magic_categories.json", 'w') do |f|
  puts '# Writing category info to JSON file'
  f.write(@hashes.to_json)
  puts "|-----------> Done. #{@hashes.length} written."
end

puts '# Finished.'
```

But the code returns a JSON file filled with only the last category's data. The full JSON file can be found here. Sample:

```json
[
  { "id": "36_17", "name": "overige diversen", "group": "diversen" },
  { "id": "36_17", "name": "overige diversen", "group": "diversen" },
  { "id": "36_17", "name": "overige diversen", "group": "diversen" },
  {...}
]
```

The question is: what's causing this, and how can I solve it?

The same object, the result of `@categories_hash['category']`, is being updated in every loop iteration and pushed onto `@hashes` each time.

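`Array#<<` stores a reference to the object, not a copy. Thus the array is filled with the same object 1747 times, and when it is viewed later that object reflects the mutations done in the last loop iteration.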
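A minimal sketch that reproduces the effect without Mechanize or network access. It assumes plain strings stand in for the scraped `group.text` and `category.text` values; the `groups`, `categories`, and `shared` names and their contents are hypothetical:

```ruby
require 'json'

# Hypothetical stand-ins for the scraped link texts (no Mechanize involved).
groups = ['auto', 'diversen']
categories = ['fietsen', 'overige diversen']

hashes = []
shared = { 'category' => {} }

groups.each_with_index do |group, index_1|
  categories.each_with_index do |category, index_2|
    # Mutates the *same* inner Hash on every iteration...
    shared['category']['id'] = "#{index_1}_#{index_2}"
    shared['category']['name'] = category
    shared['category']['group'] = group
    # ...and pushes a reference to it, not a copy.
    hashes << shared['category']
  end
end

puts hashes.length                      # prints 4 (entries)
puts hashes.map(&:object_id).uniq.size  # prints 1 (distinct object)
puts hashes.last.to_json
# prints {"id":"1_1","name":"overige diversen","group":"diversen"}
```

All four array slots point at one Hash, so serializing the array repeats the last iteration's values, exactly as in the 1747-entry file.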

While a fix might use `@categories_hash[category_name]` or similar (i.e. fetch/ensure a different object in each loop iteration), the following avoids both the problem described and the unused/misused hash of `'category'` keys.

```ruby
categories.each_with_index do |category, index_2|
  # Creates a new Hash object on every iteration
  item = {
    id: "#{index_1}_#{index_2}",
    name: category.text,
    group: group.text
  }
  # Adds the new (per-yield) object
  @hashes << item
end
```
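To check the fix offline, here is a sketch under the assumption that a hypothetical `site` Hash (group name => category names on that group's page) replaces the two Mechanize page fetches, so `group.text` and `category.text` become plain strings:

```ruby
require 'json'

# Hypothetical stand-in for the scraped pages: group name => its category names.
site = {
  'auto' => ['fietsen', 'onderdelen'],
  'diversen' => ['overige diversen']
}

hashes = []

site.each_with_index do |(group, categories), index_1|
  categories.each_with_index do |category, index_2|
    # A new Hash literal is built on every iteration, so nothing is shared.
    item = {
      id: "#{index_1}_#{index_2}",
      name: category,
      group: group
    }
    hashes << item
  end
end

puts hashes.map(&:object_id).uniq.size  # prints 3: one object per entry
puts JSON.pretty_generate(hashes)
```

Each entry now keeps its own `id`, `name`, and `group`, because no Hash is reused between iterations.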

Alternatively, a more "functional" approach might use `map`, which solves the problem in the same way: by creating new Hash objects. (This could be expanded to include the outer loop as well; which form to use is a matter of taste.)

```ruby
h = categories.each_with_index.map do |category, index_2|
  { id: "#{index_1}_#{index_2}", name: category.text, group: group.text }
end
@hashes.concat(h)
```
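One way to expand the `map` version to the outer loop is `flat_map`. This sketch again assumes a hypothetical `site` Hash of plain strings in place of the Mechanize pages; in the real scraper the inner `categories` would come from fetching each group's page:

```ruby
require 'json'

# Hypothetical stand-in data: group name => category names on that group's page.
site = {
  'auto' => ['fietsen', 'onderdelen'],
  'diversen' => ['overige diversen']
}

# flat_map over the outer loop, map over the inner one: every entry is a new
# Hash, and no accumulator array is mutated from inside the blocks.
hashes = site.each_with_index.flat_map do |(group, categories), index_1|
  categories.each_with_index.map do |category, index_2|
    { id: "#{index_1}_#{index_2}", name: category, group: group }
  end
end

puts hashes.to_json
# prints [{"id":"0_0","name":"fietsen","group":"auto"},{"id":"0_1","name":"onderdelen","group":"auto"},{"id":"1_0","name":"overige diversen","group":"diversen"}]
```

Because `flat_map` returns the finished array, `@hashes` no longer needs to be an instance variable that nested blocks push into.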

