forms - In Mechanize (Ruby), how to login then scrape? -
This question already has an answer here:
How to fill out login form with mechanize in ruby? (1 answer)

My aim: in RoR 3, get a PDF file from a site that requires login before I can download it.
My method, using Mechanize:
Step 1: log in.
Step 2: since I'm logged in, get the PDF link.
The thing is, when I debug and click on the link I scraped, I'm redirected to the login page instead of getting the file.
There are 2 checks I did on step 1:

    (...)
    search_results = form.submit
    puts search_results.body
    => {"succes":true,"url":"/sso/inscription/"}

So apparently the login succeeded.

    puts agent.cookie_jar.jar

=> I find the session info in there, so I guess the cookies are saved.
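The first check above can be made explicit instead of eyeballing the printed body. The following is only a sketch using the Ruby standard library: it assumes the login endpoint keeps answering with JSON in the shape shown above (including the site's own `succes` spelling), and `parse_login_response` is a hypothetical helper name, not part of Mechanize.

```ruby
require 'json'
require 'uri'

# Body copied from the `puts search_results.body` output in the question.
LOGIN_BODY = '{"succes":true,"url":"/sso/inscription/"}'
LOGIN_URL  = "http://elwatan.com/sso/inscription/inscription_payant.php"

# Hypothetical helper: parse the JSON login response and resolve the
# (possibly relative) URL it contains against the login page URL.
def parse_login_response(body, base_url)
  data = JSON.parse(body)
  target = data["url"]
  {
    ok: data["succes"] == true,                       # key spelled as the site spells it
    redirect: target && URI.join(base_url, target).to_s
  }
end

result = parse_login_response(LOGIN_BODY, LOGIN_URL)
puts "logged in: #{result[:ok]}, next page: #{result[:redirect]}"
```

With Mechanize, `search_results.body` would be passed in place of `LOGIN_BODY`. Note that a JSON response like this contains no links to scrape, so the page holding the PDF links has to be fetched separately with the same agent.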
Any hint about what I did wrong? (This could be important: on the site, when you log in from "http://elwatan.com/sso/inscription/inscription_payant.php", you are redirected to the home page, elwatan.com.)
My code is below:

    # step 1, login:
    agent = Mechanize.new
    page = agent.get("http://elwatan.com/sso/inscription/inscription_payant.php")
    form = page.form_with(:id => 'form-login-page')
    form.login = "my_mail"
    form.password = "my_password"
    search_results = form.submit

    # step 2, get the pdf:
    @watan = {}
    page.parser.xpath('//th/a').each do |link|
      puts @watan[link.text.strip] = link['href']
    end
The agent variable retains the session and cookies.
So first log in, as you did, and then write agent.get(---your-pdf-link-here--).
In your example code there is a little error: the result of the submit is in search_results, yet you go on to use page to search for the links?
So in your case, I guess it should look like this (untested, of course):

    # step 1, login:
    agent = Mechanize.new
    # save PDF responses to disk instead of parsing them
    agent.pluggable_parser.pdf = Mechanize::FileSaver
    page = agent.get("http://elwatan.com/sso/inscription/inscription_payant.php")
    form = page.form_with(:id => 'form-login-page')
    form.login = "my_mail"
    form.password = "my_password"
    page = form.submit

    # step 2, get the pdf:
    page.parser.xpath('//th/a').each do |link|
      agent.get link['href']
    end
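Before handing every scraped href to agent.get, it may help to normalize the links and keep only the ones that look like PDFs. This is a rough sketch with the standard library only: `pdf_urls` is a hypothetical helper, the sample hrefs are made up for illustration, and it assumes the PDF links end in `.pdf`, which the question does not confirm.

```ruby
require 'uri'

# Hypothetical helper: turn scraped href values into absolute URLs and
# keep only those whose path ends in ".pdf" (an assumption about the site).
def pdf_urls(base_url, hrefs)
  hrefs.compact                                            # drop anchors without an href
       .map { |href| URI.join(base_url, href.strip).to_s } # resolve relative links
       .select { |url| URI(url).path.downcase.end_with?(".pdf") }
       .uniq
end

# Made-up hrefs, standing in for the values of link['href'] in the loop above.
SAMPLE_HREFS = [
  "/pdf/edition-1.pdf",
  "archives/index.php",
  nil,
  "http://elwatan.com/pdf/edition-2.pdf"
]

puts pdf_urls("http://elwatan.com/", SAMPLE_HREFS)
```

Each resulting URL could then be passed to agent.get on the same agent that performed the login, so the session cookies are sent along with the request.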
ruby forms screen-scraping mechanize