Crawler in Groovy (JSoup vs. Crawler4j)
I want to develop a web crawler in Groovy (using the Grails framework and a MongoDB database) that can crawl a website and build a list of the site's URLs along with their resource types, content, response times, and the number of redirects involved.
I am debating between jsoup and crawler4j. I have read about both, but I cannot understand the difference between the two. Can anyone suggest which one is better for the functionality above? Or is it totally wrong to compare the two?
Thanks.
crawler4j is a crawler; jsoup is a parser. You could, and probably should, use both. crawler4j gives you an easy multithreaded interface for getting the URLs and pages (content) of the site you want. After that, you can use jsoup to parse the data with its amazing (jQuery-like) CSS selectors and actually do something with it. Of course, you have to consider dynamic (JavaScript-generated) content. If you want that content too, you have to use something else that includes a JavaScript engine (headless browser + parser), such as HtmlUnit or WebDriver (Selenium), which will execute the JavaScript before parsing the content.
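To make the crawler/parser split concrete, here is a minimal sketch in plain Java (callable from Groovy) that uses neither library. Everything in it is a stand-in I made up for illustration: `crawl` plays the crawler's role (a URL frontier plus a visited set, which crawler4j would handle for you, with threads), and `extractLinks` plays the parser's role using a naive regex where you would really use jsoup selectors. The `fetch` function and the in-memory `site` map are assumptions that replace real HTTP requests.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CrawlSketch {
    // Parser role (jsoup's job in practice). A regex is NOT a real HTML
    // parser; it only stands in for something like a CSS selector on links.
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]+)\"");

    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    // Crawler role (crawler4j's job in practice): keep a frontier of URLs
    // still to visit and a set of URLs already seen, so no page is fetched twice.
    static Set<String> crawl(String seed, Function<String, String> fetch) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(seed);
        while (!frontier.isEmpty()) {
            String url = frontier.poll();
            if (!visited.add(url)) {
                continue; // already crawled
            }
            String html = fetch.apply(url); // hypothetical fetcher; null means "no page"
            if (html == null) {
                continue;
            }
            for (String link : extractLinks(html)) {
                if (!visited.contains(link)) {
                    frontier.add(link);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Made-up four-page site held in memory instead of real HTTP responses.
        Map<String, String> site = Map.of(
            "/", "<a href=\"/a\">A</a> <a href=\"/b\">B</a>",
            "/a", "<a href=\"/\">home</a> <a href=\"/c\">C</a>",
            "/b", "<p>no links</p>",
            "/c", "<a href=\"/a\">back</a>");
        System.out.println(crawl("/", site::get)); // prints [/, /a, /b, /c]
    }
}
```

In a real project you would swap `fetch` and the loop for crawler4j, and `extractLinks` for jsoup. Response times and redirect counts belong on the crawler side, since they are properties of the HTTP fetch rather than of the parsed HTML.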
jsoup web-crawler crawler4j