qt - How to load multiple pages one by one in QWebPage




I am trying to crawl news article pages for their comments. After some research I found that most websites use an iframe for them, so I want to get the "src" of that iframe. I am using QtWebKit in Python via PySide. It works once, but it does not load the other web pages. I am using the following code:

    import sys
    import pymysql
    from PySide.QtGui import *
    from PySide.QtCore import *
    from PySide.QtWebKit import *
    from pprint import pprint
    from bs4 import BeautifulSoup


    class Render(QWebPage):
        def __init__(self, url):
            # Reuse the existing application object if one was already created.
            try:
                self.app = QApplication(sys.argv)
            except RuntimeError:
                self.app = QCoreApplication.instance()
            QWebPage.__init__(self)
            self.loadFinished.connect(self._loadFinished)
            self.mainFrame().load(QUrl(url))
            self.app.exec_()

        def _loadFinished(self, result):
            self.frame = self.mainFrame()
            self.app.quit()


    def visit(url):
        r = Render(url)
        p = r.frame.toHtml()
        f_url = str(r.frame.url().toString())
        return p


    def is_comment_url(url):
        lower_url = url.lower()
        n = lower_url.find("comment")
        if n > 0:
            return True
        else:
            return False


    with open("urls.txt") as f:
        content = f.read().splitlines()

    list_of_urls = []
    for url in content:
        page = visit(url)
        soup = BeautifulSoup(page)
        for tag in soup.findAll('iframe', src=True):
            link = tag['src']
            if is_comment_url(link):
                print(link)
                # append the whole URL; += would add it character by character
                list_of_urls.append(link)
    pprint(list_of_urls)

But the issue is that it works for only a single iteration and then gets stuck.

Also, is there a way to save the web page as displayed by the browser (after executing JavaScript, etc.)?
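For reference, the iframe-filtering step described above can be tried on its own, without Qt. The following is only a minimal sketch, assuming the rendered HTML is already available as a string. It uses the standard library's html.parser in place of BeautifulSoup, and the names IframeSrcCollector and comment_iframe_urls are hypothetical helpers, not part of the code above.

```python
from html.parser import HTMLParser


class IframeSrcCollector(HTMLParser):
    """Collect the src attribute of every <iframe> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase.
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def is_comment_url(url):
    # "in" also matches when "comment" is at position 0,
    # which a find(...) > 0 check would miss.
    return "comment" in url.lower()


def comment_iframe_urls(html):
    """Return the src of each iframe whose URL mentions "comment"."""
    collector = IframeSrcCollector()
    collector.feed(html)
    collector.close()
    return [src for src in collector.sources if is_comment_url(src)]


if __name__ == "__main__":
    # Made-up sample markup, only for illustration.
    sample = (
        '<iframe src="http://example.com/Comments?id=1"></iframe>'
        '<iframe src="http://example.com/ad"></iframe>'
    )
    print(comment_iframe_urls(sample))
```

This only covers picking out the iframe URLs; it does not address loading several pages in sequence, which still depends on how the Qt event loop is driven.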

qt pyside web-crawler qwebkit qwebpage
