python - Fetching Image from URL using BeautifulSoup




I am trying to fetch all of the important images (not the thumbnails or other GIFs) from a Wikipedia page using the following code. However, "imgs" comes back with a length of 0. Any suggestions on how to rectify it?

Code:

    import urllib
    import urllib2
    from bs4 import BeautifulSoup
    import os

    html = urllib2.urlopen("http://en.wikipedia.org/wiki/Main_Page")
    soup = BeautifulSoup(html)
    imgs = soup.findAll("div", {"class": "image"})

Also, if someone can explain in detail how to use findAll by looking at the "source element" in the webpage, that would be awesome.

The a tags on the page have the image class, not the div elements:

    >>> img_links = soup.findAll("a", {"class": "image"})
    >>> for img_link in img_links:
    ...     print img_link.img['src']
    ...
    //upload.wikimedia.org/wikipedia/commons/thumb/1/1f/stora_kronan.jpeg/100px-stora_kronan.jpeg
    //upload.wikimedia.org/wikipedia/commons/thumb/4/4b/christuss%c3%a4ule_8.jpg/77px-christuss%c3%a4ule_8.jpg
    ...
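To relate findAll to what you see in the page source: the first argument is the tag name, and the dictionary lists the attributes that tag must carry. The sketch below is only an illustration, written for Python 3 with the bs4 package installed. SAMPLE_HTML is made-up markup shaped like the Wikipedia thumbnails, not the real page, and find_all is the current bs4 spelling of findAll.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, shaped like what "view source" shows for a thumbnail.
SAMPLE_HTML = """
<div id="content">
  <a href="/wiki/File:one.jpeg" class="image"><img src="//example.org/thumb/one.jpeg"></a>
  <a href="/wiki/Other" class="plain"><img src="//example.org/icon.gif"></a>
  <a href="/wiki/File:two.jpg" class="image"><img src="//example.org/thumb/two.jpg"></a>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# No <div> in the markup has class="image", so this search is empty.
div_matches = soup.find_all("div", {"class": "image"})

# The class sits on the <a> elements, so search for those instead.
link_matches = soup.find_all("a", {"class": "image"})

# Each matched <a> wraps an <img>; link.img is the first <img> inside it.
sources = [link.img["src"] for link in link_matches]

print(len(div_matches))
print(sources)
```

In other words, find the element in the source, read off its tag name and its attributes, and pass exactly those to the search.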

Or, better, use the a.image > img CSS selector:

    >>> for img in soup.select('a.image > img'):
    ...     print img['src']
    ...
    //upload.wikimedia.org/wikipedia/commons/thumb/1/1f/stora_kronan.jpeg/100px-stora_kronan.jpeg
    //upload.wikimedia.org/wikipedia/commons/thumb/4/4b/christuss%c3%a4ule_8.jpg/77px-christuss%c3%a4ule_8.jpg
    ...
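The ">" in that selector means "direct child", which is stricter than a plain space (any descendant). Here is a small sketch of the difference, again for Python 3 with bs4 installed. The markup is invented for the example and is not taken from Wikipedia.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one direct-child image, one nested deeper, one unrelated.
SAMPLE_HTML = """
<a class="image" href="/wiki/File:a.jpg"><img src="//example.org/a.jpg"></a>
<a class="image" href="/wiki/File:b.jpg"><span><img src="//example.org/b.jpg"></span></a>
<a class="external" href="/x"><img src="//example.org/c.gif"></a>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# "a.image > img": only an <img> that is a direct child of <a class="image">.
direct = [img["src"] for img in soup.select("a.image > img")]

# "a.image img": any <img> nested anywhere inside <a class="image">.
nested = [img["src"] for img in soup.select("a.image img")]

print(direct)
print(nested)
```

If the images you want are wrapped in extra elements, the descendant form is probably the one to use.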

UPD (downloading the images using urllib.urlretrieve):

    from urllib import urlretrieve
    import urlparse
    from bs4 import BeautifulSoup
    import urllib2

    url = "http://en.wikipedia.org/wiki/Main_Page"
    soup = BeautifulSoup(urllib2.urlopen(url))
    for img in soup.select('a.image > img'):
        # src is protocol-relative ("//upload..."), so resolve it against the page URL
        img_url = urlparse.urljoin(url, img['src'])
        # use the last path segment as the local file name
        file_name = img['src'].split('/')[-1]
        urlretrieve(img_url, file_name)
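The urljoin step matters because the src values start with "//" and carry no scheme. A minimal sketch of just that step, without any network access, assuming Python 3 (where the urlparse module became urllib.parse and urlretrieve moved to urllib.request). resolve_image is a helper name made up for this example.

```python
from urllib.parse import urljoin


def resolve_image(page_url, src):
    """Return (absolute image URL, local file name) for an <img> src value."""
    # A "//host/path" reference inherits the scheme of the page URL.
    img_url = urljoin(page_url, src)
    # The last path segment serves as the file name, as in the loop above.
    file_name = src.split("/")[-1]
    return img_url, file_name


page_url = "http://en.wikipedia.org/wiki/Main_Page"
src = "//upload.wikimedia.org/wikipedia/commons/thumb/1/1f/stora_kronan.jpeg/100px-stora_kronan.jpeg"

img_url, file_name = resolve_image(page_url, src)
print(img_url)
print(file_name)
```

The resulting pair is what you would hand to urlretrieve(img_url, file_name) to save the file.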

