Web scraping - Python: website data to txt or xls
I am new to Python. I am trying to get information from a website; the information is in tables, and I want it in a txt / xls file.
I wrote a script that goes through the website. It works until it reaches an entry that has no data.
Website: bizearch.com
The script stops at this entry: www.bizearch.com/company/russell_metal_products_inc_125558.htm
I am using CentOS, Python, and BeautifulSoup.
My script:
    #!/usr/bin/env python
    from bs4 import BeautifulSoup
    import urllib

    # Table labels to collect from each company profile page
    getinfo = ['company name', 'contact person', 'company address', 'postal code', 'telephone number', 'mobile number', 'fax number', 'website', 'business type', 'business role']
    flushdata = {}

    # Header line for the pipe-delimited output
    print "company name|contact person|company address|postal code|telephone number|mobile number|fax number|website|business type|business role"

    # Walk through the listing pages
    for page in range(1, 900):
        pagedata = urllib.urlopen("http://www.bizearch.com/company/electrical_equipment~supplies.8-%d.htm" % (page))
        html = pagedata.read()
        parsed_html = BeautifulSoup(html)
        # Each listing entry links to a company profile page
        for row in parsed_html.body.findAll('div', attrs={'class': 'ls'}):
            profileurl = row.find('a').get('href')
            profileurlhtml = urllib.urlopen(profileurl)
            profileurlhtml = BeautifulSoup(profileurlhtml)
            finaldata = []
            # Read the label (th) and value (td) of every row in the details table
            for details in profileurlhtml.body.find('div', attrs={'id': 'yellowpage'}).findAll('tr'):
                if details.find('th').text in getinfo:
                    flushdata[details.find('th').text] = details.find('td').text
            flushdataprint = "%s|%s|%s|%s|%s|%s|%s|%s|%s|%s" % (flushdata['company name'], flushdata['contact person'], flushdata['company address'], flushdata['postal code'], flushdata['telephone number'], flushdata['mobile number'], flushdata['fax number'], flushdata['website'], flushdata['business type'], flushdata['business role'])
            print flushdataprint
I am new to this website, so I apologise if I have missed something.
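One possible way to keep a script like this running when an entry has missing fields is to build a fresh dictionary per company and read it with `dict.get` and a default, so an absent label becomes an empty column instead of a `KeyError`. The sketch below is only an illustration under assumptions: it is written for Python 3 (the script above uses Python 2 syntax), it works on plain `(label, value)` pairs rather than on the real pages, and `collect_fields` and `build_row` are hypothetical helper names, not part of BeautifulSoup.

```python
# Sketch only (Python 3). FIELDS mirrors the getinfo list from the question.
# collect_fields and build_row are hypothetical helpers for illustration.
FIELDS = ['company name', 'contact person', 'company address', 'postal code',
          'telephone number', 'mobile number', 'fax number', 'website',
          'business type', 'business role']


def collect_fields(pairs, wanted=FIELDS):
    """Build a new dict for one company from (label, value) pairs.

    Pairs with a missing label or value are skipped, so an incomplete
    table row does not raise an error.
    """
    data = {}
    for label, value in pairs:
        if label is None or value is None:
            continue  # incomplete row: nothing to store
        label = label.strip()
        if label in wanted:
            data[label] = value.strip()
    return data


def build_row(data, fields=FIELDS, sep="|", missing=""):
    """Join the fields in a fixed order, using `missing` for absent ones."""
    return sep.join(data.get(name, missing) for name in fields)


# Example with made-up values: only two labels are present, one value is None.
pairs = [("company name", "Example Co"),
         ("telephone number", " 555-0100 "),
         ("fax number", None)]
print(build_row(collect_fields(pairs)))
```

In the script above, the pairs could come from the `th` / `td` text of each `tr`. Note that BeautifulSoup's `find` returns `None` when nothing matches, so it may also be worth checking the result of the `yellowpage` lookup (and of each `th` / `td` lookup) before calling anything on it. Creating the dictionary inside the loop also avoids carrying values over from the previous company.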
python web-scraping beautifulsoup