python 3.x - XML Parsing with ElementTree and Requests
I'm trying to work with the Yahoo Weather API, but I'm having a few issues parsing the XML the API responds with. I'm using Python 3.4. Here's the code I'm working with:
    weather_url = 'http://weather.yahooapis.com/forecastrss?w=%s&u=%s'
    url = weather_url % (zip_code, units)
    try:
        rss = parse(requests.get(url, stream=True).raw).getroot()
        conditions = rss.find('channel/item/{%s}condition' % weather_ns)
        return {
            'current_condition': conditions.get('text'),
            'current_temp': conditions.get('temp'),
            'title': rss.findtext('channel/title')
        }
    except:
        raise
Here's the stack trace I'm getting:
    Traceback (most recent call last):
      File "<input>", line 1, in <module>
      File "/home/jonathan/pycharmprojects/pyweather/pyweather/pyweather.py", line 42, in yahoo_conditions
        rss = parse(requests.get(url, stream=True).raw).getroot()
      File "/usr/lib/python3.4/xml/etree/ElementTree.py", line 1187, in parse
        tree.parse(source, parser)
      File "/usr/lib/python3.4/xml/etree/ElementTree.py", line 598, in parse
        self._root = parser._parse_whole(source)
      File "<string>", line None
    xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 0
The xml.etree.ElementTree parse function doesn't seem to like the raw object returned by the requests library. Looking a little bit deeper, the raw object resolves to:
    >>> r = requests.get('http://weather.yahooapis.com/forecastrss?w=2502265', stream=True)
    >>> r.raw
    <requests.packages.urllib3.response.HTTPResponse object at 0x7f32c24f9e48>
I referenced this solution, but it still leads to the same issue. Why doesn't the approach above work? Is the urllib3 response object not supported by the ElementTree.parse function? I have read all of the docs, but they haven't enlightened me at all.
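One plausible explanation, offered as an assumption rather than a confirmed diagnosis: requests does not decode the content encoding on the raw stream by default, so if the server sent the feed gzip-compressed, ElementTree would be handed compressed bytes instead of XML text. The sketch below reproduces that situation offline with the standard library only. The sample XML and the helper names are made up for illustration.

```python
import gzip
import io
import xml.etree.ElementTree as ET

# Made-up sample document, not a real API response.
XML = b"<rss><channel><title>demo feed</title></channel></rss>"


def parse_stream(stream):
    """Parse a file-like object the same way the question's code does."""
    return ET.parse(stream).getroot()


def try_parse_compressed():
    """Feed gzip bytes straight to the parser and return the error text, if any."""
    compressed = io.BytesIO(gzip.compress(XML))
    try:
        parse_stream(compressed)
    except ET.ParseError as exc:
        return str(exc)
    return None


def parse_decompressed():
    """Decompress first, then parse; returns the channel title."""
    compressed = io.BytesIO(gzip.compress(XML))
    with gzip.GzipFile(fileobj=compressed) as plain:
        return parse_stream(plain).findtext("channel/title")


if __name__ == "__main__":
    print("compressed input:", try_parse_compressed())
    print("decompressed input:", parse_decompressed())
```

If the error printed for the compressed input matches the one in the traceback, an undecoded gzip body is a likely culprit.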
The doc list is here:
xml.etree.ElementTree.parse doc
requests.request doc
urllib3.response.HTTPResponse doc

Edit: After more experimentation, I still haven't found a solution to the problem outlined above. However, I have found a workaround. If I use ElementTree's fromstring method on the XML content, it works fine.
    import requests
    import xml.etree.ElementTree as ET

    def fetch_xml(url):
        """
        Fetch a URL and parse the document's XML.

        :param url: The URL the XML is located at.
        :return: The root element of the XML.
        :raises:
            :requests.exceptions.RequestException: Requests could not open the URL.
            :xml.etree.ElementTree.ParseError: xml.etree.ElementTree failed to parse the XML document.
        """
        return ET.fromstring(requests.get(url).content)
I guess the downside of this approach is that it uses more memory. What do you think? I'd like the community's opinion.
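On the memory question, here is a hedged sketch of parsing straight from the raw stream. It assumes the parse error comes from an undecoded content encoding, and sets decode_content on the urllib3 response so the bytes are decompressed as they are read. The helper names are hypothetical. Note that ElementTree still builds the whole tree in memory; this only avoids holding a separate full copy of the response body.

```python
import xml.etree.ElementTree as ET


def parse_raw_response(response):
    """Parse XML directly from a streamed requests response."""
    # Ask urllib3 to undo gzip/deflate while reading; the raw stream
    # is left undecoded by default.
    response.raw.decode_content = True
    return ET.parse(response.raw).getroot()


def fetch_xml_streaming(url):
    """Hypothetical helper: stream a URL and parse it without reading .content."""
    import requests  # third-party; imported here so parse_raw_response works alone

    with requests.get(url, stream=True) as response:
        response.raise_for_status()
        return parse_raw_response(response)
```

For a feed this small the difference is unlikely to matter, so the fromstring workaround is a reasonable choice too.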
Why are you using streaming requests to download the RSS XML data? Do you want to keep the connection open all the time? Weather hardly changes that quickly, so why not poll the service every 5 minutes instead?
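A minimal sketch of what polling every 5 minutes could look like. The poll function and its parameters are hypothetical names; fetch is any zero-argument callable that downloads and parses the feed, and sleep is injectable so the loop can be tried without actually waiting.

```python
import itertools
import time


def poll(fetch, interval_seconds=300, sleep=time.sleep):
    """Yield fetch() results forever, pausing between calls."""
    while True:
        yield fetch()
        # The pause runs only when the caller asks for the next result.
        sleep(interval_seconds)


if __name__ == "__main__":
    # Stand-in fetcher that counts calls instead of hitting the network.
    counter = itertools.count(1)
    for result in itertools.islice(poll(lambda: next(counter), sleep=lambda s: None), 3):
        print(result)
```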
Below is the complete code for polling and parsing, using BeautifulSoup and requests. Short and sweet.
    import requests
    from bs4 import BeautifulSoup

    r = requests.get('http://weather.yahooapis.com/forecastrss?w=%s&u=%s' % (2459115, "c"))
    if r.status_code == 200:
        soup = BeautifulSoup(r.text)
        print("current condition: ", soup.find("description").string)
        print("temperature: ", soup.find('yweather:condition')['temp'])
        print("title: ", soup.find("title").string)
    else:
        r.raise_for_status()
Output:
    current condition: Yahoo! Weather New York, NY
    temperature: 28
    title: Yahoo! Weather - New York, NY
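For comparison, the same kind of lookup can be done with ElementTree alone, using the namespace-qualified path from the question. This is only a sketch: the sample document is made up (it is not a real API response), extract_conditions is a hypothetical helper, and the namespace URI is an assumption that should be checked against the xmlns:yweather attribute of the actual feed.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the yweather prefix; verify against the real feed.
YWEATHER_NS = "http://xml.weather.yahoo.com/ns/rss/1.0"

# Made-up sample shaped like the feed discussed above.
SAMPLE = """<rss version="2.0" xmlns:yweather="http://xml.weather.yahoo.com/ns/rss/1.0">
  <channel>
    <title>Sample Weather - Springfield</title>
    <item>
      <yweather:condition text="Cloudy" temp="28"/>
    </item>
  </channel>
</rss>"""


def extract_conditions(xml_text):
    """Return the title, condition text and temperature from a feed string."""
    root = ET.fromstring(xml_text)
    condition = root.find("channel/item/{%s}condition" % YWEATHER_NS)
    if condition is None:
        raise ValueError("no yweather:condition element found")
    return {
        "current_condition": condition.get("text"),
        "current_temp": condition.get("temp"),
        "title": root.findtext("channel/title"),
    }


if __name__ == "__main__":
    print(extract_conditions(SAMPLE))
```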
There is a lot more you can do with BeautifulSoup. It has first-class documentation.
xml python-3.x python-requests elementtree