Parsing an HTML table with BeautifulSoup into a Python dictionary
This is the HTML code I'm trying to parse with BeautifulSoup:
<table>
  <tr>
    <th width="100">menu1</th>
    <td>
      <ul class="classno1" style="margin-bottom:10;">
        <li>some data1</li>
        <li>foo1<a href="/link/to/bar1">bar1</a></li>
        ... (amount of tags isn't fixed)
      </ul>
    </td>
  </tr>
  <tr>
    <th width="100">menu2</th>
    <td>
      <ul class="classno1" style="margin-bottom:10;">
        <li>some data2</li>
        <li>foo2<a href="/link/to/bar2">bar2</a></li>
        <li>foo3<a href="/link/to/bar3">bar3</a></li>
        <li>some data3</li>
        ... (amount of tags isn't fixed too)
      </ul>
    </td>
  </tr>
</table>
The output dictionary should look like this:
dict = { 'menu1': ['some data1','foo1 bar1'], 'menu2': ['some data2','foo2 bar2','foo3 bar3','some data3'], }
As mentioned in the code, the number of <li> tags is not fixed. Additionally, there can be:
- menu1 and menu2
- only menu1
- only menu2
- neither menu1 nor menu2 (just <table></table>)
So, for example, it can look like this:
<table>
  <tr>
    <th width="100">menu1</th>
    <td>
      <ul class="classno1" style="margin-bottom:10;">
        <li>some data1</li>
        <li>foo1<a href="/link/to/bar1">bar1</a></li>
        ... (amount of tags isn't fixed)
      </ul>
    </td>
  </tr>
</table>
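I tried to use an example, with no success. I think it's because of the <ul> tags that I can't read the proper information from the table. The problem for me is the variable number of menus and <li> tags. So my question is: how do I parse this particular table into a Python dictionary? I should mention that I have already parsed some simple information with the .text attribute of the BeautifulSoup handler, so it would be nice if I could keep that as it is.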
request = c.get('http://example.com/somepage.html')
soup = bs(request.text)
It is the first table on the page, so I can get it with:
table = soup.find_all('table')[0]
Thanks in advance for any help.
html = """<table>
  <tr>
    <th width="100">menu1</th>
    <td>
      <ul class="classno1" style="margin-bottom:10;">
        <li>some data1</li>
        <li>foo1<a href="/link/to/bar1">bar1</a></li>
      </ul>
    </td>
  </tr>
  <tr>
    <th width="100">menu2</th>
    <td>
      <ul class="classno1" style="margin-bottom:10;">
        <li>some data2</li>
        <li>foo2<a href="/link/to/bar2">bar2</a></li>
        <li>foo3<a href="/link/to/bar3">bar3</a></li>
        <li>some data3</li>
      </ul>
    </td>
  </tr>
</table>"""

# Python 2 with the BeautifulSoup 3 module
import BeautifulSoup as bs

soup = bs.BeautifulSoup(html)
table = soup.findAll('table')[0]
results = {}
th = table.findChildren('th')  # ,text=['menu1','menu2'])
for x in th:
    # print x
    results_li = []
    # the first nextSibling is the whitespace between </th> and <td>,
    # the second one is the <td> itself
    li = x.nextSibling.nextSibling.findChildren('li')
    for y in li:
        # print y.next
        results_li.append(y.next)  # first text node inside the <li>
    results[x.next] = results_li
print results
Output:
{ u'menu2': [u'some data2', u'foo2', u'foo3', u'some data3'], u'menu1': [u'some data1', u'foo1'] }
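Note that the output above keeps only the first text node of each <li>, so the link text is lost ('foo1' rather than 'foo1 bar1'). A minimal sketch of one way to get the dictionary asked for in the question, assuming Python 3 and the bs4 package rather than the older BeautifulSoup 3 module used above. The helper name table_to_dict is hypothetical, and the approach relies on each <th> being followed by a <td> in the same row, as in the sample HTML:

```python
from bs4 import BeautifulSoup


def table_to_dict(table):
    """Map each <th> label to the list of <li> texts in the <td> next to it."""
    results = {}
    if table is None:  # no <table> on the page at all
        return results
    for th in table.find_all("th"):
        # find_next_sibling skips the whitespace text nodes between tags
        td = th.find_next_sibling("td")
        if td is None:
            continue
        # get_text(" ", strip=True) joins 'foo1' and the link text 'bar1'
        # with a single space, giving 'foo1 bar1'
        items = [li.get_text(" ", strip=True) for li in td.find_all("li")]
        results[th.get_text(strip=True)] = items
    return results


html = """<table>
  <tr>
    <th width="100">menu1</th>
    <td>
      <ul class="classno1" style="margin-bottom:10;">
        <li>some data1</li>
        <li>foo1<a href="/link/to/bar1">bar1</a></li>
      </ul>
    </td>
  </tr>
  <tr>
    <th width="100">menu2</th>
    <td>
      <ul class="classno1" style="margin-bottom:10;">
        <li>some data2</li>
        <li>foo2<a href="/link/to/bar2">bar2</a></li>
        <li>foo3<a href="/link/to/bar3">bar3</a></li>
        <li>some data3</li>
      </ul>
    </td>
  </tr>
</table>"""

soup = BeautifulSoup(html, "html.parser")
print(table_to_dict(soup.find("table")))
```

Because the loop simply walks over whatever <th> tags exist, the same function should cover the cases from the question: both menus, only one of them, or an empty <table></table>, which gives an empty dictionary. Using soup.find("table") instead of soup.find_all("table")[0] avoids an IndexError when the page has no table.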