python - Callback function not working properly in Scrapy




Hi, I'm new to Scrapy and am trying to scrape an ASP.NET site. I have identified the parameters that are sent when the form gets posted and have used them in my code. Although the information on the first page gets scraped, nothing is scraped after that, even though the spider indicates that the other pages have been crawled successfully. I'm stuck trying to figure out why this is not working. 'clean_parsed_string' and 'get_parsed_string' are my own functions used to get string elements, and they have been tested on other websites.

    from scrapy.http import FormRequest
    from scrapy.selector import Selector

    # hotel_items, clean_parsed_string and get_parsed_string are the asker's own
    # item class and helpers; they are not shown here.

    def parse(self, response):
        sel = Selector(response)
        snodes = sel.xpath('//div[@id="hotel_result_hotel_item"]')
        # The loop variable is named to match the name used in the lookups below.
        for snode_restaurant in snodes:
            hotel_item = hotel_items()
            hotel_item['name'] = clean_parsed_string(get_parsed_string(snode_restaurant, 'div[@class=""]/table[@class="widthfull"]//a[@class="hot_name"]/text()'))
            hotel_item['address'] = clean_parsed_string(get_parsed_string(snode_restaurant, 'div[@class=""]/table[@class="widthfull"]//span[@class="fontsmalli"]/text()'))
            hotel_item['stars'] = clean_parsed_string(get_parsed_string(snode_restaurant, 'div[@class=""]/table[@class="widthfull"]//div[@class="mbluebold col_hotelinfo_name"]/input/@class'))
            hotel_item['room1'] = clean_parsed_string(get_parsed_string(snode_restaurant, 'div[@class=""]/div[@class="showroom_rates"]/table[@class="widthfull text_left"]/tr[1]/td[1]/p[@class="roomtype"]/span/text()'))
            hotel_item['room1_price_usd'] = clean_parsed_string(get_parsed_string(snode_restaurant, 'div[@class=""]/div[@class="showroom_rates"]/table[@class="widthfull text_left"]/tr[1]/td[5]/p[@class="ratepernight"]/span/text()'))
            hotel_item['room2'] = clean_parsed_string(get_parsed_string(snode_restaurant, 'div[@class=""]/div[@class="showroom_rates"]/table[@class="widthfull text_left"]/tr[2]/td[1]/p[@class="roomtype"]/span/text()'))
            hotel_item['room2_price_usd'] = clean_parsed_string(get_parsed_string(snode_restaurant, 'div[@class=""]/div[@class="showroom_rates"]/table[@class="widthfull text_left"]/tr[2]/td[5]/p[@class="ratepernight"]/span/text()'))
            hotel_item['room3'] = clean_parsed_string(get_parsed_string(snode_restaurant, 'div[@class=""]/div[@class="showroom_rates"]/table[@class="widthfull text_left"]/tr[3]/td[1]/p[@class="roomtype"]/span/text()'))
            hotel_item['room3_price_usd'] = clean_parsed_string(get_parsed_string(snode_restaurant, 'div[@class=""]/div[@class="showroom_rates"]/table[@class="widthfull text_left"]/tr[3]/td[5]/p[@class="ratepernight"]/span/text()'))
            hotel_item['room4'] = clean_parsed_string(get_parsed_string(snode_restaurant, 'div[@class=""]/div[@class="showroom_rates"]/table[@class="widthfull text_left"]/tr[4]/td[1]/p[@class="roomtype"]/span/text()'))
            hotel_item['room4_price_usd'] = clean_parsed_string(get_parsed_string(snode_restaurant, 'div[@class=""]/div[@class="showroom_rates"]/table[@class="widthfull text_left"]/tr[4]/td[5]/p[@class="ratepernight"]/span/text()'))
            yield hotel_item

        # Request the next page by re-posting the form with the "next" link as the event target.
        # Form field names are case-sensitive and must match the page exactly; the
        # site-specific ctl00$... names and values are reproduced as they appear in this post.
        viewstate = sel.xpath('//input[@name="__VIEWSTATE"]/@value').extract()[0]
        yield FormRequest.from_response(
            response,
            formdata={
                'ctl00$scriptmanager1': 'ctl00$contentmain$upresultfooter|ctl00$contentmain$lbtnfooternext',
                'ctl00_scriptmanager1_hiddenfield': '',
                '__EVENTTARGET': 'ctl00$contentmain$lbtnfooternext',
                '__EVENTARGUMENT': '',
                '__LASTFOCUS': '',
                '__VIEWSTATE': viewstate,
                '__SCROLLPOSITIONX': '0',
                '__SCROLLPOSITIONY': '0',
                'ctl00$googlesearch$txtsearch': '',
                'ctl00$ddlcurrency$hidcurrencychange': 'usd',
                'ctl00$contentmain$hdfminprice': '',
                'ctl00$contentmain$hdfmaxprice': '',
                'ctl00$contentmain$ddlsort': '1',
                'ctl00$contentmain$hidmenu': '0',
                'ctl00$contentmain$hidsubmenu': '',
                'ctl00$contentmain$destinationsearchbox1$arrivaldate': '06/23/2014',
                'ctl00$contentmain$destinationsearchbox1$departdate': '06/25/2014',
                'ctl00$contentmain$destinationsearchbox1$controlmode': '1',
                'ctl00$contentmain$destinationsearchbox1$jsrooms': '0',
                'ctl00$contentmain$destinationsearchbox1$jsadults': '0',
                'ctl00$contentmain$destinationsearchbox1$jschildren': '0',
                'ctl00$contentmain$destinationsearchbox1$searchhotel': 'no',
                'ctl00$contentmain$destinationsearchbox1$errorcharlengthmessage': 'please come in @ to the lowest degree first 2 letters of name looking for.',
                'ctl00$contentmain$destinationsearchbox1$texterror': 'please come in name of country, city, airport, area, landmark or hotel proceed.',
                'ctl00$contentmain$destinationsearchbox1$textsearch1$tmptextdefault': 'country, city, airport, area, landmark',
                'ctl00$contentmain$destinationsearchbox1$textsearch1$txtsearch': 'colombo',
                'ctl00$contentmain$destinationsearchbox1$ddldistance': '1',
                'ddlcheckinday': '23',
                'ddlcheckinmonthyear': '6,2014',
                'datepickerarrival': '',
                'ddlcheckoutday': '25',
                'ddlcheckoutmonthyear': '6,2014',
                'ctl00$contentmain$destinationsearchbox1$ddlnights': '2',
                'datepickerdepart': '',
                'ctl00$contentmain$destinationsearchbox1$ddlroom': '1',
                'ctl00$contentmain$destinationsearchbox1$ddladult': '2',
                'ctl00$contentmain$destinationsearchbox1$ddlchildren': '0',
                'ctl00$contentmain$txthotelname': '',
                'ctl00$contentmain$hidhotellist2603': '',
                'ctl00$contentmain$hotelfilterstarrating$hiddenfilterstatus': '',
                'ctl00$contentmain$hotelfilterfacilities$hiddenfilterstatus': '',
                'ctl00$contentmain$hotelfilteraccommodationtype$hiddenfilterstatus': '',
                'ctl00$contentmain$hotelfilterarea$hiddenfilterstatus': '',
                'ctl00$contentmain$hotelfilterchainandbrand$hiddenfilterstatus': '',
                # '__ASYNCPOST': 'true'
            },
            callback=self.parse,
            clickdata=None)

It's possible that the site returns a 200 OK status even though the POST data is wrong. Try using the Scrapy shell: submit the FormRequest with the formdata you built and see what the site returns.
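One way to tell whether a 200 OK response is really the next page, rather than the first page served again, is to fingerprint what each response yields and compare. A minimal sketch, assuming the hotel names extracted from a page are enough to identify it; `page_fingerprint` and `is_repeat_page` are hypothetical helper names, not part of Scrapy:

```python
import hashlib


def page_fingerprint(names):
    """Return a stable digest for the list of item names found on one page."""
    joined = "\n".join(name.strip() for name in names)
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


def is_repeat_page(seen, names):
    """Record this page's fingerprint; return True if it was already seen."""
    fingerprint = page_fingerprint(names)
    if fingerprint in seen:
        return True
    seen.add(fingerprint)
    return False
```

Calling `is_repeat_page` from the callback with the names scraped from each response, and logging when it returns True, would show whether the server is ignoring the posted form and sending the same results back.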

I suggest using something similar to this, to avoid having to type every field by hand and to avoid possible mistakes:

    formdata = {}
    for hid in sel.xpath('//input[@type="hidden" and @value and @name]'):
        formdata[hid.xpath('@name').extract()[0]] = hid.xpath('@value').extract()[0]
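Once the hidden fields are collected this way, only the postback target still has to be set by hand. A minimal sketch of that merge step, assuming the collected fields are in a plain dict; `build_postback_formdata` is a hypothetical helper name, and the event target value is the one from the question:

```python
def build_postback_formdata(hidden_fields, event_target, event_argument=""):
    """Copy the harvested hidden fields and set the ASP.NET postback target.

    The input dict is left unmodified so it can be reused for other requests.
    """
    formdata = dict(hidden_fields)
    formdata["__EVENTTARGET"] = event_target
    formdata["__EVENTARGUMENT"] = event_argument
    return formdata


# Example with made-up hidden field values:
example = build_postback_formdata(
    {"__VIEWSTATE": "abc", "__EVENTTARGET": ""},
    "ctl00$contentmain$lbtnfooternext",
)
```

The resulting dict can be passed as `formdata` to `FormRequest.from_response`. Whether the site also needs `dont_click=True`, so that Scrapy does not simulate a click on a submit button in addition to the postback target, is something to check against the actual page.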

python web-scraping scrapy
