As a techie, I can't even sleep well when there's a problem I can't solve!
http://storelocator.officedepot.com/index.html?form=getlist_search&clientkey=101
On this page, the text
Store # 101
LAFAYETTE, LA 70503
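Why can't I see it, and why can't I extract it? After a day of searching I found that the data is loaded asynchronously via Ajax, which makes scraping it a hard problem. Opening another thread to look for an expert!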
ajax
asynchronous data
scraping
------Solution--------------------
That information is retrieved through this endpoint:
http://storelocator.officedepot.com/ajax?&xml_request=%3Crequest%3E%3Cappkey%3EAC2AD3C2-C08F-11E1-8600-DCAD4D48D7F4%3C%2Fappkey%3E%3Cgeoip%3E1%3C%2Fgeoip%3E%3Cformdata%20id%3D%22getlist%22%3E%3Cobjectname%3ELocator%3A%3AStore%3C%2Fobjectname%3E%3Cwhere%3E%3Cclientkey%3E%3Ceq%3E101%3C%2Feq%3E%3C%2Fclientkey%3E%3C%2Fwhere%3E%3C%2Fformdata%3E%3C%2Frequest%3E
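The parameter is xml_request (highlighted in red in the original post), and its value is URL-encoded XML. Decoded, it looks like this: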
xml_request <request><appkey>AC2AD3C2-C08F-11E1-8600-DCAD4D48D7F4</appkey><geoip>1</geoip><formdata id="getlist"><objectname>Locator::Store</objectname><where><clientkey><eq>101</eq></clientkey></where></formdata></request>
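The request above can be rebuilt programmatically. The following is a minimal sketch, assuming the endpoint still accepts a GET request with a single xml_request parameter as in the URL shown; the helper names build_request_xml, build_url and fetch are hypothetical, and the appkey is simply copied from the post.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Endpoint and appkey copied from the post; both may have changed since.
BASE_URL = "http://storelocator.officedepot.com/ajax"
APPKEY = "AC2AD3C2-C08F-11E1-8600-DCAD4D48D7F4"


def build_request_xml(clientkey, appkey=APPKEY):
    """Return the XML payload, with the store number placed in <eq>."""
    return (
        "<request>"
        f"<appkey>{appkey}</appkey>"
        "<geoip>1</geoip>"
        '<formdata id="getlist">'
        "<objectname>Locator::Store</objectname>"
        f"<where><clientkey><eq>{clientkey}</eq></clientkey></where>"
        "</formdata>"
        "</request>"
    )


def build_url(clientkey):
    """Percent-encode the XML and attach it as the xml_request parameter."""
    # quote_via=quote encodes spaces as %20 (not "+"), matching the URL in the post.
    query = urlencode({"xml_request": build_request_xml(clientkey)}, quote_via=quote)
    return f"{BASE_URL}?{query}"


def fetch(clientkey, timeout=10):
    """Download the raw XML response as bytes (hypothetical helper, makes a network call)."""
    with urlopen(build_url(clientkey), timeout=timeout) as resp:
        return resp.read()


if __name__ == "__main__":
    print(build_url(101))
```

Calling build_url with a different store number only changes the value inside the eq node; everything else in the payload stays the same.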
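Change the node values in that XML (for example, the clientkey value 101) to fetch different content. The response comes back as XML, so just parse it to extract the fields you need.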
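Parsing the response could look roughly like the sketch below, which uses Python's standard xml.etree.ElementTree. The SAMPLE here is a trimmed stand-in for a real response: only a few fields are kept, and the closing tags are assumed. The function name parse_stores is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for a real response; the closing tags are an assumption.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<response code="1"><collection name="poi" count="1"><poi>
<name>OFFICEDEPOT</name>
<address1>2627 OKEECHOBEE BLVD</address1>
<city>WEST PALM BEACH</city>
<clientkey>102</clientkey>
<phone>(561) 687-2600</phone>
<postalcode>33409</postalcode>
<promo></promo>
<state>FL</state>
</poi></collection></response>"""


def parse_stores(xml_bytes):
    """Return one dict per <poi> element, mapping child tag names to text."""
    root = ET.fromstring(xml_bytes)
    stores = []
    for poi in root.iter("poi"):
        # Empty elements such as <promo></promo> have text None; store "" instead.
        stores.append({child.tag: (child.text or "").strip() for child in poi})
    return stores


if __name__ == "__main__":
    for store in parse_stores(SAMPLE):
        print("Store #", store["clientkey"])
        print(f'{store["city"]}, {store["state"]} {store["postalcode"]}')
```

Each store becomes a plain dictionary, so the store number, city, state and postal code from the question can be read directly by tag name.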
<?xml version="1.0" encoding="UTF-8"?><response code="1"><collection name="poi" count="1" geocoder="GoogleMaps" code="620"><poi>
<name>OFFICEDEPOT</name>
<address1>2627 OKEECHOBEE BLVD</address1>
<address2>WESTWARD SHOPPING CENTER</address2>
<adtileimage>http://www.officedepot.com/images/us/od/tiles/052310_180x132_cpd.gif</adtileimage>
<adtileurl>http://www.officedepot.com/a/design-print-and-ship/?cm_re=StoreLoc-_-MINI-_-CPDMINI</adtileurl>
<city>WEST PALM BEACH</city>
<clientkey>102</clientkey>
<country>US</country>
<dist>179</dist>
<expanded_furn>y</expanded_furn>
<fax>(561) 640-4359</fax>
<fname>EDDIE</fname>
<fri>08:00-09:00</fri>
<icon>default</icon>
<ink_refill></ink_refill>
<latitude>26.70718</latitude>
<lname>ANDERSON</lname>
<longitude>-80.09327</longitude>
<mon>08:00-09:00</mon>
<newstore></newstore>
<notaryservice></notaryservice>
<nowdocs>x</nowdocs>
<phone>(561) 687-2600</phone>
<photoprint></photoprint>
<postalcode>33409</postalcode>
<promo></promo>
<province></province>
<reg>25</reg>
<sat>09:00-09:00</sat>
<selfservews></selfservews>
<shredding>y</shredding>
<state>FL</state>
<sun>10:00-06:00</sun>
<thur>08:00-09:00</thur>
<tues>08:00-09:00</tues>
<uid>930883151</uid>
<usps>y</usps>