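Using Regular Expressions and Looping to Collect Flight Ticket Information
火车采集器 (LocoySpider) supports two ways of extracting data: wildcards and regular expressions. Wildcards handle most collection tasks, but for certain special data regular expressions show their advantage. Here I use flight ticket collection from http://www.caac-jp.com/ to show how regex is used in the collector.
First, inspection shows that the flight information on this site is rendered by JavaScript. With Fiddler we can capture the request that returns the flight data: http://www.caac-jp.com/flight/formatdata.asp?t=&sc=CKG&ec=SZX&sd=2009-12-17&values=&_= Here sc and ec carry the location codes (CKG for the departure airport and SZX for the arrival airport in this request), and sd is the date. Opening this URL in the collector's source viewer shows the actual content that is returned.
Our goal is to loop over that content and collect the airline, flight number, price, and so on, saving each flight as a single record. The relevant markup looks like this: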
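As a rough illustration of how the captured request is put together, the sketch below rebuilds the same URL from its parameters. The function name `build_query_url` is hypothetical, and the parameters t, values, and _ are simply left empty because that is how they appear in the captured request; their meaning is not explained in the article.

```python
from urllib.parse import urlencode

# Endpoint taken from the request captured with Fiddler.
BASE_URL = "http://www.caac-jp.com/flight/formatdata.asp"


def build_query_url(sc: str, ec: str, sd: str) -> str:
    """Rebuild the captured request URL for one route and one date.

    sc / ec are the location codes and sd is the date (YYYY-MM-DD).
    t, values and _ are sent empty, exactly as in the captured request.
    """
    params = {
        "t": "",
        "sc": sc,
        "ec": ec,
        "sd": sd,
        "values": "",
        "_": "",
    }
    return BASE_URL + "?" + urlencode(params)


url = build_query_url("CKG", "SZX", "2009-12-17")
print(url)
```

Changing sc, ec, or sd should in principle give the URL for another route or date, although the article only demonstrates this one request.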
- <ul class="ul2" id="prifltlistul0" onMouseOver="showfltlist(this)" onMouseOut="hidefltlist(this)" values="起飞:重庆江北机场<br>抵达:深圳宝安机场<br>航班:3U8781<br>日期:2009-12-17">
- <li class="l1 pubFlights_3U">四川航空公司</li>
- <li class="l2">3U8781</li>
- <li class="l3">07:30</li>
- <li class="l9">09:20</li>
- <li class="l4"><a href="#">320</a></li>
- <li class="l7">50/50</li>
- <li class="l8">7.5折(M)</li>
- <li class="l5" title="M9">
- ¥960
- </li>
- <li class="l6">
- <a href="javascript:void(0)" onClick="javascript:$('otherfltlist0').style.display=='none'?$('otherfltlist0').style.display='block':$('otherfltlist0').style.display='none'" class="showallprice">所有价格</a> <input name="btnOrderRoom" type="button" class="btn" value="预 订" style="font-weight:bold" onClick="javascript:order('960', '50', '50', '四川航空公司', '3U8781', 'CKG', 'SZX', '2009-12-17', '07:30', '09:20', '7.5折(M)', '320', 'M', '0','1280')" onMouseOver="this.style.cursor='pointer'" onMouseOut="this.style.cursor='default'">
- </li>
- </ul>
- <div id="otherfltlist0" class="otherfltlist" style="display:none" onMouseOver="showfltlist($('prifltlistul0'))" onMouseOut="hidefltlist($('prifltlistul0'))">
- <ul class="ul2">
- <li class="l1 "></li>
- <li class="l2"></li>
- <li class="l3"></li>
- <li class="l9"></li>
- <li class="l4"></li>
- <li class="l7">50/50</li>
- <li class="l8">8折(H)</li>
- <li class="l5" title="H7">
- <div style="font-size:13px">¥1020</div>
- </li>
- <li class="l6">
- <input name="btnOrderRoom" type="button" class="btn" value="预 订" style="font-weight:bold" onClick="javascript:order('1020', '50', '50', '四川航空公司', '3U8781', 'CKG', 'SZX', '2009-12-17', '07:30', '09:20', '8折(H)', '320', 'H', '0','1280')" onMouseOver="this.style.cursor='pointer'" onMouseOut="this.style.cursor='default'">
- </li>
- </ul>
- <ul class="ul2">
- <li class="l1 "></li>
- <li class="l2"></li>
- <li class="l3"></li>
- <li class="l9"></li>
- <li class="l4"></li>
- <li class="l7">50/50</li>
- <li class="l8">9折(T)</li>
- <li class="l5" title="T7">
- <div style="font-size:13px">¥1150</div>
- </li>
- <li class="l6">
- <input name="btnOrderRoom" type="button" class="btn" value="预 订" style="font-weight:bold" onClick="javascript:order('1150', '50', '50', '四川航空公司', '3U8781', 'CKG', 'SZX', '2009-12-17', '07:30', '09:20', '9折(T)', '320', 'T', '0','1280')" onMouseOver="this.style.cursor='pointer'" onMouseOut="this.style.cursor='default'">
- </li>
- </ul>
- <ul class="ul2">
- <li class="l1 "></li>
- <li class="l2"></li>
- <li class="l3"></li>
- <li class="l9"></li>
- <li class="l4"></li>
- <li class="l7">50/50</li>
- <li class="l8">全价舱(Y)</li>
- <li class="l5" title="Y9">
- <div style="font-size:13px">¥1280</div>
- </li>
- <li class="l6">
- <input name="btnOrderRoom" type="button" class="btn" value="预 订" style="font-weight:bold" onClick="javascript:order('1280', '50', '50', '四川航空公司', '3U8781', 'CKG', 'SZX', '2009-12-17', '07:30', '09:20', '经济舱(Y)', '320', 'Y', '0','1280')" onMouseOver="this.style.cursor='pointer'" onMouseOut="this.style.cursor='default'">
- </li>
- </ul>
- <ul class="ul2">
- <li class="l1 "></li>
- <li class="l2"></li>
- <li class="l3"></li>
- <li class="l9"></li>
- <li class="l4"></li>
- <li class="l7">50/50</li>
- <li class="l8">头等舱(F)</li>
- <li class="l5" title="F8">
- <div style="font-size:13px">¥1920</div>
- </li>
- <li class="l6">
- <input name="btnOrderRoom" type="button" class="btn" value="预 订" style="font-weight:bold" onClick="javascript:order('1920', '50', '50', '四川航空公司', '3U8781', 'CKG', 'SZX', '2009-12-17', '07:30', '09:20', '头等舱(F)', '320', 'F', '0','1280')" onMouseOver="this.style.cursor='pointer'" onMouseOut="this.style.cursor='default'">
- </li>
- </ul>
- </div>
- <ul class="ul2" id="prifltlistul1" onMouseOver="showfltlist(this)" onMouseOut="hidefltlist(this)" values="起飞:重庆江北机场<br>抵达:深圳宝安机场<br>航班:CA4369<br>日期:2009-12-17">
- <li class="l1 pubFlights_CA">中国国际航空公司</li>
- <li class="l2">CA4369</li>
- <li class="l3">07:55</li>
- <li class="l9">09:35</li>
- <li class="l4"><a href="#">738</a></li>
- <li class="l7">50/50</li>
- <li class="l8">7.5折(L)</li>
- <li class="l5" title="L9">
- ¥960
- </li>
- <li class="l6">
- <a href="javascript:void(0)" onClick="javascript:$('otherfltlist1').style.display=='none'?$('otherfltlist1').style.display='block':$('otherfltlist1').style.display='none'" class="showallprice">所有价格</a> <input name="btnOrderRoom" type="button" class="btn" value="预 订" style="font-weight:bold" onClick="javascript:order('960', '50', '50', '中国国际航空公司', 'CA4369', 'CKG', 'SZX', '2009-12-17', '07:55', '09:35', '7.5折(L)', '738', 'L', '0','1280')" onMouseOver="this.style.cursor='pointer'" onMouseOut="this.style.cursor='default'">
- </li>
- </ul>
- <div id="otherfltlist1" class="otherfltlist" style="display:none" onMouseOver="showfltlist($('prifltlistul1'))" onMouseOut="hidefltlist($('prifltlistul1'))">
- <ul class="ul2">
- <li class="l1 "></li>
- <li class="l2"></li>
- <li class="l3"></li>
- <li class="l9"></li>
- <li class="l4"></li>
- <li class="l7">50/50</li>
- <li class="l8">8.8折(M)</li>
- <li class="l5" title="M9">
- <div style="font-size:13px">¥1130</div>
- </li>
- <li class="l6">
- <input name="btnOrderRoom" type="button" class="btn" value="预 订" style="font-weight:bold" onClick="javascript:order('1130', '50', '50', '中国国际航空公司', 'CA4369', 'CKG', 'SZX', '2009-12-17', '07:55', '09:35', '8.8折(M)', '738', 'M', '0','1280')" onMouseOver="this.style.cursor='pointer'" onMouseOut="this.style.cursor='default'">
- </li>
- </ul>
- <ul class="ul2">
- <li class="l1 "></li>
- <li class="l2"></li>
- <li class="l3"></li>
- <li class="l9"></li>
- <li class="l4"></li>
- <li class="l7">50/50</li>
- <li class="l8">全价舱(Y)</li>
- <li class="l5" title="Y9">
- <div style="font-size:13px">¥1280</div>
- </li>
- <li class="l6">
- <input name="btnOrderRoom" type="button" class="btn" value="预 订" style="font-weight:bold" onClick="javascript:order('1280', '50', '50', '中国国际航空公司', 'CA4369', 'CKG', 'SZX', '2009-12-17', '07:55', '09:35', '经济舱(Y)', '738', 'Y', '0','1280')" onMouseOver="this.style.cursor='pointer'" onMouseOut="this.style.cursor='default'">
- </li>
- </ul>
- <ul class="ul2">
- <li class="l1 "></li>
- <li class="l2"></li>
- <li class="l3"></li>
- <li class="l9"></li>
- <li class="l4"></li>
- <li class="l7">50/50</li>
- <li class="l8">头等舱(F)</li>
- <li class="l5" title="F9">
- <div style="font-size:13px">¥1920</div>
- </li>
- <li class="l6">
- <input name="btnOrderRoom" type="button" class="btn" value="预 订" style="font-weight:bold" onClick="javascript:order('1920', '50', '50', '中国国际航空公司', 'CA4369', 'CKG', 'SZX', '2009-12-17', '07:55', '09:35', '头等舱(F)', '738', 'F', '0','1280')" onMouseOver="this.style.cursor='pointer'" onMouseOut="this.style.cursor='default'">
- </li>
- </ul>
- </div>
- <ul class="ul2" id="prifltlistul2" onMouseOver="showfltlist(this)" onMouseOut="hidefltlist(this)" values="起飞:重庆江北机场<br>抵达:深圳宝安机场<br>航班:ZH9826<br>日期:2009-12-17">
- <li class="l1 pubFlights_ZH">深圳航空公司</li>
- <li class="l2">ZH9826</li>
- <li class="l3">11:20</li>
- <li class="l9">13:00</li>
- <li class="l4"><a href="#">320</a></li>
- <li class="l7">50/50</li>
- <li class="l8">全价舱(Y)</li>
- <li class="l5" title="Y9">
- ¥1280
- </li>
- <li class="l6">
- <a href="javascript:void(0)" onClick="javascript:$('otherfltlist2').style.display=='none'?$('otherfltlist2').style.display='block':$('otherfltlist2').style.display='none'" class="showallprice">所有价格</a> <input name="btnOrderRoom" type="button" class="btn" value="预 订" style="font-weight:bold" onClick="javascript:order('1280', '50', '50', '深圳航空公司', 'ZH9826', 'CKG', 'SZX', '2009-12-17', '11:20', '13:00', '经济舱(Y)', '320', 'Y', '0','1280')" onMouseOver="this.style.cursor='pointer'" onMouseOut="this.style.cursor='default'">
- </li>
- </ul>
- <div id="otherfltlist2" class="otherfltlist" style="display:none" onMouseOver="showfltlist($('prifltlistul2'))" onMouseOut="hidefltlist($('prifltlistul2'))">
- <ul class="ul2">
- <li class="l1 "></li>
- <li class="l2"></li>
- <li class="l3"></li>
- <li class="l9"></li>
- <li class="l4"></li>
- <li class="l7">50/50</li>
- <li class="l8">头等舱(F)</li>
- <li class="l5" title="F7">
- <div style="font-size:13px">¥1920</div>
- </li>
- <li class="l6">
- <input name="btnOrderRoom" type="button" class="btn" value="预 订" style="font-weight:bold" onClick="javascript:order('1920', '50', '50', '深圳航空公司', 'ZH9826', 'CKG', 'SZX', '2009-12-17', '11:20', '13:00', '头等舱(F)', '320', 'F', '0','1280')" onMouseOver="this.style.cursor='pointer'" onMouseOut="this.style.cursor='default'">
- </li>
- </ul>
- </div>
- <ul class="ul2" id="prifltlistul3" onMouseOver="showfltlist(this)" onMouseOut="hidefltlist(this)" values="起飞:重庆江北机场<br>抵达:深圳宝安机场<br>航班:CZ3466<br>日期:2009-12-17">
- <li class="l1 pubFlights_CZ">中国南方航空公司</li>
- <li class="l2">CZ3466</li>
- <li class="l3">11:30</li>
- <li class="l9">13:10</li>
- <li class="l4"><a href="#">320</a></li>
- <li class="l7">50/50</li>
- <li class="l8">全价舱(Y)</li>
- <li class="l5" title="Y9">
- ¥1280
- </li>
- <li class="l6">
- <a href="javascript:void(0)" onClick="javascript:$('otherfltlist3').style.display=='none'?$('otherfltlist3').style.display='block':$('otherfltlist3').style.display='none'" class="showallprice">所有价格</a> <input name="btnOrderRoom" type="button" class="btn" value="预 订" style="font-weight:bold" onClick="javascript:order('1280', '50', '50', '中国南方航空公司', 'CZ3466', 'CKG', 'SZX', '2009-12-17', '11:30', '13:10', '经济舱(Y)', '320', 'Y', '0','1280')" onMouseOver="this.style.cursor='pointer'" onMouseOut="this.style.cursor='default'">
- </li>
- </ul>
- <div id="otherfltlist3" class="otherfltlist" style="display:none" onMouseOver="showfltlist($('prifltlistul3'))" onMouseOut="hidefltlist($('prifltlistul3'))">
- <ul class="ul2">
- <li class="l1 "></li>
- <li class="l2"></li>
- <li class="l3"></li>
- <li class="l9"></li>
- <li class="l4"></li>
- <li class="l7">50/50</li>
- <li class="l8">头等舱(F)</li>
- <li class="l5" title="F7">
- <div style="font-size:13px">¥1920</div>
- </li>
- <li class="l6">
- <input name="btnOrderRoom" type="button" class="btn" value="预 订" style="font-weight:bold" onClick="javascript:order('1920', '50', '50', '中国南方航空公司', 'CZ3466', 'CKG', 'SZX', '2009-12-17', '11:30', '13:10', '头等舱(F)', '320', 'F', '0','1280')" onMouseOver="this.style.cursor='pointer'" onMouseOut="this.style.cursor='default'">
- </li>
- </ul>
- </div>
- <ul class="ul2" id="prifltlistul4" onMouseOver="showfltlist(this)" onMouseOut="hidefltlist(this)" values="起飞:重庆江北机场<br>抵达:深圳宝安机场<br>航班:3U8783<br>日期:2009-12-17">
- <li class="l1 pubFlights_3U">四川航空公司</li>
- <li class="l2">3U8783</li>
- <li class="l3">13:15</li>
- <li class="l9">14:55</li>
- <li class="l4"><a href="#">320</a></li>
- <li class="l7">50/50</li>
- <li class="l8">7折(G)</li>
- <li class="l5" title="G1">
- ¥900 <img src='/v1/images/piaojz.gif' alt='该售价票量较紧张!'>
- </li>
- <li class="l6">
- <a href="javascript:void(0)" onClick="javascript:$('otherfltlist4').style.display=='none'?$('otherfltlist4').style.display='block':$('otherfltlist4').style.display='none'" class="showallprice">所有价格</a> <input name="btnOrderRoom" type="button" class="btn" value="预 订" style="font-weight:bold" onClick="javascript:order('900', '50', '50', '四川航空公司', '3U8783', 'CKG', 'SZX', '2009-12-17', '13:15', '14:55', '7折(G)', '320', 'G', '0','1280')" onMouseOver="this.style.cursor='pointer'" onMouseOut="this.style.cursor='default'">
- </li>
- </ul>
- <div id="otherfltlist4" class="otherfltlist" style="display:none" onMouseOver="showfltlist($('prifltlistul4'))" onMouseOut="hidefltlist($('prifltlistul4'))">
- <ul class="ul2">
- <li class="l1 "></li>
- <li class="l2"></li>
- <li class="l3"></li>
- <li class="l9"></li>
- <li class="l4"></li>
- <li class="l7">50/50</li>
- <li class="l8">8折(H)</li>
- <li class="l5" title="H9">
- <div style="font-size:13px">¥1020</div>
- </li>
- <li class="l6">
- <input name="btnOrderRoom" type="button" class="btn" value="预 订" style="font-weight:bold" onClick="javascript:order('1020', '50', '50', '四川航空公司', '3U8783', 'CKG', 'SZX', '2009-12-17', '13:15', '14:55', '8折(H)', '320', 'H', '0','1280')" onMouseOver="this.style.cursor='pointer'" onMouseOut="this.style.cursor='default'">
- </li>
- </ul>
- <ul class="ul2">
- <li class="l1 "></li>
- <li class="l2"></li>
- <li class="l3"></li>
- <li class="l9"></li>
- <li class="l4"></li>
- <li class="l7">50/50</li>
- <li class="l8">9折(T)</li>
- <li class="l5" title="T9">
- <div style="font-size:13px">¥1150</div>
- </li>
- <li class="l6">
- <input name="btnOrderRoom" type="button" class="btn" value="预 订" style="font-weight:bold" onClick="javascript:order('1150', '50', '50', '四川航空公司', '3U8783', 'CKG', 'SZX', '2009-12-17', '13:15', '14:55', '9折(T)', '320', 'T', '0','1280')" onMouseOver="this.style.cursor='pointer'" onMouseOut="this.style.cursor='default'">
- </li>
- </ul>
- <ul class="ul2">
- <li class="l1 "></li>
- <li class="l2"></li>
- <li class="l3"></li>
- <li class="l9"></li>
- <li class="l4"></li>
- <li class="l7">50/50</li>
- <li class="l8">全价舱(Y)</li>
- <li class="l5" title="Y9">
- <div style="font-size:13px">¥1280</div>
- </li>
- <li class="l6">
- <input name="btnOrderRoom" type="button" class="btn" value="预 订" style="font-weight:bold" onClick="javascript:order('1280', '50', '50', '四川航空公司', '3U8783', 'CKG', 'SZX', '2009-12-17', '13:15', '14:55', '经济舱(Y)', '320', 'Y', '0','1280')" onMouseOver="this.style.cursor='pointer'" onMouseOut="this.style.cursor='default'">
- </li>
- </ul>
- <ul class="ul2">
- <li class="l1 "></li>
- <li class="l2"></li>
- <li class="l3"></li>
- <li class="l9"></li>
- <li class="l4"></li>
- <li class="l7">50/50</li>
- <li class="l8">头等舱(F)</li>
- <li class="l5" title="F2">
- <div style="font-size:13px">¥1920<img src='/v1/images/piaojz.gif' alt='该售价票量较紧张!'></div>
- </li>
- <li class="l6">
- <input name="btnOrderRoom" type="button" class="btn" value="预 订" style="font-weight:bold" onClick="javascript:order('1920', '50', '50', '四川航空公司', '3U8783', 'CKG', 'SZX', '2009-12-17', '13:15', '14:55', '头等舱(F)', '320', 'F', '0','1280')" onMouseOver="this.style.cursor='pointer'" onMouseOut="this.style.cursor='default'">
- </li>
- </ul>
- </div>
- <ul class="ul2" id="prifltlistul5" onMouseOver="showfltlist(this)" onMouseOut="hidefltlist(this)" values="起飞:重庆江北机场<br>抵达:深圳宝安机场<br>航班:CA4347<br>日期:2009-12-17">
- <li class="l1 pubFlights_CA">中国国际航空公司</li>
- <li class="l2">CA4347</li>
- <li class="l3">14:05</li>
- <li class="l9">15:50</li>
- <li class="l4"><a href="#">738</a></li>
- <li class="l7">50/50</li>
- <li class="l8">8折(K)</li>
- <li class="l5" title="K9">
- ¥1020
- </li>
- <li class="l6">
- <a href="javascript:void(0)" onClick="javascript:$('otherfltlist5').style.display=='none'?$('otherfltlist5').style.display='block':$('otherfltlist5').style.display='none'" class="showallprice">所有价格</a> <input name="btnOrderRoom" type="button" class="btn" value="预 订" style="font-weight:bold" onClick="javascript:order('1020', '50', '50', '中国国际航空公司', 'CA4347', 'CKG', 'SZX', '2009-12-17', '14:05', '15:50', '8折(K)', '738', 'K', '0','1280')" onMouseOver="this.style.cursor='pointer'" onMouseOut="this.style.cursor='default'">
- </li>
- </ul>
- <div id="otherfltlist5" class="otherfltlist" style="display:none" onMouseOver="showfltlist($('prifltlistul5'))" onMouseOut="hidefltlist($('prifltlistul5'))">
- <ul class="ul2">
- <li class="l1 "></li>
- <li class="l2"></li>
- <li class="l3"></li>
- <li class="l9"></li>
- <li class="l4"></li>
- <li class="l7">50/50</li>
- <li class="l8">8.8折(M)</li>
- <li class="l5" title="M9">
- <div style="font-size:13px">¥1130</div>
- </li>
- <li class="l6">
- <input name="btnOrderRoom" type="button" class="btn" value="预 订" style="font-weight:bold" onClick="javascript:order('1130', '50', '50', '中国国际航空公司', 'CA4347', 'CKG', 'SZX', '2009-12-17', '14:05', '15:50', '8.8折(M)', '738', 'M', '0','1280')" onMouseOver="this.style.cursor='pointer'" onMouseOut="this.style.cursor='default'">
- </li>
- </ul>
- <ul class="ul2">
- <li class="l1 "></li>
- <li class="l2"></li>
- <li class="l3"></li>
- <li class="l9"></li>
- <li class="l4"></li>
- <li class="l7">50/50</li>
- <li class="l8">全价舱(Y)</li>
- <li class="l5" title="Y9">
- <div style="font-size:13px">¥1280</div>
- </li>
- <li class="l6">
- <input name="btnOrderRoom" type="button" class="btn" value="预 订" style="font-weight:bold" onClick="javascript:order('1280', '50', '50', '中国国际航空公司', 'CA4347', 'CKG', 'SZX', '2009-12-17', '14:05', '15:50', '经济舱(Y)', '738', 'Y', '0','1280')" onMouseOver="this.style.cursor='pointer'" onMouseOut="this.style.cursor='default'">
- </li>
- </ul>
- <ul class="ul2">
- <li class="l1 "></li>
- <li class="l2"></li>
- <li class="l3"></li>
- <li class="l9"></li>
- <li class="l4"></li>
- <li class="l7">50/50</li>
- <li class="l8">头等舱(F)</li>
- <li class="l5" title="F3">
- <div style="font-size:13px">¥1920<img src='/v1/images/piaojz.gif' alt='该售价票量较紧张!'></div>
- </li>
- <li class="l6">
- <input name="btnOrderRoom" type="button" class="btn" value="预 订" style="font-weight:bold" onClick="javascript:order('1920', '50', '50', '中国国际航空公司', 'CA4347', 'CKG', 'SZX', '2009-12-17', '14:05', '15:50', '头等舱(F)', '738', 'F', '0','1280')" onMouseOver="this.style.cursor='pointer'" onMouseOut="this.style.cursor='default'">
- </li>
- </ul>
- </div>
- <ul class="ul2" id="prifltlistul6" onMouseOver="showfltlist(this)" onMouseOut="hidefltlist(this)" values="起飞:重庆江北机场<br>抵达:深圳宝安机场<br>航班:ZH9916<br>日期:2009-12-17">
- <li class="l1 pubFlights_ZH">深圳航空公司</li>
- <li class="l2">ZH9916</li>
- <li class="l3">15:20</li>
- <li class="l9">17:10</li>
- <li class="l4"><a href="#">320</a></li>
- <li class="l7">50/50</li>
- <li class="l8">全价舱(Y)</li>
- <li class="l5" title="Y9">
- ¥1280
- </li>
- <li class="l6">
- <a href="javascript:void(0)" onClick="javascript:$('otherfltlist6').style.display=='none'?$('otherfltlist6').style.display='block':$('otherfltlist6').style.display='none'" class="showallprice">所有价格</a> <input name="btnOrderRoom" type="button" class="btn" value="预 订" style="font-weight:bold" onClick="javascript:order('1280', '50', '50', '深圳航空公司', 'ZH9916', 'CKG', 'SZX', '2009-12-17', '15:20', '17:10', '经济舱(Y)', '320', 'Y', '0','1280')" onMouseOver="this.style.cursor='pointer'" onMouseOut="this.style.cursor='default'">
- </li>
- </ul>
- <div id="otherfltlist6" class="otherfltlist" style="display:none" onMouseOver="showfltlist($('prifltlistul6'))" onMouseOut="hidefltlist($('prifltlistul6'))">
- <ul class="ul2">
- <li class="l1 "></li>
- <li class="l2"></li>
- <li class="l3"></li>
- <li class="l9"></li>
- <li class="l4"></li>
- <li class="l7">50/50</li>
- <li class="l8">头等舱(F)</li>
- <li class="l5" title="F4">
- <div style="font-size:13px">¥1920<img src='/v1/images/piaojz.gif' alt='该售价票量较紧张!'></div>
- </li>
- <li class="l6">
- <input name="btnOrderRoom" type="button" class="btn" value="预 订" style="font-weight:bold" onClick="javascript:order('1920', '50', '50', '深圳航空公司', 'ZH9916', 'CKG', 'SZX', '2009-12-17', '15:20', '17:10', '头等舱(F)', '320', 'F', '0','1280')" onMouseOver="this.style.cursor='pointer'" onMouseOut="this.style.cursor='default'">
- </li>
- </ul>
- </div>
- <ul class="ul2" id="prifltlistul7" onMouseOver="showfltlist(this)" onMouseOut="hidefltlist(this)" values="起飞:重庆江北机场<br>抵达:深圳宝安机场<br>航班:CZ3456<br>日期:2009-12-17">
- <li class="l1 pubFlights_CZ">中国南方航空公司</li>
- <li class="l2">CZ3456</li>
- <li class="l3">17:10</li>
- <li class="l9">18:40</li>
- <li class="l4"><a href="#">320</a></li>
- <li class="l7">50/50</li>
- <li class="l8">全价舱(Y)</li>
- <li class="l5" title="Y9">
- ¥1280
- </li>
- <li class="l6">
- <a href="javascript:void(0)" onClick="javascript:$('otherfltlist7').style.display=='none'?$('otherfltlist7').style.display='block':$('otherfltlist7').style.display='none'" class="showallprice">所有价格</a> <input name="btnOrderRoom" type="button" class="btn" value="预 订" style="font-weight:bold" onClick="javascript:order('1280', '50', '50', '中国南方航空公司', 'CZ3456', 'CKG', 'SZX', '2009-12-17', '17:10', '18:40', '经济舱(Y)', '320', 'Y', '0','1280')" onMouseOver="this.style.cursor='pointer'" onMouseOut="this.style.cursor='default'">
- </li>
- </ul>
- <div id="otherfltlist7" class="otherfltlist" style="display:none" onMouseOver="showfltlist($('prifltlistul7'))" onMouseOut="hidefltlist($('prifltlistul7'))">
- <ul class="ul2">
- <li class="l1 "></li>
- <li class="l2"></li>
- <li class="l3"></li>
- <li class="l9"></li>
- <li class="l4"></li>
- <li class="l7">50/50</li>
- <li class="l8">头等舱(F)</li>
- <li class="l5" title="F6">
- <div style="font-size:13px">¥1920</div>
- </li>
- <li class="l6">
- <input name="btnOrderRoom" type="button" class="btn" value="预 订" style="font-weight:bold" onClick="javascript:order('1920', '50', '50', '中国南方航空公司', 'CZ3456', 'CKG', 'SZX', '2009-12-17', '17:10', '18:40', '头等舱(F)', '320', 'F', '0','1280')" onMouseOver="this.style.cursor='pointer'" onMouseOut="this.style.cursor='default'">
- </li>
- </ul>
- </div>
- <ul class="ul2" id="prifltlistul8" onMouseOver="showfltlist(this)" onMouseOut="hidefltlist(this)" values="起飞:重庆江北机场<br>抵达:深圳宝安机场<br>航班:CA4345<br>日期:2009-12-17">
- <li class="l1 pubFlights_CA">中国国际航空公司</li>
- <li class="l2">CA4345</li>
- <li class="l3">19:20</li>
- <li class="l9">21:20</li>
- <li class="l4"><a href="#">73G</a></li>
- <li class="l7">50/50</li>
- <li class="l8">8折(K)</li>
- <li class="l5" title="K9">
- ¥1020
- </li>
- <li class="l6">
- <a href="javascript:void(0)" onClick="javascript:$('otherfltlist8').style.display=='none'?$('otherfltlist8').style.display='block':$('otherfltlist8').style.display='none'" class="showallprice">所有价格</a> <input name="btnOrderRoom" type="button" class="btn" value="预 订" style="font-weight:bold" onClick="javascript:order('1020', '50', '50', '中国国际航空公司', 'CA4345', 'CKG', 'SZX', '2009-12-17', '19:20', '21:20', '8折(K)', '73G', 'K', '0','1280')" onMouseOver="this.style.cursor='pointer'" onMouseOut="this.style.cursor='default'">
- </li>
- </ul>
- <div id="otherfltlist8" class="otherfltlist" style="display:none" onMouseOver="showfltlist($('prifltlistul8'))" onMouseOut="hidefltlist($('prifltlistul8'))">
- <ul class="ul2">
- <li class="l1 "></li>
- <li class="l2"></li>
- <li class="l3"></li>
- <li class="l9"></li>
- <li class="l4"></li>
- <li class="l7">50/50</li>
- <li class="l8">8.8折(M)</li>
- <li class="l5" title="M9">
- <div style="font-size:13px">¥1130</div>
- </li>
- <li class="l6">
- <input name="btnOrderRoom" type="button" class="btn" value="预 订" style="font-weight:bold" onClick="javascript:order('1130', '50', '50', '中国国际航空公司', 'CA4345', 'CKG', 'SZX', '2009-12-17', '19:20', '21:20', '8.8折(M)', '73G', 'M', '0','1280')" onMouseOver="this.style.cursor='pointer'" onMouseOut="this.style.cursor='default'">
- </li>
- </ul>
- <ul class="ul2">
- <li class="l1 "></li>
- <li class="l2"></li>
- <li class="l3"></li>
- <li class="l9"></li>
- <li class="l4"></li>
- <li class="l7">50/50</li>
- <li class="l8">全价舱(Y)</li>
- <li class="l5" title="Y9">
- <div style="font-size:13px">¥1280</div>
- </li>
- <li class="l6">
- <input name="btnOrderRoom" type="button" class="btn" value="预 订" style="font-weight:bold" onClick="javascript:order('1280', '50', '50', '中国国际航空公司', 'CA4345', 'CKG', 'SZX', '2009-12-17', '19:20', '21:20', '经济舱(Y)', '73G', 'Y', '0','1280')" onMouseOver="this.style.cursor='pointer'" onMouseOut="this.style.cursor='default'">
- </li>
- </ul>
- <ul class="ul2">
- <li class="l1 "></li>
- <li class="l2"></li>
- <li class="l3"></li>
- <li class="l9"></li>
- <li class="l4"></li>
- <li class="l7">50/50</li>
- <li class="l8">头等舱(F)</li>
- <li class="l5" title="F5">
- <div style="font-size:13px">¥1920<img src='/v1/images/piaojz.gif' alt='该售价票量较紧张!'></div>
- </li>
- <li class="l6">
- <input name="btnOrderRoom" type="button" class="btn" value="预 订" style="font-weight:bold" onClick="javascript:order('1920', '50', '50', '中国国际航空公司', 'CA4345', 'CKG', 'SZX', '2009-12-17', '19:20', '21:20', '头等舱(F)', '73G', 'F', '0','1280')" onMouseOver="this.style.cursor='pointer'" onMouseOut="this.style.cursor='default'">
- </li>
- </ul>
- </div>
- <ul class="ul2" id="prifltlistul9" onMouseOver="showfltlist(this)" onMouseOut="hidefltlist(this)" values="起飞:重庆江北机场<br>抵达:深圳宝安机场<br>航班:3U8787<br>日期:2009-12-17">
- <li class="l1 pubFlights_3U">四川航空公司</li>
- <li class="l2">3U8787</li>
- <li class="l3">19:40</li>
- <li class="l9">21:20</li>
- <li class="l4"><a href="#">319</a></li>
- <li class="l7">50/50</li>
- <li class="l8">8折(H)</li>
- <li class="l5" title="H9">
- ¥1020
- </li>
- <li class="l6">
- <a href="javascript:void(0)" onClick="javascript:$('otherfltlist9').style.display=='none'?$('otherfltlist9').style.display='block':$('otherfltlist9').style.display='none'" class="showallprice">所有价格</a> <input name="btnOrderRoom" type="button" class="btn" value="预 订" style="font-weight:bold" onClick="javascript:order('1020', '50', '50', '四川航空公司', '3U8787', 'CKG', 'SZX', '2009-12-17', '19:40', '21:20', '8折(H)', '319', 'H', '0','1280')" onMouseOver="this.style.cursor='pointer'" onMouseOut="this.style.cursor='default'">
- </li>
- </ul>
- <div id="otherfltlist9" class="otherfltlist" style="display:none" onMouseOver="showfltlist($('prifltlistul9'))" onMouseOut="hidefltlist($('prifltlistul9'))">
- <ul class="ul2">
- <li class="l1 "></li>
- <li class="l2"></li>
- <li class="l3"></li>
- <li class="l9"></li>
- <li class="l4"></li>
- <li class="l7">50/50</li>
- <li class="l8">9折(T)</li>
- <li class="l5" title="T9">
- <div style="font-size:13px">¥1150</div>
- </li>
- <li class="l6">
- <input name="btnOrderRoom" type="button" class="btn" value="预 订" style="font-weight:bold" onClick="javascript:order('1150', '50', '50', '四川航空公司', '3U8787', 'CKG', 'SZX', '2009-12-17', '19:40', '21:20', '9折(T)', '319', 'T', '0','1280')" onMouseOver="this.style.cursor='pointer'" onMouseOut="this.style.cursor='default'">
- </li>
- </ul>
- <ul class="ul2">
- <li class="l1 "></li>
- <li class="l2"></li>
- <li class="l3"></li>
- <li class="l9"></li>
- <li class="l4"></li>
- <li class="l7">50/50</li>
- <li class="l8">全价舱(Y)</li>
- <li class="l5" title="Y9">
- <div style="font-size:13px">¥1280</div>
- </li>
- <li class="l6">
- <input name="btnOrderRoom" type="button" class="btn" value="预 订" style="font-weight:bold" onClick="javascript:order('1280', '50', '50', '四川航空公司', '3U8787', 'CKG', 'SZX', '2009-12-17', '19:40', '21:20', '经济舱(Y)', '319', 'Y', '0','1280')" onMouseOver="this.style.cursor='pointer'" onMouseOut="this.style.cursor='default'">
- </li>
- </ul>
- <ul class="ul2">
- <li class="l1 "></li>
- <li class="l2"></li>
- <li class="l3"></li>
- <li class="l9"></li>
- <li class="l4"></li>
- <li class="l7">50/50</li>
- <li class="l8">头等舱(F)</li>
- <li class="l5" title="F8">
- <div style="font-size:13px">¥1920</div>
- </li>
- <li class="l6">
- <input name="btnOrderRoom" type="button" class="btn" value="预 订" style="font-weight:bold" onClick="javascript:order('1920', '50', '50', '四川航空公司', '3U8787', 'CKG', 'SZX', '2009-12-17', '19:40', '21:20', '头等舱(F)', '319', 'F', '0','1280')" onMouseOver="this.style.cursor='pointer'" onMouseOut="this.style.cursor='default'">
- </li>
- </ul>
- </div>
- <ul class="ul2" id="prifltlistul10" onMouseOver="showfltlist(this)" onMouseOut="hidefltlist(this)" values="起飞:重庆江北机场<br>抵达:深圳宝安机场<br>航班:HU7058<br>日期:2009-12-17">
- <li class="l1 pubFlights_HU">海南航空公司</li>
- <li class="l2">HU7058</li>
- <li class="l3">20:00</li>
- <li class="l9">21:45</li>
- <li class="l4"><a href="#">738</a></li>
- <li class="l7">50/50</li>
- <li class="l8">8折(K)</li>
- <li class="l5" title="K9">
- ¥1020
- </li>
- <li class="l6">
- <a href="javascript:void(0)" onClick="javascript:$('otherfltlist10').style.display=='none'?$('otherfltlist10').style.display='block':$('otherfltlist10').style.display='none'" class="showallprice">所有价格</a> <input name="btnOrderRoom" type="button" class="btn" value="预 订" style="font-weight:bold" onClick="javascript:order('1020', '50', '50', '海南航空公司', 'HU7058', 'CKG', 'SZX', '2009-12-17', '20:00', '21:45', '8折(K)', '738', 'K', '0','1280')" onMouseOver="this.style.cursor='pointer'" onMouseOut="this.style.cursor='default'">
- </li>
- </ul>
- <div id="otherfltlist10" class="otherfltlist" style="display:none" onMouseOver="showfltlist($('prifltlistul10'))" onMouseOut="hidefltlist($('prifltlistul10'))">
- <ul class="ul2">
- <li class="l1 "></li>
- <li class="l2"></li>
- <li class="l3"></li>
- <li class="l9"></li>
- <li class="l4"></li>
- <li class="l7">50/50</li>
- <li class="l8">8.5折(H)</li>
- <li class="l5" title="H9">
- <div style="font-size:13px">¥1090</div>
- </li>
- <li class="l6">
- <input name="btnOrderRoom" type="button" class="btn" value="预 订" style="font-weight:bold" onClick="javascript:order('1090', '50', '50', '海南航空公司', 'HU7058', 'CKG', 'SZX', '2009-12-17', '20:00', '21:45', '8.5折(H)', '738', 'H', '0','1280')" onMouseOver="this.style.cursor='pointer'" onMouseOut="this.style.cursor='default'">
- </li>
- </ul>
- <ul class="ul2">
- <li class="l1 "></li>
- <li class="l2"></li>
- <li class="l3"></li>
- <li class="l9"></li>
- <li class="l4"></li>
- <li class="l7">50/50</li>
- <li class="l8">全价舱(Y)</li>
- <li class="l5" title="Y9">
- <div style="font-size:13px">¥1280</div>
- </li>
- <li class="l6">
- <input name="btnOrderRoom" type="button" class="btn" value="预 订" style="font-weight:bold" onClick="javascript:order('1280', '50', '50', '海南航空公司', 'HU7058', 'CKG', 'SZX', '2009-12-17', '20:00', '21:45', '经济舱(Y)', '738', 'Y', '0','1280')" onMouseOver="this.style.cursor='pointer'" onMouseOut="this.style.cursor='default'">
- </li>
- </ul>
- <ul class="ul2">
- <li class="l1 "></li>
- <li class="l2"></li>
- <li class="l3"></li>
- <li class="l9"></li>
- <li class="l4"></li>
- <li class="l7">50/50</li>
- <li class="l8">头等舱(F)</li>
- <li class="l5" title="F7">
- <div style="font-size:13px">¥1920</div>
- </li>
- <li class="l6">
- <input name="btnOrderRoom" type="button" class="btn" value="预 订" style="font-weight:bold" onClick="javascript:order('1920', '50', '50', '海南航空公司', 'HU7058', 'CKG', 'SZX', '2009-12-17', '20:00', '21:45', '头等舱(F)', '738', 'F', '0','1280')" onMouseOver="this.style.cursor='pointer'" onMouseOut="this.style.cursor='default'">
- </li>
- </ul>
- </div>
- <ul class="ul2" id="prifltlistul11" onMouseOver="showfltlist(this)" onMouseOut="hidefltlist(this)" values="起飞:重庆江北机场<br>抵达:深圳宝安机场<br>航班:ZH9856<br>日期:2009-12-17">
- <li class="l1 pubFlights_ZH">深圳航空公司</li>
- <li class="l2">ZH9856</li>
- <li class="l3">20:50</li>
- <li class="l9">22:55</li>
- <li class="l4"><a href="#">320</a></li>
- <li class="l7">50/50</li>
- <li class="l8">全价舱(Y)</li>
- <li class="l5" title="Y9">
- ¥1280
- </li>
- <li class="l6">
- <a href="javascript:void(0)" onClick="javascript:$('otherfltlist11').style.display=='none'?$('otherfltlist11').style.display='block':$('otherfltlist11').style.display='none'" class="showallprice">所有价格</a> <input name="btnOrderRoom" type="button" class="btn" value="预 订" style="font-weight:bold" onClick="javascript:order('1280', '50', '50', '深圳航空公司', 'ZH9856', 'CKG', 'SZX', '2009-12-17', '20:50', '22:55', '经济舱(Y)', '320', 'Y', '0','1280')" onMouseOver="this.style.cursor='pointer'" onMouseOut="this.style.cursor='default'">
- </li>
- </ul>
- <div id="otherfltlist11" class="otherfltlist" style="display:none" onMouseOver="showfltlist($('prifltlistul11'))" onMouseOut="hidefltlist($('prifltlistul11'))">
- <ul class="ul2">
- <li class="l1 "></li>
- <li class="l2"></li>
- <li class="l3"></li>
- <li class="l9"></li>
- <li class="l4"></li>
- <li class="l7">50/50</li>
- <li class="l8">头等舱(F)</li>
- <li class="l5" title="F9">
- <div style="font-size:13px">¥1920</div>
- </li>
- <li class="l6">
- <input name="btnOrderRoom" type="button" class="btn" value="预 订" style="font-weight:bold" onClick="javascript:order('1920', '50', '50', '深圳航空公司', 'ZH9856', 'CKG', 'SZX', '2009-12-17', '20:50', '22:55', '头等舱(F)', '320', 'F', '0','1280')" onMouseOver="this.style.cursor='pointer'" onMouseOut="this.style.cursor='default'">
- </li>
- </ul>
- </div>
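We first try wildcard mode, and the number of results is not the actual 12. A closer look shows that many empty entries were collected as well.
For example: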
- <ul class="ul2">
- <li class="l1 "></li>
- <li class="l2"></li>
- <li class="l3"></li>
- <li class="l9"></li>
- <li class="l4"></li>
- <li class="l7">50/50</li>
- <li class="l8">全价舱(Y)</li>
- <li class="l5" title="Y9">
- <div style="font-size:13px">¥1280</div>
- </li>
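Analysis shows that only one kind of row is what we need. Looking at the page (see the figure), there are only 12 records.
We only need to collect these 12 records. We use the class of each li tag to identify the field to collect, as shown below.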
- <ul class="ul2" id="prifltlistul9" onMouseOver="showfltlist(this)" onMouseOut="hidefltlist(this)" values="起飞:重庆江北机场<br>抵达:深圳宝安机场<br>航班:3U8787<br>日期:2009-12-17">
- <li class="l1 pubFlights_3U">四川航空公司</li>
- <li class="l2">3U8787</li>
- <li class="l3">19:40</li>
- <li class="l9">21:20</li>
- <li class="l4"><a href="#">319</a></li>
- <li class="l7">50/50</li>
- <li class="l8">8折(H)</li>
- <li class="l5" title="H9">
- ¥1020
- </li>
- <li class="l6">
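The markup above is very regular: l1, l2, l3, and l4 all have values, and l5 carries a title attribute whose value differs from row to row, but its content contains no div. If we use wildcard mode, with <li class="l2"> as the start and </li> as the end, empty values are collected too. During looping each empty value is emitted as a result, which produces duplicate records. The cause is that the collector converts a wildcard rule into a non-greedy regex, *?, which also matches an empty string. Since we must not accept empty values, we write the regex ourselves in the collector and use +? instead. To collect l2, for example, we write: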
- <li class="l2">(?<content>[^<]+?)</li>
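The difference between `*?` and `+?` can be checked outside the collector. The following is a minimal sketch in Python on a shortened sample of the markup; the `[\s\S]*?` pattern is only an assumed stand-in for whatever rule the collector generates from a wildcard, and Python spells the named group `(?P<content>...)` rather than `(?<content>...)` as in the article.

```python
import re

# Shortened sample: one main row with a flight number, one price sub-row
# whose l2 cell is empty.
SAMPLE = """
<ul class="ul2" id="prifltlistul0">
<li class="l1 pubFlights_3U">四川航空公司</li>
<li class="l2">3U8781</li>
</ul>
<ul class="ul2">
<li class="l1 "></li>
<li class="l2"></li>
</ul>
"""

# Assumed equivalent of the wildcard rule: lazy "zero or more" accepts "".
WILDCARD_LIKE = re.compile(r'<li class="l2">(?P<content>[\s\S]*?)</li>')
# The article's rule: lazy "one or more" rejects empty cells.
NON_EMPTY = re.compile(r'<li class="l2">(?P<content>[^<]+?)</li>')

with_star = [m.group("content") for m in WILDCARD_LIKE.finditer(SAMPLE)]
with_plus = [m.group("content") for m in NON_EMPTY.finditer(SAMPLE)]

print(with_star)  # ['3U8781', '']
print(with_plus)  # ['3U8781']
```

The empty string in the first list is exactly the kind of spurious entry that inflates the record count in wildcard mode.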
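l3, l9, and l4 can be collected the same way, and the values come out correct. When we write l1, l7, and l8 like this, however, the results go wrong, because the rows we do not want also have values in some of these positions (l7 and l8 are filled in on every row).
So we now use regular expressions to add extra conditions to the pattern, so that only the results we need are returned.
Notice that l1 is preceded by a distinctive value. The following row is one we want: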
- <ul class="ul2" id="prifltlistul9" onMouseOver="showfltlist(this)" onMouseOut="hidefltlist(this)" values="起飞:重庆江北机场<br>抵达:深圳宝安机场<br>航班:3U8787<br>日期:2009-12-17">
- <li class="l1 pubFlights_3U">四川航空公司</li>
- <li class="l2">3U8787</li>
The following is one we do not want:
- <div id="otherfltlist9" class="otherfltlist" style="display:none" onMouseOver="showfltlist($('prifltlistul9'))" onMouseOut="hidefltlist($('prifltlistul9'))">
- <ul class="ul2">
- <li class="l1 "></li>
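We can pick out l1 either by the date that precedes it (at the end of the values attribute on the enclosing ul) or by requiring that l2 be non-empty. I use the regex that relies on the preceding date: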
- 日期[^>]*?>\s+<li class="l1 pubFlights[^"]+?">(?<content>[^<]*?)</li>
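To see why anchoring on the date works, here is a small Python sketch run against a shortened sample (some attributes of the original markup are omitted for brevity). Only the group syntax is changed, from `(?<content>...)` to Python's `(?P<content>...)`.

```python
import re

# One main row (its <ul> ends with 日期:...">) and one price sub-row
# (plain <ul class="ul2">, empty l1 cell).
SAMPLE = """
<ul class="ul2" id="prifltlistul9" values="起飞:重庆江北机场<br>抵达:深圳宝安机场<br>航班:3U8787<br>日期:2009-12-17">
<li class="l1 pubFlights_3U">四川航空公司</li>
<li class="l2">3U8787</li>
</ul>
<div id="otherfltlist9" class="otherfltlist" style="display:none">
<ul class="ul2">
<li class="l1 "></li>
<li class="l2"></li>
</ul>
</div>
"""

# The pattern starts at "日期" inside the values attribute, runs to the end of
# the <ul ...> tag, and only then accepts an l1 cell with a pubFlights_ class.
L1_PATTERN = re.compile(
    r'日期[^>]*?>\s+<li class="l1 pubFlights[^"]+?">(?P<content>[^<]*?)</li>'
)

airlines = [m.group("content") for m in L1_PATTERN.finditer(SAMPLE)]
print(airlines)  # ['四川航空公司']
```

The sub-row is skipped for two reasons: its ul tag has no 日期 text before it, and its l1 class is just "l1 " with no pubFlights_ suffix.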
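This gives the result we need. Likewise, l7 and l8 are preceded by </a></li>, which occurs only in the rows we want, so we can use it to pick out their values. The regexes are: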
- </a></li>\s+<li class="l7">(?<content>[^<]+?)</li>
- </a></li>\s+<li class="l7">[^<]+?</li>\s+<li class="l8">(?<content>[^<]+?)</li>
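The two rules can be tried on a shortened sample as follows. This is a sketch in Python, so the named group is written `(?P<content>...)`; the l8 pattern is closed with `</li>` here.

```python
import re

# Main row: l4 contains a link, so it ends with </a></li>.
# Sub-row: l4 is empty, so the </a></li> prefix is absent.
SAMPLE = """
<ul class="ul2" id="prifltlistul0">
<li class="l4"><a href="#">320</a></li>
<li class="l7">50/50</li>
<li class="l8">7.5折(M)</li>
</ul>
<ul class="ul2">
<li class="l4"></li>
<li class="l7">50/50</li>
<li class="l8">8折(H)</li>
</ul>
"""

L7_PATTERN = re.compile(r'</a></li>\s+<li class="l7">(?P<content>[^<]+?)</li>')
# For l8 the whole l7 cell is matched but not captured, so the </a></li>
# anchor still applies.
L8_PATTERN = re.compile(
    r'</a></li>\s+<li class="l7">[^<]+?</li>\s+<li class="l8">(?P<content>[^<]+?)</li>'
)

l7_values = [m.group("content") for m in L7_PATTERN.finditer(SAMPLE)]
l8_values = [m.group("content") for m in L8_PATTERN.finditer(SAMPLE)]

print(l7_values)  # ['50/50']
print(l8_values)  # ['7.5折(M)']
```

Both sub-row values (50/50 and 8折(H)) are ignored even though they are non-empty, which is what the plain `+?` rule could not achieve.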
Finally, for l5 we rely directly on the fact that its content contains no div. The regex is:
- <li class="l5" title=[^>]+?>(?<content>[^<]+?)</li>
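The sketch below runs the l5 rule in Python (with `(?P<content>...)` as the group syntax) on the three kinds of price cell that appear in the listing. One caveat seems worth checking: in the listing, the main row for flight 3U8783 has an img tag after the price, and on this sample the rule as written does not match that cell. The `RELAXED` pattern is my own suggestion for that case and is not part of the original rule.

```python
import re

# Three l5 cells copied from the listing: a plain price (wanted), a price
# wrapped in <div> (sub-row, not wanted), and a price followed by <img> (wanted).
SAMPLE = """
<li class="l5" title="M9">
¥960
</li>
<li class="l5" title="H7">
<div style="font-size:13px">¥1020</div>
</li>
<li class="l5" title="G1">
¥900 <img src='/v1/images/piaojz.gif' alt='该售价票量较紧张!'>
</li>
"""

# The article's rule: the cell content may not contain any tag at all.
STRICT = re.compile(r'<li class="l5" title=[^>]+?>(?P<content>[^<]+?)</li>')

# Hypothetical variant: capture the price itself and allow one optional <img>
# after it; a <div> before the price still prevents a match.
RELAXED = re.compile(
    r'<li class="l5" title=[^>]+?>\s*(?P<content>¥\d+)\s*(?:<img[^>]*>\s*)?</li>'
)

strict_prices = [m.group("content").strip() for m in STRICT.finditer(SAMPLE)]
relaxed_prices = [m.group("content") for m in RELAXED.finditer(SAMPLE)]

print(strict_prices)   # ['¥960']
print(relaxed_prices)  # ['¥960', '¥900']
```

Note that the strict rule captures the surrounding line breaks as part of the value, which is why the sketch calls `strip()` on it.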
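Running a test now collects the 12 records, and they are all correct. The final result is shown in the figure.
Readers who are not yet familiar with regular expressions may want to learn the basics first and then come back to this walkthrough. Rule file download: 机票.ljob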
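To illustrate how the per-field rules combine into one record per flight, here is a minimal Python sketch. It assumes that looped matches are lined up by position (the first l1 match goes with the first l2 match, and so on); how the collector pairs them internally is not described in the article. The sample is shortened to two flights and one price sub-row, and the fields are keyed by their li class names.

```python
import re

SAMPLE = """
<ul class="ul2" id="prifltlistul0" values="航班:3U8781<br>日期:2009-12-17">
<li class="l1 pubFlights_3U">四川航空公司</li>
<li class="l2">3U8781</li>
<li class="l4"><a href="#">320</a></li>
<li class="l7">50/50</li>
<li class="l8">7.5折(M)</li>
<li class="l5" title="M9">
¥960
</li>
</ul>
<div id="otherfltlist0" class="otherfltlist" style="display:none">
<ul class="ul2">
<li class="l1 "></li>
<li class="l2"></li>
<li class="l4"></li>
<li class="l7">50/50</li>
<li class="l8">8折(H)</li>
<li class="l5" title="H7">
<div style="font-size:13px">¥1020</div>
</li>
</ul>
</div>
<ul class="ul2" id="prifltlistul1" values="航班:CA4369<br>日期:2009-12-17">
<li class="l1 pubFlights_CA">中国国际航空公司</li>
<li class="l2">CA4369</li>
<li class="l4"><a href="#">738</a></li>
<li class="l7">50/50</li>
<li class="l8">7.5折(L)</li>
<li class="l5" title="L9">
¥960
</li>
</ul>
"""

# The article's rules, with (?<content>...) rewritten as (?P<content>...).
FIELD_PATTERNS = {
    "l1": r'日期[^>]*?>\s+<li class="l1 pubFlights[^"]+?">(?P<content>[^<]*?)</li>',
    "l2": r'<li class="l2">(?P<content>[^<]+?)</li>',
    "l7": r'</a></li>\s+<li class="l7">(?P<content>[^<]+?)</li>',
    "l8": r'</a></li>\s+<li class="l7">[^<]+?</li>\s+<li class="l8">(?P<content>[^<]+?)</li>',
    "l5": r'<li class="l5" title=[^>]+?>(?P<content>[^<]+?)</li>',
}


def extract(pattern: str, html: str) -> list[str]:
    """Return every captured value for one field, trimmed of whitespace."""
    return [m.group("content").strip() for m in re.finditer(pattern, html)]


columns = {name: extract(pattern, SAMPLE) for name, pattern in FIELD_PATTERNS.items()}

# Positional pairing only makes sense if every field matched equally often.
if len({len(values) for values in columns.values()}) != 1:
    raise ValueError("field match counts differ; records would be misaligned")

records = [dict(zip(columns, row)) for row in zip(*columns.values())]
for record in records:
    print(record)
```

The length check is the useful part of the design: if one rule also matches a sub-row, or misses a main row, the counts differ and the mismatch is reported instead of silently shifting values between flights.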
Related information
- Original link: http://www.caijibbs.com/show.php?id=121
