C# regular expressions: extracting web page content (solution approach)

Popularity: 118   Posted: 2016-05-05 03:56:17.0
HTML source:

</span>
      </div>

      <p class="orange f14 mt10">        交稿：</p>
                    <p class="bidc">
            http://www.jmbbs.com/forum.php?mod=viewthread&tid=3326260<br />
http://club.iweihai.cn/read-htm-tid-18259033.html
            </p>
              <div class="ntos clearfix" id="h_74674016">
        <span class="time" title="2015-04-28 17:07:22">2015-04-28</span>
        参与编号 #<a class="bidid" href="http://task.zhubajie.com/5554404/74674016.html">74674016</a>
          来自：<a class='likt' href='http://www.zhubajie.com' target='_blank'>猪八戒网</a>
        <span class="nobrowse">雇主未浏览</span>                <a href="javascript:;" act-type="workscomment" act-href="http://task.zhubajie.com/api/worksComment-wid-74674016.html" id="c_74674016">评论</a>
                                <a href="javascript:;" act-type="report" act-href="http://u.zhubajie.com/report/index" act-data="cata=1&wid=74674016" class="report">举报</a>
            &nbsp;&nbsp;&nbsp;
                        
                
        <div class="desc-mov">
                                                    </div>
              </div>
            <div id="wid_74674016"></div>
      
      
     
    </dd>
  </dl>
  <dl class="js-alert ">
    <dt><a target="_blank" href="http://shop.zhubajie.com/5593955/"><a class="user-card" act-data="uid=5593955&type=" href="http://shop.zhubajie.com/5593955/" target="_blank"><img id="avatar" src="http://avatar.zbjimg.com/005/59/39/200x200_avatar_55.jpg!small" class="touxiangall"  border="0" onerror="this.onerror=null;this.src='http://t4.zbjimg.com/r/p/task/48.gif'" alt="gillian2" /></a></a>
                            
                </dt>
    <dd>
        <div class="works-state "></div>
      <div class="usertitle">
            <a target="_blank" href="http://shop.zhubajie.com/5593955/">gillian2</a>
            
            <span class="titlelinks">
                      <a href="http://shop.zhubajie.com/5593955/evaluation.html"><img src="http://t5.zbjimg.com/r/pic/zbj4.gif" alt="猪四戒" title="能力等级：猪四戒，能力值：4863"  align="absmiddle" /></a>
                 </span>
      </div>

      <p class="orange f14 mt10">        交稿：</p>
                    <p class="bidc">
            http://www.hjsq.cn/thread-503945-1-1.html<br />
http://www.xgrb.cn/bbs/read-htm-tid-4607683.html
            </p>
              <div class="ntos clearfix" id="h_74674023">
        <span class="time" title="2015-04-28 17:07:33">2015-04-28</span>
        参与编号 #<a class="bidid" href="http://task.zhubajie.com/5554404/74674023.html">74674023</a>
          来自：<a class='likt' href='http://www.zhubajie.com' target='_blank'>猪八戒网</a>
        <span class="nobrowse">雇主未浏览</span>                <a href="javascript:;" act-type="workscomment" act-href="http://task.zhubajie.com/api/worksComment-wid-74674023.html" id="c_74674023">评论</a>
                                <a href="javascript:;" act-type="report" act-href="http://u.zhubajie.com/report/index" act-data="cata=1&wid=74674023" class="report">举报</a>
            &nbsp;&nbsp;&nbsp;
        <div class="desc-mov">
                                                    </div>
              </div>
            <div id="wid_74674023"></div>
    </dd>
  </dl>
  <dl class="js-alert ">
    <dt><a target="_blank" href="http://shop.zhubajie.com/10184739/"><a class="user-card" act-data="uid=10184739&type=" href="http://shop.zhubajie.com/10184739/" target="_blank"><img id="avatar" src="http://avatar.zbjimg.com/010/18/47/200x200_avatar_39.jpg!small" class="touxiangall"  border="0" onerror="this.onerror=null;this.src='http://t4.zbjimg.com/r/p/task/48.gif'" alt="zbj精准推广" /></a></a>
                </dt>
    <dd>
        <div class="works-state "></div>
      <div class="usertitle">
            <a target="_blank" href="http://shop.zhubajie.com/10184739/">zbj精准推广</a>
            <span class="titlelinks">
                      <a href="http://shop.zhubajie.com/10184739/evaluation.html"><img src="http://t5.zbjimg.com/r/pic/zbj4.gif" alt="猪四戒" title="能力等级：猪四戒，能力值：2459"  align="absmiddle" /></a>
                 </span>
      </div>

      <p class="orange f14 mt10">        交稿：</p>
                    <p class="bidc">
            http://bbs.gxsky.com/thread-11894333-1-1.html<br />
http://www.xq0757.com/read.php?tid=959922
            </p>
              <div class="ntos clearfix" id="h_74674049">
        <span class="time" title="2015-04-28 17:08:16">2015-04-28</span>
        参与编号 #<a class="bidid" href="http://task.zhubajie.com/5554404/74674049.html">74674049</a>
          来自：<a class='likt' href='http://www.zhubajie.com' target='_blank'>猪八戒网</a>
        <span class="nobrowse">雇主未浏览</span>                <a href="javascript:;" act-type="workscomment" act-href="http://task.zhubajie.com/api/worksComment-wid-74674049.html" id="c_74674049">评论</a>
                                <a href="javascript:;" act-type="report" act-href="http://u.zhubajie.com/report/index" act-data="cata=1&wid=74674049" class="report">举报</a>
            &nbsp;&nbsp;&nbsp;                        
                        <div class="desc-mov">
                                                    </div>
              </div>
            <div id="wid_74674049"></div>
    </dd>
  </dl>
  <dl class="js-alert ">
    <dt><a target="_blank" href="http://shop.zhubajie.com/1441277/"><a class="user-card" act-data="uid=1441277&type=" href="http://shop.zhubajie.com/1441277/" target="_blank"><img id="avatar" src="http://avatar.zbjimg.com/001/44/12/200x200_avatar_77.jpg!small" class="touxiangall"  border="0" onerror="this.onerror=null;this.src='http://t4.zbjimg.com/r/p/task/48.gif'" alt="高质量论坛推广" /></a></a>
                            
                </dt>
    <dd>
        <div class="works-state "></div>
      <div class="usertitle">
            <a target="_blank" href="ht


I want to extract the content between <p class="bidc"> and </p>, as in this part of the source above:

                    <p class="bidc">
            http://bbs.gxsky.com/thread-11894333-1-1.html<br />
http://www.xq0757.com/read.php?tid=959922
            </p>

That is, http://bbs.gxsky.com/thread-11894333-1-1.html<br /> and http://www.xq0757.com/read.php?tid=959922


My faulty code [1]:

            Regex Reg = new Regex(@"<p\sclass=""bidc"">([^<]+)</p>", RegexOptions.IgnoreCase);
            MatchCollection matches = Reg.Matches(str);

            foreach (Match m in matches)
            {
                listBox1.Items.Add(m);
                sw.Write(m + "\r\n");
            }
            sw.Close();

This code can only capture a single line of URL; if there are two lines between the two tags, it extracts nothing at all.
My faulty code [2]:


            Regex Reg = new Regex(@"<p\sclass=""bidc"">([^$]+)</p>", RegexOptions.IgnoreCase);
            MatchCollection matches = Reg.Matches(str);

            foreach (Match m in matches)
            {
                listBox1.Items.Add(m);
                sw.Write(m + "\r\n");
            }
            sw.Close();

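This code does capture the content between the two tags, but it also captures everything after it that I don't need.

Any pointers from the experts would be appreciated!
------Solution----------------------
(?<=<p class="bidc">)[\s\S]*?(?=</p>)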
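
Why the two attempts fail: in code [1], [^<]+ cannot get past the < of <br />, so a block with two URLs never reaches </p>. In code [2], [^$]+ matches every character except a literal dollar sign and is greedy, so it runs on to the last </p> in the page. The pattern above uses a lazy [\s\S]*? between a lookbehind and a lookahead, so it stops at the first </p> and leaves the tags out of the match. Below is a minimal sketch of how it might be used from C#. The class name BidcExtractor, the method ExtractUrls and the splitting on <br /> are illustrative assumptions, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; the names are not from the original thread.
public static class BidcExtractor
{
    // The pattern from the answer: lazy [\s\S]*? crosses newlines and
    // stops at the first </p> after <p class="bidc">.
    private static readonly Regex BidcBlock = new Regex(
        @"(?<=<p class=""bidc"">)[\s\S]*?(?=</p>)",
        RegexOptions.IgnoreCase);

    // Returns one string array per <p class="bidc"> block,
    // holding the URLs found inside that block.
    public static List<string[]> ExtractUrls(string html)
    {
        var result = new List<string[]>();
        foreach (Match m in BidcBlock.Matches(html))
        {
            // Split on <br />, CR and LF, then trim and drop empty pieces.
            string[] parts = Regex.Split(m.Value, @"<br\s*/?>|\r|\n", RegexOptions.IgnoreCase);
            var urls = new List<string>();
            foreach (string part in parts)
            {
                string url = part.Trim();
                if (url.Length > 0)
                {
                    urls.Add(url);
                }
            }
            result.Add(urls.ToArray());
        }
        return result;
    }

    public static void Main()
    {
        // A shortened stand-in for the page source shown in the question.
        string html =
            "<p class=\"orange f14 mt10\">x</p>\n" +
            "<p class=\"bidc\">\n" +
            "    http://bbs.gxsky.com/thread-11894333-1-1.html<br />\n" +
            "http://www.xq0757.com/read.php?tid=959922\n" +
            "    </p>\n" +
            "<div class=\"ntos clearfix\"></div>";

        foreach (string[] urls in ExtractUrls(html))
        {
            foreach (string url in urls)
            {
                Console.WriteLine(url);
            }
        }
    }
}
```

With the stand-in string in Main, this prints the two URLs, one per line. In the original code, the Add and Write calls would go inside the inner loop. If the real page can contain extra attributes or different spacing in the <p> tag, the literal lookbehind will not match, and an HTML parser is likely the more robust choice.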