
Solution: extracting specific content from a web page with a regular expression

Popularity: 620   Published: 2013-02-25
Question: extracting specific content from a web page with a regular expression
<Table border>
<FORM NAME="F1" ACTION="" METHOD="POST" TARGET="_top">
<TR><TD ALIGN="left" BGCOLOR="Gainsboro" NOWRAP>JCRB No.</TD>
<TD ALIGN="left">JCRB0120</FONT></TD></TR><TR><TD ALIGN="left" BGCOLOR="Gainsboro" NOWRAP>Cell Name</TD>
<TD ALIGN="left">YG10007</TD></TR>
<TR><TD ALIGN="left" BGCOLOR="Gainsboro" NOWRAP>Profile</TD>
<TD ALIGN="left"></TD></TR><TR><TD ALIGN="left" BGCOLOR="Gainsboro" NOWRAP>Animal</TD>
<TD ALIGN="left">Chinese hamster</TD></TR><TR><TD ALIGN="left" BGCOLOR="Gainsboro" NOWRAP>Species</TD>
<TD ALIGN="left">Cricetulus griseus</TD></TR><TR><TD ALIGN="left" BGCOLOR="Gainsboro" NOWRAP>Sex</TD>
<TD ALIGN="left">F</TD></TR><TR><TD ALIGN="left" BGCOLOR="Gainsboro" NOWRAP>Age</TD>
<TD ALIGN="left"></TD></TR><TR><TD ALIGN="left" BGCOLOR="Gainsboro" NOWRAP>Tissue</TD>
<TD ALIGN="left">lung</TD></TR><TR><TD ALIGN="left" BGCOLOR="Gainsboro" NOWRAP>Case History</TD>
<TD ALIGN="left"></TD></TR><TR><TD ALIGN="left" BGCOLOR="Gainsboro" NOWRAP>Metastasis</TD>
<TD ALIGN="left"></TD></TR><TR><TD ALIGN="left" BGCOLOR="Gainsboro" NOWRAP>Genetics</TD>
<TD ALIGN="left">human monomorphic N-acetyltransferase gene introduced CHL cell line</TD></TR><TR><TD ALIGN="left" BGCOLOR="Gainsboro" NOWRAP>Lifespan</TD>
<TD ALIGN="left">infinite</TD></TR><TR><TD ALIGN="left" BGCOLOR="Gainsboro" NOWRAP>Morphology</TD>
<TD ALIGN="left">epithelial-like</TD></TR><TR><TD ALIGN="left" BGCOLOR="Gainsboro" NOWRAP>Characteristics</TD>
<TD ALIGN="left">G418-resistant (500 ug/ml), pMAMneo co-transfected</TD></TR><TR><TD ALIGN="left" BGCOLOR="Gainsboro" NOWRAP>Classification</TD>
<TD ALIGN="left">artificial</TD></TR><TR><TD ALIGN="left" BGCOLOR="Gainsboro" NOWRAP>Establisher</TD>
<TD ALIGN="left">Watanabe,M. et al.</TD></TR><TR><TD ALIGN="left" BGCOLOR="Gainsboro" NOWRAP>Depositor</TD>
<TD ALIGN="left">Nohmi,T.</TD></TR><TR><TD ALIGN="left" BGCOLOR="Gainsboro" NOWRAP>Medium</TD>
<TD ALIGN="left">Eagle's minimal essential medium with 10% calf serum and 500 ug/ml G-418.</TD></TR><TR><TD ALIGN="left" BGCOLOR="Gainsboro" NOWRAP>Passage Method</TD>
<TD ALIGN="left">Cells are treated with 0.25% trypsin and 0.02% EDTA.</TD></TR><TR><TD ALIGN="left" BGCOLOR="Gainsboro" NOWRAP>Passage Cell No.</TD>
<TD ALIGN="left"></TD></TR></Table>



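I need to extract the content marked in red in the HTML above. Note: the lower of the two values may also be empty. Any advice from the experts would be appreciated.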

------Solution--------------------------------------------------------

C# code
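 // Requires: using System.Text.RegularExpressions;
 // (?is) = ignore case, and let '.' match newlines. Both groups use [^<]*?, so an empty cell yields an empty string.
 Regex re = new Regex(@"(?is)<table\s*border>.*?<TD\s*ALIGN=""left""\s*BGCOLOR=""Gainsboro""\s*NOWRAP>JCRB No\.</TD>.*?<TD\s*ALIGN=""left"">([^<]*?)</FONT></TD>.*?<TD\s*ALIGN=""left""\s*BGCOLOR=""Gainsboro""\s*NOWRAP>Classification</TD>.*?<TD\s*ALIGN=""left"">([^<]*?)</TD>", RegexOptions.None);
 Match ma = re.Match("the string to extract from");
 // ma.Groups[1].Value;    value: JCRB0120
 // ma.Groups[2].Value;    value: artificial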
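
Because the asker notes that a value may be empty, one alternative is to collect every label/value pair into a dictionary and look fields up by name, rather than hard-coding two labels into one long pattern. The following is a minimal sketch and not part of the original answer: the class name `CellTableParser` and the method `Parse` are hypothetical, and it assumes that every label cell carries `BGCOLOR="Gainsboro"` and is immediately followed by its value cell, as in the sample HTML above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the original thread).
public static class CellTableParser
{
    // One label cell (the one with BGCOLOR="Gainsboro") followed by its value cell.
    // Group 1 = label text, group 2 = value text (may be empty).
    // The optional </FONT> tolerates the stray closing tag in the sample's first row.
    private static readonly Regex RowPattern = new Regex(
        @"<TD\b[^>]*\bBGCOLOR=""Gainsboro""[^>]*>([^<]*)</TD>\s*" +
        @"<TD\b[^>]*>([^<]*)(?:</FONT>)?</TD>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns a case-insensitive map from label to value for every row matched.
    public static Dictionary<string, string> Parse(string html)
    {
        var fields = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (Match m in RowPattern.Matches(html))
        {
            // Later rows overwrite earlier ones if a label repeats.
            fields[m.Groups[1].Value.Trim()] = m.Groups[2].Value.Trim();
        }
        return fields;
    }
}
```

With the sample table, `CellTableParser.Parse(html)["JCRB No."]` would be `JCRB0120` and `["Classification"]` would be `artificial`, while empty cells such as `Profile` map to an empty string. For pages whose markup varies more than this sample, a real HTML parser is usually a safer choice than a regular expression.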