A database web page has the following structure:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<meta http-equiv="refresh" content="60">
<head><title>
</title></head>
<body style="background-color:Black">
<form name="form1" method="post" action="showdataa.aspx?subid=2734" id="form1">
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUKMTUwNjI1NTI2OWRkdqjS+lG4RW26XPLuxnp4Q/glwSg=" />
<div style="color:Red; margin-left:20px; margin-right:auto;">
<p align="Left"><font name="realdata" style="font-size:10pt; margin-top:10px;">
<p style=" margin-left:-15px;">2015年9月6日 17:18</p>
<p >农业药械: </p>
<p style=" margin-top:-10px">瞬时流量:175.30 m3/h </p>
<p style=" margin-top:-10px">累计流量:79438 m3 </p>
<br />
<!-- -->
</font></p>
</div>
</form>
</body>
</html>
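My program downloads the page to an HTML file named 1400.html, whose source is shown above. I now want to extract the line <p >农业药械: </p>. How do I code this?
------Solution approach----------------------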
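Imports System.Text
Imports System.Text.RegularExpressions

Public Class Form1
    Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
        ' Read the saved page. Encoding.Default is the system ANSI code page;
        ' change it if the file was saved in a different encoding.
        Dim strWeb As String = IO.File.ReadAllText(Application.StartupPath & "\1400.html", Encoding.Default)
        MsgBox(strWeb)

        ' Match a <p> that has no attributes. \s* tolerates the space in "<p >",
        ' which a literal "<p>" pattern would miss.
        Dim re As New Regex("<p\s*>(.*?)</p>", RegexOptions.IgnoreCase)
        If re.IsMatch(strWeb) Then
            MsgBox(re.Match(strWeb).Groups(1).Value)
        End If

        ' Match every <p> whose style attribute contains "top" (the two flow readings).
        re = New Regex("<p style=.*?top.*?>(.*?)</p>", RegexOptions.IgnoreCase)
        If re.IsMatch(strWeb) Then
            For Each mat As Match In re.Matches(strWeb)
                MsgBox(mat.Groups(1).Value)
            Next
        End If
    End Sub
End Class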
Here you go, no thanks needed. Search Baidu for "regular expressions" to learn more.
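If you want to try the pattern without a form or the 1400.html file, here is a minimal console sketch. The module name ExtractDemo and the helper ExtractLabel are made up for illustration, and the sample string is just three lines copied from the page above. The point it shows is that the tag in the page is written as "<p >" with a space, so the pattern uses \s* before the closing bracket.

```vbnet
Imports System.Text.RegularExpressions

Module ExtractDemo
    ' Hypothetical helper (not from the original answer): returns the trimmed text
    ' of the first <p> element that has no attributes, or Nothing if none is found.
    Function ExtractLabel(html As String) As String
        ' \s* allows optional whitespace before ">", so "<p >" matches as well as "<p>".
        ' Singleline lets "." cross line breaks in case the text is wrapped.
        Dim m As Match = Regex.Match(html, "<p\s*>(.*?)</p>",
                                     RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        If m.Success Then Return m.Groups(1).Value.Trim()
        Return Nothing
    End Function

    Sub Main()
        ' Sample input: three lines taken from the page source in the question.
        Dim sample As String =
            "<p style="" margin-left:-15px;"">2015年9月6日 17:18</p>" & vbLf &
            "<p >农业药械: </p>" & vbLf &
            "<p style="" margin-top:-10px"">瞬时流量:175.30 m3/h </p>"

        ' Prints the text inside the attribute-less <p>, with surrounding spaces removed.
        Console.WriteLine(ExtractLabel(sample))
    End Sub
End Module
```

The tags that carry a style attribute are skipped because after "<p" and the optional whitespace the pattern requires ">" immediately. For anything more complex than this fixed page layout, an HTML parser is usually more robust than a regex.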