
Python BeautifulSoup (bs4): searching with regular expressions

Popularity: 78   Published: 2024-02-12 04:18:38.0

First, here is the HTML content of the page to be searched:

<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" id="hehe"><b>The Dormouse's story</b></p>
<p class="story" id="firstpara">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>

Searching with a regular expression

from bs4 import BeautifulSoup
import re
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" id="hehe"><b>The Dormouse's story</b></p>
<p class="story" id="firstpara">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
# The 'lxml' parser requires the lxml package to be installed
soup = BeautifulSoup(html, 'lxml')
# An attribute filter accepts a compiled regex: match every tag whose href starts with "http:"
pid = soup.find_all(href=re.compile("^http:"))
print(pid)

The output is:

[
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, 
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, 
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
]
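
The same idea can be applied to other filters. The sketch below is only an illustration built on the sample page above: it passes compiled patterns as the tag name, as the id filter, and as the href filter. It assumes the built-in 'html.parser' so that lxml is not needed, and the variable names (tag_names, link_ids, selected) are made up for this example.

```python
import re
from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" id="hehe"><b>The Dormouse's story</b></p>
<p class="story" id="firstpara">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Assumption: use the standard-library parser instead of lxml
soup = BeautifulSoup(html, "html.parser")

# A pattern as the first argument filters on the tag name (here: names starting with "b")
tag_names = [tag.name for tag in soup.find_all(re.compile("^b"))]

# A pattern on the id attribute, restricted to <a> tags
link_ids = [a["id"] for a in soup.find_all("a", id=re.compile(r"^link\d+$"))]

# A pattern on href that keeps only two of the three links
selected = [a["href"] for a in soup.find_all(href=re.compile(r"/(lacie|tillie)$"))]

print(tag_names)
print(link_ids)
print(selected)
```

Beautiful Soup applies the pattern with a search rather than a full match, so anchors such as ^ and $ are needed whenever the pattern should be tied to the start or end of the value.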