BeautifulSoup in Python: General Notes

Overview

BeautifulSoup is used to extract data from HTML and XML documents.
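As a minimal sketch of what "extracting data" looks like in practice, the snippet below parses a small inline string instead of a file. The markup and the variable names are made up for illustration; only `BeautifulSoup`, the built-in `html.parser` backend, and `get_text()` come from the library itself.

```python
from bs4 import BeautifulSoup

# Hypothetical inline markup, used only for illustration
markup = "<html><body><h1>Title</h1><p class='intro'>Hello</p></body></html>"

# Parse the string with Python's built-in HTML parser
soup = BeautifulSoup(markup, "html.parser")

# Attribute-style access returns the first tag with that name
heading_text = soup.h1.get_text()

# Tag attributes are read like dictionary keys; "class" is multi-valued,
# so Beautiful Soup returns it as a list
intro_class = soup.p["class"]

print(heading_text)
print(intro_class)
```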

Common methods

find and find_all

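Purpose:
- find: returns the first element that matches the given criteria
- find_all: returns every element that matches the given criteria
- find returns a single bs4 Tag object (or None when nothing matches); find_all returns a list-like ResultSet of Tag objects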
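The difference in return types can be checked directly. The following is a small sketch on a made-up fragment (the markup and variable names are assumptions for illustration); it also shows what each method gives back when nothing matches.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag, ResultSet

# Hypothetical fragment, used only for illustration
markup = '<div><p id="a">King</p><p id="b">Prince</p></div>'
soup = BeautifulSoup(markup, "html.parser")

first = soup.find("p")           # a single Tag: the first <p>
every = soup.find_all("p")       # a ResultSet holding every <p>
missing = soup.find("span")      # no match: find gives None
none_found = soup.find_all("span")  # no match: find_all gives an empty ResultSet

print(type(first).__name__, first.get_text())
print(type(every).__name__, len(every))
print(missing, len(none_found))
```

Because find can return None, chaining a method call straight onto it raises AttributeError when the element is absent, whereas looping over an empty find_all result simply does nothing.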

Usage:

Sample HTML document

<!DOCTYPE html>
<html><head>
Geeks For Geeks
</head><body>
<div><p id="vinayak">King</p><p id="vinayak1">Prince</p><p id="vinayak2">Queen</p></div>
<p id="vinayak3">Princess</p></body></html>

find()

# find() example
# Import BeautifulSoup and os
from bs4 import BeautifulSoup as bs
import os

# Directory that contains this script
base = os.path.dirname(os.path.abspath(__file__))

# Open the HTML file and parse it with Beautiful Soup
with open(os.path.join(base, 'gfg.html')) as html:
    soup = bs(html, 'html.parser')

# Find the first <p> whose id is "vinayak" and get its text
find_example = soup.find("p", {"id": "vinayak"}).get_text()

# Print the text obtained in the previous step
print(find_example)
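The example above assumes the element exists. A slightly more defensive variant might look like the sketch below, which embeds the sample document as a string so it runs without gfg.html. The helper name `text_of` and its `default` parameter are hypothetical; the `id=` keyword form is an equivalent way to pass the same filter as the `{"id": ...}` dictionary.

```python
from bs4 import BeautifulSoup

# The sample document from this article, embedded as a string
sample = """<!DOCTYPE html>
<html><head>
Geeks For Geeks
</head><body>
<div><p id="vinayak">King</p><p id="vinayak1">Prince</p><p id="vinayak2">Queen</p></div>
<p id="vinayak3">Princess</p></body></html>"""

soup = BeautifulSoup(sample, "html.parser")


def text_of(soup, tag_id, default=""):
    """Return the text of the <p> with the given id, or default if absent.

    Hypothetical helper for illustration, not part of Beautiful Soup.
    """
    node = soup.find("p", id=tag_id)
    return node.get_text() if node is not None else default


found = text_of(soup, "vinayak")
absent = text_of(soup, "no-such-id", default="(not found)")

print(found)
print(absent)
```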

find_all()

# find_all() example
# Import BeautifulSoup and os
from bs4 import BeautifulSoup as bs
import os

# Directory that contains this script
base = os.path.dirname(os.path.abspath(__file__))

# Open the HTML file and parse it with Beautiful Soup
with open(os.path.join(base, 'gfg.html')) as html:
    soup = bs(html, 'html.parser')

# Loop over every <p> tag in the document
for word in soup.find_all('p'):
    # Obtain the text from the tag
    find_all_example = word.get_text()

    # Print the text obtained in the previous step
    print(find_all_example)
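find_all also accepts a few optional arguments that narrow the result. The sketch below runs against the same sample document, embedded as a string so it does not depend on gfg.html; the variable names are illustrative. It uses `limit` to cap the number of matches and `recursive=False` to restrict the search to direct children.

```python
from bs4 import BeautifulSoup

# The sample document from this article, embedded as a string
sample = """<!DOCTYPE html>
<html><head>
Geeks For Geeks
</head><body>
<div><p id="vinayak">King</p><p id="vinayak1">Prince</p><p id="vinayak2">Queen</p></div>
<p id="vinayak3">Princess</p></body></html>"""

soup = BeautifulSoup(sample, "html.parser")

# Every <p> in document order, collected with a list comprehension
all_texts = [p.get_text() for p in soup.find_all("p")]

# Stop after the first two matches
first_two = [p.get_text() for p in soup.find_all("p", limit=2)]

# Search only inside the <div>
in_div = [p.get_text() for p in soup.div.find_all("p")]

# Only <p> tags that are direct children of <body>
top_level = [p.get_text() for p in soup.body.find_all("p", recursive=False)]

print(all_texts)
print(first_two)
print(in_div)
print(top_level)
```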