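Overview
Beautiful Soup extracts data from HTML and XML documents.
Common methods
find and find_all
- Function:
- find: returns the first element that matches the given criteria
- find_all: returns every element that matches the given criteria
- Return types: find returns a single bs4 Tag object (or None when nothing matches); find_all returns a ResultSet, a list-like collection of Tag objects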
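The difference in return types can be checked directly. The following is a minimal sketch, assuming bs4 is installed; the inline `SAMPLE` string and the variable names are hypothetical and are not part of the original notes.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical inline snippet standing in for a real page.
SAMPLE = '<div><p id="a">King</p><p id="b">Queen</p></div>'

soup = BeautifulSoup(SAMPLE, "html.parser")

first = soup.find("p")              # a single Tag: the first <p>
every = soup.find_all("p")          # a ResultSet holding both <p> Tags
missing = soup.find("span")         # None, because there is no <span>
none_found = soup.find_all("span")  # an empty ResultSet, not None

print(isinstance(first, Tag), first.get_text())
print(len(every), [p.get_text() for p in every])
print(missing, len(none_found))
```

Because find can return None, chaining `.get_text()` straight onto it raises AttributeError when nothing matches, so checking the result first is usually the safer pattern.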
- Usage:
- Sample HTML document (gfg.html)

    <!DOCTYPE html>
    <html>
    <head> Geeks For Geeks </head>
    <body>
    <div>
    <p id="vinayak">King</p>
    <p id="vinayak1">Prince</p>
    <p id="vinayak2">Queen</p>
    </div>
    <p id="vinayak3">Princess</p>
    </body>
    </html>

- find()

    # find() example
    import os

    from bs4 import BeautifulSoup as bs

    # Directory that contains this script
    base = os.path.dirname(os.path.abspath(__file__))

    # Open the HTML file and parse it; the with block closes the file afterwards
    with open(os.path.join(base, 'gfg.html')) as html:
        soup = bs(html, 'html.parser')

    # Find the first <p> whose id is "vinayak" and extract its text
    find_example = soup.find("p", {"id": "vinayak"}).get_text()

    # Print the extracted text
    print(find_example)

- find_all()

    # find_all() example
    import os

    from bs4 import BeautifulSoup as bs

    # Directory that contains this script
    base = os.path.dirname(os.path.abspath(__file__))

    # Open the HTML file and parse it; the with block closes the file afterwards
    with open(os.path.join(base, 'gfg.html')) as html:
        soup = bs(html, 'html.parser')

    # Loop over every <p> tag in the document
    for word in soup.find_all('p'):
        # Extract the text of the current tag
        find_all_example = word.get_text()
        # Print the extracted text
        print(find_all_example)

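The two examples above read gfg.html from disk. As a self-contained variant, the sketch below inlines the same sample document as a string and tries a few common filters: the `id=` keyword, the `limit` argument of find_all, and a search scoped to a single tag. The variable names are illustrative only and do not come from the original notes.

```python
from bs4 import BeautifulSoup

# The sample document from these notes, inlined so no gfg.html file is needed.
SAMPLE_HTML = """<!DOCTYPE html>
<html><head> Geeks For Geeks </head><body>
<div><p id="vinayak">King</p><p id="vinayak1">Prince</p><p id="vinayak2">Queen</p></div>
<p id="vinayak3">Princess</p>
</body></html>"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Text of every <p>, in document order.
all_texts = [p.get_text() for p in soup.find_all("p")]

# Keyword filter: the <p> whose id attribute is "vinayak3".
by_id = soup.find("p", id="vinayak3").get_text()

# limit stops the search after the first two matches.
first_two = [p.get_text() for p in soup.find_all("p", limit=2)]

# Calling find_all on a Tag searches only inside that tag.
in_div = [p.get_text() for p in soup.find("div").find_all("p")]

print(all_texts)   # ['King', 'Prince', 'Queen', 'Princess']
print(by_id)       # Princess
print(first_two)   # ['King', 'Prince']
print(in_div)      # ['King', 'Prince', 'Queen']
```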
Related links
- Sample HTML document