Beautiful Soup in Python

Published: 2023-11-26 19:11:58

Overview

Beautiful Soup (the bs4 package) is a Python library for extracting data from HTML and XML documents.

Common methods

find and find_all

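  • Purpose:
    • find: returns the first element that matches the given criteria
    • find_all: returns every element that matches the given criteria
    • find returns a bs4 Tag object (or None when nothing matches); find_all returns a list-like bs4 ResultSet of Tag objects
  • Usage:
    • Sample HTML document (saved as gfg.html)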
      • <!DOCTYPE html>
        <html><head>
        <title>Geeks For Geeks</title>
        </head><body>
        <div><p id="vinayak">King</p><p id="vinayak1">Prince</p><p id="vinayak2">Queen</p></div>
        <p id="vinayak3">Princess</p></body></html>
        

    • find()
      • # find() example
        # Import BeautifulSoup and os
        from bs4 import BeautifulSoup as bs
        import os

        # Directory containing this script (its absolute path minus the last segment)
        base = os.path.dirname(os.path.abspath(__file__))

        # Open the HTML file and parse it with Beautiful Soup;
        # the with block closes the file once parsing is done
        with open(os.path.join(base, 'gfg.html')) as html:
            soup = bs(html, 'html.parser')

        # Find the first <p> tag whose id is "vinayak" and get its text
        find_example = soup.find("p", {"id": "vinayak"}).get_text()

        # Print the text obtained in the previous step
        print(find_example)
        

    • find_all()
      • # find_all() example
        # Import BeautifulSoup and os
        from bs4 import BeautifulSoup as bs
        import os

        # Directory containing this script (its absolute path minus the last segment)
        base = os.path.dirname(os.path.abspath(__file__))

        # Open the HTML file and parse it with Beautiful Soup;
        # the with block closes the file once parsing is done
        with open(os.path.join(base, 'gfg.html')) as html:
            soup = bs(html, 'html.parser')

        # Loop over every <p> tag in the document
        for word in soup.find_all('p'):
            # Get the text of the current tag
            find_all_example = word.get_text()

            # Print the text obtained in the previous step
            print(find_all_example)
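
The difference in return values between find and find_all can be checked without a file on disk. The following is a minimal sketch, assuming the body of the sample document is passed to Beautiful Soup as a string; the id "no-such-id" and the helper text_or_default are hypothetical names introduced only for illustration.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Inline copy of the sample document's body, so no gfg.html file is needed.
SAMPLE_HTML = """
<div><p id="vinayak">King</p><p id="vinayak1">Prince</p><p id="vinayak2">Queen</p></div>
<p id="vinayak3">Princess</p>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() returns the first matching Tag, or None when nothing matches.
first_p = soup.find("p", {"id": "vinayak"})
missing = soup.find("p", {"id": "no-such-id"})  # hypothetical id, absent from the sample

# find_all() returns a list-like ResultSet, which is empty when nothing matches.
all_p = soup.find_all("p")
texts = [p.get_text() for p in all_p]


def text_or_default(tag, default=""):
    """Hypothetical helper: return the tag's text, or a default if find() returned None."""
    return tag.get_text() if tag is not None else default


print(isinstance(first_p, Tag))               # True
print(text_or_default(first_p))               # King
print(text_or_default(missing, "not found"))  # not found
print(texts)                                  # ['King', 'Prince', 'Queen', 'Princess']
```

Calling get_text() directly on the result of find() raises AttributeError when nothing matches, so checking for None first, as the helper does, is one way to guard against that.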
        
