How to Locate Elements Using Selenium Python?

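As web content technology keeps changing, the content of web pages and websites often needs to be redesigned and restructured. Selenium with Python is a good combination for extracting the required content from web pages. Selenium is a free, open-source automation tool used to test web applications across multiple platforms. Selenium test scripts can be written in various programming languages such as Java, C#, Python, NodeJS, PHP, and Perl. This Python Selenium article uses two examples to show how to locate the elements of a web page. In both examples, Selenium loads a news website and BeautifulSoup parses the loaded page to find the elements and extract their content.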

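Locating the elements to extract:

Open the website whose content is to be extracted. Right-click on the page and open the Inspect window. Highlight an element or section of the page and look at its HTML markup (tag name, class, id) in the Inspect window. Use these details to locate the element.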

Example 1: Locating div elements with a specific class name using Selenium with Python

Algorithm

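Step 1 - Download the Chrome driver (chromedriver) whose version matches the installed Chrome browser. Save the driver in the same folder as the Python file.
Step 2 - Import BeautifulSoup for parsing and set START_URL = "https://www.indiatoday.in/science". The div elements will be located by the class "story__grid".
Step 3 - Specify the website URL and start the driver to get the URL.
Step 4 - Parse the fetched page using BeautifulSoup.
Step 5 - Search for the div tags with the required class.
Step 6 - Extract the content. Print it and turn it into an HTML document by enclosing the extracted content in HTML tags.
Step 7 - Write the output HTML file. Run the program. Open the output HTML file and check the result.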
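
Steps 4 to 6 can be tried without a browser by feeding BeautifulSoup a small HTML string in place of driver.page_source. This is only an illustrative sketch: SAMPLE_HTML is made-up markup, not the real page, and the helper name extract_divs is hypothetical.

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for driver.page_source
SAMPLE_HTML = """
<html><body>
  <div class="story__grid"><h2>First story</h2></div>
  <div class="sidebar">Not wanted</div>
  <div class="story__grid"><h2>Second story</h2></div>
</body></html>
"""


def extract_divs(page_source, class_name):
    """Parse page_source and return matching div tags as HTML strings."""
    soup = BeautifulSoup(page_source, "html.parser")
    # attrs maps an attribute name to the value it must have
    matches = soup.find_all("div", attrs={"class": class_name})
    return [str(tag) for tag in matches]


fragments = extract_divs(SAMPLE_HTML, "story__grid")
for fragment in fragments:
    print(fragment)
```

Only the two div elements with class "story__grid" are kept; the sidebar div is skipped.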

Example

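from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup
import time

START_URL = "https://www.indiatoday.in/science"

# Selenium 4 receives the driver path through a Service object
driver = webdriver.Chrome(service=Service("./chromedriver"))
driver.get(START_URL)
# Give the page time to finish loading its content
time.sleep(10)

def scrape():
   temp_l = []
   soup = BeautifulSoup(driver.page_source, "html.parser")
   # attrs is a dict that maps the attribute name to the value to match
   for div_tag in soup.find_all("div", attrs={"class": "story__grid"}):
      temp_l.append(str(div_tag))
   print(temp_l)

   enclosing_start = "<html><head><link rel='stylesheet' " + "href='styles.css'></head><body>"
   enclosing_end = "</body></html>"

   # The with block closes the file on exit, so f.close() is not needed
   with open('restructuredarticle.html', 'w+', encoding='utf-16') as f:
      f.write(enclosing_start)
      f.write('\n' + '<p> EXTRACTED CONTENT START </p>' + '\n')
      for items in temp_l:
         f.write('%s' % items)
      f.write('\n' + enclosing_end)

   print("File written successfully")

scrape()
# Close the browser once the page has been processed
driver.quit()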
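
The file-writing part of the script (steps 6 and 7) can be separated from the scraping so that it is easy to reuse and to check on its own. The sketch below uses only the standard library. The function name write_html_report and its parameters are hypothetical, and it writes UTF-8 rather than the UTF-16 used in the listing above, which is an assumption made because UTF-8 is the encoding browsers most commonly expect.

```python
from pathlib import Path


def write_html_report(fragments, out_path, stylesheet="styles.css"):
    """Wrap HTML fragments in a minimal page and write it to out_path."""
    head = f"<html><head><link rel='stylesheet' href='{stylesheet}'></head><body>"
    # One fragment per line keeps the output readable in a text editor
    body = "\n".join(fragments)
    document = f"{head}\n<p> EXTRACTED CONTENT START </p>\n{body}\n</body></html>"
    target = Path(out_path)
    target.write_text(document, encoding="utf-8")
    return target
```

With the list built by scrape(), a call such as write_html_report(temp_l, "restructuredarticle.html") would produce the same kind of output file.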

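Output

Run the Python file in the command window:

Open a cmd window and run the script. First, check the output printed in the cmd window. Then open the saved HTML file to view the extracted content.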

Example 2: Locating h4 elements with a specific class name using Selenium with Python

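Step 1 - Download the chromedriver that matches the installed Chrome version. Save the driver in the same folder as the Python file.
Step 2 - Import the BeautifulSoup parser and set START_URL = "https://jamiatimes.in/". The h4 elements will be located by the class "entry-title title".
Step 3 - Specify the website URL and start the driver to get the URL.
Step 4 - Parse the fetched page using BeautifulSoup.
Step 5 - Find the h4 tags with the required class.
Step 6 - Extract the content, print it, and convert it to HTML format by enclosing the extracted content in HTML tags.
Step 7 - Write the output HTML file. Run the program. Open the output HTML file and check the result.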
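
The class value "entry-title title" in step 2 is really two CSS classes. BeautifulSoup compares a space-separated value against the class attribute as one exact string, so the order of the classes matters, while a CSS selector matches both classes in any order. The sketch below illustrates the difference on made-up markup (SAMPLE_HTML is not taken from the real site).

```python
from bs4 import BeautifulSoup

# Made-up markup: the same two classes in both orders, plus a partial match
SAMPLE_HTML = """
<h4 class="entry-title title"><a href="#">Exact order</a></h4>
<h4 class="title entry-title"><a href="#">Reversed order</a></h4>
<h4 class="entry-title">Only one class</h4>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Exact string comparison against the whole class attribute
exact = soup.find_all("h4", attrs={"class": "entry-title title"})
# CSS selector: the element must carry both classes, in any order
both = soup.select("h4.entry-title.title")

exact_texts = [tag.get_text(strip=True) for tag in exact]
both_texts = [tag.get_text(strip=True) for tag in both]
print(exact_texts)
print(both_texts)
```

Here the exact-string search finds only the first heading, while the selector finds the first two. If the site ever emits the classes in a different order, the selector form is the more tolerant choice.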

Example

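from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup
import time

START_URL = "https://jamiatimes.in/"

# Selenium 4 receives the driver path through a Service object
driver = webdriver.Chrome(service=Service("./chromedriver"))
driver.get(START_URL)
# Give the page time to finish loading its content
time.sleep(10)

def scrape():
   temp_l = []
   soup = BeautifulSoup(driver.page_source, "html.parser")
   # Matches h4 tags whose class attribute is exactly "entry-title title"
   for h4_tag in soup.find_all("h4", attrs={"class": "entry-title title"}):
      temp_l.append(str(h4_tag))

   enclosing_start = "<html><head><link rel='stylesheet' " + "href='styles.css'></head> <body>"
   enclosing_end = "</body></html>"

   # The with block closes the file on exit, so f.close() is not needed
   with open('restructuredarticle2.html', 'w+', encoding='utf-16') as f:
      f.write(enclosing_start)
      f.write('\n' + '<p> EXTRACTED CONTENT START </p>' + '\n')
      for items in temp_l:
         f.write('%s' % items)
      f.write('\n' + enclosing_end)

   print("File written successfully")
   print(temp_l)

scrape()
# Close the browser once the page has been processed
driver.quit()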