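How to remove empty tags using BeautifulSoup in Python?
BeautifulSoup is a Python library for extracting data from HTML and XML files. It can also be used to remove empty tags from an HTML or XML document, leaving markup that is easier to read.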
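First, install the BeautifulSoup library in your local environment with the following command. The example below uses the lxml parser, which is a separate package, so it is installed at the same time:
pip install beautifulsoup4 lxml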
Example
# Import the BeautifulSoup library
from bs4 import BeautifulSoup

# The HTML document to clean
html_object = """
<p>Python is an interpreted, high-level and general-purpose
programming language. Python's design
philosophy emphasizes code readability with its notable use of
significant indentation.</p>
"""

# Create a soup object from the HTML document (requires the lxml package)
soup = BeautifulSoup(html_object, "lxml")

# Visit every tag and remove those that contain no text
for x in soup.find_all():
    if len(x.get_text(strip=True)) == 0:
        x.extract()

print(soup)
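Output
Running the code above prints the document with any empty tags removed. This sample contains no empty tags, so the paragraph itself is unchanged; the lxml parser wraps it in <html> and <body> elements.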
<html><body><p>Python is an interpreted, high-level and general-purpose
programming language. Python's design
philosophy emphasizes code readability with its notable use of
significant indentation.</p>
</body></html>
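The sample above has no empty tags to remove, so the following is a minimal sketch of the same loop applied to markup that does contain some. The `remove_empty_tags` helper, the `VOID_TAGS` list, and the sample markup are illustrative assumptions, not part of the original example. The sketch also skips elements such as `<img>` that never hold text, since a check based only on `get_text` would otherwise remove them. It uses Python's built-in "html.parser", which does not wrap the result in `<html>` and `<body>` the way lxml does.

```python
from bs4 import BeautifulSoup

# Void elements never contain text, so they are kept rather than treated as empty.
# This list is an illustrative assumption; extend it to suit your documents.
VOID_TAGS = ["img", "br", "hr", "input"]


def remove_empty_tags(markup):
    """Return markup with tags that hold neither text nor a void element removed."""
    soup = BeautifulSoup(markup, "html.parser")
    # find_all() returns a list, so removing tags while looping is safe.
    for tag in soup.find_all():
        if tag.name in VOID_TAGS:
            continue
        has_text = len(tag.get_text(strip=True)) > 0
        has_void_child = tag.find(VOID_TAGS) is not None
        if not has_text and not has_void_child:
            tag.extract()
    return str(soup)


# Hypothetical input: an empty <p>, a whitespace-only <span>, and a <p> holding only an image.
sample = '<div><p>Hello</p><p></p><span> </span><p><img src="a.png"/></p><b>World</b></div>'
cleaned = remove_empty_tags(sample)
print(cleaned)
```

Here the empty `<p>` and the whitespace-only `<span>` are dropped, while the paragraph that holds only an image is kept. Whether image-only or `<br>`-only tags count as "empty" depends on the document, so adjust `VOID_TAGS` accordingly.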