How to Convert HTML to a Word Document (docx) in Python

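This article shows how to convert an HTML file to a Word document (docx) with Python. Python has many libraries for working with documents and HTML. Here we use two of them: BeautifulSoup (from the beautifulsoup4 package) to parse the HTML, and python-docx, a widely used library for creating and editing .docx files, to write the Word document. python-docx cannot read HTML by itself, so the parsing step is needed.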


Installing Dependencies

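Before starting, install python-docx together with beautifulsoup4, which provides the bs4 module that the example imports. Run the following command in a terminal: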

pip install python-docx beautifulsoup4

Converting HTML to a Word Document

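To convert an HTML file to a Word document, we first parse the HTML and extract its text and formatting. We then use python-docx to create a new Word document and add the extracted content to it.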

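The following example converts an HTML file to a Word document. It handles paragraphs (<p>), headings (<h1> to <h3>), and bulleted and numbered lists (<ul>, <ol>). Other elements are ignored, and inline formatting such as bold or italic is not preserved, because get_text() returns plain text.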

from docx import Document
from bs4 import BeautifulSoup

# Read the HTML source file (assumed to be UTF-8 encoded)
with open('input.html', 'r', encoding='utf-8') as file:
    html = file.read()

# Parse the HTML with BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')

# Create a new, empty Word document
doc = Document()

# Visit every tag in document order and copy the supported elements into the Word document
for tag in soup.find_all(True):
    if tag.name == 'p':
        doc.add_paragraph(tag.get_text())
    elif tag.name == 'h1':
        doc.add_heading(tag.get_text(), level=1)
    elif tag.name == 'h2':
        doc.add_heading(tag.get_text(), level=2)
    elif tag.name == 'h3':
        doc.add_heading(tag.get_text(), level=3)
    elif tag.name == 'ul':
        # recursive=False: take only this list's own items, not those of nested lists
        for item in tag.find_all('li', recursive=False):
            doc.add_paragraph(item.get_text(), style='List Bullet')
    elif tag.name == 'ol':
        for item in tag.find_all('li', recursive=False):
            doc.add_paragraph(item.get_text(), style='List Number')

# Save the Word document
doc.save('output.docx')