Processing Word Documents in Python

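To read a Word document, we use a module named docx, which is provided by the python-docx package. First install the package as shown below. Then write a program that uses the docx module to read the whole file paragraph by paragraph.

The following command installs the package and makes the docx module available in our environment.

pip install python-docx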
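Before running the examples, it can help to confirm that the docx module is actually importable in the current environment. The following is a minimal sketch using only the standard library; the helper name is_module_available is hypothetical and not part of python-docx.

```python
import importlib.util


def is_module_available(module_name):
    """Return True if the import system can locate `module_name`."""
    # find_spec returns None when a top-level module cannot be found
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_module_available("docx"):
        print("docx module found")
    else:
        print("docx module not found; try: pip install python-docx")
```

The check only looks the module up; it does not import it, so it is safe to run even when the package is missing.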

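In the example below, we read the contents of a Word document by appending the text of each paragraph to a list, then join the list with newlines and print the result.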

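import docx

def readtxt(filename):
    # Open the document and collect the text of every paragraph
    doc = docx.Document(filename)
    fullText = []
    for para in doc.paragraphs:
        fullText.append(para.text)
    # Join the paragraph texts into a single newline-separated string
    return '\n'.join(fullText)

# Raw string so the backslash in the Windows path is kept literally
print(readtxt(r'path\Tutorialspoint.docx'))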

When we run the above program, we get the following output:

Tutorials Point originated from the idea that there exists a class of readers who respond 
better to online content and prefer to learn new skills at their own pace from the comforts 
of their drawing rooms. 

The journey commenced with a single tutorial on HTML in 2006 and elated by the response it generated, 
we worked our way to adding fresh tutorials to our repository which now proudly flaunts 
a wealth of tutorials and allied articles on topics ranging from programming languages 
to web designing to academics and much more.
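Word documents often contain empty paragraphs used as spacing, which show up as blank lines when every paragraph is joined. As a rough sketch of one way to skip them, the hypothetical helper non_empty_texts below keeps only paragraphs with visible text. It works on any sequence of objects that have a text attribute, which is an assumption that matches how doc.paragraphs is used in the example above. The function read_docx_text is likewise illustrative and imports docx only when called.

```python
def non_empty_texts(paragraphs):
    """Return the text of each paragraph that contains visible characters."""
    return [p.text for p in paragraphs if p.text.strip()]


def read_docx_text(filename):
    """Read a .docx file and return its non-empty paragraphs joined by newlines."""
    # Imported here so the helper above stays usable without python-docx
    import docx

    doc = docx.Document(filename)
    return "\n".join(non_empty_texts(doc.paragraphs))
```

Whether to drop empty paragraphs depends on the use case; keeping them preserves the original spacing of the document.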

Reading Individual Paragraphs

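We can use the paragraphs attribute to read a specific paragraph from the Word document. In the example below, we read the paragraph at index 2. Because indexing starts at 0, this is the third entry in doc.paragraphs; in this document it holds the second block of text, presumably because the entry between the two blocks of text is an empty paragraph.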

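import docx

# Raw string so the backslash in the Windows path is kept literally
doc = docx.Document(r'path\Tutorialspoint.docx')
# Number of paragraphs in the document
print(len(doc.paragraphs))

# Text of the paragraph at index 2 (indexing starts at 0)
print(doc.paragraphs[2].text)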

Running the above program, we get the following output (the paragraph count printed first is omitted here):

The journey commenced with a single tutorial on HTML in 2006 and elated by the response 
it generated, we worked our way to adding fresh tutorials to our repository 
which now proudly flaunts a wealth of tutorials and allied articles on topics 
ranging from programming languages to web designing to academics and much more.
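Indexing doc.paragraphs directly raises an IndexError when the document has fewer paragraphs than expected. A minimal sketch of a guarded lookup is shown below; paragraph_text_at is a hypothetical helper, and it assumes only that the sequence supports len() and indexing and that its items have a text attribute.

```python
def paragraph_text_at(paragraphs, index):
    """Return the text of the paragraph at `index`, or None if out of range."""
    # Negative indices are treated as out of range in this sketch
    if 0 <= index < len(paragraphs):
        return paragraphs[index].text
    return None
```

With a document opened as in the example above, paragraph_text_at(doc.paragraphs, 2) would return the same text as doc.paragraphs[2].text, and None instead of an exception for a shorter document.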
