Python - Stemming Algorithms


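In natural language processing, we often encounter cases where two or more words share a common root. For example, the words agree, agreeing, and agreeable all share the root agree. A search involving any of these words should treat them as the same word, namely the root word. It is therefore essential to link each word to its root. The NLTK library provides stemmers that perform this linking and output a stem for each word. Note that a stem is a rule-based approximation of the root and is not always a dictionary word.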
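As a minimal sketch of this idea, the snippet below groups words that reduce to the same Porter stem, which is roughly what a search index would do to treat them as one term. The helper name group_by_stem is hypothetical and is not part of NLTK; only PorterStemmer and its stem() method are NLTK calls.

```python
from collections import defaultdict

from nltk.stem.porter import PorterStemmer


def group_by_stem(words, stemmer=None):
    """Map each stem to the list of input words that reduce to it."""
    # group_by_stem is an illustrative helper, not an NLTK function.
    stemmer = stemmer or PorterStemmer()
    groups = defaultdict(list)
    for word in words:
        groups[stemmer.stem(word)].append(word)
    return dict(groups)


if __name__ == "__main__":
    words = ["agree", "agreeing", "head", "heads"]
    for stem, members in group_by_stem(words).items():
        print(stem, "<-", members)
```

Related forms are not guaranteed to collapse to a single stem under every algorithm, so it is worth inspecting the groups for the vocabulary you care about.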

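NLTK provides three commonly used stemming algorithms: Porter, Lancaster, and Snowball. They give slightly different results. The example below runs all three on the same sentence and prints the stem each one produces.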

import nltk
from nltk.stem.porter import PorterStemmer
from nltk.stem.lancaster import LancasterStemmer
from nltk.stem import SnowballStemmer

# nltk.word_tokenize needs NLTK's Punkt tokenizer data. If it is missing,
# download it once with nltk.download(); the resource name ('punkt' or
# 'punkt_tab') depends on the NLTK version.
porter_stemmer = PorterStemmer()
lanca_stemmer = LancasterStemmer()
sb_stemmer = SnowballStemmer("english")

word_data = "Aging head of famous crime family decides to transfer his position to one of his subalterns"
# First, tokenize the sentence into words
nltk_tokens = nltk.word_tokenize(word_data)

# Next, find the stem of each word with each stemmer
print('***PorterStemmer****\n')
for w_port in nltk_tokens:
    print("Actual: %s  || Stem: %s" % (w_port, porter_stemmer.stem(w_port)))

print('\n***LancasterStemmer****\n')
for w_lanca in nltk_tokens:
    print("Actual: %s  || Stem: %s" % (w_lanca, lanca_stemmer.stem(w_lanca)))

print('\n***SnowballStemmer****\n')
for w_snow in nltk_tokens:
    print("Actual: %s  || Stem: %s" % (w_snow, sb_stemmer.stem(w_snow)))

When we run the above program, we get the following output:

***PorterStemmer****

Actual: Aging  || Stem: age
Actual: head  || Stem: head
Actual: of  || Stem: of
Actual: famous  || Stem: famou
Actual: crime  || Stem: crime
Actual: family  || Stem: famili
Actual: decides  || Stem: decid
Actual: to  || Stem: to
Actual: transfer  || Stem: transfer
Actual: his  || Stem: hi
Actual: position  || Stem: posit
Actual: to  || Stem: to
Actual: one  || Stem: one
Actual: of  || Stem: of
Actual: his  || Stem: hi
Actual: subalterns  || Stem: subaltern

***LancasterStemmer****

Actual: Aging  || Stem: ag
Actual: head  || Stem: head
Actual: of  || Stem: of
Actual: famous  || Stem: fam
Actual: crime  || Stem: crim
Actual: family  || Stem: famy
Actual: decides  || Stem: decid
Actual: to  || Stem: to
Actual: transfer  || Stem: transf
Actual: his  || Stem: his
Actual: position  || Stem: posit
Actual: to  || Stem: to
Actual: one  || Stem: on
Actual: of  || Stem: of
Actual: his  || Stem: his
Actual: subalterns  || Stem: subaltern

***SnowballStemmer****

Actual: Aging  || Stem: age
Actual: head  || Stem: head
Actual: of  || Stem: of
Actual: famous  || Stem: famous
Actual: crime  || Stem: crime
Actual: family  || Stem: famili
Actual: decides  || Stem: decid
Actual: to  || Stem: to
Actual: transfer  || Stem: transfer
Actual: his  || Stem: his
Actual: position  || Stem: posit
Actual: to  || Stem: to
Actual: one  || Stem: one
Actual: of  || Stem: of
Actual: his  || Stem: his
Actual: subalterns  || Stem: subaltern
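
To make the differences between the three algorithms easier to spot, the following sketch reports only the words on which the stemmers disagree. The function name stem_disagreements is a hypothetical helper written for this example, and str.split() stands in for nltk.word_tokenize so that no tokenizer data is required.

```python
from nltk.stem import SnowballStemmer
from nltk.stem.lancaster import LancasterStemmer
from nltk.stem.porter import PorterStemmer


def stem_disagreements(words):
    """Return {word: {stemmer_name: stem}} for words the stemmers disagree on."""
    # stem_disagreements is an illustrative helper, not an NLTK function.
    stemmers = {
        "porter": PorterStemmer(),
        "lancaster": LancasterStemmer(),
        "snowball": SnowballStemmer("english"),
    }
    differing = {}
    for word in words:
        stems = {name: s.stem(word) for name, s in stemmers.items()}
        # Keep the word only if at least two stemmers produced different stems.
        if len(set(stems.values())) > 1:
            differing[word] = stems
    return differing


if __name__ == "__main__":
    sentence = "Aging head of famous crime family decides to transfer his position"
    # Whitespace splitting is a simplification of proper tokenization.
    for word, stems in stem_disagreements(sentence.split()).items():
        print(word, stems)
```

In the output listed above, Lancaster tends to produce the shortest stems (for example fam and transf), while Porter and Snowball differ on only a few words such as famous and his.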
