Python – Chunks and Chinks

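Chunking is the process of grouping similar words together based on their properties, here their part-of-speech tags. In the example below, we define a grammar to generate the chunks. The grammar specifies the order of elements such as nouns and adjectives that a phrase must follow for it to become a chunk. The program prints the chunks and also draws them as a tree.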

import nltk

sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]
# NP chunk: an optional determiner, any number of adjectives, then a noun
grammar = "NP: {<DT>?<JJ>*<NN>}"
cp = nltk.RegexpParser(grammar)
result = cp.parse(sentence) 
print(result)
result.draw()

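Running the program above prints the chunk tree, and result.draw() opens a window that shows it graphically.

[Figure: chunk tree drawn by result.draw()]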
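When no graphical window is available, the chunks can also be read directly from the tree that parse() returns. The following is a minimal sketch, assuming the grammar "NP: {<DT>?<JJ>*<NN>}"; the helper name chunk_phrases is hypothetical and not part of NLTK.

```python
import nltk

# Same tagged sentence as in the example above.
sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]

# Assumed grammar: optional determiner, any adjectives, then a noun.
parser = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN>}")
tree = parser.parse(sentence)


def chunk_phrases(tree, label):
    """Return the words of every subtree that carries the given label."""
    phrases = []
    for subtree in tree.subtrees(filter=lambda t: t.label() == label):
        # leaves() yields the (word, tag) pairs inside the chunk
        phrases.append(" ".join(word for word, tag in subtree.leaves()))
    return phrases


noun_phrases = chunk_phrases(tree, "NP")
print(noun_phrases)  # ['The small red flower', 'the window']
```

The verb and the preposition stay outside any chunk because no rule in this grammar covers their tags.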

Changing the grammar gives a different output, as shown below.

import nltk

sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]

# NP chunk: an optional determiner followed by any number of adjectives (no noun)
grammar = "NP: {<DT>?<JJ>*}"

chunkprofile = nltk.RegexpParser(grammar)
result = chunkprofile.parse(sentence) 
print(result)
result.draw()

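Running the program above prints the new chunk tree, and result.draw() opens a window that shows it graphically.

[Figure: chunk tree drawn by result.draw()]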
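To see why a small change in the grammar changes the chunks, it can help to treat a tag pattern as a regular expression over the sequence of tags. The sketch below is a simplified illustration using only the standard library. It is not NLTK's implementation, the function name match_spans is hypothetical, and it only handles plain tag names (optionally joined with |) inside angle brackets.

```python
import re

sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]

# Encode the tags as one string: "<DT><JJ><JJ><NN><VBD><IN><DT><NN>"
tag_string = "".join(f"<{tag}>" for _, tag in sentence)


def match_spans(tag_pattern, tag_string):
    """Return (start, end) token-index spans where the tag pattern matches."""
    # Wrap every <TAG> in a group so that ?, * and + apply to the whole tag.
    regex = re.sub(r"<([^>]+)>", r"(?:<(?:\1)>)", tag_pattern)
    spans = []
    for m in re.finditer(regex, tag_string):
        if m.end() > m.start():  # skip empty matches
            # Count "<" characters to turn string offsets into token indices.
            start = tag_string[:m.start()].count("<")
            end = tag_string[:m.end()].count("<")
            spans.append((start, end))
    return spans


for pattern in ("<DT>?<JJ>*<NN>", "<DT>?<JJ>*"):
    phrases = [" ".join(word for word, _ in sentence[start:end])
               for start, end in match_spans(pattern, tag_string)]
    print(pattern, "->", phrases)
```

With the noun in the pattern, the spans cover "The small red flower" and "the window". Without it, the spans stop before each noun and cover only "The small red" and "the".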

Chinking

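Chinking is the process of removing a sequence of tokens from a chunk. If the sequence appears in the middle of a chunk, those tokens are removed and two chunks are left where there was one before.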

import nltk

sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"),("flower", "NN"), ("flew", "VBD"), ("through", "IN"),  ("the", "DT"), ("window", "NN")]

grammar = r"""
  NP:
    {<.*>+}         # Chunk everything
    }<JJ|NN>+{      # Chink sequences of JJ and NN
  """
chunkprofile = nltk.RegexpParser(grammar)
result = chunkprofile.parse(sentence) 
print(result)
result.draw()
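The case where a chink falls in the middle of a chunk and splits it in two can be sketched with a different chink rule. This variant is an assumption for illustration only: it chinks the tags VBD and IN instead of JJ and NN, and the variable names chunks and outside are hypothetical.

```python
import nltk

sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]

grammar = r"""
  NP:
    {<.*>+}         # Chunk every token into a single NP
    }<VBD|IN>+{     # Chink runs of VBD and IN out of that NP
  """
parser = nltk.RegexpParser(grammar)
tree = parser.parse(sentence)

# Words grouped into NP chunks after chinking.
chunks = [" ".join(word for word, tag in subtree.leaves())
          for subtree in tree.subtrees() if subtree.label() == "NP"]

# Top-level (word, tag) pairs that were chinked out and belong to no chunk.
outside = [node[0] for node in tree if not isinstance(node, nltk.Tree)]

print(chunks)   # ['The small red flower', 'the window']
print(outside)  # ['flew', 'through']
```

The single chunk covering the whole sentence is cut at "flew through", which leaves one NP on each side of the removed tokens.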