Python – Chunks and Chinks
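Chunking is the process of grouping words into phrases based on their part-of-speech tags. In the example below, we define a grammar to generate the chunks. The grammar specifies the order of tags, such as determiners, adjectives, and nouns, that a sequence of words must follow to be made into a chunk. The program prints the resulting chunk tree and also draws it graphically.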
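import nltk

# A sentence that is already tagged with part-of-speech labels
sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]
# Assumed tag pattern: optional determiner, any number of adjectives, then a noun
grammar = "NP: {<DT>?<JJ>*<NN>}"
cp = nltk.RegexpParser(grammar)
result = cp.parse(sentence)
print(result)
result.draw()  # opens a window showing the tree; needs a GUI (Tkinter)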
Running the above program prints the chunk tree and opens a window that draws it.
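The chunk tree can also be inspected in code rather than read from the drawing. The following is a minimal sketch, assuming the noun-phrase pattern `<DT>?<JJ>*<NN>`; the helper name `chunk_phrases` is hypothetical and is not part of NLTK.

```python
import nltk

sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]

# Assumed tag pattern: optional determiner, any number of adjectives, one noun.
parser = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN>}")
tree = parser.parse(sentence)


def chunk_phrases(tree, label="NP"):
    """Return the words of every subtree that carries the given label."""
    return [" ".join(word for word, tag in subtree.leaves())
            for subtree in tree.subtrees(filter=lambda t: t.label() == label)]


noun_phrases = chunk_phrases(tree)
print(noun_phrases)
```

With this pattern the sketch prints `['The small red flower', 'the window']`. The verb and the preposition stay outside any chunk, as direct children of the root `S` node.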
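Changing the grammar gives a different output. The program below has the same structure, so only the tag pattern in the grammar needs to be edited.
import nltk

sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]
# Assumed tag pattern: optional determiner, any number of adjectives, then a noun.
# Edit the tags inside the braces to change which words are chunked.
grammar = "NP: {<DT>?<JJ>*<NN>}"
chunkprofile = nltk.RegexpParser(grammar)
result = chunkprofile.parse(sentence)
print(result)
result.draw()  # opens a window showing the tree; needs a GUI (Tkinter)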
Running the above program prints the chunk tree and opens a window that draws it.
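To make the effect of a grammar change concrete, the sketch below parses the same sentence with two patterns and compares the chunks. Both patterns and the helper name `np_chunks` are illustrative assumptions, not taken from the original programs.

```python
import nltk

sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]


def np_chunks(grammar):
    """Parse the tagged sentence and return each NP chunk as a string."""
    tree = nltk.RegexpParser(grammar).parse(sentence)
    return [" ".join(word for word, tag in subtree.leaves())
            for subtree in tree.subtrees(filter=lambda t: t.label() == "NP")]


# Hypothetical grammar A: optional determiner, any adjectives, one noun.
with_determiner = np_chunks("NP: {<DT>?<JJ>*<NN>}")
# Hypothetical grammar B: at least one adjective, then one noun, no determiner.
adjectives_required = np_chunks("NP: {<JJ>+<NN>}")

print(with_determiner)
print(adjectives_required)
```

Grammar A yields `['The small red flower', 'the window']`, while grammar B yields only `['small red flower']`: the determiner is no longer part of the pattern, and "the window" contains no adjective, so it is not chunked at all.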
Chinking
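Chinking is the process of removing a sequence of tokens from a chunk. If the sequence of tokens appears in the middle of a chunk, those tokens are removed and two chunks are left in its place.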
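import nltk

sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]
# Rules are applied in order: first chunk, then chink.
# The chink tag pattern is assumed from its comment (runs of JJ and NN).
grammar = r"""
NP:
    {<.*>+}       # Chunk everything
    }<JJ|NN>+{    # Chink sequences of JJ and NN
"""
chunkprofile = nltk.RegexpParser(grammar)
result = chunkprofile.parse(sentence)
print(result)
result.draw()  # opens a window showing the tree; needs a GUI (Tkinter)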
Running the above program prints the chunk tree and opens a window that draws it.
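As you can see, the tokens that match the chink pattern are taken out of the noun phrase, and the tokens that remain form separate chunks. This process of removing unwanted text from a chunk is called chinking.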
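The splitting effect can be checked by comparing the chunks before and after the chink rule is applied. This is a minimal sketch that assumes a different chink pattern, `<VBD|IN>+` (runs of past-tense verbs and prepositions), chosen only because it falls in the middle of the sentence; the helper name `np_chunks` is hypothetical.

```python
import nltk

sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]

# Assumed grammar for illustration: chunk everything, then chink verbs and prepositions.
grammar = r"""
NP:
    {<.*>+}       # chunk every token into one NP
    }<VBD|IN>+{   # chink runs of VBD and IN out of that NP
"""


def np_chunks(tree):
    """Return each NP chunk in the tree as a space-joined string."""
    return [" ".join(word for word, tag in subtree.leaves())
            for subtree in tree.subtrees(filter=lambda t: t.label() == "NP")]


# Chunk rule alone: the whole sentence becomes a single NP.
before = np_chunks(nltk.RegexpParser("NP: {<.*>+}").parse(sentence))
# Chunk rule followed by the chink rule: the NP is split around the chinked tokens.
after = np_chunks(nltk.RegexpParser(grammar).parse(sentence))

print(before)
print(after)
```

Before chinking there is one chunk covering the whole sentence. After chinking, "flew through" is removed from the middle, leaving the two chunks `'The small red flower'` and `'the window'`.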