Python Chunks and Chinks
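Chunking is the process of grouping words into short phrases, called chunks, based on their part-of-speech tags. In the example below, we generate chunks by defining a grammar. The grammar specifies the order of tags, such as determiners, adjectives and nouns, that a noun-phrase (NP) chunk must follow. The program prints the resulting chunk tree and also draws it graphically.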
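import nltk

# A sentence that is already split into words and tagged with parts of speech
sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]
# NP chunk: an optional determiner, any number of adjectives, then a noun
grammar = "NP: {<DT>?<JJ>*<NN>}"
cp = nltk.RegexpParser(grammar)
result = cp.parse(sentence)  # returns an nltk.Tree whose root is labelled S
print(result)
result.draw()  # opens a Tkinter window, so it needs a graphical display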
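Running the program prints the chunk tree as text and then opens a window that draws it. The same program can also be written with a more descriptive name, chunkprofile, for the parser object:
import nltk

sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]
# NP chunk: an optional determiner, any number of adjectives, then a noun
grammar = "NP: {<DT>?<JJ>*<NN>}"
chunkprofile = nltk.RegexpParser(grammar)
result = chunkprofile.parse(sentence)
print(result)
result.draw()  # opens a Tkinter window, so it needs a graphical display
Running it prints and draws the chunk tree in the same way.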
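To show roughly how a tag pattern such as <DT>?<JJ>*<NN> finds chunks, here is a minimal sketch in plain Python, without NLTK. It assumes the idea RegexpParser is built on: the tags are written into one string such as <DT><JJ><JJ><NN>..., and the tag pattern is turned into a regular expression over that string. The names Chunk, tag_pattern_to_regex and chunk are hypothetical; the sketch handles a single chunk rule only (no chink rules, no multi-stage grammars) and does not escape tags that contain regex metacharacters, such as PRP$.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple, Union

Token = Tuple[str, str]  # (word, part-of-speech tag)


@dataclass
class Chunk:  # hypothetical stand-in for a chunk subtree in nltk.Tree
    label: str
    tokens: List[Token]


def tag_pattern_to_regex(tag_pattern: str) -> str:
    """Turn a tag pattern like '<DT>?<JJ>*<NN>' into a regex over '<DT><JJ>...'."""
    pattern = re.sub(r"\s+", "", tag_pattern)  # whitespace is not significant
    # Each <TAG> becomes a non-capturing group, so ?, * and + apply to whole tags.
    pattern = pattern.replace("<", "(?:<(?:").replace(">", ")>)")
    # A '.' inside a tag may match any character except the tag delimiters.
    return pattern.replace(".", "[^<>]")


def chunk(tagged: List[Token], tag_pattern: str,
          label: str = "NP") -> List[Union[Token, Chunk]]:
    """Group runs of tokens whose tags match tag_pattern into Chunk objects."""
    tag_string = "".join(f"<{tag}>" for _, tag in tagged)
    regex = re.compile(tag_pattern_to_regex(tag_pattern))
    result: List[Union[Token, Chunk]] = []
    next_token = 0  # index of the first token not yet copied into result
    for match in regex.finditer(tag_string):
        if match.start() == match.end():
            continue  # an empty match would produce an empty chunk
        # Every tag contributes exactly one '<', so counting them maps
        # string offsets back to token indices.
        start = tag_string.count("<", 0, match.start())
        end = tag_string.count("<", 0, match.end())
        result.extend(tagged[next_token:start])  # tokens outside any chunk
        result.append(Chunk(label, tagged[start:end]))
        next_token = end
    result.extend(tagged[next_token:])
    return result


sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]

for item in chunk(sentence, "<DT>?<JJ>*<NN>"):
    if isinstance(item, Chunk):
        print(item.label, " ".join(word for word, _ in item.tokens))
    else:
        print(" ", "/".join(item))
```

Because every match consists of whole <TAG> units, a chunk can never start or end in the middle of a tag, which is what makes the simple character count a safe way to recover token positions.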
Chinking
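Chinking is the process of removing a sequence of tokens from a chunk. If the matching sequence spans the entire chunk, the whole chunk is removed. If it appears in the middle of a chunk, those tokens are removed, leaving two chunks where there was only one before. If it is at the start or end of a chunk, those tokens are removed and a shorter chunk remains.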
import nltk

sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]
grammar = r"""
NP:
    {<.*>+}        # Chunk everything
    }<JJ|NN>+{     # Chink sequences of JJ and NN
"""
chunkprofile = nltk.RegexpParser(grammar)
result = chunkprofile.parse(sentence)
print(result)
result.draw()  # opens a Tkinter window, so it needs a graphical display
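Running the program prints the resulting chunk tree and draws it in a window.
In the result, the tokens matched by the chink rule are removed from the noun-phrase chunk and left outside any chunk, so the single chunk created by {<.*>+} is split wherever they occurred. Removing text that does not belong in a chunk in this way is called chinking.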
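The three chinking cases can be checked with a minimal pure-Python sketch. It assumes the chunk rule has already put the whole sentence into one chunk (as {<.*>+} does) and that the chink rule removes runs of tokens whose tags belong to a given set, which is how a rule that chinks sequences of JJ and NN behaves. The function name chink and its set-of-tags parameter are hypothetical simplifications; real NLTK chink rules accept full tag patterns.

```python
from typing import List, Set, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)


def chink(chunk: List[Token], chink_tags: Set[str]) -> List[List[Token]]:
    """Remove runs of tokens whose tag is in chink_tags from one chunk.

    Returns the chunks that are left:
      * chink in the middle of the chunk -> the chunk is split in two
      * chink at the start or end        -> the chunk just gets shorter
      * chink covering the whole chunk   -> no chunk remains
    """
    remaining: List[List[Token]] = []
    current: List[Token] = []
    for token in chunk:
        if token[1] in chink_tags:
            # A chinked token closes the chunk being built, if there is one.
            if current:
                remaining.append(current)
                current = []
        else:
            current.append(token)
    if current:
        remaining.append(current)
    return remaining


sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]

# Step 1, like {<.*>+}: the whole sentence starts out as one chunk.
# Step 2, chink every run of JJ and NN tokens out of that chunk.
for piece in chink(sentence, {"JJ", "NN"}):
    print("NP:", " ".join(word for word, _ in piece))
# prints: NP: The
# prints: NP: flew through the
```

Here "small red flower" sits in the middle of the chunk and splits it in two, while "window" sits at the end and only shortens the second chunk.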