Part-of-speech tagging
Part-of-speech tagging is the process of marking up each word in a text with its corresponding part of speech. The numerous approaches to tagging fall into two major categories: supervised and unsupervised. Supervised tagging uses a pre-tagged corpus to estimate the parameters of the system, whereas unsupervised tagging uses an untagged corpus as training data and derives the tagset by induction. Both categories can be further subdivided into rule-based, stochastic, and neural approaches. Major algorithms for part-of-speech tagging include the Viterbi algorithm, the Brill tagger, and the Baum-Welch algorithm (which is built on the forward-backward algorithm and is often referred to by that name). Hidden Markov model and visible Markov model taggers can both be implemented using the Viterbi algorithm.
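As an illustration of how a hidden Markov model tagger can use the Viterbi algorithm, the following is a minimal Python sketch. The tagset, the three probability tables, the function name `viterbi`, and the `floor` smoothing parameter are hypothetical values and names chosen for this example; they are not drawn from any real corpus or library. A supervised tagger would estimate such tables from a pre-tagged corpus.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable tag sequence for `words` under a first-order HMM.

    Probabilities missing from the tables are replaced by `floor`
    (an assumption of this sketch) so that log() is always defined.
    """
    if not words:
        return []

    def lp(p):
        return math.log(max(p, floor))

    # best[i][t]: log-probability of the best tag path ending in tag t at word i
    best = [{t: lp(start_p.get(t, 0.0)) + lp(emit_p[t].get(words[0], 0.0))
             for t in tags}]
    # back[i][t]: the previous tag on that best path
    back = [{}]

    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            # Pick the predecessor tag that maximizes the path score into t.
            prev = max(tags,
                       key=lambda s: best[i - 1][s] + lp(trans_p[s].get(t, 0.0)))
            best[i][t] = (best[i - 1][prev]
                          + lp(trans_p[prev].get(t, 0.0))
                          + lp(emit_p[t].get(words[i], 0.0)))
            back[i][t] = prev

    # Follow the back-pointers from the best final tag to recover the path.
    last = max(tags, key=lambda t: best[-1][t])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path


# Hypothetical toy model: all numbers below are made up for illustration.
TAGS = ["DET", "NOUN", "VERB"]
START_P = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS_P = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.3, "VERB": 0.2},
}
EMIT_P = {
    "DET": {"the": 0.9},
    "NOUN": {"dog": 0.5, "barks": 0.1},
    "VERB": {"dog": 0.05, "barks": 0.6},
}

print(viterbi(["the", "dog", "barks"], TAGS, START_P, TRANS_P, EMIT_P))
# prints ['DET', 'NOUN', 'VERB']
```

The sketch works in log space to avoid numerical underflow on longer sentences. Under this toy model the word "barks" is tagged as a verb after "dog" but as a noun directly after "the", which shows how the transition probabilities resolve an ambiguous word from its context.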