Statistical Proper Name Recognition in Polish Economic Texts (2011)

Abstract: In this paper we present a Proper Name (PN) Recognition algorithm based on the Hidden Markov Model (HMM). Recognition of Proper Names is treated as the basis for the Named Entity Recognition problem in general. The proposed method combines a domain-dependent method based on HMM with domain-independent methods based on gazetteers and hand-written rules for recognition and post-processing that capture general properties of Polish PN structure. A large gazetteer with morphologically described entries was acquired from the web. An HMM re-scoring mechanism was applied as the basis for integrating the different knowledge sources in PN recognition. Results of experiments on a domain corpus of Polish stock exchange reports, used for training and testing, are presented. Adaptability of the method was analysed in a cross-domain evaluation, by applying the trained model to two other domain corpora.
Keywords: Proper Name Recognition, Named Entity Recognition, Machine Learning, Hidden Markov Model, Rule-Based Approach, Dictionary-Based Approach
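
The abstract names two ingredients, an HMM tagger and a gazetteer used to re-score its hypotheses, without giving details. The following is a minimal sketch of that general idea, not the authors' implementation: the BIO tag set, all scores, the gazetteer entries, and the names `emission`, `viterbi`, and `GAZETTEER_BONUS` are assumptions made up for illustration. It decodes the best tag sequence with the Viterbi algorithm and adds a fixed bonus to name tags when a token is found in the gazetteer.

```python
import math

NEG_INF = float("-inf")
STATES = ["O", "B-PN", "I-PN"]  # assumed BIO tag set, not taken from the paper

# Toy log-scores, invented for illustration only (not trained on any corpus).
START = {"O": math.log(0.8), "B-PN": math.log(0.2), "I-PN": NEG_INF}
TRANS = {
    "O":    {"O": math.log(0.8), "B-PN": math.log(0.2), "I-PN": NEG_INF},
    "B-PN": {"O": math.log(0.5), "B-PN": math.log(0.1), "I-PN": math.log(0.4)},
    "I-PN": {"O": math.log(0.6), "B-PN": math.log(0.1), "I-PN": math.log(0.3)},
}

GAZETTEER = {"kghm", "orlen"}  # hypothetical entries
GAZETTEER_BONUS = 2.0          # hypothetical log-score added on a gazetteer hit


def emission(state, token):
    """Crude emission score: capitalised tokens are likelier to be names."""
    capitalised = token[:1].isupper()
    if state == "O":
        score = math.log(0.3 if capitalised else 0.9)
    else:
        score = math.log(0.7 if capitalised else 0.1)
    # Re-scoring step: reward name tags for tokens listed in the gazetteer.
    if state != "O" and token.lower() in GAZETTEER:
        score += GAZETTEER_BONUS
    return score


def viterbi(tokens):
    """Return the highest-scoring tag sequence for a list of tokens."""
    if not tokens:
        return []
    best = [{s: START[s] + emission(s, tokens[0]) for s in STATES}]
    back = []
    for token in tokens[1:]:
        scores, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for reaching s at this position.
            prev = max(STATES, key=lambda p: best[-1][p] + TRANS[p][s])
            scores[s] = best[-1][prev] + TRANS[prev][s] + emission(s, token)
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final state.
    state = max(STATES, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    print(viterbi(["spółka", "KGHM", "ogłosiła", "wyniki"]))
    print(viterbi(["spółka", "Abc", "ogłosiła", "wyniki"]))
```

With these toy scores, the gazetteer hit is what tips "KGHM" into a name tag, while the unlisted capitalised token "Abc" stays tagged as `O`. A real system would estimate the transition and emission scores from an annotated corpus and apply the hand-written rules as a further post-processing step.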

BibTeX

@article{marcinczuk2011-cc,
  author = "Marcińczuk, Michał and Piasecki, Maciej",
  title  = "{S}tatistical {P}roper {N}ame {R}ecognition in {P}olish {E}conomic {T}exts",
  volume = "40",
  number = "2",
  year   = "2011",
}