Rich Set of Features for Proper Name Recognition in Polish Texts (2011)

Michał Marcińczuk, Michał Stanek, Maciej Piasecki and Adam Musiał (2011). "Rich Set of Features for Proper Name Recognition in Polish Texts". In: Proceedings of Security and Intelligent Information Systems International Joint Conferences, SIIS 2011, Warsaw, Poland, June 13-14, 2011, Revised Selected Papers

Abstract: In this paper we analyse the importance of data generalisation and the use of near context in the problem of proper name recognition. We present an extended set of features that encode data generalisation and linguistic information. To utilize the rich set of features we applied Conditional Random Fields (CRF), a modern approach to sequence labelling. We present results of the evaluation on a single domain, following a cross-validation procedure, and cross-domain, by training and testing on different corpora. We show that the extended set of features improves the final results for CRF and that this approach outperforms Hidden Markov Models (HMM). On the single domain, CRF obtained an F-measure of 92.53% for 5 categories of proper names, and F-measures of 67.72% and 72.62% on the two other corpora in the cross-domain evaluation.
Keywords: Named Entity Recognition, Proper Name Recognition, Machine Learning, Hidden Markov Model, Conditional Random Fields, Classifier Ensemble, Polish

BibTeX

to be done

Rich Set of Features for Proper Name Recognition in Polish Texts - presentation
Rich Set of Features for Proper Name Recognition in Polish Texts - extended abstract
Michał Marcińczuk aka. czuk