Abstract: Named Entity Recognition aims to identify and
classify rigid designators in text, such as proper names, biological species,
and temporal expressions, into predefined categories. There has been
growing interest in this field of research since the early 1990s. Named Entity
Recognition plays a vital role in many areas of natural language processing, such as
machine translation, information extraction, and question answering.
This paper presents Named Entity Recognition for Nepali text
based on the Support Vector Machine (SVM), a machine
learning approach to classification. A set of features is extracted from the training data set. The accuracy
and efficiency of the SVM classifier are analyzed for three different sizes of training data set. The recognition systems are
tested on ten datasets of Nepali text. The strength of this work is the
efficient feature extraction and the comprehensive recognition techniques. The
SVM-based Named Entity Recognition system is limited to a
certain set of features and uses a small dictionary, both of which affect its
performance. The learning performance of the recognition system is also observed. The system is found to learn well from a small set of training
data, and its rate of learning increases as the
training size grows.
Keywords: Support Vector Machine; Named Entity Recognition; Machine Learning; Classification; Nepali Language Text