Journal: International Journal of Advanced Research in Computer Engineering & Technology (IJARCET)
Print ISSN: 2278-1323
Publication year: 2014
Volume: 3
Issue: 12
Pages: 4299-4301
Publisher: Shri Pannalal Research Institute of Technology
Abstract: Information extraction is a technology that, from the user's point of view, is innovative in today's information-driven world. Rather than indicating which documents a user needs to read, it extracts the pieces of information that are salient to the user's needs. Links between the extracted information and the original documents are maintained so that the user can refer back to the context. One example is Named Entity Recognition (NER), which helps a machine recognize proper nouns (entities) in text and associate them with the appropriate types. Common types in NER systems include location, person name, date, and address. Several NER systems exist, such as GATE, CRFClassifier, OpenNLP, and Stanford NLP (Natural Language Processing). An NER system works fast on a limited number of documents, but it slows down on large amounts of data. To overcome this drawback, this paper reports the development of an NER system based on MapReduce, a distributed programming model. This development helps achieve fast extraction with better performance.
Keywords: Big textual data; Distributed computing; Hadoop; MapReduce; Maxent Tagger; Named Entity Recognition (NER); Natural Language Processing (NLP).
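
Illustrative sketch (not taken from the paper): the abstract describes NER expressed in the MapReduce model, with links kept from each extracted entity back to its source document. The Python below simulates that shape in a single process. Everything in it is an assumption made for illustration: the names `GAZETTEER`, `DATE_RE`, `map_document`, `shuffle`, `reduce_entity`, and `run_job` are hypothetical, and the toy gazetteer lookup stands in for a real tagger such as the Maxent Tagger named in the keywords. On a Hadoop cluster the framework itself would perform the shuffle step.

```python
import re
from collections import defaultdict
from itertools import chain

# Hypothetical stand-in for a real NER model: a tiny gazetteer plus a date pattern.
GAZETTEER = {"London": "LOCATION", "Paris": "LOCATION", "Alice": "PERSON"}
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def map_document(doc_id, text):
    """Map step: emit ((entity_type, entity), doc_id) for every entity found."""
    for token in re.findall(r"[A-Za-z]+", text):
        if token in GAZETTEER:
            yield (GAZETTEER[token], token), doc_id
    for match in DATE_RE.findall(text):
        yield ("DATE", match), doc_id


def shuffle(pairs):
    """Group mapper output by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_entity(key, doc_ids):
    """Reduce step: count mentions and keep the links back to source documents."""
    return key, len(doc_ids), sorted(set(doc_ids))


def run_job(documents):
    """Run map, shuffle, and reduce over a dict of {doc_id: text}."""
    mapped = chain.from_iterable(
        map_document(doc_id, text) for doc_id, text in documents.items()
    )
    grouped = shuffle(mapped)
    return [reduce_entity(key, values) for key, values in sorted(grouped.items())]


if __name__ == "__main__":
    docs = {
        "d1": "Alice flew to Paris on 2014-12-01.",
        "d2": "Paris and London",
    }
    for (entity_type, entity), count, sources in run_job(docs):
        print(entity_type, entity, count, sources)
```

Because each call to `map_document` depends only on its own document, the map step can be spread across many machines, which is the property the abstract relies on for handling large document collections.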