Journal: Conference on European Chapter of the Association for Computational Linguistics (EACL)
Publication year: 2009
Volume: 2009
Publisher: ACL Anthology
Abstract: Automatic image annotation is an attractive approach for enabling convenient access to images found in a variety of documents. Since image captions and relevant discussions found in the text can be useful for summarizing the content of images, it is also possible that this text can be used to generate salient indexing terms. Unfortunately, this problem is generally domain-specific because indexing terms that are useful in one domain can be ineffective in others. Thus, we present a supervised machine learning approach to image annotation utilizing non-lexical features extracted from image-related text to select useful terms. We apply this approach to several subdomains of the biomedical sciences and show that we are able to reduce the number of ineffective indexing terms.