Journal: Bulletin of the Technical Committee on Data Engineering
Year: 2016
Volume: 39
Issue: 2
Page: 93
Publisher: IEEE Computer Society
Abstract: The recent prevalence of Web search engines, microblogging services, and instant messaging tools has given rise to a large number of short texts, including queries, tweets, and instant messages. A better understanding of the semantics embedded in short texts is indispensable for various Web applications. We adopt an entity-level semantic representation, which interprets a short text as a sequence of mention-entity pairs. A typical strategy consists of two steps: entity extraction to locate entity mentions, and entity linking to identify their corresponding entities. However, achieving high-quality (i.e., complete and accurate) interpretations of short texts is far from trivial. First, short texts are noisy and contain many abbreviations, nicknames, and misspellings. As a result, traditional entity extraction methods cannot detect every potential entity mention. Second, entities are ambiguous, so entity linking methods must determine the most appropriate entity within a given context. However, short texts are length-limited, which makes it infeasible to disambiguate entities based on context similarity or topical coherence within a single short text. Furthermore, the platforms on which short texts are generated are usually personalized. Therefore, it is necessary to consider user interest and how it changes over time when linking entities in short texts. In this paper, we summarize our work on quality-aware semantic representations for short texts. We construct a comprehensive dictionary and extend the traditional dictionary-based entity extraction method to improve the recall of entity extraction. Meanwhile, we combine three novel features, namely a content feature, a social feature, and a temporal feature, to guarantee the precision of entity linking. Empirical results on real-life datasets verify the effectiveness of our proposals.
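The two-step strategy described in the abstract (entity extraction followed by entity linking) can be illustrated with a minimal sketch. This is not the authors' method. The extraction step shows only the *traditional* greedy longest-match dictionary lookup that the paper extends, not the extension itself. The linking step assumes a simple weighted linear combination of a content, a social, and a temporal score; the abstract does not say how the three features are combined. The dictionary, feature values, weights, and all names (`DICTIONARY`, `Features`, `extract_mentions`, `link`) are hypothetical toy values chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical toy dictionary: lowercased surface form -> candidate entity IDs.
DICTIONARY = {
    "nyc": ["New_York_City"],
    "new york": ["New_York_City", "New_York_(state)"],
    "jordan": ["Michael_Jordan", "Jordan_(country)"],
}


def extract_mentions(text, dictionary, max_len=3):
    """Baseline dictionary-based extraction: greedy longest match over tokens."""
    tokens = text.lower().split()
    mentions = []
    i = 0
    while i < len(tokens):
        # Try the longest n-gram first, shrinking until a dictionary hit.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            surface = " ".join(tokens[i:i + n])
            if surface in dictionary:
                mentions.append(surface)
                i += n
                break
        else:
            i += 1  # no entry starts at this token; move on
    return mentions


@dataclass
class Features:
    content: float   # assumed: textual relatedness of entity to the short text
    social: float    # assumed: affinity derived from the user's social context
    temporal: float  # assumed: recency/strength of the user's current interest


def link(mention, candidates, features, weights=(0.5, 0.3, 0.2)):
    """Pick the candidate with the highest weighted feature score (assumed linear)."""
    wc, ws, wt = weights

    def score(entity):
        f = features[(mention, entity)]
        return wc * f.content + ws * f.social + wt * f.temporal

    return max(candidates, key=score)


# Toy feature values for one hypothetical user (illustrative only).
FEATURES = {
    ("jordan", "Michael_Jordan"): Features(0.6, 0.9, 0.8),
    ("jordan", "Jordan_(country)"): Features(0.5, 0.1, 0.2),
    ("new york", "New_York_City"): Features(0.7, 0.5, 0.5),
    ("new york", "New_York_(state)"): Features(0.4, 0.2, 0.3),
}

text = "watching jordan highlights in new york"
mentions = extract_mentions(text, DICTIONARY)
pairs = [(m, link(m, DICTIONARY[m], FEATURES)) for m in mentions]
print(pairs)
```

Running the script yields the mention-entity pairs `[('jordan', 'Michael_Jordan'), ('new york', 'New_York_City')]`. Because the social and temporal scores favor the sports figure for this toy user, "jordan" resolves to the person rather than the country, even though the content scores alone are close. That is the kind of personalization the abstract motivates.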