Article information

Title: A comparative study of the relevance of indirect and direct opinions in economic texts.
Authors: Musat, Claudiu Christian; Trausan-Matu, Stefan
Journal: Annals of DAAAM & Proceedings
Print ISSN: 1726-9679
Year of publication: 2010
Issue: January
Language: English
Publisher: DAAAM International Vienna
Abstract: Generally, the focus in opinion mining has been on analyzing occurrences of direct opinions in the studied texts. However, there are domains where this approach is less adequate and where we need to broaden the perspective in order to detect opinions that derive only from context and are not expressed directly. We limit the analysis to economic documents. Section 2 discusses the state of the art, Section 3 describes the proposed system, and Sections 4 and 5 present the results and conclusions.
Keywords: Benchmarking; Benchmarks; Computational linguistics; Language processing; Natural language interfaces; Natural language processing

1. INTRODUCTION

Generally, the focus in opinion mining has been on analyzing occurrences of direct opinions in the studied texts. However, there are domains where this approach is less adequate and where we need to broaden the perspective in order to detect opinions that derive only from context and are not expressed directly. We limit the analysis to economic documents. Section 2 discusses the state of the art, Section 3 describes the proposed system, and Sections 4 and 5 present the results and conclusions.

2. STATE OF THE ART

One of the most influential works in the field was that of Pang and Lee (2008), where documents were classified according to the overall opinions expressed. Wiebe et al. (2005) discriminated between subjective and objective sentences. To achieve better granularity, the focus shifted to determining the polarity of expressed opinions, ranging from a positive/negative dichotomy to more complex classifications based on opinion strength (Chen and Chiu, 2009). SentiWordNet (SWN) (Esuli and Sebastiani, 2006) was created to assign a polarity to all words in the WordNet (Fellbaum, 1998) dictionary. All the methods above rely on the presence of explicit opinions in the analyzed texts.

In economics, Ferguson et al. (2009) studied sentiment analysis on financial blogs and showed, using paragraph-level annotations and a Bayesian classifier, that more than 60% of the analyzed texts contained personal opinions. However, their study was restricted to texts that explicitly referred to companies listed in the S&P 500 index. Mainstream media has been addressed by Ahmad et al. (2006). Furthermore, opinions expressed in the press have been linked to relations between listed companies and to their stock prices (Ku et al., 2006). The lack of explicit opinions does not always imply the lack of an overall opinion: Amigo et al. (2009) showed that the mere mention of a product in a review can express positive sentiment towards it.

3. PROPOSED SYSTEM

We propose a method to extract opinions that are expressed in economic texts but presented as facts. Economic predictions were analyzed as combinations of economic indicators, such as "unemployment", "productivity" or "markets", and phrases (modifiers) that suggest a certain future evolution. The indicators are divided into two subsets: one that contains indices that rise along with the economy as a whole, and one that contains indices that decline during a period of economic boom. We will call the former "positive" indicators and the latter "negative" indicators. Modifiers are a collection of n-grams that indicate growth or decrease of the indicator they refer to. As with the indicators, the set of modifiers is split into positive and negative.

The economic terms and phrases were obtained by expanding the EconomyWatch glossary, and the modifiers by expanding an initial handpicked set of words signaling an increase or decrease. The first expansion method relies on WordNet (Fellbaum, 1998) synsets and on WordNet terms whose glosses contain occurrences of terms in the initial set. The second method employs a Contextual Network Graph (Ceglowski and Coburn, 2003) to retrieve terms that tend to appear in the same context as terms in the initial set.

In this experiment we used the publicly available articles from The Telegraph Finance section and its subsections. A total of 21106 articles published between 2007 and 2010 were processed. Within this corpus we found 499 articles that were labeled as having a negative bias and 216 that were labeled as having a positive one.

The next task is to find the polarity of the author's economic forecast from co-occurrences of economic indicators and future state modifiers. A good example is that of recent discussions about the rise of unemployment. The term "unemployment" is itself a negative indicator. When the modifier attached to it is a positive one, such as "will grow", the overall prediction is negative. Likewise, if the modifier is negative, like "fell", the result is positive. A simple formula summarizes the above:

P(O) = P(I) * P(M) (1)

where

--P is the polarity of the construct

--O is the resulting opinion

--I is the indicator involved

--M is the attached modifier
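
Equation (1) can be sketched in Python if polarities are encoded as +1 (positive) and -1 (negative). This encoding, the function name, and the small lexicons below are assumptions made for illustration; only the sign of "unemployment" and the modifiers "will grow" and "fell" come from the text, and the sign given to "productivity" is an assumption.

```python
# Illustrative lexicons, not the expanded lists used in the experiment.
# +1 marks a positive term, -1 a negative one (assumed encoding).
INDICATOR_POLARITY = {
    "unemployment": -1,  # stated in the text as a negative indicator
    "productivity": +1,  # assumed to rise with the economy
}
MODIFIER_POLARITY = {
    "will grow": +1,  # signals growth of the indicator
    "fell": -1,       # signals a decrease of the indicator
}


def opinion_polarity(indicator: str, modifier: str) -> int:
    """Equation (1): P(O) = P(I) * P(M)."""
    return INDICATOR_POLARITY[indicator] * MODIFIER_POLARITY[modifier]


print(opinion_polarity("unemployment", "will grow"))  # -1: negative prediction
print(opinion_polarity("unemployment", "fell"))       # 1: positive prediction
```

With this encoding the product reproduces the two cases discussed above: growth of a negative indicator yields a negative opinion, and a decrease of a negative indicator yields a positive one.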

Not all pairs encountered are expressions of implicit opinions. The phrase "rising unemployment could prove harmful for the local economy" does not signal a prediction about the evolution of the indicator. While bearing in mind that oversimplification could significantly reduce the recall of the opinion extraction system, we used two heuristics to avoid uncertainties of this kind.

The first imposed limitation is that the indicator should be either a noun (or, if the indicator is not a unigram, an n-gram containing a noun) or an adjective attached to a neutral noun. Neutral associate nouns are very frequent in the encountered economic texts, forming constructs such as "economic data", "unemployment figures" or "mortality numbers".

The modifier is limited to the associated verb or, when the verb suggests a continuous state of fact, an adverb following that verb, as in "the recovery remains sluggish". For simplicity we will refer to the entire family of verbs related to the one in the previous example as continuers. In order to apply the heuristic adjustments above, a part-of-speech tagger was used.

Also, detecting negations of the selected indicators and modifiers is crucial for the accuracy of the system as the use of negated terms is often more prevalent than their use without negation.

A document in which the majority of pairs indicate the growth of a positive indicator or the decrease of a negative one will be labeled as having a positive polarity, whereas a document in which the majority of pairs indicate the growth of a negative indicator or the decrease of a positive one will have a negative polarity.
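
The document-level rule can be sketched as a majority vote over pair polarities. The sketch below is a simplified illustration, not the implementation used in the experiment: it assumes the +1/-1 encoding of polarities, assumes that a negated pair simply flips its sign, and assumes that ties and documents without any pair are labeled neutral ("="). The names `pair_polarity` and `label_document` are hypothetical.

```python
from typing import Iterable, Tuple

# (indicator polarity, modifier polarity, negated?) with polarities in {+1, -1}
Pair = Tuple[int, int, bool]


def pair_polarity(indicator: int, modifier: int, negated: bool = False) -> int:
    """Polarity of one indicator-modifier pair; negation flips the sign (assumption)."""
    polarity = indicator * modifier
    return -polarity if negated else polarity


def label_document(pairs: Iterable[Pair]) -> str:
    """Label a document "+", "-" or "=" by majority vote over its pairs."""
    # Summing +1/-1 values is positive exactly when positive pairs outnumber negative ones.
    score = sum(pair_polarity(i, m, n) for i, m, n in pairs)
    if score > 0:
        return "+"
    if score < 0:
        return "-"
    return "="  # tie or no pairs: treated as neutral (assumption)


# Two pairs predict growth of a negative indicator, one predicts growth of a positive one.
print(label_document([(-1, +1, False), (-1, +1, False), (+1, +1, False)]))  # -
```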

4. RESULTS

The results of the opinion extraction test phase and the precision and recall of the experiment are presented in Tables 2 and 3 next to the SentiWordNet results on the same corpus. Note that there are articles in the root sections of both the positive and negative batches, so that the sum of the articles in the subsections does not equal the total. Both the positive and negative sections contain three subsections: jobs, markets and comments. We denote the positive outcomes as "+", the negative ones as "-" and the neutrals as "=".
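
The text does not spell out how precision and recall are computed. One reading that reproduces the reported percentages from the reported counts is the following: precision is the share of non-neutral labels that match the polarity of the batch, and recall is the share of articles that receive a non-neutral label. The Python sketch below applies this inferred reading to the "All" rows of both batches; the function name and its arguments are illustrative.

```python
from typing import Tuple


def precision_recall(total: int, correct: int, wrong: int) -> Tuple[float, float]:
    """Return (precision, recall) in percent under the inferred definitions.

    total   -- number of articles in the batch
    correct -- articles labeled with the polarity of the batch
    wrong   -- articles labeled with the opposite polarity
    """
    labeled = correct + wrong  # articles that received a non-neutral label
    precision = 100.0 * correct / labeled if labeled else 0.0
    recall = 100.0 * labeled / total if total else 0.0
    return round(precision, 2), round(recall, 2)


# Negative batch, "All" row: 499 articles, 252 labeled "-", 86 labeled "+".
print(precision_recall(499, correct=252, wrong=86))  # (74.56, 67.74)
# Positive batch, "All" row: 216 articles, 87 labeled "+", 50 labeled "-".
print(precision_recall(216, correct=87, wrong=50))   # (63.5, 63.43)
```

Under this reading, a system that rarely outputs a neutral label reaches a recall close to 100% regardless of how accurate its labels are.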

5. CONCLUSIONS

The SWN recall is consistently close to 100% because the polarity of all present words is considered and the sum is rarely close enough to 0 to be considered "neutral". However, its precision is consistently lower than that of the indirect-opinion-based system. Our system's recall is linked to the size of the training corpus, the worst results being obtained for the Jobs and Comments sections of the positive batch, where the number of articles is the lowest. We also note that the Comments sections of both the positive and negative batches show a lower accuracy than the Markets and Jobs sections because of the much larger number of subjects discussed, from personal finance to macroeconomics.
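
The behavior of a summed word-polarity baseline can be illustrated with a short Python sketch. It is not the SentiWordNet-based implementation used in the experiment: the word scores are made up, and the width of the neutral band (`neutral_band`) is a hypothetical parameter.

```python
from typing import Iterable


def summed_polarity_label(word_scores: Iterable[float], neutral_band: float = 0.05) -> str:
    """Label a text by the sum of its word polarity scores.

    The text is neutral ("=") only when the sum falls within +/- neutral_band of 0.
    """
    total = sum(word_scores)
    if abs(total) <= neutral_band:
        return "="
    return "+" if total > 0 else "-"


# Made-up scores: scores must cancel almost exactly for a text to be neutral.
print(summed_polarity_label([0.25, -0.125, 0.5]))  # +
print(summed_polarity_label([0.125, -0.125]))      # =
```

Because a long article contributes many scored words, the sum seldom lands inside the neutral band, so nearly every article receives a "+" or "-" label.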

By switching focus from explicit opinions in economics to the quantification of implicit predictions of future economic outcomes, we hope to have created a novel tool for opinion mining, complementary to existing ones. Future extensions include the processing of irony and metaphors, both of which are widely used in financial articles and have not been addressed in the current experiment. In addition, there are numerous cases where outside quotes are used only to be disagreed with by the author. This too is a type of negation that needs to be included in the opinion extraction module. Perhaps the most important of all future improvements is the addition of more indicators and modifiers to improve the system's recall.

6. ACKNOWLEDGEMENTS

Supported by the EU Research Grant POSDRU 7713.

7. REFERENCES

Ahmad K.; Cheng D. & Almas Y. (2006). Multi-lingual sentiment analysis of financial news streams. First International Workshop on Grid Technology for Financial Modeling and Simulation: pp. 984-991

Amigo E.; Spina D. & Bernardino B. (2009). User Generated Content Monitoring System Evaluation. Proceedings of WOMSA 2009. pp 1-13

Ceglowski M. & Coburn A. (2003). Semantic Search of Unstructured Data using Contextual Network Graphs.

Chen L.S. & Chiu H.J. (2009). Developing a Neural Network based Index For Sentiment Classification. Proceedings of IAENG 2009: pp 744-749.

Cruz F.; Troyano J.; Ortega J. & Enriquez F. (2009). Domain Oriented Opinion Extraction Methodology. Proceedings of WOMSA. pp. 52-62

Esuli A. & Sebastiani F. (2006). SENTIWORDNET: A Publicly Available Lexical Resource for Opinion Mining. Proceedings of LREC, pp. 417-442, Genoa, Italy, May 2006

Fellbaum C. (1998). WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press

Ferguson P.; O'Hare N.; Bermingham A.; Tattersall S.; Sheridan; Gurrin C. & Smeaton A. (2009). Exploring the use of Paragraph-level Annotations for Sentiment Analysis of Financial Blogs. Proceedings of WOMSA 2009. pp. 42-52

Liu B. (2008). Opinion Mining. In proceedings of WWW-2008

Pang B. & Lee L. (2008). Opinion Mining and Sentiment Analysis. Foundations and Trends in Information Retrieval, 2: 1-135

Wiebe J.; Wilson T. & Cardie C. (2005). Annotating Expressions of Opinions and Emotions in Language. Language Resources and Evaluation, 39(2):165-210

Tab. 1. Relevant occurrences of key terms

Term        POS                 Requirements

Indicator   Noun/Noun Phrase    None
Indicator   Adjective           Neutral Noun
Modifier    Verb                None
Modifier    Adverb              Continuer Verb

Tab. 2. Polarity identification test results

Corpus     All     +     -     =   SWN +   SWN -   SWN =

Neg.
All        499    86   252   161     207     288       4
Jobs        54     1    35    18      19      35       0
Markets    253    44   128    81     100     152       1
Com.       153    32    73    48      60      91       2

Pos.
All        216    87    50    79      63      94      59
Jobs        14     4     3     7       4       7       3
Markets    152    71    31    50      42      69      41
Com.        34    11     9    14      11      15       8

Tab. 3. Polarity identification test precision and recall

Corpus      Precision   Recall   SWN Precision   SWN Recall

Total           71.37    66.43           53.83        91.19

Negative
All             74.56    67.74           58.18        99.20
Jobs            97.22    66.67           64.81       100.00
Markets         74.42    67.98           60.32        99.60
Comments        69.52    68.63           60.26        98.69

Positive
All             63.50    63.43           40.13        72.69
Jobs            57.14    50.00           36.36        78.57
Markets         69.61    67.11           37.84        73.03
Comments        55.00    58.82           42.31        76.47