Determining quality levels for improving maintenance processes.
Garais, Gabriel Eugen ; Enaceanu, Alexandru Serban
Abstract: The rapid growth of online press companies calls for
filtered written information, so that posted texts and messages can be
understood better and faster. Testing the readability of text in an
online environment is an important part of the maintenance and
optimization process for managing and offering quality content.
Key words: readability, maintenance, web content, text optimization
1. INTRODUCTION
Readability is the property of a text that makes it easy to read
and understand. Gunning Fog, Flesch Reading Ease, Flesch-Kincaid,
SMOG (simple measure of gobbledygook), the Fry readability formula,
the Automated Readability Index (ARI), the Spache readability formula,
the Dale-Chall readability formula and the Coleman-Liau index are
algorithmic models that help rank a site's texts in a hierarchy of
readability levels and are useful for filtering and sorting information
according to the resulting interpretation of the texts (DuBay, 2007).
In this article we present a formula for testing readability for the
Romanian language, usable in quality content maintenance processes.
The formula is integrated through middleware technologies such as API
integration interfaces, application servers and web servers (Botezatu
et al., 2009).
2. INTERNATIONAL READABILITY FORMULAS
The readability formulas needed to develop a Romanian formula are
presented in the following paragraphs and are tested in this article on
two texts containing web content written in Romanian, denoted Story A
and Story B.
Story A--http://www.amosnews.ro/Story-29-50027
Story B--http://www.amosnews.ro/Story-29-50235
Table 1 presents the common parameters on which the tested
readability formulas are based.
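The averaged rows of table 1 follow directly from the raw counts. The minimal Python sketch below illustrates this; the names `TextCounts` and `derived_parameters` are hypothetical, and it assumes that the row of table 1 holding 109 and 7 is the sentence count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextCounts:
    """Raw counts for one text (hypothetical container, not from the paper)."""
    words: int
    sentences: int   # assumed to be the 109 / 7 row of table 1
    syllables: int
    long_words: int  # words with three or more syllables


def derived_parameters(c: TextCounts) -> dict[str, float]:
    """Return the averaged rows of table 1, rounded to two decimals."""
    return {
        "words_per_sentence": round(c.words / c.sentences, 2),
        "syllables_per_word": round(c.syllables / c.words, 2),
        "percent_long_words": round(100 * c.long_words / c.words, 2),
    }


# Raw counts copied from table 1.
story_a = TextCounts(words=2120, sentences=109, syllables=4275, long_words=611)
story_b = TextCounts(words=91, sentences=7, syllables=170, long_words=25)

print(derived_parameters(story_a))
print(derived_parameters(story_b))
```

With these inputs the sketch gives 19.45, 2.02 and 28.82 for Story A and 13.0, 1.87 and 27.47 for Story B, which agrees with the averaged rows of table 1.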
Readability formulas are divided into two categories, [L.sub.1] and
[L.sub.2], which differ in the way the final result is interpreted.
There are results that:
--Are distributed on a 0 to 100 scale;
--Indicate the level of education needed to understand the text.
The results of [L.sub.1] formulas, which take the number of
syllables into account, are mapped onto a 0 to 100 scale, on which 0
indicates a low level of readability (a text that is hard to
understand) and 100 indicates a high level of readability (a text that
is easy to understand).
The Flesch Reading Ease model belongs to the [L.sub.1] category,
with levels from 0 to 100. The higher the score, the easier the
document is to understand. Web sites should reach a level between 60
and 70 to be understood by a large number of readers.
This calculation is based on the following elements:
--Average sentence length;
--Average number of syllables;
--The number of personal words used;
--The number of personal sentences used per 100 words.
The model determines how much a person with average skills can read
and understand from a written message. The results are compared with
standards determined for the targeted audience, considering that a
readable ad contains 14 words per sentence, 140 syllables per 100
words, 10 personal words and 43% personal sentences.
The method is a way of verifying communication efficiency, and it
is advisable to use it together with other pretested processes. The
formula is:
FRE = 206.835 - 1.015 (Ncv/Pr) - 84.6 (Tsilab/Ncv)
where:
FRE: Flesch Reading Ease readability formula
Tsilab: total number of syllables
Ncv: number of words
Pr: number of sentences
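As a minimal sketch, the formula can be written in Python as follows. The function name `flesch_reading_ease` and its parameter names are hypothetical; the inputs are the word, sentence and syllable counts of table 1, under the assumption that the row holding 109 and 7 is the sentence count Pr.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """FRE = 206.835 - 1.015 * (Ncv / Pr) - 84.6 * (Tsilab / Ncv)."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    avg_sentence_length = words / sentences    # Ncv / Pr
    avg_syllables_per_word = syllables / words  # Tsilab / Ncv
    return 206.835 - 1.015 * avg_sentence_length - 84.6 * avg_syllables_per_word


# Counts taken from table 1 for Story A and Story B.
fre_a = flesch_reading_ease(words=2120, sentences=109, syllables=4275)
fre_b = flesch_reading_ease(words=91, sentences=7, syllables=170)
print(round(fre_a, 1), round(fre_b, 1))  # 16.5 35.6
```

With these counts the sketch gives the values reported in table 2.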
The coefficients 206.835, 1.015 and 84.6 (DuBay, 2007) are
weighting coefficients chosen as a result of tests on English-language
texts. They are the outcome of a refinement process based on the
education level of a person who reads and understands English. The
coefficient 84.6 is the weight assigned to the average number of
syllables per word in a text. Word processors that use this algorithm
include Microsoft Word, Google Docs, Lotus WordPro and KWord.
The results of applying the Flesch Reading Ease formula to stories
A and B (table 2) show that the formula is calibrated strictly for
English, since the two stories cannot plausibly sit at such a low level
on the 0-100 scale. Taken as they are, the results would imply that the
readers of these texts need at least a PhD diploma.
Research on readability formulas shows that formulas exist for the
following languages: Italian, Spanish, French, Danish and Japanese
(DuBay, 2007).
After some tests it was observed that the only formulas whose
results come close to plausible values for Romanian are those for
Italian and Spanish, which is reasonable given the similarities in
lexical construction between these languages.
The calibration of the Flesch Reading Ease formula for Italian
belongs to the [L.sub.1] category. The formula is also known as the
Franchina-Vacca formula.
[FRE.sub.IT] = 217 - 1.3 * [N.sub.cvmed] - 0.6 * [N.sub.sil100]
where:
[FRE.sub.IT]--FRE formula for the Italian language
[N.sub.cvmed]--average number of words per sentence
[N.sub.sil100]--number of syllables per 100 words
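A minimal Python sketch of the Italian calibration is given below. The function name `fre_italian` is hypothetical, and the sketch assumes that the syllables per 100 words can be approximated by the whole-text average (100 times syllables divided by words) rather than counted in a window of 100 consecutive words.

```python
def fre_italian(words: int, sentences: int, syllables: int) -> float:
    """FRE_IT = 217 - 1.3 * N_cvmed - 0.6 * N_sil100."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    n_cvmed = words / sentences         # average words per sentence
    n_sil100 = 100 * syllables / words  # assumption: whole-text average
    return 217 - 1.3 * n_cvmed - 0.6 * n_sil100


# Story B counts from table 1.
print(round(fre_italian(words=91, sentences=7, syllables=170), 1))  # 88.0
```

For Story B this gives about 88.0, in line with table 3. Values for other texts can deviate from the published ones when the syllables are counted in 100 consecutive words instead of being averaged over the whole text.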
Applying the [FRE.sub.IT] formula to stories A and B gives, as
shown in table 3, results that are closer to a normal level than those
in table 2. This indicates that formulas for languages with a lexical
form closer to Romanian are preferable.
The weight 0.6 is applied to the number of syllables counted in 100
consecutive words of the analyzed text, and 1.3 is the weight applied
to the average number of words per sentence.
The adjustment of the Flesch Reading Ease formula for Spanish also
belongs to the [L.sub.1] category. The adjusted formula is known as
Fernandez Huerta, after the scientist who adjusted the initial Flesch
formula.
[FRE.sub.SP] = 206.84 - 0.60 * [N.sub.sil100] - 1.02 * [N.sub.cvmed]
where:
[FRE.sub.SP]--FRE formula adjusted for the Spanish language
[N.sub.cvmed]--average number of words per sentence
[N.sub.sil100]--number of syllables per 100 words
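The Spanish adjustment can be sketched in the same way. The function name `fre_spanish` is hypothetical and the code follows the variable definitions given above literally, again assuming a whole-text average for the syllables per 100 words. Descriptions of the Fernandez Huerta formula in the literature differ on whether the second variable is words per sentence or sentences per 100 words, so this sketch is not claimed to reproduce table 4.

```python
def fre_spanish(words: int, sentences: int, syllables: int) -> float:
    """FRE_SP = 206.84 - 0.60 * N_sil100 - 1.02 * N_cvmed."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    n_cvmed = words / sentences         # average words per sentence, as defined above
    n_sil100 = 100 * syllables / words  # assumption: whole-text average
    return 206.84 - 0.60 * n_sil100 - 1.02 * n_cvmed


# Story B counts from table 1; the result depends on the assumptions above.
score_b = fre_spanish(words=91, sentences=7, syllables=170)
print(round(score_b, 2))
```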
The results in table 4 are further evidence of the small gap
between the lexical form of Romanian and that of these languages, which
supports using them as the basis for a new readability formula.
The models that express readability as an education level belong to
the [L.sub.2] category. In the specialized literature these include
Gunning-Fog, Flesch-Kincaid Grade Level, SMOG, Fry, ARI, Spache,
Dale-Chall and the Coleman-Liau Index (Ferris & Hedgcock, 2009).
In this article only one model of the [L.sub.2] category is
applied, the Gunning-Fog model, which shows how many years of education
a person needs to understand a specific text with ease. A lower number
denotes a text that is easier to understand; at the other end of the
interval, a higher number indicates a more complex text that is harder
to understand. For example, a value of 17 means that post-university
education is needed for the text to be understood. This test was
created for the English language and mainly tests the number of
syllables in a word, ignoring numerical values. Testing this formula on
stories A and B gives the results in table 5.
[NIV.sub.edu] = 0.4 * [(Ncv/Pr) + 100 * (Cts/Ncv)]
where:
[NIV.sub.edu]--US education level
Ncv--Number of words
Cts--Number of words with more than 3 syllables
Pr--Number of sentences
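A minimal Python sketch of the Gunning-Fog calculation follows. It assumes the standard form of the index, in which the factor 0.4 multiplies the sum of the average sentence length and the percentage of long words; the function name `gunning_fog` and the input values are illustrative and are not taken from stories A and B.

```python
def gunning_fog(words: int, sentences: int, long_words: int) -> float:
    """NIV_edu = 0.4 * ((Ncv / Pr) + 100 * (Cts / Ncv))."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    avg_sentence_length = words / sentences      # Ncv / Pr
    percent_long_words = 100 * long_words / words  # 100 * Cts / Ncv
    return 0.4 * (avg_sentence_length + percent_long_words)


# Illustrative text: 100 words, 5 sentences, 10 long words.
level = gunning_fog(words=100, sentences=5, long_words=10)
print(round(level, 1))  # 12.0
```

In this illustrative case, 10 long words per 100 words and 20 words per sentence give a level of 12 years of education.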
It is suggested that long words should not exceed 10 to 15 per 100
words, so that texts can be understood with an education equivalent to
high school.
3. READABILITY FOR ROMANIAN LANGUAGE
After many tests on more than 40,000 texts of different lengths and
complexity, a formula was created to calculate readability for the
Romanian language through an empirical method based on the standard
[L.sub.1] and [L.sub.2] formulas. The formula that results from
applying the rules for determining proportions is:
[G.sub.cit] = 0.0158 * [L.sub.txt] * NIVgr / FRELs
where:
[G.sub.cit]--readability formula for texts written in Romanian
language
FRELs--average of [L.sub.1] relations
NIVgr--average of [L.sub.2] relations
[L.sub.txt]--text length measured in number of characters
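The sketch below shows one possible reading of this formula in Python. It assumes that the "average of [L.sub.1] relations" is the arithmetic mean of the scores returned by the [L.sub.1] formulas and that the "average of [L.sub.2] relations" is the mean of the [L.sub.2] results; the paper does not state which formulas enter each average. The function name `g_cit` and the input values are illustrative only.

```python
from statistics import mean
from typing import Sequence


def g_cit(text_length: int,
          l1_scores: Sequence[float],
          l2_scores: Sequence[float]) -> float:
    """G_cit = 0.0158 * L_txt * (mean of L2 results) / (mean of L1 results)."""
    l1_avg = mean(l1_scores)  # assumed meaning of the L1 average
    l2_avg = mean(l2_scores)  # assumed meaning of the L2 average
    if l1_avg == 0:
        raise ValueError("the average of the L1 scores must be non-zero")
    return 0.0158 * text_length * l2_avg / l1_avg


# Illustrative values, not taken from the paper's test corpus.
score = g_cit(text_length=1000, l1_scores=[50.0, 70.0], l2_scores=[12.0])
print(round(score, 2))  # 3.16
```

Under this reading the indicator grows with text length and with the required education level, and shrinks as the 0 to 100 readability scores rise.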
This formula determines, on the basis of the readability formulas,
how much easier or harder a text is compared with others. Developers
have access to a table of contents that suggests quality and quantity
values. Text supervisors use the [G.sub.cit] indicator in an automated
way, through the filtering and calculations of an algorithm that shows
them not only the final results but also the intermediate stages, so
that they can make better decisions about keeping or improving the
quality of the texts published on the web site. The need for this
formula comes from maintenance processes that require better content.
4. CONCLUSION
There is no standard for what constitutes a quality text, but there
are target audiences, and using the right tool can improve the
experience of those target readers. The Romanian readability formula
was determined empirically and must be refined in the years to come.
The next step of the research is further testing to refine the formula
for reliable public use.
5. REFERENCES
Dana R. Ferris, John Hedgcock (2009)--Teaching Readers of English:
Students, Texts, and Contexts, Taylor & Francis, ISBN:
978-041-5999-64-9
William H. DuBay (2007)--Unlocking Language: The Classic Studies in
Readability, BookSurge Publishing, ISBN: 978-141-966-176-1
Botezatu Cornelia, Botezatu Cezar, George Carutasu (2009)--Software
integration--necessity for integrated management systems, Annals of
DAAAM for 2009 & Proceedings of the 20th International DAAAM
Symposium, pp. 123-124, ISSN 1726-9679
**** (2010) http://www.utexas.edu--Texas--Austin University,
Accessed on: 2010-08-18
**** (2006) http://www.wordscount.info/hw/smog.jsp--Smog
Calculator, Accessed on: 2010-11-20
Tab. 1. Parameters analyzed to calculate the readability formulas
Measured parameter                      Story A    Story B
Characters                              12903      528
Letters                                 10466      413
Sentences                               109        7
Words                                   2120       91
Distinct words                          932        49
Average words / sentence                19.45      13
Average syllables / word                2.02       1.87
Words with >= 3 syllables               611        25
Total count of syllables                4275       170
Percent of words with >= 3 syllables    28.82      27.47
Tab. 2. The results of applying the Flesch Reading Ease formula on
stories A and B
Story A    Story B
16.5       35.6
Tab. 3. The results of applying the [FRE.sub.IT] formula on stories A
and B
Story A    Story B
70.07      88
Tab. 4. The results of applying the [FRE.sub.SP] formula on stories A
and B
Story A    Story B
80.06      86.9
Tab. 5. The results of applying the Gunning-Fog formula on stories A
and B
Story A    Story B
18.5       12.7