Predicting project success in construction using an evolutionary Gaussian process inference model.
Cheng, Min-Yuan; Huang, Chin-Chi; Van ROY, Andreas Franskie
Introduction
The primary task of performance control is to ensure that project
goals are achieved and to provide feedback on the status of each phase
of construction. However, post-implementation performance evaluation is
resource-intensive and time-consuming, has little influence on the
success of the project's implementation, and does not provide the
benefits of real-time monitoring of the current construction status.
Traditional methods of project control are commonly based on the
experience and habits of those in management. The subjective choice of
these methods often leads to error. This is especially prominent in the
management of larger construction projects, where predicting the range
of possible issues from a huge set of data becomes more difficult. In
recent years, many studies have been dedicated to improving project
success. Khosravi and Afshari (2011) proposed a success measurement
model for construction projects to determine how successful projects
were after their closing phase. There have also been many academic
assessments of critical success factors within construction projects
(Chan et al. 2004; Griffith et al. 1999; Sanvido et al. 1992).
The time series method is widely used in construction to make
predictions based on historical data. In order to preserve past
experience and to resolve the issue of huge datasets in project control,
the "Continuous Assessment of Project Performance" (CAPP)
system was developed by the Construction Industry Institute (CII) and
was used to collect and compile project information and analyse the
differences between successful and unsuccessful project progress
s-curves (Russell et al. 1997). Statistical analyses using this system
were undertaken by various studies to confirm the significance levels of
known factors that influence project performance and to investigate
whether there are other key factors that may influence the success of a
project. Even though CAPP is useful in analysing these factors, it is
not able to accurately predict the end result of a project. Ko and
Cheng (2007) proposed building prediction models using an Evolutionary
Fuzzy Neural Inference Model (EFNIM), but in practice the required
calculations consume considerable time and system resources, making the
prediction models difficult to update. For this reason, this study
adopted the Evolutionary Gaussian Process Inference Model (EGPIM) to
solve this issue.
The EGPIM features a short training time and precise predictions,
making it suitable for application as a dynamic prediction model to
provide construction managers with information about the project in real
time to aid their decision making. The dynamic prediction model that
this study used to calculate the success of a project is based on data
collected from CII's historical database. CAPP was first used to
perform a statistical analysis
of the influential factors, thus confirming the key factors that
influence project success. A time series was then applied to organize
the cases from the database. With that done, the EGPIM was applied to
these cases for training before going on to predict the success of new
projects. The resultant prediction is able to assist those in project
management to efficiently control project performance, expedite the
discovery of potential problems in the field as well as remedy these
problems during construction.
With these benefits in mind, a database was created using the CAPP
research results. A time series was then applied to this data for
sorting, and the EGPIM was applied to build a dynamic model for
predicting the success of a project. Verification showed that the
EGPIM's time series predictions were very precise and that current
project performance could be monitored in real time, allowing
management personnel to handle the project more efficiently.
1. Review of approaches
1.1. Gaussian process regression
Gaussian process (GP), an artificial intelligence technique actively
developed in recent years, has been applied in fields including
chemistry, construction, and medicine (Brahim-Belhouari, Bermak 2004).
In the field of construction, GP has primarily been applied to
regression and classification prediction. Yan et al. (2011) proposed a
GP machine learning-based model to classify surrounding rocks. Su and
Xiao (2011) combined the Gaussian process (GP) and importance sampling
method (ISM) in a new method to analyse slope reliability that obtained
highly accurate results.
Along with other AI techniques, GP offers statistical advantages and
is easy to learn (Chu, Ghahramani 2005; Kocijan et al. 2004). Based on
probability theory, a Gaussian process can not only make predictions on
unknown input data, but can also report the accuracy of those
predictions (estimation variances), which greatly elevates their
statistical significance (Bonilla et al. 2009). A GP can be regarded as
a collection of random variables, any finite number of which obey a
joint Gaussian distribution:

F(X) = \{f(X_1), f(X_2), \ldots, f(X_N)\} \sim N(\mu, K), (1)

where: \mu is the mean; and K is the covariance matrix. X is the
collection of N input factors X_1, X_2, \ldots, X_N. A GP can be
described via a mean function m(X) and a covariance function k(X, X')
in a random process:

f(X) \sim GP(m(X), k(X, X')). (2)
In real situations, however, data prediction is often accompanied by
noise; therefore, when the value Y is calculated from the estimated
function, an error term \epsilon should be considered. This \epsilon
likewise follows a Gaussian distribution. Y is calculated as follows:

Y = F(X) + \epsilon. (3)
Denoting the training set as \{X, Y\}, the new input data as X_* and
the desired output as Y_*, the joint distribution is calculated under
the Gaussian distribution, with \theta representing the parameters of
the joint distribution:

\begin{bmatrix} Y \\ Y_* \end{bmatrix} \sim
N\left(0, \begin{bmatrix} K + \sigma^2 I & k \\
k^T & \kappa + \sigma^2 \end{bmatrix}\right), (4)

where: k = [k(X_*, X_1) \ldots k(X_*, X_N)]^T is the N \times 1 vector
formed from the covariances between X_* and the training inputs X; the
scalar \kappa = k(X_*, X_*); and \sigma^2 is the noise variance.
Hence, the conditional probability distribution can also be
calculated, with the expected value together with the noise:

Y_* \mid Y, X, \theta, \sigma^2 \sim N(m(X_*), v(X_*)). (5)

In the end, based on the conditional probability distribution, the
mean m(X_*) and variance v(X_*) of the expected value Y_* are
calculated as:

m(X_*) = k^T (K + \sigma^2 I)^{-1} Y; (6)

v(X_*) = \kappa + \sigma^2 - k^T (K + \sigma^2 I)^{-1} k. (7)
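To make Eqns (6) and (7) concrete, the following minimal Python sketch
(our illustration, not the paper's implementation) computes the
predictive mean and variance for one test input under an assumed
squared exponential covariance with placeholder hyper-parameter values:

```python
import numpy as np

def se_cov(A, B, sigma_f=1.0, r=1.0):
    """Squared exponential covariance between input sets A (N x D) and B (M x D)."""
    d2 = (((A[:, None, :] - B[None, :, :]) / r) ** 2).sum(axis=2)
    return sigma_f ** 2 * np.exp(-0.5 * d2)

def gp_predict(X, Y, X_star, sigma_n=0.1):
    """Predictive mean m(X_*) and variance v(X_*), per Eqns (6) and (7)."""
    K = se_cov(X, X)                        # training covariance matrix K
    k = se_cov(X, X_star)                   # N x 1 cross-covariance vector k
    kappa = se_cov(X_star, X_star).item()   # scalar kappa = k(X_*, X_*)
    A = K + sigma_n ** 2 * np.eye(len(X))   # K + sigma^2 I
    mean = k.T @ np.linalg.solve(A, Y)                        # Eqn (6)
    var = kappa + sigma_n ** 2 - k.T @ np.linalg.solve(A, k)  # Eqn (7)
    return mean.item(), var.item()

# Toy usage: noisy sine data, predict at x = 2.0
rng = np.random.default_rng(0)
X = np.linspace(0.0, 5.0, 20)[:, None]
Y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(20)
print(gp_predict(X, Y, np.array([[2.0]])))
```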
1.2. Bayesian inference
Apart from model information and data information, Bayesian inference
also utilizes the distribution information of unknown parameters
(Markvardsen 2004). This information exists prior to the experiment and
is expressed via the probability distribution of the unknown
parameters, so it is generally called the "prior".
The general model is: prior + sample information => posterior
Bayesian theorem aims to use known information to construct the
posterior probability density of the system state variables: the model
is used to predict the prior density of the state, and the latest
observation information is then used to rectify it, yielding the
posterior probability density. Using observation information to update
the state variables, we can quantify how much to trust different values
and obtain the best estimate of the model (Chamberlain, Imbens 2003;
Seng 2008). Bayesian inference is commonly used in probability
reasoning (Mahdavi Adeli et al. 2011), and in engineering it is also
often applied to reliability analysis (Der Kiureghian 2008; Maes 2007)
and Bayesian networks (Perelman, Ostfeld 2012).
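As a minimal illustration of the "prior + sample information =>
posterior" scheme (a textbook conjugate Gaussian update, not the
paper's model), the sketch below combines a Gaussian prior on an
unknown mean with noisy observations to obtain the Gaussian posterior:

```python
import numpy as np

def gaussian_posterior(prior_mu, prior_var, obs, obs_var):
    """Posterior over an unknown mean, given a N(prior_mu, prior_var) prior
    and observations with known noise variance obs_var (conjugate update)."""
    n = len(obs)
    post_var = 1.0 / (1.0 / prior_var + n / obs_var)   # precisions add
    post_mu = post_var * (prior_mu / prior_var + obs.sum() / obs_var)
    return post_mu, post_var

# Prior belief N(0, 1); three observations with noise variance 0.25
mu, var = gaussian_posterior(0.0, 1.0, np.array([0.9, 1.1, 1.0]), 0.25)
print(mu, var)  # the posterior concentrates near the observed values
```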
1.3. Particle Swarm Optimization algorithm (PSO)
The Particle Swarm Optimization (PSO) algorithm is a relatively new
algorithm derived by Kennedy and Eberhart (1995) from a simplified
social model simulation. PSO algorithms mimic the mechanisms birds use
to share information in flight. The particle concept treats group
members as having no mass or volume, but designated speed and
acceleration. The first version of PSO added neighboring speed values
and considered multi-dimensional search and distance-based
acceleration. Inertia weight, introduced later, enhanced the
algorithm's exploitation and exploration and paved the way for a
standard version of the algorithm (Clerc, Kennedy 2002). PSO is often
applied in engineering to solve multi-objective decision-making
(Azadnia, Zahraie 2010) and optimization (Li et al. 2010) tasks. In
recent years, PSO has been increasingly combined with other AI tools to
develop numerous new optimization methods (Yan, Zhang 2011; Zhao et al.
2006).
2. Evolutionary Gaussian process inference model
This model is founded on historical data and formed with the Gaussian
process, in combination with Particle Swarm Optimization (PSO) and
Bayesian inference. In this model, GP is used to reveal the intricate
relationship between input and output variables. The Bayesian inference
structure gives the posterior probability of the entire function and
serves as the reference for parameter optimization. PSO is used to
search for the best GP hyper-parameters required by the Bayesian
analysis; the structure is shown in Fig. 1. The model includes three
parts.
A. Data input
Input data X and output data Y are collected and arranged: X is the
collection of N-dimensional input factors X_1, X_2, \ldots, X_N, and Y
is the collection of m desired outputs Y_1, Y_2, \ldots, Y_m. Thus, any
Y_l is the desired output corresponding to the case input values
(X_{1l}, X_{2l}, \ldots, X_{Nl}) (Money et al. 2012).
The function value corresponding to any input factor X_j is f(X_j):
F(X) = \{f(X_1), f(X_2), \ldots, f(X_N)\}. F(X) is the set of functions
that describes the relationship between X and Y, and here the Gaussian
process is used to describe the function distribution. Assuming that
the function F(X) follows a Gaussian distribution and, to simplify the
work, that the expected value m(X) is 0, the probability is:

P(F) = \frac{1}{(2\pi)^{N/2} |K|^{1/2}}
\exp\left[-\frac{1}{2} F^T K^{-1} F\right] \sim N(0, K), (8)

where: K is the matrix constructed from the covariance function
k(X, X'); per the equation above, the probability of the function set F
is regarded as controlled by the covariance matrix K.
[FIGURE 1 OMITTED]
B. Gaussian process and Bayesian inference
(1) Covariance matrix and parameters.
After determining the stationary pattern, a covariance function is
chosen to construct the covariance matrix. The parameter model and the
number of parameters vary with the chosen function; this study adopts
the most common squared exponential (SE) covariance function:

K_{SE}(X_i, X_j) = \sigma_f^2 \exp\left[-\frac{1}{2} \sum_{l=1}^{n}
\left(\frac{X_{il} - X_{jl}}{r_l}\right)^2\right]
+ \sigma_n^2 \delta_{ij}, (9)

where: \sigma_f (signal variance) controls the volatility of the entire
function; \sigma_n (noise) indicates the errors of the entire function;
r_l (length-scale) shows the relationship between variables X_{il} and
X_{jl} in function space; and \sigma_f, \sigma_n, r_1, r_2, \ldots, r_n
are the hyper-parameters of the matrix. In this paper, \theta
represents the aggregation of hyper-parameters (Fig. 1).
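A short sketch of Eqn (9) follows (our reading of it, with one
length-scale per input dimension; the hyper-parameter values are
placeholders, not the fitted values reported in Table 4):

```python
import numpy as np

def k_se(Xi, Xj, sigma_f, sigma_n, r):
    """Squared exponential covariance matrix, Eqn (9): one length-scale r_l
    per input dimension, plus a noise term sigma_n^2 * delta_ij."""
    scaled = (Xi[:, None, :] - Xj[None, :, :]) / r        # pairwise scaled diffs
    K = sigma_f ** 2 * np.exp(-0.5 * (scaled ** 2).sum(axis=2))
    if Xi is Xj:                                          # delta_ij applies when Xi == Xj
        K = K + sigma_n ** 2 * np.eye(len(Xi))
    return K

# theta = {sigma_f, sigma_n, r_1, ..., r_11} for the 11 input factors
theta = dict(sigma_f=1.0, sigma_n=0.1, r=np.ones(11))
X = np.random.default_rng(1).random((5, 11))
print(k_se(X, X, **theta).shape)   # (5, 5)
```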
(2) Bayesian inference and posterior probability.
According to the chosen covariance function, and utilizing Bayesian
theorem, the posterior probability of the entire function, P(F|X, Y),
is inferred:

P(F|X, Y) = \frac{P(Y|F, X) P(F)}{P(Y|X)}. (10)

To maximize the posterior probability P(F|X, Y), the Negative
Log-Marginal Likelihood (NLML) is minimized in combination with PSO,
with the goal of obtaining the most likely hyper-parameters during the
minimization process.
C. The optimization of hyper-parameters
PSO is applied in the EGPIM to optimize the hyper-parameters in
function space and find the best function for the model.
(1) Initial stage.
The PSO parameters are set up, and the particle group, particle
speeds and positions are then randomly initialized to proceed with the
iteration:
--group scale m;
--maximum speed V_max;
--acceleration constants c_1 and c_2;
--maximum inertia weight W_max;
--minimum inertia weight W_min;
--maximum iteration times Iter_max;
--termination accuracy requirement NLML (Negative Log-Marginal
Likelihood),
where: the group scale m represents the number of particles; V_max is
the maximum particle velocity; c_1 and c_2 are acceleration constants,
also called learning factors (usually c_1 = c_2 = 2); W_max is the
initial inertia weight and W_min the final inertia weight, used to
calculate the inertia weight at each iteration; Iter_max sets the
maximum number of particle swarm optimization iterations; and NLML is
the fitness value of the PSO. In general, iterative termination occurs
when either the maximum number of iterations or some minimum fitness
value is reached.
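A hedged initialization sketch of these settings follows (the values
shown are common defaults, not the study's actual configuration; the 13
dimensions correspond to the hyper-parameters \sigma_f, \sigma_n,
r_1, \ldots, r_11):

```python
import numpy as np

rng = np.random.default_rng(42)

m, dim = 30, 13                 # group scale m; one dimension per hyper-parameter
v_max = 1.0                     # maximum speed V_max
c1, c2 = 2.0, 2.0               # acceleration constants (learning factors)
w_max, w_min = 0.9, 0.1         # inertia weight bounds
iter_max = 200                  # maximum iteration times

S = rng.random((m, dim))                   # random initial particle positions
V = rng.uniform(-v_max, v_max, (m, dim))   # random initial particle velocities
```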
(2) Optimization stage.
The fitness values of particles are used to discriminate between good
and bad particles; the fitness value depends on the NLML. In practice,
prior knowledge is insufficient to fix appropriate values for the
hyper-parameters that define the covariance. We therefore gave prior
distributions to the hyper-parameters and based predictions on a sample
of values from their posterior distribution. Sampling from the
posterior distribution requires computation of the log likelihood based
on the datasets, which is:

-\log P(Y|X) = \frac{1}{2} Y^T (K(X, X) + \sigma^2 I)^{-1} Y
+ \frac{1}{2} \log \left| K(X, X) + \sigma^2 I \right|
+ \frac{N}{2} \log 2\pi. (11)
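The NLML of Eqn (11), used as the PSO fitness value, can be evaluated
as in the following sketch (a Cholesky factorization is a standard
numerical choice here, assumed rather than taken from the paper):

```python
import numpy as np

def nlml(K, Y, sigma_n):
    """Negative log marginal likelihood, Eqn (11):
    0.5*Y'(K+s^2 I)^-1 Y + 0.5*log|K+s^2 I| + (N/2)*log(2*pi)."""
    N = len(Y)
    A = K + sigma_n ** 2 * np.eye(N)
    L = np.linalg.cholesky(A)                            # A = L L'
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y))  # A^-1 Y
    return (0.5 * Y @ alpha                   # data-fit term
            + np.log(np.diag(L)).sum()        # 0.5 * log|A|
            + 0.5 * N * np.log(2.0 * np.pi))  # normalization constant
```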
The particle search speed and direction are calculated as follows.
Particle speed calculation:

V_{id}^{t+1} = W^{t+1} \cdot V_{id}^{t}
+ c_1 \cdot rand() \cdot (pbest_{id} - S_{id}^{t})
+ c_2 \cdot rand() \cdot (gbest_{id} - S_{id}^{t}). (12)

Particle weight:

w = w_{max} - \frac{w_{max} - w_{min}}{iter_{max}} \cdot iter. (13)

New search direction calculation:

S_{id}^{t+1} = S_{id}^{t} + V_{id}^{t+1}, (14)
where: V_{id}^{t} is the velocity of particle i at iteration t in
dimension d; V_{id}^{t+1} is the updated particle velocity; S_{id}^{t}
is the current position of particle i; S_{id}^{t+1} is the updated
particle position; pbest_{id} is the best solution found by the
particle itself (the individual extremum); gbest_{id} is the best
solution found by the whole swarm (the global extremum); rand() are
random numbers within (0, 1); c_1 and c_2 are the learning factors; and
w is the weighting coefficient, with a value between 0.1 and 0.9.
Through constant learning and renewal of position and speed, the
particles gradually fly to the optimum location in the space until the
searching process ends. The final output, gbest, is the best solution
found.
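A compact sketch of the update rules in Eqns (12)-(14) follows (generic
PSO; in the EGPIM the fitness evaluated for pbest/gbest would be the
NLML above, and all constants are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

def inertia(it, iter_max, w_max=0.9, w_min=0.1):
    """Linearly decreasing inertia weight, Eqn (13)."""
    return w_max - (w_max - w_min) / iter_max * it

def pso_step(S, V, pbest, gbest, w, c1=2.0, c2=2.0, v_max=1.0):
    """One iteration: velocity update (Eqn (12)) then position update (Eqn (14))."""
    r1, r2 = rng.random(S.shape), rng.random(S.shape)
    V = w * V + c1 * r1 * (pbest - S) + c2 * r2 * (gbest - S)  # Eqn (12)
    V = np.clip(V, -v_max, v_max)                              # enforce V_max
    return S + V, V                                            # Eqn (14)
```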
(3) Termination stage.
After a continuous search of the function space, the best global
solution is gbest. The search ends when either of the following
conditions is met:
--the required accuracy (NLML) is reached;
--the maximum number of iterations Iter_max is reached.
Otherwise, the search continues.
3. Prediction of project success using EGPIM
The EGPIM proposed herein adopts a proactive approach that utilizes
time series data to predict the outcome of a single ongoing project at
different stages of completion, given as percentages. The
implementation process follows Roy's (2009) methods, as shown in
Fig. 2.
3.1. The implementation process
This seven-step process is divided into two parts: the first
comprises steps 1 through 6, and the second is step 7, which applies
the EGPIM to make predictions of project success. Each step proceeds as
follows:
(1) Assign project type as the project parameter.
Fifty-four historical projects with diverse data characteristics from
the CAPP system database were used for this study. The "process"
project type was chosen as the project parameter in order to gain a
more complete understanding of the factors that influence projects:
this project type covers about 64% of the project data in the CAPP
database and has the best factors identified by CAPP for predictive
ability.
(2) Identify influencing factors.
This study adopted the CAPP software's recommendation that the
variable level of significance be set below 0.10. This significance
level represents the statistical difference between project outcomes
and factors considered to have predictive ability for project success.
The CAPP software analysed 76 factors from the project data set, and 11
factors were identified as significant (as shown in Table 1).
[FIGURE 2 OMITTED]
(3) Data normalization.
Based on data analysis, CAPP normalized the project data from 0-100
percent completion into 30 reporting periods. It also identified actual
owner expenditure as the factor with the greatest impact on predicting
project outcome. In line with our study objectives, owner expenditure
was therefore chosen as the factor to normalize for all process
projects. Corresponding with the 30 reporting periods, the normalized
owner expenditure data provided the basic data to generate the s-curve
graphs.
(4) Choose the project with the most complete data.
A proactive approach was used in this study to predict the outcome
of a single ongoing project. To distinguish this project from the other
process projects in the database, only one project was chosen as the
"assessment project". The study required the chosen project to have
complete data for all 11 of the time-dependent success factors
identified by CAPP. Of the 34 process projects, Project 233 fulfilled
these requirements.
(5) Generate the average s-curves based on the factors to gain
optimal predictive ability.
There are four project outcome categories in the CAPP system, namely
"successful", "on time or on budget", "less than successful", and
"disastrous". All project outcomes were recorded in the CAPP database
upon project completion; the outcomes of the projects examined in this
study are listed in Table 2. Average s-curves were then generated for
these project outcomes using the normalized data. Since the three
projects in the "disastrous" category did not have data on actual owner
expenditure, we were unable to plot an average s-curve for this
category. Four zones representing the project outcome ranges were then
created proportionally around those three average s-curve lines
(Fig. 3). As an example, zone 0.667 (on time or on budget) is bounded
by two limit lines (upper and lower). The lower limit line is drawn
from the averages of the actual owner expenditure percentages, taken
point by point between the average curve of all successful projects and
the average curve of all on-time or on-budget projects. The same
approach applies to the upper-limit line, as well as to the rest of the
limit lines. This zone apportionment may later be used to determine the
degree of project outcome when assessing ongoing projects at every
completion interval up to total project completion.
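The limit lines described above are point-by-point averages of adjacent
outcome-category s-curves. The sketch below illustrates this
construction; the curve arrays are hypothetical placeholders, and the
ordering of the curves (higher expenditure for worse outcomes) is our
assumption, not a statement from the paper:

```python
import numpy as np

periods = np.arange(1, 31)                       # 30 reporting periods

# Hypothetical average s-curves of actual owner expenditure (% of total),
# one per outcome category; the real curves come from the CAPP data.
successful = 100 * (periods / 30) ** 1.6
on_time_budget = 100 * (periods / 30) ** 1.3
less_than_success = 100 * (periods / 30) ** 1.0

# Zone 0.667 ("on time or on budget") limit lines: averages of adjacent curves
lower_limit = (successful + on_time_budget) / 2
upper_limit = (on_time_budget + less_than_success) / 2

def outcome_degree(period_idx, expenditure):
    """Map observed expenditure at one period to a zone degree (assumed ordering)."""
    if expenditure <= lower_limit[period_idx]:
        return 1.0          # successful zone
    if expenditure <= upper_limit[period_idx]:
        return 0.6667       # on time or on budget zone
    return 0.3333           # less-than-successful zone (or below)
```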
(6) Collect training and testing patterns.
Each of the 11 factors identified as significant by the CAPP software
was employed as an input pattern. Output data were derived from the
project outcome at every completion interval, tracked along the zone
path of the average s-curve graphs for Project 233. To replicate a
proactive approach, three different sets of training patterns were
collected at 50%, 67%, and 90% completion, with the two
completion-percentage increments adjacent to each training pattern set
used as testing data. As shown in Table 3, the testing data extracted
for the 50% completion training pattern were at 53% and 57% completion.
Similar arrangements were applied to the 67% and 90% completions.
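A small sketch of this train/test arrangement follows (the array layout
is assumed to follow Table 3, with one row per reporting period):

```python
import numpy as np

def split_patterns(completion_pct, patterns, cutoff, n_test=2):
    """Training set: all periods up to `cutoff` % completion; testing set:
    the next two reporting increments (e.g. 53% and 57% for a 50% cutoff)."""
    train_mask = completion_pct <= cutoff
    test_idx = np.flatnonzero(~train_mask)[:n_test]
    return patterns[train_mask], patterns[test_idx]

# Example with the period grid of Table 3
pct = np.array([0, 3, 7, 10, 37, 40, 43, 47, 50, 53, 57])
rows = np.arange(len(pct))          # stand-in for the 12-column pattern rows
train, test = split_patterns(pct, rows, cutoff=50)
print(train, test)                  # test rows correspond to 53% and 57%
```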
[FIGURE 3 OMITTED]
(7) Search for predictive solution and comparison.
The proposed AI system, EGPIM, was applied to predict the project
outcome based on the identified factors in the three different learning
sets (i.e. 50%, 67%, and 90% completion). The performance of the
proposed system was evaluated using the RMSE and an average error
percentage, as sketched below.
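Both measures are standard; the short sketch below shows their
computation, using the error-percentage definition from the note to
Table 5 and, as a check, the EGPIM test predictions at 50% completion:

```python
import numpy as np

def rmse(predicted, desired):
    """Root mean squared error over the test points."""
    return float(np.sqrt(np.mean((predicted - desired) ** 2)))

def avg_error_pct(predicted, desired):
    """Average of |Predicted - Desire| x 100% over the test points."""
    return float(np.mean(np.abs(predicted - desired)) * 100.0)

# EGPIM predictions at 53% and 57% completion (Table 5); desired output 0.6667
pred = np.array([0.6562, 0.6531])
desire = np.array([0.6667, 0.6667])
print(rmse(pred, desire), avg_error_pct(pred, desire))  # ~0.0121 and ~1.21
```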
3.2. Results
In order to highlight the potential and effectiveness of the
proposed system, EGPIM was compared against the Evolutionary Fuzzy
Support Vector Machine Inference Model (EFSIM), support vector machines
(SVMs), and the original Gaussian process (GP). In this study, the SVM
used the parameter settings suggested by Hsu and Lin (2002), and the GP
hyper-parameters were established by conjugate gradients to find good
settings. Table 4 shows the average RMSEs achieved by EGPIM, EFSIM,
SVM, and GP. The accuracy obtained by EGPIM was significantly better
than that obtained by either SVM or GP. Although EFSIM obtained
slightly better results at the 50% and 67% completion stages, EGPIM
earned significantly better results than EFSIM at the 90% completion
stage. Table 5 shows the detailed error percentages for the three
completion stages.
Conclusion
This paper presented an implementation of the EGPIM to predict a
project outcome path and to determine the likely project outcome based
on identified time-dependent factors. CII's proprietary CAPP software
and database were employed to extract the time-dependent factors
identified as significantly associated with predicting a project's
outcome.
This study used historical case studies to examine EGPIM's
ability to predict a project's outcome. The results showed that
EGPIM has an excellent predictive capability. EGPIM's performance
was also demonstrated to be better than both SVMs and the GP in
practical applications.
These results highlight the model's suitability for construction
projects, as well as its potential benefits to project managers. Since
decisions must be made on many events throughout a construction
project, project managers can use our model to compile the data and use
its predictions as a reference to help them make such important and
complex decisions.
This model holds great potential as a predictive tool when used
proactively to assess project outcome, giving project managers a better
chance to take actions necessary to ensure projects are accomplished
successfully.
Acknowledgement
The authors would like to thank the Construction Industry Institute
for their kind permission to use, analyse, extract, and publish data
from their CAPP database.
References
Azadnia, A.; Zahraie, B. 2010. Application of multi-objective
particle swarm optimization in operation management of reservoirs with
sedimentation problems, in Proceedings, Providence, RI, May 16-20,
2010, 2260-2268.
Bonilla, E. V.; Chai, K. M. A.; Williams, C. K. I. 2009. Multitask
Gaussian process prediction, in 21st Annual Conference on Neural
Information Processing Systems, NIPS 2007, December 3-6, 2007,
Vancouver, BC, Canada.
Brahim-Belhouari, S.; Bermak, A. 2004. Gaussian process for
nonstationary time series prediction, Computational Statistics and Data
Analysis 47(4): 705-712. http://dx.doi.org/10.1016/j.csda.2004.02.006
Chamberlain, G.; Imbens, G. W. 2003. Nonparametric applications of
Bayesian inference, Journal of Business & Economic Statistics 21(1):
12-18. http://dx.doi.org/10.1198/073500102288618711
Chan, A. P. C.; Scott, D.; Chan, A. P. L. 2004. Factors affecting
the success of a construction project, Journal of Construction
Engineering and Management 130(1): 153-155.
http://dx.doi.org/10.1061/(ASCE)0733-9364(2004)130:1(153)
Cheng, M.-Y.; Wu, Y.-W.; Wu, C.-F. 2010. Project success prediction
using an evolutionary support vector machine inference model, Automation
in Construction 19(3): 302-307.
http://dx.doi.org/10.1016/j.autcon.2009.12.003
Chu, W.; Ghahramani, Z. 2005. Gaussian processes for ordinal
regression, Journal of Machine Learning Research 6: 1019-1041.
Clerc, M.; Kennedy, J. 2002. The particle swarm--explosion,
stability, and convergence in a multidimensional complex space,
IEEE Transactions on Evolutionary Computation 6(1): 58-73.
Der Kiureghian, A. 2008. Analysis of structural reliability under
parameter uncertainties, Probabilistic Engineering Mechanics 23(4):
351-358. http://dx.doi.org/10.1016/j.probengmech.2007.10.011
Griffith, A. F.; Gibson, G. E. Jr; Hamilton, M. R.; Tortora, A. L.;
Wilson, C. T. 1999. Project success index for capital facility
construction projects, Journal of Performance of Constructed Facilities
13(1): 39-45. http://dx.doi.org/10.1061/(ASCE)0887-3828(1999)13:1(39)
Hsu, C. W.; Lin, C. J. 2002. A simple decomposition method for
support vector machines, Machine Learning 46(1-3): 291-314.
http://dx.doi.org/10.1023/A:1012427100071
Kennedy, J.; Eberhart, R. 1995. Particle swarm optimization, in
Proceedings of the IEEE International Conference on Neural Networks,
Perth, Australia, 1942-1948.
Khosravi, S.; Afshari, H. 2011. A success measurement model for
construction projects, in International Conference on Financial
Management and Economics Singapore, IACSIT Press.
Ko, Ch.-H.; Cheng, M.-Y. 2007. Dynamic prediction of project
success using artificial intelligence, Journal of Construction
Engineering and Management 133(4): 316-324.
http://dx.doi.org/10.1061/(ASCE)0733-9364(2007)133:4(316)
Kocijan, J.; Murray-Smith, R.; Rasmussen, C. E.; Girard, A. 2004.
Gaussian process model based predictive control, in Proceedings of the
2004 American Control Conference (ACC), June 30-July 2, 2004, Boston,
MA, United States, 2214-2219.
Li, T.; Fu, Q.; Meng, F. 2010. Research on partial least-squares
regression model based on particle swarm optimization and its
application, in 2nd International Workshop on Intelligent Systems and
Applications (ISA), May 22-23, 2010, Wuhan, 1-4.
http://dx.doi.org/10.1109/IWISA.2010.5473428
Maes, M. A. 2006. Exchangeable condition states and Bayesian
reliability updating, keynote address, in Proceedings, 13th IFIP WG7.5
Working Conference on Reliability and Optimization of Structural
Systems, October, Kobe, Japan, Taylor and Francis, 27-42.
Mahdavi Adeli, M.; Deylami, A.; Banazadeh, M.; Alinia, M. M. 2011.
A Bayesian approach to construction of probabilistic seismic demand
models for steel moment-resisting frames, Scientia Iranica 18(4A):
885-894.
Markvardsen, A. J. 2004. Bayesian probability theory applied to the
space group problem in powder diffraction, in AIP Conference Proceedings
735(1): 219-226. http://dx.doi.org/10.1063/1.1835216
Money, E. S.; Reckhow, K. H.; Wiesner, M. R. 2012. The use of
Bayesian networks for nanoparticle risk forecasting: model formulation
and baseline evaluation, Science of the Total Environment 426: 436-445.
http://dx.doi.org/10.1016/j.scitotenv.2012.03.064
Perelman, L.; Ostfeld, A. 2012. Bayesian networks for estimating
contaminant source and propagation in a water distribution system using
cluster structure, in Proceedings of the 12th International Conference,
Water Distribution System Analysis 2010, 426-435.
Roy, A. F. V. 2009. Evolutionary fuzzy decision model for
construction management using weighted support vector machine: PhD
Thesis, Department of Construction Engineering, National Taiwan
University of Science and Technology.
Russell, J. S.; Jaselskis, E. J.; Lawrence, S. P. 1997. Continuous
assessment of project performance, Journal of Construction Engineering
and Management 123(1): 64-71.
http://dx.doi.org/10.1061/(ASCE)0733-9364(1997)123:1(64)
Sanvido, V.; Grobler, F.; Parfitt, K.; Guvenis, M.; Coyle, M. 1992.
Critical success factors for construction projects, Journal of
Construction Engineering and Management 118(1): 94-111.
http://dx.doi.org/10.1061/(ASCE)0733-9364(1992)118:1(94)
Seng, K. N. K. 2008. Non-linear dynamics identification using
Gaussian process prior models within a Bayesian context: PhD Thesis,
Department of Electronic Engineering, National University of Ireland
Maynooth.
Su, G. S.; Xiao, Y. L. 2011. Gaussian process method for slope
reliability analysis, Yantu Gongcheng Xuebao/Chinese Journal of
Geotechnical Engineering 33(6): 916-920.
Yan, K.-Z.; Zhang, Z. 2011. Research in analysis of asphalt
pavement performance evaluation based on PSO-SVM, in 2011 International
Conference on Civil Engineering and Transportation, ICCET 2011, October
14-16, 2011, Jinan, China, 203-207.
Yan, Z.; Su, G.; Yan, L. 2011. Classification of surrounding rocks
in tunnel based on Gaussian process machine learning, in 2011
International Conference on Electric Technology and Civil Engineering
(ICETCE), April 22-24, 2011, Lushan, 3971-3974.
Zhao, F.; Zhang, Q.; Yang, Y. 2006. A scheduling holon modeling
method with Petri net and its optimization with a novel PSO-GA
algorithm, in 2006 10th International Conference on Computer Supported
Cooperative Work in Design, CSCWD 2006, May 3-5, 2006, Nanjing, China,
1302-1307.
Min-Yuan CHENG, Chin-Chi HUANG, Andreas Franskie Van ROY
The National Taiwan University of Science and Technology, Taipei,
Taiwan, R.O.C.
Received 16 Mar 2012; accepted 01 Jun 2012
Corresponding author: Andreas Franskie Van Roy
E-mail: [email protected]
Min-Yuan CHENG. Professor of Construction Engineering at the
National Taiwan University of Science and Technology, Taiwan. His
research interests include geographic information systems, construction
automation, management information systems, and construction management
process reengineering.
Chin-Chi HUANG. PhD candidate in Construction Engineering at the
National Taiwan University of Science and Technology, Taiwan. Her
research interests include construction management, project management
and decision analysis.
Andreas Franskie Van ROY. Professor of Civil Engineering at
Parahyangan Catholic University, Indonesia. His research interests
include construction automation, management information systems, and
applications of artificial intelligence.
Table 1. Description of 11 time-dependent factors with levels of
significance

No   Factors                                      Column I.D. in CAPP   Significance level
1    Actual design % complete                     C5_16                 0.01
2    Actual owner expenditure                     C3_10                 0.01
3    Invoiced construction costs                  C2_14                 0.02
4    Designer planned effort hours                C2_13                 0.01
5    Actual invoices for material and equipment   C3_28                 0.01
6    Paid construction costs                      C3_14                 0.01
7    Cost of owner project commitments            C2_24                 0.01
8    Recordable incident rate (by period)         C2_38                 0.01
9    Cost of change orders                        C2_17                 0.02
10   Quantity of change orders                    C3_17                 0.01
11   Actual overtime work                         C3_41                 0.02
Table 2. The four quantitative values associated with project
outcomes
Degree of project outcome Value
Successful 1
On time or on budget 0.6667
Less-than-successful 0.3333
Disastrous 0
Table 3. Example learning and testing data applied to 50% completion
data for Project 233 (11 time-dependent variables and 1 output)

              %            Quantitative       Input patterns
              Completion   project status
                           per period         1       2       3       4       5       6       7       8       9       10      11
                           (Output)           C5_16   C3_10   C2_14   C2_13   C3_28   C3_14   C2_24   C2_38   C2_17   C3_17   C3_41
Training set  0            0                  0       0       0       0       0       0       0       0       0       0       0
              3            0                  0.0714  0.0579  0       0.0824  0       0       0       0       0       0       0
              7            0                  0.1224  0.0831  0       0.1418  0       0       0       0       0       0       0
              10           0                  0.1837  0.0864  0       0.2172  0       0       0       0       0       0       0
              ...          ...                ...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...
              37           0.6667             0.8061  0.2872  0.0526  0.8396  0.0478  0.0345  0       0       0.2977  0.339   0
              40           0.3333             0.9082  0.353   0.0897  0.8846  0.0954  0.0345  0       0       0.3065  0.3559  0
              43           0.6667             0.949   0.3923  0.1629  0.9224  0.2108  0.0897  0       0       0.312   0.3729  0
              47           0.6667             0.9694  0.4293  0.1943  0.9843  0.2119  0.0897  0       0       0.7582  0.7458  0
              50           0.6667             0.9745  0.4598  0.2358  0.9879  0.2564  0.1263  0       0.5     0.7582  0.7458  0
Testing set   53           0.6667             0.9796  0.4903  0.2772  0.9915  0.3008  0.1629  0       1       0.7582  0.7458  0
              57           0.6667             0.9898  0.5593  0.3991  0.9957  0.4388  0.1943  0       1       0.7711  0.7627  0

Notes: Quantitative project statuses are assigned based on the four
project outcome categories: successful (1), on-time or on-budget
(0.6667), less-than-successful (0.3333), and disastrous (0).
Table 4. RMSE and average error percentage comparisons between EGPIM,
EFSIM, SVM, and GP

50% completion        1 (EGPIM)   2 (EFSIM)   3 (SVM)   4 (GP)
RMSE                  0.0121      0.0081      0.1083    0.0687
Average error (%)     1.21        0.8         10.81     6.84
C parameter           --          31          1         --
g parameter           --          0.0109      0.0909    --
Hyper-parameters
  sigma_f             0.5778      --          --        0.987
  r_1                 1.3205      --          --        0.3126
  r_2                 2.5328      --          --        0.7454
  r_3                 1.6658      --          --        0.372
  r_4                 2.8168      --          --        1.0099
  r_5                 2.7209      --          --        0.708
  r_6                 4.0550      --          --        0.1338
  r_7                 1.8488      --          --        0.4473
  r_8                 3.6237      --          --        1.0095
  r_9                 1.7727      --          --        0.7697
  r_10                0.4882      --          --        0.1729
  r_11                1.5168      --          --        0.6029
  sigma_n             2.3069      --          --        2.1071

67% completion        1 (EGPIM)   2 (EFSIM)   3 (SVM)   4 (GP)
RMSE                  0.0047      0.0039      0.3874    0.1402
Average error (%)     6.56        0.3         38.74     13.83
C parameter           --          31          1         --
g parameter           --          0.5739      0.0909    --
Hyper-parameters
  sigma_f             2.2979      --          --        1.1025
  r_1                 2.2110      --          --        0.8124
  r_2                 1.8518      --          --        1.4358
  r_3                 2.1463      --          --        1.0954
  r_4                 1.3067      --          --        1.5725
  r_5                 1.8320      --          --        0.9787
  r_6                 1.0766      --          --        0.2114
  r_7                 0.6040      --          --        0.2721
  r_8                 5.3495      --          --        2.769
  r_9                 1.9695      --          --        0.7211
  r_10                3.4762      --          --        0.2961
  r_11                2.2946      --          --        0.9199
  sigma_n             1.1164      --          --        2.4139

90% completion        1 (EGPIM)   2 (EFSIM)   3 (SVM)   4 (GP)
RMSE                  1.86E-09    0.0001      0.016     0.0229
Average error (%)     1.87E-07    0.01        1.56      2.29
C parameter           --          31          1         --
g parameter           --          0.0734      0.0909    --
Hyper-parameters
  sigma_f             1.8150      --          --        0.8386
  r_1                 1.6023      --          --        0.8848
  r_2                 7.0094      --          --        0.484
  r_3                 6.2906      --          --        0.8107
  r_4                 1.3197      --          --        0.3407
  r_5                 8.0565      --          --        0.9121
  r_6                 3.7408      --          --        0.5418
  r_7                 2.2483      --          --        0.0299
  r_8                 1.5403      --          --        1.7214
  r_9                 3.0087      --          --        0.258
  r_10                6.4939      --          --        0.3811
  r_11                0.9605      --          --        0.9836
  sigma_n             11.5585     --          --        2.26

Notes: 1. EGPIM; 2. EFSIM (quadratic time function); 3. SVM; 4. GP.
Table 5. Detailed error percentage comparisons for 50%, 67%, and 90%
completion

%            Predicted %                Predicted value                         Error percentage *
Completion   Completion   Desire        1          2        3        4          1          2      3       4
50%          53%          0.6667        0.6562     0.6595   0.5522   0.6055     1.05       0.72   11.45   6.12
             57%          0.6667        0.6531     0.65773  0.5650   0.5911     1.36       0.90   10.17   7.56
             Average error %                                                    1.21       0.81   10.81   6.84
67%          70%          0             0.0654     0.0034   0.3938   0.1149     6.54       0.34   39.38   11.49
             73%          0             0.0658     0.0043   0.3809   0.1616     6.58       0.43   38.09   16.16
             Average error %                                                    6.56       0.39   38.74   13.83
90%          93%          0             1.81E-09   0.0001   0.0189   0.0218     1.81E-07   0.01   1.89    2.18
             97%          0             1.92E-09   0.0001   0.0123   0.024      1.92E-07   0.01   1.23    2.4
             Average error %                                                    1.87E-07   0.01   1.56    2.29

* Note: Error Percentage = |Predicted - Desire| x 100%. Methods:
1. EGPIM; 2. EFSIM with quadratic time function; 3. SVM; 4. GP.