Abstract: We propose a multi-layer data mining architecture that uses word embedding and clustering techniques to improve the web service discovery process. The proposed architecture consists of five layers: web services description and data preprocessing; word embedding and representation; syntactic similarity; semantic similarity; and clustering. In the first layer, we identify the steps needed to parse and preprocess web services documents. In the second layer, Bag of Words with Term Frequency–Inverse Document Frequency and three word-embedding models are employed to represent web services. In the third layer, four distance measures, namely Cosine, Euclidean, Minkowski, and Word Mover's Distance, are considered for finding similarities between web services documents. In the fourth layer, WordNet and Normalized Google Distance are employed to represent web services documents and measure the similarity between them. Finally, in the fifth layer, three clustering algorithms, namely affinity propagation, K-means, and hierarchical agglomerative clustering, are investigated for clustering web services based on the observed similarities between documents. We demonstrate how each component of the five layers is employed in web services clustering using randomly selected web services documents. We then conduct an experimental analysis that clusters web services from a collected dataset of web services documents and evaluates the clustering performance. Using a ground truth for evaluation, we observe that clusters built on the word embedding models performed better than those built on the Bag of Words with Term Frequency–Inverse Document Frequency model. Among the three word embedding models, the pre-trained Word2Vec skip-gram model achieved the highest performance in clustering web services. Among the three semantic similarity measures, path-based WordNet similarity achieved the highest clustering performance. Across the different word representation models and the syntactic and semantic similarity measures, we found that the affinity propagation clustering technique performed better than the other clustering techniques in discovering similarities among web services.
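
As a rough illustration of the representation and syntactic similarity layers described above, the following minimal sketch builds Bag of Words vectors weighted by Term Frequency–Inverse Document Frequency and compares them with cosine similarity. It is not the authors' implementation: the three service descriptions, the whitespace tokenizer, the smoothed IDF formula, and all function names are assumptions chosen for illustration only.

```python
import math
from collections import Counter

# Hypothetical toy service descriptions (not taken from the paper's dataset).
docs = {
    "weather_a": "get weather forecast by city name",
    "weather_b": "return weather forecast for a city",
    "payment": "process credit card payment transaction",
}


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, simplified preprocessing step)."""
    return text.lower().split()


def tfidf_vectors(corpus):
    """Map each document name to a TF-IDF vector over the shared vocabulary."""
    tokenized = {name: tokenize(text) for name, text in corpus.items()}
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    vocab = sorted(df)
    # Smoothed IDF (one common variant, assumed here).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = {}
    for name, tokens in tokenized.items():
        tf = Counter(tokens)
        vectors[name] = [tf[t] / len(tokens) * idf[t] for t in vocab]
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


vectors = tfidf_vectors(docs)
sim_weather = cosine_similarity(vectors["weather_a"], vectors["weather_b"])
sim_mixed = cosine_similarity(vectors["weather_a"], vectors["payment"])
print(f"weather_a vs weather_b: {sim_weather:.3f}")
print(f"weather_a vs payment:   {sim_mixed:.3f}")
```

In this toy setting the two weather descriptions share several terms and therefore score higher than the weather and payment pair, which share none. A pairwise similarity matrix built this way is the kind of input that a clustering algorithm such as affinity propagation could then consume.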