Basic Article Information

Title: Multi-Layer Web Services Discovery Using Word Embedding and Clustering Techniques
Authors: Waeal J. Obidallah; Bijan Raahemi; Waleed Rashideh et al.
Journal: Data
Print ISSN: 2306-5729
Publication year: 2022
Volume: 7
Issue: 5
Pages: 1-21
DOI: 10.3390/data7050057
Language: English
Publisher: MDPI Publishing
Abstract: We propose a multi-layer data mining architecture for web services discovery using word embedding and clustering techniques to improve the web service discovery process. The proposed architecture consists of five layers: web services description and data preprocessing; word embedding and representation; syntactic similarity; semantic similarity; and clustering. In the first layer, we identify the steps to parse and preprocess the web services documents. In the second layer, Bag of Words with Term Frequency–Inverse Document Frequency and three word-embedding models are employed for web services representation. In the third layer, four distance measures, namely, Cosine, Euclidean, Minkowski, and Word Mover, are considered to find the similarities between web services documents. In layer four, WordNet and Normalized Google Distance are employed to represent and find the similarity between web services documents. Finally, in the fifth layer, three clustering algorithms, namely, affinity propagation, K-means, and hierarchical agglomerative clustering, are investigated for clustering of web services based on observed similarities in documents. We demonstrate how each component of the five layers is employed in web services clustering using randomly selected web services documents. We conduct experimental analysis to cluster web services using a collected dataset consisting of web services documents and evaluate their clustering performances. Using a ground truth for evaluation purposes, we observe that clusters built based on the word embedding models performed better than those built using the Bag of Words with Term Frequency–Inverse Document Frequency model. Among the three word embedding models, the pre-trained Word2Vec’s skip-gram model reported higher performance in clustering web services. Among the three semantic similarity measures, path-based WordNet similarity reported higher clustering performance. By considering the different word representation models and syntactic and semantic similarity measures, we found that the affinity propagation clustering technique performed better in discovering similarities among web services.
Keywords: web services clustering; web services discovery; word embedding; clustering; semantic similarity; syntactic similarity
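
The abstract outlines a pipeline of representation, similarity, and clustering. As a rough illustration only, and not the authors' implementation, the sketch below chains Bag of Words with TF-IDF weighting, cosine similarity, and hierarchical agglomerative clustering in plain Python. The four service descriptions, every function name, the smoothed IDF formula, and the choice of average linkage are assumptions made for this example; the record does not specify any of them.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace, keeping alphabetic tokens only."""
    return [tok for tok in text.lower().split() if tok.isalpha()]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an assumed variant): log((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()}
            for toks in tokenized]


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def agglomerative(vectors, k):
    """Merge the two most similar clusters (average linkage) until k remain."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None  # (average similarity, index i, index j)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sims = [cosine_similarity(vectors[a], vectors[b])
                        for a in clusters[i] for b in clusters[j]]
                avg = sum(sims) / len(sims)
                if best is None or avg > best[0]:
                    best = (avg, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical web service descriptions, invented for illustration.
docs = [
    "get current weather forecast by city",
    "weather forecast service for city temperature",
    "convert currency exchange rate",
    "currency exchange rate lookup service",
]

vectors = tfidf_vectors(docs)
clusters = agglomerative(vectors, k=2)
print(clusters)
```

On this toy input the two weather descriptions and the two currency descriptions end up in separate clusters. Replacing `tfidf_vectors` with averaged word-embedding vectors, or `agglomerative` with affinity propagation or K-means, would correspond to the other combinations the abstract compares.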