Article Information

  • Title: Multi-Layer Web Services Discovery Using Word Embedding and Clustering Techniques
  • Authors: Waeal J. Obidallah; Bijan Raahemi; Waleed Rashideh
  • Journal: Data
  • Print ISSN: 2306-5729
  • Publication year: 2022
  • Volume: 7
  • Issue: 5
  • Pages: 1-21
  • DOI: 10.3390/data7050057
  • Language: English
  • Publisher: MDPI Publishing
  • Abstract: We propose a multi-layer data mining architecture for web services discovery using word embedding and clustering techniques to improve the web service discovery process. The proposed architecture consists of five layers: web services description and data preprocessing; word embedding and representation; syntactic similarity; semantic similarity; and clustering. In the first layer, we identify the steps to parse and preprocess the web services documents. In the second layer, Bag of Words with Term Frequency–Inverse Document Frequency and three word-embedding models are employed for web services representation. In the third layer, four distance measures, namely, Cosine, Euclidean, Minkowski, and Word Mover, are considered to find the similarities between web services documents. In layer four, WordNet and Normalized Google Distance are employed to represent and find the similarity between web services documents. Finally, in the fifth layer, three clustering algorithms, namely, affinity propagation, K-means, and hierarchical agglomerative clustering, are investigated for clustering of web services based on observed similarities in documents. We demonstrate how each component of the five layers is employed in web services clustering using randomly selected web services documents. We conduct experimental analysis to cluster web services using a collected dataset consisting of web services documents and evaluate their clustering performances. Using a ground truth for evaluation purposes, we observe that clusters built based on the word embedding models performed better than those built using the Bag of Words with Term Frequency–Inverse Document Frequency model. Among the three word embedding models, the pre-trained Word2Vec skip-gram model reported higher performance in clustering web services. Among the three semantic similarity measures, path-based WordNet similarity reported higher clustering performance. By considering the different word representation models and syntactic and semantic similarity measures, we found that the affinity propagation clustering technique performed better in discovering similarities among web services.
  • Keywords: web services clustering; web services discovery; word embedding; clustering; semantic similarity; syntactic similarity
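
As a rough illustration of how three of the layers named in the abstract could fit together (Bag of Words with TF-IDF for representation, cosine similarity for the syntactic layer, and hierarchical agglomerative clustering for the final layer), here is a minimal standard-library sketch. It is not the authors' implementation: the function names, the smoothed IDF formula, the average-linkage rule, and the four sample service descriptions are all assumptions made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical preprocessing: lowercase and keep alphabetic tokens only.
    return [tok for tok in text.lower().split() if tok.isalpha()]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF dict per document (smoothed IDF, an assumption)."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({
            term: (count / len(toks))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def agglomerative(vectors, k):
    """Merge the most similar pair of clusters (average linkage) until k remain."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        total = sum(cosine(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        _, i, j = max(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)
    return clusters


# Made-up service descriptions, standing in for parsed web services documents.
docs = [
    "get current weather forecast for a city",
    "weather forecast service by city name",
    "convert currency amount using exchange rate",
    "currency exchange rate conversion service",
]
clusters = agglomerative(tfidf_vectors(docs), k=2)
print(clusters)
```

Each inner list holds the indices of the documents grouped together. Swapping in an embedding model for `tfidf_vectors`, or affinity propagation for `agglomerative`, would correspond to the other options the abstract compares.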