A Comparative Study for String Metrics and the Feasibility of Joining them as Combined Text Similarity Measures

Abdul-Jabbar, Safa and George, Loay (2017) A Comparative Study for String Metrics and the Feasibility of Joining them as Combined Text Similarity Measures. ARO-The Scientific Journal of Koya University, 5 (2). pp. 6-18. ISSN 24109355

ARO.10180-VOL5.No2.2017.ISSUE09-PP6-18.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial Share Alike.

Abstract

This paper introduces an optimized Damerau–Levenshtein and Dice coefficient measure using enumeration operations (ODADNEN), which provides a fast string similarity measure while maintaining the accuracy of the results. Searching for specific words within a large text is a hard task that takes considerable time and effort, and string similarity measures play a critical role in many search problems. In this paper, different experiments were conducted to handle some spelling mistakes, and an enhanced algorithm for string similarity assessment is proposed. This algorithm combines a set of well-known algorithms with some improvements (e.g., the Dice coefficient was modified to deal with numbers instead of characters under certain conditions). These algorithms were adopted after a number of experimental tests were conducted to check their suitability. The ODADNEN algorithm was tested using real data, and its performance was compared with that of the original similarity measure. The results indicated that the most convincing measure is the proposed hybrid measure, which uses the Damerau–Levenshtein and Dice distances based on the n-grams of each word; it also requires less processing time than the standard algorithms. Furthermore, it provides efficient results for assessing the similarity between two words without the need to restrict the word length.
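
For readers unfamiliar with the two building blocks named in the abstract, the following is a minimal Python sketch of the textbook versions: the restricted Damerau–Levenshtein distance (optimal string alignment) and the Dice coefficient over character bigrams. It is not the ODADNEN algorithm; the numeric encoding and optimizations described in the paper are not reproduced here. The function names and the equal-weight average in `combined_similarity` are assumptions made purely for illustration.

```python
from collections import Counter


def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def dice_coefficient(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient over character n-grams, counted as multisets."""
    grams_a = Counter(a[i:i + n] for i in range(len(a) - n + 1))
    grams_b = Counter(b[i:i + n] for i in range(len(b) - n + 1))
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        # Both strings are shorter than n: fall back to exact comparison.
        return 1.0 if a == b else 0.0
    overlap = sum((grams_a & grams_b).values())
    return 2.0 * overlap / total


def combined_similarity(a: str, b: str, weight: float = 0.5) -> float:
    """Hypothetical hybrid score: weighted mean of the two similarities.

    The weighting scheme is an illustrative assumption, not the paper's.
    """
    longest = max(len(a), len(b))
    edit_sim = 1.0 if longest == 0 else 1.0 - osa_distance(a, b) / longest
    return weight * edit_sim + (1.0 - weight) * dice_coefficient(a, b, 2)
```

A call such as `combined_similarity("recieve", "receive")` treats the swapped "ie" as a single transposition in the edit-distance term, while the bigram term penalizes the three bigrams that differ between the two spellings.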

Item Type: Article
Uncontrolled Keywords: Word classification, Word clustering, String distance, String matching operation, String similarity metric
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QA Mathematics > QA76 Computer software
Divisions: Aro-The Scientific Journal of Koya University > VOL 5, NO 2 (2017)
Depositing User: Dr Salah Ismaeel Yahya
Date Deposited: 23 Oct 2017 20:17
Last Modified: 17 Apr 2018 08:01
URI: http://eprints.koyauniversity.org/id/eprint/111
