Web Page Ranking Based on Text Content and Link Information Using Data Mining Techniques

Naamha, Esraa Q. and Abdulmunim, Matheel E. (2024) Web Page Ranking Based on Text Content and Link Information Using Data Mining Techniques. ARO-THE SCIENTIFIC JOURNAL OF KOYA UNIVERSITY, 12 (1). pp. 29-40. ISSN 2410-9355

Text (Research Article)
ARO.11397.VOL12.NO1.2024.ISSUE22-PP29-40.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial Share Alike.

Official URL: http://dx.doi.org/10.14500/aro.11397

Abstract

Thanks to the rapid expansion of the Internet, anyone can now access a vast array of information online. However, as the volume of web content continues to grow exponentially, search engines face challenges in delivering relevant results. Early search engines relied primarily on the words or phrases found within web pages to index and rank them. While this approach had its merits, it often produced irrelevant or inaccurate results. To address this issue, more advanced search engines began incorporating the hyperlink structure of web pages to help determine their relevance. Although this method improved retrieval accuracy to some extent, it still had limitations because it did not consider the actual content of web pages. The objective of this work is to enhance Web Information Retrieval methods by leveraging three key components: text content analysis, link analysis, and log file analysis. By integrating insights from these data sources, the goal is to rank the relevant web pages in the retrieved document set more accurately and effectively, ultimately improving the user experience and delivering more precise search results. The proposed system was tested with both multi-word and single-word queries, and the results were evaluated using relative recall, precision, and F-measure. When compared to Google’s PageRank algorithm, the proposed system demonstrated superior performance, achieving an 81% mean average precision, 56% average relative recall, and a 66% F-measure.
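As a reading aid, the sketch below shows how the three reported figures relate to one another. It assumes the F-measure is the balanced F1 score, that is, the harmonic mean of precision and recall; the abstract does not say which variant the authors used or whether they averaged it per query, so this is only a consistency check on the headline numbers. The function name `f_measure` and the variable names are illustrative and do not come from the article.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the article's F-measure is the standard balanced F1.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract, expressed as fractions.
reported_precision = 0.81  # mean average precision
reported_recall = 0.56     # average relative recall

score = f_measure(reported_precision, reported_recall)
print(f"F-measure: {score:.2f}")  # prints "F-measure: 0.66"
```

Under that assumption, the reported 81% precision and 56% relative recall give an F-measure of about 66%, which agrees with the value stated in the abstract.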

Item Type: Article
Uncontrolled Keywords: Information retrieval, JSON API, Programmable (CSE), Search engine, World Wide Web, Web page ranking
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: ARO-The Scientific Journal of Koya University > VOL 12, NO 1 (2024)
Depositing User: Dr Salah Ismaeel Yahya
Date Deposited: 02 Sep 2024 06:57
Last Modified: 02 Sep 2024 06:57
URI: http://eprints.koyauniversity.org/id/eprint/467
