Time Series-Based Spoof Speech Detection Using Long Short-Term Memory and Bidirectional Long Short-Term Memory

Mirza, Arsalan R. and Al-Talabani, Abdulbasit K. (2024) Time Series-Based Spoof Speech Detection Using Long Short-Term Memory and Bidirectional Long Short-Term Memory. ARO-THE SCIENTIFIC JOURNAL OF KOYA UNIVERSITY, 12 (2). pp. 119-129. ISSN 2410-9355

Text (Research Article)
ARO.11636.VOL12.NO2.2024.ISSUE-PP119-129.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial Share Alike.

Official URL: http://dx.doi.org/10.14500/aro.11636

Abstract

Detecting spoofed speech is crucial to the reliability of voice-based authentication systems. Traditional methods often struggle because they cannot capture complex patterns that unfold over time. This study introduces a deep learning approach that uses Long Short-Term Memory (LSTM) and Bidirectional LSTM (BiLSTM) models to identify spoofed speech from its temporal characteristics. The models learn these patterns directly from features extracted from the speech signal: Mel-frequency cepstral coefficients (MFCC), Constant Q cepstral coefficients (CQCC), and feature sets from the open-source Speech and Music Interpretation by Large-space Extraction (OpenSMILE) toolkit. The models are evaluated on the ASVspoof 2019 Logical Access dataset using the minimum tandem detection cost function (min-tDCF), Equal Error Rate (EER), Recall, Precision, and F1-score. The results show that LSTM and BiLSTM models significantly enhance the reliability of spoof speech detection systems.
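The Equal Error Rate named among the metrics is the operating point at which the rate of spoofed utterances accepted as genuine equals the rate of genuine utterances rejected. The following is a minimal sketch of that idea using a simple threshold sweep in plain Python. It is not the authors' evaluation code. The function name `equal_error_rate`, the convention that a higher score means "more likely bona fide", and the example scores are assumptions made for illustration; official challenge tooling may locate the crossing point differently (for example, by interpolation).

```python
from typing import Sequence, Tuple


def equal_error_rate(
    bona_fide_scores: Sequence[float],
    spoof_scores: Sequence[float],
) -> Tuple[float, float]:
    """Approximate the EER by sweeping every observed score as a threshold.

    Assumption: a higher score means the detector considers the utterance
    more likely to be bona fide, and a trial is accepted when score >= threshold.
    Returns (eer, threshold) at the point where the two error rates are closest.
    """
    if not bona_fide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    thresholds = sorted(set(bona_fide_scores) | set(spoof_scores))
    best_gap = float("inf")
    best_eer = 1.0
    best_threshold = thresholds[0]

    for threshold in thresholds:
        # False acceptance: spoofed trials that pass the threshold.
        far = sum(s >= threshold for s in spoof_scores) / len(spoof_scores)
        # False rejection: bona fide trials that fall below the threshold.
        frr = sum(s < threshold for s in bona_fide_scores) / len(bona_fide_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap = gap
            best_eer = (far + frr) / 2
            best_threshold = threshold

    return best_eer, best_threshold


# Hypothetical detector scores, invented purely to exercise the function.
example_bona_fide = [0.9, 0.8, 0.4, 0.7]
example_spoof = [0.1, 0.2, 0.6, 0.3]
example_eer, example_threshold = equal_error_rate(example_bona_fide, example_spoof)
```

With these invented scores, one bona fide trial and one spoofed trial fall on the wrong side of the chosen threshold, so both error rates are 1 in 4 and the sketch reports an EER of 0.25.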

Item Type:	Article
Uncontrolled Keywords:	Bidirectional Long Short-Term Memory, Constant Q cepstral coefficients, Countermeasure Spoofing, Long Short-Term Memory, Mel-frequency cepstral coefficients, Open-source speech and music interpretation by large-space extraction
Subjects:	Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions:	ARO-The Scientific Journal of Koya University > VOL 12, NO 2 (2024)
Depositing User:	Dr Salah Ismaeel Yahya
Date Deposited:	07 May 2025 08:33
Last Modified:	07 May 2025 08:33
URI:	http://eprints.koyauniversity.org/id/eprint/507
