A Comparative Study of Deep Learning Approaches for Arabic Language Processing
Mahmoud Mohamed, Khaled Alosman

Abstract— Arabic poses significant challenges for natural language processing (NLP) due to its complex morphology, dialectal variation, and limited annotated resources. Although deep learning methods have achieved state-of-the-art results on many NLP tasks, comprehensive comparative studies for Arabic remain scarce. This paper addresses this gap by systematically evaluating three prominent deep learning architectures, namely Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), and Transformers, across five essential Arabic NLP tasks: i) sentiment analysis, ii) named entity recognition, iii) machine translation, iv) text classification, and v) dialect identification. We compare the performance of models trained from scratch with fine-tuned versions of AraBERT, a powerful Transformer-based model pre-trained on a large Arabic corpus. Our experiments use Arabic datasets already established in the literature and employ accuracy, F1-score, and BLEU as evaluation metrics. The results demonstrate the superiority of Transformer-based models, with AraBERT achieving the highest scores on every task. Notably, AraBERT attains 95.2% accuracy on sentiment analysis, surpassing both the RNN and CNN models. Similar improvements appear in the remaining tasks, including a 3-point BLEU gain in machine translation and a 2.3% F1-score gain in dialect identification. This comprehensive assessment highlights the strengths and weaknesses of each deep learning architecture for Arabic NLP. The strong performance of AraBERT further demonstrates how transfer learning, combining Transformer architectures with large-scale pre-training, can significantly advance Arabic language technology.


DOI: https://doi.org/10.5455/jjee.204-1711016538