Arabic Sentiment Analysis using Apache Spark


Article type :

Original article

Author :

Mohamed A. Ahmed

Volume :

4

Issue :

1

Abstract :

People express their feelings and emotions on social media platforms, including Twitter. Twitter now hosts a huge volume of Arabic user-generated content, and this content is growing rapidly. The Arabic language has its own characteristics, syntax, grammar and morphological rules. In this research, we investigate the development of a big data system that analyzes the sentiment expressed in Arabic content. Such a system could be useful for many monitoring, marketing, recommendation and decision support applications. We use machine learning to achieve this goal. In particular, we incorporate and compare the supervised Naïve Bayes and Logistic Regression algorithms as the main machine learning engines that process Arabic-language tweets and classify each new tweet as positive, negative or neutral. Before applying these algorithms, we run a pre-processing phase that converts the input text and emojis into numerical features suitable for them. We have developed a customized pre-processing pipeline that includes several Arabic NLP (ANLP) preparation steps and that suits the Arabic language and the contents of real-life tweets. In addition, both algorithms have several hyper-parameters that can affect their performance and accuracy. We therefore use cross-validation to evaluate candidate hyper-parameter combinations and select the one that yields the best classification accuracy. Experiments with the designed system show promising results. We present experimental results that report the accuracy of the system on real-life Arabic tweet data in terms of F1-score, weighted precision and weighted recall.
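The three evaluation metrics named in the abstract can be illustrated with a small, library-free sketch. It assumes the usual definition of the weighted variants, in which each class's precision, recall and F1 is weighted by that class's share of the true labels. The function name `weighted_metrics` and the six example labels are hypothetical and are not taken from the article's data or code.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return (weighted precision, weighted recall, weighted F1).

    Each per-class score is weighted by the fraction of true labels
    that belong to that class (an assumed, commonly used definition).
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    support = Counter(y_true)      # number of true examples per class
    predicted = Counter(y_pred)    # number of predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = len(y_true)

    w_precision = w_recall = w_f1 = 0.0
    for label, n in support.items():
        # Precision is 0 when the class was never predicted.
        precision = correct[label] / predicted[label] if predicted[label] else 0.0
        recall = correct[label] / n
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0

        weight = n / total
        w_precision += weight * precision
        w_recall += weight * recall
        w_f1 += weight * f1

    return w_precision, w_recall, w_f1


# Made-up labels for six tweets, for illustration only.
y_true = ["positive", "positive", "negative", "neutral", "negative", "neutral"]
y_pred = ["positive", "negative", "negative", "neutral", "negative", "positive"]

precision, recall, f1 = weighted_metrics(y_true, y_pred)
print(f"weighted precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

Under this definition, weighted recall equals plain accuracy (here 4 correct out of 6), which is one reason the weighted precision and F1 figures are usually reported alongside it.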

Keywords :

Big Data, Machine Learning, Arabic NLP, Emojis, Cross-Validation