
Sentiment Analysis Projects & Topics For Beginners [2024]

Are you studying sentiment analysis and want to test your knowledge? If you are, then you’ve come to the right place. In this article, we’re discussing sentiment analysis project ideas with which you can test your knowledge and showcase your understanding.

We know how tricky it is to find great project ideas. We also know how beneficial it is to complete projects. With projects, you can strengthen your knowledge, enhance your portfolio, and bag better roles.

The following article will talk about some of the best Sentiment Analysis Python project ideas. It will also shed some light on the various types, importance, and applications of Sentiment analysis in today’s world. By the end of it, you’ll be encouraged to work on prominent sentiment analysis and capstone project ideas.


So without further ado, let’s get started.

What is Sentiment Analysis?

Sentiment analysis is a kind of data mining in which you measure the inclination of people’s opinions by using NLP (natural language processing), text analysis, and computational linguistics. We perform sentiment analysis mostly on public reviews, social media platforms, and similar sites. The following are the main types of sentiment analysis:

Fine-grained

Fine-grained sentiment analysis gives a more precise picture of public opinion on a subject. It classifies its results into several categories, such as Very Negative, Negative, Neutral, Positive, and Very Positive.
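As a rough illustration of the five categories, the sketch below maps a polarity score to a label. It assumes that some earlier step has already produced a score between -1.0 and 1.0; the function name fine_grained_label and the cut-off values are hypothetical choices, not a standard.

```python
def fine_grained_label(score):
    """Map a polarity score in [-1.0, 1.0] to one of five classes.

    The thresholds are illustrative assumptions, not standard values.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must be between -1.0 and 1.0")
    if score <= -0.6:
        return "Very Negative"
    if score <= -0.2:
        return "Negative"
    if score < 0.2:
        return "Neutral"
    if score < 0.6:
        return "Positive"
    return "Very Positive"


print(fine_grained_label(0.75))
```

In practice you would tune the thresholds on labeled examples rather than fix them by hand.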

Detecting Emotion

This kind of sentiment analysis identifies emotions such as anger, happiness, sadness, and others. You’ll often use lexicons (lists of words tagged with the emotions they express) to recognize emotions. However, lexicons have drawbacks too, and in those cases, you’d need to use ML algorithms.

Based on Aspect

In aspect-based sentiment analysis, you look at the specific aspect of the thing people are talking about. Suppose you have reviews of a smartphone: you might want to see what people are saying about its battery life or its screen size.
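The smartphone example can be sketched in a few lines. This is a minimal illustration, assuming tiny hand-made keyword lists (ASPECTS, POSITIVE, and NEGATIVE are all hypothetical) and a simple rule that an opinion word belongs to the aspect mentioned in the same clause.

```python
import re

# Hypothetical aspect keywords and opinion words, for illustration only.
ASPECTS = {
    "battery": {"battery", "charge", "charging"},
    "screen": {"screen", "display"},
}
POSITIVE = {"great", "bright", "long", "excellent"}
NEGATIVE = {"poor", "dim", "short", "terrible"}


def aspect_sentiment(review):
    """Return {aspect: net score} using the clause each aspect appears in."""
    scores = {}
    # Split on punctuation and on "but", which often separates two opinions.
    for clause in re.split(r"[.!?;,]|\bbut\b", review.lower()):
        tokens = set(re.findall(r"[a-z]+", clause))
        net = len(tokens & POSITIVE) - len(tokens & NEGATIVE)
        for aspect, keywords in ASPECTS.items():
            if tokens & keywords:
                scores[aspect] = scores.get(aspect, 0) + net
    return scores


print(aspect_sentiment("The screen is bright and great, but the battery life is terrible."))
```

A positive number means the aspect was praised and a negative number means it was criticized. Real reviews need better clause splitting and a much larger vocabulary.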

Multilingual

Sometimes organizations need to analyze text in different languages. This form of sentiment analysis is considerably challenging and requires a lot of effort because you’d need many resources, such as lexicons and labeled data for each language.

Sentiment analysis has many applications in various industries. As it helps in understanding public opinion, companies use sentiment analysis in doing market research and figuring out if their customers like a particular product (or service) or not. Then, according to the findings of the sentiment analysis, the organization can modify the respective product or service and achieve better results.


All in all, it helps companies in understanding their customers better. Companies can serve their customers better when they know where they lag and where they excel.

Why Is Sentiment Analysis Required?

Before we delve into the various sentiment analysis project ideas, such as the Twitter sentiment analysis project or sentiment analysis of IMDb reviews, let’s take a look at some of the reasons why sentiment analysis is important.

In this technology-driven world, most of the data that we come across is unstructured. Whether it is in the form of emails, texts, or documents, this data needs to be properly structured before it can be analyzed further. This is where sentiment analysis comes into play. It turns free-form text into labeled opinions that can be stored in an efficient and cost-friendly manner, and it can also help you address certain issues in real time.

Various Approaches Used In Sentiment Analysis

Broadly, there are three main approaches to sentiment analysis:

Rule-based Approach- Compared with the other approaches, the rule-based approach is quite easy to comprehend. It counts the number of positive and negative words present in the text. If there are more positive words than negative words, the sentiment is positive, and vice versa.
Automatic Approach- In this approach, a machine learning model is first trained on a labeled data set and then used to predict the sentiment of new text. Before training, features (such as individual words) are extracted from the text. The model can be built with various techniques, some of which include Linear Regression, Support Vector Machines, and Naive Bayes, among others.
Hybrid Approach- As the name suggests, this approach is an amalgamation of the rule-based approach and the automatic approach. It often delivers more accurate results than either approach on its own.
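To make the rule-based approach concrete, here is a minimal sketch of the word-counting idea. The two word lists are tiny hypothetical lexicons, and returning "neutral" on a tie is an assumption added for the example.

```python
import re

# Hypothetical mini-lexicons for illustration; a real project would use a
# published sentiment lexicon instead.
POSITIVE_WORDS = {"good", "great", "love", "excellent", "happy"}
NEGATIVE_WORDS = {"bad", "poor", "hate", "terrible", "slow"}


def rule_based_sentiment(text):
    """Label text by comparing counts of positive and negative words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positives = sum(token in POSITIVE_WORDS for token in tokens)
    negatives = sum(token in NEGATIVE_WORDS for token in tokens)
    if positives > negatives:
        return "positive"
    if negatives > positives:
        return "negative"
    return "neutral"  # assumption: equal counts are treated as neutral


print(rule_based_sentiment("Great screen, but the battery is bad and charging is slow"))
```

The example sentence has one positive word and two negative words, so it is labeled negative. Note that plain counting cannot handle negation ("not good") or sarcasm, which is one reason the automatic and hybrid approaches exist.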

Applications of Sentiment Analysis

There is a wide range of applications for sentiment analysis. Some of them have been discussed in the following list.

Social Media- The comments on popular social media sites such as Instagram, Facebook, and Twitter are analyzed and then categorized into different segments, such as positive, negative, and neutral.
Customer Service- One good example is the review section of the Google Play Store, where reviews rated from 1 to 5 can be sorted and analyzed with the help of the various sentiment analysis approaches.
Marketing Sector- The marketing industry has benefited a lot from sentiment analysis. It helps brand owners understand the reviews of a product or service, and whether consumers consider it good or bad.

In the following points, we’ve discussed some prominent sentiment analysis project ideas. Pick one according to your interests and expertise:

Sentiment Analysis Project Ideas

The following are our sentiment analysis projects. Our list has projects for all skill levels so that you can choose comfortably:

1. Analyze Amazon Product Reviews

Amazon is one of the biggest e-commerce stores on the planet, which means it also has one of the largest product selections available. Companies often want to understand the public opinion on their product and figure out what’s responsible for it. For that purpose, they perform sentiment analysis on their product reviews.

It helps them in recognizing the primary issues with their products (if there are any). Some products have thousands of reviews on Amazon while some others only have a few hundred.

It is one of the most popular sentiment analysis projects because the demand for such expertise is very high. Companies want experts to analyze their product reviews for market research.

You can get the dataset for this project here: Amazon Product Reviews Dataset.

Working on this project will make you familiar with many aspects of sentiment analysis. If you’re a beginner, you can start with a small product and analyze reviews of the same. On the other hand, if you’re looking for a challenge, you can take a popular product and analyze its reviews.

2. Rotten Tomatoes and Their Reviews

Rotten Tomatoes is a review site where you’ll find an aggregate of critics’ opinions on movies and shows. You can find reviews of nearly every movie, TV series, or drama there. It’s also a great place to get data from.

You can perform sentiment analysis on the reviews present on this site as a part of your sentiment analysis projects. The entertainment sector takes critic reviews very seriously. By analyzing critic reviews, a production company can understand why its particular title succeeded (or failed). Critic reviews influence the commercial success of a title considerably as well.

With sentiment analysis, you can figure out what’s the general opinion of critics on a particular movie or show. This project is an excellent way for you to figure out how sentiment analysis can help entertainment companies such as Netflix.

You can get the dataset for this project here: Rotten Tomatoes dataset.

3. Twitter Sentiment Analysis

Twitter is a great place for performing sentiment analysis. You can get public opinion on any topic through this platform. This is one of the intermediate-level sentiment analysis project ideas. You should have some experience in performing opinion mining (another name for sentiment analysis) before you work on this task. As it’s a popular project idea, we’ve discussed it in a little more detail:

Prerequisites

You should have a basic knowledge of programming. You should be familiar with either Python or R (it’d be great if you’re familiar with both). However, it’s not necessary to have expert-level knowledge of programming. Apart from programming, you should also know how to split datasets and work with a RESTful API because you’ll have to use the Twitter API here. You should also be familiar with the Naive Bayes Classifier as we’ll be using it to classify our data later in the project.

This project isn’t easy, and it’ll take a little time (downloading data from Twitter takes hours).

Working on the Project

First, you’ll need to get authorized credentials from Twitter to use the Twitter API. It takes some time to authorize a Twitter Developer Account, but once you have it, you can go to your dashboard and ‘Create an app’.

After you have the necessary credentials, you can create the function and build a test set. Twitter limits the number of requests one can make through its API, a limit it added for security reasons. The ceiling is 180 requests in 15 minutes. You can keep the test set at 100 tweets.
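One way to stay under a ceiling like 180 requests in 15 minutes is to pace your own calls. The sketch below is a generic sliding-window limiter, not part of any Twitter library; the class name RateLimiter and its parameters are hypothetical, and the actual API request is left out because it depends on the client you use.

```python
import time


class RateLimiter:
    """Allow at most max_calls within any sliding window of `period` seconds."""

    def __init__(self, max_calls=180, period=15 * 60,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock      # injectable so the limiter can be tested
        self.sleep = sleep
        self.calls = []         # timestamps of recent calls

    def wait(self):
        """Block until another request is allowed, then record it."""
        now = self.clock()
        # Forget calls that have already left the window.
        self.calls = [t for t in self.calls if now - t < self.period]
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest recorded call leaves the window.
            self.sleep(self.period - (now - self.calls[0]))
            now = self.clock()
            self.calls = [t for t in self.calls if now - t < self.period]
        self.calls.append(now)
```

You would create one limiter and call limiter.wait() immediately before every request, so that the 181st call inside a window pauses instead of being rejected.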

After creating the test set, you’ll have to build the training set by using Twitter API, which is the hardest part of this project. Make sure that you save the tweets you gather from the API in a CSV file for future use.
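Saving the gathered tweets can be done with Python’s built-in csv module. This is a minimal sketch, assuming each tweet has already been reduced to a (text, label) pair; the column names and the helper names save_tweets and load_tweets are hypothetical.

```python
import csv


def save_tweets(rows, path):
    """Write (text, label) pairs to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["text", "label"])
        writer.writerows(rows)


def load_tweets(path):
    """Read the (text, label) pairs back as a list of tuples."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        return [(row["text"], row["label"]) for row in reader]
```

Opening the file with newline="" lets the csv module handle line endings itself, which matters because tweets can contain commas, quotes, and line breaks.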

After preparing the training set, you have to preprocess the tweets present in both datasets. Remember that the text-only classifier used here ignores images and other non-textual components, so they don’t affect the polarity it assigns, and emojis are typically stripped as well. To include pictures and other non-textual parts in your sentiment analysis, you’ll have to use Deep Learning. Make sure that you collapse repeated characters and correct typos in your data. Data cleaning is vital to get the best results possible.
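The cleaning step can be sketched with a handful of regular expressions. The function name clean_tweet and the exact rules are illustrative assumptions: they strip links, mentions, and non-letter symbols, and shorten runs of a repeated character, but they do not correct typos.

```python
import re


def clean_tweet(text):
    """Normalize a tweet for a text-only classifier."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # drop links
    text = re.sub(r"@\w+", " ", text)                    # drop user mentions
    text = re.sub(r"#(\w+)", r"\1", text)                # keep hashtag word, drop '#'
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)           # "soooo" becomes "soo"
    text = re.sub(r"[^a-z\s']", " ", text)               # drop digits, emojis, punctuation
    return re.sub(r"\s+", " ", text).strip()             # collapse whitespace


print(clean_tweet("@Uber the ride was soooo goood!!! https://t.co/abc #Happy 😀"))
```

The example prints "the ride was soo good happy". Runs of three or more identical characters are cut to two rather than one, so words such as "good" keep their legitimate double letters.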

After cleaning the data, you can use the Naive Bayes Classifier for analyzing the dataset available. Finally, you’ll have to test your model and see if it’s producing the desired results or not.
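For readers who want to see what the Naive Bayes Classifier does under the hood, here is a small from-scratch sketch of the multinomial variant with add-one smoothing. The class name, the whitespace tokenization, and the four training sentences are all assumptions made for the example; in a real project you would more likely use an existing library implementation.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.split())
        self.vocabulary = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # log P(label) + sum of log P(word | label)
            score = math.log(doc_count / total_docs)
            for word in text.split():
                if word in self.vocabulary:  # ignore words never seen in training
                    score += math.log(
                        (counts[word] + 1) / (total_words + len(self.vocabulary))
                    )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for illustration.
train_texts = ["love this ride", "great driver great price",
               "terrible service", "hate the long wait"]
train_labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayesClassifier().fit(train_texts, train_labels)
print(model.predict("great ride"))
```

To test the model, you would run predict on the held-out test set and compare the predicted labels with the true ones, for example by computing the share of correct predictions.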

This Twitter sentiment analysis project can help you gain practical experience in handling real-world data, applying sentiment analysis techniques, and interpreting results, making it a valuable project for those looking to enhance their skills in data science and natural language processing.

As you may have realized, this project will take some effort. But performing sentiment analysis on Twitter is a great way to test your knowledge of this subject. It’ll be a great addition to your portfolio (or CV) as well.


4. Reviews of Scientific Papers

If you’re interested in using knowledge of machine learning and data science for research purposes, then this project is perfect for you. You can perform sentiment analysis on reviews of scientific papers and understand what leading experts think about a particular topic. Such findings can guide your own research on that topic.

Here’s the dataset so you can get started on this project: Machine Learning Dataset. The dataset we’ve shared here has N = 405 instances and is stored in JSON format. Working on this project will make you familiar with the applications of machine learning in scientific research. The dataset has some reviews in Spanish and some in English.

5. Analyze IMDb Reviews

IMDb is an entertainment review website where people leave their opinions on different movies and shows. You can perform sentiment analysis on the reviews present there as well. Just like the Rotten Tomatoes project we discussed previously, this one will help you learn about the applications of data science and machine learning in the entertainment industry.

Reviews of shows and movies help production companies in understanding why their title failed (or succeeded).

The dataset for this project is quite old and small. But it’s an excellent way for a beginner to test their skills on a new dataset. Here’s a link to the dataset: IMDb reviews dataset.

6. Analyze a Company’s Reputation (News + Social Media)

You can pick a company you like and perform a detailed sentiment analysis on it. You can also choose a trending topic and cover it in your sentiment analysis for a more precise result. We can discuss the example of Uber here. They are one of the most prominent startups in the world and have a global customer base. You can perform a sentiment analysis to understand public opinion on this company.

To find the public opinion on Uber, we’ll first start by getting data from the relevant sources, which in this case are Uber’s Facebook page and Twitter page. By analyzing the conversations between the users there, we can figure out the overall brand perception in the market. You’ll need categories to separate different datasets. In this example, you can use Payment, Service, Cancel, Safety, and Price.
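A simple way to assign comments to the five categories is keyword matching. The sketch below assumes hand-picked keyword lists (CATEGORY_KEYWORDS is hypothetical and would need tuning on real data) and lets a comment belong to more than one category.

```python
import re

# Hypothetical keyword lists; tune them for the data you actually collect.
CATEGORY_KEYWORDS = {
    "Payment": {"payment", "card", "charged", "refund"},
    "Service": {"service", "driver", "support"},
    "Cancel": {"cancel", "cancelled", "cancellation"},
    "Safety": {"safe", "safety", "unsafe"},
    "Price": {"price", "fare", "expensive", "cheap"},
}


def categorize(comment):
    """Return every category whose keywords occur in the comment."""
    tokens = set(re.findall(r"[a-z]+", comment.lower()))
    return [name for name, keywords in CATEGORY_KEYWORDS.items() if tokens & keywords]


print(categorize("Driver cancelled and I was still charged"))
```

The example comment matches Payment, Service, and Cancel. Comments that match no category can be set aside or reviewed manually.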

Now that we know what we want to work on and where we have to go, we can get started.


Sentiment Analysis on Facebook

We’ll begin with their Facebook page. It has more than 30,000 comments, and after we performed the analysis under the categories we mentioned previously (Payment, Service, Cancel, Safety, and Price), we found that most of the positive comments were about Price. On the other hand, the category with the highest percentage of negative feedback was Service. However, while performing this analysis, we also kept in mind that Facebook comments are filled with spam, suggestions, news, and various other pieces of information.

For sentiment analysis, we only have to look at opinions.

So, we removed all the comments that were not opinions, and as expected, our results changed. Now, negative comments held a majority in all sections, and their ratio in the respective categories changed. In Price-related comments, the percentage of negative comments rose by 20%.

That’s why it’s essential to perform data cleaning. It helps you get accurate results.
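The effect of cleaning on the percentages can be shown with a toy calculation. The records below are invented for illustration, and the "kind" field (opinion, spam, or news) is a hypothetical label that you would have to assign yourself, manually or with a separate classifier.

```python
from collections import defaultdict


def negative_share(comments):
    """Return {category: percentage of comments labelled negative}."""
    totals, negatives = defaultdict(int), defaultdict(int)
    for comment in comments:
        totals[comment["category"]] += 1
        if comment["sentiment"] == "negative":
            negatives[comment["category"]] += 1
    return {cat: 100 * negatives[cat] / totals[cat] for cat in totals}


# Invented records; "kind" marks whether a comment is an opinion or noise.
comments = [
    {"category": "Price", "sentiment": "negative", "kind": "opinion"},
    {"category": "Price", "sentiment": "positive", "kind": "spam"},
    {"category": "Price", "sentiment": "positive", "kind": "news"},
    {"category": "Price", "sentiment": "positive", "kind": "opinion"},
]
before = negative_share(comments)
after = negative_share([c for c in comments if c["kind"] == "opinion"])
print(before, after)
```

In this toy data, the negative share for Price doubles from 25% to 50% once the spam and news items are dropped, which mirrors the kind of shift described above.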


Sentiment Analysis on Twitter

We’ve already discussed the sentiment analysis of tweets in this article. So we’ll follow a similar approach here and analyze people’s tweets where they tag Uber or reply to their tweets. Here, the category with the highest percentage of positive tweets was Payment, and the second-highest was Safety. This also shows how different social media give different results.

However, we would have to perform data cleaning here as well. For that purpose, we’ll remove tweets with unrelated intent (spam, news, marketing, etc.). You’d notice how much the percentage of different categories changes here too.

In our case, Payment saw a decline of 12% in its share of positive tweets, and Safety became the category with the highest percentage of positive responses, even though Safety itself lost around 2-4% of its share of positive tweets. With this data, you can also find out which topics are the most popular when people talk about Uber on these platforms.

So, on Twitter, we found that the most popular categories were Payment, Cancel, and Service.

You should know that brands take this data very seriously. It helps them figure out what problems they need to work on and how they can solve the same. These tweets are, after all, feedback of customers. In this case, Uber can use the findings of these tweets to understand which parts of its services have faults and how they can fix them.

Sentiment Analysis of News

To understand the public opinion on any organization, you’ll have to analyze the news about it as well. In our example, we’ll check the news articles about Uber. After we analyze the content present in those news articles, we’ll segregate our findings in the categories mentioned above (Payment, Service, Cancel, Safety, and Price).

Apart from that, we’ll also classify the articles according to their popularity. The more popular an article is, the more it’ll affect public opinion. You can measure the popularity of each article by the number of shares it has. An article with more shares would be considered more popular than one with fewer shares.
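One way to let popular articles count for more is a share-weighted average. This is a sketch under assumptions: each article already has a sentiment score between -1 and 1, and the field names "score" and "shares" are hypothetical.

```python
def weighted_sentiment(articles):
    """Average article scores in [-1, 1], weighting each by its share count."""
    total_shares = sum(article["shares"] for article in articles)
    if total_shares == 0:
        return 0.0  # assumption: no shares at all is treated as neutral
    return sum(a["score"] * a["shares"] for a in articles) / total_shares


# Invented values for illustration.
articles = [
    {"score": 0.8, "shares": 100},
    {"score": -0.5, "shares": 900},
]
print(weighted_sentiment(articles))
```

Here the widely shared negative article outweighs the less popular positive one, so the overall result is negative even though a plain average of the two scores would be positive.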


The Results

In our example, we looked at Uber and the public opinion on this company. After we’ve analyzed Facebook, Twitter, and news, we’d know whether the general sentiment on Uber is positive, negative, or neutral.

You can follow this approach to create sentiment analysis project ideas. You can start with a small company that doesn’t have a high online presence and perform sentiment analysis on multiple channels to understand if it’s perceived positively or negatively. If you want to increase the challenge, you can make it more complicated and perform the analysis for a major company (like we did in our example).

7. Hate Speech Detection Model

Apart from the Twitter sentiment analysis project topic, the hate speech detection model is yet another very interesting area to explore in sentiment analysis with Python. Hate speech refers to any kind of communication or language used against a person, or a group, based on their sexuality, race, color, and religion, among other factors. This includes all kinds of verbal, non-verbal, written, or behavioral communication. The main task of the hate speech detection model is to identify and classify hate speech in a given text. This can be achieved by training the model on labeled data, in the same way that models are trained to classify sentiments.

Along with these, you can further explore many sentiment analysis and capstone project ideas following your enrollment in relevant degree or certificate programs, such as the ones offered on upGrad.


8. Education Course Reviews

Analyzing sentiments in reviews for online courses or educational resources can be a valuable sentiment analysis project for final-year students. It involves delving into student perspectives on learning experiences. By collecting and processing diverse datasets from platforms like upGrad, this project can help young learners understand the concept better.

Most importantly, this is a sentiment analysis project using Machine Learning and NLP. Students can learn how to employ sentiment analysis techniques, such as Natural Language Processing and Machine Learning, to categorize reviews as positive, negative, or neutral. The analysis extends to identifying common themes and topics within the reviews, shedding light on aspects like engaging content or challenges with instructional clarity. A noteworthy aspect of the project involves exploring potential correlations between sentiment expressions and academic performance metrics, such as grades or completion rates.

The findings can contribute meaningful insights into the factors influencing student satisfaction and success in online education. Ethical considerations, including data privacy, are essential throughout the project, and the results can inform recommendations for improving the quality of online courses and educational resources.

All in all, this project on sentiment analysis supported by visualizations and comprehensive reporting provides a nuanced understanding of the student experience in virtual learning environments.

9. Social Media Influencer Impact

Analyzing sentiments in social media posts related to influencers involves studying the emotions and opinions expressed by users regarding specific influencers. This is a great example of a sentiment analysis NLP project that entails collecting a dataset of social media posts, comments, and mentions related to various influencers across social media platforms. NLP techniques are then applied to analyze the textual content, categorizing sentiments as positive, negative, or neutral.

The impact on followers is a key aspect of sentiment analysis projects like these. By correlating sentiment analysis results with engagement metrics, such as likes, shares, and comments, researchers can gauge how influencers affect their audience. It’s essential to explore the reasons behind sentiment shifts, such as the influencer’s content, behavior, or external factors.

This project on sentiment analysis has practical applications for marketing and brand management. Businesses and influencers can use the findings to understand their online presence, identify areas for improvement, and tailor content to better resonate with their audience.

10. Sports Match Analysis

This sentiment analysis project begins by collecting a dataset of social media content related to specific sports matches, encompassing platforms like Twitter, Facebook, or dedicated sports forums. Natural Language Processing (NLP) techniques are then applied to analyze the textual content, categorizing sentiments as positive, negative, or neutral.

The analysis extends to exploring fan reactions, team sentiments, and the overall sentiment landscape during different phases of a sports match. Researchers may investigate how events within a match, such as goals, controversial plays, or game-changing moments, influence public opinions. For instance, spikes in positive sentiment may occur when a team scores, while negative sentiments may arise in response to referee decisions or unfavorable outcomes.

Visualizations, such as sentiment trend graphs or heatmaps, can be used to illustrate the ebb and flow of sentiments over time. Learners working on this sentiment analysis project can gain enough knowledge and hands-on experience to contribute to social media strategies for sports marketing and fan engagement. They can even inform sports commentators and analysts about the impact of events on public sentiment.
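Before plotting a sentiment trend graph, you need the numbers behind it. The sketch below groups posts into fixed time buckets and averages their scores; the (match minute, score) input format and the function name sentiment_trend are assumptions made for the example.

```python
from collections import defaultdict


def sentiment_trend(posts, bucket_minutes=5):
    """Return [(bucket_start_minute, mean score)] sorted by time."""
    buckets = defaultdict(list)
    for minute, score in posts:
        start = (minute // bucket_minutes) * bucket_minutes
        buckets[start].append(score)
    return [(start, sum(scores) / len(scores))
            for start, scores in sorted(buckets.items())]


# (match minute, sentiment score) pairs, invented for illustration.
posts = [(1, 0.0), (3, 1.0), (7, -1.0), (8, -1.0), (12, 1.0)]
print(sentiment_trend(posts))
```

Each output pair can be plotted as one point on a line chart, so a goal or a disputed referee decision would show up as a spike or a dip in the corresponding bucket.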

11. Movie Trailer Reactions

Analyzing sentiments expressed in comments or social media posts related to movie trailers offers a captivating project that provides valuable insights into audience anticipation and excitement for upcoming films. This machine learning project involves collecting and preprocessing textual data from various platforms, such as YouTube or Twitter, where movie trailers are shared. By applying sentiment analysis techniques and potentially incorporating emotion analysis, the goal is to categorize audience reactions as positive, negative, or neutral.

Word clouds or sentiment distribution charts are the key visualizations of this project. They can then be used to present a comprehensive overview of audience sentiments. For a more advanced project, predictive modeling may be explored to estimate a movie’s potential success based on the sentiments expressed in the trailer reactions.

This project offers filmmakers and movie studios actionable insights into the effectiveness of their promotional campaigns, audience engagement levels, and trends within the film industry, making it a valuable endeavor for learners interested in the intersection of data analysis and the entertainment industry.

12. Financial News Sentiment Analysis

Analyzing sentiments in financial news articles and social media posts related to the stock market or specific companies constitutes a dynamic project with multifaceted components. The project involves collecting a diverse dataset encompassing financial news and social media content, followed by meticulous text preprocessing to ensure data quality.

Employing sentiment analysis techniques, such as natural language processing or machine learning models, allows the categorization of sentiments into positive, negative, or neutral, providing insights into the overall sentiment landscape surrounding particular companies or the stock market. The analysis extends to examining the impact of sentiments on market trends and investor behavior, exploring correlations between sentiment trends and stock price movements, and identifying influential events in the financial world.
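A first check for a relationship between sentiment and price movements is the Pearson correlation coefficient. The sketch below computes it by hand; the two series are invented values, and in a real project they would be aligned by date.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


daily_sentiment = [0.2, -0.1, 0.4, 0.0, -0.3]    # invented values
daily_return_pct = [0.5, -0.2, 0.9, 0.1, -0.6]   # invented values
print(round(pearson(daily_sentiment, daily_return_pct), 3))
```

A value near 1 or -1 suggests the two series move together or in opposite directions, but correlation alone does not show that sentiment causes the price change.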

Additionally, the project delves into the influence of social media sentiments on investor decisions and may even include building predictive models for estimating stock price movements based on sentiment analysis results.

Ultimately, this sentiment analysis project offers practical applications for investors, financial analysts, and companies, providing a nuanced understanding of how sentiments shape the complex landscape of financial markets.

Why Should Learners Take Up Sentiment Analysis Projects?

Taking up a project on sentiment analysis can be highly beneficial to both beginners and final-year students. These projects empower them with practical skills, industry relevance, and the ability to make a worthy impact, fostering a holistic learning experience. Here are a few reasons why you should give these projects a go:

Practical application of skills: A sentiment analysis project provides learners with a hands-on opportunity to apply theoretical concepts and skills learned in areas such as natural language processing, machine learning, and data analysis. Engaging in a real-world project allows learners to bridge the gap between theory and practical implementation.
Skill development: Working on a sentiment analysis project with source code can help you develop a diverse set of skills, including data collection, data preprocessing, machine learning model implementation, and result interpretation. These skills are highly transferable and applicable in various domains.
Understanding data context: Sentiment analysis projects often involve analyzing text data from diverse sources. For example, a sentiment analysis Python project can help students learn the nuances of language, context, and cultural variation present in real-world data. Understanding these complexities is crucial for accurate sentiment analysis.
Problem-solving and critical thinking: You will find numerous sentiment analysis projects with source code that will require you to formulate research questions, design methodologies, and make decisions on data preprocessing and model selection. Engaging in such projects enhances problem-solving skills and encourages critical thinking.
Portfolio building: Completing sentiment analysis projects in their final year can help advanced learners who are ready to step into their professional lives build a strong portfolio showcasing their practical skills. A portfolio is a valuable asset when applying for jobs or pursuing further education, as it demonstrates hands-on experience and the ability to work on real-world problems.
Industry relevance: Sentiment analysis is widely used across industries for customer feedback analysis, market research, and brand management. By working on a sentiment analysis Python project, NLP project, ML project, and the like, you can gain insights into industry-relevant applications of your skills, making you more attractive to potential employers.
Stay updated with technology: The field of sentiment analysis is dynamic, with ongoing advancements in techniques and tools. Engaging in a sentiment analysis NLP project, or any sentiment analysis project using Machine Learning or Python, can help learners keep abreast of the latest developments in the field and encourages a mindset of continuous learning.
Communication and presentation skills: Summarizing and presenting the findings of sentiment analysis projects require effective communication skills. Learners have the opportunity to practice articulating complex technical concepts in a clear and concise manner.
Personal interest and motivation: Sentiment analysis projects can be chosen based on personal interests, making the learning process more engaging and motivating. Learners are more likely to invest time and effort when working on projects that align with their passions.
Contribution to knowledge: Sentiment analysis projects can contribute to the broader understanding of sentiment patterns in various domains. Learners have the opportunity to make meaningful contributions to research and gain a sense of accomplishment.

Final Thoughts

Sentiment Analysis is an essential topic in machine learning. It has numerous applications in multiple fields. If you want to learn more about this topic, then you can head to our blog and find many new resources.

On the other hand, if you want a comprehensive and structured learning experience and are interested in learning more about machine learning, check out IIIT-B & upGrad’s Executive PG Programme in Machine Learning & AI, which is designed for working professionals and offers 450+ hours of rigorous training, 30+ case studies & assignments, IIIT-B Alumni status, 5+ practical hands-on capstone projects & job assistance with top firms.

Pavan Vadapalli


Frequently Asked Questions (FAQs)

Sentiment analysis is becoming a crucial tool for monitoring and understanding client sentiment as they share their opinions and emotions more openly than ever before. Brands can know what makes clients satisfied or frustrated by automatically evaluating customer feedback, such as comments in survey replies and social media dialogues. This allows them to customize products and services to match their customers' demands. For example, employing sentiment analysis to examine 4,000+ surveys about your business could help you figure out if customers like your pricing and customer service.

Even humans struggle to interpret sentiments reliably, making sentiment analysis one of the most difficult tasks in NLP. Every utterance is made at some moment in time, in some location, by and to some people, and so on. All statements are made in context. In irony and sarcasm, people convey negative attitudes using positive phrases, which can be difficult for machines to recognize without a detailed knowledge of the situation in which an emotion was expressed. Another difficulty worth tackling in sentiment analysis is how to handle comparisons. A further issue to overcome in order to undertake effective sentiment analysis is defining what we mean by neutral.

When working on a classification problem, it's critical to pick the test and training corpora wisely. Domain knowledge is needed to choose a set of features that is useful in the classification process. In most data science situations, running a classification method on a cleaned corpus rather than a noisy one is advised. Keywords that appear infrequently in the corpus do not usually play a role in text classification. These infrequent features can be removed, which often improves model performance. It's also generally a good idea to reduce terms to their simplest versions. Lemmatization is the name for this method.
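Removing infrequent terms can be sketched with a simple frequency count. The function name prune_rare_terms and the min_count threshold are hypothetical; lemmatization is not shown here because it normally relies on an NLP library.

```python
from collections import Counter


def prune_rare_terms(documents, min_count=2):
    """Drop tokens that occur fewer than min_count times across the corpus."""
    counts = Counter(token for doc in documents for token in doc.split())
    return [" ".join(t for t in doc.split() if counts[t] >= min_count)
            for doc in documents]


docs = ["good ride good price", "bad ride", "xyzzy price"]
print(prune_rare_terms(docs))
```

With min_count=2, the one-off tokens "bad" and "xyzzy" are dropped. The first of these shows the trade-off: on a corpus this small, pruning can also remove useful sentiment words, so the threshold should be chosen with the corpus size in mind.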


Subscribe to the PwC Newsletter

Sentiment Analysis

1334 papers with code • 39 benchmarks • 93 datasets

Sentiment Analysis is the task of classifying the polarity of a given text. For instance, a text-based tweet can be categorized into either "positive", "negative", or "neutral". Given the text and accompanying labels, a model can be trained to predict the correct sentiment.
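As a rough illustration of training a model from texts and labels, the sketch below implements a small multinomial Naive Bayes classifier with Laplace smoothing in plain Python. The training sentences, the whitespace tokenization, and the function names are assumptions made for this example; it is a baseline sketch, not the method behind any benchmark result.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples):
    """samples: list of (text, label) pairs. Returns the counts used for prediction."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        label_counts[label] += 1
        for tok in text.lower().split():
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab


def predict_sentiment(model, text):
    """Return the label with the highest log-probability for the text."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in text.lower().split():
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set.
train = [
    ("great phone love it", "positive"),
    ("love the great screen", "positive"),
    ("awful battery hate it", "negative"),
    ("hate the awful support", "negative"),
    ("the phone arrived today", "neutral"),
]
model = train_naive_bayes(train)
```

With this toy data, `predict_sentiment(model, "love this great camera")` picks the positive class because "love" and "great" occur only in positive training texts.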

Sentiment Analysis techniques can be categorized into machine learning approaches, lexicon-based approaches, and hybrid methods. Some subcategories of research in sentiment analysis include: multimodal sentiment analysis, aspect-based sentiment analysis, fine-grained opinion analysis, and language-specific sentiment analysis.
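For contrast with learned models, a lexicon-based approach can be sketched in a few lines. The word scores in `TOY_LEXICON` and the simple negation rule are assumptions chosen for illustration; published lexicons are far larger and handle many more cases.

```python
# Toy sentiment lexicon; the scores are illustrative assumptions.
TOY_LEXICON = {
    "good": 1,
    "great": 2,
    "love": 2,
    "bad": -1,
    "terrible": -2,
    "hate": -2,
}
NEGATORS = {"not", "never", "no"}


def lexicon_polarity(text):
    """Sum word scores, flipping the sign of the next scored word after a negator."""
    score = 0
    negate = False
    for raw in text.lower().split():
        tok = raw.strip(".,!?")
        if tok in NEGATORS:
            negate = True
            continue
        value = TOY_LEXICON.get(tok, 0)
        if value:
            score += -value if negate else value
            negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A hybrid method would typically feed such lexicon scores into a trained classifier as extra features rather than using the raw sum as the final answer.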

More recently, deep learning models such as RoBERTa and T5 have been used to train high-performing sentiment classifiers, which are evaluated using metrics like F1, recall, and precision. Benchmark datasets such as SST, GLUE, and IMDB movie reviews are used to evaluate sentiment analysis systems.
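The metrics named above can be computed directly from true and predicted labels. The following sketch computes precision, recall, and F1 for one class treated as the positive class; the label strings and the example predictions are made up for illustration.

```python
def precision_recall_f1(y_true, y_pred, positive_label):
    """Compute precision, recall, and F1 for a single class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive_label and p == positive_label)
    fp = sum(1 for t, p in pairs if t != positive_label and p == positive_label)
    fn = sum(1 for t, p in pairs if t == positive_label and p != positive_label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical gold labels and model predictions.
y_true = ["pos", "pos", "neg", "neg", "pos", "neu"]
y_pred = ["pos", "neg", "neg", "pos", "pos", "neu"]
p, r, f1 = precision_recall_f1(y_true, y_pred, "pos")
```

For multi-class sentiment tasks, these per-class values are usually averaged across classes (for example, macro-averaged) to get a single score.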

Benchmarks

Best Model
T5-11B
RoBERTa-large with LlamBERT
Heinsen Routing + RoBERTa Large
XLNet
VLAWE
XLNet
MA-BERT
AnglE-LLaMA-7B
BERT large
BERT large
InstructABSA
W2V2-L-LL60K (pipeline approach, uses LM)
BERTweet
UDALM: Unsupervised Domain Adaptation through Language Modeling
RoBERTa-large 355M + Entailment as Few-shot Learner
k-RoBERTa (parallel)
CalBERT
LSTMs+CNNs ensemble with multiple conv. ops
RobBERT v2
AEN-BERT
RuBERT-RuSentiment
xlmindic-base-uniscript
LSTMs+CNNs ensemble with multiple conv. ops
FiLM
Space-XLNet
fastText, h=10, bigram
CNN-LSTM
CNN-LSTM
Random
RoBERTa-wwm-ext-large
RoBERTa-wwm-ext-large
AraBERTv1
AraBERTv1
AraBERTv1
Naive Bayes
SVM
RCNN
lstm+bert
CalBERT

Most implemented papers

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers.

Convolutional Neural Networks for Sentence Classification

We report on a series of experiments with convolutional neural networks (CNN) trained on top of pre-trained word vectors for sentence-level classification tasks.

Universal Language Model Fine-tuning for Text Classification

Inductive transfer learning has greatly impacted computer vision, but existing approaches in NLP still require task-specific modifications and training from scratch.

Bag of Tricks for Efficient Text Classification

facebookresearch/fastText • EACL 2017

This paper explores a simple and efficient baseline for text classification.

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging.

A Structured Self-attentive Sentence Embedding

This paper proposes a new model for extracting an interpretable sentence embedding by introducing self-attention.

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP).

Deep contextualized word representations

We introduce a new type of deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy).

Well-Read Students Learn Better: On the Importance of Pre-training Compact Models

Recent developments in natural language representations have been accompanied by large and expensive models that leverage vast amounts of general-domain text through self-supervised pre-training.

Domain-Adversarial Training of Neural Networks

Our approach is directly inspired by the theory on domain adaptation suggesting that, for effective domain transfer to be achieved, predictions must be made based on features that cannot discriminate between the training (source) and test (target) domains.

Survey on sentiment analysis: evolution of research methods and topics

Published: 06 January 2023
Volume 56, pages 8469–8510 (2023)

Jingfeng Cui (ORCID: orcid.org/0000-0001-8306-0727),
Zhaoxia Wang (ORCID: orcid.org/0000-0001-7674-5488),
Seng-Beng Ho (ORCID: orcid.org/0000-0003-4839-1509) &
Erik Cambria (ORCID: orcid.org/0000-0002-3030-1280)

Sentiment analysis, one of the research hotspots in the natural language processing field, has attracted the attention of researchers, and research papers on the field are increasingly published. Many literature reviews on sentiment analysis involving techniques, methods, and applications have been produced using different survey methodologies and tools, but there has not been a survey dedicated to the evolution of research methods and topics of sentiment analysis. There have also been few survey works leveraging keyword co-occurrence on sentiment analysis. Therefore, this study presents a survey of sentiment analysis focusing on the evolution of research methods and topics. It incorporates keyword co-occurrence analysis with a community detection algorithm. This survey not only compares and analyzes the connections between research methods and topics over the past two decades but also uncovers the hotspots and trends over time, thus providing guidance for researchers. Furthermore, this paper presents broad practical insights into the methods and topics of sentiment analysis, while also identifying technical directions, limitations, and future work.

1 Introduction

Web 2.0 has driven the proliferation of user-generated content on the Internet. This content is closely related to the lives, emotions, and opinions of users. Therefore, analysis of this user-generated data is beneficial for monitoring public opinion and assisting in making decisions. Sentiment analysis, as one of the most popular applications of text-based analytics, can be used to mine people’s attitudes, emotions, appraisals, and opinions about issues, entities, topics, events, and products (Cambria et al. 2022a , b , c , d ; Injadat et al. 2016 ; Jiang et al. 2017 ; Liang et al. 2022 ; Oueslati et al. 2020 ; Piryani et al. 2017 ). Sentiment analysis can help us interpret emotions in unstructured texts as positive, negative, or neutral, and even calculate how strong or weak the emotions are. Today, sentiment analysis is widely used in various fields, such as business, finance, politics, education, and services. This analytical technique has gained broad acceptance not only among researchers but also among governments, institutions, and companies (Khatua et al. 2020 ; Liu et al. 2012 ; Sánchez-Rada and Iglesias 2019 ; Wang et al. 2020b ). It helps policy leaders, businessmen, and service people make better decisions.

The majority of user-generated content is unstructured text, which greatly increases the difficulty of sentiment analysis. Since 2000, researchers have been exploring techniques and methods to enhance the accuracy of such analysis. The popularity of social media platforms has brought people around the world closer together. With the continuous advancement of technology, the research topics, application fields, and core methods and technologies of sentiment analysis are also constantly changing.

Comparing and analyzing papers from specific disciplines can help researchers gain a comprehensive understanding of the field. There have been many surveys on sentiment analysis (Nair et al. 2019 ; Obiedat et al. 2021 ; Raghuvanshi and Patil 2016 ). However, there is a lack of adequate discussion on the connections between research methods and topics in the field, as well as on their evolution over time. In 1983, Callon et al. proposed co-word analysis (Callon et al. 1983 ). It can effectively reflect the correlation strength of information items in text data. Co-word analysis based on the frequency of co-occurrence of keywords used to describe papers can reveal the core contents of the research in specific fields. An evolutionary analysis of the associations between core contents is helpful for a comprehensive understanding of the research hotspots and frontiers in the field (Deng et al. 2021 ). It can provide guidance for researchers, especially those who are new to the field, and help them determine research directions, avoid repetitive research, and better discover and grasp the research trends in this field (Wang et al. 2012 ). To fill in the gap in existing research, we conduct keyword co-occurrence analysis and evolution analysis with informetric tools to explore the research hotspots and trends of sentiment analysis.
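As a minimal sketch of the co-word idea described above, the snippet below counts how often each pair of keywords appears together in the same paper. The keyword lists are invented for illustration, and the function name is an assumption; the counts it produces are the kind of edge weights a keyword co-occurrence network would use.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(keyword_lists):
    """Count, for every keyword pair, the number of papers containing both."""
    pairs = Counter()
    for keywords in keyword_lists:
        # Deduplicate and sort so each pair has one canonical ordering.
        unique = sorted(set(keywords))
        for a, b in combinations(unique, 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical keyword sets, one list per paper.
papers = [
    ["sentiment analysis", "deep learning", "twitter"],
    ["sentiment analysis", "twitter"],
    ["deep learning", "sentiment analysis", "lstm"],
]
counts = cooccurrence_counts(papers)
```

Pairs with high counts indicate keywords that are frequently studied together, which is the correlation strength that co-word analysis is meant to expose.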

The main contributions of this survey are as follows:

Using keyword co-occurrence analysis and informetric tools, the paper presents a survey on sentiment analysis and uncovers useful information about the field.

A keyword co-occurrence network is constructed by combining the paper title, abstract, and author keywords. Through the keyword co-occurrence network and community detection algorithm, the research methods and topics in the field of sentiment analysis, along with their evolution in the past two decades, are discussed.

The paper summarizes the research hotspots and trends in sentiment analysis. It also highlights practical implications and technical directions.

The remainder of this paper is organized as follows: In Sect. 2 , we summarize and analyze the existing surveys on sentiment analysis and present the research purpose and methodologies of this paper. Section 3 details the survey methodology, including the collection and processing of scientific publications, visualization, and analysis using different methods and tools. In Sect. 4 , we analyze the results obtained from the keyword co-occurrence analysis and evolution analysis, along with the research hotspots and trends in sentiment analysis identified through the analysis results. Finally, in Sect. 5 , we summarize the research conclusions as well as the practical implications and technical directions of sentiment analysis. We also clarify the limitations of this paper and make suggestions for future work.

2 Existing surveys on sentiment analysis

Sentiment analysis is a concept encompassing many tasks, such as sentiment extraction, sentiment classification, opinion summarization, review analysis, sarcasm detection, and emotion detection. Since the 2000s, sentiment analysis has become a popular research field in natural language processing (Hussein 2018). In the existing surveys, the researchers mainly conducted specific analyses of the tasks, technologies, methods, analysis granularity, and application fields involved in the sentiment analysis process.

2.1 Surveys on contents and topics of sentiment analysis

When research on sentiment analysis was still in its infancy, the contents and topics of surveys mainly focused on sentiment analysis tasks, analysis granularity, and application areas. Kumar et al. reviewed the basic terms, tasks, and levels of granularity related to sentiment analysis (Kumar and Sebastian 2012). They also discussed some key feature selection techniques and the applications of sentiment analysis in business, politics, recommender systems, and other fields. Nassirtoussi et al. explored the application of sentiment analysis in market prediction (Nassirtoussi et al. 2014). Medhat et al. analyzed the improvement of the algorithms proposed in 2010–2013 and their application fields (Medhat et al. 2014). Ravi et al. analyzed the papers related to opinion mining and sentiment analysis from 2002 to 2015. Their study mainly discussed the necessary tasks, methods, applications, and unsolved problems in the field of sentiment analysis (Ravi and Ravi 2015).

Existing surveys of the applications of sentiment analysis have focused more on the domains of market research, medicine, and social media in recent years. Rambocas et al. examined the application of sentiment analysis in marketing research from three main perspectives, including the unit of analysis, sampling design, and methods used in sentiment detection and statistical analysis (Rambocas and Pacheco 2018 ). Cheng et al. summarized techniques based on semantic, sentiment, and event extraction, as well as hybrid methods employed in stock forecasting (Cheng et al. 2022 ). Yue et al. categorized and compared a large number of techniques and approaches in the social media domain. That study also introduced different types of data and advanced research tools, and discussed their limitations (Yue et al. 2019 ). In the context of the COVID-19 epidemic, Alamoodi et al. reviewed and analyzed articles on the occurrence of different types of infectious diseases in the past 10 years. They reviewed the applications of sentiment analysis from the identified 28 articles, summarizing the adopted techniques such as dictionary-based models, machine learning models, and mixed models (Alamoodi et al. 2021b ); Alamoodi et al. also conducted a review of the applications of sentiment analysis for vaccine hesitancy (Alamoodi et al. 2021a ). Researchers also reviewed the application of sentiment analysis in the fields of election prediction (Brito et al. 2021 ), education (Kastrati et al. 2021 ; Zhou and Ye 2020 ) and service industries (Adak et al. 2022 ).

Quite a number of surveys have examined sentiment analysis in non-English languages, including Chinese (Peng et al. 2017), Arabic (Al-Ayyoub et al. 2019; Boudad et al. 2018; Nassif et al. 2021; Oueslati et al. 2020), Urdu (Khattak et al. 2021), Spanish (Angel et al. 2021), and Portuguese (Pereira 2021). They mainly reviewed the classification frameworks of the sentiment analysis process, supported language resources (dictionaries, natural language processing tools, corpora, ontologies, etc.), and deep learning models used (CNN, RNN, and transfer learning) for each of the languages involved.

2.2 Surveys on methods of sentiment analysis

Before machine learning technology became mature, researchers were particularly concerned about feature extraction methods. For example, Feldman summarized methods for extracting preferred entities from indirect opinions and methods for dictionary acquisition (Feldman 2013). Asghar et al. reviewed the natural language processing techniques for extracting features based on part of speech and term position; statistical techniques for extracting features based on word frequency and decision tree models; and techniques for combining part-of-speech tagging, syntactic feature analysis, and dictionaries (Asghar et al. 2014). Koto et al. discussed the best features for Twitter sentiment analysis prior to 2014 by comparing 9 feature sets (Koto and Adriani 2015). They found that the best features at the time for sentiment analysis of Twitter texts were AFINN (a list of English terms used for sentiment analysis, manually rated by Finn Årup Nielsen) (Nielsen 2011) and Senti-Strength (Thelwall et al. 2012). Taboada sorted out the characteristics of words, phrases, and sentence patterns in sentiment analysis from the perspective of linguistics (Taboada 2016). In addition, Schouten and Frasincar conducted a comprehensive and in-depth critical evaluation of 15 sentiment analysis web tools (Schouten and Frasincar 2015). Medhat et al. (2014) and Ravi et al. (Ravi and Ravi 2015) also analyzed the early algorithms for sentiment analysis.

In the study by Schouten et al., the authors focused on aspect-level sentiment analysis, reviewing the aspect-level techniques in use before 2014, such as frequency-based, syntax-based, supervised machine learning, unsupervised machine learning, and hybrid approaches. They concluded that the latest technology was moving beyond the early stages (Schouten and Frasincar 2015). As research into sentiment analysis became more popular and deep learning technologies made important progress, researchers started to pay more attention to the techniques and methods of sentiment analysis. Deep learning methods in particular became the focus of discussions among researchers.

Prabha et al. analyzed various deep learning methods used in different applications at the level of sentence and aspect/object sentiment analysis, including Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and Long Short-term Memory (LSTM) (Prabha and Srikanth 2019 ). They discussed the advantages and disadvantages of these methods and their performance parameters. Ain et al. introduced deep learning techniques such as Deep Neural Network (DNN), CNN and Deep Belief Network (DBN) to solve sentiment analysis tasks like sentiment classification, cross-lingual problems, and product review analysis (Ain et al. 2017 ). Zhang et al. investigated deep learning and machine learning techniques for sentiment analysis in the contexts of aspect extraction and categorization, opinion expression extraction, opinion holder extraction, sarcasm analysis, multimodal data, etc. (Zhang et al. 2018 ). Habimana et al. compared the performance of deep learning methods on specific datasets and proposed that performance could be improved using models including Bidirectional Encoder Representations from Transformers (BERT), sentiment-specific word embedding models, cognitive-based attention models, and commonsense knowledge (Habimana et al. 2020 ). Wang et al. reviewed and discussed existing analytical models for sentiment classification and proposed a computational emotion-sensing model (Wang et al. 2020b ).

Some researchers also discussed web tools (Zucco et al. 2020 ), fuzzy logic algorithms (Serrano-Guerrero et al. 2021 ), transformer models (Acheampong et al. 2021 ), and sequential transfer learning (Chan et al. 2022 ) for sentiment analysis.

2.3 Overall survey methodology

With the increase in the popularity of sentiment analysis research, more related research results began to accumulate. Researchers needed to systematically organize and analyze results from a large number of publications to perform literature reviews. They used different survey methodologies to conduct surveys of a large number of papers.

Content analysis is a powerful approach to characterizing the contents of each study by carefully reading its content and manually identifying, coding, and organizing key information in it. A literature review is formed as a result of the repeated use of this approach (Elo and Kyngäs 2008 ; Stemler 2000 ). Content analysis has been used for different studies and systematic reviews (Qazi et al. 2015 , 2017 ). For example, Birjali et al. have studied the most commonly used classification techniques in sentiment analysis from a large amount of literature and introduced the application areas and sentiment classification processes, including preprocessing and feature selection (Birjali et al. 2021 ). They conducted a comprehensive analysis of the papers, discovering that supervised machine learning algorithms are the most commonly used techniques in the field. A complete review of methods and evaluation for sentiment analysis tasks and their applications was conducted by Wankhade et al. ( 2022 ). They compared the strengths and weaknesses of the methods, and discussed the future challenges of sentiment analysis in terms of both the methods and the forms of the data. Although this method can review the research contents and penetrate into the cores of the papers most systematically, it requires a considerable amount of manpower and time for in-depth literature reading.

The systematic literature review guideline proposed by Kitchenham and Charters has gradually attracted the attention of researchers (Kitchenham 2004 ; Kitchenham and Charters 2007 ; Sarsam et al. 2020 ). This review process is divided into six stages: research question definition, search strategy formulation, inclusion and exclusion criteria definition, quality assessment, data extraction, and data synthesis. Researchers can eliminate a large number of retrieved papers by using this standard process and finally conducting further analysis and research on a small number of papers. Kumar et al. reviewed context-based sentiment analysis in social multimedia between 2006 and 2018. From the 573 papers retrieved in the initial search, they finally selected 37 papers to use in discussing sentiment analysis techniques (Kumar and Garg 2020 ). This approach was also used by Kumar et al. in their research on sentiment analysis on Twitter using soft computing techniques. They selected 60 articles out of 502 for follow-up analysis (Kumar and Jaiswal 2020 ). Zunic et al. selected 86 papers from 299 papers retrieved in the period 2011–2019 to discuss the application of sentiment analysis techniques in the field of health and well-being (Zunic et al. 2020 ); Ligthart et al. followed Kitchenham’s guideline and identified 14 secondary studies. They provided an overview of specific sentiment analysis tasks and of the features and methods required for different tasks (Ligthart et al. 2021 ). Obiedat (Obiedat et al. 2021 ), Angel (Angel et al. 2021 ) and Lin (Lin et al. 2022 ) also all followed this guideline to select literature for further analysis. This method can reduce the amount of literature that requires in-depth reading, but in the case of a large amount of literature, more effort is still required to search and screen the material than in traditional literature review methods (Kitchenham and Charters 2007 ).

There are also a few authors who have used informetric methods to review papers. Piryani et al. conducted an informetric analysis of research on opinion mining and sentiment analysis from 2000 to 2015 (Piryani et al. 2017 ). The authors used social network analysis, literature co-citation analysis, and other methods in the paper. They analyzed publication growth rates; the most productive countries, institutions, journals, and authors; and topic density maps and keyword bursts, among other elements. To a certain extent, they interpreted core authors, core papers, areas of research focus in this field, and the current state of national cooperation. In order to explore the application of sentiment analysis in building smart societies, Verma collected 353 papers published between 2010 and 2021 (Verma 2022 ). Using a topic analysis perspective combined with the Louvain algorithm, the author identified four sub-topics in the research field. Similarly, Mantyla et al. employed LDA techniques and manual classification to explore the topic structures of sentiment analysis articles (Mäntylä et al. 2018 ). The informetric methods use natural language processing technologies to intuitively conduct topic mining and analysis of a large number of papers. Through topic clustering, the literature is organized and analyzed, which reduces the time researchers spend on reading the literature in depth. These methods are suitable for exploring research topics and trends in the field.
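The Louvain algorithm mentioned above groups nodes by greedily searching for a partition with high modularity. The sketch below does not implement that search; it only evaluates the modularity of a given partition of an unweighted, undirected graph, using Q = Σ_c [L_c / m − (d_c / 2m)²], where m is the number of edges, L_c the number of edges inside community c, and d_c the sum of degrees in c. The toy keyword graph and community labels are assumptions for illustration.

```python
def modularity(edges, community_of):
    """Modularity of a partition of an unweighted, undirected graph.

    edges: list of (u, v) pairs, each edge listed once.
    community_of: dict mapping each node to its community label.
    """
    m = len(edges)
    internal = {}    # edges with both endpoints in the same community
    degree_sum = {}  # total degree of the nodes in each community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Hypothetical keyword graph: two triangles joined by a single edge.
edges = [
    ("cnn", "lstm"), ("lstm", "bert"), ("cnn", "bert"),
    ("twitter", "reviews"), ("reviews", "covid-19"), ("twitter", "covid-19"),
    ("bert", "twitter"),
]
methods_vs_topics = {
    "cnn": "methods", "lstm": "methods", "bert": "methods",
    "twitter": "topics", "reviews": "topics", "covid-19": "topics",
}
q_split = modularity(edges, methods_vs_topics)
q_single = modularity(edges, {node: "all" for node in methods_vs_topics})
```

Splitting the graph into its two triangles scores higher than putting every keyword in one community, which is the kind of improvement a modularity-based community detection method looks for.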

2.4 Summary of advantages and disadvantages of the existing surveys

In the following, we discuss the advantages and disadvantages of the existing surveys from a number of different points of view.

2.4.1 From the point of view of the contents and topics of sentiment analysis

As summarized in Table 1, the researchers organized the literature and conducted in-depth investigations of the contents and topics of sentiment analysis. They reviewed the tasks of sentiment analysis (e.g., different text granularity, opinion mining, spam review detection, and emotion detection), the application areas of sentiment analysis (e.g., market, medicine, social media, and election prediction), and different languages for sentiment analysis, such as Chinese, Spanish, and Arabic (Adak et al. 2022; Al-Ayyoub et al. 2019; Alamoodi et al. 2021a, b; Alonso et al. 2021; Angel et al. 2021; Boudad et al. 2018; Brito et al. 2021; Cheng et al. 2022; Hussain et al. 2019; Kastrati et al. 2021; Khattak et al. 2021; Koto and Adriani 2015; Kumar and Sebastian 2012; Ligthart et al. 2021; Medhat et al. 2014; Nassif et al. 2021; Nassirtoussi et al. 2014; Oueslati et al. 2020; Peng et al. 2017; Pereira 2021; Rambocas and Pacheco 2018; Ravi and Ravi 2015; Schouten and Frasincar 2015; Sharma and Jain 2020; Yue et al. 2019; Zhou and Ye 2020). They summarized the methods and application prospects of sentiment analysis under different contents and topics. As the field has grown, new topics have emerged, and knowledge from other fields has been gradually integrated into it. In recent years, the popularity of social media has aroused increasing interest in sentiment analysis research, and the number of papers published, especially those related to different topics of sentiment analysis, has grown rapidly. However, the existing surveys cover a short time range, and there has not been a survey dedicated to the evolution of research contents or topics of sentiment analysis. There have also been few survey works analyzing the connections between topics and methods, or their evolution (e.g., how the contents and topics of sentiment analysis have changed over time).

2.4.2 From the point of view of the methods of sentiment analysis

Some researchers reviewed different techniques and methods of sentiment analysis in different application areas and tasks. They analyzed and discussed sentiment analysis methods based on lexicons, rules, part of speech, term position, statistical techniques, supervised and unsupervised machine learning methods, as well as deep learning methods like LSTM, CNN, RNN, DNN, DBN, BERT, and other hybrid approaches (Acheampong et al. 2021; Ain et al. 2017; Alamoodi et al. 2021b; Asghar et al. 2014; Chan et al. 2022; Cheng et al. 2022; Feldman 2013; Habimana et al. 2020; Koto and Adriani 2015; Kumar and Sebastian 2012; Medhat et al. 2014; Prabha and Srikanth 2019; Ravi and Ravi 2015; Schouten and Frasincar 2015; Serrano-Guerrero et al. 2021; Taboada 2016; Wang et al. 2020b; Yue et al. 2019; Zhang et al. 2018; Zucco et al. 2020). These researchers also compared the advantages and disadvantages of each method. As summarized in Table 1, even though existing surveys analyze the techniques and methods of sentiment analysis and provide good insights, there has not been a survey that analyzes the evolution of research methods over time. There have also been few survey works that focus on the connections between topics and methods of sentiment analysis, and their evolution over time.

2.4.3 From the point of view of the overall survey methodology

The survey methods used have mainly been the content analysis method, Kitchenham and Charters' guideline, and the informetric methods. As summarized in Table 1, the content analysis method can effectively analyze the contents of research papers in depth, but it does not address the issue of the evolution of the research methods and topics (Bengtsson 2016; Birjali et al. 2021; Elo and Kyngäs 2008; Krippendorff 2018; Qazi et al. 2015, 2017; Wankhade et al. 2022). Although the number of papers that need to be read in depth can be reduced by following Kitchenham and Charters' guideline, more effort is needed to search and screen literature than in traditional literature review methods (Angel et al. 2021; Kitchenham 2004; Kitchenham and Charters 2007; Kumar and Garg 2020; Ligthart et al. 2021; Lin et al. 2022; Obiedat et al. 2021; Sarsam et al. 2020; Zunic et al. 2020). The informetric methods are best suited to investigating the research methods and topics of sentiment analysis (Bar-Ilan 2008; Mäntylä et al. 2018; Piryani et al. 2017; Santos et al. 2019; Verma 2022). There are three surveys using informetric techniques and tools that are well suited for analysis of a large number of papers over many years (Mäntylä et al. 2018; Piryani et al. 2017; Verma 2022). However, the evolution of research methods and topics of sentiment analysis over time has not been studied with informetric methods. There have also been few survey works that leverage keyword co-occurrence analysis and community detection to analyze the connections between research methods and topics, and their evolution over time.

Therefore, to address the gaps in the existing surveys, this study presents a survey on the research methods and topics, and their evolution over time. It combines keyword co-occurrence analysis and informetric analysis tools to reveal the methods and topics of sentiment analysis and their evolution in this field from 2002 to 2022.

The following section, Sect. 3 , describes our proposed survey methodology in detail.

3 The proposed survey methodology

This section describes our proposed survey methodology, including collection of scientific publications, processing of scientific publications, as well as visualization and analysis using different methods and tools. The overall scheme of this survey (Fig. 2) is also presented at the end of Sect. 3 to better visualize and summarize the proposed survey methodology in this research.

3.1 Collection of scientific publications

We collected research data from the Web of Science platform. We used keywords such as "sentiment analysis," "sentiment mining," and "sentiment classification" to search for relevant papers as data samples. In examining the retrieved papers, we found that some paper topics, paper types, and publication journals were not related to sentiment analysis, so we excluded them. The papers we included were mainly related to the sentiment analysis of texts. We excluded papers on sentiment analysis related to image processing, video processing, speech processing, biological signal processing, etc. Therefore, the retrieval strategy was as follows:

Topic Search (TS) = ("sentiment analy*" OR "sentiment mining" OR "sentiment classification") AND Abstract (AB) = "sentiment" NOT TS = ("face image*" OR "speech recognition" OR "speech emotion" OR "physiological signal*" OR "music emotion*" OR "facial feature extraction" OR "video emotion" OR "electroencephalography" OR "biosignal*" OR "image process*") NOT Title = ("facial" OR "speech" OR "sound*" OR "face" OR "dance" OR "temperature" OR "image*" OR "spoken" OR "electroencephalography" OR "EEG" OR "biosignal*" OR "voice*") NOT AB = "facial".
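To illustrate how such include and exclude terms with truncation (the trailing asterisk) behave, here is a simplified sketch that applies them to a single record. It is not a reimplementation of the Web of Science query language: it merges title and abstract into one text, uses only a few of the terms from the strategy above, and the helper names are assumptions for this example.

```python
import re


def term_to_regex(term):
    """Turn a search term with an optional trailing * into a whole-word regex."""
    escaped = re.escape(term.strip()).replace(r"\*", r"\w*")
    return re.compile(r"\b" + escaped + r"\b", re.IGNORECASE)


def matches_any(text, terms):
    """True if at least one term matches somewhere in the text."""
    return any(term_to_regex(term).search(text) for term in terms)


# A small subset of the terms in the retrieval strategy, for illustration.
INCLUDE = ["sentiment analy*", "sentiment mining", "sentiment classification"]
EXCLUDE = ["face image*", "speech recognition", "image process*"]


def keep_record(title, abstract):
    """Simplified filter: include terms, 'sentiment' in abstract, no exclude terms."""
    text = title + " " + abstract
    return (
        matches_any(text, INCLUDE)
        and "sentiment" in abstract.lower()
        and not matches_any(text, EXCLUDE)
    )
```

Here `"sentiment analy*"` matches both "sentiment analysis" and "sentiment analyzing", while a record mentioning "face images" is rejected by the exclude list.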

The results in conferences are given the same relevance as journal papers. We chose four databases in the Web of Science: two conference citation databases (Conference Proceedings Citation Index—Social Sciences & Humanities [CPCI-SSH], and Conference Proceedings Citation Index—Science [CPCI-S]), and two journal citation databases (Science Citation Index Expanded [SCI-Expanded] and Social Sciences Citation Index [SSCI]). Given the various forms of words such as "analyzing" and "analysis," a truncated search technique (marked with an asterisk) was used to prevent the omission of relevant papers. The time frame of the retrieved papers was from January 2002 to January 2022, and the publication types of the papers included "article," "conference paper," "review," and "edited material." A total of 9,714 papers were obtained from the four databases above. These included 3,809 articles, 5,633 proceeding papers, 267 reviews, and 5 pieces of editorial material from 2002 to 2022. Overall, there were 104 papers from January 2022. The number of papers each year from 2002 to 2021 is shown in Fig. 1 .

The number of papers each year from 2002 to 2021

3.2 Processing of scientific publications

In this process, our purpose was to extract the key contents of the papers, which are used to analyze the research methods and topics in the field of sentiment analysis. Because author keywords are few in number, they often cannot fully represent the key content of a paper. We found that combining the title and abstract better reflects the core information. Therefore, we combined the title, abstract, and author keywords of each paper and used KeyBERT to extract keywords representing the paper's main research method and topic. KeyBERT is a keyword extraction technique that uses BERT embeddings to find the keywords and key phrases most similar to the document content (Grootendorst and Warmerdam 2021). The specific keyword extraction process was as follows:

First, we used KeyBERT to extract 8 keywords and eliminated keywords with a weight lower than 0.3. We then combined the extracted keywords with the author keywords and removed duplicates. After that, we standardized the whole collection of keywords and merged synonyms. Finally, we counted the number of keywords and removed meaningless terms like "sentiment analysis," "sentiment classification," and "sentiment mining."
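The post-processing steps above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the synonym map, and the sample weights are hypothetical, and the sketch assumes the scored candidates have already been produced (for example, by KeyBERT's `extract_keywords` with `top_n=8`). For brevity it removes the generic terms per paper, whereas the text removes them after counting.

```python
# Hypothetical post-processing of scored keyword candidates; all names and
# sample data are illustrative assumptions, not taken from the survey.
MIN_WEIGHT = 0.3  # weight threshold stated in the text
GENERIC_TERMS = {"sentiment analysis", "sentiment classification", "sentiment mining"}
SYNONYMS = {"svm": "support vector machine", "tweets": "twitter"}  # assumed synonym map


def normalize(keyword):
    """Lowercase, collapse whitespace, and map known synonyms to one form."""
    cleaned = " ".join(keyword.lower().split())
    return SYNONYMS.get(cleaned, cleaned)


def merge_keywords(scored_candidates, author_keywords):
    """Combine extracted and author keywords for one paper.

    scored_candidates: list of (keyword, weight) pairs, e.g. the top 8 candidates.
    author_keywords: list of keyword strings supplied by the paper's authors.
    """
    kept = [kw for kw, weight in scored_candidates if weight >= MIN_WEIGHT]
    merged = []
    for kw in kept + list(author_keywords):
        norm = normalize(kw)
        if norm not in GENERIC_TERMS and norm not in merged:
            merged.append(norm)  # keeps first-seen order and drops duplicates
    return merged


candidates = [("Twitter", 0.62), ("SVM", 0.48), ("sentiment analysis", 0.71), ("dataset", 0.21)]
authors = ["Support Vector Machine", "opinion mining", "tweets"]
print(merge_keywords(candidates, authors))
# ['twitter', 'support vector machine', 'opinion mining']
```

In this toy example, "dataset" is dropped by the weight threshold, "sentiment analysis" is dropped as a generic term, and the synonym map merges "SVM" with "Support Vector Machine" and "tweets" with "Twitter".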

After statistical analysis, we obtained 41,827 keywords with a total word frequency of 88,104. Examining these keywords, we found that most of those with a word frequency below 10 were not representative of the research contents of sentiment analysis. As a result, a total of 685 representative keywords were retained for subsequent analysis. These keywords appeared a total of 30,801 times. Table 2 shows the 50 keywords with the highest word frequency.

High-frequency keywords generally represent research hotspots. We therefore extracted high-frequency keywords to serve as the basis for the subsequent analysis. We found that most of the keywords with a word frequency of 18 or lower, such as "ranking," "mask," "experience," "affect," and "online forum," were not relevant to sentiment analysis. Therefore, the keywords with a word frequency higher than 18 were retained for analysis. These keywords appeared 25,429 times in the collected data, accounting for close to 83% of the 30,801 occurrences of the representative keywords. This left 275 keywords, which were used to analyze the main methods and topics of sentiment analysis.
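The frequency-based filtering described above amounts to counting keywords over all papers and keeping those above a cut-off. The following Python sketch shows the idea on a toy corpus; the function name and the sample data are illustrative assumptions, and only the two totals in the last line come from the text.

```python
from collections import Counter


def high_frequency(keyword_lists, min_count):
    """Count keywords over all papers and keep those with frequency > min_count.

    Returns the retained keywords with their counts, plus the share of all
    keyword occurrences that the retained keywords account for.
    """
    counts = Counter(kw for keywords in keyword_lists for kw in keywords)
    kept = {kw: n for kw, n in counts.items() if n > min_count}
    share = sum(kept.values()) / sum(counts.values())
    return kept, share


# Toy corpus (illustrative only): each inner list holds one paper's keywords.
papers = [["twitter", "svm"], ["twitter", "lstm"], ["twitter", "svm"], ["mask"]]
kept, share = high_frequency(papers, min_count=1)
print(kept, round(share, 2))  # {'twitter': 3, 'svm': 2} 0.71

# Share reported in the text: occurrences of keywords with frequency above 18,
# divided by all occurrences of the 685 representative keywords.
print(round(25_429 / 30_801, 4))  # 0.8256
```

With `min_count=18` the same function would reproduce the second cut-off used in the text; the first cut-off (dropping frequencies below 10) corresponds to `min_count=9`.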

3.3 Visualization and analysis using different methods and tools

3.3.1 Analytical methods

Keywords are the core natural language vocabulary to express the subject, content, ideas, and research methods of the literature (You et al. 2021 ). Keywords represent the topics of the domain, and cluster analysis of these words can reflect the structure and association of topics. Keyword co-occurrence analysis counts the number of occurrences of a set of keywords in the same document. The strength and number of associations between research contents can be obtained through keyword co-occurrence analysis. Dividing research methods and topics into sub-communities helps researchers to analyze hotspots and trends in methods and topics, as well as to obtain sub-fields of sentiment analysis research (Ding et al. 2001 ).
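As an illustration of keyword co-occurrence counting, the sketch below counts, for every unordered pair of keywords, the number of papers in which both appear. It is a minimal Python example with made-up data and a hypothetical function name; in this survey the counting itself was done with BibExcel.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(keyword_lists):
    """Count, for each unordered keyword pair, the papers containing both."""
    pairs = Counter()
    for keywords in keyword_lists:
        unique = sorted(set(keywords))  # each pair is counted once per paper
        pairs.update(combinations(unique, 2))
    return pairs


# Toy data (illustrative only): one keyword list per paper.
papers = [
    ["twitter", "machine learning", "svm"],
    ["twitter", "deep learning"],
    ["machine learning", "svm"],
]
counts = cooccurrence_counts(papers)
print(counts[("machine learning", "svm")])  # 2
```

The resulting pair counts can serve as edge weights of a keyword co-occurrence network, with keywords as nodes.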

3.3.2 Visualization and analysis tools

BibExcel is a software tool for analyzing bibliographic data or any text-based data formatted in a similar way (Persson 2017). The tool generates structured data files that can be read by Excel for subsequent processing (Persson et al. 2009). Our processing steps were as follows. First, we imported the standardized bibliographic data into BibExcel to structure the data. Second, we checked and corrected the data and used BibExcel to count the number of co-occurrences of keywords.

We then used Pajek software to visualize the keyword co-occurrence network and divide it into sub-communities. Pajek is a tool for analyzing large and complex networks (Batagelj and Andrej 2022; Batagelj and Mrvar 1998). It can calculate indicators that reveal the state and properties of the network involved. In addition, Pajek’s Louvain community detection algorithm can help divide the keyword co-occurrence network into sub-communities, which represent sub-fields of sentiment analysis (Blondel et al. 2008; Leydesdorff et al. 2014; Rotta and Noack 2011). The Louvain community detection algorithm unfolds a complete hierarchical community structure for the network. It has an advantage in subdividing different areas of study: multiple knowledge structures and details can be shown in one network (Deng et al. 2021).
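The Louvain method searches for a partition that maximizes modularity, a score comparing the edge weight inside communities with what would be expected if edges were placed at random. The following Python sketch computes modularity for a given partition of a weighted, undirected graph. It is only an illustration of the objective, not Pajek's implementation; the function name and the toy graph are assumptions, and self-loops are not handled.

```python
def modularity(edges, community_of):
    """Modularity Q of a partition of a weighted, undirected graph.

    edges: dict mapping (u, v) pairs to weights, each edge listed once.
    community_of: dict mapping each node to its community label.
    """
    total = sum(edges.values())  # m: total edge weight
    internal = {}                # edge weight inside each community
    degree = {}                  # summed weighted degree per community
    for (u, v), w in edges.items():
        cu, cv = community_of[u], community_of[v]
        degree[cu] = degree.get(cu, 0) + w
        degree[cv] = degree.get(cv, 0) + w
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + w
    return sum(
        internal.get(c, 0) / total - (degree[c] / (2 * total)) ** 2
        for c in degree
    )


# Toy graph (illustrative only): two triangles joined by a single edge.
edges = {
    ("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
    ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1,
    ("c", "d"): 1,
}
split = {"a": "C1", "b": "C1", "c": "C1", "d": "C2", "e": "C2", "f": "C2"}
merged = {node: "C1" for node in split}
print(round(modularity(edges, split), 4))   # 0.3571
print(round(modularity(edges, merged), 4))  # 0.0
```

Separating the two triangles scores higher than placing all nodes in one community, which is the kind of improvement a modularity-based community detection algorithm looks for.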

After that, we applied VOSviewer to optimize the visualization of sub-communities (Van Eck and Waltman 2010; VOSviewer 2021; Perianes-Rodriguez et al. 2016; Waltman and Van Eck 2013; Waltman et al. 2010). VOSviewer can help display the core keywords in each sub-community and the correlation between keywords. It can also reflect the closeness of the association between sub-communities. Finally, we used Excel to count the frequency of keywords for each year and to map the evolution of research methods and topics in the field of sentiment analysis.

3.3.3 Graphical representation of the overall scheme of this survey

This paper proposes and conducts a new research survey on sentiment analysis. The graphical representation of the overall scheme of this survey is shown in Fig. 2 . The main scheme includes four modules: Module A, Collection of scientific publications; Module B, Processing of scientific publications; Module C, Visualization and analysis through different methods and tools, and Module D, Result analysis and discussions based on various aspects.

Graphical representation of the overall scheme of this survey. Module A: Collection of scientific publications; Module B: Processing of scientific publications; Module C: Visualization and analysis using different methods and tools; Module D: Result analysis and discussions considering various aspects

In Module A, scientific publications are collected from the Web of Science (WOS) platform, as has been detailed in Sect. 3.1 Collection of scientific publications above. Module B, Processing of scientific publications, has been detailed in Sect. 3.2 above. It performs a data processing procedure to obtain key information, which includes all the representative keywords and high-frequency keywords. The title, abstract and keywords of the papers are used to extract such key information using KeyBERT (Grootendorst and Warmerdam 2021 ). Such key information is analyzed and visualized through different methods, including different visualization tools, as introduced in Sect. 3.3 (Module C), Visualization and analysis using different methods and tools, above.

In Module C, the number of co-occurrences of keywords is obtained using BibExcel (Persson 2017), and the co-occurrences of keywords are analyzed and visualized using Pajek (Blondel et al. 2008; Leydesdorff et al. 2014; Rotta and Noack 2011) and VOSviewer (Van Eck and Waltman 2010; VOSviewer 2021; Perianes-Rodriguez et al. 2016; Waltman and Van Eck 2013; Waltman et al. 2010). The keyword community network and the keyword community evolution are analyzed and visualized using these tools, as described in Sect. 3.3. Module D, Result analysis and discussions, builds on the visualization and analysis results obtained in Module C and is detailed in Sect. 4.

In the following section, Sect. 4 (Module D), results are analyzed and discussed considering various aspects, including the research methods and topics of sentiment analysis in each community, the evolution of research methods and topics along with the research hotspots and trends over time.

4 Results and analysis through various aspects

4.1 Research methods and topics of sentiment analysis

4.1.1 Overall characteristic analysis

The high-frequency keywords were presented in Table 2 . These keywords can be regarded as the main research contents in the field of sentiment analysis. "Twitter" ranks at the top. It is followed by "opinion mining," "natural language processing," "machine learning," and so on. The high-frequency keywords cover the topics of the studies, the contents of the studies, and the techniques and methods used. Based on these keywords, we used Pajek’s Louvain method to construct a keyword co-occurrence network to represent the research methods and topics as shown in Fig. 3 . The keyword co-occurrence network is divided into six communities. The research methods and topics of the six communities include social media platforms (C1), machine learning methods (C2), natural language processing and deep learning methods (C3), opinion mining and text mining (C4), Arabic sentiment analysis (C5), and others, such as domain sentiment analysis and transfer learning, etc. (C6).

Keyword community network

In Fig. 3, the size of a node represents the frequency of the keyword. The thickness of the line between two nodes represents the number of co-occurrences of the two keywords. The top 20 keywords in each community are sorted in descending order, as shown in Table 3. The keyword co-occurrence network features of the six sub-communities are described in Table 4. The number of nodes shows the number of keywords in each community, and the number of links shows the correlations between the keywords.

As shown in Table 4, the number of links between sub-communities indicates a strong correlation between them, especially between C3 and C4, which are connected by 1,306 links. The reason may be that the research methods of C4 focus on "opinion mining" and "text mining," while those of C3 focus on "natural language processing" and "deep learning," so C3 provides more technical support for C4 research. In C5 and C6, the research methods and topics are scattered. These communities have few internal links, but their connections with C3 and C4 are relatively numerous. The contents of C5 and C6 may include some emerging research methods and topics. We present a specific analysis of the methods and topics of each sub-community in the next subsection.

4.1.2 Analysis on research methods and topics of sub-communities

4.1.2.1 Analysis on research methods and topics of the C1 community

Figure 4 shows the keyword co-occurrence network of the C1 community. The research methods and topics of the C1 community focus on three areas: "social media," "topic models," and "covid-19." In the context of big data, web 2.0 technology provides users with a way to express reviews and opinions of services, events, and people. Various social media platforms, such as Twitter, YouTube, and Weibo, have a large amount of users’ emotional data (Momtazi 2012 ). Compared to traditional news media, information on social media spreads more quickly, and people are able to express their feelings more freely. It is important to analyze the emotions generated by the information shared and published on social media (Abdullah and Zolkepli 2017 ; Wang et al. 2014 ). Researchers have been extracting text data from social media platforms for years to detect unexpected events (Bai and Yu 2016 ; Preethi et al. 2015 ), improve the quality of products (Abrahams et al. 2012 ; Isah et al. 2014 ; Myslin et al. 2013 ), understand the direction of public opinion (Fink et al. 2013 ; Groshek and Al-Rawi 2013 ), and so on.

The keyword co-occurrence network for the C1 community

Users’ sentiments are often associated with the topics, and the accuracy of sentiment analysis can be improved through the introduction of topic models (Li et al. 2010 ). Among them, the Latent Dirichlet Allocation (LDA) method is cited most frequently. Previous studies found that the LDA method can be effective in subdividing topics and identifying the sentiments of the contents. This method is quite general, and there are also many improved models based on this one that can be applied to any type of web text, helping to enhance the accuracy of sentiment polarity calculation (Chen et al. 2019 ; Liu et al. 2020 ).

As the COVID-19 pandemic has unfolded, a large number of individuals, media and governments have been publishing news and opinions about the COVID-19 crisis on social media platforms. This has resulted in a lot of sentiment analysis studies focusing on COVID-19-related texts exploring the impact of the epidemic on people’s lives (Sari and Ruldeviyani 2020 ; Wang, T. et al. 2020a ), physical health (Berkovic et al. 2020 ; Binkheder et al. 2021 ) and mental health (Yin et al. 2020 ), and so on. Therefore, we can see many related keywords, such as "infodemiology," "healthcare," and "mental health."

4.1.2.2 Analysis on research methods and topics of the C2 community

The contents of the C2 community mainly focus on "machine learning," "text classification," "feature extraction," and "stock market" (see Fig. 5). Most keywords are related to the research methods of sentiment analysis. Machine learning approaches have expanded from topic recognition to more challenging tasks such as sentiment classification. It is very important to explore and compare machine learning methods applied to sentiment classification (Li and Sun 2007). Methods like Support Vector Machine (SVM) and Naive Bayes models are widely used (Altrabsheh et al. 2013; Dereli et al. 2021; Shofiya and Abidi 2021; Tan et al. 2009; Wang and Lin 2020) and serve as benchmarks against which many researchers compare their proposed models (Kumar et al. 2021; Sadamitsu et al. 2008; Waila et al. 2012; Zhang et al. 2019). Many algorithms and feature representations, such as random forest (Al Amrani et al. 2018; Fitri et al. 2019; Sutoyo et al. 2022), tf-idf (Arafin Mahtab et al. 2018; Awan et al. 2021; Dey et al. 2017), logistic regression (Prabhat and Khullar 2017; Qasem et al. 2015; Sutoyo et al. 2022), and n-grams (Ikram and Afzal 2019; Singh and Kumari 2016; Xiong et al. 2021), are used to enhance the accuracy of machine learning, as shown in Fig. 5.

The keyword co-occurrence network for the C2 community

The trading volume and asset prices of financial commodities or financial instruments are influenced by a variety of factors in the online environment. Machine learning and sentiment analysis are powerful tools that can help gather vast amounts of useful information to predict financial risk effectively (Li et al. 2009 ). Research on the relationship between public sentiment and stock prices has always been the focus of many scholars (Smailović et al. 2014 ; Xing et al. 2018 ). They have used machine learning methods to explore the influence of sentiments on stock prices through sentiment analysis of news articles, and then predicted the trend changes in the stock market (Ahuja et al. 2015 ; Januário et al. 2022 ; Maqsood et al. 2020 ; Picasso et al. 2019 ).

4.1.2.3 Analysis on research methods and topics of the C3 community

The contents of the C3 community also mainly focus on the methods for sentiment analysis, like "natural language processing", "deep learning," "aspect-based sentiment analysis," and "task analysis" (Fig. 6 ). Sentiment analysis is a sub-field of natural language processing (Nicholls and Song 2010 ), and natural language processing techniques have been widely used in sentiment analysis. Using natural language processing technology can help to better parse text features, such as part-of-speech tagging, word sense disambiguation, keyword extraction, inter-word dependency recognition, semantic parsing, and dictionary construction (Abbasi et al. 2011 ; Syed et al. 2010 ; Trilla and Alías 2009 ). With the rise of deep learning technology, researchers began to introduce it to sentiment analysis. Neural network models like LSTM (Al-Dabet et al. 2021 ; Al-Smadi et al. 2019 ; Li and Qian 2016 ; Schuller et al. 2015 ; Tai et al. 2015 ), CNN (Cai and Xia 2015 ; Jia and Wang 2022 ; Ouyang et al. 2015 ), RNN (Hassan and Mahmood 2017 ; Tembhurne and Diwan 2021 ; You et al. 2016 ), and some combination of these, as well as other models (An and Moon 2022 ; Li et al. 2022 ; Liu et al. 2020a ; Salur and Aydin 2020 ; Zhao et al. 2021 ), have received significant attention.

The keyword co-occurrence network for the C3 community

Sentiment analysis granularity is subdivided into document level, sentence level, and aspect level. Document-level sentiment analysis takes the entire document as a unit, but the premise is that the document needs to have a clear attitude orientation—that is, the point of view needs to be clear (Shirsat et al. 2018 ; Wang and Wan 2011 ). Sentence-level sentiment analysis is intended to perform sentiment analysis of the sentences in the document alone (Arulmurugan et al. 2019 ; Liu et al. 2009 ; Nejat et al. 2017 ). Aspect-based analysis is a fundamental and significant task in sentiment analysis. The aim of aspect-level sentiment analysis is to separately summarize positive and negative views about different aspects of a product or entity, although overall sentiment toward a product or entity may tend to be positive or negative (Rao et al. 2021 ; Thet et al. 2010 ). Aspect-level sentiment analysis facilitates a more finely-grained analysis of sentiment than either document or sentence-level analysis (Liang et al. 2022 ; Wang et al. 2020c ). The traditional levels of analysis, such as sentence-level analysis can only calculate the comprehensive sentiment polarity of paragraphs or sentences (Wang et al. 2016 ; Zhang et al. 2021 ). In recent years, the aspect level has become more and more popular, and with the application of deep learning technology, it has become better at capturing the semantic relationship between aspect terms and words in a more quantifiable way (Huang et al. 2018 ). The process of sentiment analysis involves the coordination of multiple tasks, and the subtasks include feature extraction (Bouktif et al. 2020 ; Lin et al. 2020 ), context analysis (Yu et al. 2019 ; Zuo et al. 2020 ), and the application of some analytical models (Tan et al. 2020 ).

4.1.2.4 Analysis on research methods and topics of the C4 community

The C4 community, the largest of the six sub-communities, mainly contains keywords related to the research methods and topics of "opinion mining" and "user review" (Fig. 7). With the popularity of platforms like online review sites and personal blogs on the Internet, opinions and user reviews are readily available on the web. Opinion mining has always been a hot field of research (Khan et al. 2009; Poria et al. 2016). From Table 4, we can see that C3 and C4 are connected by 1,306 links. In opinion mining, researchers use many text mining methods to discover users’ opinions on goods or services, and then help improve the quality of corresponding products or services (Da’u et al. 2020; Lo and Potdar 2009; Martinez-Camara et al. 2011). In addition, scholars have found that the consideration of user opinions can help improve the overall quality of recommender systems (Artemenko et al. 2020; Da’u et al. 2020; Garg 2021; Malandri et al. 2022). Therefore, "recommendation system" has a strong correlation with "opinion mining."

The keyword co-occurrence network for C4 community

Evaluation metrics for quantifying the existing approaches are also a popular topic related to opinion mining. There is a keyword named "performance sentiment" in the C4 community. Precision, recall, accuracy and F1-score are the most commonly used evaluation metrics (Dangi et al. 2022 ; Jain et al. 2022 ; JayaLakshmi and Kishore 2022 ; Li et al. 2017 ; Wang et al. 2021 ; Yi and Niblack 2005 ). Some researchers have also used runtimes to calculate the model efficiency (Abo et al. 2021 ; Ferilli et al. 2015 ), p-value to statistically evaluate the relationship or difference between two samples of classification results (JayaLakshmi and Kishore 2022 ; Salur and Aydin 2020 ), paired sample t-tests to verify that the results are not obtained by chance (Nhlabano and Lutu 2018 ), and standard deviation to measure the stability of the model (Chang et al. 2020 ). There have also been researchers who have used G-mean (Wang et al. 2021 ), Pearson Correlation Coefficient (Corr) (Yang et al. 2022 ), Mean Absolute Error (MAE) (Yang et al. 2022 ), Normalized Information Transfer (NIT) and Entropy-Modified Accuracy (EMA) (Valverde-Albacete et al. 2013 ), Mean Squared Error (MSE) (Mao et al. 2022 ), Hamming loss (Liu and Chen 2015 ), Area Under the Curve (AUC) (Abo et al. 2021 ), sensitivity and specificity (Thakur and Deshpande 2019 ), etc.
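For reference, the four most commonly used metrics can be computed from the counts of true and false positives and negatives. The sketch below is a minimal Python illustration for a binary task; the function name, the label values, and the sample predictions are assumptions made for the example.

```python
def classification_metrics(y_true, y_pred, positive="pos"):
    """Precision, recall, F1-score, and accuracy for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Toy labels (illustrative only): 3 true positives, 1 false positive,
# 1 false negative, and 3 true negatives.
y_true = ["pos", "pos", "pos", "neg", "neg", "neg", "neg", "pos"]
y_pred = ["pos", "pos", "neg", "neg", "neg", "pos", "neg", "pos"]
print(classification_metrics(y_true, y_pred))
# {'precision': 0.75, 'recall': 0.75, 'f1': 0.75, 'accuracy': 0.75}
```

For multi-class sentiment labels, the same counts are typically computed per class and then averaged.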

4.1.2.5 Analysis on research methods and topics of the C5 & C6 communities

Both sub-communities C5 (Fig. 8 ) and C6 (Fig. 9 ) are small in size. The C5 community has 25 nodes and the C6 community has 41 nodes. The core content of the C5 community is "Arabic sentiment analysis." Before 2011, most resources and systems built in the field of sentiment analysis were tailored to English and other Indo-European languages. It is increasingly necessary to design sentiment analysis systems for other languages (Korayem et al. 2012 ), and researchers are increasingly interested in the study of tweets and texts in the Arabic language (Heikal et al. 2018 ; Khasawneh et al. 2013 ; Oueslati et al. 2020 ). They use technologies such as named entity recognition (Al-Laith and Shahbaz 2021 ), deep learning (Al-Ayyoub et al. 2018 ; Heikal et al. 2018 ), and corpus construction (Alayba et al. 2018 ) to enhance the accuracy of sentiment analysis.

The keyword co-occurrence network for the C5 community

The keyword co-occurrence network for the C6 community

The contents of the C6 community are not very concentrated. From the size of the circles, we can see that the keywords "domain adaptation" (Blitzer et al. 2007; Glorot et al. 2011), "domain sentiment," and "cross-domain" appear more frequently. Cross-domain sentiment classification is intended to address the lack of large-scale labeled data (Du et al. 2020a). It has attracted much attention (Du et al. 2020b; Hao et al. 2019; Yang et al. 2020b). Advances in communication technology have provided valuable interactive resources for people in different regions, and the processing of multilingual user comments has gradually become a key challenge in natural language processing (Martinez-Garcia et al. 2021). Therefore, some keywords related to "lingual" have appeared. Other keywords, such as "transfer learning," "active learning," and "semi-supervised learning," are mainly related to sentiment analysis technologies.

4.2 Evolution of research methods and topics of sentiment analysis

4.2.1 Overall evolution analysis

Annual changes in keyword frequency in sentiment analysis research can reflect the evolution of research methods and topics in this field. Based on the keyword community network (Fig. 3), we counted the frequency of keywords in each sub-community for each year. The keyword community evolution diagram is shown in Fig. 10. Since few papers were published before 2006, we combined the occurrences of keywords from 2002 to 2006. We can see that the C1 community and the C3 community have shown a significant growth trend. The C2 community was in a state of growth until 2019, and the frequency of its keywords decreased year by year after 2019. The frequency of C4 community keywords continued to increase until 2018 and declined after 2018. The numbers of keywords in the C5 and C6 communities both grew slowly, but the trend was not pronounced.
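The yearly counting behind such an evolution diagram can be sketched as follows. This is a minimal Python illustration with made-up records and a hypothetical keyword-to-community mapping; in this survey the counting was done in Excel.

```python
from collections import Counter


def period(year):
    """Merge 2002-2006 into one bin, as in the text; later years stay separate."""
    return "2002-2006" if year <= 2006 else str(year)


def community_trend(records, community_of):
    """Count keyword occurrences per (community, period).

    records: iterable of (year, keyword) pairs, one per keyword occurrence.
    community_of: dict mapping keyword -> community label (e.g. "C1").
    """
    trend = Counter()
    for year, keyword in records:
        community = community_of.get(keyword)
        if community is not None:  # skip keywords outside the network
            trend[(community, period(year))] += 1
    return trend


# Toy data (illustrative only).
community_of = {"svm": "C2", "covid-19": "C1", "twitter": "C1"}
records = [
    (2004, "svm"), (2006, "svm"), (2019, "svm"),
    (2020, "covid-19"), (2021, "covid-19"), (2021, "twitter"), (2021, "mask"),
]
trend = community_trend(records, community_of)
print(trend[("C2", "2002-2006")], trend[("C1", "2021")])  # 2 2
```

Plotting these counts per community over the periods gives one line per community, which is the form of an evolution diagram.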

Keyword community evolution diagram

4.2.2 Evolution analysis of sub-communities

We selected the high-frequency keywords under each category and plotted the change of word frequency in each year, as shown in Figs. 11 and 12 . In the C1 community, "social medium," "Twitter," "social network," "covid-19," "Latent Dirichlet Allocation," "topic model," and "text analysis" all had significant increases in word frequency, and the growth trend in 2021 was obvious. "Covid-19" appears in 2020, and the word frequency increased rapidly in 2021. Social media platforms have always been the focus of researchers’ attention. Under the influence of COVID-19, more people express their emotions, stress, and thoughts through social media platforms. Sentiment analysis on data from social media platforms related to COVID-19 has become a hot topic (Boon-Itt and Skunkan 2020 ). We believe that due to the impact of COVID-19, the widespread use of social platforms in 2020–2021 has led to a surge in the number of C1-related keywords.

C1, C2, C5, C6 communities: High-frequency keyword evolution diagram

C3, C4 communities: High-frequency keyword evolution diagram

The C2 community focuses on the method of "machine learning," and the C3 community focuses on the methods of "deep learning" and "natural language processing." The keywords in the two communities are mainly related to the techniques and methods of sentiment analysis. We found that before 2016 (Fig. 10), the frequency of keywords in the C2 community was higher than that in the C3 community, and from 2016 onward, the frequency of keywords in the C3 community gradually accounted for a larger proportion of the total. This reflects the fact that deep learning-related technologies and methods have become a research hotspot, and the attention given to SVM, Naive Bayes, supervised learning, and other technologies in machine learning has declined. In addition to deep learning models such as Bi-LSTM, Long Short-term Memory, and recurrent neural network in the C3 community, the number of "aspect based" and "feature extraction" keywords has also been growing, which shows that researchers now pay more attention to the aspect level of text granularity in the field of sentiment analysis.

Among the keywords found in the C4 community, the word frequency of the "opinion mining" keyword has decreased since 2018. This shows that in the field of sentiment analysis, researchers have begun to reduce the attention they give to sentiment analysis of opinions on product or service quality, while still maintaining a certain degree of attention to "user review" and "online review." In addition, the number of keywords for "sentiment lexicon" and "lexicon-based" has declined. It may be because, in the context of the widespread application of deep learning technology in recent years, the lexicon-based method requires more time and higher labor costs (Kaity and Balakrishnan 2020 ). However, its accuracy still attracts attention due to the high involvement of experts, especially in non-English languages (Bakar et al. 2019 ; Kydros et al. 2021 ; Piryani et al. 2020 ; Tammina 2020 ; Xing et al. 2019 ; Yurtalan et al. 2019 ).

The high-frequency keywords in the C5 and C6 communities are "Arabic language," "Arabic sentiment analysis," and "transfer learning." Arabic has 30 variants, including the official Modern Standard Arabic (MSA) (ISO 639–3 2017). Arabic dialects are becoming increasingly popular as the language of informal communication on blogs, forums, and social media networks (Lulu and Elnagar 2018 ). This makes them challenging languages for natural language processing and sentiment analysis (Alali et al. 2019 ; Elshakankery and Ahmed 2019 ; Sayed et al. 2020 ). Transfer learning can solve the problem by leveraging knowledge obtained from a large-scale source domain to enhance the classification performance of target domains (Heaton 2018 ). In recent years, based on the success of deep learning technology, this method has gradually attracted attention.

5 Research hotspots and trends

Through the analysis in Sects. 4.1 and 4.2, we found that the research methods and topics of sentiment analysis are constantly changing. The keyword topic heat map is shown in Fig. 13. From this map, we can see that in the past two decades, research hotspots have included social media platforms (such as "social medium," "social network," and "Twitter"); sentiment analysis techniques and methods (such as "machine learning," "svm," "natural language processing," "deep learning," "aspect-based," "text mining," and "sentiment lexicon"); mining of user comments or opinions (e.g., "opinion mining," "user review," and "online review"); and sentiment analysis for non-English languages (e.g., "Arabic sentiment analysis" and "Arabic language").

Keyword topic heat map

With the popularity of digitization, a large amount of user-generated content has appeared on the Internet, where users express their opinions and comments on different topics such as the news, events, activities, products, services, etc. through social media. This is especially so in the case of the Twitter mobile platform, launched in 2006, which has become the most popular social channel (Kumar and Jaiswal 2020 ). However, online text data is mostly unstructured. In order to accurately analyze users’ sentiments, the research methods for sentiment analysis, such as natural language processing technology, and automatic sentiment analysis models have become the focus of researchers’ works. From Fig. 11 , we can see that early technologies and methods are dominated by machine learning and that SVM and Naive Bayes have always been favored by researchers. This has also been confirmed in studies by Neha Raghuvanshi (Raghuvanshi and Patil 2016 ), Harpreet Kaur (Kaur et al. 2017 ), and Marouane Birjali (Birjali et al. 2021 ). With the improvement of neural network and artificial intelligence technology, deep learning technology has been widely used in sentiment analysis, and has resulted in good outcomes (Basiri et al. 2021 ; Ma et al. 2018 ; Prabha and Srikanth 2019 ; Yuan et al. 2020 ). However, deep learning technology still has room for improvement, and the hybrid methods combining sentiment dictionary and semantic analysis are gradually becoming a trend (Prabha and Srikanth 2019 ; Yang et al. 2020a ).

The granularity of sentiment analysis ranges from the early text level to the sentence level and finally to the aspect level, which is currently gaining strong attention. The granularity of sentiment analysis is gradually being refined, but the method is immature at present, and further research work in the future is needed (Agüero-Torales et al. 2021 ; Li et al. 2020 ; Trisna and Jie 2022 ).

Early sentiment analysis was mainly in the English language. In recent years, non-English languages such as Chinese (Lai et al. 2020 ; Peng et al. 2018 ), French (Apidianaki et al. 2016 ; Pecore and Villaneau 2019 ), Spanish (Chaturvedi et al. 2016 ; Plaza-del-Arco et al. 2020 ), Russian (Smetanin 2020 ), and Arabic (Alhumoud and Al Wazrah 2022 ; Ombabi et al. 2020 ) have attracted more and more attention. Furthermore, cross-domain sentiment analysis technology is in urgent need of research and discussion by researchers (Liu et al. 2019 ; Singh et al. 2021 ).

6 Conclusion and future work

6.1 Conclusion

Judging from the increasing number of papers related to sentiment analysis research every year, sentiment analysis has been on the rise. Although there are many surveys on sentiment analysis research, none has been dedicated to the evolution of its research methods and topics. This paper has used keyword co-occurrence analysis and informetric tools to enrich the perspectives and methods of previous studies. Its aims have been to outline the evolution of research methods and tools, research hotspots, and trends, and to provide guidance for researchers.

By adopting keyword co-occurrence analysis and community detection methods, we analyzed the research methods and topics of sentiment analysis, as well as their connections and evolution trends, and summarized the research hotspots and trends in sentiment analysis. We found that research hotspots include social media platforms, sentiment analysis techniques and methods, mining of user comments or opinions, and sentiment analysis for non-English languages. Moreover, deep learning technology, hybrid methods that combine sentiment dictionaries with semantic analysis, fine-grained sentiment analysis methods, non-English language analysis methods, and cross-domain sentiment analysis techniques have gradually become research trends.
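As a rough illustration of the keyword co-occurrence step, the sketch below counts how many papers each pair of keywords appears in together. It is a simplified assumption of how such counts can be built, not the pipeline used in this paper; the function name and the three toy keyword lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(keyword_lists):
    """Count how many papers each unordered keyword pair appears in together."""
    pair_counts = Counter()
    for keywords in keyword_lists:
        # Deduplicate and sort so each pair has one canonical ordering.
        unique = sorted(set(k.lower() for k in keywords))
        pair_counts.update(combinations(unique, 2))
    return pair_counts


# Hypothetical author keywords for three papers, for illustration only.
PAPERS = [
    ["sentiment analysis", "twitter", "svm"],
    ["sentiment analysis", "deep learning", "twitter"],
    ["sentiment analysis", "deep learning", "lstm"],
]

counts = cooccurrence_counts(PAPERS)
for pair, n in counts.most_common(3):
    print(pair, n)
```

The resulting pair counts can be read as a weighted edge list of a keyword network, which is the kind of input that community detection algorithms and tools such as Pajek or VOSviewer operate on.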

6.2 Practical implications and technical directions of sentiment analysis

Sentiment analysis has a wide range of application targets, such as e-commerce platforms, social platforms, public opinion platforms, and customer service platforms. Years of development have produced many related tasks, such as sentiment analysis at different text granularities, sentiment recognition, opinion mining, dialogue sentiment analysis, irony recognition, and false information detection. Such analysis can help structure user reviews, support product improvement decisions, discover public opinion hotspots, identify public positions, and investigate user satisfaction with products. Wherever user-generated content is involved, sentiment analysis technology can be used to mine the emotions of the human actors associated with the content. Improvements in sentiment analysis technology can help machines better understand the thoughts and opinions of users, make machines more intelligent, and support better decisions by policymakers, business people, and service providers. However, most current sentiment analysis methods are based on sentiment dictionaries, sentiment rules, statistics-based machine learning models, neural network-based deep learning models, or pre-trained models, and they have yet to achieve true language understanding in the sense of comprehension at the deep semantic level, though this does not prevent them from being useful in certain practical applications.

As an important task in natural language understanding, sentiment analysis has received extensive attention from academia and industry. Coarse-grained sentiment analysis is increasingly unable to meet people's decision-making needs, and for aspect-level sentiment analysis and other complex tasks, pure machine learning is still unable to flexibly achieve true language understanding. Once the scene or domain changes, problems such as the domain incompatibility of the sentiment dictionary and the poor transferability of the model keep appearing. At present, the accuracy of machine sentiment analysis is far below that of humans. To achieve human-like performance for machines, we believe that it is necessary to incorporate human commonsense knowledge and domain knowledge, as well as grounded definitions of concepts, in order for machines to understand natural language at a deeper level. These, combined with rules for affective reasoning to supplement interpretable information, will be effective in improving the performance of sentiment analysis. Future research in this direction can be strengthened to achieve true language understanding in machines.
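The domain incompatibility of sentiment dictionaries can be illustrated with a small sketch. The lexicons, polarity values, and example sentences below are hypothetical and chosen only to show how the same word can carry opposite polarity in two domains.

```python
def lexicon_score(text, lexicon):
    """Sum the polarity of every token found in the lexicon."""
    return sum(lexicon.get(token, 0) for token in text.lower().split())


# Hypothetical general-purpose lexicon (illustrative values only).
GENERAL_LEXICON = {"unpredictable": -1, "thrilling": 1, "reliable": 1}

# Hypothetical movie-domain lexicon: an "unpredictable" plot is praise.
MOVIE_LEXICON = {**GENERAL_LEXICON, "unpredictable": 1}

car_review = "the steering is unpredictable"
movie_review = "the plot is unpredictable"

print(lexicon_score(car_review, GENERAL_LEXICON))    # negative, as intended
print(lexicon_score(movie_review, GENERAL_LEXICON))  # negative, but the review is praise
print(lexicon_score(movie_review, MOVIE_LEXICON))    # positive with the domain lexicon
```

A dictionary built for one domain therefore mislabels the movie review until domain knowledge is added, which is the kind of gap that commonsense and domain knowledge are meant to close.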

6.3 Limitations and future work

There are some research limitations in this paper. First, we only studied papers written in English and retrieved from the Web of Science platform. We believe there are papers in other languages or other databases (e.g., Scopus, PubMed, Sci-hub, etc.) that also involve sentiment analysis but that were not included in our study. In addition, the keywords we used to search the Web of Science were mainly "sentiment analysis," "sentiment mining," and "sentiment classification." There may be papers related to our research topic that do not contain these keywords. To track developments in sentiment analysis research, future studies could replicate this work by employing more precise keywords and using different literature databases.

Second, we selected the main high-frequency keywords for analysis, and some important low-frequency keywords may have been ignored. In future work, we can analyze the changes in each keyword in detail from the perspective of time and obtain more comprehensive analysis results.
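One simple way to follow each keyword over time, including low-frequency ones, is to keep a separate frequency table per publication year. The sketch below assumes records of the form (year, keyword list); the function name and the toy records are hypothetical.

```python
from collections import Counter, defaultdict


def keyword_counts_by_year(records):
    """Map each year to a Counter of keyword frequencies."""
    by_year = defaultdict(Counter)
    for year, keywords in records:
        by_year[year].update(k.lower() for k in keywords)
    return by_year


# Hypothetical (year, keywords) records, for illustration only.
RECORDS = [
    (2019, ["svm", "twitter"]),
    (2020, ["deep learning", "twitter"]),
    (2021, ["deep learning", "sarcasm detection"]),
    (2021, ["deep learning", "twitter"]),
]

trend = keyword_counts_by_year(RECORDS)
for year in sorted(trend):
    print(year, trend[year]["deep learning"])
```

Because every keyword is retained for every year, a term that is rare overall but growing quickly remains visible instead of being dropped by a global frequency threshold.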

Third, the results show that the themes of sentiment analysis cover many fields, such as computer science, linguistics, and electrical engineering, which indicates the trend of interdisciplinary research. Therefore, future work should apply co-citation and diversity measures to explore the interdisciplinary nature of sentiment analysis research.

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

https://github.com/MaartenGr/KeyBERT

https://homepage.univie.ac.at/juan.gorraiz/bibexcel/

http://mrvar.fdv.uni-lj.si/pajek/

https://www.vosviewer.com/

Abbasi A, France S, Zhang Z, Chen H (2011) Selecting attributes for sentiment classification using feature relation networks. IEEE Trans Knowl Data Eng 23(3):447–462. https://doi.org/10.1109/TKDE.2010.110

Abdullah NSD, Zolkepli IA (2017) Sentiment analysis of online crowd input towards Brand Provocation in Facebook, Twitter, and Instagram. In: Proceedings of the international conference on big data and internet of thing, association for computing machinery, pp 67–74. https://doi.org/10.1145/3175684.3175689

Abo MEM, Idris N, Mahmud R, Qazi A, Hashem IAT, Maitama JZ et al (2021) A multi-criteria approach for Arabic dialect sentiment analysis for online reviews: exploiting optimal machine learning algorithm selection. Sustainability 13(18):10018. https://doi.org/10.3390/su131810018

Abrahams AS, Jiao J, Wang GA, Fan W (2012) Vehicle defect discovery from social media. Decis Support Syst 54(1):87–97. https://doi.org/10.1016/j.dss.2012.04.005

Acheampong FA, Nunoo-Mensah H, Chen W (2021) Transformer models for text-based emotion detection: a review of BERT-based approaches. Artif Intell Rev 54(8):5789–5829. https://doi.org/10.1007/s10462-021-09958-2

Adak A, Pradhan B, Shukla N (2022) Sentiment analysis of customer reviews of food delivery services using deep learning and explainable artificial intelligence: systematic review. Foods 11(10):1500. https://doi.org/10.3390/foods11101500

Agüero-Torales MM, Salas JIA, López-Herrera AG (2021) Deep learning and multilingual sentiment analysis on social media data: an overview. Appl Soft Comput 107:107373. https://doi.org/10.1016/j.asoc.2021.107373

Ahuja R, Rastogi H, Choudhuri A, Garg B (2015) Stock market forecast using sentiment analysis. In: 2015 2nd International conference on computing for sustainable global development, INDIACom 2015, Bharati Vidyapeeth, New Delhi, pp 1008–1010. https://doi.org/10.48550/arXiv.2204.05783

Ain QT, Ali M, Riaz A, Noureen A, Kamran M, Hayat B et al (2017) Sentiment analysis using deep learning techniques: a review. Int J Adv Comput Sci Appl 8(6):424–433. https://doi.org/10.14569/ijacsa.2017.080657

Al-Ayyoub M, Nuseir A, Alsmearat K, Jararweh Y, Gupta B (2018) Deep learning for Arabic NLP: a survey. J Comput Sci 26:522–531. https://doi.org/10.1016/j.jocs.2017.11.011

Al-Ayyoub M, Khamaiseh AA, Jararweh Y, Al-Kabi MN (2019) A comprehensive survey of Arabic sentiment analysis. Inf Process Manag 56(2):320–342. https://doi.org/10.1016/j.ipm.2018.07.006

Al-Dabet S, Tedmori S, AL-Smadi M (2021) Enhancing Arabic aspect-based sentiment analysis using deep learning models. Comput Speech Lang 69:101224. https://doi.org/10.1016/j.csl.2021.101224

Al-Laith A, Shahbaz M (2021) Tracking sentiment towards news entities from Arabic news on social media. Futur Gener Comput Syst 118:467–484. https://doi.org/10.1016/j.future.2021.01.015

Al-Smadi M, Talafha B, Al-Ayyoub M, Jararweh Y (2019) Using long short-term memory deep neural networks for aspect-based sentiment analysis of Arabic reviews. Int J Mach Learn Cybern 10(8):2163–2175. https://doi.org/10.1007/s13042-018-0799-4

Alali M, Sharef NM, Murad MAA, Hamdan H, Husin NA (2019) Narrow convolutional neural network for Arabic dialects polarity classification. IEEE Access 7:96272–96283. https://doi.org/10.1109/ACCESS.2019.2929208

Alamoodi AH, Zaidan BB, Al-Masawa M, Taresh SM, Noman S, Ahmaro IYY et al (2021a) Multi-perspectives systematic review on the applications of sentiment analysis for vaccine hesitancy. Comput Biol Med 139:104957. https://doi.org/10.1016/j.compbiomed.2021.104957

Alamoodi AH, Zaidan BB, Zaidan AA, Albahri OS, Mohammed KI, Malik RQ et al (2021b) Sentiment analysis and its applications in fighting COVID-19 and infectious diseases: a systematic review. Expert Syst Appl 167:114155. https://doi.org/10.1016/j.eswa.2020.114155

Alayba AM, Palade V, England M, Iqbal R (2018) Improving sentiment analysis in arabic using word representation. In: 2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR), IEEE, pp 13–18. https://doi.org/10.1109/ASAR.2018.8480191

Alhumoud SO, Al Wazrah AA (2022) Arabic sentiment analysis using recurrent neural networks: a review. Artif Intell Rev 55(1):707–748. https://doi.org/10.1007/s10462-021-09989-9

Alonso MA, Vilares D, Gómez-Rodríguez C, Vilares J (2021) Sentiment analysis for fake news detection. Electronics 10(11):1348. https://doi.org/10.3390/electronics10111348

Altrabsheh N, Gaber MM, Cocea M (2013) SA-E: sentiment analysis for education. In: The 5th KES International Conference on Intelligent Decision Technologies (KES-IDT), Sesimbra, Portugal, pp 353–362. https://doi.org/10.3233/978-1-61499-264-6-353

Al Amrani Y, Lazaar M, El Kadiri KE (2018) Random forest and support vector machine based hybrid approach to sentiment analysis. Procedia Comput Sci 127:511–520. https://doi.org/10.1016/j.procs.2018.01.150

An H, Moon N (2022) Design of recommendation system for tourist spot using sentiment analysis based on CNN-LSTM. J Ambient Intell Hum Comput 13:1653–1663. https://doi.org/10.1007/s12652-019-01521-w

Angel SO, Negron APP, Espinoza-Valdez A (2021) Systematic literature review of sentiment analysis in the spanish language. Data Technol Appl 55(4):461–479. https://doi.org/10.1108/DTA-09-2020-0200

Apidianaki M, Tannier X, Richart C (2016) Datasets for aspect-based sentiment analysis in French. In: Proceedings of the tenth international conference on language resources and evaluation (LREC’16), Portorož, Slovenia: European Language Resources Association (ELRA), pp 1122–1126. https://aclanthology.org/L16-1179

Arafin Mahtab S, Islam N, Mahfuzur Rahaman M (2018) Sentiment analysis on Bangladesh cricket with support vector machine. In: 2018 International conference on Bangla Speech and language processing (ICBSLP), IEEE, pp 1–4. https://doi.org/10.1109/ICBSLP.2018.8554585

Artemenko O, Pasichnyk V, Kunanets N, Shunevych K (2020) Using sentiment text analysis of user reviews in social media for E-Tourism mobile recommender systems. In: COLINS, CEUR-WS, Aachen, pp 259–271. http://ceur-ws.org/Vol-2604/paper20.pdf

Arulmurugan R, Sabarmathi KR, Anandakumar H (2019) Classification of sentence level sentiment analysis using cloud machine learning techniques. Clust Comput 22(1):1199–1209. https://doi.org/10.1007/s10586-017-1200-1

Asghar MZ, Khan A, Ahmad S, Kundi FM (2014) A review of feature selection techniques in sentiment analysis. J Basic Appl Sci Res 4(3):181–186. https://doi.org/10.3233/IDA-173763

Awan MJ, Yasin A, Nobanee H, Ali AA, Shahzad Z, Nabeel M et al (2021) Fake news data exploration and analytics. Electronics 10(19):2326. https://doi.org/10.3390/electronics10192326

Bai H, Yu G (2016) A Weibo-based approach to disaster informatics: incidents monitor in post-disaster situation via weibo text negative sentiment analysis. Nat Hazards 83(2):1177–1196. https://doi.org/10.1007/s11069-016-2370-5

Bakar MFRA, Idris N, Shuib L (2019) An enhancement of Malay social media text normalization for Lexicon-based sentiment analysis. In: 2019 International conference on Asian language processing (IALP), IEEE, pp 211–215. https://doi.org/10.1109/IALP48816.2019.9037700

Bar-Ilan J (2008) Informetrics at the beginning of the 21st century—a review. J Informet 2(1):1–52. https://doi.org/10.1016/j.joi.2007.11.001

Basiri ME, Nemati S, Abdar M, Cambria E, Acharya UR (2021) ABCDM: an attention-based bidirectional CNN-RNN deep model for sentiment analysis. Futur Gener Comput Syst 115:279–294. https://doi.org/10.1016/j.future.2020.08.005

Batagelj V, Mrvar A (2022) Pajek [Software]. http://mrvar.fdv.uni-lj.si/pajek/

Batagelj V, Mrvar A (1998) Pajek: program for large network analysis. Connections 21(2):47–57. http://vlado.fmf.uni-lj.si/pub/networks/doc/pajek.pdf

Bengtsson M (2016) How to plan and perform a qualitative study using content analysis. NursingPlus Open 2:8–14. https://doi.org/10.1016/j.npls.2016.01.001

Berkovic D, Ackerman IN, Briggs AM, Ayton D (2020) Tweets by people with arthritis during the COVID-19 pandemic: content and sentiment analysis. J Med Internet Res 22(12):e24550. https://doi.org/10.2196/24550

Binkheder S, Aldekhyyel RN, Almogbel A, Al-Twairesh N, Alhumaid N, Aldekhyyel SN et al (2021) Public perceptions around Mhealth applications during Covid-19 pandemic: a network and sentiment analysis of tweets in Saudi Arabia. Int J Environ Res Public Health 18(24):1–22. https://doi.org/10.3390/ijerph182413388

Birjali M, Kasri M, Beni-Hssane A (2021) A comprehensive survey on sentiment analysis: approaches, challenges and trends. Knowl Based Syst 226:107134. https://doi.org/10.1016/j.knosys.2021.107134

Blitzer J, Dredze M, Pereira F (2007) Biographies, bollywood, boom-boxes and blenders: domain adaptation for sentiment classification. In: 45th Annual Meeting of the association of computational linguistics, association for computational linguistics, pp 440–447. https://doi.org/10.1287/ijoc.2013.0585

Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech Theory Exp 2008(10):P10008. https://doi.org/10.1088/1742-5468/2008/10/P10008

Boon-Itt S, Skunkan Y (2020) Public perception of the COVID-19 pandemic on Twitter: sentiment analysis and topic modeling study. JMIR Public Health Surv 6(4):e21978. https://doi.org/10.2196/21978

Boudad N, Faizi R, Thami ROH, Chiheb R (2018) Sentiment analysis in Arabic: a review of the literature. Ain Shams Eng J 9(4):2479–2490. https://doi.org/10.1016/j.asej.2017.04.007

Bouktif S, Fiaz A, Awad M (2020) Augmented textual features-based stock market prediction. IEEE Access 8:40269–40282. https://doi.org/10.1109/ACCESS.2020.2976725

Brito KDS, Filho RLCS, Adeodato PJL (2021) A systematic review of predicting elections based on social media data: research challenges and future directions. IEEE Trans Comput Soc Syst 8(4):819–843. https://doi.org/10.1109/TCSS.2021.3063660

Cai G, Xia B (2015) Convolutional neural networks for multimedia sentiment analysis. In: Natural Language Processing and Chinese Computing, Springer, Cham, pp 159–167. https://doi.org/10.1007/978-3-319-25207-0_14

Callon M, Courtial J-P, Turner WA, Bauin S (1983) From translations to problematic networks: an introduction to co-word analysis. Soc Sci Inf 22(2):191–235. https://doi.org/10.1177/053901883022002003

Cambria E, Liu Q, Decherchi S, Xing F, Kwok K (2022a) SenticNet 7: a commonsense-based neurosymbolic AI Framework for Explainable Sentiment Analysis. In: LREC, Marseille: European Language Resources Association (ELRA), pp 3829–3839. https://sentic.net/senticnet-7.pdf

Cambria E, Dragoni M, Kessler B, Donadello I (2022b) Ontosenticnet 2: enhancing reasoning within sentiment analysis. IEEE Intell Syst 37(2):103–110. https://doi.org/10.1109/MIS.2021.3093659

Cambria E, Kumar A, Al-Ayyoub M, Howard N (2022c) Guest editorial: explainable artificial intelligence for sentiment analysis. Knowl Based Syst 238(3):107920. https://doi.org/10.1016/j.knosys.2021.107920

Cambria E, Xing F, Thelwall M, Welsch R (2022d) Sentiment analysis as a multidisciplinary research area. IEEE Trans Artif Intell 3(2):1–4

Chan JY-L, Bea KT, Leow SMH, Phoong SW, Cheng WK (2022) State of the art: a review of sentiment analysis based on sequential transfer learning. Artif Intell Rev. https://doi.org/10.1007/s10462-022-10183-8

Chang J-R, Liang H-Y, Chen L-S, Chang C-W (2020) Novel feature selection approaches for improving the performance of sentiment classification. J Ambient Intell Hum Comput. https://doi.org/10.1007/s12652-020-02468-z

Chaturvedi I, Cambria E, Vilares D (2016) Lyapunov filtering of objectivity for Spanish sentiment model. In: Proceedings of the International Joint Conference on Neural Networks (IJCNN), IEEE, pp 4474–4481. https://doi.org/10.1109/IJCNN.2016.7727785

Chen Z, Teng S, Zhang W, Tang H, Zhang Z, He J et al (2019) LSTM sentiment polarity analysis based on LDA clustering. In: Communications in Computer and Information Science, Springer, Singapore, pp 342–355. https://doi.org/10.1007/978-981-13-3044-5_25

Cheng WK, Bea KT, Leow SMH, Chan JY-L, Hong Z-W, Chen Y-L (2022) A review of sentiment, semantic and event-extraction-based approaches in stock forecasting. Mathematics 10(14):2437. https://doi.org/10.3390/math10142437

Da’u A, Salim N, Rabiu I, Osman A (2020) Recommendation system exploiting aspect-based opinion mining with deep learning method. Inf Sci 512:1279–1292. https://doi.org/10.1016/j.ins.2019.10.038

Dangi D, Bhagat A, Dixit DK (2022) Sentiment analysis of social media data based on chaotic coyote optimization algorithm based time weight-adaboost support vector machine approach. Concurr Comput 34(3):e6581. https://doi.org/10.1002/cpe.6581

Deng S, Xia S, Hu J, Li H, Liu Y (2021) Exploring the topic structure and evolution of associations in information behavior research through co-word analysis. J Librariansh Inf Sci 53(2):280–297. https://doi.org/10.1177/0961000620938120

Dereli T, Eligüzel N, Çetinkaya C (2021) Content analyses of the international federation of Red Cross and Red Crescent Societies (Ifrc) based on machine learning techniques through Twitter. Nat Hazards 106(3):2025–2045. https://doi.org/10.1007/s11069-021-04527-w

Dey A, Jenamani M, Thakkar JJ (2017) Lexical TF-IDF: an n-gram feature space for cross-domain classification of sentiment reviews. In: International Conference on Pattern Recognition and Machine Intelligence, Springer, Cham, pp 380–386. https://doi.org/10.1007/978-3-319-69900-4_48

Ding Y, Chowdhury GG, Foo S (2001) Bibliometric cartography of information retrieval research by using co-word analysis. Inf Process Manag 37(6):817–842. https://doi.org/10.1016/S0306-4573(00)00051-0

Du C, Sun H, Wang J, Qi Q, Liao J (2020a) Adversarial and domain-aware BERT for cross-domain sentiment analysis. In: Proceedings of the 58th Annual meeting of the association for computational linguistics, association for computational linguistics, pp 4019–4028. https://doi.org/10.18653/v1/2020.acl-main.370

Du Y, He M, Wang L, Zhang H (2020b) Wasserstein based transfer network for cross-domain sentiment classification. Knowl Based Syst 204:106162. https://doi.org/10.1016/j.knosys.2020.106162

Elo S, Kyngäs H (2008) The qualitative content analysis process. J Adv Nurs 62(1):107–115. https://doi.org/10.1111/j.1365-2648.2007.04569.x

Elshakankery K, Ahmed MF (2019) HILATSA: a hybrid incremental learning approach for arabic tweets sentiment analysis. Egypt Inform J 20(3):163–171. https://doi.org/10.1016/j.eij.2019.03.002

Feldman R (2013) Techniques and applications for sentiment analysis. Commun ACM 56(4):82–89. https://doi.org/10.1145/2436256.2436274

Ferilli S, De Carolis B, Esposito F, Redavid D (2015) Sentiment analysis as a text categorization task: a study on feature and algorithm selection for Italian language. In: 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA), IEEE, pp 1–10. https://doi.org/10.1109/DSAA.2015.7344882

Fink C, Bos N, Perrone A, Liu E, Kopecky J (2013) Twitter, public opinion, and the 2011 Nigerian Presidential Election. In: 2013 International conference on social computing, IEEE, pp 311–320. https://doi.org/10.1109/SocialCom.2013.50

Fitri VA, Andreswari R, Hasibuan MA (2019) Sentiment analysis of social media Twitter with case of anti-LGBT campaign in Indonesia using Naïve Bayes, Decision Tree, and Random Forest Algorithm. Procedia Comput Sci 161:765–772. https://doi.org/10.1016/j.procs.2019.11.181

Garg S (2021) Drug recommendation system based on sentiment analysis of drug reviews using machine learning. In: 2021 11th International conference on cloud computing, data science & engineering (confluence), IEEE, pp 175–181. https://doi.org/10.1109/Confluence51648.2021.9377188

Glorot X, Bordes A, Bengio Y (2011) Domain adaptation for large-scale sentiment classification: a deep learning approach. In: 28th International Conference on Machine Learning, International Machine Learning Society (IMLS), pp 513–520. https://doi.org/10.5555/3104482.3104547

Grootendorst M, Warmerdam VD (2021) MaartenGr/KeyBERT (Version 0.5) [Computer program]. https://doi.org/10.5281/ZENODO.5534341

Groshek J, Al-Rawi A (2013) Public sentiment and critical framing in social media content during the 2012 US Presidential Campaign. Soc Sci Comput Rev 31(5):563–576. https://doi.org/10.1177/0894439313490401

Habimana O, Li Y, Li R, Gu X, Yu G (2020) Sentiment analysis using deep learning approaches: an overview. Sci China Inf Sci 63(1):1–36. https://doi.org/10.1007/s11432-018-9941-6

Hao Y, Mu T, Hong R, Wang M, Liu X, Goulermas JY (2019) Cross-domain sentiment encoding through stochastic word embedding. IEEE Trans Knowl Data Eng 32(10):1909–1922. https://doi.org/10.1109/TKDE.2019.2913379

Hassan A, Mahmood A (2017) Efficient deep learning model for text classification based on recurrent and convolutional layers. In: 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), IEEE, pp 1108–1113. https://doi.org/10.1109/ICMLA.2017.00009

Heaton J (2018) Ian Goodfellow, Yoshua Bengio, and Aaron Courville: deep learning. Genetic Programming and Evolvable Machines 19:305–307. https://doi.org/10.1007/s10710-017-9314-z

Heikal M, Torki M, El-Makky N (2018) Sentiment analysis of Arabic tweets using deep learning. Procedia Comput Sci 142:114–122. https://doi.org/10.1016/j.procs.2018.10.466

Huang B, Ou Y, Carley KM (2018) Aspect level sentiment classification with attention-over-attention neural networks. In: International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction and Behavior Representation in Modeling and Simulation, Springer, Cham, pp 197–206. https://doi.org/10.1007/978-3-319-93372-6_22

Hussain N, Mirza HT, Rasool G, Hussain I, Kaleem M (2019) Spam review detection techniques: a systematic literature review. Appl Sci 9(5):987. https://doi.org/10.3390/app9050987

Hussein DMEDM (2018) A survey on sentiment analysis challenges. J King Saud Univ 30(4):330–338. https://doi.org/10.1016/j.jksues.2016.04.002

Ikram MT, Afzal MT (2019) Aspect based citation sentiment analysis using linguistic patterns for better comprehension of scientific knowledge. Scientometrics 119(1):73–95. https://doi.org/10.1007/s11192-019-03028-9

Injadat MN, Salo F, Nassif AB (2016) Data mining techniques in social media: a survey. Neurocomputing 214:654–670. https://doi.org/10.1016/j.neucom.2016.06.045

Isah H, Trundle P, Neagu D (2014) Social media analysis for product safety using text mining and sentiment analysis. In: 2014 14th UK Workshop on Computational Intelligence (UKCI), IEEE, pp 1–7. https://doi.org/10.1109/UKCI.2014.6930158

ISO 639-3 (2017) Registration Authority. https://iso639-3.sil.org/

Jain DK, Boyapati P, Venkatesh J, Prakash M (2022) An intelligent cognitive-inspired computing with big data analytics framework for sentiment analysis and classification. Inf Process Manag 59(1):102758. https://doi.org/10.1016/j.ipm.2021.102758

Januário BA, de Carosia AEO, da Silva AEA, Coelho GP (2022) Sentiment analysis applied to news from the Brazilian stock market. IEEE Latin Am Trans 20(3):512–518. https://doi.org/10.1109/TLA.2022.9667151

JayaLakshmi ANM, Kishore KVK (2022) Performance evaluation of DNN with other machine learning techniques in a cluster using apache spark and MLlib. J King Saud Univ 34(1):1311–1319. https://doi.org/10.1016/j.jksuci.2018.09.022

Jia X, Wang L (2022) Attention enhanced capsule network for text classification by encoding syntactic dependency trees with graph convolutional neural network. PeerJ Comput Sci 7:e831. https://doi.org/10.7717/PEERJ-CS.831

Jiang D, Luo X, Xuan J, Xu Z (2017) Sentiment computing for the news event based on the social media big data. IEEE Access 5:2373–2382. https://doi.org/10.1109/ACCESS.2016.2607218

Kaity M, Balakrishnan V (2020) Sentiment Lexicons and non-English languages: a survey. Knowl Inf Syst 62(12):4445–4480. https://doi.org/10.1007/s10115-020-01497-6

Kastrati Z, Dalipi F, Imran AS, Nuci KP, Wani MA (2021) Sentiment analysis of students’ feedback with Nlp and deep learning: a systematic mapping study. Appl Sci 11(9):3986. https://doi.org/10.3390/app11093986

Kaur H, Mangat V, Nidhi (2017) A survey of sentiment analysis techniques. In: 2017 International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud)(I-SMAC), IEEE, pp 921–925. https://doi.org/10.1109/I-SMAC.2017.8058315

Khan K, Baharudin BB, Khan A (2009) Mining opinion from text documents: a survey. In: 2009 3rd IEEE International conference on digital ecosystems and technologies, IEEE, pp 217–222. https://doi.org/10.4304/jetwi.5.4.343-353

Khasawneh RT, Wahsheh HA, Al-Kabi MN, Alsmadi IM (2013) Sentiment analysis of Arabic social media content: a comparative study. In: 8th International Conference for Internet Technology and Secured Transactions (ICITST-2013), IEEE, pp 101–106. https://doi.org/10.1109/ICITST.2013.6750171

Khattak A, Asghar MZ, Saeed A, Hameed IA, Asif Hassan S, Ahmad S (2021) A survey on sentiment analysis in Urdu: a resource-poor language. Egypt Inform J 22(1):53–74. https://doi.org/10.1016/j.eij.2020.04.003

Khatua A, Khatua A, Cambria E (2020) Predicting political sentiments of voters from Twitter in multi-party contexts. Appl Soft Comput J 97:106743. https://doi.org/10.1016/j.asoc.2020.106743

Kitchenham B (2004) Procedures for performing systematic reviews, version 1.0. Empir Softw Eng 33(2004):1–26

Kitchenham B, Charters SM (2007) Guidelines for performing systematic literature reviews in software engineering. Tech Rep 5:1–57

Korayem M, Crandall D, Abdul-Mageed M (2012) Subjectivity and sentiment analysis of Arabic: a survey. In: International conference on advanced machine learning technologies and applications, Springer, Berlin, Heidelberg, pp 128–139. https://doi.org/10.1007/978-3-642-35326-0_14

Koto F, Adriani M (2015) A comparative study on Twitter sentiment analysis: which features are good? In: International conference on applications of natural language to information systems, Springer, Cham, pp 453–457. https://doi.org/10.1007/978-3-319-19581-0_46

Krippendorff K (2018) Content analysis: an introduction to its methodology. Sage publications.

Kumar A, Garg G (2020) Systematic literature review on context-based sentiment analysis in social multimedia. Multimed Tools Appl 79(21):15349–15380. https://doi.org/10.1007/s11042-019-7346-5

Kumar A, Jaiswal A (2020) Systematic literature review of sentiment analysis on twitter using soft computing techniques. Concurr Comput 32(1):e5107. https://doi.org/10.1002/cpe.5107

Kumar A, Sebastian TM (2012) Sentiment analysis: a perspective on its past, present and future. Int J Intell Syst Appl 4(10):1–14. https://doi.org/10.5815/ijisa.2012.10.01

Kumar A, Narapareddy VT, Gupta P, Srikanth VA, Neti LB, Malapati A (2021) Adversarial and auxiliary features-aware BERT for sarcasm detection. In: 8th ACM IKDD CODS and 26th COMAD, association for computing machinery, pp 163–170. https://doi.org/10.1145/3430984.3431024

Kydros D, Argyropoulou M, Vrana V (2021) A content and sentiment analysis of Greek tweets during the pandemic. Sustainability 13(11):6150. https://doi.org/10.3390/su13116150

Lai Y, Zhang L, Han D, Zhou R, Wang G (2020) Fine-grained emotion classification of chinese microblogs based on graph convolution networks. World Wide Web 23(5):2771–2787. https://doi.org/10.1007/s11280-020-00803-0

Leiden University's Centre for Science and Technology Studies (CWTS) (2021) VOSviewer (Version 1.6.17) [Software]. https://www.vosviewer.com/

Leydesdorff L, Park HW, Wagner C (2014) International co-authorship relations in the social science citation index: is internationalization leading the network? J Assoc Inf Sci Technol 65(10):2111–2126. https://doi.org/10.48550/arXiv.1305.4242

Li D, Qian J (2016) Text sentiment analysis based on long short-term memory. In: 2016 First IEEE International Conference on Computer Communication and the Internet (ICCCI), IEEE, pp 471–475. https://doi.org/10.1109/CCI.2016.7778967

Li F, Huang M, Zhu X (2010) Sentiment analysis with global topics and local dependency. In: Proceedings of the AAAI Conference on Artificial Intelligence, Atlanta, Georgia, USA: AAAI Press, Palo Alto, California USA, pp 1371–1376. https://doi.org/10.1609/aaai.v24i1.7523

Li J, Sun M (2007) Experimental study on sentiment classification of chinese review using machine learning techniques. In: 2007 International Conference on Natural Language Processing and Knowledge Engineering, IEEE, pp 393–400. https://doi.org/10.1109/NLPKE.2007.4368061

Li N, Liang X, Li X, Wang C, Wu DD (2009) Network environment and financial risk using machine learning and sentiment analysis. Hum Ecol Risk Assess 15(2):227–252. https://doi.org/10.1080/10807030902761056

Li W, Zhu L, Shi Y, Guo K, Cambria E (2020) User reviews: sentiment analysis using Lexicon integrated two-channel CNN–LSTM family models. Appl Soft Comput J 94:106435. https://doi.org/10.1016/j.asoc.2020.106435

Li W, Shao W, Ji S, Cambria E (2022) BiERU: bidirectional emotional recurrent unit for conversational sentiment analysis. Neurocomputing 467:73–82. https://doi.org/10.1016/j.neucom.2021.09.057

Li Y, Pan Q, Yang T, Wang S, Tang J, Cambria E (2017) Learning word representations for sentiment analysis. Cogn Comput 9(6):843–851. https://doi.org/10.1007/s12559-017-9492-2

Liang B, Su H, Gui L, Cambria E, Xu R (2022) Aspect-based sentiment analysis via affective knowledge enhanced graph convolutional networks. Knowl Based Syst 235:107643. https://doi.org/10.1016/j.knosys.2021.107643

Ligthart A, Catal C, Tekinerdogan B (2021) Systematic reviews in sentiment analysis: a tertiary study. Artif Intell Rev 54(7):4997–5053. https://doi.org/10.1007/s10462-021-09973-3

Lin B, Cassee N, Serebrenik A, Bavota G, Novielli N, Lanza M (2022) Opinion mining for software development: a systematic literature review. ACM Trans Softw Eng Methodol 31(3):1–41. https://doi.org/10.1145/3490388

Lin Y, Li J, Yang L, Xu K, Lin H (2020) Sentiment analysis with comparison enhanced deep neural network. IEEE Access 8:78378–78384. https://doi.org/10.1109/ACCESS.2020.2989424

Liu F, Zheng J, Zheng L, Chen C (2020a) Combining attention-based bidirectional gated recurrent neural network and two-dimensional convolutional neural network for document-level sentiment classification. Neurocomputing 371:39–50. https://doi.org/10.1016/j.neucom.2019.09.012

Liu L, Nie X, Wang H (2012) Toward a fuzzy domain sentiment ontology tree for sentiment analysis. In: 2012 5th International congress on image and signal processing, IEEE, pp 1620–1624. https://doi.org/10.1109/CISP.2012.6469930

Liu R, Shi Y, Ji C, Jia M (2019) A survey of sentiment analysis based on transfer learning. IEEE Access 7:85401–85412. https://doi.org/10.1109/ACCESS.2019.2925059

Liu S, Lee K, Lee I (2020b) Document-level multi-topic sentiment classification of email data with BiLSTM and data augmentation. Knowl Based Syst 197:105918. https://doi.org/10.1016/j.knosys.2020.105918

Liu SM, Chen JH (2015) A multi-label classification based approach for sentiment classification. Expert Syst Appl 42(3):1083–1093. https://doi.org/10.1016/j.eswa.2014.08.036

Liu X, Zeng D, Li J, Wang F-Y, Zuo W (2009) Sentiment analysis of Chinese documents: from sentence to document level. J Am Soc Inform Sci Technol 60(12):2474–2487. https://doi.org/10.1002/asi.21206

Lo YW, Potdar V (2009) A review of opinion mining and sentiment classification framework in social networks. In: 2009 3rd IEEE International conference on digital ecosystems and technologies, IEEE, pp 396–401. https://doi.org/10.1109/DEST.2009.5276705

Lulu L, Elnagar A (2018) Automatic Arabic dialect classification using deep learning models. Procedia Comput Sci 142:262–269. https://doi.org/10.1016/j.procs.2018.10.489

Ma Y, Peng H, Cambria E (2018) Targeted aspect-based sentiment analysis via embedding commonsense knowledge into an attentive LSTM. In: 32nd AAAI conference on artificial intelligence, New Orleans, Louisiana, USA: AAAI Press, Palo Alto, California USA, pp 5876–5883. https://doi.org/10.1609/aaai.v32i1.12048

Malandri L, Porcel C, Xing F, Serrano-Guerrero J, Cambria E (2022) Soft computing for recommender systems and sentiment analysis. Appl Soft Comput. https://doi.org/10.1016/j.asoc.2021.108246

Mäntylä MV, Graziotin D, Kuutila M (2018) The evolution of sentiment analysis-a review of research topics, venues and top cited papers. Comput Sci Rev 27:16–32. https://doi.org/10.1016/j.cosrev.2017.10.002

Mao Y, Zhang Y, Jiao L, Zhang H (2022) Document-level sentiment analysis using attention-based bi-directional long short-term memory network and two-dimensional convolutional neural network. Electronics 11(12):1906. https://doi.org/10.3390/electronics11121906

Maqsood H, Mehmood I, Maqsood M, Yasir M, Afzal S, Aadil F et al (2020) A local and global event sentiment based efficient stock exchange forecasting using deep learning. Int J Inf Manag 50:432–451. https://doi.org/10.1016/j.ijinfomgt.2019.07.011

Martinez-Camara E, Martin-Valdivia MT, Urena-Lopez LA (2011) Opinion classification techniques applied to a Spanish Corpus. In: International conference on application of natural language to information systems, Springer, Berlin, Heidelberg, pp 169–176. https://doi.org/10.1007/978-3-642-22327-3_17

Martinez-Garcia A, Badia T, Barnes J (2021) Evaluating morphological typology in zero-shot cross-lingual transfer. In: Proceedings of the 59th annual meeting of the association for computational linguistics and the 11th international joint conference on natural language processing, association for computational linguistics, pp 3136–3153. https://doi.org/10.18653/v1/2021.acl-long.244

Medhat W, Hassan A, Korashy H (2014) Sentiment analysis algorithms and applications: a survey. Ain Shams Eng J 5(4):1093–1113. https://doi.org/10.1016/j.asej.2014.04.011

Momtazi S (2012) Fine-grained German sentiment analysis on social media. In: Proceedings of the 8th International conference on language resources and evaluation (LREC’12), European Language Resources Association (ELRA), pp 1215–1220. http://www.lrec-conf.org/proceedings/lrec2012/pdf/999_Paper.pdf

Myslin M, Zhu SH, Chapman W, Conway M (2013) Using Twitter to examine smoking behavior and perceptions of emerging tobacco products. J Med Int Res 15(8):174. https://doi.org/10.2196/jmir.2534

Nair RR, Mathew J, Muraleedharan V, Deepa Kanmani S (2019) Study of machine learning techniques for sentiment analysis. In: 2019 3rd International Conference on Computing Methodologies and Communication (ICCMC), IEEE, pp 978–984. https://doi.org/10.1109/ICCMC.2019.8819763

Nassif AB, Elnagar A, Shahin I, Henno S (2021) Deep learning for Arabic subjective sentiment analysis: challenges and research opportunities. Appl Soft Comput 98:106836. https://doi.org/10.1016/j.asoc.2020.106836

Nassirtoussi AK, Aghabozorgi S, Wah TY, Ngo DCL (2014) Text mining for market prediction: a systematic review. Expert Syst Appl 41(16):7653–7670. https://doi.org/10.1016/j.eswa.2014.06.009

Nejat B, Carenini G, Ng R (2017) Exploring joint neural model for sentence level discourse parsing and sentiment analysis. In: Proceedings of the 18th annual sigdial meeting on discourse and dialogue, association for computational linguistics, pp 289–298. https://doi.org/10.18653/v1/w17-5535

Nhlabano VV, Lutu PEN (2018) Impact of text pre-processing on the performance of sentiment analysis models for social media data. In: 2018 International Conference on Advances in Big Data, Computing and Data Communication Systems (IcABCD), IEEE, pp 1–6. https://doi.org/10.1109/ICABCD.2018.8465135

Nicholls C, Song F (2010) Comparison of feature selection methods for sentiment analysis. In: Canadian conference on artificial intelligence, Springer, Berlin, Heidelberg, pp 286–289. https://doi.org/10.1007/978-3-319-96292-4_21

Nielsen FA (2011) A New ANEW: Evaluation of a Word List for Sentiment Analysis in Microblogs. In: Proceedings of the ESWC2011 workshop on “Making Sense of Microposts”: big things come in small packages, Heraklion, Crete, Greece: CEUR-WS, Aachen, pp 93–98. https://doi.org/10.48550/arXiv.1103.2903

Obiedat R, Al-Darras D, Alzaghoul E, Harfoushi O (2021) Arabic aspect-based sentiment analysis: a systematic literature review. IEEE Access 9:152628–152645. https://doi.org/10.1109/ACCESS.2021.3127140

Ombabi AH, Ouarda W, Alimi AM (2020) Deep learning CNN–LSTM framework for Arabic sentiment analysis using textual information shared in social networks. Soc Netw Anal Min 10(1):1–13. https://doi.org/10.1007/s13278-020-00668-1

Oueslati O, Cambria E, Ben HM, Ounelli H (2020) A review of sentiment analysis research in Arabic language. Futur Gener Comput Syst 112:408–430. https://doi.org/10.1016/j.future.2020.05.034

Ouyang X, Zhou P, Li CH, Liu L (2015) Sentiment Analysis Using Convolutional Neural Network. In: 2015 IEEE International conference on computer and information technology; ubiquitous computing and communications; dependable, autonomic and secure computing; pervasive intelligence and computing, IEEE, pp 2359–2364. https://doi.org/10.1109/CIT/IUCC/DASC/PICOM.2015.349

Pecore S, Villaneau J (2019) Complex and Precise Movie and Book Annotations in French Language for Aspect Based Sentiment Analysis. In: LREC 2018, 11th International conference on language resources and evaluation, European Language Resources Association (ELRA), pp 2647–2652. https://aclanthology.org/L18-1419

Peng H, Cambria E, Hussain A (2017) A review of sentiment analysis research in Chinese language. Cogn Comput 9(4):423–435. https://doi.org/10.1007/s12559-017-9470-8

Peng H, Ma Y, Li Y, Cambria E (2018) Learning multi-grained aspect target sequence for Chinese sentiment analysis. Knowl Based Syst 148:167–176. https://doi.org/10.1016/j.knosys.2018.02.034

Pereira DA (2021) A survey of sentiment analysis in the Portuguese language. Artif Intell Rev 54(2):1087–1115. https://doi.org/10.1007/s10462-020-09870-1

Perianes-Rodriguez A, Waltman L, van Eck NJ (2016) Constructing bibliometric networks: a comparison between full and fractional counting. J Informetr 10(4):1178–1195. https://doi.org/10.1016/j.joi.2016.10.006

Persson O (2017) BibExcel [Software]. Available from https://homepage.univie.ac.at/juan.gorraiz/bibexcel/

Persson O, Danell R, Schneider JW (2009) How to Use Bibexcel for Various Types of Bibliometric Analysis. In: Celebrating scholarly communication studies: a festschrift for Olle Persson at his 60th birthday, ed. J. Schneider F. Åström, R. Danell, B. Larsen. Leuven, Belgium: International Society for Scientometrics and Informetrics, pp 9–24

Picasso A, Merello S, Ma Y, Oneto L, Cambria E (2019) Technical analysis and sentiment embeddings for market trend prediction. Expert Syst Appl 135:60–70. https://doi.org/10.1016/j.eswa.2019.06.014

Piryani R, Madhavi D, Singh VK (2017) Analytical mapping of opinion mining and sentiment analysis research during 2000–2015. Inf Process Manag 53(1):122–150. https://doi.org/10.1016/j.ipm.2016.07.001

Piryani R, Piryani B, Singh VK, Pinto D (2020) Sentiment analysis in Nepali: exploring machine learning and lexicon-based approaches. J Intell Fuzzy Syst 39(2):2201–2212. https://doi.org/10.3233/JIFS-179884

Plaza-del-Arco FM, Martín-Valdivia MT, Ureña-López LA, Mitkov R (2020) Improved emotion recognition in Spanish social media through incorporation of lexical knowledge. Futur Gener Comput Syst 110:1000–1008. https://doi.org/10.1016/j.future.2019.09.034

Poria S, Cambria E, Gelbukh A (2016) Aspect extraction for opinion mining with a deep convolutional neural network. Knowl Based Syst 108:42–49. https://doi.org/10.1016/j.knosys.2016.06.009

Prabha MI, Srikanth GU (2019) Survey of Sentiment Analysis Using Deep Learning Techniques. In: 2019 1st International Conference on Innovations in Information and Communication Technology (ICIICT), IEEE, pp 1–9. https://doi.org/10.1109/ICIICT1.2019.8741438

Prabhat A, Khullar V (2017) Sentiment Classification on Big Data Using Naïve Bayes and Logistic Regression. In: 2017 International Conference on Computer Communication and Informatics (ICCCI), IEEE, pp 1–5. https://doi.org/10.1109/ICCCI.2017.8117734

Preethi PG, Uma V, Kumar A (2015) Temporal sentiment analysis and causal rules extraction from Tweets for event prediction. Procedia Comput Sci 48:84–89. https://doi.org/10.1016/j.procs.2015.04.154

Qasem M, Thulasiram R, Thulasiram P (2015) Twitter Sentiment Classification Using Machine Learning Techniques for Stock Markets. In: 2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI), IEEE, pp 834–840. https://doi.org/10.1109/ICACCI.2015.7275714

Qazi A, Fayaz H, Wadi A, Raj RG, Rahim NA, Khan WA (2015) The artificial neural network for solar radiation prediction and designing solar systems: a systematic literature review. J Clean Prod 104:1–12. https://doi.org/10.1016/j.jclepro.2015.04.041

Qazi A, Raj RG, Hardaker G, Standing C (2017) A systematic literature review on opinion types and sentiment analysis techniques: tasks and challenges. Internet Res 27(3):608–630. https://doi.org/10.1108/IntR-04-2016-0086

Raghuvanshi N, Patil JM (2016) A Brief Review on Sentiment Analysis. In: 2016 International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT), IEEE, pp 2827–2831. https://doi.org/10.1109/ICEEOT.2016.7755213

Rambocas M, Pacheco BG (2018) Online sentiment analysis in marketing research: a review. J Res Interact Mark 12(2):146–163. https://doi.org/10.1108/JRIM-05-2017-0030

Rao G, Gu X, Feng Z, Cong Q, Zhang L (2021) A Novel Joint Model with Second-Order Features and Matching Attention for Aspect-Based Sentiment Analysis. In: 2021 International Joint Conference on Neural Networks (IJCNN), IEEE, pp 1–8. https://doi.org/10.1109/IJCNN52387.2021.9534321

Ravi K, Ravi V (2015) A survey on opinion mining and sentiment analysis: tasks, approaches and applications. Knowl Based Syst 89:14–46. https://doi.org/10.1016/j.knosys.2015.06.015

Rotta R, Noack A (2011) Multilevel local search algorithms for modularity clustering. ACM J Exp Algorithmics 16(2):1–27. https://doi.org/10.1145/1963190.1970376

Sadamitsu K, Sekine S, Yamamoto M (2008) Sentiment Analysis Based on Probabilistic Models Using Inter-Sentence Information. In: Proceedings of the sixth international conference on language resources and evaluation (LREC’08), European Language Resources Association (ELRA), pp 2892–2896. http://www.lrec-conf.org/proceedings/lrec2008/pdf/736_paper.pdf

Salur MU, Aydin I (2020) A novel hybrid deep learning model for sentiment classification. IEEE Access 8:58080–58093. https://doi.org/10.1109/ACCESS.2020.2982538

Sánchez-Rada JF, Iglesias CA (2019) Social context in sentiment analysis: formal definition, overview of current trends and framework for comparison. Inf Fusion 52:344–356. https://doi.org/10.1016/j.inffus.2019.05.003

Santos R, Costa AA, Silvestre JD, Pyl L (2019) Informetric analysis and review of literature on the role of BIM in sustainable construction. Autom Constr 103:221–234. https://doi.org/10.1016/j.autcon.2019.02.022

Sari IC, Ruldeviyani Y (2020) Sentiment Analysis of the Covid-19 Virus Infection in Indonesian Public Transportation on Twitter Data: A Case Study of Commuter Line Passengers. In: 2020 International Workshop on Big Data and Information Security (IWBIS), IEEE, pp 23–28. https://doi.org/10.1109/IWBIS50925.2020.9255531

Sarsam SM, Al-Samarraie H, Alzahrani AI, Wright B (2020) Sarcasm detection using machine learning algorithms in Twitter: a systematic review. Int J Mark Res 62(5):578–598. https://doi.org/10.1177/1470785320921779

Sayed AA, Elgeldawi E, Zaki AM, Galal AR (2020) Sentiment Analysis for Arabic Reviews Using Machine Learning Classification Algorithms. In: 2020 International Conference on Innovative Trends in Communication and Computer Engineering (ITCE), IEEE, pp 56–63. https://doi.org/10.1109/ITCE48509.2020.9047822

Schouten K, Frasincar F (2015) Survey on aspect-level sentiment analysis. IEEE Trans Knowl Data Eng 28(3):813–830. https://doi.org/10.1109/TKDE.2015.2485209

Schuller B, Mousa AED, Vryniotis V (2015) Sentiment analysis and opinion mining: on optimal parameters and performances. Wiley Interdiscip Rev 5(5):255–263. https://doi.org/10.1002/widm.1159

Serrano-Guerrero J, Romero FP, Olivas JA (2021) Fuzzy logic applied to opinion mining: a review. Knowl Based Syst 222:107018. https://doi.org/10.1016/j.knosys.2021.107018

Sharma S, Jain A (2020) Role of sentiment analysis in social media security and analytics. Wiley Interdiscip Rev 10(5):e1366. https://doi.org/10.1002/widm.1366

Shirsat VS, Jagdale RS, Deshmukh SN (2018) Document Level Sentiment Analysis from News Articles. In: 2017 International Conference on Computing, Communication, Control and Automation (ICCUBEA), IEEE, pp 1–4. https://doi.org/10.1109/ICCUBEA.2017.8463638

Shofiya C, Abidi S (2021) Sentiment analysis on Covid-19-related social distancing in Canada using Twitter data. Int J Environ Res Public Health 18(11):5993. https://doi.org/10.3390/ijerph18115993

Singh RK, Sachan MK, Patel RB (2021) 360 Degree view of cross-domain opinion classification: a survey. Artif Intell Rev 54(2):1385–1506. https://doi.org/10.1007/s10462-020-09884-9

Singh T, Kumari M (2016) Role of text pre-processing in Twitter sentiment analysis. Procedia Comput Sci 89:549–554. https://doi.org/10.1016/j.procs.2016.06.095

Smailović J, Grčar M, Lavrač N, Žnidaršič M (2014) Stream-based active learning for sentiment analysis in the financial domain. Inf Sci 285(1):181–203. https://doi.org/10.1016/j.ins.2014.04.034

Smetanin S (2020) The applications of sentiment analysis for Russian language texts: current challenges and future perspectives. IEEE Access 8:110693–110719. https://doi.org/10.1109/ACCESS.2020.3002215

Stemler S (2000) An overview of content analysis. Pract Assess Res Eval 7(1):1–16. https://doi.org/10.1362/146934703771910080

Sutoyo E, Rifai AP, Risnumawan A, Saputra M (2022) A comparison of text weighting schemes on sentiment analysis of government policies: a case study of replacement of national examinations. Multimed Tools Appl 81(5):6413–6431. https://doi.org/10.1007/s11042-022-11900-9

Syed AZ, Aslam M, Martinez-Enriquez AM (2010) Lexicon Based Sentiment Analysis of Urdu Text Using SentiUnits. In: Mexican international conference on artificial intelligence, Springer, Berlin, Heidelberg, pp 32–43. https://doi.org/10.1007/978-3-642-16761-4_4

Taboada M (2016) Sentiment analysis: an overview from linguistics. Annu Rev Linguist 2:325–347. https://doi.org/10.1146/annurev-linguistics-011415-040518

Tai KS, Socher R, Manning CD (2015) Improved Semantic Representations from Tree-Structured Long Short-Term Memory Networks. In: Proceedings of the 53rd Annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing, association for computational linguistics, pp 1556–1566. https://doi.org/10.3115/v1/p15-1150

Tammina S (2020) A Hybrid Learning Approach for Sentiment Classification in Telugu Language. In: 2020 International conference on Artificial Intelligence and Signal Processing (AISP), IEEE, pp 1–6. https://doi.org/10.1109/AISP48273.2020.9073109

Tan S, Cheng X, Wang Y, Xu H (2009) Adapting Naive Bayes to Domain Adaptation for Sentiment Analysis. In: European Conference on Information Retrieval, Springer, Berlin, Heidelberg, pp 337–349. https://doi.org/10.1007/978-3-642-00958-7_31

Tan X, Cai Y, Xu J, Leung H-F, Chen W, Li Q (2020) Improving aspect-based sentiment analysis via aligning aspect embedding. Neurocomputing 383:336–347. https://doi.org/10.1016/j.neucom.2019.12.035

Tembhurne JV, Diwan T (2021) Sentiment analysis in textual, visual and multimodal inputs using recurrent neural networks. Multimed Tools Appl 80(5):6871–6910. https://doi.org/10.1007/s11042-020-10037-x

Thakur RK, Deshpande MV (2019) Kernel optimized-support vector machine and mapreduce framework for sentiment classification of train reviews. Int J Uncertain Fuzziness Knowl Based Syst 27(6):1025–1050. https://doi.org/10.1142/S0218488519500454

Thelwall M, Buckley K, Paltoglou G (2012) Sentiment strength detection for the social web. J Am Soc Inform Sci Technol 63(1):163–173. https://doi.org/10.1002/asi.21662

Thet TT, Na JC, Khoo CSG (2010) Aspect-based sentiment analysis of movie reviews on discussion boards. J Inf Sci 36(6):823–848. https://doi.org/10.1177/0165551510388123

Trilla A, Alías F (2009) Sentiment Classification in English from Sentence-Level Annotations of Emotions Regarding Models of Affect. In: 10th Annual Conference of the International Speech Communication Association, International Speech Communication Association (ISCA), pp 516–519. https://doi.org/10.21437/interspeech.2009-189

Trisna KW, Jie HJ (2022) Deep learning approach for aspect-based sentiment classification: a comparative review. Appl Artif Intell. https://doi.org/10.1080/08839514.2021.2014186

Valverde-Albacete FJ, Carrillo-de-Albornoz J, Peláez-Moreno C (2013) A Proposal for New Evaluation Metrics and Result Visualization Technique for Sentiment Analysis Tasks. In: International conference of the cross-language evaluation forum for European languages, Springer, Berlin, Heidelberg, pp 41–52. https://doi.org/10.1007/978-3-642-40802-1_5

Van Eck NJ, Waltman L (2010) Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics 84(2):523–538. https://doi.org/10.1007/s11192-009-0146-3

Verma S (2022) Sentiment analysis of public services for smart society: literature review and future research directions. Gov Inf Quart 39(3):101708. https://doi.org/10.1016/j.giq.2022.101708

Waila P, Marisha S, Singh VK, Singh MK (2012) Evaluating Machine Learning and Unsupervised Semantic Orientation Approaches for Sentiment Analysis of Textual Reviews. In: 2012 IEEE International conference on computational intelligence and computing research, IEEE, pp 1–6. https://doi.org/10.1109/ICCIC.2012.6510235

Waltman L, Van Eck NJ (2013) A smart local moving algorithm for large-scale modularity-based community detection. Eur Phys J B 86(11):1–33. https://doi.org/10.1140/epjb/e2013-40829-0

Waltman L, Van Eck NJ, Noyons ECM (2010) A unified approach to mapping and clustering of bibliometric networks. J Informetr 4(4):629–635. https://doi.org/10.1016/j.joi.2010.07.002

Wang C, Yang X, Ding L (2021) Deep learning sentiment classification based on weak tagging information. IEEE Access 9:66509–66518. https://doi.org/10.1109/ACCESS.2021.3077059

Wang L, Wan Y (2011) Sentiment Classification of Documents Based on Latent Semantic Analysis. In: International conference on computer education, simulation and modeling, Springer, Berlin, Heidelberg, pp 356–361. https://doi.org/10.1007/978-3-642-21802-6_57

Wang T, Lu K, Chow KP, Zhu Q (2020a) COVID-19 sensing: negative sentiment analysis on social media in China via BERT model. IEEE Access 8:138162–138169. https://doi.org/10.1109/ACCESS.2020.3012595

Wang Z, Chong CS, Lan L, Yang Y, Ho S-B, Tong JC (2016) Fine-Grained Sentiment Analysis of Social Media with Emotion Sensing. In: 2016 Future Technologies Conference (FTC), IEEE, pp 1361–1364. https://doi.org/10.1109/FTC.2016.7821783

Wang Z, Ho S-B, Cambria E (2020b) A review of emotion sensing: categorization models and algorithms. Multimed Tools Appl 79(47):35553–35582. https://doi.org/10.1007/s11042-019-08328-z

Wang Z, Ho S-B, Cambria E (2020c) Multi-level fine-scaled sentiment sensing with ambivalence handling. Int J Uncertain Fuzziness Knowl-Based Syst 28(4):683–697. https://doi.org/10.1142/S0218488520500294

Wang Z, Lin Z (2020) Optimal feature selection for learning-based algorithms for sentiment classification. Cogn Comput 12(1):238–248. https://doi.org/10.1007/s12559-019-09669-5

Wang Z, Tong VJC, Chan D (2014) Issues of Social Data Analytics with a New Method for Sentiment Analysis of Social Media Data. In: 2014 IEEE 6th International conference on cloud computing technology and science, IEEE, pp 899–904. https://doi.org/10.1109/CloudCom.2014.40

Wang ZY, Li G, Li CY, Li A (2012) Research on the semantic-based co-word analysis. Scientometrics 90(3):855–875. https://doi.org/10.1007/s11192-011-0563-y

Wankhade M, Rao ACS, Kulkarni C (2022) A survey on sentiment analysis methods, applications, and challenges. Artif Intell Rev 55:5731–5780. https://doi.org/10.1007/s10462-022-10144-1

Xing FZ, Cambria E, Welsch RE (2018) Natural language based financial forecasting: a survey. Artif Intell Rev 50(1):49–73. https://doi.org/10.1007/s10462-017-9588-9

Xing FZ, Pallucchini F, Cambria E (2019) Cognitive-inspired domain adaptation of sentiment lexicons. Inf Process Manag 56(3):554–564. https://doi.org/10.1016/j.ipm.2018.11.002

Xiong Z, Qin K, Yang H, Luo G (2021) Learning Chinese word representation better by cascade morphological N-Gram. Neural Comput Appl 33(8):3757–3768. https://doi.org/10.1007/s00521-020-05198-7

Yang B, Shao B, Wu L, Lin X (2022) Multimodal sentiment analysis with unidirectional modality translation. Neurocomputing 467:130–137. https://doi.org/10.1016/j.neucom.2021.09.041

Yang L, Li Y, Wang J, Sherratt RS (2020a) Sentiment analysis for E-commerce product reviews in Chinese based on sentiment lexicon and deep learning. IEEE Access 8:23522–23530. https://doi.org/10.1109/ACCESS.2020.2969854

Yang M, Qu Q, Shen Y, Lei K, Zhu J (2020b) Cross-domain aspect/sentiment-aware abstractive review summarization by combining topic modeling and deep reinforcement learning. Neural Comput Appl 32(11):6421–6433. https://doi.org/10.1007/s00521-018-3825-2

Yi J, Niblack W (2005) Sentiment Mining in WebFountain. In: 21st International Conference on Data Engineering (ICDE’05), IEEE, pp 1073–1083. https://doi.org/10.1109/ICDE.2005.132

Yin H, Yang S, Li J (2020) Detecting Topic and Sentiment Dynamics Due to COVID-19 Pandemic Using Social Media. In: International conference on advanced data mining and applications, Springer, Cham, pp 610–623. https://doi.org/10.1007/978-3-030-65390-3_46

You L, Li Y, Wang Y, Zhang J, Yang Y (2016) A deep learning-based RNNs model for automatic security audit of short messages. In: 2016 16th International Symposium on Communications and Information Technologies (ISCIT), IEEE, pp 225–229. https://doi.org/10.1109/ISCIT.2016.7751626

You T, Yoon J, Kwon O-H, Jung W-S (2021) Tracing the evolution of physics with a keyword co-occurrence network. J Korean Phys Soc 78(3):236–243. https://doi.org/10.1007/s40042-020-00051-5

Yu J, Jiang J, Xia R (2019) Entity-sensitive attention and fusion network for entity-level multimodal sentiment classification. IEEE/ACM Trans Audio Speech Lang Process 28:429–439. https://doi.org/10.1109/TASLP.2019.2957872

Yuan JH, Wu Y, Lu X, Zhao YY, Qin B, Liu T (2020) Recent advances in deep learning based sentiment analysis. Sci China Technol Sci 63(10):1947–1970. https://doi.org/10.1007/s11431-020-1634-3

Yue L, Chen W, Li X, Zuo W, Yin M (2019) A survey of sentiment analysis in social media. Knowl Inf Syst 60(2):617–663. https://doi.org/10.1007/s10115-018-1236-4

Yurtalan G, Koyuncu M, Turhan Ç (2019) A polarity calculation approach for lexicon-based Turkish sentiment analysis. Turk J Electr Eng Comput Sci 27(2):1325–1339. https://doi.org/10.3906/elk-1803-92

Zhang L, Wang S, Liu B (2018) Deep learning for sentiment analysis: a survey. Wiley Interdiscip Rev 8(4):e1253. https://doi.org/10.1002/widm.1253

Zhang Y, Du J, Ma X, Wen H, Fortino G (2021) Aspect-based sentiment analysis for user reviews. Cogn Comput 13(5):1114–1127. https://doi.org/10.1007/s12559-021-09855-4

Zhang Y, Zhang Z, Miao D, Wang J (2019) Three-way enhanced convolutional neural networks for sentence-level sentiment classification. Inf Sci 477:55–64. https://doi.org/10.1016/j.ins.2018.10.030

Zhao N, Gao H, Wen X, Li H (2021) Combination of convolutional neural network and gated recurrent unit for aspect-based sentiment analysis. IEEE Access 9:15561–15569. https://doi.org/10.1109/ACCESS.2021.3052937

Zhou J, Ye J (2020) Sentiment analysis in education research: a review of journal publications. Interact Learn Environ. https://doi.org/10.1080/10494820.2020.1826985

Zucco C, Calabrese B, Agapito G, Guzzi PH, Cannataro M (2020) Sentiment analysis for mining texts and social networks data: methods and tools. Wiley Interdiscip Rev 10(1):e1333. https://doi.org/10.1002/widm.1333

Zunic A, Corcoran P, Spasic I (2020) Sentiment analysis in health and well-being: systematic review. JMIR Med Inform 8(1):e16023. https://doi.org/10.2196/16023

Zuo E, Zhao H, Chen B, Chen Q (2020) Context-specific heterogeneous graph convolutional network for implicit sentiment analysis. IEEE Access 8:37967–37975. https://doi.org/10.1109/ACCESS.2020.2975244

Acknowledgements

The authors would like to thank the China Scholarship Council (CSC No. 202106850069) for its support for the visiting study.

This work has not received any funding.

Author information

Authors and affiliations.

Institute of High Performance Computing, A*STAR, 1 Fusionopolis Way, Singapore, 138632, Singapore

Jingfeng Cui & Seng-Beng Ho

School of Information Management, Nanjing Agricultural University, 1 Weigang, Nanjing, 210095, China

Jingfeng Cui

School of Computing and Information Systems, Singapore Management University, 80 Stamford Rd, Singapore, 178902, Singapore

Zhaoxia Wang

School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore, 639798, Singapore

Erik Cambria

Corresponding author

Correspondence to Zhaoxia Wang.

Ethics declarations

Conflict of interest.

The authors declare that they have no conflict of interest or competing interest in this article.

Research involving human participants or animals

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cui, J., Wang, Z., Ho, SB. et al. Survey on sentiment analysis: evolution of research methods and topics. Artif Intell Rev 56, 8469–8510 (2023). https://doi.org/10.1007/s10462-022-10386-z

Accepted: 29 December 2022

Published: 06 January 2023

Issue Date: August 2023

DOI: https://doi.org/10.1007/s10462-022-10386-z

Sentiment analysis
Keyword co-occurrence analysis
Evolution analysis
Research methods
Research topics

A survey of sentiment analysis: approaches, datasets, and future research.

1. Introduction

A comprehensive overview of the state-of-the-art studies on sentiment analysis, which are categorized as conventional machine learning, deep learning, and ensemble learning, with a focus on the preprocessing techniques, feature extraction methods, classification methods, and datasets used, as well as the experimental results.
An in-depth discussion of the commonly used sentiment analysis datasets and their challenges, as well as a discussion about the limitations of the current works and the potential for future research in this field.

2. Sentiment Analysis Algorithms

2.1. Machine Learning Approach
2.2. Deep Learning Approach

3. Ensemble Learning Approach

4. Sentiment Analysis Datasets
4.1. Internet Movie Database (IMDb)
4.2. Twitter US Airline Sentiment
4.3. Sentiment140
4.4. SemEval-2017 Task 4

5. Limitations and Future Research Prospects

Poorly Structured and Sarcastic Texts: Many sentiment analysis methods rely on structured and grammatically correct text, which can lead to inaccuracies in analyzing informal and poorly structured texts, such as social media posts, slang, and sarcastic comments. This is because the sentiments expressed in these types of texts can be subtle and require contextual understanding beyond surface-level analysis.
Coarse-Grained Sentiment Analysis: Although positive, negative, and neutral classes are commonly used in sentiment analysis, they do not capture the full range of emotions and intensities that a person can express. Finer-grained schemes, whether graded intensities (strongly positive through strongly negative) or discrete emotion categories such as happy, sad, angry, or surprised, can provide more nuanced insight into the sentiment expressed in a text.
Lack of Cultural Awareness: Sentiment analysis models trained on data from a specific language or culture may not accurately capture the sentiments expressed in texts from other languages or cultures. This is because the use of language, idioms, and expressions can vary widely across cultures, and a sentiment analysis model trained on one culture may not be effective in analyzing sentiment in another culture.
Dependence on Annotated Data: Sentiment analysis algorithms often rely on annotated data, where humans manually label the sentiment of a text. However, collecting and labeling a large dataset can be time-consuming and resource-intensive, which can limit the scope of analysis to a specific domain or language.
Shortcomings of Word Embeddings: Word embeddings, which are a popular technique used in deep learning-based sentiment analysis, can be limited in capturing the complex relationships between words and their meanings in a text. This can result in a model that does not accurately represent the sentiment expressed in a text, leading to inaccuracies in analysis.
Bias in Training Data: The data used to train a sentiment analysis model can be biased, which harms the model’s accuracy and its generalization to new data. For example, a dataset composed predominantly of texts from one gender or race can yield a model that is biased toward that group and makes inaccurate predictions for texts from other groups.
Fine-Grained Sentiment Analysis: The current sentiment analysis models mainly classify the sentiment into three coarse classes: positive, negative, and neutral. However, there is a need to extend this to a fine-grained sentiment analysis, which consists of different emotional intensities, such as strongly positive, positive, neutral, negative, and strongly negative. Researchers can explore various deep learning architectures and techniques to perform fine-grained sentiment analysis. One such approach is to use hierarchical attention networks that can capture the sentiment expressed in different parts of a text at different levels of granularity.
Sentiment Quantification: Sentiment quantification is an important application of sentiment analysis. Rather than labeling individual texts, it estimates the distribution of polarity classes within each topic to aid strategic decision making. Researchers can develop more advanced models that accurately capture the sentiment distribution across different topics. One way to achieve this is to use topic modeling techniques to identify the underlying topics in a corpus and then apply sentiment analysis to compute the sentiment distribution for each topic.
Handling Ambiguous and Sarcastic Texts: Sentiment analysis models face challenges in accurately detecting sentiment in ambiguous and sarcastic texts. Researchers can explore the use of reinforcement learning techniques to train models that can handle ambiguous and sarcastic texts. This involves developing models that can learn from feedback and adapt their predictions accordingly.
Cross-lingual Sentiment Analysis: Currently, sentiment analysis models are primarily trained on English text. However, there is a growing need for sentiment analysis models that can work across multiple languages. Cross-lingual sentiment analysis would help to better understand the sentiment expressed in different languages, making sentiment analysis accessible to a larger audience. Researchers can explore the use of transfer learning techniques to develop sentiment analysis models that can work across multiple languages. One approach is to pretrain models on large multilingual corpora and then fine-tune them for sentiment analysis tasks in specific languages.
Sentiment Analysis in Social Media: Social media platforms generate huge amounts of data every day, making it difficult to manually process the data. Researchers can explore the use of domain-specific embeddings that are trained on social media text to improve the accuracy of sentiment analysis models. They can also develop models that can handle noisy or short social media text by incorporating contextual information and leveraging user interactions.
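
The sentiment quantification idea in the list above (assign each text a topic, then compute the polarity distribution per topic) can be sketched in a few lines. This is only an illustration under stated assumptions: the topic and sentiment labels are hypothetical outputs of an upstream topic model and classifier, and the function name `polarity_distribution` is not taken from the survey.

```python
from collections import Counter, defaultdict


def polarity_distribution(labelled_docs):
    """Map each topic to the share of documents in each polarity class.

    `labelled_docs` is an iterable of (topic, polarity) pairs.
    """
    counts = defaultdict(Counter)
    for topic, polarity in labelled_docs:
        counts[topic][polarity] += 1
    return {
        topic: {label: n / sum(c.values()) for label, n in c.items()}
        for topic, c in counts.items()
    }


# Hypothetical (topic, polarity) pairs; in practice these would come from
# a topic model and a sentiment classifier run over the same corpus.
docs = [
    ("battery", "positive"),
    ("battery", "negative"),
    ("battery", "negative"),
    ("battery", "negative"),
    ("screen", "positive"),
    ("screen", "neutral"),
]

dist = polarity_distribution(docs)
for topic, shares in dist.items():
    print(topic, shares)
```

The output is a distribution per topic rather than a label per text, which is the quantity the quantification subtasks ask for.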

6. Conclusions

Ligthart, A.; Catal, C.; Tekinerdogan, B. Systematic reviews in sentiment analysis: A tertiary study. Artif. Intell. Rev. 2021, 54, 4997–5053.
Dang, N.C.; Moreno-García, M.N.; De la Prieta, F. Sentiment analysis based on deep learning: A comparative study. Electronics 2020, 9, 483.
Chakriswaran, P.; Vincent, D.R.; Srinivasan, K.; Sharma, V.; Chang, C.Y.; Reina, D.G. Emotion AI-driven sentiment analysis: A survey, future research directions, and open issues. Appl. Sci. 2019, 9, 5462.
Jung, Y.G.; Kim, K.T.; Lee, B.; Youn, H.Y. Enhanced Naive Bayes classifier for real-time sentiment analysis with SparkR. In Proceedings of the 2016 IEEE International Conference on Information and Communication Technology Convergence (ICTC), Jeju Island, Republic of Korea, 19–21 October 2016; pp. 141–146.
Athindran, N.S.; Manikandaraj, S.; Kamaleshwar, R. Comparative analysis of customer sentiments on competing brands using hybrid model approach. In Proceedings of the 2018 IEEE 3rd International Conference on Inventive Computation Technologies (ICICT), Coimbatore, India, 15–16 November 2018; pp. 348–353.
Vanaja, S.; Belwal, M. Aspect-level sentiment analysis on e-commerce data. In Proceedings of the 2018 IEEE International Conference on Inventive Research in Computing Applications (ICIRCA), Coimbatore, India, 11–12 July 2018; pp. 1275–1279.
Iqbal, N.; Chowdhury, A.M.; Ahsan, T. Enhancing the performance of sentiment analysis by using different feature combinations. In Proceedings of the 2018 IEEE International Conference on Computer, Communication, Chemical, Material and Electronic Engineering (IC4ME2), Rajshahi, Bangladesh, 8–9 February 2018; pp. 1–4.
Rathi, M.; Malik, A.; Varshney, D.; Sharma, R.; Mendiratta, S. Sentiment analysis of tweets using machine learning approach. In Proceedings of the 2018 IEEE Eleventh International Conference on Contemporary Computing (IC3), Noida, India, 2–4 August 2018; pp. 1–3.
Tariyal, A.; Goyal, S.; Tantububay, N. Sentiment Analysis of Tweets Using Various Machine Learning Techniques. In Proceedings of the 2018 IEEE International Conference on Advanced Computation and Telecommunication (ICACAT), Bhopal, India, 28–29 December 2018; pp. 1–5.
Hemakala, T.; Santhoshkumar, S. Advanced classification method of twitter data using sentiment analysis for airline service. Int. J. Comput. Sci. Eng. 2018, 6, 331–335.
Rahat, A.M.; Kahir, A.; Masum, A.K.M. Comparison of Naive Bayes and SVM Algorithm based on sentiment analysis using review dataset. In Proceedings of the 2019 IEEE 8th International Conference System Modeling and Advancement in Research Trends (SMART), Moradabad, India, 22–23 November 2019; pp. 266–270.
Makhmudah, U.; Bukhori, S.; Putra, J.A.; Yudha, B.A.B. Sentiment Analysis of Indonesian Homosexual Tweets Using Support Vector Machine Method. In Proceedings of the 2019 IEEE International Conference on Computer Science, Information Technology, and Electrical Engineering (ICOMITEE), Jember, Indonesia, 16–17 October 2019; pp. 183–186.
Wongkar, M.; Angdresey, A. Sentiment analysis using Naive Bayes Algorithm of the data crawler: Twitter. In Proceedings of the 2019 IEEE Fourth International Conference on Informatics and Computing (ICIC), Semarang, Indonesia, 16–17 October 2019; pp. 1–5.
Madhuri, D.K. A machine learning based framework for sentiment classification: Indian railways case study. Int. J. Innov. Technol. Explor. Eng. (IJITEE) 2019, 8, 441–445.
Gupta, A.; Singh, A.; Pandita, I.; Parashar, H. Sentiment analysis of Twitter posts using machine learning algorithms. In Proceedings of the 2019 IEEE 6th International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, India, 13–15 March 2019; pp. 980–983.
Prabhakar, E.; Santhosh, M.; Krishnan, A.H.; Kumar, T.; Sudhakar, R. Sentiment analysis of US Airline Twitter data using new AdaBoost approach. Int. J. Eng. Res. Technol. (IJERT) 2019, 7, 1–6.
Hourrane, O.; Idrissi, N. Sentiment Classification on Movie Reviews and Twitter: An Experimental Study of Supervised Learning Models. In Proceedings of the 2019 IEEE 1st International Conference on Smart Systems and Data Science (ICSSD), Rabat, Morocco, 3–4 October 2019; pp. 1–6.
AlSalman, H. An improved approach for sentiment analysis of arabic tweets in twitter social media. In Proceedings of the 2020 IEEE 3rd International Conference on Computer Applications & Information Security (ICCAIS), Riyadh, Saudi Arabia, 19–21 March 2020; pp. 1–4.
Saad, A.I. Opinion Mining on US Airline Twitter Data Using Machine Learning Techniques. In Proceedings of the 2020 IEEE 16th International Computer Engineering Conference (ICENCO), Cairo, Egypt, 29–30 December 2020; pp. 59–63.
Alzyout, M.; Bashabsheh, E.A.; Najadat, H.; Alaiad, A. Sentiment Analysis of Arabic Tweets about Violence Against Women using Machine Learning. In Proceedings of the 2021 IEEE 12th International Conference on Information and Communication Systems (ICICS), Valencia, Spain, 24–26 May 2021; pp. 171–176.
Jemai, F.; Hayouni, M.; Baccar, S. Sentiment Analysis Using Machine Learning Algorithms. In Proceedings of the 2021 IEEE International Wireless Communications and Mobile Computing (IWCMC), Harbin, China, 28 June–2 July 2021; pp. 775–779.
Ramadhani, A.M.; Goo, H.S. Twitter sentiment analysis using deep learning methods. In Proceedings of the 2017 IEEE 7th International Annual Engineering Seminar (InAES), Yogyakarta, Indonesia, 1–2 August 2017; pp. 1–4.
Demirci, G.M.; Keskin, Ş.R.; Doğan, G. Sentiment analysis in Turkish with deep learning. In Proceedings of the 2019 IEEE International Conference on Big Data, Honolulu, HI, USA, 29–31 May 2019; pp. 2215–2221.
Raza, G.M.; Butt, Z.S.; Latif, S.; Wahid, A. Sentiment Analysis on COVID Tweets: An Experimental Analysis on the Impact of Count Vectorizer and TF-IDF on Sentiment Predictions using Deep Learning Models. In Proceedings of the 2021 IEEE International Conference on Digital Futures and Transformative Technologies (ICoDT2), Islamabad, Pakistan, 20–21 May 2021; pp. 1–6.
Dholpuria, T.; Rana, Y.; Agrawal, C. A sentiment analysis approach through deep learning for a movie review. In Proceedings of the 2018 IEEE 8th International Conference on Communication Systems and Network Technologies (CSNT), Bhopal, India, 24–26 November 2018; pp. 173–181.
Harjule, P.; Gurjar, A.; Seth, H.; Thakur, P. Text classification on Twitter data. In Proceedings of the 2020 IEEE 3rd International Conference on Emerging Technologies in Computer Engineering: Machine Learning and Internet of Things (ICETCE), Jaipur, India, 7–8 February 2020; pp. 160–164.
Uddin, A.H.; Bapery, D.; Arif, A.S.M. Depression Analysis from Social Media Data in Bangla Language using Long Short Term Memory (LSTM) Recurrent Neural Network Technique. In Proceedings of the 2019 IEEE International Conference on Computer, Communication, Chemical, Materials and Electronic Engineering (IC4ME2), Rajshahi, Bangladesh, 11–12 July 2019; pp. 1–4.
Alahmary, R.M.; Al-Dossari, H.Z.; Emam, A.Z. Sentiment analysis of Saudi dialect using deep learning techniques. In Proceedings of the 2019 IEEE International Conference on Electronics, Information, and Communication (ICEIC), Auckland, New Zealand, 22–25 January 2019; pp. 1–6.
Yang, Y. Convolutional neural networks with recurrent neural filters. arXiv 2018, arXiv:1808.09315.
Goularas, D.; Kamis, S. Evaluation of deep learning techniques in sentiment analysis from Twitter data. In Proceedings of the 2019 IEEE International Conference on Deep Learning and Machine Learning in Emerging Applications (Deep-ML), Istanbul, Turkey, 26–28 August 2019; pp. 12–17.
Hossain, N.; Bhuiyan, M.R.; Tumpa, Z.N.; Hossain, S.A. Sentiment analysis of restaurant reviews using combined CNN-LSTM. In Proceedings of the 2020 IEEE 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Kharagpur, India, 1–3 July 2020; pp. 1–5.
Tyagi, V.; Kumar, A.; Das, S. Sentiment Analysis on Twitter Data Using Deep Learning approach. In Proceedings of the 2020 IEEE 2nd International Conference on Advances in Computing, Communication Control and Networking (ICACCCN), Greater Noida, India, 18–19 December 2020; pp. 187–190.
Rhanoui, M.; Mikram, M.; Yousfi, S.; Barzali, S. A CNN-BiLSTM model for document-level sentiment analysis. Mach. Learn. Knowl. Extr. 2019, 1, 832–847.
Jang, B.; Kim, M.; Harerimana, G.; Kang, S.U.; Kim, J.W. Bi-LSTM model to increase accuracy in text classification: Combining Word2vec CNN and attention mechanism. Appl. Sci. 2020, 10, 5841.
Chundi, R.; Hulipalled, V.R.; Simha, J. SAEKCS: Sentiment analysis for English–Kannada code switchtext using deep learning techniques. In Proceedings of the 2020 IEEE International Conference on Smart Technologies in Computing, Electrical and Electronics (ICSTCEE), Bengaluru, India, 10–11 July 2020; pp. 327–331.
Thinh, N.K.; Nga, C.H.; Lee, Y.S.; Wu, M.L.; Chang, P.C.; Wang, J.C. Sentiment Analysis Using Residual Learning with Simplified CNN Extractor. In Proceedings of the 2019 IEEE International Symposium on Multimedia (ISM), San Diego, CA, USA, 9–11 December 2019; pp. 335–3353.
Janardhana, D.; Vijay, C.; Swamy, G.J.; Ganaraj, K. Feature Enhancement Based Text Sentiment Classification using Deep Learning Model. In Proceedings of the 2020 IEEE 5th International Conference on Computing, Communication and Security (ICCCS), Bihar, India, 14–16 October 2020; pp. 1–6.
Chowdhury, S.; Rahman, M.L.; Ali, S.N.; Alam, M.J. A RNN Based Parallel Deep Learning Framework for Detecting Sentiment Polarity from Twitter Derived Textual Data. In Proceedings of the 2020 IEEE 11th International Conference on Electrical and Computer Engineering (ICECE), Dhaka, Bangladesh, 17–19 December 2020; pp. 9–12.
Vimali, J.; Murugan, S. A Text Based Sentiment Analysis Model using Bi-directional LSTM Networks. In Proceedings of the 2021 IEEE 6th International Conference on Communication and Electronics Systems (ICCES), Coimbatore, India, 8–10 July 2021; pp. 1652–1658.
Anbukkarasi, S.; Varadhaganapathy, S. Analyzing Sentiment in Tamil Tweets using Deep Neural Network. In Proceedings of the 2020 IEEE Fourth International Conference on Computing Methodologies and Communication (ICCMC), Erode, India, 11–13 March 2020; pp. 449–453.
Kumar, D.A.; Chinnalagu, A. Sentiment and Emotion in Social Media COVID-19 Conversations: SAB-LSTM Approach. In Proceedings of the 2020 IEEE 9th International Conference System Modeling and Advancement in Research Trends (SMART), Moradabad, India, 4–5 December 2020; pp. 463–467.
Hossen, M.S.; Jony, A.H.; Tabassum, T.; Islam, M.T.; Rahman, M.M.; Khatun, T. Hotel review analysis for the prediction of business using deep learning approach. In Proceedings of the 2021 IEEE International Conference on Artificial Intelligence and Smart Systems (ICAIS), Coimbatore, India, 25–27 March 2021; pp. 1489–1494.
Younas, A.; Nasim, R.; Ali, S.; Wang, G.; Qi, F. Sentiment Analysis of Code-Mixed Roman Urdu-English Social Media Text using Deep Learning Approaches. In Proceedings of the 2020 IEEE 23rd International Conference on Computational Science and Engineering (CSE), Dubai, United Arab Emirates, 12–13 December 2020; pp. 66–71.
Dhola, K.; Saradva, M. A Comparative Evaluation of Traditional Machine Learning and Deep Learning Classification Techniques for Sentiment Analysis. In Proceedings of the 2021 IEEE 11th International Conference on Cloud Computing, Data Science & Engineering (Confluence), Uttar Pradesh, India, 28–29 January 2021; pp. 932–936.
Tan, K.L.; Lee, C.P.; Anbananthen, K.S.M.; Lim, K.M. RoBERTa-LSTM: A Hybrid Model for Sentiment Analysis with Transformer and Recurrent Neural Network. IEEE Access 2022, 10, 21517–21525.
Kokab, S.T.; Asghar, S.; Naz, S. Transformer-based deep learning models for the sentiment analysis of social media data. Array 2022, 14, 100157.
AlBadani, B.; Shi, R.; Dong, J.; Al-Sabri, R.; Moctard, O.B. Transformer-based graph convolutional network for sentiment analysis. Appl. Sci. 2022, 12, 1316.
Tiwari, D.; Nagpal, B. KEAHT: A knowledge-enriched attention-based hybrid transformer model for social sentiment analysis. New Gener. Comput. 2022, 40, 1165–1202.
Tesfagergish, S.G.; Kapočiūtė-Dzikienė, J.; Damaševičius, R. Zero-shot emotion detection for semi-supervised sentiment analysis using sentence transformers and ensemble learning. Appl. Sci. 2022, 12, 8662.
Maghsoudi, A.; Nowakowski, S.; Agrawal, R.; Sharafkhaneh, A.; Kunik, M.E.; Naik, A.D.; Xu, H.; Razjouyan, J. Sentiment Analysis of Insomnia-Related Tweets via a Combination of Transformers Using Dempster-Shafer Theory: Pre–and Peri–COVID-19 Pandemic Retrospective Study. J. Med. Internet Res. 2022, 24, e41517.
Jing, H.; Yang, C. Chinese text sentiment analysis based on transformer model. In Proceedings of the 2022 IEEE 3rd International Conference on Electronic Communication and Artificial Intelligence (IWECAI), Sanya, China, 14–16 January 2022; pp. 185–189.
Alrehili, A.; Albalawi, K. Sentiment analysis of customer reviews using ensemble method. In Proceedings of the 2019 IEEE International Conference on Computer and Information Sciences (ICCIS), Aljouf, Saudi Arabia, 3–4 April 2019; pp. 1–6.
Bian, W.; Wang, C.; Ye, Z.; Yan, L. Emotional Text Analysis Based on Ensemble Learning of Three Different Classification Algorithms. In Proceedings of the 2019 IEEE 10th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS), Metz, France, 18–21 September 2019; Volume 2, pp. 938–941.
Gifari, M.K.; Lhaksmana, K.M.; Dwifebri, P.M. Sentiment Analysis on Movie Review using Ensemble Stacking Model. In Proceedings of the 2021 IEEE International Conference Advancement in Data Science, E-learning and Information Systems (ICADEIS), Bali, Indonesia, 13–14 October 2021; pp. 1–5.
Parveen, R.; Shrivastava, N.; Tripathi, P. Sentiment Classification of Movie Reviews by Supervised Machine Learning Approaches Using Ensemble Learning & Voted Algorithm. In Proceedings of the IEEE 2nd International Conference on Data, Engineering and Applications (IDEA), Bhopal, India, 28–29 February 2020; pp. 1–6.
Aziz, R.H.H.; Dimililer, N. Twitter Sentiment Analysis using an Ensemble Weighted Majority Vote Classifier. In Proceedings of the 2020 IEEE International Conference on Advanced Science and Engineering (ICOASE), Duhok, Iraq, 23–24 December 2020; pp. 103–109.
Varshney, C.J.; Sharma, A.; Yadav, D.P. Sentiment analysis using ensemble classification technique. In Proceedings of the 2020 IEEE Students Conference on Engineering & Systems (SCES), Prayagraj, India, 10–12 July 2020; pp. 1–6.
Athar, A.; Ali, S.; Sheeraz, M.M.; Bhattachariee, S.; Kim, H.C. Sentimental Analysis of Movie Reviews using Soft Voting Ensemble-based Machine Learning. In Proceedings of the 2021 IEEE Eighth International Conference on Social Network Analysis, Management and Security (SNAMS), Gandia, Spain, 6–9 December 2021; pp. 1–5.
Nguyen, H.Q.; Nguyen, Q.U. An ensemble of shallow and deep learning algorithms for Vietnamese Sentiment Analysis. In Proceedings of the 2018 IEEE 5th NAFOSTED Conference on Information and Computer Science (NICS), Ho Chi Minh City, Vietnam, 23–24 November 2018; pp. 165–170.
Kamruzzaman, M.; Hossain, M.; Imran, M.R.I.; Bakchy, S.C. A Comparative Analysis of Sentiment Classification Based on Deep and Traditional Ensemble Machine Learning Models. In Proceedings of the 2021 IEEE International Conference on Science & Contemporary Technologies (ICSCT), Dhaka, Bangladesh, 5–7 August 2021; pp. 1–5.
Al Wazrah, A.; Alhumoud, S. Sentiment Analysis Using Stacked Gated Recurrent Unit for Arabic Tweets. IEEE Access 2021, 9, 137176–137187.
Tan, K.L.; Lee, C.P.; Lim, K.M.; Anbananthen, K.S.M. Sentiment Analysis with Ensemble Hybrid Deep Learning Model. IEEE Access 2022, 10, 103694–103704.
Maas, A.; Daly, R.E.; Pham, P.T.; Huang, D.; Ng, A.Y.; Potts, C. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, OR, USA, 19–24 June 2011; pp. 142–150.
Go, A.; Bhayani, R.; Huang, L. Twitter sentiment classification using distant supervision. CS224N Proj. Rep. Stanf. 2009, 1, 2009.
Rosenthal, S.; Farra, N.; Nakov, P. SemEval-2017 task 4: Sentiment analysis in Twitter. arXiv 2019, arXiv:1912.00741.

Literature	Features	Classifier	Dataset	Accuracy (%)
Jung et al. (2016)		MNB	Sentiment140	85
Athindran et al. (2018)		NB	Self-collected dataset (from Tweets)	77
Vanaja et al. (2018)	A priori algorithm	NB, SVM	Self-collected dataset (from Amazon)	83.42
Iqbal et al. (2018)	Unigram, Bigram	NB, SVM, ME	IMDb	88
			Sentiment140	90
Rathi et al. (2018)	TF-IDF	DT	Sentiment140, Polarity Dataset, and University of Michigan dataset	84
		AdaBoost		67
		SVM		82
Hemakala and Santhoshkumar (2018)		AdaBoost	Indian Airlines	84.5
Tariyal et al. (2018)		Regression Tree	Own dataset	88.99
Rahat et al. (2019)		SVC	Airline review	82.48
		MNB		76.56
Makhmudah et al. (2019)	TF-IDF	SVM	Tweets related to homosexuals	99.5
Wongkar and Angdresey (2019)		NB	Twitter (2019 presidential candidates of the Republic of Indonesia)	75.58
Madhuri (2019)		SVM	Twitter (Indian Railways)	91.5
Gupta et al. (2019)	TF-IDF	Neural Network	Sentiment140	80
Prabhakar et al. (2019)		AdaBoost (Bagging and Boosting)	Skytrax and Twitter (Airlines)	68 F-score
Hourrane et al. (2019)	TF-IDF	Ridge Classifier	IMDb	90.54
			Sentiment140	76.84
AlSalman (2020)	TF-IDF	MNB	Arabic Tweets	87.5
Saad et al. (2020)	Bag of Words	SVM	Twitter US Airline Sentiment	83.31
Alzyout et al. (2021)	TF-IDF	SVM	Self-collected dataset	78.25
Jemai et al. (2021)		NB	NLTK corpus	99.73

Literature	Embedding	Classifier	Dataset	Accuracy (%)
Ramadhani et al. (2017)		MLP	Korean and English Tweets	75.03
Demirci et al. (2019)	word2vec	MLP	Turkish Tweets	81.86
Raza et al. (2021)	Count Vectorizer and TF-IDF Vectorizer	MLP	COVID-19 reviews	93.73
Dholpuria et al. (2018)		CNN	IMDb (3000 reviews)	99.33
Harjule et al. (2020)		LSTM	Twitter US Airline Sentiment	82
			Sentiment140	66
Uddin et al. (2019)		LSTM	Bangla Tweets	86.3
Alahmary and Al-Dossari (2018)	word2vec	BiLSTM	Saudi dialect Tweets	94
Yang (2018)	GloVe	Recurrent neural filter-based CNN and LSTM	Stanford Sentiment Treebank	53.4
Goularas and Kamis (2019)	word2vec and GloVe	CNN and LSTM	Tweets from semantic evaluation	59
Hossain and Bhuiyan (2019)	word2vec	CNN and LSTM	Foodpanda and Shohoz Food	75.01
Tyagi et al. (2020)	GloVe	CNN and BiLSTM	Sentiment140	81.20
Rhanoui et al. (2019)	doc2vec	CNN and BiLSTM	French articles and international news	90.66
Jang et al. (2020)	word2vec	hybrid CNN and BiLSTM	IMDb	90.26
Chundi et al. (2020)		Convolutional BiLSTM	English, Kannada, and a mixture of both languages	77.6
Thinh et al. (2019)		1D-CNN with GRU	IMDb	90.02
Janardhana et al. (2020)	GloVe	Convolutional RNN	Movie reviews	84
Chowdhury et al. (2020)	word2vec, GloVe, and sentiment-specific word embedding	BiLSTM	Twitter US Airline Sentiment	81.20
Vimali and Murugan (2021)		BiLSTM	Self-collected	90.26
Anbukkarasi and Varadhaganapathy (2020)		DBLSTM	Self-collected (Tamil Tweets)	86.2
Kumar and Chinnalagu (2020)		SAB-LSTM	Self-collected	29 (POS) 50 (NEG) 21 (NEU)
Hossen et al. (2021)		LSTM	Self-collected	86
		GRU		84
Younas et al. (2020)		mBERT	Pakistan elections in 2018 (Tweets)	69
		XLM-R		71
Dhola and Saradva (2021)		BERT	Sentiment140	85.4
Tan et al. (2022)		RoBERTa-LSTM	IMDb	92.96
			Twitter US Airline Sentiment	91.37
			Sentiment140	89.70
Kokab et al. (2022)	BERT	CBRNN	US airline reviews	97
			Self-driving car reviews	90
			US presidential election reviews	96
			IMDb	93
AlBadani et al. (2022)	ST-GCN	ST-GCN	SST-B	95.43
			IMDb	94.94
			Yelp 2014	72.7
Tiwari and Nagpal (2022)	BERT	KEAHT	COVID-19 vaccine	91
			Indian Farmer Protests	81.49
Tesfagergish et al. (2022)	Zero-shot transformer	Ensemble learning	SemEval 2017	87.3
Maghsoudi et al. (2022)	Transformer	DST	Self-collected	84
Jing and Yang (2022)	Light-Transformer	Light-Transformer	NLPCC2014 Task2	76.40

Literature	Feature Extractor	Classifier	Dataset	Accuracy (%)
Alrehili et al. (2019)		NB + SVM + RF + Bagging + Boosting	Self-collected	89.4
Bian et al. (2019)	TF-IDF	LR + SVM + KNN	COVID-19 reviews	98.99
Gifari and Lhaksmana (2021)	TF-IDF	MNB + KNN + LR	IMDb	89.40
Parveen et al. (2020)		MNB + BNB + LR + LSVM + NSVM	Movie reviews	91
Aziz and Dimililer (2020)	TF-IDF	NB + LR + SGD + RF + DT + SVM	SemEval-2017 4A	72.95
			SemEval-2017 4B	90.8
			SemEval-2017 4C	68.89
Varshney et al. (2020)	TF-IDF	LR + NB + SGD	Sentiment140	80
Athar et al. (2021)	TF-IDF	LR + NB + XGBoost + RF + MLP	IMDb	89.9
Nguyen and Nguyen (2018)	TF-IDF, word2vec	LR + SVM + CNN + LSTM (Mean)	Vietnamese Sentiment	69.71
		LR + SVM + CNN + LSTM (Vote)	Vietnamese Sentiment Food Reviews	89.19
		LR + SVM + CNN + LSTM (Vote)	Vietnamese Sentiment	92.80
Kamruzzaman et al. (2021)	GloVe	7-Layer CNN + GRU + GloVe	Grammar and Online Product Reviews	94.19
	Attention embedding	7-Layer CNN + LSTM + Attention Layer	Restaurant Reviews	96.37
Al Wazrah and Alhumoud (2021)	AraVec	SGRU + SBi-GRU + AraBERT	Arabic Sentiment Analysis	90.21
Tan et al. (2022)		RoBERTa-LSTM + RoBERTa-BiLSTM + RoBERTa-GRU	IMDb	94.9
			Twitter US Airline Sentiment	91.77
			Sentiment140	89.81
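
Several rows in the ensemble table combine base classifiers by "Vote" or by "Mean". The sketch below shows, with made-up probabilities, how the two rules can disagree on the same inputs. It is a generic illustration of hard voting and probability averaging, not the procedure of any cited study; the function names and the numbers are assumptions.

```python
from collections import Counter

LABELS = ["negative", "neutral", "positive"]


def argmax_label(probs):
    """Label with the highest probability in one classifier's output."""
    return LABELS[max(range(len(LABELS)), key=probs.__getitem__)]


def majority_vote(predicted_labels):
    """Hard voting: the most frequent label wins.

    Ties resolve to the label encountered first in `predicted_labels`.
    """
    return Counter(predicted_labels).most_common(1)[0][0]


def mean_probability(prob_rows):
    """Mean rule: average the class probabilities, then take the argmax."""
    n = len(prob_rows)
    avg = [sum(row[i] for row in prob_rows) / n for i in range(len(LABELS))]
    return argmax_label(avg)


# Hypothetical class probabilities from three base classifiers for one text.
prob_rows = [
    [0.60, 0.10, 0.30],  # leans negative
    [0.20, 0.10, 0.70],  # confidently positive
    [0.45, 0.15, 0.40],  # narrowly negative
]

hard = majority_vote([argmax_label(row) for row in prob_rows])
soft = mean_probability(prob_rows)
print(hard, soft)
```

Here two of the three classifiers narrowly prefer "negative", so voting returns "negative", while averaging lets the one confident classifier tip the result to "positive". Which rule works better is an empirical question for each dataset.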

Dataset	Classes	Strongly Positive	Positive	Neutral	Negative	Strongly Negative	Total
IMDb	2	-	25,000	-	25,000	-	50,000
Twitter US Airline Sentiment	3	-	2363	3099	9178	-	14,640
Sentiment140	2	-	800,000	-	800,000	-	1,600,000
SemEval-2017 4A	3	-	22,277	28,528	11,812	-	62,617
SemEval-2017 4B	2	-	17,414	-	7735	-	25,149
SemEval-2017 4C	5	1151	15,254	19,187	6943	476	43,011
SemEval-2017 4D	2	-	17,414	-	7735	-	25,149
SemEval-2017 4E	5	1151	15,254	19,187	6943	476	43,011
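
The class counts in the dataset table make it easy to check how balanced each corpus is, which matters when comparing the accuracy figures reported on them. The sketch below recomputes each total from the per-class counts instead of reading the Total column, then derives class shares and a simple largest-to-smallest imbalance ratio. The helper names are illustrative only, and three of the table's rows are used.

```python
# Per-class counts copied from the dataset table.
DATASETS = {
    "IMDb": {"positive": 25_000, "negative": 25_000},
    "Twitter US Airline Sentiment": {
        "positive": 2_363,
        "neutral": 3_099,
        "negative": 9_178,
    },
    "SemEval-2017 4C": {
        "strongly positive": 1_151,
        "positive": 15_254,
        "neutral": 19_187,
        "negative": 6_943,
        "strongly negative": 476,
    },
}


def class_shares(counts):
    """Return (total, {label: share}) for one dataset."""
    total = sum(counts.values())
    return total, {label: n / total for label, n in counts.items()}


def imbalance_ratio(counts):
    """Size of the largest class divided by the size of the smallest."""
    return max(counts.values()) / min(counts.values())


for name, counts in DATASETS.items():
    total, shares = class_shares(counts)
    majority = max(shares, key=shares.get)
    print(f"{name}: n={total}, majority={majority} ({shares[majority]:.1%}), "
          f"imbalance={imbalance_ratio(counts):.1f}x")
```

On these counts IMDb is perfectly balanced, while nearly two thirds of the Twitter US Airline Sentiment tweets are negative, so a classifier that always predicts "negative" would already reach roughly 63% accuracy on that dataset.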

Tan, K.L.; Lee, C.P.; Lim, K.M. A Survey of Sentiment Analysis: Approaches, Datasets, and Future Research. Appl. Sci. 2023 , 13 , 4550. https://doi.org/10.3390/app13074550

COMMENTS

A review of sentiment analysis: tasks, applications, and deep ...
Sentiment analysis, a transformative force in natural language processing, revolutionizes diverse fields such as business, social media, healthcare, and disaster response. This review delves into the intricate landscape of sentiment analysis, exploring its significance, challenges, and evolving methodologies. We examine crucial aspects like dataset selection, algorithm choice, language ...
Sentiment Analysis Projects & Topics For Beginners [2024]
Taking up a project on sentiment analysis can be highly beneficial to both beginners and final-year students. These projects equip them with practical skills, industry relevance, and the ability to make a worthy impact, fostering a holistic learning experience.
Sentiment Analysis | Papers With Code
Sentiment Analysis. 1334 papers with code • 39 benchmarks • 93 datasets. Sentiment Analysis is the task of classifying the polarity of a given text. For instance, a text-based tweet can be categorized into either "positive", "negative", or "neutral".
The evolution of sentiment analysis—A review of research ...
The 167 pages contain a wide array of topics, with chapters about document, sentence and aspect-based sentiment analysis. Overall the topic is approached first by introducing the research problems of sentiment analysis and then answering them with the latest knowledge available during the writing of the book. 3.6.2. Early years – Online reviews
Survey on sentiment analysis: evolution of research methods ...
By adopting keyword co-occurrence analysis and community detection methods, we analyzed the research methods and topics of sentiment analysis, as well as their connections and evolution trends, and summarized the research hotspots and trends in sentiment analysis.
What Is Sentiment Analysis? | IBM
Sentiment analysis, or opinion mining, is the process of analyzing large volumes of text to determine whether it expresses a positive sentiment, a negative sentiment or a neutral sentiment.
Sentiment Analysis - an overview | ScienceDirect Topics
Sentiment Analysis (SA) is one of the attractive research branches of Opinion Mining (OM). The research scope of SA is the computational study of individuals’ opinions and attitudes toward entities mentioned in a text. The entities generally refer to individuals or events.
A Survey of Sentiment Analysis: Approaches, Datasets, and ...
Sentiment analysis is a critical subfield of natural language processing that focuses on categorizing text into three primary sentiments: positive, negative, and neutral.
The Evolution of Sentiment Analysis - A Review of Research ...
We present the top-20 cited papers from Google Scholar and Scopus and a taxonomy of research topics. In recent years, sentiment analysis has shifted from analyzing online product reviews to...