
What Is Research Methodology? A Plain-Language Explanation & Definition (With Examples)

By Derek Jansen (MBA) and Kerryn Warren (PhD) | June 2020 (Last updated April 2023)

If you’re new to formal academic research, it’s quite likely that you’re feeling a little overwhelmed by all the technical lingo that gets thrown around. And who could blame you – “research methodology”, “research methods”, “sampling strategies”… it all seems never-ending!

In this post, we’ll demystify the landscape with plain-language explanations and loads of examples (including easy-to-follow videos), so that you can approach your dissertation, thesis or research project with confidence. Let’s get started.

Research Methodology 101

  • What exactly research methodology means
  • What qualitative, quantitative and mixed methods are
  • What sampling strategy is
  • What data collection methods are
  • What data analysis methods are
  • How to choose your research methodology
  • Example of a research methodology


What is research methodology?

Research methodology simply refers to the practical “how” of a research study. More specifically, it’s about how a researcher systematically designs a study to ensure valid and reliable results that address the research aims, objectives and research questions. In particular, it covers how the researcher went about deciding:

  • What type of data to collect (e.g., qualitative or quantitative data)
  • Who to collect it from (i.e., the sampling strategy)
  • How to collect it (i.e., the data collection method)
  • How to analyse it (i.e., the data analysis methods)

Within any formal piece of academic research (be it a dissertation, thesis or journal article), you’ll find a research methodology chapter or section which covers the aspects mentioned above. Importantly, a good methodology chapter explains not just what methodological choices were made, but also why they were made. In other words, the methodology chapter should justify the design choices by showing that the chosen methods and techniques are the best fit for the research aims, objectives and research questions.

So, it’s the same as research design?

Not quite. As we mentioned, research methodology refers to the collection of practical decisions regarding what data you’ll collect, from whom, how you’ll collect it and how you’ll analyse it. Research design, on the other hand, is more about the overall strategy you’ll adopt in your study, for example, whether you’ll use an experimental design in which you manipulate one variable while controlling others.


What are qualitative, quantitative and mixed-methods?

Qualitative, quantitative and mixed-methods are different types of methodological approaches, distinguished by their focus on words, numbers or both. This is a bit of an oversimplification, but it’s a good starting point for understanding.

Let’s take a closer look.

Qualitative research refers to research which focuses on collecting and analysing words (written or spoken) and textual or visual data, whereas quantitative research focuses on measurement and testing using numerical data. Qualitative analysis can also focus on other “softer” data points, such as body language or visual elements.

It’s quite common for a qualitative methodology to be used when the research aims and research questions are exploratory in nature. For example, a qualitative methodology might be used to understand people’s perceptions about an event that took place, or about a political candidate running for president.

In contrast, a quantitative methodology is typically used when the research aims and research questions are confirmatory in nature. For example, a quantitative methodology might be used to measure the relationship between two variables (e.g. personality type and likelihood to commit a crime) or to test a set of hypotheses.

As you’ve probably guessed, the mixed-method methodology attempts to combine the best of both qualitative and quantitative methodologies to integrate perspectives and create a rich picture.

What is sampling strategy?

Simply put, sampling is about deciding who (or where) you’re going to collect your data from. Why does this matter? Well, generally it’s not possible to collect data from every single person in your group of interest (this is called the “population”), so you’ll need to engage a smaller portion of that group that’s accessible and manageable (this is called the “sample”).

How you go about selecting the sample (i.e., your sampling strategy) will have a major impact on your study. There are many different sampling methods you can choose from, but the two overarching categories are probability sampling and non-probability sampling.

Probability sampling involves selecting participants at random from the group of people you’re interested in, so that every member has a known chance of being chosen. This is comparable to throwing the names of all potential participants into a hat, shaking it up, and picking out the “winners”. By using a random sample, you’ll minimise the risk of selection bias and the results of your study will be more generalisable to the entire population.

Non-probability sampling, on the other hand, doesn’t use a random sample. For example, it might involve using a convenience sample, which means you’d only interview or survey people that you have access to (perhaps your friends, family or work colleagues), rather than a truly random sample. With non-probability sampling, the results are typically not generalisable.
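To make the contrast concrete, here is a minimal Python sketch. It assumes a hypothetical population of 1,000 numbered participants; the names `population`, `probability_sample` and `convenience_sample` are made up for illustration and don’t come from any particular study or library.

```python
import random

# Hypothetical population: participant IDs 1..1000 (an assumption for illustration).
population = list(range(1, 1001))


def probability_sample(pop, n, seed=None):
    """Simple random sample: every member has an equal chance of selection."""
    rng = random.Random(seed)  # seeded so the draw can be reproduced
    return rng.sample(pop, n)  # draws n distinct members without replacement


def convenience_sample(pop, n):
    """Non-probability sample: take whoever is easiest to reach (here, the first n)."""
    return pop[:n]


random_ids = probability_sample(population, 50, seed=42)
easy_ids = convenience_sample(population, 50)

print("Random sample ID range:", min(random_ids), "to", max(random_ids))
print("Convenience sample ID range:", min(easy_ids), "to", max(easy_ids))
```

The convenience sample only ever contains the first 50 IDs, which is the code equivalent of surveying just the people closest to you. The random draw can land anywhere in the population, which is what supports generalising from the sample.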


What are data collection methods?

As the name suggests, data collection methods simply refer to the way in which you go about collecting the data for your study. Some of the most common data collection methods include:

  • Interviews (which can be unstructured, semi-structured or structured)
  • Focus groups and group interviews
  • Surveys (online or physical surveys)
  • Observations (watching and recording activities)
  • Biophysical measurements (e.g., blood pressure, heart rate, etc.)
  • Documents and records (e.g., financial reports, court records, etc.)

The choice of which data collection method to use depends on your overall research aims and research questions, as well as practicalities and resource constraints. For example, if your research is exploratory in nature, qualitative methods such as interviews and focus groups would likely be a good fit. Conversely, if your research aims to measure specific variables or test hypotheses, large-scale surveys that produce large volumes of numerical data would likely be a better fit.

What are data analysis methods?

Data analysis methods refer to the methods and techniques that you’ll use to make sense of your data. These can be grouped according to whether the research is qualitative (words-based) or quantitative (numbers-based).

Popular data analysis methods in qualitative research include:

  • Qualitative content analysis
  • Thematic analysis
  • Discourse analysis
  • Narrative analysis
  • Interpretative phenomenological analysis (IPA)
  • Visual analysis (of photographs, videos, art, etc.)

Qualitative data analysis typically begins with data coding, after which an analysis method is applied. In some cases, more than one analysis method is used, depending on the research aims and research questions.
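As a rough illustration of what coded data can look like, here is a minimal Python sketch. The interview excerpts, the code labels and the `tally_codes` helper are all hypothetical. Counting codes is only a first look at the data, not a substitute for any of the analysis methods listed above.

```python
from collections import Counter

# Hypothetical interview excerpts, each tagged by the researcher with one or more codes.
coded_excerpts = [
    ("I never know who to ask for help.", ["support", "uncertainty"]),
    ("My supervisor replies within a day.", ["support"]),
    ("The deadlines keep me up at night.", ["stress"]),
    ("I'm unsure whether my topic is good enough.", ["uncertainty", "stress"]),
    ("I feel anxious before every meeting.", ["stress"]),
]


def tally_codes(excerpts):
    """Count how many excerpts each code was applied to."""
    counts = Counter()
    for _text, codes in excerpts:
        counts.update(set(codes))  # count each code at most once per excerpt
    return counts


code_counts = tally_codes(coded_excerpts)
for code, count in code_counts.most_common():
    print(f"{code}: {count}")
```

A tally like this can hint at which codes recur often enough to be explored as candidate themes.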

Moving on to the quantitative side of things, popular data analysis methods in this type of research include:

  • Descriptive statistics (e.g. means, medians, modes)
  • Inferential statistics (e.g. correlation, regression, structural equation modelling)
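To show what these look like in practice, here is a minimal Python sketch using made-up survey data (weekly study hours and exam scores for seven students). The data and the `pearson_r` helper are assumptions for illustration. A full inferential analysis would also test whether the correlation is statistically significant, which this sketch does not do.

```python
import math
import statistics

# Hypothetical survey data (assumed for illustration).
study_hours = [2, 4, 4, 5, 7, 8, 10]
exam_scores = [52, 58, 61, 63, 70, 74, 85]

# Descriptive statistics summarise the sample itself.
mean_hours = statistics.mean(study_hours)
median_hours = statistics.median(study_hours)
mode_hours = statistics.mode(study_hours)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


r = pearson_r(study_hours, exam_scores)

print(f"mean={mean_hours:.2f}, median={median_hours}, mode={mode_hours}")
print(f"correlation between hours and scores: r={r:.3f}")
```

The first three values describe the sample. The correlation coefficient measures how strongly two variables move together, which is the starting point for the relationship-testing described above.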

Again, the choice of which data analysis method to use depends on your overall research aims and objectives, as well as practicalities and resource constraints.

How do I choose a research methodology?

As you’ve probably picked up by now, your research aims and objectives have a major influence on the research methodology. So, the starting point for developing your research methodology is to take a step back and look at the big picture of your research, before you make methodology decisions. The first question you need to ask yourself is whether your research is exploratory or confirmatory in nature.

If your research aims and objectives are primarily exploratory in nature, your research will likely be qualitative and therefore you might consider qualitative data collection methods (e.g. interviews) and analysis methods (e.g. qualitative content analysis). 

Conversely, if your research aims and objectives are looking to measure or test something (i.e. they’re confirmatory), then your research will quite likely be quantitative in nature, and you might consider quantitative data collection methods (e.g. surveys) and analyses (e.g. statistical analysis).

Designing your research and working out your methodology is a large topic, which we cover extensively on the blog. For now, however, the key takeaway is that you should always start with your research aims, objectives and research questions (the golden thread). Every methodological choice you make needs to align with those three components.

Example of a research methodology chapter

In the video below, we provide a detailed walkthrough of a research methodology from an actual dissertation, as well as an overview of our free methodology template.




What is Research Methodology? Definition, Types, and Examples


Research methodology [1,2] is a structured and scientific approach used to collect, analyze, and interpret quantitative or qualitative data to answer research questions or test hypotheses. A research methodology is like a plan for carrying out research and helps keep researchers on track by limiting the scope of the research. Several aspects must be considered before selecting an appropriate research methodology, such as research limitations and ethical concerns that may affect your research.

The research methodology section in a scientific paper describes the different methodological choices made, such as the data collection and analysis methods, and why these choices were made. The reasons should explain why the methods chosen are the most appropriate to answer the research question. A good research methodology also helps ensure the reliability and validity of the research findings. There are three types of research methodology (quantitative, qualitative, and mixed-method), which can be chosen based on the research objectives.

What is research methodology?

A research methodology describes the techniques and procedures used to identify and analyze information regarding a specific research topic. It is a process by which researchers design their study so that they can achieve their objectives using the selected research instruments. It includes all the important aspects of research, including research design, data collection methods, data analysis methods, and the overall framework within which the research is conducted. While these points can help you understand what research methodology is, you also need to know why it is important to pick the right methodology.

Why is research methodology important?

Having a good research methodology in place has the following advantages [3]:

  • It helps other researchers who want to replicate your research, because your choices are documented and explained.
  • It lets you answer questions about your research if they arise at a later stage.
  • A research methodology provides a framework and guidelines for researchers to clearly define research questions, hypotheses, and objectives.
  • It helps researchers identify the most appropriate research design, sampling technique, and data collection and analysis methods.
  • A sound research methodology helps researchers ensure that their findings are valid and reliable, and it reduces the risk of bias and error.
  • It also helps ensure that ethical guidelines are followed while conducting research.
  • A good research methodology helps researchers in planning their research efficiently, by ensuring optimum usage of their time and resources.

Types of research methodology

There are three types of research methodology based on the type of research and the data required [1].

  • Quantitative research methodology focuses on measuring and testing numerical data. This approach is good for reaching a large number of people in a short amount of time. This type of research helps in testing the causal relationships between variables, making predictions, and generalizing results to wider populations.
  • Qualitative research methodology examines the opinions, behaviors, and experiences of people. It collects and analyzes words and textual data. This research methodology requires fewer participants but is still more time consuming because the time spent per participant is quite large. This method is used in exploratory research where the research problem being investigated is not clearly defined.
  • Mixed-method research methodology uses the characteristics of both quantitative and qualitative research methodologies in the same study. This method allows researchers to validate their findings, verify if the results observed using both methods are complementary, and explain any unexpected results obtained from one method by using the other method.

What are the types of sampling designs in research methodology?

Sampling [4] is an important part of a research methodology. It involves selecting a representative sample of the population, conducting the study on that sample, and using statistical inference to estimate the characteristics of the whole population. There are two types of sampling designs in research methodology: probability and nonprobability.

  • Probability sampling

In this type of sampling design, a sample is chosen from a larger population using some form of random selection, so that every member of the population has a known, nonzero chance of being selected. The different types of probability sampling are:

  • Systematic: sample members are chosen at regular intervals. The researcher picks a random starting point and then selects every k-th member, where the interval k follows from the population size and the desired sample size. Because the selection rule is predefined, this is among the least time-consuming methods.
  • Stratified: researchers divide the population into smaller groups (strata) that don’t overlap but together represent the entire population. A random sample is then drawn from each group separately.
  • Cluster: the population is divided into clusters based on demographic parameters like age, sex, location, etc., and a random selection of whole clusters is then included in the sample.

  • Nonprobability sampling

In this type of sampling design, participants are selected by non-random criteria, so not every member of the population has a chance of being included. The different types of nonprobability sampling are:

  • Convenience: selects participants who are most easily accessible to researchers due to geographical proximity, availability at a particular time, etc.
  • Purposive: participants are selected at the researcher’s discretion. Researchers consider the purpose of the study and their understanding of the target audience.
  • Snowball: already selected participants use their social networks to refer the researcher to other potential participants.
  • Quota: while designing the study, the researchers decide how many people with which characteristics to include as participants. The characteristics help in choosing people most likely to provide insights into the subject.
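
To make two of the probability designs above more concrete, here is a minimal Python sketch of systematic and stratified sampling. The student list, the helper names `systematic_sample` and `stratified_sample`, and the 10% sampling fraction are all illustrative assumptions, not part of any standard library or of the sources cited here.

```python
import random
from collections import defaultdict

def systematic_sample(population, n, seed=None):
    """Pick every k-th member after a random start, with k = len(population) // n."""
    if not 0 < n <= len(population):
        raise ValueError("n must be between 1 and the population size")
    rng = random.Random(seed)
    k = len(population) // n           # sampling interval
    start = rng.randrange(k)           # random starting point inside the first interval
    return [population[start + i * k] for i in range(n)]

def stratified_sample(population, stratum_of, fraction, seed=None):
    """Draw the same fraction at random from each non-overlapping stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample

# Hypothetical sampling frame: 1,000 students spread evenly over four year groups.
students = [{"id": i, "year": 1 + i % 4} for i in range(1000)]

every_kth = systematic_sample(students, n=50, seed=1)
by_year = stratified_sample(students, lambda s: s["year"], fraction=0.1, seed=1)
```

In this sketch the systematic sample takes every 20th student, while the stratified sample draws 25 students from each year group, so each stratum is represented in proportion to its size.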

What are data collection methods?

During research, data are collected using various methods depending on the research methodology being followed and the research methods being undertaken. Both qualitative and quantitative research have different data collection methods, as listed below.

Qualitative research [5]

  • One-on-one interviews: Helps the interviewers understand a respondent’s subjective opinion and experience pertaining to a specific topic or event
  • Document study/literature review/record keeping: Researchers’ review of already existing written materials such as archives, annual reports, research articles, guidelines, policy documents, etc.
  • Focus groups: Constructive discussions that usually include a small sample of about 6-10 people and a moderator, to understand the participants’ opinion on a given topic.
  • Qualitative observation: Researchers collect data using their five senses (sight, smell, touch, taste, and hearing).

Quantitative research [6]

  • Sampling: The most common type is probability sampling.
  • Interviews: Commonly telephonic or done in-person.
  • Observations: Structured observations are most commonly used in quantitative research. In this method, researchers make observations about specific behaviors of individuals in a structured setting.
  • Document review: Reviewing existing research or documents to collect evidence for supporting the research.
  • Surveys and questionnaires: Surveys can be administered both online and offline depending on the requirement and sample size.

What are data analysis methods?

The data collected using the various methods for qualitative and quantitative research need to be analyzed to generate meaningful conclusions. These data analysis methods [7] also differ between quantitative and qualitative research.

Quantitative research involves a deductive method for data analysis where hypotheses are developed at the beginning of the research and precise measurement is required. The methods include statistical analysis applications to analyze numerical data and are grouped into two categories—descriptive and inferential.

Descriptive analysis is used to describe the basic features of different types of data to present it in a way that ensures the patterns become meaningful. The different types of descriptive analysis methods are:

  • Measures of frequency (count, percent, frequency)
  • Measures of central tendency (mean, median, mode)
  • Measures of dispersion or variation (range, variance, standard deviation)
  • Measure of position (percentile ranks, quartile ranks)
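
As a rough illustration of the four groups of descriptive measures listed above, the sketch below computes one or two of each with Python's standard `statistics` module. The exam scores are made-up data used only for demonstration.

```python
import statistics
from collections import Counter

# Hypothetical data: exam scores for ten participants.
scores = [55, 62, 62, 70, 71, 75, 78, 84, 90, 93]

# Measures of frequency
counts = Counter(scores)                        # how often each score occurs
share_of_62 = counts[62] / len(scores)          # proportion of participants scoring 62

# Measures of central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Measures of dispersion or variation
value_range = max(scores) - min(scores)
variance = statistics.variance(scores)          # sample variance (n - 1 denominator)
std_dev = statistics.stdev(scores)              # sample standard deviation

# Measures of position
quartiles = statistics.quantiles(scores, n=4)   # three cut points: Q1, median, Q3

print(mean, median, mode, value_range)
```

Note that `statistics.variance` and `statistics.stdev` treat the data as a sample; if the data cover the whole population, `statistics.pvariance` and `statistics.pstdev` are the matching functions.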

Inferential analysis is used to make predictions about a larger population based on the analysis of the data collected from a smaller population. This analysis is used to study the relationships between different variables. Some commonly used inferential data analysis methods are:

  • Correlation: To understand the relationship between two or more variables.
  • Cross-tabulation: Analyze the relationship between multiple variables.
  • Regression analysis: Study the impact of independent variables on the dependent variable.
  • Frequency tables: To understand the frequency of data.
  • Analysis of variance: To test whether the means of two or more groups differ in an experiment.
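
Two of the methods above, correlation and regression analysis, can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions: the data are invented, the helper names `pearson_r` and `least_squares_line` are hypothetical, and only the simplest case (Pearson correlation and a one-predictor least-squares line) is shown.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def least_squares_line(x, y):
    """Slope and intercept of the line that minimises squared errors in y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: hours studied (independent) and exam score (dependent).
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 55, 61, 64, 68]

r = pearson_r(hours_studied, exam_score)
slope, intercept = least_squares_line(hours_studied, exam_score)
```

A coefficient near 1 or -1 indicates a strong linear association, but on its own it does not show that one variable causes the other; that depends on the study design.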

Qualitative research involves an inductive method for data analysis where hypotheses are developed after data collection. The methods include:

  • Content analysis: For analyzing documented information from text and images by determining the presence of certain words or concepts in texts.
  • Narrative analysis: For analyzing content obtained from sources such as interviews, field observations, and surveys. The stories and opinions shared by people are used to answer research questions.
  • Discourse analysis: For analyzing interactions with people considering the social context, that is, the lifestyle and environment, under which the interaction occurs.
  • Grounded theory: Involves hypothesis creation by data collection and analysis to explain why a phenomenon occurred.
  • Thematic analysis: To identify important themes or patterns in data and use these to address an issue.
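
The counting step of a content analysis, checking which concepts appear in which texts, can be sketched as follows. The interview excerpts and the two-concept codebook are invented for illustration, and `count_concepts` is a hypothetical helper; real coding schemes are usually developed and validated far more carefully.

```python
import re
from collections import Counter

# Hypothetical interview excerpts.
responses = [
    "The workload is heavy, but my supervisor gives good support.",
    "I get little support and the workload keeps growing.",
    "Flexible hours help me manage the workload.",
]

# Hypothetical coding scheme: each concept maps to words that signal it.
codebook = {
    "workload": {"workload", "hours"},
    "support": {"support", "supervisor"},
}

def count_concepts(texts, codebook):
    """Count how many texts mention each concept at least once."""
    counts = Counter()
    for text in texts:
        words = set(re.findall(r"[a-z']+", text.lower()))
        for concept, keywords in codebook.items():
            if words & keywords:       # any keyword present in this text
                counts[concept] += 1
    return counts

concept_counts = count_concepts(responses, codebook)
print(concept_counts)
```

Counting by response rather than by word keeps one talkative participant from dominating the totals; interpreting what the mentions mean remains a qualitative step.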

How to choose a research methodology?

Here are some important factors to consider when choosing a research methodology [8]:

  • Research objectives, aims, and questions —these would help structure the research design.
  • Review existing literature to identify any gaps in knowledge.
  • Check the statistical requirements: if data-driven or statistical results are needed, then quantitative research is the best fit. If the research questions can be answered based on people’s opinions and perceptions, then qualitative research is most suitable.
  • Sample size —sample size can often determine the feasibility of a research methodology. For a large sample, less effort- and time-intensive methods are appropriate.
  • Constraints —constraints of time, geography, and resources can help define the appropriate methodology.

How to write a research methodology

A research methodology should include the following components [3,9]:

  • Research design —should be selected based on the research question and the data required. Common research designs include experimental, quasi-experimental, correlational, descriptive, and exploratory.
  • Research method —this can be quantitative, qualitative, or mixed-method.
  • Reason for selecting a specific methodology —explain why this methodology is the most suitable to answer your research problem.
  • Research instruments —explain the research instruments you plan to use, mainly referring to the data collection methods such as interviews, surveys, etc. Here as well, a reason should be mentioned for selecting the particular instrument.
  • Sampling —this involves selecting a representative subset of the population being studied.
  • Data collection —involves gathering data using several data collection methods, such as surveys, interviews, etc.
  • Data analysis —describe the data analysis methods you will use once you’ve collected the data.
  • Research limitations —mention any limitations you foresee while conducting your research.
  • Validity and reliability —validity helps identify the accuracy and truthfulness of the findings; reliability refers to the consistency and stability of the results over time and across different conditions.
  • Ethical considerations —research should be conducted ethically. The considerations include obtaining consent from participants, maintaining confidentiality, and addressing conflicts of interest.

Streamline Your Research Paper Writing Process with Paperpal

The methods section is a critical part of a research paper: it lets other researchers understand your findings and replicate your work when pursuing their own research. However, it is usually also the most difficult section to write. This is where Paperpal can help you overcome writer’s block and create the first draft in minutes with Paperpal Copilot, its secure generative AI feature suite.

With Paperpal you can get research advice, write and refine your work, rephrase and verify the writing, and ensure submission readiness, all in one place. Here’s how you can use Paperpal to develop the first draft of your methods section.  

  • Generate an outline: Input some details about your research to instantly generate an outline for your methods section 
  • Develop the section: Use the outline and suggested sentence templates to expand your ideas and develop the first draft.  
  • Paraphrase and trim: Get clear, concise academic text with paraphrasing that conveys your work effectively and word reduction to fix redundancies.
  • Choose the right words: Enhance text by choosing contextual synonyms based on how the words have been used in previously published work.
  • Check and verify text: Make sure the generated text showcases your methods correctly, has all the right citations, and is original and authentic.

You can repeat this process to develop each section of your research manuscript, including the title, abstract and keywords. Ready to write your research papers faster, better, and without the stress? Sign up for Paperpal and start writing today!

Frequently Asked Questions

Q1. What are the key components of research methodology?

A1. A good research methodology has the following key components:

  • Research design
  • Data collection procedures
  • Data analysis methods
  • Ethical considerations

Q2. Why is ethical consideration important in research methodology?

A2. Ethical consideration is important in research methodology to assure readers of the reliability and validity of the study. Researchers must clearly mention the ethical norms and standards followed during the conduct of the research and also mention if the research has been cleared by any institutional board. The following ten principles are central to ethical considerations [10]:

  • Participants should not be subjected to harm.
  • Respect for the dignity of participants should be prioritized.
  • Full consent should be obtained from participants before the study.
  • Participants’ privacy should be ensured.
  • Confidentiality of the research data should be ensured.
  • Anonymity of individuals and organizations participating in the research should be maintained.
  • The aims and objectives of the research should not be exaggerated.
  • Affiliations, sources of funding, and any possible conflicts of interest should be declared.
  • Communication in relation to the research should be honest and transparent.
  • Misleading information and biased representation of primary data findings should be avoided.

Q3. What is the difference between methodology and method?

A3. Research methodology is different from a research method, although both terms are often confused. Research methods are the tools used to gather data, while the research methodology provides a framework for how research is planned, conducted, and analyzed. The latter guides researchers in making decisions about the most appropriate methods for their research. Research methods refer to the specific techniques, procedures, and tools used by researchers to collect, analyze, and interpret data, for instance surveys, questionnaires, interviews, etc.

Research methodology is, thus, an integral part of a research study. It helps ensure that you stay on track to meet your research objectives and answer your research questions using the most appropriate data collection and analysis tools based on your research design.

  1. Research methodologies. Pfeiffer Library website. Accessed August 15, 2023. https://library.tiffin.edu/researchmethodologies/whatareresearchmethodologies
  2. Types of research methodology. Eduvoice website. Accessed August 16, 2023. https://eduvoice.in/types-research-methodology/
  3. The basics of research methodology: A key to quality research. Voxco website. Accessed August 16, 2023. https://www.voxco.com/blog/what-is-research-methodology/
  4. Sampling methods: Types with examples. QuestionPro website. Accessed August 16, 2023. https://www.questionpro.com/blog/types-of-sampling-for-social-research/
  5. What is qualitative research? Methods, types, approaches, examples. Researcher.Life blog. Accessed August 15, 2023. https://researcher.life/blog/article/what-is-qualitative-research-methods-types-examples/
  6. What is quantitative research? Definition, methods, types, and examples. Researcher.Life blog. Accessed August 15, 2023. https://researcher.life/blog/article/what-is-quantitative-research-types-and-examples/
  7. Data analysis in research: Types & methods. QuestionPro website. Accessed August 16, 2023. https://www.questionpro.com/blog/data-analysis-in-research/#Data_analysis_in_qualitative_research
  8. Factors to consider while choosing the right research methodology. PhD Monster website. Accessed August 17, 2023. https://www.phdmonster.com/factors-to-consider-while-choosing-the-right-research-methodology/
  9. What is research methodology? Research and writing guides. Paperpile website. Accessed August 14, 2023. https://paperpile.com/g/what-is-research-methodology/
  10. Ethical considerations. Business research methodology website. Accessed August 17, 2023. https://research-methodology.net/research-methodology/ethical-considerations/

Research Methods | Definition, Types, Examples

Research methods are specific procedures for collecting and analysing data. Developing your research methods is an integral part of your research design. When planning your methods, there are two key decisions you will make.

First, decide how you will collect data. Your methods depend on what type of data you need to answer your research question:

  • Qualitative vs quantitative: Will your data take the form of words or numbers?
  • Primary vs secondary: Will you collect original data yourself, or will you use data that have already been collected by someone else?
  • Descriptive vs experimental: Will you take measurements of something as it is, or will you perform an experiment?

Second, decide how you will analyse the data.

  • For quantitative data, you can use statistical analysis methods to test relationships between variables.
  • For qualitative data, you can use methods such as thematic analysis to interpret patterns and meanings in the data.

Table of contents

  • Methods for collecting data
  • Examples of data collection methods
  • Methods for analysing data
  • Examples of data analysis methods
  • Frequently asked questions about methodology

Methods for collecting data

Data are the information that you collect for the purposes of answering your research question. The type of data you need depends on the aims of your research.

Qualitative vs quantitative data

Your choice of qualitative or quantitative data collection depends on the type of knowledge you want to develop.

For questions about ideas, experiences and meanings, or to study something that can’t be described numerically, collect qualitative data.

If you want to develop a more mechanistic understanding of a topic, or your research involves hypothesis testing, collect quantitative data.

You can also take a mixed methods approach, where you use both qualitative and quantitative research methods.

Primary vs secondary data

Primary data are any original information that you collect for the purposes of answering your research question (e.g. through surveys, observations and experiments). Secondary data are information that has already been collected by other researchers (e.g. in a government census or previous scientific studies).

If you are exploring a novel research question, you’ll probably need to collect primary data. But if you want to synthesise existing knowledge, analyse historical trends, or identify patterns on a large scale, secondary data might be a better choice.

Descriptive vs experimental data

In descriptive research, you collect data about your study subject without intervening. The validity of your research will depend on your sampling method.

In experimental research, you systematically intervene in a process and measure the outcome. The validity of your research will depend on your experimental design.

To conduct an experiment, you need to be able to vary your independent variable, precisely measure your dependent variable, and control for confounding variables. If it’s practically and ethically possible, this method is the best choice for answering questions about cause and effect.

Examples of data collection methods

Research methods for collecting data

Research method | Primary or secondary? | Qualitative or quantitative? | When to use
Experiment | Primary | Quantitative | To test cause-and-effect relationships.
Survey | Primary | Quantitative | To understand general characteristics of a population.
Interview/focus group | Primary | Qualitative | To gain more in-depth understanding of a topic.
Observation | Primary | Either | To understand how something occurs in its natural setting.
Literature review | Secondary | Either | To situate your research in an existing body of work, or to evaluate trends within a research topic.
Case study | Either | Either | To gain an in-depth understanding of a specific group or context, or when you don’t have the resources for a large study.

Methods for analysing data

Your data analysis methods will depend on the type of data you collect and how you prepare them for analysis.

Data can often be analysed both quantitatively and qualitatively. For example, survey responses could be analysed qualitatively by studying the meanings of responses or quantitatively by studying the frequencies of responses.

Qualitative analysis methods

Qualitative analysis is used to understand words, ideas, and experiences. You can use it to interpret data that were collected:

  • From open-ended survey and interview questions, literature reviews, case studies, and other sources that use text rather than numbers.
  • Using non-probability sampling methods.

Qualitative analysis tends to be quite flexible and relies on the researcher’s judgement, so you have to reflect carefully on your choices and assumptions.

Quantitative analysis methods

Quantitative analysis uses numbers and statistics to understand frequencies, averages and correlations (in descriptive studies) or cause-and-effect relationships (in experiments).

You can use quantitative analysis to interpret data that were collected either:

  • During an experiment.
  • Using probability sampling methods.

Because the data are collected and analysed in a statistically valid way, the results of quantitative analysis can be easily standardised and shared among researchers.

Examples of data analysis methods

Research methods for analysing data

Research method | Qualitative or quantitative? | When to use
Statistical analysis | Quantitative | To analyse data collected in a statistically valid manner (e.g. from experiments, surveys, and observations).
Meta-analysis | Quantitative | To statistically analyse the results of a large collection of studies. Can only be applied to studies that collected data in a statistically valid manner.
Thematic analysis | Qualitative | To analyse data collected from interviews, focus groups or textual sources. To understand general themes in the data and how they are communicated.
Content analysis | Either | To analyse large volumes of textual or visual data collected from surveys, literature reviews, or other sources. Can be quantitative (i.e. frequencies of words) or qualitative (i.e. meanings of words).

Frequently asked questions about methodology

Quantitative research deals with numbers and statistics, while qualitative research deals with words and meanings.

Quantitative methods allow you to test a hypothesis by systematically collecting and analysing data, while qualitative methods allow you to explore ideas and experiences in depth.

In mixed methods research, you use both qualitative and quantitative data collection and analysis methods to answer your research question.

A sample is a subset of individuals from a larger population. Sampling means selecting the group that you will actually collect data from in your research.

For example, if you are researching the opinions of students in your university, you could survey a sample of 100 students.

Statistical sampling allows you to test a hypothesis about the characteristics of a population. There are various sampling methods you can use to ensure that your sample is representative of the population as a whole.
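
The student-survey example can be sketched in Python as a simple random sample followed by an estimate of a population proportion. Everything specific here is an assumption made for illustration: the population of 5,000 students, the 60% who favour a proposal, the fixed seed, and the use of the normal approximation for the margin of error.

```python
import math
import random

rng = random.Random(42)   # fixed seed so the sketch is repeatable

# Hypothetical population: 5,000 students, 60% of whom favour a proposal (1 = in favour).
population = [1] * 3000 + [0] * 2000

sample = rng.sample(population, 100)    # simple random sample without replacement
p_hat = sum(sample) / len(sample)       # sample proportion in favour

# Approximate 95% margin of error using the normal approximation.
std_error = math.sqrt(p_hat * (1 - p_hat) / len(sample))
margin = 1.96 * std_error

print(f"Estimated share in favour: {p_hat:.2f} +/- {margin:.2f}")
```

With a sample of 100, the margin of error under this approximation is at most about 10 percentage points, which is one reason sample size matters when generalising to the whole population.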

The research methods you use depend on the type of data you need to answer your research question.

  • If you want to measure something or test a hypothesis, use quantitative methods. If you want to explore ideas, thoughts, and meanings, use qualitative methods.
  • If you want to analyse a large amount of readily available data, use secondary data. If you want data specific to your purposes with control over how they are generated, collect primary data.
  • If you want to establish cause-and-effect relationships between variables, use experimental methods. If you want to understand the characteristics of a research subject, use descriptive methods.

Methodology refers to the overarching strategy and rationale of your research project. It involves studying the methods used in your field and the theories or principles behind them, in order to develop an approach that matches your objectives.

Methods are the specific tools and procedures you use to collect and analyse data (e.g. experiments, surveys, and statistical tests).

In shorter scientific papers, where the aim is to report the findings of a specific study, you might simply describe what you did in a methods section.

In a longer or more complex research project, such as a thesis or dissertation, you will probably include a methodology section, where you explain your approach to answering the research questions and cite relevant sources to support your choice of methods.

Is this article helpful?

More interesting articles.

  • A Quick Guide to Experimental Design | 5 Steps & Examples
  • Between-Subjects Design | Examples, Pros & Cons
  • Case Study | Definition, Examples & Methods
  • Cluster Sampling | A Simple Step-by-Step Guide with Examples
  • Confounding Variables | Definition, Examples & Controls
  • Construct Validity | Definition, Types, & Examples
  • Content Analysis | A Step-by-Step Guide with Examples
  • Control Groups and Treatment Groups | Uses & Examples
  • Controlled Experiments | Methods & Examples of Control
  • Correlation vs Causation | Differences, Designs & Examples
  • Correlational Research | Guide, Design & Examples
  • Critical Discourse Analysis | Definition, Guide & Examples
  • Cross-Sectional Study | Definitions, Uses & Examples
  • Data Cleaning | A Guide with Examples & Steps
  • Data Collection Methods | Step-by-Step Guide & Examples
  • Descriptive Research Design | Definition, Methods & Examples
  • Doing Survey Research | A Step-by-Step Guide & Examples
  • Ethical Considerations in Research | Types & Examples
  • Explanatory Research | Definition, Guide, & Examples
  • Explanatory vs Response Variables | Definitions & Examples
  • Exploratory Research | Definition, Guide, & Examples
  • External Validity | Types, Threats & Examples
  • Extraneous Variables | Examples, Types, Controls
  • Face Validity | Guide with Definition & Examples
  • How to Do Thematic Analysis | Guide & Examples
  • How to Write a Strong Hypothesis | Guide & Examples
  • Inclusion and Exclusion Criteria | Examples & Definition
  • Independent vs Dependent Variables | Definition & Examples
  • Inductive Reasoning | Types, Examples, Explanation
  • Inductive vs Deductive Research Approach (with Examples)
  • Internal Validity | Definition, Threats & Examples
  • Internal vs External Validity | Understanding Differences & Examples
  • Longitudinal Study | Definition, Approaches & Examples
  • Mediator vs Moderator Variables | Differences & Examples
  • Mixed Methods Research | Definition, Guide, & Examples
  • Multistage Sampling | An Introductory Guide with Examples
  • Naturalistic Observation | Definition, Guide & Examples
  • Operationalisation | A Guide with Examples, Pros & Cons
  • Population vs Sample | Definitions, Differences & Examples
  • Primary Research | Definition, Types, & Examples
  • Qualitative vs Quantitative Research | Examples & Methods
  • Quasi-Experimental Design | Definition, Types & Examples
  • Questionnaire Design | Methods, Question Types & Examples
  • Random Assignment in Experiments | Introduction & Examples


What is research methodology?


The basics of research methodology

  • Why do you need a research methodology?
  • What needs to be included?
  • Why do you need to document your research method?
  • What are the different types of research instruments?
  • Qualitative / quantitative / mixed research methodologies
  • How do you choose the best research methodology for you?
  • Frequently asked questions about research methodology
  • Related articles

When you’re working on your first piece of academic research, there are many different things to focus on, and it can be overwhelming to stay on top of everything. This is especially true of budding or inexperienced researchers.

If you’ve never put together a research proposal before or find yourself in a position where you need to explain your research methodology decisions, there are a few things you need to be aware of.

Once you understand the ins and outs, handling academic research in the future will be less intimidating. We break down the basics below:

A research methodology encompasses the way in which you intend to carry out your research. This includes how you plan to tackle things like collection methods, statistical analysis, participant observations, and more.

You can think of your research methodology as being a formula. One part will be how you plan on putting your research into practice, and another will be why you feel this is the best way to approach it. Your research methodology is ultimately a methodical and systematic plan to resolve your research problem.

In short, you are explaining how you will take your idea and turn it into a study, which in turn will produce valid and reliable results that are in accordance with the aims and objectives of your research. This is true whether your paper plans to make use of qualitative methods or quantitative methods.

The purpose of a research methodology is to explain the reasoning behind your approach to your research - you'll need to support your collection methods, methods of analysis, and other key points of your work.

Think of it like writing a plan or an outline for what you intend to do.

When carrying out research, it can be easy to go off-track or depart from your standard methodology.

Tip: Having a methodology keeps you accountable and on track with your original aims and objectives, and gives you a suitable and sound plan to keep your project manageable, smooth, and effective.

With all that said, how do you write out your standard approach to a research methodology?

As a general plan, your methodology should include the following information:

  • Your research method.  You need to state whether you plan to use quantitative analysis, qualitative analysis, or mixed-method research methods. This will often be determined by what you hope to achieve with your research.
  • Explain your reasoning. Why are you taking this methodological approach? Why is this particular methodology the best way to answer your research problem and achieve your objectives?
  • Explain your instruments.  This will mainly be about your collection methods. There are various instruments you can use, such as interviews, physical surveys, and questionnaires. Your methodology will need to detail your reasoning in choosing a particular instrument for your research.
  • What will you do with your results?  How are you going to analyze the data once you have gathered it?
  • Advise your reader.  If there is anything in your research methodology that your reader might be unfamiliar with, you should explain it in more detail. For example, you should give any background information to your methods that might be relevant or provide your reasoning if you are conducting your research in a non-standard way.
  • How will your sampling process go?  What will your sampling procedure be and why? For example, if you will collect data by carrying out semi-structured or unstructured interviews, how will you choose your interviewees and how will you conduct the interviews themselves?
  • Any practical limitations?  You should discuss any limitations you foresee being an issue when you’re carrying out your research.

In any dissertation, thesis, or academic journal, you will always find a chapter dedicated to explaining the research methodology of the person who carried out the study, also referred to as the methodology section of the work.

A good research methodology will explain what you are going to do and why, while a poor methodology will lead to a messy or disorganized approach.

You should also be able to justify in this section your reasoning for why you intend to carry out your research in a particular way, especially if it might be a particularly unique method.

Having a sound methodology in place can also help you with the following:

  • When another researcher at a later date wishes to try and replicate your research, they will need your explanations and guidelines.
  • In the event that you receive any criticism or questioning on the research you carried out at a later point, you will be able to refer back to it and succinctly explain the how and why of your approach.
  • It provides you with a plan to follow throughout your research. When you are drafting your methodology approach, you need to be sure that the method you are using is the right one for your goal. This will help you with both explaining and understanding your method.
  • It affords you the opportunity to document from the outset what you intend to achieve with your research, from start to finish.

A research instrument is a tool you will use to help you collect, measure and analyze the data you use as part of your research.

The choice of research instrument will usually be yours to make as the researcher and will be whichever best suits your methodology.

There are many different research instruments you can use in collecting data for your research.

Generally, they can be grouped as follows:

  • Interviews (either as a group or one-on-one). You can carry out interviews in many different ways. For example, your interview can be structured, semi-structured, or unstructured. The difference between them is how formal the set of questions is that is asked of the interviewee. In a group interview, you may choose to ask the interviewees to give you their opinions or perceptions on certain topics.
  • Surveys (online or in-person). In survey research, you are posing questions in which you ask for a response from the person taking the survey. You may wish to have either free-answer questions such as essay-style questions, or you may wish to use closed questions such as multiple choice. You may even wish to make the survey a mixture of both.
  • Focus Groups.  Similar to the group interview above, you may wish to ask a focus group to discuss a particular topic or opinion while you make a note of the answers given.
  • Observations.  This is a good research instrument to use if you are looking into human behaviors. Different ways of researching this include studying the spontaneous behavior of participants in their everyday life, or something more structured. A structured observation is research conducted at a set time and place where researchers observe behavior as planned and agreed upon with participants.

These are the most common ways of carrying out research, but it is really dependent on your needs as a researcher and what approach you think is best to take.

It is also possible to combine a number of research instruments if this is necessary and appropriate in answering your research problem.

There are three different types of methodologies, and they are distinguished by whether they focus on words, numbers, or both.

Data type: Quantitative

What is it? This methodology focuses more on measuring and testing numerical data. What is the aim of quantitative research? When using this form of research, your objective will usually be to confirm something.

Methodology: Surveys, tests, existing databases. For example, you may use this type of methodology if you are looking to test a set of hypotheses.

Data type: Qualitative

What is it? Qualitative research is a process of collecting and analyzing both words and textual data. This form of research methodology is sometimes used where the aim and objective of the research are exploratory.

Methodology: Observations, interviews, focus groups. Exploratory research might be used where you are trying to understand human actions, e.g. for a study in the sociology or psychology field.

Data type: Mixed-method

What is it? A mixed-method approach combines both of the above approaches. The quantitative approach will provide you with some definitive facts and figures, whereas the qualitative methodology will provide your research with an interesting human aspect.

Methodology: Where you can use a mixed method of research, this can produce some incredibly interesting results. This is due to testing in a way that provides data that is both proven to be exact while also being exploratory at the same time.

➡️ Want to learn more about the differences between qualitative and quantitative research, and how to use both methods? Check out our guide for that!

If you've done your due diligence, you'll have an idea of which methodology approach is best suited to your research.

It’s likely that you will have carried out considerable reading and homework before you reach this point and you may have taken inspiration from other similar studies that have yielded good results.

Still, it is important to consider different options before setting your research in stone. Exploring different options available will help you to explain why the choice you ultimately make is preferable to other methods.

If proving your research problem requires you to gather large volumes of numerical data to test hypotheses, a quantitative research method is likely to provide you with the most usable results.

If instead you’re looking to try and learn more about people, and their perception of events, your methodology is more exploratory in nature and would therefore probably be better served using a qualitative research methodology.

It helps to always bring things back to the question: what do I want to achieve with my research?

Once you have conducted your research, you need to analyze it. Here are some helpful guides for qualitative data analysis:

➡️  How to do a content analysis

➡️  How to do a thematic analysis

➡️  How to do a rhetorical analysis

Research methodology refers to the techniques used to find and analyze information for a study, ensuring that the results are valid, reliable and that they address the research objective.

Data can typically be organized into four different categories or methods: observational, experimental, simulation, and derived.

Writing a methodology section is a process of introducing your methods and instruments, discussing your analysis, providing more background information, addressing your research limitations, and more.

Your research methodology section will need a clear research question and proposed research approach. You'll need to add a background, introduce your research question, write your methodology and add the works you cited during your data collecting phase.

The research methodology section of your study will indicate how valid your findings are and how well-informed your paper is. It also assists future researchers planning to use the same methodology, who want to cite your study or replicate it.


Research Methods: What are research methods?


Research methods are the strategies, processes or techniques utilized in the collection of data or evidence for analysis in order to uncover new information or create better understanding of a topic.

There are different types of research methods which use different tools for data collection.

Types of research

  • Qualitative Research
  • Quantitative Research
  • Mixed Methods Research

Qualitative Research gathers data about lived experiences, emotions or behaviours, and the meanings individuals attach to them. It assists in enabling researchers to gain a better understanding of complex concepts, social interactions or cultural phenomena. This type of research is useful in the exploration of how or why things have occurred, interpreting events and describing actions.

Quantitative Research gathers numerical data which can be ranked, measured or categorised through statistical analysis. It assists with uncovering patterns or relationships, and for making generalisations. This type of research is useful for finding out how many, how much, how often, or to what extent.

Mixed Methods Research integrates both Qualitative and Quantitative Research. It provides a holistic approach combining and analysing the statistical data with deeper contextualised insights. Using Mixed Methods also enables Triangulation, or verification, of the data from two or more sources.

Finding Mixed Methods research in the Databases 

“mixed model*” OR “mixed design*” OR “multiple method*” OR multimethod* OR triangulat*

Data collection tools

Techniques or tools used for gathering research data include:

Qualitative Techniques or Tools

  • Interviews: these can be structured, semi-structured or unstructured in-depth sessions with the researcher and a participant.
  • Focus groups: with several participants discussing a particular topic or a set of questions. Researchers can be facilitators or observers.
  • Observations: on-site, in-context or role-play options.
  • Document analysis: interrogation of correspondence (letters, diaries, emails etc) or reports.
  • Oral histories: remembrances or memories of experiences told to the researcher.

Quantitative Techniques or Tools

  • Surveys or questionnaires: which ask the same questions to large numbers of participants or use Likert scales which measure opinions as numerical data.
  • Observation: which can either involve counting the number of times a specific phenomenon occurs, or the coding of observational data in order to translate it into numbers.
  • Document screening: sourcing numerical data from financial reports or counting word occurrences.
  • Experiments: testing hypotheses in laboratories, testing cause and effect relationships, through field experiments, or via quasi- or natural experiments.

SAGE research methods

  • SAGE research methods online: Research methods tool to help researchers gather full-text resources, design research projects, understand a particular method and write up their research. Includes access to collections of video, business cases and eBooks.

  • Last Updated: Apr 18, 2024 11:16 AM
  • URL: https://libguides.newcastle.edu.au/researchmethods

Pfeiffer Library

Research Methodologies


What are research methodologies?

  • Quantitative research methodologies
  • Qualitative research methodologies
  • Mixed method methodologies
  • Selecting a methodology


According to Dawson (2019), a research methodology is the primary principle that will guide your research.  It becomes the general approach in conducting research on your topic and determines what research method you will use. A research methodology is different from a research method because research methods are the tools you use to gather your data (Dawson, 2019).  You must consider several issues when it comes to selecting the most appropriate methodology for your topic.  Issues might include research limitations and ethical dilemmas that might impact the quality of your research.  Descriptions of each type of methodology are included below.

Quantitative research methodologies are meant to create numeric statistics by using survey research to gather data (Dawson, 2019).  This approach tends to reach a larger number of people in a shorter amount of time.  According to Labaree (2020), there are three parts that make up a quantitative research methodology:

  • Sample population
  • How you will collect your data (this is the research method)
  • How you will analyze your data

Once you decide on a methodology, you can consider the method to which you will apply your methodology.

Qualitative research methodologies examine the behaviors, opinions, and experiences of individuals through methods of examination (Dawson, 2019).  This type of approach typically requires fewer participants, but more time with each participant.  It gives research subjects the opportunity to provide their own opinion on a certain topic.

Examples of Qualitative Research Methodologies

  • Action research:  This is when the researcher works with a group of people to improve something in a certain environment.  It is a common approach for research in organizational management, community development, education, and agriculture (Dawson, 2019).
  • Ethnography:  The process of organizing and describing cultural behaviors (Dawson, 2019).  Researchers may immerse themselves in another culture to get an "inside look" into the group they are studying.  It is often a time-consuming process because the researcher will do this for a long period of time.  This can also be called "participant observation" (Dawson, 2019).
  • Feminist research:  The goal of this methodology is to study topics that have been dominated by male test subjects.  It aims to study females and compare the results to previous studies that used male participants (Dawson, 2019).
  • Grounded theory:  The process of developing a theory to describe a phenomenon strictly through the data results collected in a study.  It is different from other research methodologies where the researcher attempts to prove a hypothesis that they create before collecting data.  Popular research methods for this approach include focus groups and interviews (Dawson, 2019).

A mixed methodology allows you to implement the strengths of both qualitative and quantitative research methods.  In some cases, you may find that your research project would benefit from this.  This approach is beneficial because it allows each methodology to counteract the weaknesses of the other (Dawson, 2019).  You should consider this option carefully, as it can make your research complicated if not planned correctly.

What should you do to decide on a research methodology?  The most logical way to determine your methodology is to decide whether you plan on conducting quantitative or qualitative research.  You also have the option to implement a mixed methods approach.  Looking back on Dawson's (2019) five "W's" on the previous page may help you with this process.  You should also look for key words that indicate a specific type of research methodology in your hypothesis or proposal.  Some words may lean more towards one methodology over another.

Quantitative Research Key Words

  • How satisfied

Qualitative Research Key Words

  • Experiences
  • Thoughts/Think
  • Relationship
  • Last Updated: Aug 2, 2022 2:36 PM
  • URL: https://library.tiffin.edu/researchmethodologies

Encyclopedia Britannica


data analysis

data analysis, the process of systematically collecting, cleaning, transforming, describing, modeling, and interpreting data, generally employing statistical techniques. Data analysis is an important part of both scientific research and business, where demand has grown in recent years for data-driven decision making. Data analysis techniques are used to gain useful insights from datasets, which can then be used to make operational decisions or guide future research. With the rise of “Big Data,” the storage of vast quantities of data in large databases and data warehouses, there is increasing need to apply data analysis techniques to generate insights about volumes of data too large to be manipulated by instruments of low information-processing capacity.

Datasets are collections of information. Generally, data and datasets are themselves collected to help answer questions, make decisions, or otherwise inform reasoning. The rise of information technology has led to the generation of vast amounts of data of many kinds, such as text, pictures, videos, personal information, account data, and metadata, the last of which provide information about other data. It is common for apps and websites to collect data about how their products are used or about the people using their platforms. Consequently, there is vastly more data being collected today than at any other time in human history. A single business may track billions of interactions with millions of consumers at hundreds of locations with thousands of employees and any number of products. Analyzing that volume of data is generally only possible using specialized computational and statistical techniques.

The desire for businesses to make the best use of their data has led to the development of the field of business intelligence, which covers a variety of tools and techniques that allow businesses to perform data analysis on the information they collect.

For data to be analyzed, it must first be collected and stored. Raw data must be processed into a format that can be used for analysis and be cleaned so that errors and inconsistencies are minimized. Data can be stored in many ways, but one of the most useful is in a database. A database is a collection of interrelated data organized so that certain records (collections of data related to a single entity) can be retrieved on the basis of various criteria. The most familiar kind of database is the relational database, which stores data in tables with rows that represent records (tuples) and columns that represent fields (attributes). A query is a command that retrieves a subset of the information in the database according to certain criteria. A query may retrieve only records that meet certain criteria, or it may join fields from records across multiple tables by use of a common field.
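As a rough illustration of the two kinds of query described above, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The table names, field names, and values are hypothetical and exist only for this example.

```python
import sqlite3

# Hypothetical tables and values, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 80.0), (12, 2, 15.0);
""")

# Retrieve only the records that meet a criterion (order total above 20).
large_orders = conn.execute(
    "SELECT order_id, total FROM orders WHERE total > ? ORDER BY order_id", (20,)
).fetchall()

# Join fields from two tables through the common field customer_id.
joined = conn.execute(
    """
    SELECT c.name, o.order_id, o.total
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
    """
).fetchall()
conn.close()

print(large_orders)
print(joined)
```

Each row returned by a query is a tuple, which matches the description of records as tuples and columns as fields.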

Frequently, data from many sources is collected into large archives of data called data warehouses. The process of moving data from its original sources (such as databases) to a centralized location (generally a data warehouse) is called ETL (which stands for extract, transform, and load).

  • The extraction step occurs when you identify and copy or export the desired data from its source, such as by running a database query to retrieve the desired records.
  • The transformation step is the process of cleaning the data so that they fit the analytical need for the data and the schema of the data warehouse. This may involve changing formats for certain fields, removing duplicate records, or renaming fields, among other processes.
  • Finally, the clean data are loaded into the data warehouse, where they may join vast amounts of historical data and data from other sources.
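The three ETL steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the source data, the field names, and the use of an in-memory SQLite database as a stand-in for a data warehouse are all assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical source export; the field names and values are made up.
RAW_CSV = """Name,Signup Date
 ada ,2024/01/05
Grace,2024/02/10
 ada ,2024/01/05
"""

def extract(text):
    """Extract: copy the desired records out of the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: tidy names, change the date format, drop duplicate records."""
    seen, clean = set(), []
    for row in rows:
        record = (row["Name"].strip().title(), row["Signup Date"].replace("/", "-"))
        if record not in seen:
            seen.add(record)
            clean.append(record)
    return clean

def load(records, conn):
    """Load: write the clean records into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS signups (name TEXT, signup_date TEXT)")
    conn.executemany("INSERT INTO signups VALUES (?, ?)", records)
    conn.commit()

warehouse = sqlite3.connect(":memory:")  # stand-in for a data warehouse
load(transform(extract(RAW_CSV)), warehouse)
loaded = warehouse.execute(
    "SELECT name, signup_date FROM signups ORDER BY name"
).fetchall()
print(loaded)
```

Keeping each step in its own function mirrors the extract, transform, load sequence and makes it easy to swap in a different source or destination.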

After data are effectively collected and cleaned, they can be analyzed with a variety of techniques. Analysis often begins with descriptive and exploratory data analysis. Descriptive data analysis uses statistics to organize and summarize data, making it easier to understand the broad qualities of the dataset. Exploratory data analysis looks for insights into the data that may arise from descriptions of distribution, central tendency, or variability for a single data field. Further relationships between data may become apparent by examining two fields together. Visualizations may be employed during analysis, such as histograms (graphs in which the length of a bar indicates a quantity) or stem-and-leaf plots (which divide data into buckets, or “stems,” with individual data points serving as “leaves” on the stem).
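To make the descriptive measures and the stem-and-leaf plot mentioned above more concrete, here is a small sketch using Python's standard statistics module. The list of scores is hypothetical, and the stem-and-leaf helper is one simple way to build such a plot for two-digit values, not a standard library feature.

```python
import statistics
from collections import defaultdict

# Hypothetical values for a single data field (e.g. test scores).
scores = [52, 57, 61, 64, 64, 68, 73, 75, 79, 91]

# Central tendency and variability for one field.
summary = {
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
    "stdev": statistics.stdev(scores),  # sample standard deviation
    "range": max(scores) - min(scores),
}

def stem_and_leaf(values):
    """Group values into stems (tens digit) with leaves (ones digit)."""
    stems = defaultdict(list)
    for value in sorted(values):
        stems[value // 10].append(value % 10)
    return dict(stems)

plot = stem_and_leaf(scores)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

The printed plot shows the shape of the distribution at a glance while still preserving every individual data point.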

Data analysis frequently goes beyond descriptive analysis to predictive analysis, making predictions about the future using predictive modeling techniques. Predictive modeling uses machine learning, regression analysis methods (which mathematically calculate the relationship between an independent variable and a dependent variable), and classification techniques to identify trends and relationships among variables. Predictive analysis may involve data mining, which is the process of discovering interesting or useful patterns in large volumes of information. Data mining often involves cluster analysis, which tries to find natural groupings within data, and anomaly detection, which detects instances in data that are unusual and stand out from other patterns. It may also look for rules within datasets, strong relationships among variables in the data.
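Two of the techniques named above can be illustrated with plain Python. The sketch below fits a simple least-squares regression line and flags anomalies with a z-score rule. Both data sets are hypothetical, and the z-score threshold of 2.0 is an arbitrary choice for this example; real projects typically rely on dedicated libraries and more careful methods.

```python
import statistics

# Hypothetical data: hours studied (independent) and test score (dependent).
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 61, 63, 70]

def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(hours, scores)
predicted = slope * 6 + intercept  # prediction for 6 hours of study

def flag_anomalies(values, threshold=2.0):
    """Return values whose z-score exceeds the (assumed) threshold."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    return [v for v in values if spread and abs(v - mean) / spread > threshold]

# Hypothetical daily visit counts with one unusual day.
daily_visits = [100, 102, 98, 101, 99, 103, 97, 180]
anomalies = flag_anomalies(daily_visits)

print(slope, intercept, predicted)
print(anomalies)
```

The regression half estimates how the dependent variable changes with the independent one, while the anomaly half picks out the instance that stands apart from the rest of the pattern.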

  • Open access
  • Published: 09 July 2024

Academic resilience in nursing students: a concept analysis

Yang Shen, Hanbo Feng & Xiaohan Li

BMC Nursing volume 23, Article number: 466 (2024)


Academic resilience is a crucial concept for nursing students to cope with academic challenges. Currently, there is significant variation in the description of the concept attributes of academic resilience among nursing students, which impedes the advancement of academic research. Therefore, it is essential to establish a clear definition of the concept of academic resilience for nursing students.

The purpose of this paper is to report the results of concept analysis of academic resilience of nursing students.

The Rodgers evolutionary concept analysis was employed to test the attributes, antecedents, consequences and related concepts of academic resilience of nursing students. Walker and Avant’s method was utilized to construct a model case and provide empirical referents.

The findings indicate that the attributes of nursing students’ academic resilience include self-efficacy, self-regulation and recovery, and the antecedents include internal factors and external environmental factors. The consequences include adaptability, career maturity, adversity quotient level, probability of academic success, a sense of belonging to school and low levels of psychological distress.

The systematic understanding of academic resilience among nursing students provides a pathway for nursing educators and students to enhance academic resilience, promote academic success, and establish a foundation for the training of more qualified nurses.


Introduction

Nursing is a scientific discipline that studies nursing theory and technology in the process of health promotion and disease prevention, with the aim of equipping students with knowledge, attitudes, and skills essential to the nursing profession [1]. Compared with students in other majors, nursing students often experience great pressure from unfamiliar clinical environments, the gap between professional theory and practice, unexpected emergencies, strained relationships with patients and their families, exposure to infectious diseases, as well as heavy workloads [2, 3]. The high levels of stress experienced by nursing students can negatively affect their overall well-being and academic performance [4, 5, 6], compromising their quality of life while increasing burnout rates [7]. Moreover, sustained pressure may impede critical thinking abilities, problem-solving skills, and decision-making capabilities among nursing students, thereby diminishing their motivation for learning and hindering academic achievement [4, 6, 8], and may ultimately lead them to fail to complete their professional learning. Students’ coping ability and academic success are closely related to their perception and handling of academic pressure. Therefore, it is particularly crucial to assist them in comprehending their academic pressure and difficulties and to continuously enhance nursing students’ psychological quality so that they can cope with setbacks and achieve successful adaptation [9, 10].

Resilience refers to experiencing significant traumatic events and difficulties and still achieving positive developmental outcomes [11]. It is the process of adapting to major stressors and negative factors, as well as the strength to recover from stressful life events or successfully cope with negative experiences [12, 13]. There is evidence that resilience can effectively buffer negative impacts on individuals and contribute to improved individual happiness and life satisfaction [14, 15, 16, 17]. It is essential for nursing students to possess a positive psychological quality in order to effectively cope with academic pressure [18, 19]. As resilience research continues to expand and evolve, its scope, focus, and application have diversified across different fields [10, 20].

Academic resilience is a relatively new concept. It is the manifestation of resilience in the field of education [3]. It can assist students in adapting to the demands of school and clinical settings, enabling them to overcome academic pressure [21]. Resilient students are able to maintain high academic achievement and perform well even when they are faced with the threat of failure or stressful situations [22]. Therefore, academic resilience is crucial for nursing students [23, 24, 25], and enhancing academic resilience may help students focus on their strengths and potentially reverse academic failure. Furthermore, it can also help more nursing students persist in their nursing education, thereby contributing much-needed nurses to the profession [3].

Academic resilience is a crucial concept for nursing students in managing academic challenges. However, there are significant differences in the depiction of the conceptual attributes of academic resilience among nursing students, which impedes the advancement of academic research [26, 27]. It is necessary to clarify the attributes and connotations of the concept in order to establish a foundation for planning academic resilience intervention measures for nursing students. This will ultimately improve the quality of nursing education and better meet the needs of students. Currently, there is no conceptual analysis of the academic resilience of nursing students. Therefore, the aim of this study is to explore the concept of academic resilience among nursing students.

In this study, Rodgers’ [ 28 ] evolutionary concept analysis method was applied to explore and clarify the concept of academic resilience of nursing students. The main steps were to determine the evolution, application, definition, attributes, antecedents and consequences of nursing students’ academic resilience, and to distinguish it from related concepts. Walker and Avant’s [ 29 ] method was employed to construct a model case and provide empirical referents. The researchers first read the included articles, focusing on the context of the concept, surrogate and related terms, attributes, antecedents, consequences and examples, and then analyzed the data collected.

Search strategy

The following databases were searched: PubMed, Scopus, Embase, CINAHL, PsycINFO, Web of Science, ProQuest, ScienceDirect, CNKI, Wanfang Database and VIP Database. The full search strategy can be found in the supplemental material.

Selection criteria

Inclusion criteria: (1) The subjects were nursing students who were studying in schools or practicing in hospitals. (2) The research content included the concept, defining characteristics, attributes, antecedents, influencing factors and consequences of academic resilience. Exclusion criteria: (1) The full text could not be obtained. (2) Repeated reports. (3) Literature in languages other than Chinese or English.

Literature screening

The publications were imported into EndNote 20.0 software for de-duplication, and the articles were then selected independently by two researchers according to the inclusion and exclusion criteria. Any disagreement was resolved through discussion between the two researchers or by inviting a third researcher to join the discussion, after which the final set of included articles was determined.

Search results

A total of 918 records were obtained, comprising 741 English papers and 177 Chinese papers. After screening and exclusion according to the selection criteria, 22 articles were finally included. See Fig.  1 for the PRISMA diagram.

The 22 articles included in this study were published between 2015 and 2023 and were conducted in 8 countries: 7 in China, 5 in South Korea, 3 in the United Kingdom, 3 in Iran, 1 in Turkey, 1 in Saudi Arabia, 1 in the Philippines, and 1 in Indonesia. Most studies were cross-sectional (70%). Sample sizes ranged from 106 to 1339 nursing students in the quantitative studies (86%) and from 13 to 19 nursing students in the qualitative studies (9%).
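The counts reported above can be cross-checked with a short tally. This is only a minimal sketch: the record totals and per-country figures are taken from the text, while the variable names are illustrative and the number of removed records is derived by subtraction, because the text does not break it down by screening stage.

```python
# Record counts reported in the search results (from the text).
english_records = 741
chinese_records = 177
included = 22

# Total records identified across all databases.
identified = english_records + chinese_records

# Records removed at any stage (duplicates, screening, eligibility).
# Derived by subtraction; the text gives no per-stage breakdown.
removed = identified - included

# Number of included studies per country, as listed in the text.
by_country = {
    "China": 7,
    "South Korea": 5,
    "United Kingdom": 3,
    "Iran": 3,
    "Turkey": 1,
    "Saudi Arabia": 1,
    "Philippines": 1,
    "Indonesia": 1,
}

# The country tally should account for every included study.
assert sum(by_country.values()) == included

print(f"identified={identified}, removed={removed}, included={included}")
print(f"countries={len(by_country)}, inclusion rate={included / identified:.1%}")
```

A check of this kind is a quick way to confirm that the figures in the text and in the PRISMA diagram agree with one another.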

Fig. 1 PRISMA diagram of literature search

Conceptual evolution

The development of academic resilience is based on resilience theory [ 30 ]. Due to researchers’ varying perspectives, several different concepts have emerged in the field of education after the introduction of the concept of resilience. Wang et al. [ 31 ] first proposed this concept and defined it as follows: despite early traits, conditions and experiences creating an unfavorable environment, there is an increased probability of success in school. Based on previous literature, some researchers [ 32 ] have concluded that traditional academic resilience research has primarily focused on students who have suffered severe life experiences (such as changes in their living environment, illness, parental divorce, poverty, discrimination, etc.) during their early years. Despite enduring such adversity for an extended period of time, these students still managed to achieve good academic performance. The significance of academic resilience in the face of recent risk factors (such as unsatisfactory academic performance, academic pressure, etc.) was ignored. Martin et al. [ 22 ] pointed out that academic pressure is a prevalent issue in the daily learning of any school-age student. Every student will encounter recent risk factors such as academic setbacks and pressure. Therefore, research on academic resilience should shift from the minority and special groups studied previously to ordinary students. The academic resilience of this group refers to students’ ability to adjust in time after encountering academic setbacks or challenges (including poor academic performance, high levels of academic pressure, etc.) during their daily learning process, while still achieving good academic outcomes [ 33 ].

Application of concepts

The application of academic resilience in the nursing field has largely adopted the definition of academic resilience provided by Martin et al. [ 33 ]. This focuses on the role of academic resilience in the proximal influencing factors (the academic pressure perceived by nursing students in their daily learning process). In the field of nursing, these proximal influencing factors generally include adapting to unfamiliar clinical practice environments and interpersonal relationships, feelings of helplessness caused by the gap between theory and practice [ 34 ], fear of infectious diseases, and heavy workloads [ 35 ]. Jeong et al. [ 36 ] believe that academic resilience can serve as a resource for nursing students to protect their mental health from the impact of stress. It is also the ability to attain high academic achievement and sustain strong motivation and enthusiasm for university life even under stressful conditions. Li et al. [ 10 ] argue that academic resilience is specific to the academic situation, referring to students’ ability to overcome setbacks and challenges in their schoolwork. Yang et al. [ 37 ] define academic resilience as the application of resilience within the field of education. In the context of education (usually referring to schools), students are able to effectively adapt to and manage various pressures encountered during the learning process, such as setbacks and failures. This crucial ability enables them to achieve positive learning outcomes. Wang et al. [ 38 ] propose that academic resilience refers to students’ ability to recover from stressful academic situations, enabling them to effectively navigate the challenges associated with the academic achievement process. It is defined as an individual’s capacity to confidently handle, adapt to, or manage important sources of interaction in a stressful learning environment and transform negative clinical learning experiences into positive evolving outcomes. Luo et al. [ 39 ] believe that academic resilience means that nursing students can leverage personal strengths and mobilize external resources when encountering academic difficulties, ultimately leading to a positive adaptation process and successful outcomes. Shi et al. [ 39 ] define academic resilience as the ability of students to successfully cope with typical academic setbacks in daily learning activities at school, reflecting their perseverance and learning capability. According to Shahidi et al. [ 40 ], academic resilience is a crucial quality of nursing students, enabling them to overcome academic pressure and adapt to the demands of both learning and clinical practice.

In summary, after conducting in-depth research on academic resilience of nursing students, researchers have presented various viewpoints. However, an exact and comprehensive definition has not yet been established.

Definition of nursing students’ academic resilience

After conducting a thorough review and analysis of the literature, this study found that the academic resilience of nursing students is affected by a complex interaction of various factors, including internal factors of nursing students and external environmental factors. Through the analysis of the attributes, antecedents and consequences of nursing students’ academic resilience, the following definition was obtained: when faced with academic challenges, nursing students maintain a belief in their capacity to overcome these obstacles. They proactively regulate their cognition and behavior and leverage their personal and external resources to adapt effectively to both educational and clinical settings, ultimately achieving positive learning outcomes.

Attributes are a set of features or components of a concept. Clarifying the defining attributes of a concept helps to deepen the understanding of the concept, which is an important aspect of concept analysis [ 41 ]. Through literature analysis and induction, this study summarized three core attributes of nursing students’ academic resilience, including self-efficacy, self-regulation and recovery.

Self-efficacy.

Self-efficacy is confidence and belief [ 42 ], which refers to an individual’s belief in their ability to control outcomes, successfully complete tasks, and possess the necessary skills to accomplish those tasks [ 43 ]. It is one of the internal attributes and driving force of nursing students’ academic resilience. Improving the self-confidence of nursing students is the initial step of constructive learning, particularly when faced with academic challenges. Confidence enables students to seize learning opportunities, actively analyze and assess dilemmas they encounter, and positively impacts their ability to cope with academic pressure [ 38 , 44 ].

Self-regulation.

Self-regulation is one of the important elements of resilience [ 45 ], which refers to the regulation of thinking, emotion, behavior and attention through deliberate or specific mechanisms. This enables individuals to effectively manage their activities over time and across environmental changes [ 46 ], as well as helping them to control emotions and actions under pressure. This attribute includes cognitive regulation and behavioral regulation. Cognitive regulation: When facing academic setbacks, nursing students can view setbacks positively through cognitive reconstruction, emotional adjustment, reframing of negative thoughts and re-attribution [ 47 ], thereby enhancing their self-efficacy to overcome difficulties. Behavioral regulation: In the face of academic pressure or challenges, individuals have to adjust their original coping strategies, assess their own advantages, mobilize the resources within their reach, re-integrate personal and social resources, and make the most of all available resources to overcome academic difficulties [ 48 , 49 ].

Recovery is defined as the outcome of becoming well again after an illness or injury, according to the Oxford Dictionary. As one attribute of academic resilience, it refers to the outcome that nursing students rebound from adversity, and return to the right track or enter a new balance in this paper [ 38 , 44 ].

Antecedents

Antecedents refer to the events or situations that should exist prior to the development of a concept [ 41 ]. Based on Kumpfer’s psychological resilience model [ 49 ], a review of previous studies has identified two categories of antecedents for nursing students’ academic resilience: individual internal factors and external factors.

Studies have indicated that the academic resilience of nursing students is influenced by various internal factors. (1) Sociodemographic factors: Research has demonstrated that as age [ 40 ] and grade [ 39 , 50 , 51 ] levels increase, along with a higher educational background among nursing students [ 52 ], there is a corresponding enhancement in academic resilience. Additionally, studies have revealed that female nursing students exhibit greater academic resilience compared to male students [ 3 , 50 ], while rural students tend to demonstrate higher levels of academic resilience than urban students [ 52 ]. Moreover, it has been found that non-only-child students generally display higher academic resilience when compared to only-child students [ 53 ]. (2) Spiritual factors: Studies have shown that an optimistic attitude [ 44 , 54 ], high levels of mindfulness [ 38 , 51 ], strong self-confidence [ 44 ], effective self-control ability [ 37 , 44 ], high learning engagement [ 55 ] and strong religious coping ability [ 56 ] are conducive to the development of academic resilience in nursing students. Furthermore, nursing students who independently choose to study nursing as their major demonstrate higher academic resilience than those recommended by relatives and friends or those who are transferred into the major [ 52 ]. Additionally, nursing students with role models exhibit higher academic resilience than those without role models [ 3 ]. (3) Emotional factors: Research has indicated that higher levels of emotional intelligence [ 44 , 54 ] are linked to increased academic resilience among nursing students. In addition, when faced with academic setbacks, nursing students may employ different coping styles [ 36 ], which can also impact their academic resilience.
(4) Cognitive factors: Literature has shown that nursing students with high levels of academic performance [ 3 , 36 , 40 , 57 ], scholarship [ 52 ], self-directed learning ability [ 54 ] and skill level [ 38 ] also have high academic resilience. (5) Physical factors: Nursing students with good subjective health status demonstrate higher academic resilience than those with poor subjective health status [ 54 ]. (6) Behavioral factors: Nursing students’ interpersonal communication ability [ 57 ] is also an influential factor in their academic resilience.

The level of nursing students’ academic resilience is not only affected by internal factors of individuals, but also constrained by external factors. (1) School factors: Studies have indicated that high pressure in school life [ 54 ] and low professional satisfaction [ 3 , 36 , 54 ] are hindering factors for nursing students’ academic resilience. Moreover, a positive clinical work atmosphere [ 38 ], strong interpersonal relationships at school [ 3 ], encouragement and support from teachers [ 38 , 44 ], as well as support from patients and their families [ 38 ] and recognition [ 38 ] during clinical practicum were promoting factors of academic resilience. (2) Family factors: Literature has shown that high levels of family relationship satisfaction [ 3 ], an effective family communication mode [ 57 ] and strong family support [ 38 ] contribute to the improvement of academic resilience. (3) Peer group factors: Peer support [ 38 , 58 ] and good interpersonal relationships with peers [ 3 ] can also promote the enhancement of nursing students’ academic resilience.

Consequences

Consequences refer to the events or situations caused by the concept [ 41 ]. Studies [ 21 , 38 , 39 , 52 , 55 , 59 ] have indicated that academic resilience plays a significant role in enhancing nursing students’ adaptability during the school period, career maturity, adversity quotient level, probability of academic success and a sense of belonging to school. Additionally, it has been found to reduce the degree of psychological distress experienced by nursing students. “Career maturity” refers to the extent to which individuals are able to accomplish tasks that are coordinated with their stage of career development [ 52 ]. “Adversity quotient level” refers to an individual’s ability to cope with setbacks and adversity [ 39 ].

Related concepts

Related concepts refer to words that share commonalities with concepts but do not possess identical characteristics. Academic buoyancy refers to a student’s capacity to successfully cope with the academic setbacks and challenges typical of school life. Unlike academic resilience, academic buoyancy is more closely related to how a student handles the everyday stressors commonly experienced by students, such as tight deadlines, challenging assignments, exam stress, and unexpected or persistent low grades. However, resilience plays a role when students suffer from long-term poor performance, overwhelming anxiety that they cannot bear, truancy, and significant setbacks due to dissatisfaction [ 60 ]. Academic hardiness is defined as the ability to prepare oneself to confront academic problems. It comprises three components: control (the ability to manage different life situations), commitment (willingness to engage) and challenge (the ability to understand that changes in life are normal), emphasizing individuals’ endurance in the face of difficulties [ 61 , 62 ].

A model case helps readers to better understand and identify the connotation of a concept [ 41 ]. Although Rodgers does not suggest the creation of a model case, we used Walker and Avant’s [ 29 ] method to construct the following model case in order to better understand the concept.

Lucy, 22 years old, is a member of a family consisting of four members (father, mother and brother), with very harmonious family relations. As a junior student, Lucy has a passion for medicine and has taken the initiative to study nursing at university in pursuit of a bachelor’s degree. After three years of theoretical study, she entered a large hospital to begin clinical practice and complete the final stage of her undergraduate education. Lucy’s first internship took place in the emergency ward of a hospital. Faced with an unfamiliar clinical environment, heavy medical workloads, anxious patients and their families, as well as sudden situations (such as the rescue of patients and unexpected deaths, etc.), Lucy experienced significant pressure. Due to the gap between theoretical knowledge and clinical practice and inexperienced nursing skills, Lucy was afraid of providing nursing interventions for patients and making mistakes. Every day when she went to the hospital for her internship, Lucy felt extremely anxious. During conversations with her family members, Lucy confided in them about her troubles. Her family listened patiently to Lucy’s difficulties and offered positive affirmation and encouragement. They believed that Lucy would overcome the current difficulties and become a qualified nurse. With the encouragement of her family, Lucy recalled her initial decision to pursue a career in nursing voluntarily, aspiring to be someone who could alleviate patients’ suffering like Florence Nightingale. She then adjusted her mindset and earnestly contemplated how to confront and surmount the present difficulties with a positive outlook. She firmly believed that she could triumph over the current predicament through her diligent efforts. Lucy took the initiative to contact the senior students and inquire about their experiences in adapting to the clinical environment during their internship. 
Drawing on their experience, she utilized her solid theoretical knowledge from the initial three years of study for understanding new knowledge in clinical work. Throughout her internship, Lucy carefully observed the operations of her mentor, diligently recorded any challenges encountered, sought clarification when needed, and dedicated herself to repeated practice. The mentor also affirmed Lucy’s efforts and gave timely feedback to her operation, which significantly enhanced Lucy’s nursing skills and bolstered her confidence. In order to maintain a healthy physical state for clinical work, Lucy conscientiously balanced her study and rest time. Additionally, she incorporated mindfulness exercises into her routine to relax both body and mind while improving focus and reducing distress related to professional challenges. Through her efforts, Lucy earned recognition from patients and their families while experiencing a profound sense of accomplishment when performing patient care procedures. Finally, she successfully completed her internship tasks and obtained a bachelor’s degree in nursing.

Lucy’s case exemplifies the three attributes of nursing students’ academic resilience, namely self-efficacy, self-regulation and recovery. Upon entering the clinical practice stage, Lucy experienced significant pressure but maintained a belief in her ability to overcome challenges through diligent effort. She actively analyzed her current situation and seized learning opportunities, reflecting the first attribute of academic resilience, self-efficacy. Through cognitive reconstruction and emotional adjustment, Lucy approached her difficulties with a positive attitude and effectively utilized her strengths to navigate the challenges she faced. For instance, she had established a solid theoretical foundation, confided her difficulties to her family, sought assistance from her seniors and mentors, and maintained good health, thereby utilizing the internal and external resources available to her. By regulating her own cognition and behavior to overcome academic challenges, she demonstrated the second attribute of academic resilience, self-regulation. By actively analyzing the dilemma she faced, adjusting her own thoughts and behaviors, and integrating internal and external resources, Lucy successfully rebounded from adversity. She adapted to the high-pressure clinical environment and ultimately completed the internship task. This reflects the third attribute of academic resilience, recovery. The stress induced by clinical practice, Lucy’s optimistic attitude, high levels of mindfulness, independent choice of nursing major, role model, high levels of emotional intelligence, positive coping style, good academic achievement, healthy physical state, high self-directed learning ability, self-control ability, interpersonal communication ability and skill level, and the support from mentors, family members, patients and their family members, and peers all contribute to the enhancement of Lucy’s academic resilience.
The improvement of Lucy’s adaptability, career maturity, adversity quotient level and the reduction of her psychological distress during her clinical internship are the consequences of her academic resilience.

Empirical referents

In order to gain a better understanding of the concept, we used Walker and Avant’s [ 29 ] method to provide empirical referents. Currently, there are two primary tools for evaluating nursing students’ academic resilience. One is the Nursing Student Academic Resilience Inventory (NSARI) developed by Iranian scholar Ali-Abadi [ 27 ], which comprises a total of 24 items. The six dimensions include optimism (5 items), communication (4 items), self-esteem/evaluation (4 items), self-awareness (3 items), trustworthiness (4 items) and self-regulation (4 items). The scale employs a 5-point Likert scale ranging from completely agree (5 points) to completely disagree (1 point), with scores requiring conversion into standardized scores. Higher scores indicate higher levels of academic resilience. The intraclass correlation coefficient of the scale was 0.903, and the Cronbach’s α coefficient of the scale was 0.66–0.78, indicating good reliability and validity. This instrument was developed with undergraduate nursing students as the research subjects and is specifically used to measure the academic resilience of nursing students. It has been applied to undergraduate nursing students in Iran [ 40 ]. Additionally, Turkish scholar Enes Ucan et al. [ 63 ] conducted a cross-cultural adaptation of the tool.
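The NSARI structure described above can be illustrated with a small scoring sketch. Only the dimension names, item counts and the 1 to 5 response range come from the text. The item ordering and the linear 0 to 100 rescaling are assumptions made for illustration, since the text does not specify how raw scores are converted into standardized scores, and the function name `score_nsari` is hypothetical.

```python
# Item counts per NSARI dimension, as listed in the text (24 items in total).
NSARI_DIMENSIONS = {
    "optimism": 5,
    "communication": 4,
    "self-esteem/evaluation": 4,
    "self-awareness": 3,
    "trustworthiness": 4,
    "self-regulation": 4,
}


def score_nsari(responses):
    """Return (per-dimension sums, raw total, rescaled total).

    ASSUMPTIONS (not from the source): items are ordered dimension by
    dimension in the order above, and the "standardized" score is a
    simple linear rescaling of the raw total to the range 0-100.
    """
    n_items = sum(NSARI_DIMENSIONS.values())
    if len(responses) != n_items:
        raise ValueError(f"expected {n_items} responses, got {len(responses)}")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")

    # Sum consecutive slices of the response list, one slice per dimension.
    subscales = {}
    start = 0
    for name, count in NSARI_DIMENSIONS.items():
        subscales[name] = sum(responses[start:start + count])
        start += count

    raw = sum(responses)
    min_raw, max_raw = n_items * 1, n_items * 5
    rescaled = 100 * (raw - min_raw) / (max_raw - min_raw)
    return subscales, raw, rescaled


# Hypothetical respondent who answers "3" to every item.
subscales, raw, rescaled = score_nsari([3] * 24)
print(subscales, raw, rescaled)
```

Under these assumptions the raw total ranges from 24 to 120, and higher values correspond to higher academic resilience, as the text states.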

The other is the Academic Resilience Inventory for Nursing Students (ARINS) developed by Li Chengjie [ 10 ], a scholar from Taiwan, China. The scale contains a total of 15 items across three dimensions: cognitive maturity (5 items), emotional regulation (3 items) and help-seeking resources (7 items). A 5-point Likert scale ranging from “very inconsistent” (1 point) to “very consistent” (5 points) is employed, with higher scores indicating higher levels of academic resilience. The Cronbach’s α coefficient of the total scale was 0.929. The Cronbach’s α coefficients of the three dimensions of cognitive maturity, emotional regulation and help-seeking resources were 0.926, 0.801 and 0.855, respectively. The theoretical model exhibited a good fit with the observed data. The total scale demonstrated excellent reliability and validity. It was originally designed to measure the academic resilience of nursing students holding junior college degrees in Taiwan. Additionally, it has been applied to undergraduate nursing students in mainland China [ 37 , 55 ], showing good reliability and validity. However, further investigation is required to determine these tools’ applicability among nursing students at different educational levels.
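Both instruments report Cronbach’s α as their reliability index. For readers unfamiliar with the statistic, the sketch below computes it with the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores), where k is the number of items. The response data are invented purely for illustration and are not drawn from any of the cited studies. The function name is hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents' item scores.

    rows: one list per respondent, each holding the k item scores.
    Uses sample variances (n - 1 denominator) throughout.
    """
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least two items")
    if any(len(r) != k for r in rows):
        raise ValueError("every respondent must answer the same items")

    # Variance of each item across respondents.
    item_var_sum = sum(variance(col) for col in zip(*rows))
    # Variance of each respondent's total score.
    total_var = variance([sum(r) for r in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance")

    return k / (k - 1) * (1 - item_var_sum / total_var)


# Invented responses from three hypothetical respondents to two items.
toy_responses = [[1, 1], [2, 3], [3, 2]]
print(round(cronbach_alpha(toy_responses), 3))
```

Values closer to 1 indicate that the items vary together, which is why coefficients such as 0.929 for the ARINS total scale are read as high internal consistency.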

Discussion

The conceptual model of nursing students’ academic resilience.

Through a comprehensive analysis and synthesis of literature, this study has developed a conceptual model of nursing students’ academic resilience, as shown in Fig.  2 . In essence, the academic resilience of nursing students is shaped by the interplay between internal and external factors when they encounter setbacks or challenges in their academic pursuits. This interaction ultimately fosters the development of academic resilience among nursing students and yields positive outcomes. Nursing students with high levels of academic resilience are able to view academic setbacks positively, enhance their problem-solving skills, and effectively navigate through academic difficulties, thereby successfully completing their studies and moving into future nursing positions [ 42 ].

Fig. 2 Model of nursing students’ academic resilience

The significance of the role of nurses for the nursing students’ academic resilience

The definition and attributes of nursing students’ academic resilience proposed in this study emphasize the characteristics of nurses’ working environment and learning environment of nursing students. Nurses play a crucial role as providers of healthcare services, taking on important medical tasks with increasingly complex professional responsibilities. They possess multifaceted roles as healthcare providers, strategists, administrators, educators, and coordinators [ 64 , 65 ]. This necessitates not only a solid foundation of professional knowledge and skills but also profound critical thinking and judgment abilities, astute observational and analytical capabilities, decisive decision-making prowess, and exceptional interpersonal communication aptitude [ 66 ]. These qualities enable them to effectively manage busy and high-pressure clinical tasks within an increasingly tense doctor-patient relationship, ultimately providing satisfactory medical services to patients and their families. As the reserve army of the nursing team, nursing students are expected to quickly acquire these knowledge and abilities during their theoretical learning and clinical practice, thereby imposing significant academic pressure on them [ 67 ]. Simultaneously, it is these qualities that distinguish nursing from other specialties and healthcare disciplines. It is hoped that the results of this concept analysis will inspire further research into effective measures aimed at enhancing the academic resilience of nursing students, in order to improve their ability to cope with academic setbacks and increase retention rates of nurses.

Implications and recommendations

Currently, there are relatively few studies on the academic resilience of nursing students, and the existing literature shows significant discrepancies in how the conceptual attributes of nursing students’ academic resilience are defined. This study addresses this gap by providing a comprehensive conceptualization of the academic resilience of nursing students, combing through the related concepts, attributes, antecedents and consequences of nursing students’ academic resilience. The findings offer a theoretical foundation for future research.

Academic resilience plays a crucial role in regulating individuals’ pressure levels, enhancing their problem-solving abilities, and supporting nursing students in successfully completing their academic studies and transitioning into future nursing positions [ 42 ]. This study provides a conceptual framework for nursing schools and hospitals to design academic resilience training curricula and programs for nursing students. It is essential to thoroughly investigate these relationships based on the nursing context through appropriate measures. This process can clearly define the attributes, antecedents, and consequences of nursing students’ academic resilience. Nursing educators can begin by focusing on the antecedents and attributes of nursing students’ academic resilience. This can be achieved through targeted training programs or specialized courses aimed at improving emotional intelligence, interpersonal communication skills, self-directed learning capabilities, self-control, self-regulation, and self-efficacy among nursing students. In addition, nursing educators can also consider external environmental factors. This includes creating a conducive learning and working atmosphere for nursing students, providing training for professional teachers and clinical instructors, setting a professional example for nursing students, and offering timely guidance, feedback and encouragement to support students’ learning. Moreover, it is important to guide nursing students to think positively, helping them build self-confidence and enhance their academic confidence. Additionally, peer support programs can be designed. Existing research [ 68 ] suggests that interaction and practice among nursing students with similar knowledge, skills, and experience can promote effective learning. The experience of other students in internships and studies can help build confidence and resilience in academic work while reducing concerns about academic issues and clinical practice.
It is helpful for nursing students to improve their self-esteem and confidence in academic success which can enhance their resilience.

This study has discovered that academic resilience of nursing students is associated with several key outcomes, including academic success, career maturity, adversity quotient and school sense of belonging. Additionally, it has been found to have a positive impact on reducing psychological distress among nursing students. As a result, nursing educators can consider targeting academic resilience as an intervention strategy in order to indirectly improve these outcomes for students.

Limitations

Due to language limitations, only English and Chinese literature were included in this study. Although the relevant literature was searched as comprehensively and scientifically as possible, grey literature sources were not included. In addition, due to limitations of some database resources, 14% of the literature was not obtained in full text. Nevertheless, this study explored the concept, attributes, antecedents and consequences of nursing students’ academic resilience through the obtained studies and achieved the research purpose.

This study applied Rodgers’ evolutionary concept analysis method to clarify the conceptual attributes, antecedents, consequences and related concepts of academic resilience among nursing students, and used a model case to help readers understand the concept. The clear definition of nursing students’ academic resilience is beneficial for both nursing educators and nursing students themselves. It allows them to more accurately assess the level of academic resilience and to explore effective intervention measures that improve academic resilience and increase the probability of academic success of nursing students.

Data availability

All data generated or analyzed during this study is included in this published article.

Li XH, Shang SM. The fundamentals of nursing. Beijing: People’s Med Press,7 ed,2022.

Ahmed W, Mohammed B. Nursing students’ stress and coping strategies during clinical training in KSA. J TAIBAH UNIV MED SC. 2019;14(2):116–22.

Hwang E, Shin S. Characteristics of nursing students with high levels of academic resilience: a cross-sectional study. NURS EDUC TODAY. 2018;71:54–9.

Spurr S, Walker K, Squires V, Redl N. Examining nursing students’ wellness and resilience: an exploratory study. NURSE EDUC PRACT. 2021;51:102978.

Alatawi A, Gonzales AG, Alatawi Y, Albalawi M, Alatawi NAS, Hermas NB, Alshehri A. Shikah, Alanzi: stress and coping strategies of the nursing students in the University of Tabuk, Saudi Arabia. Int J Nurs Health Care Res 2020.

McCarthy B, Trace A, O Donovan M, Brady-Nevin C, Murphy M, O’Shea M, O’Regan P. Nursing and midwifery students’ stress and coping during their undergraduate education programmes: an integrative review. NURS EDUC TODAY. 2018;61:197–209.

Gurková E, Zeleníková R. Nursing students’ perceived stress, coping strategies, health and supervisory approaches in clinical practice: a Slovak and Czech perspective. NURS EDUC TODAY. 2018;65:4–10.

Shin S, Hwang E. The effects of clinical practice stress and resilience on nursing students’ academic burnout. Korean Med Educ Rev. 2020;22:115–21.

Leslie K, Brown K, Aiken J. Perceived academic-related sources of stress among graduate nursing students in a Jamaican University. NURSE EDUC PRACT. 2021;53:103088.

Li CC, Wei CF, Tung YY. [Development and validation of the academic resilience inventory for nursing students in Taiwan]. Hu Li Za Zhi. 2017;64(5):30–40.

PubMed   Google Scholar  

Wright MO, Masten AS. Resilience processes in development: fostering positive adaptation in the context of Adversity. New York, NY, US: Kluwer Academic/Plenum; 2005. pp. 17–37.

Walsh P, Owen PA, Mustafa N, Beech R. Learning and teaching approaches promoting resilience in student nurses: an integrated review of the literature. NURSE EDUC PRACT. 2020;45:102748.

Park S, Choi M, Kim S. Validation of the resilience scale for nurses (RSN). ARCH PSYCHIAT NURS. 2019;33(4):434–9.

Cheng C, Chua JH, Cheng LJ, Ang W, Lau Y. Global prevalence of resilience in health care professionals: a systematic review, meta-analysis and meta-regression. J NURS MANAGE. 2022;30(3):795–816.

Li ZS, Hasson F. Resilience, stress, and psychological well-being in nursing students: a systematic review. NURS EDUC TODAY. 2020;90:104440.

Zhao F, Guo Y, Suhonen R, Leino-Kilpi H. Subjective well-being and its association with peer caring and resilience among nursing vs medical students: a questionnaire study. NURS EDUC TODAY. 2016;37:108–13.

Serçe Ö, Çelik İS, Özkul B, Partlak GN. The predictive role of nursing students’ individual characteristics and psychological resilience in psychological distress. PERSPECT PSYCHIATR C. 2021;57(4):1656–63.

Xiaoyun C, Yu Z, Hao L, Yongmei LU. Psychological elasticity level of graduate nursing students and its influencing factors. Nurs J Chin People’s Liberation Army. 2019;36(12):21–4.

Stephens TM. Nursing student resilience: a concept clarification. NURS FORUM. 2013;48(2):125–33.

Tamannaeifar M, Shahmirzaei S. Prediction of academic resilience based on coping styles and personality traits. Pract Clin Psychol 2019:1–10.

Moon WH, Kwon MJ, Chung KS. Influence of academic resilience, self-efficacy and depression on college life adjustment in Korea’s nursing college students. Indian J Sci Technol 2015, 8(19).

Martin AJ, Marsh HW. Academic resilience and its psychological and educational correlates: a construct validity approach. PSYCHOL SCHOOLS. 2006;43(3):267–81.

Romano L, Angelini G, Consiglio P, Fiorilli C. Academic Resilience and Engagement in High School students: the mediating role of Perceived Teacher Emotional support. Eur J Invest Health Psychol Educ. 2021;11(2):334–44.

Simon C. The academic resilience scale (ARS-30): a New Multidimensional Construct measure. FRONT PSYCHOL 2016, 7(1781).

Diffley DM, Duddle M. Fostering resilience in nursing students in the academic setting: a systematic review. J Nurs Educ. 2022;61 5:229–36.

Kotera Y, Cockerill V, Chircop J, Kaluzeviciute G, Dyson S. Predicting self-compassion in UK nursing students: relationships with resilience, engagement, motivation, and mental wellbeing. NURSE EDUC PRACT. 2021;51:102989.

Ali-Abadi T, Ebadi A, Sharif NH, Soleimani M, Ghods AA. Development and psychometric properties of the nursing Student Academic Resilience Inventory (NSARI): a mixed-method study. PLoS ONE. 2021;16(6):e252473.

Tofthagen R, Fagerstrøm LM. Rodgers’ evolutionary concept analysis–a valid method for developing knowledge in nursing science. SCAND J CARING SCI. 2010;24(Suppl 1):21–31.

Walker LO, Avant KC. Concept Analysis. In: Walker LO, Avant KC, editors. Strategies for theory construction in nursing. 5th ed. London: Pearson; 2014.

Zhao J, Li LY. Preliminary development of academic resilience scale for college students. Journal of Beijing Institute of Technology. J Beijing Inst Technol (Social Sci Edition). 2009;11(1):94–8.

Wang MC, Haertel GD, Walberg HJ. Educational resilience in inner cities. Hillsdale, NJ, US: Lawrence Erlbaum Associates, Inc; 1994. pp. 45–72.

Han YQ. Study on the relationship between academic resilience and academic achievement in high school students. Master dissertation. Shanxi University of Finance and Economics . 2016.

Martin AJ, Marsh HW. Academic buoyancy: towards an understanding of students’ everyday academic resilience. J SCHOOL PSYCHOL. 2008;46(1):53–83.

Whang S. The relationship between clinical stress, Self-Efficacy, and self-esteem of nursing College Students. J Korean Acad Soc Nurs Educ. 2006;12:205–13.

Farzaneh P, Naiemeh S, Mehrnoosh I, Hamid H. Relationship between Perceived stress with resilience among undergraduate nursing students. J Hayat. 2013;19:41–52.

Son HJ, Lee KE, Kim NS. Affecting factors on academic resilience of nursing students. Int J u- e-Service Sci Technol. 2015;8(11):231–40.

Li WD, Hu PX, Yang MF. Study on the correlation between academic resilience and self-control of nursing students. Chin J School Doctor 2019, 33(3).

Wang L, Lin C, Han C, Huang Y, Hsiao P, Chen L. Undergraduate nursing student academic resilience during medical surgical clinical practicum: a constructivist analysis of Taiwanese experience. J PROF NURS. 2021;37(3):521–8.

Luo SM, Li XY, Zhang J, Qiu YT, Li ST. Study on the correlation between academic resilience and inverse quotient level of undergraduate nursing students. Chin Gen Pract Nurs. 2022;20(12):1703–6.

Shahidi DE, Nobahar M, Raiesdana N, Yarahmadi S, Saberian M. Academic resilience, moral perfectionism, and self-compassion among undergraduate nursing students: a cross-sectional, multi-center study. J PROF NURS. 2023;46:39–44.

Rodgers BL. Concepts, analysis and the development of nursing knowledge: the evolutionary cycle. J ADV NURS. 1989;14(4):330–5.

Article   CAS   PubMed   Google Scholar  

Hughes V, Cologer S, Swoboda S, Rushton C. Strengthening internal resources to promote resilience among prelicensure nursing students. J PROF NURS. 2021;37(4):777–83.

Bandura A, Adams NE. Analysis of self-efficacy theory of behavioral change. Cogn THER RES. 1977;1(4):287–310.

Lees C, Keane P, Porritt B, Cleary JP. Exploring nursing students’ understanding and experiences of academic resilience. A qualitative study. TEACH LEARN NURS. 2023;18(2):276–80.

Sullivan PD, Bissett K, Cooper M, Dearholt SL, Mammen K, Parks J, Pulia K. Grace under fire: Surviving and thriving in nursing by cultivating resilience.; 2012.

Karoly P. MECHANISMS OF SELF-REGULATION: A VIEW SYSTEMS. In:1993; 1993.

He FX, Turnbull B, Kirshbaum MN, Phillips B, Klainin-Yobas P. Assessing stress, protective factors and psychological well-being among undergraduate nursing students. NURS EDUC TODAY. 2018;68:4–12.

Richardson GE. The metatheory of resilience and resiliency. J CLIN PSYCHOL. 2002;58(3):307–21.

Kumpfer KL. Factors and processes contributing to resilience: the resilience framework. Dordrecht, Netherlands: Kluwer Academic; 1999. pp. 179–224.

Berdida DJE, Grande RAN. Quality of life and academic resilience of Filipino nursing students during the COVID-19 pandemic: a cross-sectional study. Int J Nurs Educ Scholarsh 2021, 18(1).

P MAE: Resilience and mindfulness in nurse training on an undergraduate curriculum. PERSPECT PSYCHIATR C 2020, 57(3).

Li MY, Li H, Cao YD, Jin WJ. Study on the correlation between academic resilience and vocational maturity of nursing interns in private universities. Health Vocat Educ. 2021;39(15):113–5.

CAS   Google Scholar  

Li M, Yuan YM. Investigation on academic resilience of secondary vocational nursing students. Shandong Youth 2020(5):13, 16.

Hwang EH, Kim KH. Relationship between optimism, emotional intelligence, and academic resilience of nursing students: the mediating effect of self-directed learning competency. FRONT PUBLIC HEALTH. 2023;11:1182689.

Article   PubMed   PubMed Central   Google Scholar  

Wang Y, Shi K, Shen X, Zhao JH, Zhang T. Study on the chain mediating effects of professional identity and academic resilience. J Qiqihar Med Univ. 2022;43(15):1482–7.

Sajodin N, Wilandika A, Atikah A. The Relationship between Religious Coping and academic resilience in nursing students. Med J Malay. 2023;78(4):500–2.

Seo K, Kwon M. Study on the effects of interpersonal-communication competence and family communication patterns on academic resilience. Indian J Sci Technol 2016, 9(40).

Edwards M, Williams E, Akerman K. Promoting academic resilience through peer support in a new pre-registration nursing programme. Br J Nurs. 2022;31(22):1144–8.

Hasimi L, Ahmadi M, Hovyzian SA, Ahmadi A. Sense of coherence or resilience as predictors of psychological distress in nursing students during the COVID-19 pandemic. FRONT PUBLIC HEALTH. 2023;11:1233298.

Martin AJ, Marsh HW. Academic resilience and academic buoyancy: multidimensional and hierarchical conceptual framing of causes, correlates and cognate constructs. Oxf REV EDUC. 2009;35(3):353–70.

Biggs AT, Seech TR, Johnston SL, Russell DW. Psychological endurance: how grit, resilience, and related factors contribute to sustained effort despite adversity. J GEN PSYCHOL 2023:1–43.

Ashoori J. Investigation the role of social capital, social support and psychological hardiness in predicting life quality of elderly women.; 2016.

Ucan E, Avci D. Turkish adaptation of the nursing student academic resilience inventory: a validity and reliability study. NURS EDUC TODAY. 2023;126:105810.

Jiang DM, He LX. The role orientation of nurses in basic nursing. Chin Nurs Manage. 2010;10(3):13.

Hewitt SL, Mills JE, Hoare KJ, Sheridan NF. The process of nurses’ role negotiation in general practice: A grounded theory study. J ADV NURS 2023.

Hongye T, Youwei L, Zhenzhen S, Youqing P. Effects of competency-based training model in training for new nurses: a meta-analysis. Chin J Mod Nurs. 2019;25(12):1572–7.

Cheng W, Young P, Luk K. Moderating role of coping style on the relationship between stress and Psychological Well-being in Hong Kong nursing students. INT J ENV RES PUB HE 2022, 19(18).

Liang H, Wu K, Hung C, Wang Y, Peng N. Resilience enhancement among student nurses during clinical practices: a participatory action research study. NURS EDUC TODAY. 2019;75:22–7.

Download references

Acknowledgements

We would like to thank the lovely doctoral buddies for their support and help.

This paper received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Author information

Authors and affiliations.

School of Nursing, China Medical University, 77 Puhe Road, Shenbei New District, Shenyang, China

Yang Shen, Hanbo Feng & Xiaohan Li

Contributions

Yang Shen: Methodology, Data curation, Formal analysis, Investigation, Resources, Writing - original draft, Writing - review & editing. Hanbo Feng: Investigation, Data curation, Writing - review & editing. Xiaohan Li: Methodology, Supervision, Resources, Writing - review & editing.

Corresponding author

Correspondence to Xiaohan Li.

Ethics declarations

Ethics approval.

Not applicable.

Consent for publication

Competing interests.

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Shen, Y., Feng, H. & Li, X. Academic resilience in nursing students: a concept analysis. BMC Nurs 23, 466 (2024). https://doi.org/10.1186/s12912-024-02133-2

Received: 31 January 2024

Accepted: 28 June 2024

Published: 09 July 2024

DOI: https://doi.org/10.1186/s12912-024-02133-2

  • Nursing students
  • Academic resilience
  • Concept analysis
  • Nursing education

BMC Nursing

ISSN: 1472-6955


Research Method

Data Analysis – Process, Methods and Types

Data Analysis

Definition:

Data analysis refers to the process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, drawing conclusions, and supporting decision-making. It involves applying various statistical and computational techniques to interpret and derive insights from large datasets. The ultimate aim of data analysis is to convert raw data into actionable insights that can inform business decisions, scientific research, and other endeavors.

Data Analysis Process

The data analysis process typically follows these steps:

Define the Problem

The first step in data analysis is to clearly define the problem or question that needs to be answered. This involves identifying the purpose of the analysis, the data required, and the intended outcome.

Collect the Data

The next step is to collect the relevant data from various sources. This may involve collecting data from surveys, databases, or other sources. It is important to ensure that the data collected is accurate, complete, and relevant to the problem being analyzed.

Clean and Organize the Data

Once the data has been collected, it needs to be cleaned and organized. This involves removing any errors or inconsistencies in the data, filling in missing values, and ensuring that the data is in a format that can be easily analyzed.
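The cleaning step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a general-purpose routine: the records, the field names (`id`, `age`, `city`) and the choice of median imputation are all hypothetical, and a real project would choose its own rules for duplicates and missing values.

```python
from statistics import median

# Hypothetical raw survey records; None marks a missing value.
raw_records = [
    {"id": 1, "age": 34, "city": " london "},
    {"id": 2, "age": None, "city": "Paris"},
    {"id": 2, "age": None, "city": "Paris"},  # duplicate of the previous record
    {"id": 3, "age": 29, "city": "PARIS"},
    {"id": 4, "age": 41, "city": "London"},
]


def clean(records):
    """Drop duplicate ids, impute missing ages, and normalise city names."""
    seen_ids = set()
    unique = []
    for rec in records:
        if rec["id"] in seen_ids:
            continue  # keep only the first record for each id
        seen_ids.add(rec["id"])
        unique.append(dict(rec))  # copy so the input list is not mutated

    # Assumption: missing ages are filled with the median of the observed ages.
    observed_ages = [r["age"] for r in unique if r["age"] is not None]
    fill_value = median(observed_ages)

    for rec in unique:
        if rec["age"] is None:
            rec["age"] = fill_value
        # Fix inconsistent whitespace and capitalisation.
        rec["city"] = rec["city"].strip().title()
    return unique


cleaned = clean(raw_records)
```

Copying each record before changing it keeps the raw data intact, which makes it easier to audit what the cleaning step altered.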

Analyze the Data

The next step is to analyze the data using various statistical and analytical techniques. This may involve identifying patterns in the data, conducting statistical tests, or using machine learning algorithms to identify trends and insights.

Interpret the Results

After analyzing the data, the next step is to interpret the results. This involves drawing conclusions based on the analysis and identifying any significant findings or trends.

Communicate the Findings

Once the results have been interpreted, they need to be communicated to stakeholders. This may involve creating reports, visualizations, or presentations to effectively communicate the findings and recommendations.

Take Action

The final step in the data analysis process is to take action based on the findings. This may involve implementing new policies or procedures, making strategic decisions, or taking other actions based on the insights gained from the analysis.

Types of Data Analysis

Types of Data Analysis are as follows:

Descriptive Analysis

This type of analysis involves summarizing and describing the main characteristics of a dataset, such as the mean, median, mode, standard deviation, and range.
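As a small illustration, the summary statistics named above can be computed with Python's standard `statistics` module. The exam scores are made-up values used only for the example.

```python
import statistics

# Hypothetical exam scores used only for illustration.
scores = [72, 85, 85, 90, 68, 77, 85, 93]

summary = {
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
    "stdev": statistics.stdev(scores),  # sample standard deviation
    "range": max(scores) - min(scores),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```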

Inferential Analysis

This type of analysis involves making inferences about a population based on a sample. Inferential analysis can help determine whether a certain relationship or pattern observed in a sample is likely to be present in the entire population.
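One simple form of inference is a confidence interval for a population mean. The sketch below uses a normal approximation, which is an assumption that suits reasonably large samples; for a sample as small as the hypothetical one shown, a t-distribution would usually give a somewhat wider and more appropriate interval.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    centre = mean(sample)
    standard_error = stdev(sample) / sqrt(n)
    # Two-sided critical value, roughly 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return centre - z * standard_error, centre + z * standard_error


# Hypothetical measurements drawn from a larger population.
sample = [5.1, 4.9, 5.6, 5.8, 5.0, 5.3, 5.4, 4.7, 5.2, 5.5]
low, high = mean_confidence_interval(sample)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```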

Diagnostic Analysis

This type of analysis examines why something happened by investigating the causes behind patterns observed in the data. As part of this work, diagnostic analysis can help identify outliers, errors, missing data, or other anomalies in the dataset.
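One routine diagnostic check, flagging outliers, can be sketched with the interquartile range (IQR) rule. The threshold `k=1.5` is a common convention rather than a fixed standard, and the sensor readings are hypothetical.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)  # first, second and third quartiles
    spread = q3 - q1
    lower, upper = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < lower or v > upper]


# Hypothetical sensor readings with one suspicious value.
readings = [12, 13, 12, 14, 13, 15, 14, 13, 98, 12]
flagged = iqr_outliers(readings)
print(flagged)
```

A flagged value is only a candidate for investigation; whether it is an error or a genuine extreme observation still has to be decided from context.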

Predictive Analysis

This type of analysis involves using statistical models and algorithms to predict future outcomes or trends based on historical data. Predictive analysis can help businesses and organizations make informed decisions about the future.
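A minimal sketch of prediction from historical data is an ordinary least-squares trend line. The monthly sales figures are invented, and a straight line is assumed to be an adequate model purely for the sake of the example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical historical data: month number and units sold.
months = [1, 2, 3, 4, 5, 6]
sales = [110, 125, 138, 150, 166, 179]

slope, intercept = fit_line(months, sales)
forecast_month_7 = slope * 7 + intercept
print(f"Forecast for month 7: {forecast_month_7:.1f}")
```

Extrapolating beyond the observed range assumes that the historical trend continues, which should be checked before relying on the forecast.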

Prescriptive Analysis

This type of analysis involves recommending a course of action based on the results of previous analyses. Prescriptive analysis can help organizations make data-driven decisions about how to optimize their operations, products, or services.

Exploratory Analysis

This type of analysis involves exploring the relationships and patterns within a dataset to identify new insights and trends. Exploratory analysis is often used in the early stages of research or data analysis to generate hypotheses and identify areas for further investigation.

Data Analysis Methods

Data Analysis Methods are as follows:

Statistical Analysis

This method involves the use of mathematical models and statistical tools to analyze and interpret data. It includes measures of central tendency, correlation analysis, regression analysis, hypothesis testing, and more.

Machine Learning

This method involves the use of algorithms to identify patterns and relationships in data. It includes supervised and unsupervised learning, classification, clustering, and predictive modeling.

Data Mining

This method involves using statistical and machine learning techniques to extract information and insights from large and complex datasets.

Text Analysis

This method involves using natural language processing (NLP) techniques to analyze and interpret text data. It includes sentiment analysis, topic modeling, and entity recognition.

Network Analysis

This method involves analyzing the relationships and connections between entities in a network, such as social networks or computer networks. It includes social network analysis and graph theory.
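To make the idea concrete, the sketch below computes degree centrality, one of the simplest social network measures, for a small undirected friendship network. The names and connections are hypothetical, and the function assumes the edge list contains no self-loops.

```python
from collections import defaultdict

# Hypothetical undirected friendship network given as pairs of people.
edges = [("Ana", "Ben"), ("Ana", "Caro"), ("Ben", "Caro"), ("Caro", "Dev")]


def degree_centrality(edge_list):
    """Share of the other nodes that each node is directly connected to."""
    neighbours = defaultdict(set)
    for a, b in edge_list:
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(neighbours)
    return {node: len(adjacent) / (n - 1) for node, adjacent in neighbours.items()}


centrality = degree_centrality(edges)
most_connected = max(centrality, key=centrality.get)
print(most_connected, centrality[most_connected])
```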

Time Series Analysis

This method involves analyzing data collected over time to identify patterns and trends. It includes forecasting, decomposition, and smoothing techniques.
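Smoothing, for example, can be illustrated with a simple moving average that replaces each point with the mean of a sliding window. The daily visit counts and the window size of 3 are arbitrary choices for the example.

```python
def moving_average(series, window):
    """Simple moving average; returns len(series) - window + 1 values."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


# Hypothetical daily website visits over one week.
daily_visits = [120, 135, 128, 160, 155, 149, 170]
smoothed = moving_average(daily_visits, window=3)
print([round(v, 1) for v in smoothed])
```

A wider window gives a smoother curve but reacts more slowly to genuine changes in the underlying series.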

Spatial Analysis

This method involves analyzing geographic data to identify spatial patterns and relationships. It includes spatial statistics, spatial regression, and geospatial data visualization.

Data Visualization

This method involves using graphs, charts, and other visual representations to help communicate the findings of the analysis. It includes scatter plots, bar charts, heat maps, and interactive dashboards.

Qualitative Analysis

This method involves analyzing non-numeric data such as interviews, observations, and open-ended survey responses. It includes thematic analysis, content analysis, and grounded theory.

Multi-criteria Decision Analysis

This method involves weighing multiple, often conflicting, criteria and objectives to support decision-making. It includes techniques such as the analytic hierarchy process (AHP), TOPSIS, and ELECTRE.
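As an illustration of one of these techniques, the following is a compact sketch of TOPSIS, which ranks alternatives by their closeness to an ideal solution. The laptop data, criteria weights and benefit/cost labels are hypothetical, and the function assumes that no criterion column is all zeros and that the alternatives are not all identical.

```python
from math import sqrt


def topsis(matrix, weights, is_benefit):
    """Return a closeness score in [0, 1] for each alternative (row)."""
    n_criteria = len(weights)
    # Step 1: vector-normalise each criterion column, then apply the weights.
    norms = [sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_criteria)]
    weighted = [
        [weights[j] * row[j] / norms[j] for j in range(n_criteria)]
        for row in matrix
    ]
    # Step 2: ideal and anti-ideal values per criterion.
    columns = list(zip(*weighted))
    ideal = [max(c) if is_benefit[j] else min(c) for j, c in enumerate(columns)]
    anti_ideal = [min(c) if is_benefit[j] else max(c) for j, c in enumerate(columns)]
    # Step 3: relative closeness to the ideal solution.
    scores = []
    for row in weighted:
        d_best = sqrt(sum((v - b) ** 2 for v, b in zip(row, ideal)))
        d_worst = sqrt(sum((v - w) ** 2 for v, w in zip(row, anti_ideal)))
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Hypothetical laptops scored on price (cost), battery hours (benefit), weight in kg (cost).
alternatives = ["Laptop A", "Laptop B", "Laptop C"]
matrix = [[900, 10, 1.4], [700, 8, 1.8], [1200, 12, 1.2]]
weights = [0.5, 0.3, 0.2]
is_benefit = [False, True, False]

scores = topsis(matrix, weights, is_benefit)
best = alternatives[scores.index(max(scores))]
print(best, [round(s, 3) for s in scores])
```

The ranking depends heavily on the weights, so it is worth re-running the calculation with alternative weightings to see how stable the preferred option is.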

Data Analysis Tools

There are various data analysis tools available that can help with different aspects of data analysis. Below is a list of some commonly used data analysis tools:

  • Microsoft Excel: A widely used spreadsheet program that allows for data organization, analysis, and visualization.
  • SQL: A query language used to manage and manipulate data in relational databases.
  • R: An open-source programming language and software environment for statistical computing and graphics.
  • Python: A general-purpose programming language that is widely used in data analysis and machine learning.
  • Tableau: Data visualization software that allows for interactive and dynamic visualizations of data.
  • SAS: Statistical analysis software used for data management, analysis, and reporting.
  • SPSS: Statistical analysis software used for data analysis, reporting, and modeling.
  • MATLAB: A numerical computing environment that is widely used in scientific research and engineering.
  • RapidMiner: A data science platform that offers a wide range of data analysis and machine learning tools.

Applications of Data Analysis

Data analysis has numerous applications across various fields. Below are some examples of how data analysis is used in different fields:

  • Business: Data analysis is used to gain insights into customer behavior, market trends, and financial performance. This includes customer segmentation, sales forecasting, and market research.
  • Healthcare: Data analysis is used to identify patterns and trends in patient data, improve patient outcomes, and optimize healthcare operations. This includes clinical decision support, disease surveillance, and healthcare cost analysis.
  • Education: Data analysis is used to measure student performance, evaluate teaching effectiveness, and improve educational programs. This includes assessment analytics, learning analytics, and program evaluation.
  • Finance: Data analysis is used to monitor and evaluate financial performance, identify risks, and make investment decisions. This includes risk management, portfolio optimization, and fraud detection.
  • Government: Data analysis is used to inform policy-making, improve public services, and enhance public safety. This includes crime analysis, disaster response planning, and social welfare program evaluation.
  • Sports: Data analysis is used to gain insights into athlete performance, improve team strategy, and enhance fan engagement. This includes player evaluation, scouting analysis, and game strategy optimization.
  • Marketing: Data analysis is used to measure the effectiveness of marketing campaigns, understand customer behavior, and develop targeted marketing strategies. This includes customer segmentation, marketing attribution analysis, and social media analytics.
  • Environmental science: Data analysis is used to monitor and evaluate environmental conditions, assess the impact of human activities on the environment, and develop environmental policies. This includes climate modeling, ecological forecasting, and pollution monitoring.

When to Use Data Analysis

Data analysis is useful when you need to extract meaningful insights and information from large and complex datasets. It is a crucial step in the decision-making process, as it helps you understand the underlying patterns and relationships within the data, and identify potential areas for improvement or opportunities for growth.

Here are some specific scenarios where data analysis can be particularly helpful:

  • Problem-solving: When you encounter a problem or challenge, data analysis can help you identify the root cause and develop effective solutions.
  • Optimization: Data analysis can help you optimize processes, products, or services to increase efficiency, reduce costs, and improve overall performance.
  • Prediction: Data analysis can help you make predictions about future trends or outcomes, which can inform strategic planning and decision-making.
  • Performance evaluation: Data analysis can help you evaluate the performance of a process, product, or service to identify areas for improvement and potential opportunities for growth.
  • Risk assessment: Data analysis can help you assess and mitigate risks, whether they are financial, operational, or safety-related.
  • Market research: Data analysis can help you understand customer behavior and preferences, identify market trends, and develop effective marketing strategies.
  • Quality control: Data analysis can help you ensure product quality and customer satisfaction by identifying and addressing quality issues.

Purpose of Data Analysis

The primary purposes of data analysis can be summarized as follows:

  • To gain insights: Data analysis allows you to identify patterns and trends in data, which can provide valuable insights into the underlying factors that influence a particular phenomenon or process.
  • To inform decision-making: Data analysis can help you make informed decisions based on the information that is available. By analyzing data, you can identify potential risks, opportunities, and solutions to problems.
  • To improve performance: Data analysis can help you optimize processes, products, or services by identifying areas for improvement and potential opportunities for growth.
  • To measure progress: Data analysis can help you measure progress towards a specific goal or objective, allowing you to track performance over time and adjust your strategies accordingly.
  • To identify new opportunities: Data analysis can help you identify new opportunities for growth and innovation by identifying patterns and trends that may not have been visible before.

Examples of Data Analysis

Some Examples of Data Analysis are as follows:

  • Social Media Monitoring: Companies use data analysis to monitor social media activity in real time to understand their brand reputation, identify potential customer issues, and track competitors. By analyzing social media data, businesses can make informed decisions on product development, marketing strategies, and customer service.
  • Financial Trading: Financial traders use data analysis to make real-time decisions about buying and selling stocks, bonds, and other financial instruments. By analyzing real-time market data, traders can identify trends and patterns that help them make informed investment decisions.
  • Traffic Monitoring: Cities use data analysis to monitor traffic patterns and make real-time decisions about traffic management. By analyzing data from traffic cameras, sensors, and other sources, cities can identify congestion hotspots and make changes to improve traffic flow.
  • Healthcare Monitoring: Healthcare providers use data analysis to monitor patient health in real time. By analyzing data from wearable devices, electronic health records, and other sources, healthcare providers can identify potential health issues and provide timely interventions.
  • Online Advertising: Online advertisers use data analysis to make real-time decisions about advertising campaigns. By analyzing data on user behavior and ad performance, advertisers can make adjustments to their campaigns to improve their effectiveness.
  • Sports Analysis: Sports teams use data analysis to make real-time decisions about strategy and player performance. By analyzing data on player movement, ball position, and other variables, coaches can make informed decisions about substitutions, game strategy, and training regimens.
  • Energy Management: Energy companies use data analysis to monitor energy consumption in real time. By analyzing data on energy usage patterns, companies can identify opportunities to reduce energy consumption and improve efficiency.

Characteristics of Data Analysis

Characteristics of Data Analysis are as follows:

  • Objective: Data analysis should be objective and based on empirical evidence, rather than subjective assumptions or opinions.
  • Systematic: Data analysis should follow a systematic approach, using established methods and procedures for collecting, cleaning, and analyzing data.
  • Accurate: Data analysis should produce accurate results, free from errors and bias. Data should be validated and verified to ensure its quality.
  • Relevant: Data analysis should be relevant to the research question or problem being addressed. It should focus on the data that is most useful for answering the research question or solving the problem.
  • Comprehensive: Data analysis should be comprehensive and consider all relevant factors that may affect the research question or problem.
  • Timely: Data analysis should be conducted in a timely manner, so that the results are available when they are needed.
  • Reproducible: Data analysis should be reproducible, meaning that other researchers should be able to replicate the analysis using the same data and methods.
  • Communicable: Data analysis should be communicated clearly and effectively to stakeholders and other interested parties. The results should be presented in a way that is understandable and useful for decision-making.

Advantages of Data Analysis

Advantages of Data Analysis are as follows:

  • Better decision-making: Data analysis helps in making informed decisions based on facts and evidence, rather than intuition or guesswork.
  • Improved efficiency: Data analysis can identify inefficiencies and bottlenecks in business processes, allowing organizations to optimize their operations and reduce costs.
  • Increased accuracy: Data analysis helps to reduce errors and bias, providing more accurate and reliable information.
  • Better customer service: Data analysis can help organizations understand their customers better, allowing them to provide better customer service and improve customer satisfaction.
  • Competitive advantage: Data analysis can provide organizations with insights into their competitors, allowing them to identify areas where they can gain a competitive advantage.
  • Identification of trends and patterns: Data analysis can identify trends and patterns in data that may not be immediately apparent, helping organizations to make predictions and plan for the future.
  • Improved risk management: Data analysis can help organizations identify potential risks and take proactive steps to mitigate them.
  • Innovation: Data analysis can inspire innovation and new ideas by revealing new opportunities or previously unknown correlations in data.

Limitations of Data Analysis

  • Data quality: The quality of data can impact the accuracy and reliability of analysis results. If data is incomplete, inconsistent, or outdated, the analysis may not provide meaningful insights.
  • Limited scope: Data analysis is limited by the scope of the data available. If data is incomplete or does not capture all relevant factors, the analysis may not provide a complete picture.
  • Human error: Data analysis is often conducted by humans, and errors can occur in data collection, cleaning, and analysis.
  • Cost: Data analysis can be expensive, requiring specialized tools, software, and expertise.
  • Time-consuming: Data analysis can be time-consuming, especially when working with large datasets or conducting complex analyses.
  • Overreliance on data: Data analysis should be complemented with human intuition and expertise. Overreliance on data can lead to a lack of creativity and innovation.
  • Privacy concerns: Data analysis can raise privacy concerns if personal or sensitive information is used without proper consent or security measures.

About the author

Muhammad Hassan

Researcher, Academic Writer, Web developer

  • Open access
  • Published: 13 July 2024

A compartmental model for smoking dynamics in Italy: a pipeline for inference, validation, and forecasting under hypothetical scenarios

  • Alessio Lachi 1 , 2 ,
  • Cecilia Viscardi 1 , 3 ,
  • Giulia Cereda 1 , 3 ,
  • Giulia Carreras 4 &
  • Michela Baccini 1 , 3  

BMC Medical Research Methodology volume 24, Article number: 148 (2024)

We propose a compartmental model for investigating smoking dynamics in an Italian region (Tuscany). Calibrating the model on local data from 1993 to 2019, we estimate the probabilities of starting and quitting smoking and the probability of smoking relapse. Then, we forecast the evolution of smoking prevalence until 2043 and assess the impact on mortality in terms of attributable deaths. We introduce elements of novelty with respect to previous studies in this field, including a formal definition of the equations governing the model dynamics and a flexible modelling of smoking probabilities based on cubic regression splines. We estimate model parameters by defining a two-step procedure and quantify the sampling variability via a parametric bootstrap. We propose the implementation of cross-validation on a rolling basis and variance-based Global Sensitivity Analysis to check the robustness of the results and support our findings. Our results suggest a decrease in smoking prevalence among males and stability among females, over the next two decades. We estimate that, in 2023, 18% of deaths among males and 8% among females are due to smoking. We test the use of the model in assessing the impact on smoking prevalence and mortality of different tobacco control policies, including the tobacco-free generation ban recently introduced in New Zealand.

Smoking is a significant risk factor for many common chronic diseases, including cancer, cardiovascular, cerebrovascular and respiratory diseases, and diabetes, and it is a leading preventable cause of premature death [ 1 , 2 ]. Smoking also reduces length and quality of life [ 3 ] and contributes to health inequities [ 4 ]. The Global Burden of Disease study [ 5 ] reports that in 2019 smoking was responsible for around 8,709,000 deaths worldwide (15.4% of all deaths), 907,000 in Europe, and 96,000 in Italy.

The importance of Tobacco Control Policies (TCP) has been firmly established within the World Health Organization’s (WHO) Framework Convention on Tobacco Control (FCTC), an international treaty that came into force in 2005 and has been ratified by 182 countries. Specifically, tobacco control has been included as one of the global development goals, recognized as crucial and necessary to achieve a one-third reduction in premature mortality by 2030 [ 6 ].

Focusing on Italy, data from the Italian surveillance system PASSI (Progressi delle Aziende Sanitarie per la Salute in Italia) show that in 2021, 23.7% of Italians (27.2% of men and 20.2% of women) described themselves as current smokers [ 7 ]. Among adolescents, smoking prevalence has stalled in recent years, with a prevalence of current smokers between 27.3% and 32.4% among young people aged 13-16 years [ 8 , 9 ].

Dynamic simulation models are widely used to describe and project the evolution of smoking habits in the population over time and to estimate the impact of past and hypothetical future TCPs. Since the 2000s, several models have been proposed [ 10 , 11 , 12 , 13 ], some of which were developed within the Cancer Intervention and Surveillance Modelling Network (CISNET), a consortium of investigators funded by the National Cancer Institute that uses mathematical modelling to study the impact of cancer control interventions [ 14 ]. These models are mainly of two types: compartmental models and agent-based models. Compartmental models, starting from a baseline year, perform macro-simulations in which the population evolves through deaths, births and changes in smoking habits [ 10 , 11 , 12 ]. The SimSmoke model [ 15 ] is the most widely used compartmental model [ 16 ] and has been implemented for a large number of countries, including Italy [ 17 , 18 , 19 , 20 ]. Agent-based models, also called micro-simulation models, simulate individual life trajectories and interactions with a view to assessing their effects on the system as a whole [ 21 , 22 ].

In this paper, building on previous work [ 12 , 23 , 24 , 25 ], we developed a compartmental model that describes the evolution of smoking habits in Tuscany, a region of Central Italy, from 1993 to 2019, and forecasts them until 2043. The model assumes that, at each point in time, the population is divided into non-overlapping groups called compartments, defined according to smoking status (never, current, and former smokers), age and sex [ 26 ]. Transitions between compartments are described by simple probabilistic rules, and the evolution of the size of the compartments is governed by a system of differential equations.

While some of the transition parameters in the model were assumed to be fixed, we estimated the following via a two-step calibration: the age-specific probabilities of starting and quitting smoking, modelled flexibly through cubic regression splines [ 27 ]; the probability of smoking relapse, modelled as a nonlinear function of the time since quitting [ 28 ]; and the mortality rate. We calibrated the model on the observed prevalence of never, current, and former smokers for the years from 1993 to 2019, obtained from yearly local surveys.

Once we estimated the transition parameters, we predicted the prevalence of never, current, and former smokers in the regional population over time, and quantified the impact of smoking in terms of the number of smoking-attributable deaths (SAD) and population attributable fraction (PAF). With simple examples, we also illustrated the use of the compartmental model to predict the future impact of hypothetical interventions that act on the probabilities of starting and quitting smoking.

Compared to previous studies that dealt with the same problem, we aimed to present some methodological advances in both the modelling and the estimation strategies. First, building on a formal definition of the model equations, we addressed the problem of accounting for sampling variability and provided confidence intervals for the estimates of the parameters and compartment sizes. Because the likelihood function associated with the model is not available, we relied on a parametric bootstrap procedure [ 29 , 30 ]. We also introduced a flexible modelling of the probabilities of starting and stopping smoking, which are usually assumed to be constant, by allowing them to vary as functions of age. Moreover, we assessed the predictive performance of the model using cross-validation on a rolling basis. Finally, we assessed parameter identifiability through Global Sensitivity Analysis (GSA) [ 31 ].

The analyses relied on data from heterogeneous sources. We used data from the National Institute of Statistics (ISTAT) Multipurpose Surveys “Aspects of Daily Life” (AVQ) ( www.istat.it/it/archivio/91926 ), which every year collect fundamental information related to the daily life of individuals and families in Italy, enrolling about 25,000 families distributed in about 800 Italian municipalities of different population sizes. Specifically, we obtained from the ISTAT AVQ surveys an estimate of the distribution by smoking habit (never, current, and former smokers) of the population residing in Tuscany for each year from 1993 to 2019, separately for males and females and by age class (14-17, 18-19, 20-24, 25-34, 35-44, 45-54, 55-59, 60-64, 65-74, 75+). We obtained from the same surveys the smoking intensity distribution for current smokers, by sex and age class.

We used data from the ISTAT Multipurpose Surveys European Health Survey (EHIS) ( www.istat.it/it/archivio/167485 ), a survey on the main aspects of public health carried out every 5 years from 1980 in all member states of the European Union, to obtain an estimate of the smoking intensity distribution among former smokers, as well as information about time since smoking cessation, separately for males and females and by the same age classes reported above. In particular, we considered the surveys for 1994, 1999, 2004, and 2013.

We obtained the size of the Tuscany population on January 1st 1993 and January 1st 2005, by age and sex, from the ISTAT website ( www.istat.it ). From the same website, we got the mortality rates by age and sex and the number of new births in Tuscany for the period 1993-2019. The relative risks (RRs) of death for smokers and ex-smokers versus never smokers are those reported in the Appendix of [ 32 ].

Model specification

We specified a compartmental model for smoking habit dynamics in the population, which we call the Smoking Habits Compartmental (SHC) model. To present the SHC model adopted for the analysis more clearly, we first introduce a simpler version of it and then proceed step by step, adding elements of complexity.

The starting model assumes that at each time the alive population is divided into the following non-overlapping compartments: never ( N ), current ( C ), and former ( F ) smokers. We consider only cigarette smoking. Never smokers can become current smokers, current smokers can become former smokers, and former smokers may restart smoking (smoking relapse). The compartments C and F are further divided into sub-compartments denoted by \(C_i\) and \(F_i\) , where \(i\in \{l,m,h\}\) indicates the level of smoking intensity, corresponding to low (<10 cigarettes/day), medium ( \(\ge\) 10 and <20 cigarettes/day), and high ( \(\ge\) 20 cigarettes/day) smoking intensity, respectively. During their life, individuals can change their smoking status, but, for the sake of simplicity, we assume that they cannot change their level of smoking intensity. The model admits deaths and new births. From each compartment, subjects can transit to a deceased compartment denoted by the letter D and a subscript corresponding to the compartment of origin. New births ( \(\nu (t)\) is the number of new births at time t ) increase the size of the compartment N . Transitions of the individuals from a given compartment to another one determine flows regulated by the transition parameters, among which the rates of starting smoking ( \(\gamma _i^*\) ), stopping smoking ( \(\epsilon _i^*\) ), and relapsing into smoking after having stopped ( \(\eta _i^*\) ). Note that these rates can depend on the level of smoking intensity i . Death happens with different rates for never ( \(\delta ^*_N\) ), current ( \(\delta ^*_{C_i}\) ), and former ( \(\delta ^*_{F_i}\) ) smokers. For current and former smokers, the mortality rates may depend also on smoking intensity. This compartmental model, graphically represented in Fig.  1 , is expressed by the following system of differential equations for each \(i\in \{l,m,h\}\) :
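A system consistent with the transitions just described, written in the notation of the text, is sketched here. It is reconstructed from the verbal description, so the exact arrangement of terms in the published Eq. ( 1 ) is an assumption.

```latex
\begin{aligned}
\frac{dN(t)}{dt}       &= \nu(t) - \left(\gamma^* + \delta^*_N\right) N(t) \\
\frac{dC_i(t)}{dt}     &= \gamma^*_i N(t) + \eta^*_i F_i(t) - \left(\epsilon^*_i + \delta^*_{C_i}\right) C_i(t) \\
\frac{dF_i(t)}{dt}     &= \epsilon^*_i C_i(t) - \left(\eta^*_i + \delta^*_{F_i}\right) F_i(t) \\
\frac{dD_N(t)}{dt}     &= \delta^*_N N(t) \\
\frac{dD_{C_i}(t)}{dt} &= \delta^*_{C_i} C_i(t) \\
\frac{dD_{F_i}(t)}{dt} &= \delta^*_{F_i} F_i(t)
\end{aligned}
```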

where \(\gamma ^*=\gamma ^*_l+\gamma ^*_m+\gamma ^*_h\) is the overall transition rate from the status of never smoker to the status of current smoker. The initial conditions of the system, i.e. the sizes of the compartments at time 0, set to 1 January 1993, are \(N(0)=n_0\), \(D_N(0)=0\), \(F_i(0)=f^i_0\), \(C_i(0)=c^i_0\) and \(D_{C_i}(0)=D_{F_i}(0)=0\) \(\forall i \in \{l,m,h\}\), where \(n_0\) is the number of never smokers in the considered population on the first day of the study period, and \(f^i_0\) and \(c^i_0\) are the numbers of former and current smokers with smoking intensity \(i\).

figure 1

Smoking Habits Compartmental model in its simplest form

For computational reasons, it is convenient to discretise the system of differential equations in Eq. ( 1 ), assuming that the size of the compartments is constant during 1-year time steps. Hereafter, t will denote discrete time, with the year as a time-unit ( \(t\in \{1,...,T\}\) ), and we replace the system in Eq. ( 1 ) with a system of difference equations, where the annual probability of stopping smoking ( \(\epsilon _i\) ), and the annual probabilities of smoking relapse ( \(\eta _i\) ) are derived from the corresponding rates in Eq. ( 1 ), as well as the annual probabilities of death for never ( \(\delta _{N}\) ), current ( \(\delta _{C_i}\) ), and former ( \(\delta _{F_i}\) ) smokers. In particular, \(\delta _N=1-\exp (-\delta ^*_{N})\) , and \(\epsilon _i=1-\exp (-\epsilon _i^*)\) , \(\eta _i=1-\exp (-\eta _i^*)\) , \(\delta _{C_i}=1-\exp (-\delta ^*_{C_i})\) , \(\delta _{F_i}=1-\exp (-\delta ^*_{F_i})\) with \(i \in \{l,m,h\}\) . Regarding the probabilities of starting smoking for never smokers, the overall annual probability \(\gamma\) comes from the corresponding rate, \(\gamma =1-\exp (-\gamma ^*)\) , while \(\gamma _i=\pi _{C_i}\gamma\) , where \(\varvec{\pi }=(\pi _{C_l}, \pi _{C_m}, \pi _{C_h})\) is the distribution of the level of smoking intensity among the new current smokers. Notice that, if \(\lambda\) is the rate of occurrence of an event, the probability of experiencing at least one event in the time unit is \(1-\exp (-\lambda )\) . The resulting system of discretised equations for each \(i\in \{l,m,h\}\) and \(t\in \{1,..., T\}\) is:

where \(\nu (t)\) denotes the newborns in the year t . The initial conditions of the system coincide with those of the previous model in Eq. ( 1 ).
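The rate-to-probability conversion and a one-year update of the simple model can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function and variable names are hypothetical, all numerical values are made up, and the additive ordering of flows (everything computed from start-of-year compartment sizes) is an assumption about the form of the difference equations.

```python
import math

LEVELS = ("l", "m", "h")  # low, medium, high smoking intensity


def rate_to_prob(rate):
    """Probability of at least one event in one time unit, given a constant rate."""
    return 1.0 - math.exp(-rate)


def yearly_step(state, births, p):
    """Advance the simple never/current/former model by one year.

    ASSUMPTION: all flows are computed from start-of-year sizes and simply
    added or subtracted; the published difference equations may differ.
    """
    N, C, F = state["N"], state["C"], state["F"]
    deaths_never = p["delta_N"] * N
    new = {
        "N": N + births - p["gamma"] * N - deaths_never,
        "C": {},
        "F": {},
        "D": state["D"] + deaths_never,  # cumulative deaths, all origins pooled
    }
    for i in LEVELS:
        starts = p["pi"][i] * p["gamma"] * N   # never -> current, level i
        quits = p["eps"][i] * C[i]             # current -> former
        relapses = p["eta"][i] * F[i]          # former -> current
        dead_c = p["delta_C"][i] * C[i]
        dead_f = p["delta_F"][i] * F[i]
        new["C"][i] = C[i] + starts + relapses - quits - dead_c
        new["F"][i] = F[i] + quits - relapses - dead_f
        new["D"] += dead_c + dead_f
    return new


def total_people(state):
    """Alive plus deceased individuals tracked by the model."""
    return state["N"] + sum(state["C"].values()) + sum(state["F"].values()) + state["D"]


# Illustrative (made-up) inputs, not estimates from the paper.
probs = {
    "gamma": rate_to_prob(0.02),
    "pi": {"l": 0.5, "m": 0.3, "h": 0.2},
    "eps": {i: rate_to_prob(0.05) for i in LEVELS},
    "eta": {i: rate_to_prob(0.10) for i in LEVELS},
    "delta_N": 0.010,
    "delta_C": {"l": 0.015, "m": 0.020, "h": 0.025},
    "delta_F": {"l": 0.012, "m": 0.014, "h": 0.016},
}
state0 = {"N": 10000.0,
          "C": {"l": 1000.0, "m": 800.0, "h": 400.0},
          "F": {"l": 500.0, "m": 400.0, "h": 200.0},
          "D": 0.0}
state1 = yearly_step(state0, births=100.0, p=probs)
```

Because every outflow from one compartment is an inflow to another, the total of living and deceased individuals grows only by the number of births, which gives a quick sanity check on any implementation.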

The SHC model extends the system in Eq. ( 2 ) to account for two additional discrete time axes: age and time since smoking cessation. The final model is a compartmental model with separate compartments for each discrete age ( a ), where also a stratification by years since smoking cessation ( c ) is introduced for former smokers. Two separate SHC models are specified by sex. The final SHC model is defined by the following system of equations for each \(i\in \{l,m,h\}\) and \(t\in \{1,..., T\}\) :

The initial conditions of the system are obtained by generalizing those of the model in Eq. ( 2 ), to take into account the stratification by age for current smokers, and the stratification by age and time since cessation for former smokers.

For simplicity, \(\nu (t)\) was assumed to be constant over time. The age a takes values from 0 to 100. We set \(\gamma (a)\) to 0 until 13 and from 35 years of age, and, in order to account for the possible non-linearity between 14 and 34, we modelled the logit transformation of \(\gamma (a)\) through a natural cubic regression spline of age, with 2 equidistant internal knots. Similarly, we set \(\epsilon (a)\) to 0 until 19 years of age; we introduced a natural cubic regression spline with 2 equidistant internal knots to model non-linearity for \(a\ge 20\) . The resulting functions are the following:

where \(\varvec{\psi }=(\psi _0,\psi _1,\psi _2,\psi _3)\) and \(\varvec{\phi }=(\phi _0,\phi _1,\phi _2,\phi _3)\) are vectors of unknown parameters governing the probabilities of starting and quitting smoking, respectively. The relapsing rate, \(\eta ^*(c)\) , was modelled as a negative exponential function of the time since cessation, with parameters \(\varvec{\omega }=(\omega _0,\omega _1\) ):

where \(\omega _0\) governs the lifetime probability of no relapse and \(\omega _1\) tunes how fast the rate of smoking relapse declines with the time from cessation [ 12 , 23 , 28 , 33 ]. Both \(\omega _0\) and \(\omega _1\) are assumed to be positive so that \(\eta ^*(c)\) is a positive, decreasing function of c . The assumptions on which the SHC model is based are summarized in Section Model assumptions , Supplemental Material.
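As an illustration of the spline modelling of the starting probability, the sketch below evaluates a natural cubic spline on the logit scale using the standard truncated-power basis. Several choices are assumptions rather than details taken from the paper: the boundary knots are placed at ages 14 and 34, the basis construction is one of several equivalent ones, and the coefficient vector `psi_example` is made up. With two internal knots this basis has four functions, which matches the four coefficients \(\psi _0,\dots ,\psi _3\).

```python
import math

# ASSUMPTION: boundary knots at 14 and 34, two equidistant internal knots.
KNOTS_START = [14.0, 14.0 + 20.0 / 3.0, 14.0 + 40.0 / 3.0, 34.0]


def natural_cubic_basis(x, knots):
    """Natural cubic spline basis (truncated-power form) evaluated at x."""
    last = knots[-1]

    def pos3(u):
        return max(u, 0.0) ** 3

    def d(k):
        return (pos3(x - knots[k]) - pos3(x - last)) / (last - knots[k])

    basis = [1.0, x]
    for k in range(len(knots) - 2):
        basis.append(d(k) - d(len(knots) - 2))
    return basis


def gamma_start(age, psi, knots=KNOTS_START):
    """Annual probability of starting smoking at a given age.

    Zero up to age 13 and from age 35; inverse logit of the spline otherwise.
    """
    if age < 14 or age > 34:
        return 0.0
    linear_predictor = sum(c * b for c, b in zip(psi, natural_cubic_basis(float(age), knots)))
    return 1.0 / (1.0 + math.exp(-linear_predictor))


psi_example = (-6.0, 0.2, -0.01, 0.01)  # made-up coefficients
curve = {age: gamma_start(age, psi_example) for age in range(10, 40)}
```

The same construction, with boundary knots chosen over the age range from 20 upward, would serve for the quitting probability \(\epsilon (a)\).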

Estimation strategy

An important issue in compartmental models concerns parameter identifiability [ 30 ]. Complex models with many compartments, such as the model in Eq. ( 3 ), have many parameters governing the admitted transitions, but the observed data are often insufficient to estimate all of them. To overcome this problem, we fixed some of the parameters to values from the literature or external data, leaving as unknown the mortality risks, the spline coefficients \(\varvec{\phi }\) and \(\varvec{\psi }\), and the relapse parameters \(\varvec{\omega }\). The initial size of the compartments was obtained by combining the population size at the beginning of the study period with estimated prevalences from the ISTAT AVQ and EHIS surveys. Details on the values assigned to the fixed parameters and the initial size of the compartments are provided in Section S2, Supplemental Material. The unknown parameters were estimated following the two-step procedure described in the next section.

Two-step estimation

In order to estimate the unknown parameters, we adopted a two-step procedure. Both steps use as observed data the prevalence of never, current and former smokers from ISTAT AVQ, here denoted by \(p^{obs}(t;a^*)=\left( p^{obs}_C(t;a^*),p^{obs}_N(t;a^*),p^{obs}_F(t;a^*)\right)\), where \(t\) denotes the year and \(a^*\) the age class. In particular, we considered years from 1993 to 2019 and age classes \(a^*\in \{14-17, 18-19, 20-24, 25-34, 35-44, 45-54, 55-59, 60-64, 65-74, 75+\}\). According to the ISTAT AVQ survey, current smokers are defined as individuals who reported being smokers at the time of the interview, and former smokers as those who reported having quit.

First step.

We estimated the age-specific risks of mortality for never smokers \(\delta _N(a)\) using the prevalence values, as well as relative risks coming from the literature. In particular, the age-specific risks of dying for current and former smokers in the population at time \(t\) are respectively \(\delta _C(t;a)=RR_{C}\times \delta _{N}(t;a)\) and \(\delta _F(t;a)=RR_{F}\times \delta _{N}(t;a)\), with \(RR_{C}\) and \(RR_{F}\) the relative risks of dying for current smokers and former smokers versus never smokers. Let \(p(t;a)=\left( p_N(t;a), p_C(t;a), p_F(t;a)\right)\) be the distribution of never, current and former smokers in the population. The overall mortality at age \(a\) in the year \(t\), \(\delta _{pop}(t;a)\), is a weighted average of \(\delta _N(t;a)\), \(\delta _C(t;a)\), and \(\delta _F(t;a)\) with weights \(p(t;a)\). Thus, \(\delta _N(t;a)\) can be derived as the ratio:

\[ \delta _N(t;a)=\frac{\delta _{pop}(t;a)}{p_N(t;a)+RR_{C}\,p_C(t;a)+RR_{F}\,p_F(t;a)} \tag{4} \]
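The ratio follows from writing the population risk as \(\delta _{pop}=\delta _N\,(p_N+RR_C\,p_C+RR_F\,p_F)\) and solving for \(\delta _N\). A small Python check, using made-up numbers rather than the Tuscan data and a hypothetical function name, is:

```python
def never_smoker_mortality(delta_pop, p_never, p_current, p_former, rr_current, rr_former):
    """Solve the weighted-average identity for the never-smoker death risk."""
    denominator = p_never + rr_current * p_current + rr_former * p_former
    return delta_pop / denominator


# Illustrative inputs only (not values from the paper).
delta_pop = 0.010
p_never, p_current, p_former = 0.5, 0.3, 0.2
rr_current, rr_former = 2.0, 1.5

delta_never = never_smoker_mortality(delta_pop, p_never, p_current, p_former,
                                     rr_current, rr_former)
delta_current = rr_current * delta_never
delta_former = rr_former * delta_never

# Recombining the three risks with the prevalence weights returns delta_pop.
recombined = p_never * delta_never + p_current * delta_current + p_former * delta_former
```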

Therefore, separately for each year t in the period 1993-2019, we obtained an estimate of \(\delta _N(t;a)\) , plugging into Eq. ( 4 ) the mortality risk at age a reported for Tuscany, the relative risks for current and former smokers versus never smokers [ 32 ], and the observed age-specific prevalence of never, current and former smokers \(p^{obs}(t;a^*)\) . Finally, we averaged the year-specific \(\hat{\delta }_N(t;a)\) over t , obtaining the overall estimate \(\hat{\delta }_N(a)\) . The risks of dying for current and former smokers by i and c were then derived as:

Second step.

After fixing the mortality risks to the values computed at the first step, \(\hat{\varvec{\delta }}(a,c)\) , we calibrated the model on the observed prevalence \(p^{obs}(t;a^*)\) to estimate the vector of parameters which were still unknown, \(\varvec{\theta }=(\varvec{\psi },\varvec{\phi },\varvec{\omega })\) . Let \(p(t;a^*,\varvec{\theta })=\left( p_{C}(t;a^*,\varvec{\theta }),p_{N}(t;a^*,\varvec{\theta }),p_{F}(t;a^*,\varvec{\theta })\right)\) be the vector of the prevalence of never, current and former smokers belonging to the class of age \(a^*\) at time t , calculated on the population predicted by the model in Eq. ( 3 ), given a specific value of \(\varvec{\theta }\) . With calibration, we searched for the value of \(\varvec{\theta }\) that leads to predicted prevalences as close as possible to the observed ones. To compare observed and simulated trajectories, we considered the following objective function, where \(H(\cdot ,\cdot )\) denotes the Hellinger distance [ 34 ] between two discrete probability distributions:

where \(A^*\) is the number of age classes \(a^*\). We minimized the objective function in Eq. ( 5 ) over \(\varvec{\theta }\) via a global optimization procedure, using the Julia package Optim.jl [ 35 ]. It is well known that, in the context of compartmental models, optimization results often depend on the chosen starting points of the algorithm [ 36 , 37 ]. To avoid getting stuck in local minima, we performed several optimizations using different starting points and then selected the solution that yielded the minimum Hellinger distance [ 30 , 36 ]. The two-step procedure was performed separately by sex, obtaining different estimates for males and females and a sex-specific evolution of the compartment sizes. We estimated the compartment sizes up to 2043 by projecting the model dynamics, assuming that parameters and model structure do not change after 2019.
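The calibration criterion and the multi-start strategy can be sketched in Python as follows. The Hellinger distance is the standard one for discrete distributions; averaging it with equal weights over all (year, age class) cells is an assumption about the objective function, and `best_of_starts` and its `local_minimize` argument are hypothetical helpers standing in for whatever optimizer is used.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)


def calibration_objective(observed, predicted):
    """Mean Hellinger distance over all (year, age class) cells.

    ASSUMPTION: every cell carries the same weight.
    """
    cells = list(observed)
    return sum(hellinger(observed[c], predicted[c]) for c in cells) / len(cells)


def best_of_starts(local_minimize, objective, starts):
    """Run a local optimizer from several starting points and keep the best.

    `local_minimize(objective, start)` must return a (solution, value) pair.
    """
    results = [local_minimize(objective, s) for s in starts]
    return min(results, key=lambda r: r[1])


# Toy data: prevalence vectors (never, current, former) for two cells.
observed = {(1993, "25-34"): (0.50, 0.35, 0.15), (1994, "25-34"): (0.52, 0.33, 0.15)}
predicted = {(1993, "25-34"): (0.48, 0.36, 0.16), (1994, "25-34"): (0.51, 0.34, 0.15)}
fit_value = calibration_objective(observed, predicted)
```

The distance is 0 for identical distributions and 1 for distributions with disjoint support, so the averaged objective always lies between 0 and 1.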

Parametric bootstrap procedure

We quantified the sampling variability around point estimates and projections by using a parametric bootstrap procedure [ 29 , 30 ]. Let \(\hat{\varvec{\theta }}\) be the vector of parameters minimizing the objective function in Eq. ( 5 ) and \(p(t;a^*,\hat{\varvec{\theta }})\) the corresponding estimated vector of prevalence for never, current and former smokers of age \(a^*\) in the population at time t . Let \(n(t;a^*)\) be the number of subjects belonging to the age class \(a^*\) , enrolled in the ISTAT AVQ in the year t in Tuscany (i.e. the denominator of the observed prevalence \(p^{obs}(t;a^*)\) ). The bootstrap procedure consisted of the following steps:

for each \(a^*\) and t , we sampled a vector of prevalence from a Dirichlet distribution:

we considered the collection of these sampled vectors as the observed values and performed the two-step estimation, computing the vector \(\varvec{\delta }^b(a,c)\) and finding \(\varvec{\theta }^b\) that minimized the objective function;

we repeated the previous two steps \(B=1000\) times, collecting a sample of B bootstrap estimates of \(\varvec{\delta }(a,c)\) and \(\varvec{\theta }\) to be used to estimate as many curves describing the transition parameters and compartment size trajectories;

we calculated the \(90\%\) confidence intervals for the quantities of interest as the 5 \(^{th}\) and 95 \(^{th}\) percentiles of the bootstrap estimates; pointwise confidence intervals were calculated for the curves.
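The resampling and percentile steps can be illustrated with the Python standard library. In this sketch the Dirichlet concentration parameters are taken to be \(n(t;a^*)\times p(t;a^*,\hat{\varvec{\theta }})\), which is an assumption about the form of the sampling distribution, and the re-estimation step is replaced by simply reading off the sampled prevalence of current smokers, since the full two-step calibration is beyond the scope of a short example. All names and numbers are hypothetical.

```python
import random
import statistics


def sample_dirichlet(alpha, rng):
    """Draw one vector from a Dirichlet distribution via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def bootstrap_prevalence(p_hat, n, rng):
    """One synthetic 'observed' prevalence vector for a single (year, age class) cell.

    ASSUMPTION: concentration parameters equal n * p_hat.
    """
    return sample_dirichlet([n * p for p in p_hat], rng)


def percentile_interval(values):
    """90% interval: 5th and 95th percentiles of the bootstrap estimates."""
    cuts = statistics.quantiles(values, n=20)  # 19 cut points at 5%, 10%, ..., 95%
    return cuts[0], cuts[-1]


rng = random.Random(2024)
p_hat = (0.45, 0.30, 0.25)   # made-up (never, current, former) prevalence
n_cell = 200                 # made-up survey sample size for the cell
B = 1000

# Stand-in for "re-estimate on the resampled data": keep the current-smoker share.
replicates = [bootstrap_prevalence(p_hat, n_cell, rng)[1] for _ in range(B)]
ci_low, ci_high = percentile_interval(replicates)
```

In the actual procedure each replicate would be a full set of resampled prevalences, followed by a complete run of the two-step estimation.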

Model validation

In order to evaluate the predictive performance of the estimation procedure described in the Estimation strategy section, we applied cross-validation (CV) on a rolling basis. We started by defining the first 3 years of the period 1993-2019 as the training set and the subsequent \(q\) years as the test set. Then, we calibrated the compartmental model in Eq. ( 3 ) on the training set and used the estimated model to forecast the prevalence of never, current and former smokers in the years belonging to the test time window. The discrepancy between observed and projected prevalence was evaluated in terms of absolute percentage error. Then, we progressively extended the length of the training set by adding one year at a time, and each time we obtained the projections for the \(q\) subsequent years. We stopped when the last training set covered the years from 1993 to \(2019-q\). We finally computed the Mean Absolute Percentage Error (MAPE) by averaging the absolute percentage errors across smoking statuses, years, age classes, and training sets. Note that in general, for a set of \(n\) observations, MAPE is defined as \(\frac{100}{n}\sum \nolimits _{i=1}^{n}{\frac{|O_i-E_i|}{O_i}}\), where \(O_i\) is the observed value and \(E_i\) is the expected one for unit \(i\). We calculated the MAPE for different forecasting horizons by setting \(q=3,6,9,12\) years.
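The rolling scheme and the error measure can be written compactly in Python. The sketch below only generates the training and test windows and evaluates MAPE; the calibration itself is not reproduced, and the function names are hypothetical.

```python
def rolling_splits(first_year, last_year, q, initial_train=3):
    """Expanding-window splits: train on first_year..end, test on the next q years."""
    splits = []
    train_end = first_year + initial_train - 1
    while train_end + q <= last_year:
        train = list(range(first_year, train_end + 1))
        test = list(range(train_end + 1, train_end + q + 1))
        splits.append((train, test))
        train_end += 1  # extend the training set by one year
    return splits


def mape(observed, expected):
    """Mean Absolute Percentage Error, in percent."""
    n = len(observed)
    return 100.0 / n * sum(abs(o - e) / o for o, e in zip(observed, expected))


splits_q3 = rolling_splits(1993, 2019, q=3)
first_train, first_test = splits_q3[0]
last_train, last_test = splits_q3[-1]
example_error = mape([100.0, 200.0], [110.0, 180.0])
```

For \(q=3\) this yields 22 windows, the first trained on 1993-1995 and tested on 1996-1998, the last trained on 1993-2016 and tested on 2017-2019.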

Sensitivity analysis

A key assumption of our model is that the dynamics of the studied phenomenon, particularly the transition probabilities between compartments, remain constant from 1993 to 2019 (and continue to do so until 2043). To verify its appropriateness, we conducted two separate analyses, first calibrating the model on the period 1993-2004 and then on the period 2005-2019, and compared the results. Notice that in the analysis 2005-2019 the initial sizes of the compartments were set to values obtained from 2005 surveys (see Section Details on the fixed parameters , Supplemental Material).

Another crucial point concerns the fact that the inference results could be affected by the model parameters assumed as fixed. To address this issue, we used a variance-based approach to Global Sensitivity Analysis (GSA) [ 31 ]. Given \(K_X\) mutually independent inputs \((X_1, X_2,..., X_{K_X})\) and a model which, given the inputs, returns \(K_Y\) outputs \((Y_1, Y_2,..., Y_{K_Y})\), this approach quantifies the relative importance of each input to the model’s outcomes by propagating uncertainty from the inputs to the outputs and computing variance indices. In our application, given the model in Eq. ( 3 ), we considered as inputs all the parameters, both fixed and unknown, and the Hellinger distance in Eq. ( 5 ) as the output \(Y\). Note that, by taking the Hellinger distance as the output, we directly measure the influence of the inputs on the discrepancy between observed and predicted data and thus, ultimately, on the inference results. Then we calculated, for each input \(X_i\), the so-called total variance index \(S^{tot}_i\), which measures the overall effect of the \(i\)-th input on the output \(Y\), including all the interactions of \(X_i\) with the other inputs. This index corresponds to the share of the variance of \(Y\) that would be left, on average, when all the parameters but \(X_i\), denoted \(X_{\sim i}\), are fixed:

\[ S^{tot}_i=\frac{E_{X_{\sim i}}\left[ Var_{X_i}\left( Y\mid X_{\sim i}\right) \right] }{Var(Y)} \]

A total variance index close to zero indicates that the parameter \(X_i\) does not influence \(Y\) and, therefore, the inference results. Conversely, a large total variance index indicates that the parameter does have an impact on them. In the former case, the parameter can be fixed without affecting our estimates; in other words, our model and data do not provide information on this parameter. The computation of \(S^{tot}_i\) relies on Monte Carlo simulations [ 38 ]. We simulated \(K=10,000\) different combinations of the model inputs; then, for each of them, we predicted the prevalence values via the model in Eq. ( 3 ) and calculated the corresponding Hellinger distance. Specifically, we drew the model parameters from the distributions reported in Table S3.1, Supplemental Material, adopting quasi-random number sampling, which provides a more efficient exploration of the sample space [ 39 , 40 ]. On the basis of the simulated Hellinger distances and the corresponding parameter combinations, we computed the total variance indices as described in [ 38 ]. It is worth noting that in the GSA we did not include the age-specific mortality rates, \(\delta _{pop}(t;a)\), among the model inputs. It is reasonable, as done elsewhere [ 23 ], to treat these parameters as not affected by uncertainty, given that they were estimated based on the entire population.
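To make the Monte Carlo computation concrete, the sketch below estimates total variance indices with the Jansen estimator, a commonly used choice; whether the analysis used exactly this estimator is an assumption. It also uses plain pseudo-random uniform inputs instead of quasi-random sequences, and a toy linear model in place of the Hellinger distance, so it illustrates the technique rather than the actual analysis.

```python
import random


def total_indices(model, k, n, rng):
    """Jansen estimator of total-effect indices for k independent U(0, 1) inputs.

    For each input i, rows of sample A are re-evaluated with only input i
    replaced by the corresponding value from an independent sample B.
    """
    A = [[rng.random() for _ in range(k)] for _ in range(n)]
    B = [[rng.random() for _ in range(k)] for _ in range(n)]
    y_a = [model(row) for row in A]
    mean = sum(y_a) / n
    variance = sum((y - mean) ** 2 for y in y_a) / (n - 1)
    indices = []
    for i in range(k):
        squared_diff = 0.0
        for a_row, b_row, y in zip(A, B, y_a):
            mixed = list(a_row)
            mixed[i] = b_row[i]  # change input i only
            squared_diff += (y - model(mixed)) ** 2
        indices.append(squared_diff / (2.0 * n) / variance)
    return indices


def toy_model(x):
    """Toy output: depends on x[0] and x[1]; x[2] has no influence."""
    return x[0] + 2.0 * x[1]


s_tot = total_indices(toy_model, k=3, n=2000, rng=random.Random(0))
```

For this toy model the exact indices are 0.2, 0.8 and 0, so the third input could be fixed at any value without changing the output, which is the interpretation given to near-zero indices in the text.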

Health impact assessment

The impact of smoking on population health was quantified in terms of attributable deaths. We calculated the Smoking-Attributable Deaths (SADs) in the year t as the difference between the number of deaths occurring in that year under the actual scenario, i.e. the number of deaths predicted by the model in Eq. ( 3 ) given \(\hat{\varvec{\theta }}\) and \(\hat{\varvec{\delta }}(a)\) , and the deaths we would observe under a specific counterfactual condition. We considered the counterfactual condition where current smokers and former smokers in the year t were never smokers. Therefore, for each age a , we applied to the size of the compartments of smokers or ex-smokers the excess risk relative to never-smokers. The excess risk is defined as the difference between risks. For example, the excess risk of current smokers of age a and smoking intensity i relative to never-smokers is \(\delta _{C_i}(a,c)-\delta _N(a)\) . Thus, for the year t , the number of SADs among people of age a was calculated as:

The age-specific \(\text {SAD}(t;a)\) can be summed over \(a\) to obtain the total number of attributable deaths in the population or in a given age class: \(\text {SAD}(t)=\sum \limits _a\text {SAD}(t;a)\). The impact of smoking on population health can also be expressed in terms of the Population Attributable Fraction (PAF), defined as the proportion of deaths that would be avoided if all current and former smokers in the population, or in a subset of it, were never smokers [ 41 ]. For details, see Section Population Attributable Fraction computation, Supplemental Material. We calculated SADs and PAFs over the period 1993-2043, separately by sex and for the ages 35+ and 65+.
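The excess-risk calculation can be illustrated with a short Python sketch for a single age and year. The compartment sizes and risks are made up, the function names are hypothetical, and computing the PAF as attributable deaths divided by all deaths is a simplified reading of the definition above; the exact computation is described in the Supplemental Material.

```python
def smoking_attributable_deaths(strata, risk_never):
    """Sum of compartment size times excess death risk over smoker strata.

    `strata` holds (size, death_risk) pairs for current- and former-smoker
    compartments of one age in one year.
    """
    return sum(size * (risk - risk_never) for size, risk in strata)


def attributable_fraction(attributable, total_deaths):
    """Share of all deaths that would be avoided under the counterfactual."""
    return attributable / total_deaths


# Illustrative numbers only.
risk_never = 0.010
never_size = 2000.0
strata = [(1000.0, 0.020),   # current smokers
          (500.0, 0.015)]    # former smokers

sad = smoking_attributable_deaths(strata, risk_never)
total_deaths = never_size * risk_never + sum(size * risk for size, risk in strata)
paf = attributable_fraction(sad, total_deaths)
```

With these inputs 12.5 of the 47.5 expected deaths are attributable to smoking, a PAF of about 26%.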

Impact of future hypothetical policies

In order to illustrate the use of the compartmental model to assess the impact of hypothetical TCPs on SAD, we focused on three policies acting on the rates of starting and stopping smoking, \(\gamma ^*(a)\) and \(\epsilon ^*(a)\) . We assumed that all the defined policies are implemented in 2023 and that, in the absence of policies, the smoking habit dynamics would not change.

Taking inspiration from [ 42 ] and from a recent policy introduced in New Zealand ( www.bbc.com/news/world-asia-63954862 ), we defined the following hypothetical TCPs starting from 2023:

TCP1, a policy able to reduce the rate of starting smoking by 25% in 10 years for subjects between 14 and 34 years of age; for simplicity, we assumed a linear decrease, starting with a decrease of 2.5% the first year, a decrease of 5% the second one and so on, up to a final decrease of 25% after 10 years;

TCP2, a policy able to increase the rate of stopping smoking by 25% in 10 years for subjects between 25 and 100 years of age; for simplicity, we assumed a linear growth, starting with an increase of 2.5% the first year, an increase of 5% the second one and so on, up to a final increase of 25% after 10 years;

TCP3, a policy that imposes a complete smoking ban on cohorts born since 2009.
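The three scenarios translate into simple modifiers of the starting and stopping rates. The Python sketch below is one possible encoding: it assumes that 2023 counts as the first year of the phase-in (2.5% change), that the full 25% change is maintained after the tenth year, and that birth year can be approximated by calendar year minus age. The function names and the baseline rate are hypothetical.

```python
def ramp_change(year, start=2023, years_to_full=10, full_change=0.25):
    """Fractional change reached in `year` under a linear phase-in.

    ASSUMPTION: `start` is the first policy year, so the change is 2.5% in
    2023, 5% in 2024, ..., 25% from 2032 onward.
    """
    if year < start:
        return 0.0
    elapsed = min(year - start + 1, years_to_full)
    return full_change * elapsed / years_to_full


def start_rate_tcp1(base_rate, age, year):
    """TCP1: lower the starting rate for ages 14-34."""
    if 14 <= age <= 34:
        return base_rate * (1.0 - ramp_change(year))
    return base_rate


def stop_rate_tcp2(base_rate, age, year):
    """TCP2: raise the stopping rate for ages 25-100."""
    if 25 <= age <= 100:
        return base_rate * (1.0 + ramp_change(year))
    return base_rate


def start_rate_tcp3(base_rate, age, year, first_banned_birth_year=2009):
    """TCP3: no smoking initiation for cohorts born from 2009 onward.

    ASSUMPTION: birth year is approximated by year - age.
    """
    return 0.0 if year - age >= first_banned_birth_year else base_rate


baseline = 0.05  # made-up rate
tcp1_path = [start_rate_tcp1(baseline, 20, y) for y in range(2022, 2035)]
```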

For each policy, we calculated the evolution of smoking prevalence and the number of avoided deaths expected from its implementation, taking the scenario without policies as a reference (TCP0). To better appreciate the impact of policies in terms of SAD, limited to this analysis, we extended the projections up to 2063.

In 1993, the population of Tuscany comprised 1,697,495 males and 1,824,090 females, and the proportions of never, current, and former smokers estimated from the ISTAT AVQ survey were respectively 35%, 34%, 31% for males and 67%, 20%, 13% for females.

Panels (a) and (b) of Figure 2 show, separately for males and females, the estimates of the parameters left unknown in the SHC model in Eq. ( 3 ), with their 90% confidence intervals (CI), as obtained from the two-step estimation procedure and bootstrap. In particular, Panel (b) compares the estimated risk of death for never smokers with the one in the general population. It is worth noting that while the two risks are similar for females (the mortality among never smokers is \(8\%\) lower than in the general population), a non-negligible difference is observed for males ( \(25\%\) lower), as also noted in [ 43 ].

figure 2

Results of the two-step estimation procedure for males in blue and females in red, with their bootstrap \(90\%\) confidence intervals: parameters tuning the probabilities of starting ( \(\varvec{\psi }\) ) and stopping smoking ( \(\varvec{\phi }\) ), and the probability of smoking relapse ( \(\varvec{\omega }\) ) ( a ), age-specific mortality for never smokers and for the general population ( b ), probabilities of starting ( \(\gamma (a)\) ), and stopping smoking ( \(\epsilon (a)\) ) and probability of smoking relapse ( \(\eta (c)\) ) ( c ), observed and predicted prevalence for never ( N ), current ( C ) and former ( F ) smokers ( d ), Population Attributable Fraction (PAF) and Smoking Attributable Deaths (SAD) for people over the age of 35 ( e ) and 65 ( f )

Figure 2, Panel (c) shows the estimates of the probabilities of starting and quitting smoking and the probability of smoking relapse, derived from the estimated coefficients in Panel (a). Table 1 reports some summaries of the curves. Males are more likely than females both to start and to quit smoking. In particular, the probability of starting smoking peaks at around 19.9 years of age for males and 19.1 for females, with a maximum of just over 9% for males and just over 6% for females. The mean age of initiation is 20.7 for males and 20.5 for females. The probability of stopping smoking increases after 50 years of age, reaching a maximum of 29.5% for males and 24.0% for females. The estimate of the probability of smoking relapse is affected by large sampling variability. However, our results seem to indicate that it is about 80% after 1 year, declines to about 40% after 2 years, and progressively becomes negligible after 3 years (Fig. 2). On average, former smokers relapse into smoking after 1.7 years for males and 1.5 years for females (Table 1).

Panel (d) shows the estimated prevalence of never, current, and former smokers among those over 14 years old from 1993 to 2043, predicted through the SHC model, together with the observed data used to calibrate the model (blue and red dots respectively for males and females with their 90% CI). The model fit appears to be adequate, with the predicted values close to the observed ones. Our forecasts, starting from 2020, suggest that the smoking prevalence will decrease in the coming years. Panels (e) and (f) show the predicted SAD and PAF over the period 1993-2043, separately for males and females, calculated for the population over 35 years of age and for the population over 65 years of age. The impact on males is higher than on females both in absolute and relative terms. However, while a clear reduction of the attributable deaths is expected in the coming years for males, for females they slightly decline only after having reached a maximum around 2030 [ 44 ]. Note that the majority of attributable deaths in the population over 35 are due to deaths in individuals over 65, as shown by the similarity of the curves.

Tables 2 and 3 report the percentages of never, current, and former smokers, the SAD and PAF, estimated every 10 years from 1993 to 2043, with their 90% confidence intervals. As an example, we estimated that in Tuscany in 2023 smoking was responsible for 4,070 (90% CI: 3,795-4,247) deaths among men over 35 years old (one death per 745 people over the age of 35) and 1,976 (90% CI: 1,741-2,407) deaths among women in the same age class (one death per 1,655 people over the age of 35), corresponding to a PAF of 18% and 8%, respectively. Most of the attributable burden, however, was on people older than 65 (3,497 SAD for men and 1,765 for women).
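The link between SAD and PAF in the example above can be checked with simple arithmetic. The sketch below assumes the usual definition, PAF = SAD / total deaths in the same sex and age class; the total numbers of deaths it returns are back-calculated from the rounded point estimates quoted in the text, are not reported in this section, and should be read as rough orders of magnitude only.

```python
def implied_total_deaths(sad: float, paf: float) -> float:
    """Total deaths implied by SAD and PAF, assuming PAF = SAD / total deaths."""
    if not 0.0 < paf <= 1.0:
        raise ValueError("PAF must lie in (0, 1]")
    return sad / paf


# Point estimates quoted in the text for 2023, population over 35.
men_total = implied_total_deaths(4070, 0.18)
women_total = implied_total_deaths(1976, 0.08)

# PAFs are rounded to whole percentages, so these totals are approximate.
print(f"implied deaths, men over 35:   {men_total:,.0f}")
print(f"implied deaths, women over 35: {women_total:,.0f}")
```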

Regarding the CV procedure, the average values of MAPE for different prediction horizons are lower than 30% (Table 4 ), indicating that the predictive performance of the model is adequate, even if not optimal [ 45 ]. The MAPE is lower for the model on the male population than for the model on the female one.
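As a reminder of what the MAPE measures, the following is a minimal sketch of the metric together with one possible rolling-origin splitting scheme. The function names, the splitting rule, and the toy prevalence values are illustrative assumptions and are not taken from the authors' code.

```python
def mape(observed, predicted):
    """Mean absolute percentage error, expressed in percent."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs((o - p) / o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(errors) / len(errors)


def rolling_origin_splits(n_years, min_train, horizon):
    """Yield (training indices, test index) pairs.

    The training window always starts at year 0 and grows by one year
    at each step; the test point lies `horizon` years after its end.
    """
    for end in range(min_train, n_years - horizon + 1):
        yield list(range(end)), end + horizon - 1


# Toy prevalence values (hypothetical, for illustration only).
observed = [0.30, 0.25, 0.20]
predicted = [0.27, 0.25, 0.22]
print(f"MAPE = {mape(observed, predicted):.2f}%")

for train, test in rolling_origin_splits(n_years=5, min_train=3, horizon=1):
    print("train on years", train, "-> predict year", test)
```

In an actual validation, the model would be recalibrated on each training window and the MAPE averaged over all test points sharing the same prediction horizon.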

Figure 3 reports the results of the two separate calibrations of the SHC model, one on the prevalence data from 1993 to 2004 and one on the prevalence data from 2005 to 2019. The confidence bands are wider in the second calibration period than in the first. For males, there is evidence of a downward shift in the age at which the probability of starting smoking reaches its maximum. For females, calibrating the model on the earlier period yielded a lower projected prevalence of never smokers, which likely reflects a change over time in smoking habits among women. Apart from these differences, the two calibrations provided qualitatively similar results. For numerical details, see Tables S5.1-S5.6 and Figures S5.1 and S5.2 in the Supplemental Material.

Figure 3. Results of the two-step estimation procedure for males (blue) and females (red), by period of calibration (1993 to 2004 in a light colour and 2005 to 2019 in a dark colour): probabilities of starting (\(\gamma(a)\)) and stopping smoking (\(\epsilon(a)\)), and probability of smoking relapse (\(\eta(c)\)), with 90% confidence bands, (a) and (c); prevalence of never (N), current (C) and former (F) smokers, with 90% confidence bands, (b) and (d)

The total variance indices derived from the GSA (Table 5 ) reveal that the primary factor contributing to the variability of the Hellinger distance is the probability of starting smoking and its interaction with the other model inputs, resulting in \(S_i^{tot}\) values of 0.58 for males and 0.76 for females. This is followed by the probability of quitting smoking, with values of 0.36 for males and 0.21 for females, and by the probability of smoking relapse, with values of 0.15 for males and 0.09 for females. Conversely, the parameters assumed to be fixed have a negligible impact on the Hellinger distance, with total variance indices very close to 0. This latter result indicates that fixing the aforementioned parameters to specific values does not significantly affect the calibration results, and consequently the prevalence estimates, demonstrating their robustness against variations in \(\varvec{\pi }\) , \(\nu\) , and RRs specifications.
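To make the meaning of the total variance index \(S_i^{tot}\) concrete, the sketch below estimates it with the Jansen estimator on a toy additive function. It is only an illustration under stated assumptions: the toy model, the uniform input ranges, and the plain pseudo-random sampling are choices made here, whereas the analysis in the paper is applied to the Hellinger distance of the calibrated SHC model.

```python
import random


def total_sobol_indices(model, dim, n=4096, seed=0):
    """Estimate total-order Sobol indices with the Jansen estimator.

    Inputs are assumed independent and uniform on [0, 1].
    """
    rng = random.Random(seed)
    a_rows = [[rng.random() for _ in range(dim)] for _ in range(n)]
    b_rows = [[rng.random() for _ in range(dim)] for _ in range(n)]
    f_a = [model(x) for x in a_rows]

    mean = sum(f_a) / n
    variance = sum((y - mean) ** 2 for y in f_a) / (n - 1)

    indices = []
    for i in range(dim):
        squared_diff = 0.0
        for a, b, y_a in zip(a_rows, b_rows, f_a):
            x = list(a)
            x[i] = b[i]  # resample only the i-th input
            squared_diff += (y_a - model(x)) ** 2
        indices.append(squared_diff / (2 * n) / variance)
    return indices


def toy_model(x):
    """Additive toy function: the third input has no effect on the output."""
    return x[0] + 2.0 * x[1] + 0.0 * x[2]


estimates = total_sobol_indices(toy_model, dim=3)
print([round(s, 2) for s in estimates])  # analytic values: 0.2, 0.8, 0.0
```

For an additive function the total indices sum to one. A sum above one, as for the male population in Table 5 (0.58 + 0.36 + 0.15), is what one expects when inputs interact, because each interaction is counted in the total index of every input involved.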

Figure 4 compares the evolution of smoking habits in the male and female populations under three alternative scenarios that simulate hypothetical tobacco control policies. These scenarios are compared with the status quo, corresponding to the absence of actions to reduce tobacco consumption (TCP0). We assumed that the TCPs are applied from 2023 onwards. They have no substantial effect on the prevalence of never, former, and current smokers during the 10 years following their implementation. TCP3 has the largest impact: in 2043 it is expected to increase the prevalence of never-smokers by 12 percentage points among males and by 8 among females, compared with TCP0 (see Table 6).

Figure 4. Estimated prevalence of never (N), current (C) and former (F) smokers under different tobacco control policies (TCP), with 90% confidence bands, for males (a) and females (b)

To better appreciate the impact of the TCPs on mortality, we extended the forecasting horizon up to 2063. Table 7 reports the predicted number of attributable deaths every 10 years, from 2023 to 2063, for both males and females under the different TCPs, for the age classes 35+ and 65+. TCP2, which increases the probability of stopping smoking, is the policy with the largest impact on mortality in both age classes. TCP3, which bans access to smoking for the new generations, does not reduce SADs within the time window considered, despite its effectiveness in reducing the number of current smokers. Indeed, this policy is expected to have a longer-term impact, which is not visible before 2063. Additional tables and figures are reported in the Section "Additional results" of the Supplemental Material.

Interesting findings emerged from our analysis. We found that the probability of starting smoking reaches its maximum, just over 9% for males and just over 6% for females, between 19 and 20 years of age. Considering that younger people have a large probability of becoming stable smokers [ 46 ], these probabilities are quite worrying. The difference in the mean age of initiation between males and females is less than one year, confirming what is reported for high-income countries [ 47 ]. Regarding the probability of stopping smoking, we found that it increases after 50 years of age and has a maximum of 29.5% for males and 24.0% for females, although the confidence bands around these curves are quite wide. About 80% of ex-smokers relapse into smoking after 1 year, in line with the results of the Italian surveillance system PASSI for the years 2020-2021 ( www.epicentro.iss.it/passi/dati/SmettereFumo ). On average, former smokers relapse into smoking during the second year from cessation (after 1.7 and 1.5 years for males and females, respectively).

According to our model, in 2023 in Tuscany, 23% of men smoke, while 35% are ex-smokers. These percentages are lower among women: 16% smoke and 24% are ex-smokers. The prevalence of smokers estimated by our model is lower than that reported in the PASSI survey for the period 2020-2021 (26.1% and 20.5% in the age class 18-69 for males and females, respectively), but the two are consistent if we consider that our estimates are calculated on the whole population, while PASSI focuses on the age class 18-69 ( www.epicentro.iss.it/passi/pdf2020/Scheda-fumo-PASSI-regione-2016-2019.pdf ).

We estimated that, in 2023, 18% of deaths among males and 8% among females are due to smoking, corresponding to 4,070 and 1,976 deaths, respectively. These PAFs are in line with those estimated by the Global Burden of Disease Study for Italy in 2019 ( https://vizhub.healthdata.org/gbd-results/ ), 20.5% (CI: 19.5-21.7) in males and 8.17% (CI: 7.51-9.02) in females, slightly lower than those reported for Italy by the Tobacco Atlas initiative ( https://tobaccoatlas.org/challenges/deaths/ ), and overall coherent with previous results for Italy and Tuscany ([ 48 ]; www.deathsfromsmoking.net ).

As shown by the cross-validation results, the model produces quite reliable predictions of prevalence. Thus, under the assumption that the mechanisms underlying smoking dynamics and demographic evolution do not change in the future, we projected the dynamics. For the next two decades, we estimated an evident decrease in the prevalence of current smokers for males, due to an increase in the percentage of never-smokers. For females, substantial stability is expected. Similar considerations apply to PAFs: a decrease is observed for males and stability for females. These results confirm that Italy is in the fourth stage of the tobacco epidemic model, characterized by a continuing slow decline of smoking prevalence in both men and women, with converging rates between the sexes [ 49 , 50 ]. The robustness of the long-term predictions to events that could change the described dynamics over time could be assessed by implementing a specific GSA procedure. However, in this work, we used the GSA only to assess the sensitivity of the inference results to changes in the parameters treated as fixed.

The proposed model can be used to assess the impact of alternative TCPs. For illustrative purposes, we considered three policies aimed at reducing smoking in the population. The first two are completely hypothetical and are defined in terms of their effect on the probability of starting (TCP1) and stopping (TCP2) smoking. They are not real policies but represent the legislator's intention to change the rates of smoking initiation and cessation. The third (TCP3), which bans smoking for cohorts born since 2009, is inspired by the tobacco-free generation intervention implemented in New Zealand as part of a tobacco endgame plan that also includes strategies aimed at decreasing the affordability and availability of tobacco, reducing the levels of nicotine in tobacco products, and restricting sales to designated tobacco outlets. We evaluated the expected marginal impact that this tobacco-free generation intervention would have in Tuscany, assuming complete compliance of the new generations with the smoking ban. The results indicate that under TCP1 and TCP2 the prevalence of current smokers is reduced by a few percentage points for both women and men. By contrast, TCP3 produces a clear increase in never-smokers, and thus a reduction in smoking prevalence, which is expected to decrease in ten years by 9 and 6 percentage points among males and females, respectively. The impact on mortality of the three policies, in particular TCP1 and TCP3, which act by increasing the number of never smokers, can be appreciated only by extending the forecasting horizon. Interventions able to increase the probability of stopping smoking, like TCP2, are expected to produce the largest reduction of SADs in the medium term, especially among the over-65s. However, this kind of policy does not help reduce smoking among the youngest, and therefore does not effectively stop the tobacco epidemic.
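One way to picture how such scenarios enter a model of this kind is as modifiers of the age-specific transition probabilities. The sketch below is purely illustrative: the multiplicative form chosen for TCP1 and TCP2, the effect sizes, and the baseline curve are assumptions made here, not the definitions used in the paper. Only the TCP3 rule (no initiation for cohorts born since 2009, under complete compliance) follows the description in the text.

```python
from typing import Callable

# Maps age in years to a yearly transition probability.
Prob = Callable[[int], float]


def tcp1(gamma: Prob, reduction: float) -> Prob:
    """Hypothetical TCP1: lower the starting probability by a fixed fraction."""
    return lambda age: gamma(age) * (1.0 - reduction)


def tcp2(epsilon: Prob, increase: float) -> Prob:
    """Hypothetical TCP2: raise the stopping probability, capped at 1."""
    return lambda age: min(1.0, epsilon(age) * (1.0 + increase))


def tcp3(gamma: Prob, birth_year: int, ban_from: int = 2009) -> Prob:
    """TCP3: cohorts born in `ban_from` or later never start smoking."""
    if birth_year >= ban_from:
        return lambda age: 0.0
    return gamma


def baseline_gamma(age: int) -> float:
    """Illustrative starting probability, not the fitted curve."""
    return 0.09 if 15 <= age <= 25 else 0.0


print(tcp1(baseline_gamma, reduction=0.2)(20))   # reduced starting probability
print(tcp3(baseline_gamma, birth_year=2010)(20))  # banned cohort
print(tcp3(baseline_gamma, birth_year=2000)(20))  # cohort not affected
```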

From a methodological point of view, we introduced several elements of novelty. First of all, we provided a formal definition of the equations that describe the system dynamics and made explicit assumptions on the distribution of the random variables involved. We also introduced cubic regression splines to model flexibly the probabilities of starting and quitting smoking as functions of age, thus obtaining more realistic trajectories. Furthermore, we included in the model a dependence on smoking intensity, which may allow assessing the impact of personalized TCPs specific for heavy or moderate smokers, such as lung cancer screening, use of pharmacological treatment, or smoking cessation campaigns.

Regarding the inference on the unknown parameters, we proposed a two-step estimation strategy to estimate the curves describing the probability of starting and stopping smoking and the probability of smoking relapse, as well as the mortality risk among never, current and former smokers. At the second step of the estimation procedure, we defined the calibration objective function in terms of a Hellinger distance between observed and predicted prevalence, instead of the widely used sum of squares function. The use of this discrepancy measure is relatively new in this framework and allowed us to work with a bounded loss function, defined in [0, 1], that is simple to minimise and to interpret. Finally, we provided confidence intervals/bands for the parameters/curves of interest. To the best of our knowledge, this is the first time that quantification of sampling variability is performed in this field. To this aim, we resorted to a parametric bootstrap procedure defined by adapting to our framework a method proposed for compartmental models describing infectious dynamics [ 30 ]. The assumed Dirichlet distribution enabled us to model the prevalence values while complying with the constraint that their sum equals one. Furthermore, by specifying appropriate values for the concentration parameters of the Dirichlet distributions, we were able to quantify the sampling variability accounting for the sample size of the surveys from which we derived the observed prevalence used in calibration. The estimation procedure also has limitations. We estimated the parameters in a deterministic way, in the sense that we considered the distributional assumptions on the prevalence only in the bootstrap procedure but not in the calibration phase. While likelihood-based approaches are unfeasible in this framework, likelihood-free inference methods such as Approximate Bayesian Computation algorithms would allow a full uncertainty quantification [ 25 ].
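The two ingredients described above, a Hellinger discrepancy between prevalence vectors and Dirichlet resampling that respects the sum-to-one constraint, can be sketched as follows. This is a simplified illustration: setting the concentration parameters to the prevalence multiplied by a survey sample size is an assumption made here, the sample size of 2,000 is hypothetical, and the full procedure would also recalibrate the model on every resampled series, which is omitted.

```python
import math
import random


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (lies in [0, 1])."""
    total = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(0.5 * total)


def dirichlet_sample(alpha, rng):
    """Draw one Dirichlet vector by normalising independent Gamma draws.

    All entries of `alpha` must be strictly positive.
    """
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def bootstrap_prevalence(prevalence, sample_size, n_boot=1000, seed=0):
    """Resample a (never, current, former) prevalence vector.

    Assumption: concentration parameters equal prevalence * sample_size,
    so larger surveys give tighter bootstrap replicates.
    """
    rng = random.Random(seed)
    alpha = [p * sample_size for p in prevalence]
    return [dirichlet_sample(alpha, rng) for _ in range(n_boot)]


def percentile_interval(values, level=0.90):
    """Crude percentile interval from a list of bootstrap replicates."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int((1 - level) / 2 * last)], ordered[int((1 + level) / 2 * last)]


# Male prevalence in 1993 quoted in the text: never, current, former.
observed = [0.35, 0.34, 0.31]
replicates = bootstrap_prevalence(observed, sample_size=2000)  # hypothetical n
low, high = percentile_interval([r[1] for r in replicates])
print(f"90% interval for current smokers: {low:.3f} to {high:.3f}")
print(f"distance of first replicate: {hellinger(observed, replicates[0]):.4f}")
```

Because the distance is bounded between 0 (identical vectors) and 1 (vectors with disjoint support), values obtained for different years or populations are directly comparable.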

The reliability of the model’s results depends on several factors. First of all, it depends on the quality of the data used for calibration. In our case, we used data from yearly surveys conducted according to well-established methodology on reasonably large sample sizes. Secondly, it depends on the values of the fixed parameters, and we demonstrated, through the GSA, that the inference was robust to variations of the fixed parameters within plausible ranges of values. Lastly, the reliability of the results depends on the structural assumptions on which the model is based, which were not assessed via GSA. Below, we qualitatively review the main assumptions of the model and discuss the limitations that may arise from them. With respect to demographic dynamics, we assumed that the population was closed to immigration and emigration and that the number of new births did not vary during the study period, effectively feeding the model with identical cohorts of subjects each year. For more realistic modelling, we could use the observed yearly number of births to create the new cohorts up to 2019. However, in light of the GSA, we expect that the impact of this choice on the results has not been significant. We also assumed that the age-specific mortality rates did not vary over the study period.

Regarding smoking dynamics, we assumed that the probabilities of starting and stopping smoking were functions of age and that the probability of smoking relapse was a function of time since cessation, but not of age. We did not allow any of these probabilities to vary over calendar time. By defining the transition probabilities in this way, we made a clear choice about which time axes were, in our opinion, most important to capture the smoking dynamics in the population appropriately. This choice is not without problems, because in some cases there is evidence suggesting otherwise. For example, a decreasing trend in the probability of starting smoking has been reported for both males and females in Europe [ 51 ], while evidence of a dependence between age and risk of smoking relapse has been found in the US population [ 52 ]. Although introducing dependence on multiple time axes in the transition probabilities could lead to more realistic results, this would come at the price of further complicating the model by introducing new unknown parameters to be estimated. We partially explored the validity of the assumption of no calendar time dependence through a simple sensitivity analysis, which confirmed that the probabilities of starting and stopping smoking, and the probability of smoking relapse, were quite similar when two separate calibrations were performed on the periods 1993-2004 and 2005-2019. It is worth stressing that, even if these two periods correspond to before and after the introduction of the so-called Sirchia law that banned smoking in all indoor public places in Italy, it was not our goal to speculate about the causal effect of this intervention on smoking dynamics. We also assumed that people could not change their smoking intensity during their entire life, that the probability of stopping and relapsing did not depend on smoking intensity, and, again, that the distribution of smokers by smoking intensity did not change over the study period.
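The time axes discussed above can be illustrated with a toy simulation of a single birth cohort in which starting and stopping depend on age and relapse depends on years since cessation. This is a deliberately reduced sketch and not the SHC model: mortality, smoking intensity, and the fitted spline curves are left out, the order of transitions within a year is an assumption, and the three probability functions use made-up shapes loosely inspired by the values reported in the text.

```python
import math


def gamma(age):
    """Illustrative yearly probability of starting, peaking near age 20."""
    if age < 10:
        return 0.0
    return 0.09 * math.exp(-(((age - 20) / 4.0) ** 2))


def epsilon(age):
    """Illustrative yearly probability of stopping, rising after age 50."""
    return 0.02 if age < 50 else min(0.29, 0.02 + 0.01 * (age - 50))


def eta(years_since_quit):
    """Illustrative relapse probability by years since cessation."""
    return {1: 0.8, 2: 0.4}.get(years_since_quit, 0.0)


def simulate_cohort(gamma, epsilon, eta, max_age=80, max_c=5):
    """Follow one cohort through never (N), current (C), former (F) states.

    former[c - 1] holds the share who quit c years ago; the last bin
    absorbs everyone who quit max_c or more years ago.
    """
    never, current = 1.0, 0.0
    former = [0.0] * max_c
    history = []
    for age in range(max_age + 1):
        history.append((age, never, current, sum(former)))
        starters = gamma(age) * never
        quitters = epsilon(age) * current
        relapsers = [eta(c + 1) * f for c, f in enumerate(former)]
        stayed = [f - r for f, r in zip(former, relapsers)]
        # Shift former smokers one year further from cessation.
        shifted = [quitters] + stayed[:-1]
        shifted[-1] += stayed[-1]
        never -= starters
        current += starters - quitters + sum(relapsers)
        former = shifted
    return history


history = simulate_cohort(gamma, epsilon, eta)
for age, n, c, f in history[::20]:
    print(f"age {age:2d}: N={n:.3f} C={c:.3f} F={f:.3f}")
```

Letting any of the three functions depend also on calendar year or on a second time axis would require extra arguments and extra parameters, which is the cost in model complexity that the paragraph above refers to.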

In general, it is important to note that underlying all the simplifications introduced in model specification is the fact that adding details to a compartmental model goes along with the definition of new compartments and new transitions, and without available and reliable data, the model could become non-identifiable producing more uncertain and unstable results [ 53 ]. Moreover, microsimulation models or social network models, that explore smoking dynamics from an individual point of view, could be more suitable solutions to introduce detail and complexity, including those related, for example, to the course of disease [ 54 , 55 ], or to explore the exposure to second-hand smoke that, being related to the social network of the individuals, was not considered in our analysis.

Conclusions

We developed an approach for modelling smoking dynamics in the population that overcomes many of the limitations of previously proposed models. It includes validation tools, such as cross-validation on a rolling basis and GSA, aimed at checking the robustness of our results and supporting our findings. The model can be generalized and applied to other Italian regions by changing the initial conditions of the system. The fact that the surveys we relied on provide information about all regions makes this extension easily feasible. The proposed approach can also be applied straightforwardly to other countries after a careful check of the validity of the model assumptions, most of which can be adapted to different contexts. Finally, it can also be used to assess the impact of other tobacco control policies on smoking prevalence and mortality, beyond those considered in this paper.

Availability of data and materials

Data are available on request from the corresponding author.

Code availability

Codes are available on request from the corresponding author.

IARC, editor. Tobacco smoke and involuntary smoking: this publication represents the views and expert opinions of an IARC Working Group on the Evaluation of Carcinogenic Risks to Humans, which met in Lyon, 11 - 18 June 2002. No. 83 in IARC monographs on the evaluation of carcinogenic risks to humans. Lyon: IARC; 2004.

Institute of Medicine (US), Bonnie RJ, Stratton KR, Wallace RB, editors. Ending the tobacco problem: a blueprint for the nation. Washington, DC: National Academies Press; 2007.

IARC, editor. A review of human carcinogens. No. 100 in IARC monographs on the evaluation of carcinogenic risks to humans. Lyon: IARC; 2012.

Loring B. Tobacco and inequities: guidance for addressing inequities in tobacco-related harm. Copenhagen: World Health Organization, Regional Office for Europe; 2014.

GBD 2019 Risk Factors Collaborators. Global burden of 87 risk factors in 204 countries and territories, 1990-2019: a systematic analysis for the Global Burden of Disease Study 2019. Lancet. 2020;396(10258):1223–49. https://doi.org/10.1016/S0140-6736(20)30752-2 .

World Health Organization. Tobacco control for sustainable development. New Delhi: Regional Office for South-East Asia; 2017.

Gorini G, Carreras G, Lugo A, Gallus S, Masocco M, Spizzichino L, et al. Electronic cigarette use as an aid to quit smoking: Evidence from PASSI survey, 2014–2021. Prev Med. 2023;166:107391. https://doi.org/10.1016/j.ypmed.2022.107391 .

Gorini G, Gallus S, Carreras G, Mei BD, Masocco M, Faggiano F, et al. Prevalence of tobacco smoking and electronic cigarette use among adolescents in Italy: Global Youth Tobacco Surveys (GYTS), 2010, 2014, 2018. Prev Med. 2020;131:105903. https://doi.org/10.1016/j.ypmed.2019.105903 .

Cerrai S, Benedetti E, Colasante E, Scalese M, Gorini G, Gallus S, et al. E-cigarette use and conventional cigarette smoking among European students: findings from the 2019 ESPAD survey. Addiction. 2022;117(11):2918–32. https://doi.org/10.1111/add.15982 .

Mendez D, Warner KE, Courant PN. Has Smoking Cessation Ceased? Expected Trends in the Prevalence of Smoking in the United States. Am J Epidemiol. 1998;148(3):249–58. https://doi.org/10.1093/oxfordjournals.aje.a009632 .

Levy DT, Friend K. A Simulation Model of Policies Directed at Treating Tobacco Use and Dependence. Med Dec Mak. 2002;22(1):6–17. https://doi.org/10.1177/02729890222062874 .

Carreras G, Gallus S, Iannucci L, Gorini G. Estimating the probabilities of making a smoking quit attempt in Italy: stall in smoking cessation levels, 1986–2009. BMC Public Health. 2012;12(1):183. https://doi.org/10.1186/1471-2458-12-183 .

National Center for Chronic Disease Prevention and Health Promotion (US) Office on Smoking and Health. The Health Consequences of Smoking-50 Years of Progress: A Report of the Surgeon General. Reports of the Surgeon General. Atlanta (GA): Centers for Disease Control and Prevention (US); 2014. http://www.ncbi.nlm.nih.gov/books/NBK179276/ .

Feuer EJ, Levy DT, McCarthy WJ. Chapter 1: The Impact of the Reduction in Tobacco Smoking on U.S. Lung Cancer Mortality, 1975-2000: An Introduction to the Problem: Introduction: Impact of the Reduction in Tobacco Smoking on U.S. Lung Cancer Mortality. Risk Anal. 2012;32:S6–S13. https://doi.org/10.1111/j.1539-6924.2011.01745.x .

Levy DT, Nikolayev L, Mumford E, Compton C. The Healthy People 2010 smoking prevalence and tobacco control objectives: results from the SimSmoke tobacco control policy simulation model (United States). Cancer Causes Control. 2005;16(4):359–71. https://doi.org/10.1007/s10552-004-7841-4 .

Singh A, Wilson N, Blakely T. Simulating future public health benefits of tobacco control interventions: a systematic review of models. Tob Control. 2021;30(4):460–70. https://doi.org/10.1136/tobaccocontrol-2019-055425 .

Levy DT, Gallus S, Blackman K, Carreras G, Vecchia CL, Gorini G. Italy SimSmoke: the effect of tobacco control policies on smoking prevalence and smoking-attributable deaths in Italy. BMC Public Health. 2012;12(1):709. https://doi.org/10.1186/1471-2458-12-709 .

Levy DT, Sánchez-Romero LM, Li Y, Yuan Z, Travis N, Jarvis MJ, et al. England SimSmoke: the impact of nicotine vaping on smoking prevalence and smoking-attributable deaths in England. Addiction. 2021;116(5):1196–211. https://doi.org/10.1111/add.15269 .

Near AM, Blackman K, Currie LM, Levy DT. Sweden SimSmoke: the effect of tobacco control policies on smoking and snus prevalence and attributable deaths. Eur J Public Health. 2014;24(3):451–8. https://doi.org/10.1093/eurpub/ckt178 .

Sánchez-Romero LM, Zavala-Arciniega L, Reynales-Shigematsu LM, Miera-Juárez BSD, Yuan Z, Li Y, et al. The Mexico SimSmoke tobacco control policy model: Development of a simulation model of daily and nondaily cigarette smoking. PLoS One. 2021;16(6):e0248215. https://doi.org/10.1371/journal.pone.0248215 .

Institute of Medicine (US), Wallace RB, Geller A, Ogawa VA, editors. Assessing the use of agent-based models for tobacco regulation. Washington, D.C: National Academies Press; 2015.

Tam J, Levy DT, Jeon J, Clarke J, Gilkeson S, Hall T, et al. Projecting the effects of tobacco control policies in the USA through microsimulation: a study protocol. BMJ Open. 2018;8(3):e019169. https://doi.org/10.1136/bmjopen-2017-019169 .

Carreras G, Gorini G, Paci E. Can a National Lung Cancer Screening Program in Combination with Smoking Cessation Policies Cause an Early Decrease in Tobacco Deaths in Italy? Cancer Prev Res. 2012;5(6):874–82. https://doi.org/10.1158/1940-6207.CAPR-12-0019 .

Lachi A, Viscardi C, Malevolti MC, Carreras G, Baccini M. Compartmental models in epidemiology: Application on Smoking Habits in Tuscany. In: Book of short papers SIS. Italia: Pearson; 2022. pp. 1437–42.

Lachi A, Viscardi C, Baccini M. Approximate Bayesian inference for smoking habit dynamics in Tuscany. Italia: Springer Nature; 2023.

Broemeling LD. Bayesian analysis of infectious diseases: COVID-19 and beyond. Chapman and Hall/CRC biostatistics series. Boca Raton, London, New York: CRC Press, Taylor and Francis Group; 2021.

Baccini M, Cereda G, Viscardi C. The first wave of the SARS-CoV-2 epidemic in Tuscany (Italy): A SI2R2D compartmental model with uncertainty evaluation. PLoS One. 2021;16(4):e0250029. https://doi.org/10.1371/journal.pone.0250029 .

Hoogenveen RT, Baal PHV, Boshuizen HC, Feenstra TL. Dynamic effects of smoking cessation on disease incidence, mortality and quality of life: The role of time since cessation. Cost Eff Resour Allocation. 2008;6(1):1. https://doi.org/10.1186/1478-7547-6-1 .

Efron B, Tibshirani R. An introduction to the bootstrap. No. 57 in Monographs on statistics and applied probability. New York: Chapman and Hall; 1993.

Chowell G. Fitting dynamic models to epidemic outbreaks with quantified uncertainty: A primer for parameter uncertainty, identifiability, and forecasts. Infect Dis Model. 2017;2(3):379–98. https://doi.org/10.1016/j.idm.2017.08.001 .

Saltelli A, Ratto M, Andres T, Campolongo F, Cariboni J, Gatelli D, et al. Global sensitivity analysis: the primer. Italia: Wiley; 2008.

Thun MJ, Carter BD, Feskanich D, Freedman ND, Prentice R, Lopez AD, et al. 50-Year Trends in Smoking-Related Mortality in the United States. N Engl J Med. 2013;368(4):351–64. https://doi.org/10.1056/NEJMsa1211127 .

Carreras G, Gorini G, Gallus S, Iannucci L, Levy DT. Predicting the future prevalence of cigarette smoking in Italy over the next three decades. Eur J Public Health. 2012;22(5):699–704. https://doi.org/10.1093/eurpub/ckr108 .

Hellinger E. Neue Begründung der Theorie quadratischer Formen von unendlichvielen Veränderlichen. J Reine Angew Math. 1909;1909(136):210–71. https://doi.org/10.1515/crll.1909.136.210 .

Mogensen PK, Riseth AN. Optim: A mathematical optimization package for Julia. J Open Source Softw. 2018;3(24):615. https://doi.org/10.21105/joss.00615 .

Zucchini W, MacDonald IL, Langrock R. Hidden Markov models for time series: an introduction using R. 2nd ed. New York: Chapman and Hall/CRC; 2016. https://doi.org/10.1201/b20790 .

Roosa K, Chowell G. Assessing parameter identifiability in compartmental dynamic models using a computational approach: application to infectious disease transmission models. Theor Biol Med Model. 2019;16(1):1. https://doi.org/10.1186/s12976-018-0097-6 .

Saltelli A, Annoni P, Azzini I, Campolongo F, Ratto M, Tarantola S. Variance-based sensitivity analysis of model output. Design and estimator for the total sensitivity index. Comput Phys Commun. 2010;181(2):259–70. https://doi.org/10.1016/j.cpc.2009.09.018 .

Sobol IM. A primer for the Monte Carlo method. Boca Raton: CRC Press; 1994.

Kucherenko S, Albrecht D, Saltelli A. Exploring multi-dimensional spaces: a Comparison of Latin Hypercube and Quasi Monte Carlo Sampling Techniques. arXiv - University of Cornell (USA) JRC98050. 2015. https://doi.org/10.48550/arXiv.1505.02350 .

GBD 2019 Tobacco Collaborators. Spatial, temporal, and demographic patterns in the prevalence of smoking tobacco use and attributable disease burden in 204 countries and territories, 1990-2019: a systematic analysis from the Global Burden of Disease Study 2019. Lancet. 2021;397(10292):2337–60. https://doi.org/10.1016/S0140-6736(21)01169-7 .

Kulik MC, Nusselder WJ, Boshuizen HC, Lhachimi SK, Fernández E, Baili P, et al. Comparison of Tobacco Control Scenarios: Quantifying Estimates of Long-Term Health Impact Using the DYNAMO-HIA Modeling Tool. PLoS ONE. 2012;7(2):e32363. https://doi.org/10.1371/journal.pone.0032363 .

Hara M, Sobue T, Sasaki S, Tsugane S, the JPHC Study Group. Smoking and Risk of Premature Death among Middle-aged Japanese: Ten-year Follow-up of the Japan Public Health Center-based Prospective Study on Cancer and Cardiovascular Diseases (JPHC Study) Cohort I. Jpn J Cancer Res. 2002;93(1):6–14. https://doi.org/10.1111/j.1349-7006.2002.tb01194.x .

Wensink M, Alvarez JA, Rizzi S, Janssen F, Lindahl-Jacobsen R. Progression of the smoking epidemic in high-income regions and its effects on male-female survival differences: a cohort-by-age analysis of 17 countries. BMC Public Health. 2020;20(1):39. https://doi.org/10.1186/s12889-020-8148-4 .

Meade N. Industrial and business forecasting methods, Lewis, C.D., Borough Green, Sevenoaks, Kent: Butterworth, 1982. Price: £9.25. Pages: 144. J Forecast. 1983;2(2):194–6. https://doi.org/10.1002/for.3980020210 .

Mahajan SD, Homish GG, Quisenberry A. Multifactorial Etiology of Adolescent Nicotine Addiction: A Review of the Neurobiology of Nicotine Addiction and Its Implications for Smoking Cessation Pharmacotherapy. Front Public Health. 2021;9:664748. https://doi.org/10.3389/fpubh.2021.664748 .

Reitsma MB, Flor LS, Mullany EC, Gupta V, Hay SI, Gakidou E. Spatial, temporal, and demographic patterns in the prevalence of smoking tobacco use and initiation among young people in 204 countries and territories, 1990–2019. Lancet Public Health. 2021;6(7):e472–81. https://doi.org/10.1016/S2468-2667(21)00102-X .

Gorini G, Costantini A, Franchi G, Terrone R. Environmental tobacco smoke (ETS) at the workplace: considerations about a survey carried out in a pharmaceutical industry. Epidemiol Prev. 2002;26(1):35–9.

Lopez AD, Collishaw NE, Piha T. A descriptive model of the cigarette epidemic in developed countries. Tob Control. 1994;3(3):242–7. https://doi.org/10.1136/tc.3.3.242 .

Gorini G, Carreras G, Allara E, Faggiano F. Decennial trends of social differences in smoking habits in Italy: a 30-year update. Cancer Causes Control. 2013;24(7):1385–91. https://doi.org/10.1007/s10552-013-0218-9 .

Marcon A, Pesce G, Calciano L, Bellisario V, Dharmage SC, Garcia-Aymerich J, et al. Trends in smoking initiation in Europe over 40 years: A retrospective cohort study. PLoS One. 2018;13(8):e0201881. https://doi.org/10.1371/journal.pone.0201881 .

Alboksmaty A, Agaku IT, Odani S, Filippidis FT. Prevalence and determinants of cigarette smoking relapse among US adult smokers: a longitudinal study. BMJ Open. 2019;9(11):e031676. https://doi.org/10.1136/bmjopen-2019-031676 .

Puy A, Beneventano P, Levin SA, Piano SL, Portaluri T, Saltelli A. Models with higher effective dimensions tend to produce more uncertain estimates. Sci Adv. 2022;8(42):eabn9450. https://doi.org/10.1126/sciadv.abn9450 .

Bongers ML, Ruysscher DD, Oberije C, Lambin P, Groot CAU, Coupé VMH. Multistate Statistical Modeling: A Tool to Build a Lung Cancer Microsimulation Model That Includes Parameter Uncertainty and Patient Heterogeneity. Med Dec Making. 2016;36(1):86–100. https://doi.org/10.1177/0272989X15574500 .

Chrysanthopoulou SA. MILC: A Microsimulation Model of the Natural History of Lung Cancer. Int J Microsimulation. 2016;10(3):5–26. https://doi.org/10.34196/ijm.00164 .

Acknowledgements

We thank Maria Chiara Malevolti for her contribution to data collection and validation, and all ACAB project participants for their support.

The present work is part of the Attributable Cancer Burden in Tuscany (ACAB) project, funded by Regione Toscana under the grant "Bando Ricerca Salute 2018" ( www.acab-toscana.it ). The funding body played no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.

Author information

Authors and affiliations.

Department of Statistics, Computer Science, Applications “Giuseppe Parenti” (DiSIA), University of Florence, Viale Giovanni Battista Morgagni 59/65, Florence, 50134, Italy

Alessio Lachi, Cecilia Viscardi, Giulia Cereda & Michela Baccini

Epidemiology and Health Research Lab, Institute of Clinical Physiology of the Italian National Research Council (IFC-CNR), Via Giuseppe Moruzzi 1, Pisa, 56124, Italy

Alessio Lachi

Florence Center for Data Science, University of Florence, Viale Giovanni Battista Morgagni 59, Florence, 50134, Italy

Cecilia Viscardi, Giulia Cereda & Michela Baccini

Oncologic Network, Prevention and Research Institute (ISPRO), Servizio Sanitario della Toscana, Via Cosimo il Vecchio 2, Florence, 50139, Italy

Giulia Carreras

Contributions

A.L. wrote the code and conducted the statistical analysis; A.L., C.V. and M.B. conceptualized the model and wrote the first version of the paper; G.Ce. and G.Ca. provided critical feedback; M.B. supervised the project. All the authors read and approved the final version of the paper.

Corresponding authors

Correspondence to Alessio Lachi or Michela Baccini .

Ethics declarations

Ethics approval and consent to participate.

Not applicable.

Consent for publication

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary material 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Lachi, A., Viscardi, C., Cereda, G. et al. A compartmental model for smoking dynamics in Italy: a pipeline for inference, validation, and forecasting under hypothetical scenarios. BMC Med Res Methodol 24, 148 (2024). https://doi.org/10.1186/s12874-024-02271-w

Received : 28 August 2023

Accepted : 27 June 2024

Published : 13 July 2024

DOI : https://doi.org/10.1186/s12874-024-02271-w

  • Compartmental models
  • Smoking dynamics
  • Tobacco control policies
  • Global sensitivity analysis
  • Parametric bootstrap
  • Cross validation
  • Smoking attributable deaths
  • Forecasting
  • Calibration
  • Regression splines

BMC Medical Research Methodology

ISSN: 1471-2288

Transferable preference learning in multi-objective decision analysis and its application to hydrocracking

  • Original Article
  • Open access
  • Published: 15 July 2024

  • Xinzhe Wang 2 ,
  • Chao Jiang   ORCID: orcid.org/0000-0001-8106-4740 1 ,
  • Yang Liu 2 ,
  • Lianbo Ma 2 ,
  • Cuimei Bo 3 &
  • Quanling Zhang 1  

Hydrocracking represents a complex and time-consuming chemical process that converts heavy oil fractions into various valuable products with low boiling points. It plays a pivotal role in enhancing the quality of products within the oil refining process. Consequently, the development of efficient surrogate models for simulating the hydrocracking process and identifying appropriate solutions for multi-objective oil refining is now an important area of research. In this study, a novel transferable preference learning-driven evolutionary algorithm is proposed to facilitate multi-objective decision analysis in the oil refining process. Specifically, our approach involves considering user preferences to divide the objective space into a region of interest (ROI) and other subspaces. We then utilize Kriging models to approximate the sub-problems within the ROI. In order to enhance the robustness and generalization capability of the Kriging models during the evolutionary process, we transfer the mutual information between the sub-problems in the ROI. To validate the effectiveness as well as efficiency of our proposed method, we undertake a series of experiments on both benchmarks and the oil refining process. The experimental results conclusively demonstrate the superiority of our approach.

Introduction

Hydrocracking is a vital process in the oil refining industry, converting low-quality feedstocks, exemplified by vacuum gas oil, into high-value transportation fuels under high temperatures and pressures [ 1 ]. This industrial process typically involves the use of two-bed catalytic reactors [ 2 ]. The first reactor, known as the hydrotreater (HT), decomposes sulfur- and nitrogen-containing compounds. The second reactor, the hydrocracker (HC), performs hydroisomerization and hydrocracking of the treated liquid fraction from the first reactor [ 2 ]. Hydrocracking offers numerous advantages, including excellent adaptability to various feedstocks, flexibility in the production process, and the production of high-quality products. As a result, it has gained popularity in the production of low-sulfur, low-aromatics diesel fuel, as well as high-smoke-point jet fuel [ 1 ].

From an optimization perspective, the hydrocracking process represents a typical multi-objective optimization problem (MOP). In the refining process, it is essential to optimize the operating conditions, such as cell feed flow, reaction pressure, and temperature, to meet the requirements for hydrocracking yield and improve the economic efficiency of refiners [ 3 ]. However, modeling the hydrocracking process as a complex simulation and evaluating the performance of new operating conditions on the simulation can be highly time-consuming and costly. The conventional approach to address this issue often involves speculations and commissioning operations by experienced engineers, which can be both unreliable and inefficient [ 3 ]. As a result, finding efficient and effective methods to handle the optimization of hydrocracking has become a growing concern within the refining industry. To obtain optimal operating conditions for industrial hydrocracking units, researchers have developed various mathematical models and attempted to optimize them. However, the optimization process of hydrocracking still requires improvement, primarily because evaluating the performance of operation units relies on complex simulation models using commercial engineering software, which is highly nonlinear and time-consuming [ 3 ]. Consequently, when the evaluation cost is high, these optimization problems are referred to as expensive multi-objective optimization problems (EMOPs). For evolutionary algorithms with a limited number of fitness evaluations, particularly in multi-objective scenarios, optimizing such problems becomes challenging.

Fig. 1 Illustration of the hydrocracking process

Traditional multi-objective evolutionary algorithms (MOEAs) are typically designed to discover a wide and representative set of solutions [ 4 , 5 , 6 ], which is referred to as the Pareto set (PS). The decision-maker (DM) then selects the final solution from this set based on their preferences. Nonetheless, in real-world optimization scenarios such as the hydrocracking process, providing the DM with the entire PS of a MOP is frequently impractical [ 7 ]. This limitation arises for two main reasons: (1) Complex optimization problems: Practical optimization problems are often non-convex, nonlinear, and involve a large number of objectives or decision variables. This poses significant challenges for traditional MOEAs. For instance, in high-dimensional objective spaces, Pareto dominance-based MOEAs may struggle to distinguish the relationships between solutions [ 8 ]. Additionally, they can fail to converge to the true Pareto front (PF) due to the presence of Dominance Resistant Solutions (DRSs) [ 9 ]. (2) Costly evaluations: As previously mentioned, identifying the entire Pareto front may require a large number of evaluations, whose cost is often unacceptable in practical applications. Moreover, when MOEAs provide the DM with numerous solutions, it becomes challenging for the DM to choose the final preferred alternative.

In solving complex MOPs, preference-based MOEAs have gained significant attention [ 10 , 11 , 12 , 13 ]. These preference-driven algorithms incorporate preference information from the DM into the optimization process, concentrating on one or several constrained regions of the PF, which are often termed Regions of Interest (ROIs) [ 14 ]. Sandra et al. [ 15 ] propose a new approach to approximate the ROI by means of aspiration and reservation points. Molina et al. [ 16 ] introduce the concept of g-dominance, which divides the objective space into subregions with varying priorities by setting reference points, guiding the algorithm to prioritize subregions with high importance. Said et al. [ 17 ] propose r-dominance, establishing a strict biased order among non-dominated individuals by using the distances between individuals and reference points as the second criterion for environmental selection. Yi et al. [ 18 ] merge r-dominance and angular dominance, further enhancing the performance of preference-based multi-objective evolutionary algorithms. Guo et al. [ 19 ] efficiently design a bolt supporting network in an interactive way, where the preferences of the DM are expressed by dynamically updating the ROI based on preferred solutions during the evolution. Tang et al. [ 20 ] propose a decomposition-based interactive evolutionary algorithm, which efficiently guides the population toward the ROI by transforming the originally evenly distributed reference vectors into a biased distribution. Palakonda et al. [ 21 ] propose a preference-inspired differential evolution algorithm for multi- and many-objective optimization, where a preference-inspired mutation operator is designed to generate individuals with good performance and local knee points are acquired to articulate preferences in the mutation operator. Yu et al. [ 22 ] introduce LBD-MOEA, which is based on localized alpha-dominance and knee point dominance, demonstrating significant advantages in discovering knee point regions [ 23 ].

In summary, preference-based MOEAs offer two main advantages. Firstly, they conduct a targeted search towards the ROIs, avoiding exploration of regions of no interest and thus reducing computational cost. Secondly, the resulting solution set contains far fewer solutions than that of traditional methods, which makes it easier for the DM to select preferred solutions [ 7 ]. Preference-based MOEAs can incorporate the DM’s preferences in three ways based on the timing of integration: (1) A priori: Preferences are provided before the search process begins. (2) Interactive: Preferences are incorporated interactively during the search. (3) A posteriori: Preferences are considered after the search is completed, and the final solution is chosen from the provided compromise solutions according to the preferences. Most traditional MOEAs are based on the a posteriori approach. In this work, we use the a priori approach for preference incorporation in the hydrocracking process, where the DM’s preferences are injected before the search begins.

In solving the issue of expensive evaluations, data-driven MOEAs have emerged as a prominent research focus in the discipline of multi-objective optimization [ 24 , 25 , 26 , 27 , 28 ]. These approaches considerably decrease the quantity of real evaluations by utilizing surrogate models. Various technologies have been proposed to build surrogate models, including polynomial response surface methods (PRSM) [ 29 ], radial basis functions (RBF) [ 30 ], and Kriging models [ 31 ], among others. Each surrogate model possesses unique characteristics. For instance, RBF models use appropriate coefficients for aggregating various basis functions, enabling them to simulate intricate design scenarios. In contrast, Kriging models treat the response of the input system as a stochastic process, making them suitable for predicting nonlinear problems [ 1 ].

In addition, some studies have embedded preferences into surrogate models to handle expensive MOPs. For example, Gibson et al. [ 32 ] define a novel acquisition function which considers the maximin distance from individuals to the PF and to the aspirational points chosen by the DM, which can effectively drive the population toward the preferred region. Wang et al. [ 33 ] propose a surrogate-assisted framework that integrates epsilon-dominance with an iterative radial basis function, where the size of the ROI is controlled by the epsilon value. Tinkle et al. [ 34 ] successfully incorporate preference vectors from the DM into a Kriging-assisted evolutionary algorithm [ 27 ] for multiobjective shape design in a ventilation system. He [ 35 ] incorporates the desirable objective function values into scalarising functions and uses a generalised value distribution to approximate the scalarising function in order to obtain general surrogate models. Mohammad et al. [ 36 ] use the achievement scalarising function [ 37 ] as the objective surrogate function, which is employed in generating solutions reflecting the preferences of the DM. Tang et al. [ 38 ] take advantage of knee points [ 23 ] as preference information for Pareto front estimation when no preference from the decision maker is available. Yang et al. [ 39 ] design truncated expected hypervolume improvement as an infill criterion to effectively approximate the preferred Pareto front.

Although recent studies have shown strong capability in handling expensive MOPs, different preference articulations are used for different scenarios, such as aspirational points, epsilon-dominance, preference vectors, achievement scalarising functions, knee points, and truncated expected hypervolume improvement. For the hydrocracking process, empirical studies have shown that the ranges of the objective values vary greatly across objectives. In particular, simply approximating the objectives or scalarising functions is not efficient, as only a small number of solutions lie around the preference vector. Hence, when we take into account both the preference of the DM and the cost of expensive evaluations, an appropriate design of the preference model is required. Accordingly, we propose a new preference model composed of \(m+1\) reference vectors, where m is the number of objectives. The model restricts the ROI through a preference vector and a parameter that controls the size of the ROI. Consequently, efficient surrogates along the \(m+1\) reference vectors can be easily built, which not only approximate the sub-problems within the ROI but also make it easier for the DM to control the size of the ROI. In addition, we can transfer the mutual information between the sub-problems to generate diverse solutions within the ROI. In this way, we theoretically and experimentally extend our recent work [ 40 ] and provide a new approach to multi-objective decision analysis in the oil refining process.

In summary, the contributions of this paper are as follows:

In our preference-based framework, we incorporate the Kriging model. Since the preference region typically constitutes only a small portion of the entire Pareto frontier, we leverage the exploration and exploitation capabilities of the Kriging model. Within ROI, we generate only \(m+1\) reference vectors. These reference vectors help guide the search within the ROI. The Kriging model is then employed to directly approximate the fitness values of individuals corresponding to these reference vectors. By doing so, we efficiently explore the ROI while effectively capturing the fitness landscape of the hydrocracking optimization problem.

To better explore the ROI, transfer learning is introduced into the search process to improve the surrogate model’s prediction ability and global exploration ability. It mainly includes two levels of transfer: (1) sample transfer: individuals that perform well on different sub-problems are involved in building surrogate models for other sub-problems. (2) parameter transfer: when building a surrogate model, information about the parameters of other surrogate models is consulted.

The subsequent sections of this paper are structured as follows. Section  Related works provides an overview of previous related research, encompassing fundamental notions of multi-objective optimization and Kriging models. Section  Proposed method describes our devised algorithm and its implementation. Section  Experimental results and analysis presents and discusses the experimental results of our TDP-MOEA and the comparative algorithms on the benchmark problems and the refining process. Finally, Sect.  Conclusion summarizes this paper.

Related works

In this section, we begin by providing the definition of MOPs [ 41 ]. For the sake of convenience, our discussion will be confined to scenarios where all objectives are aimed at minimization, as the conversion of maximizing objectives to minimizing ones can be readily accomplished. Subsequently, we introduce the well-known Kriging model, which is utilized to approximate the value of the aggregation function.

Definition of MOPs

A minimization MOP takes the following form:

where \(\textbf{x} = \left( x_1, x_2,\ldots , x_d \right) ^T\) is the vector of decision variables, d is the dimension of the decision vector, \(\Omega \subseteq {R}^d\) is the decision space, \(F:\Omega \rightarrow {R}^m\) denotes the objective vector, comprising m objective functions, \({R}^m\) is the objective space, \(h_i\) and \(g_j\) are the equality and inequality constraints of the problem, respectively, and \(n_h\) and \(n_g\) are the numbers of equality and inequality constraints.
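For reference, a standard way of writing a constrained minimization MOP that is consistent with the symbols defined above is sketched below. The direction of the inequality constraints (\(g_j \le 0\)) is an assumption made here for illustration; some texts use \(g_j \ge 0\) instead.

```latex
\begin{aligned}
\min_{\textbf{x} \in \Omega} \quad & F(\textbf{x}) = \left( f_1(\textbf{x}), f_2(\textbf{x}), \ldots , f_m(\textbf{x}) \right)^T \\
\text{s.t.} \quad & h_i(\textbf{x}) = 0, \quad i = 1, \ldots , n_h \\
                  & g_j(\textbf{x}) \le 0, \quad j = 1, \ldots , n_g
\end{aligned}
```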

In MOPs, the objectives frequently exhibit conflicting characteristics, meaning that no solution can be optimal across all objectives concurrently. Instead, optimal solutions in MOPs represent a balance or compromise among the objectives, and these solutions are termed Pareto Solutions. The entirety of Pareto optimal solutions constitutes the Pareto set (PS) within the decision space, while the corresponding mapping of these solutions in the objective space is designated as the Pareto front (PF). When \(m > 3\) , MOPs are termed many-objective optimization problems (MaOPs).
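The notion of Pareto optimality used above can be illustrated with a small sketch. It assumes all objectives are minimized, and the function names `dominates` and `non_dominated` are hypothetical helpers chosen for this example.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if objective vector a Pareto-dominates b (minimization)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep the points that no other point dominates (brute-force O(N^2) filter)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    objs = [(1, 4), (2, 2), (4, 1), (3, 3), (2, 5)]
    print(non_dominated(objs))
```

In this example, (3, 3) is dominated by (2, 2) and (2, 5) is dominated by (1, 4), so the remaining three points form the approximated Pareto front.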

Kriging model

In surrogate-assisted evolutionary algorithms (SAEAs), the Kriging model, also known as the Gaussian process, is a highly appealing mathematical regression model based on the optimal linear unbiased estimation method. It not only offers prediction values but also provides valuable uncertainty information about its predictions. This uncertainty information proves to be beneficial in achieving a balance between exploration and exploitation while managing the model. By incorporating the Kriging model in the optimization process, SAEAs can efficiently approximate the fitness functions and guide the search towards promising regions in the search space.

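The Kriging model treats the response at an input \(\textbf{x}\) as a realization of a Gaussian process [ 42 ], which can be formulated as:

\[ y\left( \textbf{x} \right) = \mu + \varepsilon \left( \textbf{x} \right) \]

where \(\mu \) is the global mean of the Kriging model and \(\varepsilon \left( \textbf{x} \right) \) is a stochastic process characterized by a zero mean and a standard deviation \(\sigma \) :

\[ \varepsilon \left( \textbf{x} \right) \sim N\left( 0, \sigma ^2 \right) \]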

To approximate the real function \(f\left( \textbf{x} \right) \) , the Kriging model must be trained on a set of samples. Considering a collection of N n -dimensional inputs denoted as \(X=\left\{ \textbf{x}^1, \textbf{x}^2, \cdots ,\textbf{x}^N \right\} ^T\) and their associated objective function values \(Y=\left\{ y^1, y^2, \cdots ,y^N \right\} ^T\) , the responses follow a multivariate Gaussian distribution on \(R^N\) . In general, for two arbitrary inputs \(\textbf{x}^i\) and \(\textbf{x}^j\) , the Kriging model uses a squared exponential correlation with additional hyperparameters to calculate their correlation \(R\left( \textbf{x}^i, \textbf{x}^j \right) \) :

\[ R\left( \textbf{x}^i, \textbf{x}^j \right) = \exp \left( -\sum _{k=1}^{n} \theta _{k} \left| x_{k}^{i} - x_{k}^{j} \right| ^{p_{k}} \right) \]

where \(i,j = 1, \cdots , N\) , and \(\theta _{k}\) and \(p_{k}\) are the hyperparameters. When there are N inputs, an \(N \times N\) covariance matrix S can be obtained:

\[ S = \begin{bmatrix} R\left( \textbf{x}^1, \textbf{x}^1 \right) & \cdots & R\left( \textbf{x}^1, \textbf{x}^N \right) \\ \vdots & \ddots & \vdots \\ R\left( \textbf{x}^N, \textbf{x}^1 \right) & \cdots & R\left( \textbf{x}^N, \textbf{x}^N \right) \end{bmatrix} \]

The hyperparameter \(\theta _{k}\) denotes the importance of \(k\text {-th}\) dimension and can be estimated by maximizing the following likelihood function:

where \(\det \left( \cdot \right) \) is the determinant of a square matrix.

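The estimates \(\hat{\mu }\) and \(\hat{\sigma }^2\) can be obtained by

\[ \hat{\mu } = \frac{\textbf{1}^T S^{-1} Y}{\textbf{1}^T S^{-1} \textbf{1}}, \qquad \hat{\sigma }^2 = \frac{\left( Y - \textbf{1}\hat{\mu } \right) ^T S^{-1} \left( Y - \textbf{1}\hat{\mu } \right) }{N} \]

where \(\textbf{1}\) stands for an \(N \times 1\) vector of ones.

Given a new input \({\bar{\textbf{x}}}\) , the prediction value \(\bar{y}\) and variance \(\bar{\sigma }^2\) at \({\bar{\textbf{x}}}\) can be calculated by the following formulas:

\[ \bar{y} = \hat{\mu } + r^T S^{-1} \left( Y - \textbf{1}\hat{\mu } \right) , \qquad \bar{\sigma }^2 = \hat{\sigma }^2 \left[ 1 - r^T S^{-1} r + \frac{\left( 1 - \textbf{1}^T S^{-1} r \right) ^2}{\textbf{1}^T S^{-1} \textbf{1}} \right] \]

where r represents an \(N \times 1\) vector containing the correlations between \({\bar{\textbf{x}}}\) and all the training points.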
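The Kriging predictor described in this section can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the hyperparameters \(\theta _k\) are passed in as fixed values instead of being found by maximizing the likelihood, \(p_k = 2\) is used for every dimension, a small nugget is added to the diagonal for numerical stability, and the names `corr_matrix` and `OrdinaryKriging` are hypothetical.

```python
import numpy as np


def corr_matrix(A, B, theta, p=2.0):
    """Correlation exp(-sum_k theta_k * |a_k - b_k|**p) between rows of A and B."""
    diff = np.abs(A[:, None, :] - B[None, :, :]) ** p
    return np.exp(-(diff * theta).sum(axis=2))


class OrdinaryKriging:
    """Ordinary Kriging with fixed hyperparameters theta (assumption: no likelihood search)."""

    def __init__(self, X, y, theta, nugget=1e-10):
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y, dtype=float)
        self.theta = np.asarray(theta, dtype=float)
        n_samples = len(self.y)
        # N x N correlation matrix of the training inputs (plus a tiny nugget).
        S = corr_matrix(self.X, self.X, self.theta) + nugget * np.eye(n_samples)
        self.S_inv = np.linalg.inv(S)
        ones = np.ones(n_samples)
        self.denom = ones @ self.S_inv @ ones
        # Closed-form estimates of the global mean and process variance.
        self.mu = (ones @ self.S_inv @ self.y) / self.denom
        resid = self.y - self.mu
        self.sigma2 = (resid @ self.S_inv @ resid) / n_samples
        self.weights = self.S_inv @ resid

    def predict(self, x_new):
        """Return the predicted mean and variance for each row of x_new."""
        x_new = np.atleast_2d(np.asarray(x_new, dtype=float))
        r = corr_matrix(x_new, self.X, self.theta)       # shape (q, N)
        mean = self.mu + r @ self.weights
        r_Sinv = r @ self.S_inv                          # S_inv is symmetric
        quad = np.einsum("ij,ij->i", r_Sinv, r)          # r^T S^-1 r per query point
        trend = (1.0 - r_Sinv.sum(axis=1)) ** 2 / self.denom
        var = self.sigma2 * np.maximum(1.0 - quad + trend, 0.0)
        return mean, var


if __name__ == "__main__":
    X = np.array([[0.0], [0.5], [1.0]])
    y = np.array([0.0, 1.0, 0.5])
    model = OrdinaryKriging(X, y, theta=[5.0])
    print(model.predict([[0.25]]))
```

With a negligible nugget the model interpolates the training data, so the predicted variance is close to zero at sampled points and grows away from them. This is the uncertainty information that the acquisition function later exploits.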

Proposed method

In this section, we will describe the proposed TDP-MOEA. Specifically, the framework of the proposed algorithm is outlined first, and then the preference model and transfer learning models are elaborated in detail.

The overarching framework of the presented algorithm, i.e., a transfer learning-driven preference-inspired multi-objective evolutionary algorithm (TDP-MOEA), is presented in Algorithm 1, which includes four main phases:

Algorithm 1 Main Framework of TDP-MOEA

Preference articulation method (line 1): The proposed TDP-MOEA algorithm requires the DM to provide a preference vector and specify the desired region size. Using this information, we determine the boundary of the ROI. Within the ROI, we generate a set of uniform reference vectors. Notably, our approach generates only \(m + 1\) reference vectors. The process of determining the boundaries of preference regions and generating the reference vectors is discussed in detail in Sect.  Preference model .

Initialization (lines 2–4): During this stage, our primary objectives are to generate the initial population and establish the initial surrogate models. First, the Latin hypercube sampling method [ 43 ] is employed to produce N individuals, and all of them are evaluated with the real objective functions. Once evaluated, each individual is associated with its closest reference vector, determined by its angle to the reference vectors, and its fitness is calculated with respect to the reference vectors. All evaluated individuals are copied into an archive for storage. For each reference vector, the archived individuals with the best fitness on that vector are selected to train the corresponding Kriging model, which is used to approximate the fitness of individuals.

Evolving (lines 5–14): In each generation of the evolutionary process, offspring will be generated using the Kriging models and the Expected Improvement (EI) acquisition function. Subsequently, the newly created individuals will be evaluated using the real objective functions. They will then be associated with the closest reference vector and have their fitness calculated based on different vectors before being stored in the archive. To enhance the Kriging models, a transfer learning method will be employed, which encompasses two levels of transfer: sample transfer and parameter transfer. A detailed discussion of these transfer methods can be found in Sect.  Transfer learning model . The evolutionary process will persist until the maximum number of evaluations, denoted as MaxFEs , is reached.

Selection (lines 15–16): Finally, all non-dominated individuals in ROI form the final solutions.
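Two building blocks of the initialization phase described above, Latin hypercube sampling and the association of an individual with its closest reference vector by angle, can be sketched as follows. This is an illustrative sketch using only the standard library; the function names are hypothetical, samples are drawn in the unit hypercube (scaling to the real variable bounds is left out), and objective vectors are assumed to be non-zero.

```python
import math
import random


def latin_hypercube(n_samples, dim, rng):
    """Draw n_samples points in [0, 1)^dim with one point per stratum in every dimension."""
    columns = []
    for _ in range(dim):
        strata = list(range(n_samples))
        rng.shuffle(strata)
        columns.append([(s + rng.random()) / n_samples for s in strata])
    return [[columns[k][i] for k in range(dim)] for i in range(n_samples)]


def angle(u, v):
    """Angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def associate(objective_vector, reference_vectors):
    """Index of the reference vector with the smallest angle to the objective vector."""
    angles = [angle(objective_vector, w) for w in reference_vectors]
    return min(range(len(angles)), key=angles.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    population = latin_hypercube(5, 2, rng)
    print(population)
    print(associate((1.0, 0.1), [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]))
```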

Preference model

The ways in which the DM can express preferences can be broadly categorized into two groups. The first category encompasses knee points, extreme points, or nadir points, while the second category involves goal attainment, reference vectors, preference relations, and other similar approaches. The key distinction lies in whether the preferences are explicitly stated [ 7 ].

In this work, we introduce a novel preference model based on reference vectors. Since the ROI constitutes only a sub-region of the entire PF, the variation between different sub-problems within the ROI is smaller than that across the entire PF. Consequently, there is no need to generate an excessive number of reference vectors. Instead, we generate \(m + 1\) reference vectors to guide the direction of population evolution, where m represents the number of objectives of the test problems.

Fig. 2 Illustration of the preference model

The method of generating reference vectors in this work is presented in Algorithm 2. Given the DM’s unit preference vector \(W_{pre} = \left\{ w_{1}, w_{2}, \cdots , w_{m} \right\} \) with \(\sum _{i=1}^{m}{w_{i}}=1\) , and the desired region size \(\epsilon \) , \( 0< \epsilon \le 1\) , the boundary vectors of the ROI can be defined as follows:

where \(W_b\) consists of m extreme vectors, e.g., (1,0,0), (0,1,0), and (0,0,1) in the three-objective case. The larger \(\epsilon \) is, the larger the ROI will be. Once the m boundary reference vectors have been determined, the m reference vectors \(W_{ROI}\) are calculated using equation 12 . These m reference vectors are then combined with the preference reference vector \(W_{pre}\) . As a result, a total of \(m + 1\) reference vectors are evenly distributed in the ROI to guide the direction of the evolutionary search. For a visual representation of this process, please refer to Fig.  2 . In Fig.  2 , the intersections of the boundary vector, the extreme vector, and the preference vector with the hyperplane correspond to the boundary point, the extreme point, and the preference point, respectively.

Algorithm 2 Reference_Vectors_Generation
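As a rough illustration of how \(m + 1\) reference vectors could be placed inside an ROI, consider the sketch below. The exact forms of equations 11 and 12 are not available in this text, so both formulas in the code are assumptions made for this example only: each boundary vector is taken as the convex combination \(\epsilon \cdot e_i + (1-\epsilon ) \cdot W_{pre}\) of an extreme vector \(e_i\) and the preference vector, and each interior reference vector is taken as the midpoint between a boundary vector and \(W_{pre}\). The function name `roi_reference_vectors` is hypothetical.

```python
def roi_reference_vectors(w_pre, eps):
    """Return m + 1 reference vectors: the preference vector plus m vectors inside the ROI.

    Assumed (not taken from the source) formulas:
      boundary_i = eps * e_i + (1 - eps) * w_pre
      interior_i = (boundary_i + w_pre) / 2
    """
    if not 0.0 < eps <= 1.0:
        raise ValueError("eps must lie in (0, 1]")
    m = len(w_pre)
    vectors = [list(w_pre)]
    for i in range(m):
        extreme = [1.0 if k == i else 0.0 for k in range(m)]
        boundary = [eps * e + (1.0 - eps) * w for e, w in zip(extreme, w_pre)]
        interior = [(b + w) / 2.0 for b, w in zip(boundary, w_pre)]
        vectors.append(interior)
    return vectors


if __name__ == "__main__":
    for vec in roi_reference_vectors([0.2, 0.3, 0.5], eps=0.4):
        print(vec)
```

Under these assumptions every generated vector stays on the unit simplex when \(W_{pre}\) does, and shrinking \(\epsilon \) pulls all vectors toward the preference vector, which matches the qualitative behaviour described above.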

The angle penalized distance [ 27 ] is employed as the fitness function, which can be calculated by equation 13 :

\[ d_{j}^{i} = \left( 1 + P\left( \theta _{j}^{i}\right) \right) \cdot \left\| f_j\right\| \]

where \(\left\| f_j\right\| \) is the distance from the objective vector of the \(j\text {-th}\) individual to the origin point and \(\theta _{j}^{i}\) is the angle between the \(j\text {-th}\) individual and the reference vector \(W_{ROI}[i]\) . \(P\left( \theta _{j}^{i}\right) \) is a penalty function, which is defined as follows:

\[ P\left( \theta _{j}^{i}\right) = m \cdot \left( \frac{FEs}{MaxFEs} \right) ^{\alpha } \cdot \frac{\theta _{j}^{i}}{\theta } \]

where m is the number of objectives, \(\alpha \) is a hyper-parameter that regulates the rate of change of the penalty function (we use the default setting of \(\alpha \) in [ 27 ]), FEs is the number of evaluations that have been consumed, MaxFEs is the maximum number of evaluations, and \(\theta \) is the angle between the reference vectors, which is a constant value. During the initial phase of the evolutionary process, convergence is prioritized to draw individuals nearer to the PF. Once individuals have converged to the PF, the focus shifts towards promoting diversity, leading to the dispersion of individuals along the PF.
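The angle penalized distance can be sketched as below, following the form used in the reference vector guided evolutionary algorithm: \(d = (1 + P(\theta _{j}^{i})) \cdot \Vert f_j\Vert \) with \(P(\theta _{j}^{i}) = m \cdot (FEs/MaxFEs)^{\alpha } \cdot \theta _{j}^{i} / \theta \). The function name is hypothetical, the value `alpha=2.0` is an assumed default, and the objective vector is assumed to be non-zero and already translated so that the origin is the reference point.

```python
import math


def angle_penalized_distance(f, w, theta_ref, fes, max_fes, alpha=2.0):
    """Angle penalized distance of objective vector f with respect to reference vector w.

    theta_ref is the (constant) angle between neighbouring reference vectors;
    fes / max_fes is the fraction of the evaluation budget already consumed.
    """
    m = len(f)
    norm_f = math.sqrt(sum(v * v for v in f))
    norm_w = math.sqrt(sum(v * v for v in w))
    cos_angle = sum(a * b for a, b in zip(f, w)) / (norm_f * norm_w)
    theta = math.acos(max(-1.0, min(1.0, cos_angle)))
    penalty = m * (fes / max_fes) ** alpha * theta / theta_ref
    return (1.0 + penalty) * norm_f


if __name__ == "__main__":
    # Early in the run the penalty is small, so the distance to the origin dominates.
    print(angle_penalized_distance((1.0, 0.0), (0.0, 1.0), math.pi / 2, fes=10, max_fes=100))
    # Late in the run the angular deviation is penalized heavily.
    print(angle_penalized_distance((1.0, 0.0), (0.0, 1.0), math.pi / 2, fes=100, max_fes=100))
```

Because the penalty grows with the consumed budget, the same individual is judged mostly on convergence at the start of the search and increasingly on how well it is aligned with its reference vector towards the end.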

Surrogate model

In data-driven evolutionary algorithms, surrogate models can replace real function evaluations, providing predictions for objective values. For example, the MOEA/D-EGO method treats the predicted objective values as independent variables and uses the additivity of the Gaussian distribution to generate a new Gaussian model [ 25 ]. However, such an approach can accumulate errors in the predicted values, leading to inaccuracies in the fitness values. In addition, treating predicted objective values as independent variables may neglect critical information because the objectives conflict with each other.

Instead, we prefer the direct method, using surrogate models to directly approximate the aggregation function. However, this approach requires an increasing number of surrogate models with the inclusion of more reference vectors, making it computationally demanding. To handle this issue, in this work, \(m + 1\) Kriging models are trained to approximate the fitness of solutions on different reference vectors.

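The expected improvement (EI) acquisition function [ 44 ] is used as the infill criterion for sampling promising offspring. For a minimization problem, it is given by

\[ EI\left( \textbf{x}\right) = \left( y_{min} - \hat{y}(\textbf{x}) \right) \varPhi \left( \frac{y_{min} - \hat{y}(\textbf{x})}{s(\textbf{x})} \right) + s(\textbf{x}) \, \phi \left( \frac{y_{min} - \hat{y}(\textbf{x})}{s(\textbf{x})} \right) \]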

where \(\varPhi (\cdot )\) and \(\phi (\cdot )\) represent the normal cumulative distribution function and the probability density function, respectively. \(y_{min}\) denotes the minimum value in the current population. For a new individual \(\textbf{x}\) , \(\hat{y}(\textbf{x})\) and \(s(\textbf{x})\) represent the predicted value and uncertainty information, respectively, provided by the Kriging model.
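A minimal sketch of the expected improvement for a minimization problem is given below, using only the standard library. It implements the usual closed form \((y_{min} - \hat{y}) \varPhi (z) + s \phi (z)\) with \(z = (y_{min} - \hat{y})/s\). The function name and the handling of \(s = 0\) (falling back to the plain improvement) are choices made for this example.

```python
import math


def expected_improvement(y_hat, s, y_min):
    """Expected improvement of a candidate with predicted mean y_hat and uncertainty s.

    y_min is the best (smallest) value observed so far; larger EI is better.
    """
    if s <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(y_min - y_hat, 0.0)
    z = (y_min - y_hat) / s
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))           # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)    # standard normal PDF
    return (y_min - y_hat) * cdf + s * pdf


if __name__ == "__main__":
    # Same predicted mean, different uncertainty: the more uncertain point scores higher.
    print(expected_improvement(y_hat=1.2, s=0.1, y_min=1.0))
    print(expected_improvement(y_hat=1.2, s=0.5, y_min=1.0))
```

The first term rewards candidates whose predicted value is already below the current best (exploitation), while the second term rewards candidates with large predictive uncertainty (exploration).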

Transfer learning model

In order to decrease the quantity of models, we adopt a strategy of generating \(m + 1\) reference vectors uniformly within ROI. This allows us to convert the original multi-objective problem into \(m + 1\) single-objective problems. Each sub-problem is then addressed using a surrogate model, which predicts the fitness values of individuals. However, using such a small number of reference vectors presents potential issues.

The first issue involves the accuracy of surrogate models. Initially, the reference vectors partition the objective space into \(m + 1\) sub-regions, with individuals in each sub-region forming a sub-population. These individuals are used to train the surrogate model, which can accurately predict fitness values within that specific sub-region. However, the predicted values might become less accurate for individuals located in other regions. To overcome this limitation, we propose a transfer learning method based on individuals in the evolving stage. For each reference vector \(W_{ROI}[i]\) , we select the \(11 \cdot d -1 + 25\) individuals (as suggested in [ 24 ]) with the best fitness values on that sub-problem to train the model. Including individuals from other sub-regions in this way helps to enhance the model’s accuracy and expedite the algorithm’s convergence.
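The individual-based selection described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: individuals are represented only by their objective vectors, fitness is assumed to be minimized, and the weighted-sum aggregation is a hypothetical stand-in for the aggregation function actually used.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def training_set_size(d: int) -> int:
    """Training-set size 11*d - 1 + 25 for d decision variables."""
    return 11 * d - 1 + 25


def select_training_set(
    objective_vectors: List[Vector],
    fitness: Callable[[Vector], float],
    d: int,
) -> List[Vector]:
    """Pick the best (lowest-fitness) individuals for one sub-problem.

    Candidates come from the whole archive, not only from the
    sub-region of the reference vector.
    """
    n = min(training_set_size(d), len(objective_vectors))
    return sorted(objective_vectors, key=fitness)[:n]


def weighted_sum(w: Vector) -> Callable[[Vector], float]:
    """Hypothetical stand-in aggregation along one reference vector."""
    return lambda f: sum(wi * fi for wi, fi in zip(w, f))


# Toy archive of 200 bi-objective points on the line f1 + f2 = 1.
archive = [(i / 199, 1 - i / 199) for i in range(200)]
chosen = select_training_set(archive, weighted_sum((1.0, 0.0)), d=11)
```

With \(d = 11\) decision variables, the rule gives a training set of 145 individuals.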

Additionally, even though the reference vectors are uniformly distributed, their small number may not adequately cover the entire Pareto front within the ROI. Consequently, the distribution of final solutions might be concentrated in a few points rather than evenly spread across the entire Pareto front, making it challenging to maintain a uniform distribution of populations.

Moreover, during the offspring selection process, the individual with the maximum EI value is chosen as the new offspring. This process strongly depends on the surrogate model’s parameters. The hyper-parameter \(\theta _i\) in the Kriging models represents the importance of the \(i\text {-th}\) feature (decision variable) and can be considered as the extracted characteristics. Given that ROI usually represents only a small fraction of the entire Pareto front, sub-problems within ROI may share many common characteristics. Transferring these parameters might be a viable approach to alleviate this problem. Transfer learning in Kriging models has been proposed by several researchers [ 45 , 46 ]. In this work, we adopt a feature selection-based parameter transfer learning method [ 47 ] to facilitate information sharing among models.

The transfer learning process is presented in Algorithm 3, encompassing two-level transfer learning, namely, individual-based transfer learning and parameter-based transfer learning. Initially, we select \(11 \cdot d - 1 + 25\) individuals with the best fitness for this sub-problem to train the model. Subsequently, these individuals are categorized into different classes based on the minimum angle of their objective vector to the reference vectors. The number of individuals in each class indicates the significance of the model’s parameters and is utilized in the parameter transfer process. To prevent negative transfer, we employ a binary particle swarm optimization algorithm based on filtered feature selection every five generations to discern the most pertinent subset of parameters for this sub-problem. Moving on to the parameter-based transfer process, let us consider the \(i\text {-th}\) trained model, and suppose that the index set of the most relevant variables related to the corresponding sub-problem is denoted as \(param_{index}\) . Let \(prop = \left\{ prop_{1}, prop_{2}, \cdots , prop_{m + 1} \right\} \) represent the set of proportions for each class. The new parameters of this model can be calculated as follows:

where \(j \in param_{index}\) , and \(\theta _i^{j}\) is the \(j\text {-th}\) parameter of the \(i\text {-th}\) Kriging model. \(\alpha \) is an adaptation parameter that assigns adaptive weights to the parameters of different models in the aggregation function, and it is formulated as follows:

where MaxFEs and FEs denote the maximum number and the current number of function evaluations using the real objective function, respectively. At the beginning of the optimization, the gaps between sub-regions are relatively large, so it is preferable to use knowledge from other models to generate more offspring within these gaps. As the optimization proceeds, the gaps become smaller, so the model’s own parameters become increasingly important. Note that when updating the \(i\text {-th}\) model, if the proportion of the \(i\text {-th}\) class is less than p , we consider that the model has not extracted enough information in its own sub-region, so the model’s parameters will not be transferred. In this work, parameter p is set to \(1 / (m + 1) \cdot 0.5\) .
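The class assignment and the threshold check described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the toy data are hypothetical, and the threshold is read as \(p = 0.5 / (m + 1)\), which is one interpretation of the expression given in the text.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def angle(u: Vector, v: Vector) -> float:
    """Angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    # Clamp to [-1, 1] to guard against rounding error.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def class_proportions(objs: List[Vector], refs: List[Vector]) -> List[float]:
    """Assign each objective vector to the reference vector with the
    minimum angle and return the share of individuals in each class."""
    counts = [0] * len(refs)
    for f in objs:
        nearest = min(range(len(refs)), key=lambda i: angle(f, refs[i]))
        counts[nearest] += 1
    return [c / len(objs) for c in counts]


def transfer_allowed(prop: List[float], i: int, m: int) -> bool:
    """Model i takes part in the parameter transfer only if its class
    holds at least a share p of the individuals (assumed p = 0.5/(m+1))."""
    p = 0.5 / (m + 1)
    return prop[i] >= p


# Toy bi-objective case (m = 2) with m + 1 = 3 reference vectors.
refs = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
objs = [(0.9, 0.1), (0.8, 0.2), (0.5, 0.5),
        (0.6, 0.5), (0.4, 0.6), (0.55, 0.45)]
prop = class_proportions(objs, refs)
allowed = [transfer_allowed(prop, i, m=2) for i in range(len(refs))]
```

Here the third class is empty, so its model is excluded from the transfer, while the other two classes exceed the threshold.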

Algorithm 3: Transfer_Learning_Method

Experimental results and analysis

Experimental settings.

To evaluate the proposed algorithm’s performance, we conduct comparative studies with six representative reference- or preference-vector-driven algorithms, namely, K-RVEA [ 27 ], MOEA/D-EGO [ 25 ], r-NSGAII [ 17 ], MOEA/D-PRE [ 48 ], MCEA/D [ 49 ] and PB-RVEA [ 50 ]. For a fair comparison, all algorithms are constrained to search for optimal solutions within the same preference region and use the same \(m + 1\) reference vectors generated in the ROI. The numerical experiments are performed on the popular ZDT benchmark [ 51 ] and DTLZ benchmark [ 52 ] for 2, 3, and 5 objectives. Each algorithm is independently run 20 times on each test problem. The number of decision variables is set to \(m + k\) , where m denotes the number of objectives and k is set to 9. The population size and maximum number of fitness evaluations are set to 100 and 200 when \(m=2\) , 105 and 300 when \(m=3\) , and 126 and 400 when \(m=5\) , respectively. The preference vector \(W_{pre}\) is set to \(\left\{ \frac{1}{m},\frac{1}{m},\cdots , \frac{1}{m} \right\} \) to ensure uniform preference across all objectives. The test code for all compared algorithms and benchmark instances is provided in PlatEMO [ 53 ].
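For readers who want to reproduce the setup, the settings above can be collected in a small configuration. The layout and all names below are assumptions made for this sketch; only the numerical values come from the text.

```python
from typing import Dict, List

K = 9      # offset used for the number of decision variables (m + k)
RUNS = 20  # independent runs per algorithm and test problem

# Population size and maximum number of real fitness evaluations,
# keyed by the number of objectives m.
SETTINGS: Dict[int, Dict[str, int]] = {
    2: {"pop_size": 100, "max_fes": 200},
    3: {"pop_size": 105, "max_fes": 300},
    5: {"pop_size": 126, "max_fes": 400},
}


def num_decision_vars(m: int) -> int:
    """Number of decision variables for an m-objective test problem."""
    return m + K


def preference_vector(m: int) -> List[float]:
    """Uniform preference vector W_pre = {1/m, ..., 1/m}."""
    return [1.0 / m] * m


for m, cfg in SETTINGS.items():
    print(m, num_decision_vars(m), cfg["pop_size"], cfg["max_fes"])
```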

Fig. 3 The distribution of the PF obtained by TDP-MOEA, TDP-MOEA-0, TDP-MOEA-1 and TDP-MOEA-2 on DTLZ-2

Different performance indicators are used for the experiment analysis and the application results analysis. In the experiment analysis, we employ the inverted generational distance (IGD) [ 54 ], which measures the average distance from the points of the reference front to the obtained front, providing a comprehensive evaluation of both convergence and diversity. This metric allows the performance of different algorithms to be compared on benchmarks. Calculating the IGD requires the reference front in the ROI, which is obtained as follows: (1) The reference points of the true PF of a test problem are obtained through PlatEMO [ 53 ]. (2) Following the preference model in Sect.  Preference model , all reference points of the true PF are mapped to the space of preference/reference vectors. (3) On the basis of the given preference vector and the predefined desired region size ( \(\epsilon \) ), the boundary points of the ROI are found. (4) Once the boundaries are determined, all reference points inside the ROI are collected and form the reference front.
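A minimal sketch of the IGD computation is given below, using the usual definition: for every point of the reference front, take the Euclidean distance to its nearest obtained solution and average these distances. Returning NaN when no solution lies in the ROI is an assumption of this sketch, and the fronts are toy data.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def igd(reference_front: List[Vector], obtained_front: List[Vector]) -> float:
    """Inverted generational distance (smaller is better)."""
    if not obtained_front:
        # Assumed convention: no solution was found in the ROI.
        return math.nan
    total = sum(
        min(math.dist(r, s) for s in obtained_front)
        for r in reference_front
    )
    return total / len(reference_front)


# Toy example: three reference points, two obtained solutions.
reference = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
obtained = [(0.0, 1.0), (1.0, 0.0)]
value = igd(reference, obtained)
```

Only the middle reference point is uncovered here, so the value is its distance to the nearest solution divided by three.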

Additionally, we use the Hypervolume (HV) [ 55 ], another comprehensive metric, to measure the volume dominated by the obtained approximated front during the application results analysis. In general, superior algorithm performance is indicated by smaller IGD values and larger HV values. It’s worth noting that only individuals within ROI contribute to the calculation of performance indicators. For the experiment analysis, we set the number of reference points for IGD to 1000, 1540, and 1820, respectively. Regarding HV, we use a reference point of (10172, 0.25, 0.49, 0.7, 0.17), which is an empirical value derived from the oil refining process.
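To make the HV indicator concrete, the sketch below computes it for the two-objective minimization case with a simple sweep. This is a deliberate simplification: the application in this paper has five objectives, and the points and reference point used here are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def hypervolume_2d(points: List[Point], ref: Point) -> float:
    """Area dominated by `points` and bounded by `ref` (minimization)."""
    # Keep only points that are strictly better than the reference point.
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    hv = 0.0
    best_f2 = ref[1]
    for f1, f2 in pts:  # sweep in increasing f1
        if f2 < best_f2:  # dominated points add no area and are skipped
            hv += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return hv


front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (2.5, 2.5)]
hv = hypervolume_2d(front, ref=(4.0, 4.0))
```

The dominated point (2.5, 2.5) contributes nothing, so the result equals the area covered by the three non-dominated points.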

Performance comparison on benchmark

The results obtained by TDP-MOEA and the compared algorithms are presented in Tables 1 , 2 , 3 , which report the mean and standard deviation of the IGD values on the ZDT and DTLZ test suites with \(\epsilon =0.3\) , 0.5, and 0.7, respectively. In these tables, ‘NaN’ denotes cases where the algorithm failed to find a solution within the ROI. The symbols ‘+’, ‘ \(\approx \) ’, and ‘-’ indicate that the compared algorithm performs significantly better than, similarly to, or significantly worse than TDP-MOEA, respectively. The best result for each test instance is highlighted, and the statistical results are summarized at the bottom of each table.

Fig. 4 The distribution of the PF obtained by TDP-MOEA, TDP-MOEA-0, TDP-MOEA-1 and TDP-MOEA-2 on DTLZ-4

Several observations can be drawn from the above results. Firstly, the proposed algorithm TDP-MOEA achieves the largest number of best results on the test instances with different numbers of objectives and sizes of ROI, particularly on the ZDT test suite, DTLZ1, DTLZ2, DTLZ3 and DTLZ5. The results indicate that the proposed transferable preference learning technique improves the search efficiency of TDP-MOEA in handling expensive MOPs. However, on ZDT4 when \(\epsilon = 0.3\) , \(\epsilon = 0.5\) , and \(\epsilon = 0.7\) , and on DTLZ4 with 8 objectives when \(\epsilon = 0.3\) and \(\epsilon = 0.5\) , none of the algorithms can identify a preferred solution. This is because the Pareto fronts of these test instances are exceedingly difficult to approximate, and the difficulty increases when evaluations are limited. By contrast, on DTLZ4 with 8 objectives when \(\epsilon = 0.7\) , TDP-MOEA, K-RVEA and PB-RVEA are able to find solutions in the ROI. These results may indicate that a smaller ROI makes the optimization more difficult, since fewer solutions may be generated in the ROI during the optimization process. The same trend appears in the statistical results. From Tables 1 , 2 , 3 , TDP-MOEA performs better than the other algorithms on most instances, and its superiority becomes more apparent as the ROI expands. In addition, on ZDT6 when \(\epsilon = 0.3\) , all algorithms except TDP-MOEA give results of ‘NaN’, which shows that TDP-MOEA can find solutions within the preference region even when that region is small. Overall, the results in Tables 1 , 2 , 3 demonstrate that the proposed transferable preference learning technique helps to improve the search ability of TDP-MOEA in handling expensive MOPs.

To assess the impact of the transfer learning method, we conducted an additional comparison among the algorithm with no transfer learning strategy (TDP-MOEA-0), the algorithm with the individual-driven transfer learning strategy (TDP-MOEA-1), the algorithm with the parameter-driven transfer learning strategy (TDP-MOEA-2), and the algorithm combining both strategies (TDP-MOEA). Figures  3 , 4 illustrate the distribution of the non-dominated solutions generated by TDP-MOEA, TDP-MOEA-0, TDP-MOEA-1 and TDP-MOEA-2 on the DTLZ-2 and DTLZ-4 test problems, respectively. In these figures, the red dots represent the non-dominated individuals, solid orange lines depict the reference vectors, and blue dashed lines outline the boundary of the preference region. The variants with transfer learning strategies generate more non-dominated solutions in the ROI than TDP-MOEA-0, indicating that both the individual-based and the parameter-based transfer learning strategies improve the prediction ability of the model. In particular, in Figs.  3 , 4 , TDP-MOEA, which combines the individual- and parameter-based transfer learning strategies, generates more non-dominated solutions in the ROI than TDP-MOEA-0, TDP-MOEA-1 and TDP-MOEA-2, indicating that the hybrid transfer learning strategy effectively guides the evolutionary process toward the ROI.

Application on oil refining process

In this section, we apply TDP-MOEA to optimize the oil refining process and compare its performance with other methods. We use a 5-objective simulation model with eight decision variables [ 3 ], and Aspen HYSYS [ 56 ] is used for rigorous hydrocracking process simulation. The decision variables, optimization objective functions, and parameter settings (such as feed material properties, reactor design parameters, and recycled bottom conditions) are detailed in Tables 4 , 5 , 6 , 7 , 8 .

The maximum number of evaluations of the true objective function ( MaxFEs ) is set to 500, the initial population size is 126, and the preference vector of the DM, \(W_{pre}\) , is set to \(\left\{ 0.2, 0.2, 0.2, 0.2, 0.2\right\} \) . \(\epsilon \) is set to 0.5, as it yielded the best performance on the benchmarks. The parameters of the comparison algorithms are set to the values recommended in the corresponding literature.

For fairness, all comparison algorithms use the same \(m+1\) reference vectors generated in the ROI. We calculate HV(pre) and HV based on the non-dominated solutions within the ROI and in the entire space, respectively. The statistical results for the HV values over 30 independent runs are presented in Table 9 , showing that TDP-MOEA achieves the best results both within the ROI and in the entire space, while MCEA/D exhibits the poorest overall performance. We also provide visualization results in Fig.  5 , which illustrates the distribution of the non-dominated fronts obtained by the different algorithms. Red individuals represent solutions in the preference region, while blue ones are non-preferred solutions. The figure shows that TDP-MOEA produces more non-dominated solutions in the preference region than the other methods. MOEA/D-PRE and MCEA/D are unable to generate non-dominated solutions in the ROI, and PB-RVEA and r-NSGAII generate only sparse non-dominated solutions there. K-RVEA and MOEA/D-EGO perform better than these four algorithms, but they are still less effective than TDP-MOEA in finding preferred solutions in the region of interest, a conclusion that the results in Table 9 also support. Overall, TDP-MOEA shows its advantage in providing the DM with more choices in the ROI and in helping them gain a clearer understanding of their preferences when handling the oil refining process problem.

Fig. 5 The parallel coordinates of the PF obtained by TDP-MOEA and other algorithms on the hydrocracking optimization problem

In this paper, we present a novel transfer learning driven preference-inspired multi-objective evolutionary algorithm, named TDP-MOEA, designed to tackle EMOPs or EMaOPs in the context of the oil refining process. Our approach involves an in-depth analysis of various methods for constructing surrogate models of individual fitness, a comparison of their performance, and a new preference representation suitable for preference scenarios. To enhance the surrogate model’s accuracy and the population diversity, we introduce a new transfer model that incorporates two levels of transfer, namely individual-based transfer and parameter-based transfer, and considers the relationships among different sub-problems. The proposed TDP-MOEA is extensively tested on benchmark suites and successfully applied to a real-world oil refining problem. The experimental results demonstrate that TDP-MOEA outperforms the classical algorithms used for comparison on most test instances.

In scenarios where the preference region is small, the challenge of obtaining a preferred solution within that region increases significantly. Finding such solutions quickly, given a limited number of evaluations, becomes a critical concern. Moreover, the existence of numerous different preference expressions adds complexity. Our future research direction will focus on effectively integrating these expressions with surrogate models to enhance the optimization process.

Data availability statement

There are no datasets used in this paper.

Zhong W, Qiao C, Peng X, Li Z, Fan C, Qian F (2019) Operation optimization of hydrocracking process based on Kriging surrogate model. Control Eng Pract 85:34–40


Zhou H, Lu J, Cao Z, Shi J, Pan M, Li W, Jiang Q (2011) Modeling and optimization of an industrial hydrocracking unit to improve the yield of diesel or kerosene. Fuel 90(12):3521–3530

Han D, Du W, Wang X, Du W (2022) A surrogate-assisted evolutionary algorithm for expensive many-objective optimization in the refining process. Swarm Evol Comput 69:100988

Ma L, Li N, Guo Y, Wang X, Yang S, Huang M, Zhang H (2021) Learning to optimize: reference vector reinforcement learning adaption to constrained many-objective optimization of industrial copper burdening system. IEEE Trans Cybern 2:2


Ma L, Cheng S, Shi Y (2020) Enhancing learning efficiency of brain storm optimization via orthogonal learning design. IEEE Trans Syst Man Cybern Syst 51(11):6723–6742

Ma L, Liu Y, Yu G, Wang X, Mo H, Wang G-G, Jin Y, Tan Y (2023) Decomposition-based multiobjective optimization for variable-length mixed-variable pareto optimization and its application in cloud service allocation. IEEE Trans Syst Man Cybern Syst 2:2

Yu G, Jin Y, Olhofer M (2019) References or preferences—rethinking many-objective evolutionary optimization. In: Proceedings of the 2019 IEEE Congress on Evolutionary Computation (CEC)

Ishibuchi H, Tsukamoto N, Nojima Y (2008) Evolutionary many-objective optimization. In: 2008 3rd International Workshop on Genetic and Evolving Systems

Ikeda KI, Kita H, Kobayashi S (2001) Failure of pareto-based moeas: Does non-dominated really mean near to optimal? In: Proceedings of the 2001 IEEE Congress on Evolutionary Computation (CEC)

Bechikh S, Kessentini M, Said LB, Ghédira K (2015) Preference incorporation in evolutionary multiobjective optimization. Adv Comput 98:141–207

Wang H, Olhofer M, Jin Y (2017) A mini-review on preference modeling and articulation in multi-objective optimization: current status and challenges. Complex Intell Syst 3:233–245

Li K, Liao M, Deb K, Min G, Yao X (2020) Does preference always help? a holistic study on preference-based evolutionary multiobjective optimization using reference points. IEEE Trans Evol Comput 24(6):1078–1096. https://doi.org/10.1109/TEVC.2020.2987559

Yu G, Ma L, Jin Y, Du W, Liu Q, Zhang H (2022) A survey on knee-oriented multiobjective evolutionary optimization. IEEE Trans Evol Comput 26(6):1452–1472

Adra SF, Griffin I, Fleming PJ (2007) A comparative study of progressive preference articulation techniques for multiobjective optimisation. In: International Conference on Evolutionary Multi-criterion Optimization

González-Gallardo S, Saborido R, Ruiz AB, Luque M (2021) Preference-based evolutionary multiobjective optimization through the use of reservation and aspiration points. IEEE Access 9:108861–108872. https://doi.org/10.1109/ACCESS.2021.3101899

Molina J, Santana LV, Hernández-Díaz AG, Coello CAC, Caballero R (2009) g-dominance: Reference point based dominance for multiobjective metaheuristics. Eur J Oper Res 197(2):685–692

Said LB, Bechikh S, Ghedira K (2010) The r-dominance: A new dominance relation for interactive evolutionary multicriteria decision making. IEEE Trans Evol Comput 14(5):801–818

Yi J, Bai J, He H, Peng J, Tang D (2018) ar-moea: A novel preference-based dominance relation for evolutionary multiobjective optimization. IEEE Trans Evol Comput 23(5):788–802

Guo Y-N, Zhang X, Gong D-W, Zhang Z, Yang J-J (2020) Novel interactive preference-based multiobjective evolutionary optimization for bolt supporting networks. IEEE Trans Evol Comput 24(4):750–764. https://doi.org/10.1109/TEVC.2019.2951217

Tang H, Liu X, Zheng J, Chen T (2021) A preference-based multiobjective evolutionary algorithm based on weight vector adjustment strategy. In: 2021 6th International Conference on Computational Intelligence and Applications (ICCIA), pp. 53–58 . https://doi.org/10.1109/ICCIA52886.2021.00018

Palakonda V, Kang J-M (2023) Pre-demo: preference-inspired differential evolution for multi/many-objective optimization. IEEE Trans Syst Man Cybern Syst 53(12):7618–7630. https://doi.org/10.1109/TSMC.2023.3298690

Yu G, Jin Y, Olhofer M (2020) A multiobjective evolutionary algorithm for finding knee regions using two localized dominance relationships. IEEE Trans Evol Comput 25(1):145–158

Yu G, Jin Y, Olhofer M (2020) Benchmark problems and performance indicators for search of knee points in multiobjective optimization. IEEE Trans Cybern 50(8):3531–3544. https://doi.org/10.1109/TCYB.2019.2894664

Knowles J (2006) ParEGO: a hybrid algorithm with on-line landscape approximation for expensive multiobjective optimization problems. IEEE Trans Evol Comput 10(1):50–66

Zhang Q, Liu W, Tsang E, Virginas B (2010) Expensive multiobjective optimization by moea/d with gaussian process model. IEEE Trans Evol Comput 14(3):456–474

Pan L, He C, He C, Tian Y, Wang H, Zhang X, Jin Y (2018) A classification based surrogate-assisted evolutionary algorithm for expensive many-objective optimization. IEEE Trans Evol Comput 1:5

Chugh T, Jin Y, Miettinen K, Hakanen J, Sindhya K (2018) A surrogate-assisted reference vector guided evolutionary algorithm for computationally expensive many-objective optimization. IEEE Trans Evol Comput 22(1):129–142

Habib A, Singh HK, Chugh T, Ray T, Miettinen K (2019) A multiple surrogate assisted decomposition-based evolutionary algorithm for expensive multi/many-objective optimization. IEEE Trans Evol Comput 2:2

Box G, Wilson KB (1951) On the experimental attainment of optimum conditions. J R Stat Soc Ser B (Methodol) 13(1):5


Gutmann HM (2001) A radial basis function method for global optimization. J Global Optim 19(3):201–227


Cressie N (1990) The origins of kriging. Math Geol 22(3):239–252

Gibson FJ, Everson RM, Fieldsend JE (2022) Guiding surrogate-assisted multi-objective optimisation with decision maker preferences. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 786–795

Wang W, Akhtar T, Shoemaker CA (2022) Integrating \(\varepsilon \) -dominance and rbf surrogate optimization for solving computationally expensive many-objective optimization problems. J Global Optim 82(4):965–992

Chugh T, Kratky T, Miettinen K, Jin Y, Makonen P (2019) Multiobjective shape design in a ventilation system with a preference-driven surrogate-assisted evolutionary algorithm. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1147–1155

Chugh T (2022) R-mbo: a multi-surrogate approach for preference incorporation in multi-objective bayesian optimisation. In: Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 1817–1825

Tabatabaei M, Hartikainen M, Sindhya K, Hakanen J, Miettinen K (2019) An interactive surrogate-based method for computationally expensive multiobjective optimisation. J Oper Res Soc 70(6):898–914

Wierzbicki AP (1986) On the completeness and constructiveness of parametric characterizations to vector optimization problems. Oper Res Spektrum 8(2):73–87

Tang J, Wang H, Xiong L (2023) Surrogate-assisted multi-objective optimization via knee-oriented Pareto front estimation. Swarm Evol Comput 77:101252. https://doi.org/10.1016/j.swevo.2023.101252

Yang K, Li L, Deutz A, Back T, Emmerich M (2016) Preference-based multiobjective optimization using truncated expected hypervolume improvement. In: 2016 12th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), pp. 276–281 . https://doi.org/10.1109/FSKD.2016.7603186

Liu Y, Yu G, Cheng J, Jiang C, Wang X, Ma L (2023) Transferable preference learning assist multi-objective decision analysis for hydrocracking. In: 2023 5th international conference on data-driven optimization of complex systems (DOCS), Tianjin, China, pp 1–8. https://doi.org/10.1109/DOCS60977.2023.10294748

Deb K (2005) In: Burke, E.K., Kendall, G. (eds.) Multi-objective optimization, pp. 273–316. Springer, Boston. https://doi.org/10.1007/0-387-28356-0_10

Li N, Ma L, Yu G, Xue B, Zhang M, Jin Y (2022) Survey on evolutionary deep learning: principles, algorithms, applications and open issues. ACM Comput Surv 2:2

McKay MD, Beckman RJ, Conover WJ (1979) A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 21(2):239–245

Mockus J, Tiesis V, Zilinskas A (1978) The application of bayesian methods for seeking the extremum. Towards Glob Optim 2:5

Cao B, Pan SJ, Zhang Y, Yeung D-Y, Yang Q (2010) Adaptive transfer learning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 24, pp. 407–412

Yu K, Tresp V, Schwaighofer A (2005) Learning gaussian processes from multiple tasks. In: Machine Learning, Proceedings of the Twenty-Second International Conference (ICML 2005), Bonn, Germany, August 7–11, 2005

Wang X, Jin Y, Schmitt S, Olhofer M (2020) Transfer learning for gaussian process assisted evolutionary bi-objective optimization for objectives with different evaluation times. In: GECCO ’20: Genetic and Evolutionary Computation Conference

Yu G, Zheng J, Shen R, Li M (2015) Decomposing the user-preference in multiobjective optimization. Soft Comput 20(10):1–17

Sonoda T, Nakata M (2022) Multiple classifiers-assisted evolutionary algorithm based on decomposition for high-dimensional multiobjective problems. IEEE Trans Evol Comput 26(6):1581–1595

Song Z, Wang H, Xu H (2022) A framework for expensive many-objective optimization with pareto-based bi-indicator infill sampling criterion. Memet Comput 2:1–13

Zitzler E, Deb K, Thiele L (2000) Comparison of multiobjective evolutionary algorithms: empirical results. Evol Comput 8(2):173–195

Deb K, Thiele L, Laumanns M, Zitzler E (2005) Scalable test problems for evolutionary multiobjective optimization. Evolutionary multiobjective optimization: theoretical advances and applications. Springer, Berlin, pp 105–145


Tian Y, Cheng R, Zhang X, Jin Y (2017) PlatEMO: A MATLAB platform for evolutionary multi-objective optimization. IEEE Comput Intell Mag 12(4):73–87

Coello C (2005) Solving multiobjective optimization problems using an artificial immune system. Genet Progr Evol Mach 6:3

Zitzler E, Thiele L (1999) Multiobjective evolutionary algorithms: a comparative case study and the strength pareto approach. IEEE Trans Evol Comput 3(4):257–271

Chang AF, Pashikanti K, Liu YA (2012) Refinery engineering (integrated process modeling and optimization) || supporting materials: List of computer files


Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. 62103150 and 62333010).

Author information

Authors and affiliations.

The Institute of Intelligent Manufacturing, Nanjing Tech University, Nanjing, 211816, Jiangsu, China

Guo Yu, Chao Jiang & Quanling Zhang

Software College, Northeastern University, Shenyang, 110819, Liaoning, China

Xinzhe Wang, Yang Liu & Lianbo Ma

College of Electrical Engineering and Control Science, Nanjing Tech University, Nanjing, 211816, Jiangsu, China


Corresponding authors

Correspondence to Chao Jiang or Lianbo Ma .

Ethics declarations

Conflict of interest.

The authors declare that there is no conflict of interest regarding this paper. The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethics approval

This paper does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Yu, G., Wang, X., Jiang, C. et al. Transferable preference learning in multi-objective decision analysis and its application to hydrocracking. Complex Intell. Syst. (2024). https://doi.org/10.1007/s40747-024-01537-6


Received : 18 September 2023

Accepted : 19 June 2024

Published : 15 July 2024

DOI : https://doi.org/10.1007/s40747-024-01537-6


  • Multi-objective optimization
  • Refining process
  • Transferable preference learning
  • Decision analysis

How to Do Thematic Analysis | Step-by-Step Guide & Examples

Published on September 6, 2019 by Jack Caulfield. Revised on June 22, 2023.

Thematic analysis is a method of analyzing qualitative data. It is usually applied to a set of texts, such as interview transcripts. The researcher closely examines the data to identify common themes – topics, ideas and patterns of meaning that come up repeatedly.

There are various approaches to conducting thematic analysis, but the most common form follows a six-step process: familiarization, coding, generating themes, reviewing themes, defining and naming themes, and writing up. Following this process can also help you avoid confirmation bias when formulating your analysis.

This process was originally developed for psychology research by Virginia Braun and Victoria Clarke. However, thematic analysis is a flexible method that can be adapted to many different kinds of research.

Table of contents

  • When to use thematic analysis
  • Different approaches to thematic analysis
  • Step 1: Familiarization
  • Step 2: Coding
  • Step 3: Generating themes
  • Step 4: Reviewing themes
  • Step 5: Defining and naming themes
  • Step 6: Writing up
  • Other interesting articles

When to use thematic analysis

Thematic analysis is a good approach to research where you’re trying to find out something about people’s views, opinions, knowledge, experiences or values from a set of qualitative data – for example, interview transcripts, social media profiles, or survey responses.

Some types of research questions you might use thematic analysis to answer:

  • How do patients perceive doctors in a hospital setting?
  • What are young women’s experiences on dating sites?
  • What are non-experts’ ideas and opinions about climate change?
  • How is gender constructed in high school history teaching?

To answer any of these questions, you would collect data from a group of relevant participants and then analyze it. Thematic analysis allows you a lot of flexibility in interpreting the data, and allows you to approach large data sets more easily by sorting them into broad themes.

However, it also involves the risk of missing nuances in the data. Thematic analysis is often quite subjective and relies on the researcher’s judgement, so you have to reflect carefully on your own choices and interpretations.

Pay close attention to the data to ensure that you’re not picking up on things that are not there – or obscuring things that are.


Once you’ve decided to use thematic analysis, there are different approaches to consider.

There’s the distinction between inductive and deductive approaches:

  • An inductive approach involves allowing the data to determine your themes.
  • A deductive approach involves coming to the data with some preconceived themes you expect to find reflected there, based on theory or existing knowledge.

Ask yourself: Does my theoretical framework give me a strong idea of what kind of themes I expect to find in the data (deductive), or am I planning to develop my own framework based on what I find (inductive)?
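The practical difference between the two approaches can be illustrated with a deliberately simplified Python sketch. It assumes codes are plain strings kept in a set; `apply_code` is a hypothetical helper, not part of any library. Under the deductive approach, a code outside the predefined codebook is rejected; under the inductive approach, the codebook grows as new codes emerge.

```python
def apply_code(codebook, code, approach):
    """Try to use `code`; return True if the approach allows it."""
    if approach not in ("inductive", "deductive"):
        raise ValueError(f"unknown approach: {approach!r}")
    if code in codebook:
        return True
    if approach == "inductive":
        # Inductive: the data determines the codes, so new ones are added.
        codebook.add(code)
        return True
    # Deductive: only codes derived from theory in advance are accepted.
    return False


# Hypothetical starting points for each approach.
deductive_codebook = {"distrust of experts", "uncertainty"}
inductive_codebook = set()

accepted_deductive = apply_code(deductive_codebook, "changing terminology", "deductive")
accepted_inductive = apply_code(inductive_codebook, "changing terminology", "inductive")

print(accepted_deductive, sorted(deductive_codebook))
print(accepted_inductive, sorted(inductive_codebook))
```

In practice many studies blend the two, so a strict accept-or-reject rule like this is only a way of making the distinction concrete.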

There’s also the distinction between a semantic and a latent approach:

  • A semantic approach involves analyzing the explicit content of the data.
  • A latent approach involves reading into the subtext and assumptions underlying the data.

Ask yourself: Am I interested in people’s stated opinions (semantic) or in what their statements reveal about their assumptions and social context (latent)?

After you’ve decided thematic analysis is the right method for analyzing your data, and you’ve thought about the approach you’re going to take, you can follow the six steps developed by Braun and Clarke.

The first step is to get to know our data. It’s important to get a thorough overview of all the data we collected before we start analyzing individual items.

This might involve transcribing audio, reading through the text and taking initial notes, and generally looking through the data to get familiar with it.

Next up, we need to code the data. Coding means highlighting sections of our text – usually phrases or sentences – and coming up with shorthand labels or “codes” to describe their content.

Let’s take a short example text. Say we’re researching perceptions of climate change among conservative voters aged 50 and up, and we have collected data through a series of interviews. An extract from one interview looks like this:

Coding qualitative data
Interview extract Codes
Personally, I’m not sure. I think the climate is changing, sure, but I don’t know why or how. People say you should trust the experts, but who’s to say they don’t have their own reasons for pushing this narrative? I’m not saying they’re wrong, I’m just saying there’s reasons not to 100% trust them. The facts keep changing – it used to be called global warming.

In this extract, we’ve highlighted various phrases in different colors corresponding to different codes. Each code describes the idea or feeling expressed in that part of the text.

At this stage, we want to be thorough: we go through the transcript of every interview and highlight everything that jumps out as relevant or potentially interesting. As well as highlighting all the phrases and sentences that match these codes, we can keep adding new codes as we go through the text.

After we’ve been through the text, we collate all the data into groups identified by code. These codes allow us to gain a condensed overview of the main points and common meanings that recur throughout the data.
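If the coded segments are kept in a spreadsheet or script rather than on paper, the collation step can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed workflow: the pairing of each phrase with a code is our own assumption for the example, and `collate_by_code` is a hypothetical helper name.

```python
from collections import defaultdict

# Each coded segment pairs a piece of text with one code.
# These pairings are illustrative assumptions for this sketch.
coded_segments = [
    ("I'm not sure", "uncertainty"),
    ("I don't know why or how", "uncertainty"),
    ("who's to say they don't have their own reasons", "distrust of experts"),
    ("there's reasons not to 100% trust them", "distrust of experts"),
    ("it used to be called global warming", "changing terminology"),
]


def collate_by_code(segments):
    """Group segment texts under the code they were labelled with."""
    groups = defaultdict(list)
    for text, code in segments:
        groups[code].append(text)
    return dict(groups)


collated = collate_by_code(coded_segments)
for code, texts in collated.items():
    print(f"{code} ({len(texts)} segments)")
    for text in texts:
        print(f"  - {text}")
```

Grouping by code in this way makes it easy to read every segment for one code side by side, which is what the next step relies on.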


Next, we look over the codes we’ve created, identify patterns among them, and start coming up with themes.

Themes are generally broader than codes. Most of the time, you’ll combine several codes into a single theme. In our example, we might start combining codes into themes like this:

Turning codes into themes
Codes Theme
Uncertainty
Distrust of experts
Misinformation

At this stage, we might decide that some of our codes are too vague or not relevant enough (for example, because they don’t appear very often in the data), so they can be discarded.

Other codes might become themes in their own right. In our example, we decided that the code “uncertainty” made sense as a theme, with some other codes incorporated into it.

Again, what we decide will vary according to what we’re trying to find out. We want to create potential themes that tell us something helpful about the data for our purposes.
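To make this step concrete, here is a small Python sketch of combining codes into candidate themes and setting aside codes that rarely appear. Everything in it is assumed for illustration: the counts are made up, the codes "incorrect facts" and "weather anecdote" are hypothetical, and the `MIN_SEGMENTS` cut-off is an arbitrary choice. Frequency is only one possible signal, and the final decision remains a matter of judgement.

```python
from collections import Counter

# How many segments were labelled with each code (made-up counts).
code_counts = Counter({
    "uncertainty": 14,
    "distrust of experts": 11,
    "changing terminology": 6,
    "incorrect facts": 5,      # hypothetical code
    "weather anecdote": 1,     # hypothetical code
})

# Candidate mapping from codes to broader themes (an assumption for this sketch).
code_to_theme = {
    "uncertainty": "Uncertainty",
    "distrust of experts": "Distrust of experts",
    "changing terminology": "Distrust of experts",
    "incorrect facts": "Misinformation",
    "weather anecdote": None,  # no obvious theme yet
}

MIN_SEGMENTS = 2  # hypothetical cut-off for "doesn't appear very often"

theme_counts = Counter()
discarded = []
for code, count in code_counts.items():
    theme = code_to_theme.get(code)
    if theme is None or count < MIN_SEGMENTS:
        discarded.append(code)  # too vague or too rare to keep
        continue
    theme_counts[theme] += count

print(dict(theme_counts))
print("Discarded codes:", discarded)
```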

Now we have to make sure that our themes are useful and accurate representations of the data. Here, we return to the data set and compare our themes against it. Are we missing anything? Are these themes really present in the data? What can we change to make our themes work better?

If we encounter problems with our themes, we might split them up, combine them, discard them or create new ones: whatever makes them more useful and accurate.

For example, we might decide upon looking through the data that “changing terminology” fits better under the “uncertainty” theme than under “distrust of experts,” since the data labelled with this code involves confusion, not necessarily distrust.
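A reassignment like this one can be recorded explicitly so that the revision stays traceable. The sketch below assumes themes are stored as a dictionary of code sets and that “changing terminology” started out under “distrust of experts”; `move_code` is a hypothetical helper.

```python
# Themes before review: each theme name maps to the codes filed under it.
themes = {
    "Uncertainty": {"uncertainty"},
    "Distrust of experts": {"distrust of experts", "changing terminology"},
}


def move_code(themes, code, source, target):
    """Return a new theme mapping with `code` moved from `source` to `target`."""
    if code not in themes[source]:
        raise ValueError(f"{code!r} is not filed under {source!r}")
    # Copy the sets so the pre-review mapping is left untouched.
    updated = {name: set(codes) for name, codes in themes.items()}
    updated[source].remove(code)
    updated[target].add(code)
    return updated


revised = move_code(themes, "changing terminology", "Distrust of experts", "Uncertainty")

for name, codes in revised.items():
    print(name, sorted(codes))
```

Returning a new mapping rather than editing the old one keeps both versions available, which helps when describing in the write-up how the themes changed during review.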

Now that we have a final list of themes, it’s time to name and define each of them.

Defining themes involves formulating exactly what we mean by each theme and figuring out how it helps us understand the data.

Naming themes involves coming up with a succinct and easily understandable name for each theme.

For example, we might look at “distrust of experts” and determine exactly who we mean by “experts” in this theme. We might decide that a better name for the theme is “distrust of authority” or “conspiracy thinking”.

Finally, we’ll write up our analysis of the data. Like all academic texts, writing up a thematic analysis requires an introduction to establish our research question, aims and approach.

We should also include a methodology section, describing how we collected the data (e.g. through semi-structured interviews or open-ended survey questions) and explaining how we conducted the thematic analysis itself.

The results or findings section usually addresses each theme in turn. We describe how often the themes come up and what they mean, including examples from the data as evidence. Finally, our conclusion explains the main takeaways and shows how the analysis has answered our research question.
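When reporting how often themes come up, it helps to distinguish total mentions from the number of interviews in which a theme appears at all. The Python sketch below shows one way to compute both; the interview identifiers and theme lists are invented purely for illustration.

```python
from collections import Counter

# Themes identified in each interview (invented data for illustration).
interview_themes = {
    "interview_01": ["Uncertainty", "Distrust of experts", "Uncertainty"],
    "interview_02": ["Misinformation", "Uncertainty"],
    "interview_03": ["Distrust of experts"],
    "interview_04": ["Uncertainty", "Misinformation", "Misinformation"],
}

# Total mentions of each theme across all interviews.
mentions = Counter(t for themes in interview_themes.values() for t in themes)

# Number of interviews in which each theme appears at least once.
interviews_with_theme = Counter(
    t for themes in interview_themes.values() for t in set(themes)
)

for theme, n_mentions in mentions.most_common():
    share = interviews_with_theme[theme] / len(interview_themes)
    print(f"{theme}: {n_mentions} mentions, in {share:.0%} of interviews")
```

Counts like these support the write-up but do not replace it: each theme still needs to be interpreted and illustrated with quotations from the data.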

In our example, we might argue that conspiracy thinking about climate change is widespread among older conservative voters, point out the uncertainty with which many voters view the issue, and discuss the role of misinformation in respondents’ perceptions.

If you want to know more about statistics, methodology, or research bias, make sure to check out some of our other articles with explanations and examples.

  • Normal distribution
  • Measures of central tendency
  • Chi square tests
  • Confidence interval
  • Quartiles & Quantiles
  • Cluster sampling
  • Stratified sampling
  • Discourse analysis
  • Cohort study
  • Peer review
  • Ethnography

Research bias

  • Implicit bias
  • Cognitive bias
  • Conformity bias
  • Hawthorne effect
  • Availability heuristic
  • Attrition bias
  • Social desirability bias

Cite this Scribbr article

If you want to cite this source, you can copy and paste the citation below.

Caulfield, J. (2023, June 22). How to Do Thematic Analysis | Step-by-Step Guide & Examples. Scribbr. Retrieved July 15, 2024, from https://www.scribbr.com/methodology/thematic-analysis/

