
Published: 05 April 2024

Single-case experimental designs: the importance of randomization and replication

  • René Tanious (ORCID: orcid.org/0000-0002-5466-1002) 1,
  • Rumen Manolov (ORCID: orcid.org/0000-0002-9387-1926) 2,
  • Patrick Onghena 3 &
  • Johan W. S. Vlaeyen (ORCID: orcid.org/0000-0003-0437-6665) 1

Nature Reviews Methods Primers volume 4, Article number: 27 (2024)


  • Data acquisition
  • Human behaviour
  • Social sciences

Single-case experimental designs are rapidly growing in popularity. This popularity needs to be accompanied by transparent and well-justified methodological and statistical decisions. Appropriate experimental design including randomization, proper data handling and adequate reporting are needed to ensure reproducibility and internal validity. The degree of generalizability can be assessed through replication.


Kazdin, A. E. Single-case experimental designs: characteristics, changes, and challenges. J. Exp. Anal. Behav. 115, 56–85 (2021).

Shadish, W. & Sullivan, K. J. Characteristics of single-case designs used to assess intervention effects in 2008. Behav. Res. 43, 971–980 (2011).

Tanious, R. & Onghena, P. A systematic review of applied single-case research published between 2016 and 2018: study designs, randomization, data aspects, and data analysis. Behav. Res. 53, 1371–1384 (2021).

Ferron, J., Foster-Johnson, L. & Kromrey, J. D. The functioning of single-case randomization tests with and without random assignment. J. Exp. Educ. 71, 267–288 (2003).

Michiels, B., Heyvaert, M., Meulders, A. & Onghena, P. Confidence intervals for single-case effect size measures based on randomization test inversion. Behav. Res. 49, 363–381 (2017).

Aydin, O. Characteristics of missing data in single-case experimental designs: an investigation of published data. Behav. Modif. https://doi.org/10.1177/01454455231212265 (2023).

De, T. K., Michiels, B., Tanious, R. & Onghena, P. Handling missing data in randomization tests for single-case experiments: a simulation study. Behav. Res. 52, 1355–1370 (2020).

Baek, E., Luo, W. & Lam, K. H. Meta-analysis of single-case experimental design using multilevel modeling. Behav. Modif. 47, 1546–1573 (2023).

Michiels, B., Tanious, R., De, T. K. & Onghena, P. A randomization test wrapper for synthesizing single-case experiments using multilevel models: a Monte Carlo simulation study. Behav. Res. 52, 654–666 (2020).

Tate, R. L. et al. The single-case reporting guideline in behavioural interventions (SCRIBE) 2016: explanation and elaboration. Arch. Sci. Psychol. 4, 10–31 (2016).


Acknowledgements

R.T. and J.W.S.V. disclose support for the research of this work from the Dutch Research Council and the Dutch Ministry of Education, Culture and Science (NWO gravitation grant number 024.004.016) within the research project ‘New Science of Mental Disorders’ (www.nsmd.eu). R.M. discloses support from the Generalitat de Catalunya’s Agència de Gestió d’Ajuts Universitaris i de Recerca (grant number 2021SGR00366).

Author information

Authors and affiliations

Experimental Health Psychology, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, the Netherlands

René Tanious & Johan W. S. Vlaeyen

Department of Social Psychology and Quantitative Psychology, Faculty of Psychology, University of Barcelona, Barcelona, Spain

Rumen Manolov

Methodology of Educational Sciences Research Group, Faculty of Psychology and Educational Science, KU Leuven, Leuven, Belgium

Patrick Onghena


Corresponding author

Correspondence to René Tanious.

Ethics declarations

Competing interests

The authors declare no competing interests.


About this article

Cite this article

Tanious, R., Manolov, R., Onghena, P. et al. Single-case experimental designs: the importance of randomization and replication. Nat Rev Methods Primers 4, 27 (2024). https://doi.org/10.1038/s43586-024-00312-8



Single-Case Intervention Research


Thanks to remarkable methodological and statistical advances in recent years, single-case design (SCD) research has become a viable and often essential option for researchers in applied psychology, education, and related fields.

This text is a compendium of information and tools for researchers considering SCD research, a methodology in which one or several participants (or other units) comprise a systematically controlled experimental intervention study. SCD is a highly flexible method of conducting applied intervention research where it is not feasible or practical to collect data from traditional groups of participants.

Initial chapters lay out the key components of SCDs, from articulating dependent variables to documenting methods for achieving experimental control and selecting an appropriate design model. Subsequent chapters show when and how to implement SCDs in a variety of contexts and how to analyze and interpret results.

Authors emphasize key design and analysis tactics, such as randomization, to help enhance the internal validity and scientific credibility of individual studies. This rich resource also includes in-depth descriptions of large-scale SCD research projects being undertaken at key institutions; practical suggestions from journal editors on how to get SCD research published; and detailed instructions for free, user-friendly, web-based randomization software.

Contributors

Series Foreword

Acknowledgements

Introduction: An Overview of Single-Case Intervention Research Thomas R. Kratochwill and Joel R. Levin

I. Methodologies and Analyses

  • Constructing Single-Case Research Designs: Logic and Options Robert H. Horner and Samuel L. Odom
  • Enhancing the Scientific Credibility of Single-Case Intervention Research: Randomization to the Rescue Thomas R. Kratochwill and Joel R. Levin
  • Visual Analysis of Single-Case Intervention Research: Conceptual and Methodological Issues Thomas R. Kratochwill, Joel R. Levin, Robert H. Horner, and Christopher M. Swoboda
  • Non-Overlap Analysis for Single-Case Research Richard I. Parker, Kimberly J. Vannest, and John L. Davis
  • Single-Case Permutation and Randomization Statistical Tests: Present Status, Promising New Developments John M. Ferron and Joel R. Levin
  • The Single-Case Data-Analysis ExPRT (Excel Package of Randomization Tests) Joel R. Levin, Anya S. Evmenova, and Boris S. Gafurov
  • Using Multilevel Models to Analyze Single-Case Design Data David M. Rindskopf and John M. Ferron
  • Analyzing Single-Case Designs: d, G, Hierarchical Models, Bayesian Estimators, Generalized Additive Models, and the Hopes and Fears of Researchers About Analyses William R. Shadish, Larry V. Hedges, James E. Pustejovsky, David M. Rindskopf, Jonathan G. Boyajian, and Kristynn J. Sullivan
  • The Role of Single-Case Designs in Supporting Rigorous Intervention Development and Evaluation at the Institute of Education Sciences Jacquelyn A. Buckley, Deborah L. Speece, and Joan E. McLaughlin

II. Reactions From Leaders in the Field

  • Single-Case Designs and Large-N Studies: The Best of Both Worlds Susan M. Sheridan
  • Using Single-Case Research Designs in Programs of Research Ann P. Kaiser
  • Reactions From Journal Editors: Journal of School Psychology Randy G. Floyd
  • Reactions From Journal Editors: School Psychology Quarterly Randy W. Kamphaus
  • Reactions From Journal Editors: School Psychology Review Matthew K. Burns

About the Editors

Thomas R. Kratochwill, PhD, is Sears Roebuck Foundation–Bascom Professor at the University of Wisconsin–Madison, director of the School Psychology Program, and a licensed psychologist in Wisconsin.

He is the author of more than 200 journal articles and book chapters.  He has written or edited more than 30 books and has made more than 300 professional presentations.

In 1977 he received the Lightner Witmer Award from APA Division 16 (School Psychology). In 1981 he received the Outstanding Research Contributions Award from the Arizona State Psychological Association and in 1995 received an award for Outstanding Contributions to the Advancement of Scientific Knowledge in Psychology from the Wisconsin Psychological Association. Also in 1995, he was the recipient of the Senior Scientist Award from APA Division 16, and the Wisconsin Psychological Association selected his research for its Margaret Bernauer Psychology Research Award.

In 1995, 2001, and 2002 the APA Division 16 journal School Psychology Quarterly selected one of his articles as the best of the year. In 2005 he received the Jack I. Bardon Distinguished Achievement Award from APA Division 16. He was selected as the founding editor of School Psychology Quarterly in 1984 and served as editor of the journal until 1992.

In 2011 Dr. Kratochwill received the Lifetime Achievement Award from the National Register of Health Service Providers in Psychology and the Nadine Murphy Lambert Lifetime Achievement Award from APA Division 16.

Dr. Kratochwill is a fellow of APA Divisions 15 (Educational Psychology), 16, and 53 (Society of Clinical Child and Adolescent Psychology). He is past president of the Society for the Study of School Psychology and was cochair of the Task Force on Evidence-Based Interventions in School Psychology. He was also a member of the APA Task Force on Evidence-Based Practice for Children and Adolescents and the recipient of the 2007 APA Distinguished Career Contributions to Education and Training of Psychologists.

He is the recipient of the University of Wisconsin–Madison Van Hise Outreach Teaching Award and a member of the University's teaching academy. Most recently he has chaired the What Works Clearinghouse Panel for the development of Standards for Single-Case Research Design for review of evidence-based interventions.

Joel R. Levin, PhD, is Professor Emeritus of Educational Psychology, University of Wisconsin–Madison and University of Arizona. He is internationally renowned for his research and writing on educational research methodology and statistical analysis as well as for his career-long program of research on students' learning strategies and study skills, with more than 400 scholarly publications in those domains. Within APA, he is a Fellow of Division 5 (Evaluation, Measurement and Statistics) and Division 15 (Educational Psychology).

From 1986 to 1988 Dr. Levin was head of the Learning and Instruction division of the American Educational Research Association (AERA), from 1991 to 1996 he was editor of APA's Journal of Educational Psychology , and from 2001 to 2003 he was coeditor of the journal Issues in Education: Contributions From Educational Psychology . During 1994–1995 he served as chair of APA's Council of Editors, and from 1993 to 1995 he was an ex-officio representative on APA's Publications and Communications Board.

Dr. Levin chaired an editors' committee that revised the statistical-reporting guidelines sections for the fourth (1994) edition of the APA Publication Manual , and he served on a similar committee that revised the fifth (2001) and sixth (2010) editions of the manual. From 2003 to 2008 he was APA's chief editorial advisor, a position in which he was responsible for mediating editor–author conflicts, managing ethical violations, and making recommendations bearing on all aspects of the scholarly research and publication process.

Dr. Levin has received two article-of-the-year awards from AERA (1972, with Leonard Marascuilo; 1973, with William Rohwer and Anne Cleary) as well as awards from the University of Wisconsin–Madison for both his teaching and his research (1971 and 1980). In 1992 he was presented with a University of Wisconsin–Madison award for his combined research, teaching, and professional service contributions, followed in 1996 by a prestigious University of Wisconsin–Madison named professorship (Julian C. Stanley Chair).

In 1997 the University of Wisconsin–Madison's School of Education honored Dr. Levin with a distinguished career award, and in 2002 he was accorded APA Division 15's highest research recognition, the E. L. Thorndike Award, for his professional achievements. In 2010 AERA's Educational Statisticians Special Interest Group presented him with an award for exceptional contributions to the field of educational statistics, and most recently, in 2013 the editorial board of the Journal of School Psychology selected his 2012 publication (with John Ferron and Thomas Kratochwill) as the Journal's outstanding article of the year.

A well-written and meaningfully structured compendium that includes the foundational and advanced guidelines for conducting accurate single-case intervention designs. Whether you are an undergraduate or a graduate student, or an applied researcher anywhere along the novice-to-expert column, this book promises to be an invaluable addition to your library. —PsycCRITIQUES

Provides valuable information about single case research design for researchers and graduate students, including methodology, statistical analyses, and the opinions of researchers who have been using it. —Doody's Review Service

This is a welcome addition to the libraries of behavioral researchers interested in knowing more about the lives of children inside and outside of school. Kratochwill and Levin and their contributing authors blend the sometimes esoteric issues of the philosophy of science, experimental design, and statistics with the real-life issues of how to get grant funding and publish research. This volume is useful for new and experienced researchers alike. —Ilene S. Schwartz, PhD, professor, University of Washington, Seattle, and director, Haring Center for Research on Inclusive Education, Seattle, WA


A systematic review of applied single-case research published between 2016 and 2018: Study designs, randomization, data aspects, and data analysis

  • Published: 26 October 2020
  • Volume 53, pages 1371–1384 (2021)


  • René Tanious 1 &
  • Patrick Onghena 1  


Single-case experimental designs (SCEDs) have become a popular research methodology in educational science, psychology, and beyond. The growing popularity has been accompanied by the development of specific guidelines for the conduct and analysis of SCEDs. In this paper, we examine recent practices in the conduct and analysis of SCEDs by systematically reviewing applied SCEDs published over a period of three years (2016–2018). Specifically, we were interested in which designs are most frequently used and how common randomization in the study design is, which data aspects applied single-case researchers analyze, and which analytical methods are used. The systematic review of 423 studies suggests that the multiple baseline design continues to be the most widely used design and that the difference in central tendency level is by far the most popular in SCED effect evaluation. Visual analysis paired with descriptive statistics is the most frequently used method of data analysis. However, inferential statistical methods and the inclusion of randomization in the study design are not uncommon. We discuss these results in light of the findings of earlier systematic reviews and suggest future directions for the development of SCED methodology.


Introduction

In single-case experimental designs (SCEDs) a single entity (e.g., a classroom) is measured repeatedly over time under different manipulations of at least one independent variable (Barlow et al., 2009; Kazdin, 2011; Ledford & Gast, 2018). Experimental control in SCEDs is demonstrated by observing changes in the dependent variable(s) over time under the different manipulations of the independent variable(s). Over the past few decades, the popularity of SCEDs has risen continuously, as reflected in the number of published SCED studies (Shadish & Sullivan, 2011; Smith, 2012; Tanious et al., 2020), the development of domain-specific reporting guidelines (e.g., Tate et al., 2016a, 2016b; Vohra et al., 2016), and guidelines on the quality of conduct and analysis of SCEDs (Horner et al., 2005; Kratochwill et al., 2010, 2013).

The What Works Clearinghouse guidelines

In educational science in particular, the US Department of Education has released a highly influential policy document through its What Works Clearinghouse (WWC) panel (Kratochwill et al., 2010). The WWC guidelines contain recommendations for the conduct and visual analysis of SCEDs. The panel recommended visually analyzing six data aspects of SCEDs: level, trend, variability, overlap, immediacy of the effect, and consistency of data patterns. However, given the subjective nature of visual analysis (e.g., Harrington, 2013; Heyvaert & Onghena, 2014; Ottenbacher, 1990), Kratochwill and Levin (2014) later called the formation of a panel for recommendations on the statistical analysis of SCEDs “the highest imminent priority” (p. 232, emphasis in original) on the agenda of SCED methodologists. Furthermore, Kratochwill and Levin—both members of the original panel—contended that advocating for design-specific randomization schemes in line with the recommendations by Edgington (1975, 1980) and Levin (1994) would constitute an important contribution to the development of updated guidelines.

Developments outside the WWC guidelines

Prior to the publication of updated guidelines, important progress had already been made in the development of SCED-specific statistical analyses and design-specific randomization schemes not summarized in the 2010 version of the WWC guidelines. Specifically, three interrelated areas can be distinguished: effect size calculation, inferential statistics, and randomization procedures. Note that this list includes effect size calculation even though the 2010 WWC guidelines already contain some recommendations on effect sizes, albeit with the caveat that further research is “badly needed” (p. 23) to develop novel effect size measures comparable to those used in group studies. In the following paragraphs, we give a brief overview of the developments in each area.

Effect size measures

The effect size measures mentioned in the 2010 version of the WWC guidelines mainly concern the data aspect overlap: percentage of non-overlapping data (Scruggs, Mastropieri, & Casto, 1987), percentage of all non-overlapping data (Parker et al., 2007), and percentage of data points exceeding the median (Ma, 2006). Other overlap-based effect size measures are discussed in Parker et al. (2011). Furthermore, the 2010 guidelines discuss multilevel models, regression models, and a standardized effect size measure proposed by Shadish et al. (2008) for comparing results between participants in SCEDs. In later years, this measure has been further developed for other designs and meta-analyses (Hedges et al., 2012; Hedges et al., 2013; Shadish et al., 2014). Without mentioning any specific measures, the guidelines further mention effect sizes that compare the different conditions within a single unit and standardize by dividing by the within-phase variance. These effect size measures quantify the data aspect level. Beretvas and Chung (2008), for example, proposed to subtract the mean of the baseline phase from the mean of the intervention phase, and subsequently divide by the pooled within-case standard deviation. Other proposals for quantifying the data aspect level include the slope and level change procedure, which corrects for baseline trend (Solanas et al., 2010), and the mean baseline reduction, which is calculated by subtracting the mean of treatment observations from the mean of baseline observations and subsequently dividing by the mean of the baseline phase (O’Brien & Repp, 1990). Efforts have also been made to quantify the other four data aspects. For an overview of the available effect size measures per data aspect, the interested reader is referred to Tanious et al. (2020).

Examples of quantifications for the data aspect trend include the split-middle technique (Kazdin, 1982) and ordinary least squares (Kromrey & Foster-Johnson, 1996), but many more proposals exist (see, e.g., Manolov, 2018, for an overview and discussion of different trend techniques). Fewer proposals exist for variability, immediacy, and consistency. The WWC guidelines recommend using the standard deviation for within-phase variability. Another option is the use of stability envelopes, as suggested by Lane and Gast (2014). It should be noted, however, that neither of these methods is an effect size measure because they are assessed within a single phase. For the assessment of between-phase variability changes, Kromrey and Foster-Johnson (1996) recommend using variance ratios. More recently, Levin et al. (2020) recommended the median absolute deviation for the assessment of variability changes. The WWC guidelines recommend subtracting the mean of the last three baseline data points from the mean of the first three intervention data points to assess immediacy. Michiels et al. (2017) proposed the immediate treatment effect index, extending this logic to ABA and ABAB designs. For consistency of data patterns, only one measure currently exists, based on the Manhattan distance between data points from experimentally similar phases (Tanious et al., 2019).
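To make the level-based quantifications concrete, the two measures described above can be sketched in a few lines of Python: a pooled-SD standardized mean difference in the spirit of Beretvas and Chung (2008), and the mean baseline reduction of O’Brien and Repp (1990). The data and function names are hypothetical; this is an illustrative sketch, not a validated implementation from any SCED package.

```python
import statistics

def standardized_mean_difference(baseline, intervention):
    """Phase mean difference divided by the pooled within-case standard
    deviation, in the spirit of Beretvas and Chung (2008)."""
    n_a, n_b = len(baseline), len(intervention)
    pooled_var = ((n_a - 1) * statistics.variance(baseline)
                  + (n_b - 1) * statistics.variance(intervention)) / (n_a + n_b - 2)
    return (statistics.mean(intervention) - statistics.mean(baseline)) / pooled_var ** 0.5

def mean_baseline_reduction(baseline, intervention):
    """O'Brien and Repp (1990): (mean(A) - mean(B)) / mean(A)."""
    m_a = statistics.mean(baseline)
    return (m_a - statistics.mean(intervention)) / m_a

A = [8, 7, 9, 8, 8]   # hypothetical baseline observations
B = [5, 4, 3, 4, 2]   # hypothetical intervention observations
print(round(standardized_mean_difference(A, B), 2))  # -4.64
print(round(mean_baseline_reduction(A, B), 2))       # 0.55
```

With a behaviour-reduction goal, the negative standardized difference and the positive 55% baseline reduction both point to a decrease from baseline to intervention.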

Inferential statistics

Inferential statistics are not summarized in the 2010 version of the WWC guidelines. However, inferential statistics do have a long and rich history in debates surrounding the methodology and data analysis of SCEDs. Excellent review articles detailing and explaining the available methods for analyzing data from SCEDs are available in Manolov and Moeyaert (2017) and Manolov and Solanas (2018). In situations in which results are compared across participants within or between studies, multilevel models have been proposed. The 2010 guidelines do mention multilevel models, but with the indication that more thorough investigation was needed before their use could be recommended. With few exceptions, such as the pioneering work by Van den Noortgate and Onghena (2003, 2008), specific proposals for multilevel analysis of SCEDs had long been lacking. Not surprisingly, the 2010 WWC guidelines gave new impetus for the development of multilevel models for meta-analyzing SCEDs. For example, Moeyaert, Ugille, et al. (2014b) and Moeyaert, Ferron, et al. (2014a) discuss two-level and three-level models for combining results across single cases. Baek et al. (2016) suggested a visual analytical approach for refining multilevel models for SCEDs. Multilevel models can be used descriptively (i.e., to find an overall treatment effect size), inferentially (i.e., to obtain a p value or confidence interval), or a mix of both.

Randomization

One concept that is closely linked to inferential statistics is randomization. In the context of SCEDs, randomization refers to the random assignment of measurements to treatment levels (Onghena & Edgington, 2005). Randomization, when ethically and practically feasible, can reduce the risk of bias in SCEDs and strengthen the internal validity of the study (Tate et al., 2013). To incorporate randomization into the design, specific randomization schemes are needed, as previously stated (Kratochwill & Levin, 2014). In alternation designs, randomization can be introduced by randomly alternating the sequence of conditions, either unrestricted or restricted (e.g., a maximum of two consecutive measurements under the same condition) (Onghena & Edgington, 1994). In phase designs (e.g., ABAB), multiple baseline designs, and changing criterion designs, where no rapid alternation of treatments takes place, it is possible to randomize the moment of phase change after a minimum number of measurements has taken place in each phase (Marascuilo & Busk, 1988; Onghena, 1992). In multiple baseline designs, it is also possible to predetermine different baseline phase lengths for each tier and then randomly allocate participants to the different baseline phase lengths (Wampold & Worsham, 1986). Randomization tests use the randomization actually present in the design to quantify the probability of the observed effect occurring by chance. These tests are among the earliest data analysis techniques specifically proposed for SCEDs (Edgington, 1967, 1975, 1980).
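For a phase design with a randomly chosen intervention start point, such a randomization test can be sketched in a few lines of Python. The example assumes an AB design in which any start point leaving at least three observations per phase was admissible and in which lower scores indicate improvement; the data and function name are hypothetical illustrations, not code from the cited authors.

```python
import statistics

def ab_randomization_test(scores, start, min_len=3):
    """Randomization test for an AB phase design with a randomly chosen
    intervention start point (cf. Onghena, 1992). The p value is the
    proportion of admissible start points whose test statistic is at
    least as large as the observed one. The statistic used here is the
    baseline mean minus the intervention mean (a reduction effect)."""
    n = len(scores)

    def stat(k):
        return statistics.mean(scores[:k]) - statistics.mean(scores[k:])

    observed = stat(start)
    admissible = range(min_len, n - min_len + 1)  # each phase keeps >= min_len points
    extreme = sum(1 for k in admissible if stat(k) >= observed)
    return extreme / len(admissible)

data = [8, 7, 9, 8, 8, 5, 4, 3, 4, 2]  # hypothetical AB series
print(ab_randomization_test(data, start=5))  # 0.2 (1 of 5 admissible start points)
```

Because only five start points are admissible in this toy series, the smallest attainable p value is 1/5; in practice, longer series or designs with more randomization possibilities are needed for conventional significance levels.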

The main aim of the present paper is to systematically review the methodological characteristics of recently published SCEDs with an emphasis on the data aspects put forth in the WWC guidelines. Specific research questions are:

What is the frequency of the various single-case design options?

How common is randomization in the study design?

Which data aspects do applied researchers include in their analysis?

What is the frequency of visual and statistical data analysis techniques?

For systematic reviews of SCEDs predating the publication of the WWC guidelines, the interested reader is referred to Hammond and Gast ( 2010 ), Shadish and Sullivan ( 2011 ), and Smith ( 2012 ).

Justification for publication period selection

The present systematic review deals with applied SCED studies published in the period from 2016 to 2018. The reasons for the selection of this period are threefold: relevance, sufficiency, and feasibility. In terms of relevance, there is a noticeable lack of recent systematic reviews dealing with the methodological characteristics of SCEDs in spite of important developments in the field. Apart from the previously mentioned reviews predating the publication of the 2010 WWC guidelines, only two reviews can be mentioned that were published after the WWC guidelines. Solomon (2014) reviewed indicators of violations of normality and independence in school-based SCED studies until 2012. More recently, Woo et al. (2016) performed a content analysis of SCED studies published in American Counseling Association journals between 2003 and 2014. However, neither of these reviews deals with published SCEDs in relation to specific guidelines such as WWC. In terms of sufficiency, a three-year period can give sufficient insight into recent trends in applied SCEDs. In addition, it seems reasonable to assume a delay between the publication of guidelines such as WWC and their impact in the field. For example, several discussion articles regarding the WWC guidelines were published in 2013. Wolery (2013) and Maggin et al. (2013) pointed out perceived weaknesses in the WWC guidelines, which in turn prompted a reply by the original authors (Hitchcock et al., 2014). Discussions like these can help increase the exposure of the guidelines among applied researchers. In terms of feasibility, it is important to note that we did not set any specification on the field of study for inclusion. Therefore, the publication period had to remain short enough that all included publications across the different study fields (education, healthcare, counseling, etc.) could be read and coded.

Data sources

We performed a broad search of the English-language SCED literature using PubMed and Web of Science. The choice of these two search engines was based on Gusenbauer and Haddaway (2019), who assessed the eligibility of 26 search engines for systematic reviews. Gusenbauer and Haddaway came to the conclusion that PubMed and Web of Science could be used as primary search engines in systematic reviews, as they fulfilled all necessary requirements, such as functionality of Boolean operators and reproducibility of search results in different locations and at different times. We selected only these two of all eligible search engines to keep the size of the project manageable and to prevent excessive overlap between the results. Table 1 gives an overview of the search terms we used and the number of hits per search query. This list does not exclude duplicates between the search terms and between the two search engines. For all designs containing the term “randomized” (e.g., randomized block design), we used the Boolean operator AND to specify that the search results must also contain either the term “single-case” or “single-subject”. An initial search for randomized designs without these specifications yielded well over 1000 results per search query.

Study selection

We specifically searched for studies published between 2016 and 2018. We used the date of first online publication to determine whether an article met this criterion (i.e., articles that were published online during this period were included even if not yet published in print). Initially, the abstracts and article information of all search results were scanned for general exclusion criteria. In a first step, all articles that fell outside the date range of interest were excluded, as well as articles for which the full text was not available or only available against payment. We only included articles written in English. In a second step, all duplicate articles were deleted. From the remaining unique search results, all articles that did not use any form of single-case experimentation were excluded. Such studies include, for example, non-experimental forms of case studies. Lastly, all articles not reporting any primary empirical data were excluded from the final sample. Thus, purely methodological articles were discarded. Methodological articles were defined as articles that were within the realm of SCEDs but did not report any empirical data or reported only secondary empirical data. Generally, these articles propose new methods for analyzing SCEDs or perform simulation studies to test existing methods. Similarly, commentaries, systematic reviews, and meta-analyses were excluded from the final sample, as such articles do not contain primary empirical data. In line with systematic review guidelines (Staples & Niazi, 2007), the second author verified the accuracy of the selection process. Ten articles were randomly selected from an initial list of all search results for a joint discussion between the authors, and no disagreements about the selection emerged. Figure 1 presents the study attrition diagram.

Figure 1. Study attrition diagram

Coding criteria

For all studies, the basic design was coded first. For coding the design, we followed the typology presented in Onghena and Edgington (2005) and Tate et al. (2016a) with four overarching categories: phase designs, alternation designs, multiple baseline designs, and changing criterion designs. For each of these categories, different design options exist. Common variants of phase designs include, for example, AB and ABAB, but other forms also exist, such as ABC. Within the alternation designs category, the main variants are the completely randomized design, the alternating treatments design, and the randomized block design. Multiple baseline designs can be conducted across participants, behaviors, or settings. They can be either concurrent, meaning that all participants start the study at the same time, or non-concurrent. Changing criterion designs can employ either a single-value criterion or a range-bound criterion. In addition to these four overarching categories, we added a design category called hybrid. The hybrid category consists of studies using several design strategies combined, for example a multiple baseline study with an integrated alternating treatments design. For articles reporting more than one study, each study was coded separately. For coding the basic design, we followed the authors' original description of the study.

Randomization was coded as a dichotomous variable, i.e., either present or not present. In order to be coded as present, some form of randomization had to be present in the design itself, as previously defined in the randomization section. Studies with a fixed order of treatments or phase change moments with randomized stimulus presentation, for example, were coded as randomization not present.

Data aspect

A major contribution of the WWC guidelines was the establishment of six data aspects for the analysis of SCEDs: level, trend, variability, overlap, immediacy, and consistency. Following the guidelines, these data aspects can be defined operationally as follows. Level is the mean score within a phase. The straight line best fitting the data within a phase refers to the trend. The standard deviation or range within a phase represents the data aspect variability. The proportion of data points overlapping between adjacent phases is the data aspect overlap. The immediacy of an effect is assessed by comparing the last three data points of an intervention with the first three data points of the subsequent intervention. Finally, consistency is assessed by comparing data patterns from experimentally similar interventions. In multiple baseline designs, consistency can be assessed horizontally (within series) when more than one phase change is present, and vertically (across series) by comparing experimentally similar phases across participants, behaviors, or settings. It was of course possible that studies reported more than one data aspect or none at all. For studies reporting more than one data aspect, each data aspect was coded separately.
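
These operational definitions can be translated almost directly into code. The sketch below is our own illustration (the function names and toy AB data are invented, and it is not WWC reference code); it computes five of the six aspects for a single phase change, omitting consistency, which requires comparing data patterns across repeated, experimentally similar phases.

```python
# Illustrative sketch of five WWC data aspects for a simple AB data set.
from statistics import mean, stdev

def level(phase):
    # Level: the mean score within a phase.
    return mean(phase)

def trend(phase):
    # Trend: slope of the ordinary least-squares line through (time, score).
    t = range(len(phase))
    t_bar, y_bar = mean(t), mean(phase)
    num = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, phase))
    den = sum((ti - t_bar) ** 2 for ti in t)
    return num / den

def variability(phase):
    # Variability: standard deviation within a phase (the range is an alternative).
    return stdev(phase)

def overlap(a, b):
    # Overlap: proportion of B-phase points falling within the range of phase A.
    lo, hi = min(a), max(a)
    return sum(lo <= y <= hi for y in b) / len(b)

def immediacy(a, b):
    # Immediacy: compare the last three points of A with the first three of B.
    return mean(b[:3]) - mean(a[-3:])

baseline = [4, 5, 4, 6, 5, 5]
intervention = [8, 9, 10, 9, 11, 10]
print(level(baseline), level(intervention))        # ≈4.83 and 9.5
print(round(trend(intervention), 2))
print(overlap(baseline, intervention))             # 0.0 (no overlap)
print(round(immediacy(baseline, intervention), 2))
```

Overlap is expressed here as the proportion of intervention points falling within the baseline range; published overlap indices differ in their details, so this is only one simple variant.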

Data analysis

The data analysis methods were coded directly from the authors' description in the "data analysis" section. If no such section was present, the data analysis methods were coded according to the presentation of the results. Generally, two main forms of data analysis for SCEDs can be distinguished: visual and statistical analysis. In the visual analytical approach, a time-series graph of the dependent variable under the different experimental conditions is analyzed to determine treatment effectiveness. The statistical analytical approach can be roughly divided into two categories: descriptive and inferential statistics. Descriptive statistics summarize the data without quantifying the uncertainty in the description. Examples of descriptive statistics include means, standard deviations, and effect sizes. Inferential statistics imply an inference from the observed results to unknown parameter values and quantify the uncertainty of doing so, for example by providing p values and confidence intervals.
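
To make this distinction concrete, the following sketch (our own illustration on fabricated data, not a method taken from the reviewed studies) computes a descriptive summary, the between-phase mean difference, and an inferential p value from a randomization test, which is valid only when the phase-change moment was genuinely chosen at random from the admissible points.

```python
from statistics import mean

def mean_difference(scores, change_point):
    # Descriptive summary: difference between the two phase means.
    return mean(scores[change_point:]) - mean(scores[:change_point])

def randomization_test(scores, change_point, min_phase_len=3):
    # Inferential statistic: two-sided randomization-test p value, valid when
    # the phase-change moment was actually drawn at random from `candidates`.
    observed = abs(mean_difference(scores, change_point))
    candidates = range(min_phase_len, len(scores) - min_phase_len + 1)
    as_extreme = sum(abs(mean_difference(scores, c)) >= observed
                     for c in candidates)
    return as_extreme / len(candidates)

scores = [4, 5, 4, 6, 5, 8, 9, 10, 9, 11]  # fabricated AB series, change at 5
print(mean_difference(scores, 5))    # descriptive effect: ≈ 4.6
print(randomization_test(scores, 5))  # inferential p value: 0.2
```

The descriptive quantity summarizes the observed difference; the p value additionally expresses how extreme that difference is relative to all admissible phase-change assignments.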

Number of participants

Finally, for each study we coded the number of participants, counting only participants who appeared in the results section. Participants who dropped out prematurely and whose data were not analyzed were not counted.

General results

For each coding category, the interrater agreement was calculated with the formula \( \frac{\text{number of agreements}}{\text{number of agreements}+\text{number of disagreements}} \) based on ten randomly selected articles. The interrater agreement was as follows: design (90%), analysis (60%), data aspect (80%), randomization (100%), and number of participants (80%). Given the initially moderate agreement for analysis, the two authors discussed the discrepancies and then reanalyzed a new sample of ten randomly selected articles. The interrater reliability for analysis then increased to 90%.
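
As a minimal sketch, the agreement formula amounts to the following (the design codes below are invented purely for illustration):

```python
def interrater_agreement(rater1, rater2):
    # Proportion of items coded identically by two raters.
    agreements = sum(a == b for a, b in zip(rater1, rater2))
    disagreements = len(rater1) - agreements
    return agreements / (agreements + disagreements)

# Hypothetical design codes from two raters for ten articles.
r1 = ["MBD", "AB", "ATD", "AB",   "hybrid", "MBD", "AB", "CCD", "ATD", "MBD"]
r2 = ["MBD", "AB", "ATD", "ABAB", "hybrid", "MBD", "AB", "CCD", "ATD", "MBD"]
print(interrater_agreement(r1, r2))  # 0.9, i.e., 90% agreement
```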

In total, 406 articles were included in the final sample, which represented 423 studies. One hundred thirty-eight of the 406 articles (34.00%) were published in 2016, 150 articles (36.95%) were published in 2017, and 118 articles (29.06%) were published in 2018. Out of the 423 studies, the most widely used form of SCEDs was the multiple baseline design, which accounted for 49.65% ( N  = 210) of the studies included in the final sample. Across all studies and designs, the median number of participants was three (IQR = 3). The most popular data analysis technique across all studies was visual analysis paired with descriptive statistics, which was used in 48.94% ( N  = 207) of the studies. The average number of data aspects analyzed per study was 2.61 ( SD =  1.63). The most popular data aspect across all designs and studies was level (83.45%, N =  353). Overall, 22.46% ( N  = 95) of the 423 studies included randomization in the design. However, these results vary between the different designs. In the following sections, we therefore present a summary of the results per design. A detailed overview of all the results per design can be found in Table 2 .

Results per design

Phase designs

Phase designs accounted for 25.53% ( N  = 108) of the studies included in the systematic review. The median number of participants for phase designs was three (IQR = 4). Visual analysis paired with descriptive statistics was the most popular data analysis method for phase designs (40.74%, N  = 44), and the majority of studies analyzed several data aspects (54.62%, N  = 59); 20.37% ( N  = 22) did not report any of the six data aspects. The average number of data aspects analyzed in phase designs was 2.02 ( SD =  2.07). Level was the most frequently analyzed data aspect for phase designs (73.15%, N  = 79). Randomization was very uncommon in phase designs and was included in only 5.56% ( N  = 6) of the studies.

Alternation designs

Alternation designs accounted for 14.42% ( N  = 61) of the studies included in the systematic review. The median number of participants for alternation designs was three (IQR = 1). More than half of the alternation design studies used visual analysis paired with descriptive statistics (57.38%, N  = 35). The majority of alternation design studies analyzed several data aspects (75.41%, N  = 46), while 11.48% ( N  = 7) did not report which data aspect was the focus of analysis. The average number of data aspects analyzed in alternation designs was 2.38 ( SD =  2.06). The most frequently analyzed data aspect for alternation designs was level (85.25%, N =  52). Randomization was used in the majority of alternation designs (59.02%, N  = 36).

Multiple baseline designs

Multiple baseline designs, by a large margin the most prevalent design, accounted for nearly half of all studies (49.65%, N  = 210) included in the systematic review. The median number of participants for multiple baseline designs was four (IQR = 4). A total of 49.52% ( N  = 104) of multiple baseline studies were analyzed using visual analysis paired with descriptive statistics, and the vast majority (80.95%, N  = 170) analyzed several data aspects, while only 7.14% ( N  = 15) did not report any of the six data aspects. The average number of data aspects analyzed in multiple baseline designs was 3.01 ( SD =  1.61). The most popular data aspect was level, which was analyzed in 87.62% ( N =  184) of all multiple baseline designs. Randomization was not uncommon in multiple baseline designs (20.00%, N  = 42).

Changing criterion design

Changing criterion designs accounted for 1.42% ( N  = 6) of the studies included in the systematic review. The median number of participants for changing criterion designs was three (IQR = 0); 66.67% ( N =  4) of changing criterion designs were analyzed using visual analysis paired with descriptive statistics. Half of the changing criterion designs analyzed several data aspects ( N =  3), and one study (16.67%) did not report any data aspect. The average number of data aspects analyzed in changing criterion designs was 1.83 ( SD =  1.39). The most popular data aspect was level (83.33%, N  = 5). None of the changing criterion design studies included randomization in the design.

Hybrid designs

Hybrid designs accounted for 8.98% ( N  = 38) of the studies included in the systematic review. The median number of participants for hybrid designs was three (IQR = 2). A total of 52.63% ( N  = 20) of hybrid designs were analyzed with visual analysis paired with descriptive statistics, and the majority of studies analyzed several data aspects (73.68%, N  = 28); 10.53% ( N  = 4) did not report any of the six data aspects. The average number of data aspects considered for analysis was 2.55 ( SD =  2.02). The most popular data aspect was level (86.84%, N  = 33). Hybrid designs showed the second highest proportion of studies including randomization in the study design (28.95%, N  = 11).

Results per data aspect

Out of the 423 studies included in the systematic review, 72.34% ( N =  306) analyzed several data aspects, 16.08% ( N =  68) analyzed one data aspect, and 11.58% ( N =  49) did not report any of the six data aspects.

Level

Across all designs, level was by far the most frequently analyzed data aspect (83.45%, N = 353). Remarkably, nearly all studies that analyzed more than one data aspect included the data aspect level (96.73%, N = 296). Similarly, for studies analyzing only one data aspect, there was a strong prevalence of level (83.82%, N = 57). For studies that analyzed only level, the most common form of analysis was visual analysis paired with descriptive statistics (54.39%, N = 31).

Trend

Trend was the third most popular data aspect, analyzed in 45.39% (N = 192) of all studies included in the systematic review. There were no studies in which trend was the only data aspect analyzed; trend was always analyzed alongside other data aspects, which makes it difficult to isolate the analytical methods specifically used for trend.

Variability

The data aspect variability was analyzed in 59.10% (N = 250) of the studies, making it the second most prominent data aspect. A total of 80.72% (N = 247) of all studies analyzing several data aspects included variability. However, variability was very rarely the only data aspect analyzed: only three of the studies analyzing a single data aspect focused on variability, and all three did so using visual analysis.

Overlap

The data aspect overlap was analyzed in 35.70% (N = 151) of all studies and was thus the fourth most analyzed data aspect. Nearly half of all studies analyzing several data aspects included overlap (47.08%, N = 144). For studies analyzing only one data aspect, overlap was the second most common data aspect after level (10.29%, N = 7). The most common mode of analysis for these studies was descriptive statistics paired with inferential statistics (57.14%, N = 4).

Immediacy

The immediacy of the effect was assessed in 28.61% (N = 121) of the studies, making it the second least analyzed data aspect; 39.22% (N = 120) of the studies analyzing several data aspects included immediacy. Only one study analyzed immediacy as the sole data aspect, and this study used visual analysis.

Consistency

Consistency was analyzed in 9.46% ( N =  40) of the studies and was thus by far the least analyzed data aspect. It was analyzed in 13.07% ( N =  40) of the studies analyzing several data aspects and was never the focus of analysis for studies analyzing only one data aspect.

Several data aspects

As stated previously, 72.34% (N = 306) of all studies analyzed several data aspects. For these studies, the average number of data aspects analyzed was 3.39 (SD = 1.18). The most popular data analysis technique for several data aspects was visual analysis paired with descriptive statistics (56.54%, N = 173).

Not reported

As mentioned previously, 11.58% ( N =  49) did not report any of the six data aspects. For these studies, the most prominent analytical technique was visual analysis alone (61.22%, N =  30). Of all studies not reporting any of the six data aspects, the highest proportion was phase designs (44.90%, N =  22).

Results per analytical method

Visual analysis

Visual analysis, without the use of any descriptive or inferential statistics, was the analytical method used in 16.78% (N = 71) of all included studies. Of all studies using visual analysis, the largest proportion were multiple baseline design studies (45.07%, N = 32). The largest share of studies using visual analysis did not report any data aspect (42.25%, N = 30), closely followed by studies analyzing several data aspects (40.85%, N = 29). Randomization was present in 20.53% (N = 16) of all studies using visual analysis.

Descriptive statistics

Descriptive statistics, without the use of visual analysis, was the analytical method used in 3.78% ( N =  16) of all included studies. The most common designs for studies using descriptive statistics were phase designs and multiple baseline designs (both 43.75%, N =  7). Half of the studies using descriptive statistics (50.00%, N =  8) analyzed the data aspect level, and 37.5% ( N =  6) analyzed several data aspects. One study (6.25%) using descriptive statistics included randomization.

Inferential statistics

Inferential statistics, without the use of visual analysis, was the analytical method used in 2.84% (N = 12) of all included studies. The majority of studies using inferential statistics were phase designs (58.33%, N = 7) and did not report any of the six data aspects (58.33%, N = 7). Of the remaining studies, three (25.00%) reported several data aspects, and two (16.67%) analyzed the data aspect level. Two studies (16.67%) using inferential statistical analysis included randomization.

Descriptive and inferential statistics

Descriptive statistics combined with inferential statistics, but without the use of visual analysis, accounted for 5.67% (N = 24) of all included studies. The majority of studies using this combination of analytical methods were multiple baseline designs (62.5%, N = 15), followed by phase designs (33.33%, N = 8). There were no alternation or hybrid designs using descriptive and inferential statistics. The largest proportion of studies using descriptive and inferential statistics analyzed several data aspects (41.67%, N = 10), followed by the data aspect level (29.17%, N = 7); 16.67% (N = 4) of these studies included randomization.

Visual and descriptive statistics

As mentioned previously, visual analysis paired with descriptive statistics was the most popular analytical method. This method was used in nearly half (48.94%, N  = 207) of all included studies. The majority of these studies were multiple baseline designs (50.24%, N =  104), followed by phase designs (21.25%, N =  44). This method of analysis was prevalent across all designs. Nearly all of the studies using this combination of analytical methods analyzed either several data aspects (83.57%, N =  173) or level only (14.98%, N =  31). Randomization was present in 19.81% ( N =  41) of all studies using visual and descriptive analysis.

Visual and inferential statistics

Visual analysis paired with inferential statistics accounted for 2.60% (N = 11) of the included studies. The largest proportion of these studies were phase designs (45.45%, N = 5), followed by multiple baseline designs and hybrid designs (both 27.27%, N = 3). This combination of analytical methods was thus not used in alternation or changing criterion designs. The majority of studies using visual analysis and inferential statistics analyzed several data aspects (72.73%, N = 8), while 18.18% (N = 2) did not report any data aspect. One study (9.09%) included randomization.

Visual, descriptive, and inferential statistics

A combination of visual analysis, descriptive statistics, and inferential statistics was used in 18.44% ( N =  78) of all included studies. The majority of the studies using this combination of analytical methods were multiple baseline designs (56.41%, N =  44), followed by phase designs (23.08%, N =  18). This analytical approach was used in all designs except changing criterion designs. Nearly all studies using a combination of these three analytical methods analyzed several data aspects (97.44%, N =  76). These studies also showed the highest proportion of randomization (38.46%, N =  30).

None of the above

A small proportion of studies did not use any of the above analytical methods (0.95%, N =  4). Three of these studies (75%) were phase designs and did not report any data aspect. One study (25%) was a multiple baseline design that analyzed several data aspects. Randomization was not used in any of these studies.

Discussion

To our knowledge, the present article is the first systematic review of SCEDs specifically looking at the frequency of the six data aspects in applied research. The systematic review has shown that level is by a large margin the most widely analyzed data aspect in recently published SCEDs. The second most popular data aspect from the WWC guidelines was variability, which was usually assessed alongside level (e.g., a combination of mean and standard deviation or range). The fact that these two data aspects are also routinely assessed in group studies may indicate a lack of familiarity with SCED-specific analytical methods among applied researchers, but this remains speculative. Phase designs showed the highest proportion of studies not reporting any of the six data aspects and the second lowest average number of data aspects analyzed, second only to changing criterion designs. This was an unexpected finding, given that the WWC guidelines were developed specifically in the context of (and with examples of) phase designs. The multiple baseline design showed the highest number of data aspects analyzed and, at the same time, the lowest proportion of studies not analyzing any of the six data aspects.

These findings regarding the analysis and reporting of the six data aspects need more contextualization. The selection of data aspects for the analysis depends on the research questions and the expected data pattern. For example, if the aim of the intervention is a gradual change over time, then trend becomes more important. If the aim of the intervention is a change in level, then it is important to also assess trend (to verify that the change in level is not just a continuation of a baseline trend) and variability (to assess whether an apparent change in level merely reflects excessive variability). In addition, assessing consistency can add information on whether the change in level is consistent over several repetitions of the experimental conditions (e.g., in phase designs). Similarly, if an abrupt change in the level of the target behavior is expected after changing experimental conditions, then immediacy becomes a more relevant data aspect in addition to trend, variability, and level. The important point here is that the research team often has an idea of the expected data pattern and should choose the data aspects for analysis accordingly. The strong prevalence of level found in the present review could be indicative of a failure to assess other data aspects that may be relevant for demonstrating experimental control.

In line with the findings of earlier systematic reviews (Hammond & Gast, 2010; Shadish & Sullivan, 2011; Smith, 2012), the multiple baseline design continues to be the most frequently used design, and despite the advancement of sophisticated statistical methods for the analysis of SCEDs, two thirds of all studies still relied on visual analysis alone or visual analysis paired with descriptive statistics. A comparison with the findings of Shadish and Sullivan further reveals that the number of participants included in SCEDs has remained steady over the past decade at around three to four participants. The relatively small number of changing criterion designs in the present findings is partly due to the fact that changing criterion designs were often combined with other designs and thus coded in the hybrid category, although we did not formally quantify this. This finding is supported by the results of Shadish and Sullivan, who found that changing criterion designs are more often used as part of hybrid designs than as standalone designs. Hammond and Gast even excluded the changing criterion design from their review due to its low prevalence: they found a total of six changing criterion designs published over a period of 35 years. It should be noted, however, that the low prevalence of changing criterion designs is not indicative of the value of this design.

Regarding randomization, the results cannot be interpreted against earlier benchmarks, as neither Smith, Shadish and Sullivan, nor Hammond and Gast quantified the proportion of randomized SCEDs. Overall, randomization in the study design was not uncommon. However, the proportion of randomized SCEDs differed greatly between designs. The results showed that alternating treatments designs had the highest proportion of studies including randomization. This result was to be expected, given that alternating treatments designs are particularly suited to incorporating randomization. In fact, when Barlow and Hayes (1979) first introduced the alternating treatments design, they emphasized randomization as an important part of the design: "Among other considerations, each design controls for sequential confounding by randomizing the order of treatment […]" (p. 208). Moreover, alternating treatments designs could draw on already existing randomization procedures, such as the randomized block procedure proposed by Edgington (1967). The different design options for alternating treatments designs (e.g., the randomized block design) and the accompanying randomization procedures are discussed in detail in Manolov and Onghena (2018). For multiple baseline designs, a staggered introduction of the intervention is needed. Proposals to randomize the order of the introduction of the intervention have been around since the 1980s (Marascuilo & Busk, 1988; Wampold & Worsham, 1986). These randomization procedures have their counterparts in group studies, where participants are randomly assigned to treatments or to different blocks of treatments. Other randomization procedures for multiple baseline designs are discussed in Levin et al. (2018). These include the restricted Marascuilo–Busk procedure proposed by Koehler and Levin and the randomization test procedure proposed by Revusky.
For phase designs and changing criterion designs, the incorporation of randomization is less evident. For phase designs, Onghena (1992) proposed a method to randomly determine the moment of phase change between two successive phases. However, this method is rather uncommon and has no counterpart in group studies. Specific randomization schemes for changing criterion designs have only very recently been proposed (Ferron et al., 2019; Manolov et al., 2020; Onghena et al., 2019), and it remains to be seen how common they will become in applied SCEDs.
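
Two of the design randomization schemes mentioned above can be sketched as follows (our own minimal reading; function and parameter names are invented): a randomized block schedule for alternation designs, and a randomly drawn phase-change moment in the spirit of Onghena (1992), here with a minimum phase length so that both phases remain interpretable.

```python
import random

def randomized_blocks(conditions, n_blocks):
    # Randomized block scheme for alternation designs: within each block,
    # every condition occurs exactly once, in a random order.
    schedule = []
    for _ in range(n_blocks):
        block = list(conditions)
        random.shuffle(block)
        schedule.extend(block)
    return schedule

def random_phase_change(n_measurements, min_phase_len):
    # Phase-design randomization: draw the phase-change moment at random
    # among all points leaving at least `min_phase_len` observations per phase.
    candidates = range(min_phase_len, n_measurements - min_phase_len + 1)
    return random.choice(list(candidates))

random.seed(1)
print(randomized_blocks(["A", "B"], n_blocks=4))   # e.g., ['B', 'A', 'A', 'B', ...]
print(random_phase_change(n_measurements=20, min_phase_len=5))  # a point in 5..15
```

In a randomization test, the same set of admissible assignments that such a scheme draws from later serves as the reference distribution for the test statistic, which is what ties design randomization to statistical-conclusion validity.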

Implications for SCED research

The results of the systematic review have several implications for SCED research regarding methodology and analysis. An important finding of the present study is that the frequency of use of randomization differs greatly between designs. For example, while phase designs were found to be the second most popular design, randomization was used very infrequently in this design type. Multiple baseline designs, the most frequently used design, showed a higher percentage of randomized studies, but only every fifth study used randomization. Given that randomization in the study design increases internal and statistical-conclusion validity irrespective of the design, it seems paramount to further stress the importance of including randomization beyond alternating treatments designs. Another implication concerns the analysis of specific data aspects. While level was by a large margin the most popular data aspect, it is important to stress that conclusions based on only one data aspect may be misleading. This seems particularly relevant for phase designs, which were found to contain the highest proportion of studies not reporting any of the six data aspects and the lowest proportion of studies analyzing several data aspects (apart from changing criterion designs, which accounted for only a very small proportion of the included studies). A final implication concerns the use of analytical methods, in particular the triangulation of different methods. Half of the included studies used visual analysis paired with descriptive statistics. These methods should of course not be discarded, as they generate important information about the data, but they cannot quantify the uncertainty of a possible intervention effect. Therefore, triangulation of visual analysis, descriptive statistics, and inferential statistics should form an important part of future guidelines on SCED analysis.

Reflections on updated WWC guidelines

Updated WWC guidelines were recently published, after the present systematic review had been conducted (What Works Clearinghouse, 2020a, 2020c). Two major changes in the updated guidelines are of direct relevance to the present systematic review: (a) the removal of visual analysis for demonstrating intervention effectiveness and (b) the recommendation of a design-comparable effect size measure (D-CES; Pustejovsky et al., 2014; Shadish et al., 2014) for demonstrating intervention effects. This highlights a clear shift away from visual analysis towards statistical analysis of SCED data, especially compared with the 2010 guidelines. These changes in the guidelines have prompted responses from the public, to which What Works Clearinghouse (2020b) published a statement addressing the concerns. Several concerns relate to the removal of visual analysis. In response to a concern that visual analysis should be reinstated, the panel clearly states that "visual analysis will not be used to characterize study findings" (p. 3). Another point from the public concerned the analysis of studies for which no effect size can be calculated (e.g., due to the unavailability of raw data). Even in these instances, the panel does not recommend visual analysis; rather, "the WWC will extract raw data from those graphs for use in effect size computation" (p. 4). In light of the present findings, these statements are particularly noteworthy. Given that the present review found a strong continued reliance on visual analysis, it remains to be seen if and how the updated WWC guidelines will affect the analyses conducted by applied SCED researchers.

Another update of relevance in the recent guidelines concerns the use of design categories. While the 2010 guidelines were demonstrated with the example of a phase design, the updated guidelines include quality rating criteria for each major design option. Given that the present results indicate a very low prevalence of the changing criterion design in applied studies, the inclusion of this design in the updated guidelines may increase the prominence of the changing criterion design. For changing criterion designs, the updated guidelines recommend that “the reversal or withdrawal (AB) design standards should be applied to changing criterion designs” (What Works Clearinghouse, 2020c , p. 80). With phase designs being the second most popular design choice, this could further facilitate the use of the changing criterion design.

While other guidelines on conduct and analysis (e.g., Tate et al., 2013 ), as well as members of the 2010 What Works Clearinghouse panel (Kratochwill & Levin, 2014 ), have clearly highlighted the added value of randomization in the design, the updated guidelines do not include randomization procedures for SCEDs. Regarding changes between experimental conditions, the updated guidelines state that “the independent variable is systematically manipulated, with the researcher determining when and how the independent variable conditions change” (What Works Clearinghouse, 2020c , p. 82). While the frequency of use of randomization differs considerably between different designs, the present review has shown that overall randomization is not uncommon. The inclusion of randomization in the updated guidelines may therefore have offered guidance to applied researchers wishing to incorporate randomization into their SCEDs, and may have further contributed to the popularity of randomization.

Limitations and future research

One limitation of the current study concerns the databases used: SCEDs published in journals that are not indexed in these databases may not have been included in our sample. A similar limitation concerns the search terms used in the systematic search. In this systematic review, we focused on the common names "single-case" and "single-subject." However, as Shadish and Sullivan (2011) note, SCEDs go by many names. They list several less common alternative terms: intrasubject replication design (Gentile et al., 1972), n-of-1 design (Center et al., 1985–86), intrasubject experimental design (White et al., 1989), one-subject experiment (Edgington, 1980), and individual organism research (Michael, 1974). Even though these terms date back to the 1970s and 1980s, a few authors may still use them to describe their SCED studies, and studies using these terms may not have come up during the systematic search. It should furthermore be noted that, to reduce bias, we followed the authors' original descriptions when coding the design and analysis. We therefore made no judgments regarding the correctness or accuracy of the authors' naming of the designs and analysis techniques.

The systematic review offers several avenues for future research. The first avenue may be to explore in more depth the reasons for the unequal distribution of data aspects. As the systematic review has shown, level is assessed far more often than the other five data aspects. While level is an important data aspect, assessing it in isolation from the other data aspects can lead to erroneous conclusions. Gaining an understanding of the reasons for the prevalence of level, for example through author interviews or questionnaires, may help to improve the quality of data analysis in applied SCEDs.

In a similar vein, a second avenue of future research may explore why randomization is much more prevalent in some designs. Apart from the aforementioned differences in randomization procedures between designs, it may be of interest to gain a better understanding of the reasons applied researchers have for randomizing their SCEDs. As the incorporation of randomization enhances the internal validity of the study design, promoting the inclusion of randomization in designs other than alternation designs will help advance the credibility of SCEDs in the scientific community. Searching the method sections of the articles that used randomization may be a first step towards a better understanding of why applied researchers use randomization. Such a text search may reveal how the authors discuss randomization and which reasons they name for randomizing. A related question is how the randomization was actually carried out: for example, was it carried out a priori, or in a restricted way that took the evolving data pattern into account? A deeper understanding of the reasons for randomizing and the mechanisms of randomization may be gained through author interviews or questionnaires.

A third avenue of future research may explore in detail the inferential analytical methods used to analyze SCED data. Within the scope of the present review, we distinguished only between visual analysis, descriptive statistics, and inferential statistics. Deeper insight into the inferential methods and their application to SCED data may help in understanding the viewpoint of applied researchers, and could be achieved through a literature review of articles that use inferential analysis. Research questions for such a review may include: Which inferential methods do applied SCED researchers use, and how frequently? Are these methods adapted to SCED methodology? How do applied researchers justify their choice of an inferential method? Similar questions may be asked about effect size measures understood as descriptive statistics: why do applied researchers choose a particular effect size measure over a competing one, and are these measures adapted to SCED research?

Finally, future research may examine in greater detail the descriptive statistics used in SCEDs. In the present review, we distinguished between two major categories, descriptive and inferential statistics, and coded effect sizes that were not accompanied by a standard error, confidence limits, or the result of a significance test as descriptive statistics. Effect sizes do, however, go beyond merely summarizing the data: they quantify the treatment effect between experimental conditions, in contrast to within-phase quantifications such as the mean and standard deviation. Future research may therefore examine the use of effect sizes separately from other descriptive statistics, focusing in depth on the exact methods used to quantify each data aspect, either as a within-phase quantification (e.g., mean or range) or as an effect size measure (e.g., standardized mean difference or variance ratio).
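The contrast between within-phase quantifications and between-phase effect sizes can be made concrete with a minimal sketch. The function names and the choice of scaling by the baseline standard deviation are illustrative assumptions; design-comparable standardized mean differences proposed for SCEDs (e.g., Hedges et al., 2012) are defined differently.

```python
from statistics import mean, stdev

def phase_summaries(baseline, treatment):
    """Within-phase quantifications: level (mean) and variability (SD)."""
    return {"A": (mean(baseline), stdev(baseline)),
            "B": (mean(treatment), stdev(treatment))}

def standardized_mean_difference(baseline, treatment):
    """Between-phase effect size: mean shift scaled by the baseline SD."""
    return (mean(treatment) - mean(baseline)) / stdev(baseline)

def variance_ratio(baseline, treatment):
    """Between-phase effect size for variability: B variance over A variance."""
    return stdev(treatment) ** 2 / stdev(baseline) ** 2
```

The within-phase summaries describe each condition on its own, whereas the two effect size functions quantify the change between conditions, which is the distinction the review's coding scheme collapses.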

The What Works Clearinghouse panel ( 2020a , 2020c ) has recently released an updated version of the guidelines. We will discuss the updated guidelines in light of the present findings in the Discussion section.

As holds true for most single-case designs, the same design is often described with different terms. For example, Ledford and Gast (2018) call these designs combination designs, and Moeyaert et al. (2020) call them combined designs. Given that this is a purely terminological question, it is hard to argue for one term over another. We do, however, prefer the term hybrid, because it emphasizes that neither design remains in its pure form: a multiple baseline design with alternating treatments is not just a combination of a multiple baseline design and an alternating treatments design, but rather a hybrid of the two. This term is also found in recent literature (e.g., Pustejovsky & Ferron, 2017; Swan et al., 2020).

For the present systematic review, we strictly followed the data aspects outlined in the 2010 What Works Clearinghouse guidelines. While the assessment of consistency of effects is important, it is not described in those guidelines and was therefore not coded in the present review.

Baek, E. K., Petit-Bois, M., Van den Noortgate, W., Beretvas, S. N., & Ferron, J. M. (2016). Using visual analysis to evaluate and refine multilevel models of single-case studies. The Journal of Special Education, 50 , 18-26. https://doi.org/10.1177/0022466914565367 .


Barlow, D. H., & Hayes, S. C. (1979). Alternating treatments design: One strategy for comparing the effects of two treatments in a single subject. Journal of Applied Behavior Analysis, 12, 199-210. https://doi.org/10.1901/jaba.1979.12-199.


Barlow, D. H., Nock, M. K., & Hersen, M. (2009). Single case experimental designs: Strategies for studying behavior change (3rd ed.). Pearson.

Beretvas, S. N., & Chung, H. (2008). A review of meta-analyses of single-subject experimental designs: Methodological issues and practice. Evidence-Based Communication Assessment and Intervention, 2 , 129-141. https://doi.org/10.1080/17489530802446302 .

Center, B. A., Skiba, R. J., & Casey, A. (1985-86). A methodology for the quantitative synthesis of intra-subject design research. Journal of Special Education, 19, 387-400. https://doi.org/10.1177/002246698501900404.

Edgington, E. S. (1967). Statistical inference from N=1 experiments. The Journal of Psychology, 65 , 195-199. https://doi.org/10.1080/00223980.1967.10544864 .


Edgington, E. S. (1975). Randomization tests for one-subject operant experiments. The Journal of Psychology, 90 , 57-68. https://doi.org/10.1080/00223980.1975.9923926 .

Edgington, E. S. (1980). Random assignment and statistical tests for one-subject experiments. Journal of Educational Statistics, 5 , 235-251.

Ferron, J., Rohrer, L. L., & Levin, J. R. (2019). Randomization procedures for changing criterion designs. Behavior Modification https://doi.org/10.1177/0145445519847627 .

Gentile, J. R., Roden, A. H., & Klein, R. D. (1972). An analysis-of-variance model for the intrasubject replication design. Journal of Applied Behavior Analysis, 5 , 193-198. https://doi.org/10.1901/jaba.1972.5-193 .

Gusenbauer, M., & Haddaway, N. R. (2019). Which academic search systems are suitable for systematic reviews or meta-analyses? Evaluating retrieval qualities of Google Scholar, PubMed, and 26 other resources. Research Synthesis Methods. https://doi.org/10.1002/jrsm.1378.

Hammond, D., & Gast, D. L. (2010). Descriptive analysis of single subject research designs: 1983-2007. Education and Training in Autism and Developmental Disabilities, 45, 187-202.


Harrington, M. A. (2013). Comparing visual and statistical analysis in single-subject studies. Open Access Dissertations. Retrieved from http://digitalcommons.uri.edu/oa_diss.

Hedges, L. V., Pustejovsky, J. E., & Shadish, W. R. (2012). A standardized mean difference effect size for single case designs. Research Synthesis Methods, 3 , 224-239. https://doi.org/10.1002/jrsm.1052 .

Hedges, L. V., Pustejovsky, J. E., & Shadish, W. R. (2013). A standardized mean difference effect size for multiple baseline designs across individuals. Research Synthesis Methods, 4 , 324-341. https://doi.org/10.1002/jrsm.1086 .

Heyvaert, M., & Onghena, P. (2014). Analysis of single-case data: Randomization tests for measures of effect size. Neuropsychological Rehabilitation, 24 , 507-527. https://doi.org/10.1080/09602011.2013.818564 .

Hitchcock, J. H., Horner, R. H., Kratochwill, T. R., Levin, J. R., Odom, S. L., Rindskopf, D. M., & Shadish, W. R. (2014). The What Works Clearinghouse single-case design pilot standards: Who will guard the guards? Remedial and Special Education, 35 , 145-152. https://doi.org/10.1177/0741932513518979 .

Horner, R. H., Carr, E. G., Halle, J., McGee, G., Odom, S., & Wolery, M. (2005). The use of single-subject research to identify evidence-based practice in special education. Exceptional Children, 71 , 165-179. https://doi.org/10.1177/001440290507100203 .

Kazdin, A. E. (1982). Single-case research designs: Methods for clinical and applied settings. Oxford University Press.

Kazdin, A. E. (2011). Single-case research designs: Methods for clinical and applied settings (2nd ed.). Oxford University Press.

Kratochwill, T. R., Hitchcock, J., Horner, R. H., Levin, J. R., Odom, S. L., Rindskopf, D. M., & Shadish, W. R. (2010). Single-case designs technical documentation. Retrieved from What Works Clearinghouse: https://files.eric.ed.gov/fulltext/ED510743.pdf

Kratochwill, T. R., Hitchcock, J., Horner, R. H., Levin, J. R., Odom, S. L., Rindskopf, D. M., & Shadish, W. R. (2013). Single-case intervention research design standards. Remedial and Special Education, 34 , 26-38. https://doi.org/10.1177/0741932512452794 .

Kratochwill, T. R., & Levin, J. R. (2014). Meta- and statistical analysis of single-case intervention research data: Quantitative gifts and a wish list. Journal of School Psychology, 52 , 231-235. https://doi.org/10.1016/j.jsp.2014.01.003 .

Kromrey, J. D., & Foster-Johnson, L. (1996). Determining the efficacy of intervention: The use of effect sizes for data analysis in single-subject research. The Journal of Experimental Education, 65 , 73-93. https://doi.org/10.1080/00220973.1996.9943464 .

Lane, J. D., & Gast, D. L. (2014). Visual analysis in single case experimental design studies: Brief review and guidelines. Neuropsychological Rehabilitation, 24 , 445-463. https://doi.org/10.1080/09602011.2013.815636 .

Ledford, J. R., & Gast, D. L. (Eds.) (2018). Single case research methodology: Applications in special education and behavioral sciences (3rd ed.). Routledge.

Levin, J. R. (1994). Crafting educational intervention research that's both credible and creditable. Educational Psychology Review, 6 , 231-243. https://doi.org/10.1007/BF02213185 .

Levin, J. R., Ferron, J. M., & Gafurov, B. S. (2018). Comparison of randomization-test procedures for single-case multiple-baseline designs. Developmental Neurorehabilitation, 21 , 290-311. https://doi.org/10.1080/17518423.2016.1197708 .

Levin, J. R., Ferron, J. M., & Gafurov, B. S. (2020). Investigation of single-case multiple-baseline randomization tests of trend and variability. Educational Psychology Review . https://doi.org/10.1007/s10648-020-09549-7 .

Ma, H.-H. (2006). Quantitative synthesis of single-subject researches: Percentage of data points exceeding the median. Behavior Modification, 30 , 598-617. https://doi.org/10.1177/0145445504272974 .

Maggin, D. M., Briesch, A. M., & Chafouleas, S. M. (2013). An application of the What Works Clearinghouse standards for evaluating single-subject research: Synthesis of the self-management literature base. Remedial and Special Education, 34 , 44-58. https://doi.org/10.1177/0741932511435176 .

Manolov, R. (2018). Linear trend in single-case visual and quantitative analyses. Behavior Modification, 42 , 684-706. https://doi.org/10.1177/0145445517726301 .

Manolov, R., & Moeyaert, M. (2017). Recommendations for choosing single-case data analytical techniques. Behavior Therapy, 48 , 97-114. https://doi.org/10.1016/j.beth.2016.04.008 .

Manolov, R., & Onghena, P. (2018). Analyzing data from single-case alternating treatments designs. Psychological Methods, 23 , 480-504. https://doi.org/10.1037/met0000133 .

Manolov, R., & Solanas, A. (2018). Analytical options for single-case experimental designs: Review and application to brain impairment. Brain Impairment, 19 , 18-32. https://doi.org/10.1017/BrImp.2017.17 .

Manolov, R., Solanas, A., & Sierra, V. (2020). Changing criterion designs: Integrating methodological and data analysis recommendations. The Journal of Experimental Education, 88, 335-350. https://doi.org/10.1080/00220973.2018.1553838.

Marascuilo, L., & Busk, P. (1988). Combining statistics for multiple-baseline AB and replicated ABAB designs across subjects. Behavioral Assessment, 10 , 1-28.

Michael, J. (1974). Statistical inference for individual organism research: Mixed blessing or curse? Journal of Applied Behavior Analysis, 7 , 647-653. https://doi.org/10.1901/jaba.1974.7-647 .

Michiels, B., Heyvaert, M., Meulders, A., & Onghena, P. (2017). Confidence intervals for single-case effect size measures based on randomization test inversion. Behavior Research Methods, 49 , 363-381. https://doi.org/10.3758/s13428-016-0714-4 .

Moeyaert, M., Akhmedjanova, D., Ferron, J. M., Beretvas, S. N., & Van den Noortgate, W. (2020). Effect size estimation for combined single-case experimental designs. Evidence-Based Communication Assessment and Intervention, 14 , 28-51. https://doi.org/10.1080/17489539.2020.1747146 .

Moeyaert, M., Ferron, J. M., Beretvas, S. N., & Van den Noortgate, W. (2014a). From a single-level analysis to a multilevel analysis of single-case experimental designs. Journal of School Psychology, 52 , 191-211. https://doi.org/10.1016/j.jsp.2013.11.003 .

Moeyaert, M., Ugille, M., Ferron, J. M., Beretvas, S. N., & Van den Noortgate, W. (2014b). Three-level analysis of single-case experimental data: Empirical validation. The Journal of Experimental Education, 82 , 1-21. https://doi.org/10.1080/00220973.2012.745470 .

O’Brien, S., & Repp, A. C. (1990). Reinforcement-based reductive procedures: A review of 20 years of their use with persons with severe or profound retardation. Journal of the Association for Persons with Severe Handicaps, 15 , 148–159. https://doi.org/10.1177/154079699001500307 .

Onghena, P. (1992). Randomization tests for extensions and variations of ABAB single-case experimental designs: A rejoinder. Behavioral Assessment, 14 , 153-172.

Onghena, P., & Edgington, E. S. (1994). Randomization tests for restricted alternating treatment designs. Behaviour Research and Therapy, 32 , 783-786. https://doi.org/10.1016/0005-7967(94)90036-1 .

Onghena, P., & Edgington, E. S. (2005). Customization of pain treatments: Single-case design and analysis. The Clinical Journal of Pain, 21 , 56-68. https://doi.org/10.1097/00002508-200501000-00007 .

Onghena, P., Tanious, R., De, T. K., & Michiels, B. (2019). Randomization tests for changing criterion designs. Behaviour Research and Therapy, 117 , 18-27. https://doi.org/10.1016/j.brat.2019.01.005 .

Ottenbacher, K. J. (1990). When is a picture worth a thousand p values? A comparison of visual and quantitative methods to analyze single subject data. The Journal of Special Education, 23 , 436-449. https://doi.org/10.1177/002246699002300407 .

Parker, R. I., Hagan-Burke, S., & Vannest, K. (2007). Percentage of all non-overlapping data (PAND): An alternative to PND. The Journal of Special Education, 40 , 194-204. https://doi.org/10.1177/00224669070400040101 .

Parker, R. I., Vannest, K. J., & Davis, J. L. (2011). Effect size in single-case research: A review of nine nonoverlap techniques. Behavior Modification, 35, 303-322. https://doi.org/10.1177/0145445511399147.

Pustejovsky, J. E., & Ferron, J. M. (2017). Research synthesis and meta-analysis of single-case designs. In J. M. Kaufmann, D. P. Hallahan, & P. C. Pullen (Eds.), Handbook of Special Education (pp. 168-185). New York: Routledge.


Pustejovsky, J. E., Hedges, L. V., & Shadish, W. R. (2014). Design-comparable effect sizes in multiple baseline designs: A general modeling framework. Journal of Educational and Behavioral Statistics, 39 , 368-393. https://doi.org/10.3102/1076998614547577 .

Scruggs, T. E., Mastropieri, M. A., & Casto, G. (1987). The quantitative synthesis of single-subject research: Methodology and validation. Remedial and Special Education, 8 , 24-33. https://doi.org/10.1177/074193258700800206 .

Shadish, W. R., Hedges, L. V., & Pustejovsky, J. E. (2014). Analysis and meta-analysis of single-case designs with a standardized mean difference statistic: A primer and applications. Journal of School Psychology, 52 , 123–147. https://doi.org/10.1016/j.jsp.2013.11.005 .

Shadish, W. R., Rindskopf, D. M., & Hedges, L. V. (2008). The state of the science in the meta-analysis of single-case experimental designs. Evidence-Based Communication Assessment and Intervention, 2 , 188-196. https://doi.org/10.1080/17489530802581603 .

Shadish, W. R., & Sullivan, K. J. (2011). Characteristics of single-case designs used to assess intervention effects in 2008. Behavior Research Methods, 43 , 971-980. https://doi.org/10.3758/s13428-011-0111-y .

Smith, J. D. (2012). Single-case experimental designs: A systematic review of published research and current standards. Psychological Methods, 17 , 510-550. https://doi.org/10.1037/a0029312 .

Solanas, A., Manolov, R., & Onghena, P. (2010). Estimating slope and level change in N=1 designs. Behavior Modification, 34 , 195-218. https://doi.org/10.1177/0145445510363306 .

Solomon, B. G. (2014). Violations of school-based single-case data: Implications for the selection and interpretation of effect sizes. Behavior Modification, 38 , 477-496. https://doi.org/10.1177/0145445513510931 .

Staples, M., & Niazi, M. (2007). Experiences using systematic review guidelines. The Journal of Systems and Software, 80 , 1425-1437. https://doi.org/10.1016/j.jss.2006.09.046 .

Swan, D. M., Pustejovsky, J. E., & Beretvas, S. N. (2020). The impact of response-guided designs on count outcomes in single-case experimental design baselines. Evidence-Based Communication Assessment and Intervention, 14 , 82-107. https://doi.org/10.1080/17489539.2020.1739048 .

Tanious, R., De, T. K., Michiels, B., Van den Noortgate, W., & Onghena, P. (2019). Consistency in single-case ABAB phase designs: A systematic review. Behavior Modification https://doi.org/10.1177/0145445519853793 .

Tanious, R., De, T. K., Michiels, B., Van den Noortgate, W., & Onghena, P. (2020). Assessing consistency in single-case A-B-A-B phase designs. Behavior Modification, 44 , 518-551. https://doi.org/10.1177/0145445519837726 .

Tate, R. L., Perdices, M., Rosenkoetter, U., McDonald, S., Togher, L., Shadish, W. R., … Vohra, S. (2016b). The Single-Case Reporting guideline In BEhavioural Interventions (SCRIBE) 2016: Explanation and Elaboration. Archives of Scientific Psychology, 4 , 1-9. https://doi.org/10.1037/arc0000026 .

Tate, R. L., Perdices, M., Rosenkoetter, U., Shadish, W. R., Vohra, S., Barlow, D. H., … Wilson, B. (2016a). The Single-Case Reporting guideline In BEhavioural interventions (SCRIBE) 2016 statement. Aphasiology, 30 , 862-876. https://doi.org/10.1080/02687038.2016.1178022 .

Tate, R. L., Perdices, M., Rosenkoetter, U., Wakim, D., Godbee, K., Togher, L., & McDonald, S. (2013). Revision of a method quality rating scale for single-case experimental designs and n-of-1 trials: The 15-item Risk of Bias in N-of-1 Trials (RoBiNT) Scale. Neuropsychological Rehabilitation, 23 , 619-638. https://doi.org/10.1080/09602011.2013.824383 .

Van den Noortgate, W., & Onghena, P. (2003). Hierarchical linear models for the quantitative integration of effect sizes in single-case research. Behavior Research Methods, Instruments, & Computers, 35 , 1-10. https://doi.org/10.3758/bf03195492 .

Van den Noortgate, W., & Onghena, P. (2008). A multilevel meta-analysis of single-subject experimental design studies. Evidence-Based Communication Assessment and Intervention, 2 , 142-151. https://doi.org/10.1080/17489530802505362 .

Vohra, S., Shamseer, L., Sampson, M., Bukutu, C., Schmid, C. H., Tate, R., … the CENT Group (2016). CONSORT extension for reporting N-of-1 trials (CENT) 2015 statement. Journal of Clinical Epidemiology, 76, 9-17. https://doi.org/10.1016/j.jclinepi.2015.05.004.

Wampold, B., & Worsham, N. (1986). Randomization tests for multiple-baseline designs. Behavioral Assessment, 8 , 135-143.

What Works Clearinghouse. (2020a). Procedures Handbook (Version 4.1). Retrieved from Institute of Education Sciences: https://ies.ed.gov/ncee/wwc/Docs/referenceresources/WWC-Procedures-Handbook-v4-1-508.pdf

What Works Clearinghouse. (2020b). Responses to comments from the public on updated version 4.1 of the WWC Procedures Handbook and WWC Standards Handbook. Retrieved from Institute of Education Sciences: https://ies.ed.gov/ncee/wwc/Docs/referenceresources/SumResponsePublicComments-v4-1-508.pdf

What Works Clearinghouse. (2020c). Standards Handbook, version 4.1. Retrieved from Institute of Education Sciences: https://ies.ed.gov/ncee/wwc/Docs/referenceresources/WWC-Standards-Handbook-v4-1-508.pdf

White, D. M., Rusch, F. R., Kazdin, A. E., & Hartmann, D. P. (1989). Applications of meta-analysis in individual-subject research. Behavioral Assessment, 11 , 281-296.

Wolery, M. (2013). A commentary: Single-case design technical document of the What Works Clearinghouse. Remedial and Special Education , 39-43. https://doi.org/10.1177/0741932512468038 .

Woo, H., Lu, J., Kuo, P., & Choi, N. (2016). A content analysis of articles focusing on single-case research design: ACA journals between 2003 and 2014. Asia Pacific Journal of Counselling and Psychotherapy, 7 , 118-132. https://doi.org/10.1080/21507686.2016.1199439 .


Author information

Authors and Affiliations

Faculty of Psychology and Educational Sciences, Methodology of Educational Sciences Research Group, KU Leuven, Tiensestraat 102, Box 3762, B-3000, Leuven, Belgium

René Tanious & Patrick Onghena


Corresponding author

Correspondence to René Tanious .



About this article

Tanious, R., Onghena, P. A systematic review of applied single-case research published between 2016 and 2018: Study designs, randomization, data aspects, and data analysis. Behav Res 53 , 1371–1384 (2021). https://doi.org/10.3758/s13428-020-01502-4


Accepted : 09 October 2020

Published : 26 October 2020

Issue Date : August 2021

DOI : https://doi.org/10.3758/s13428-020-01502-4


  • Single-case experimental designs
  • Visual analysis
  • Statistical analysis
  • Data aspects
  • Systematic review


Single-Case Experimental Designs: A Systematic Review of Published Research and Current Standards

Justin D. Smith

Child and Family Center, University of Oregon

This article systematically reviews the research design and methodological characteristics of single-case experimental design (SCED) research published in peer-reviewed journals between 2000 and 2010. SCEDs provide researchers with a flexible and viable alternative to group designs with large sample sizes. However, methodological challenges have precluded widespread implementation and acceptance of the SCED as a viable complementary methodology to the predominant group design. This article includes a description of the research design, measurement, and analysis domains distinctive to the SCED; a discussion of the results within the framework of contemporary standards and guidelines in the field; and a presentation of updated benchmarks for key characteristics (e.g., baseline sampling, method of analysis), providing researchers and reviewers with a resource for conducting and evaluating SCED research. The results of the systematic review of 409 studies suggest that recently published SCED research is largely in accordance with contemporary criteria for experimental quality. Analytic method emerged as an area of discord. Comparison of the findings of this review with historical estimates of the use of statistical analysis indicates an upward trend, but visual analysis remains the most common analytic method and also garners the most support among those entities providing SCED standards. Although consensus exists along key dimensions of single-case research design and researchers appear to be practicing within these parameters, there remains a need for further evaluation of assessment and sampling techniques and data analytic methods.

The single-case experiment has a storied history in psychology dating back to the field’s founders: Fechner (1889) , Watson (1925) , and Skinner (1938) . It has been used to inform and develop theory, examine interpersonal processes, study the behavior of organisms, establish the effectiveness of psychological interventions, and address a host of other research questions (for a review, see Morgan & Morgan, 2001 ). In recent years the single-case experimental design (SCED) has been represented in the literature more often than in past decades, as is evidenced by recent reviews ( Hammond & Gast, 2010 ; Shadish & Sullivan, 2011 ), but it still languishes behind the more prominent group design in nearly all subfields of psychology. Group designs are often professed to be superior because they minimize, although do not necessarily eliminate, the major internal validity threats to drawing scientifically valid inferences from the results ( Shadish, Cook, & Campbell, 2002 ). SCEDs provide a rigorous, methodologically sound alternative method of evaluation (e.g., Barlow, Nock, & Hersen, 2008 ; Horner et al., 2005 ; Kazdin, 2010 ; Kratochwill & Levin, 2010 ; Shadish et al., 2002 ) but are often overlooked as a true experimental methodology capable of eliciting legitimate inferences (e.g., Barlow et al., 2008 ; Kazdin, 2010 ). Despite a shift in the zeitgeist from single-case experiments to group designs more than a half century ago, recent and rapid methodological advancements suggest that SCEDs are poised for resurgence.

Single case refers to the participant or cluster of participants (e.g., a classroom, hospital, or neighborhood) under investigation. In contrast to an experimental group design in which one group is compared with another, participants in a single-subject experiment provide their own control data for the purpose of comparison, in a within-subject rather than a between-subjects design. SCEDs typically involve a comparison between two experimental time periods, known as phases; this approach typically includes collecting a representative baseline phase to serve as a comparison with subsequent phases. In studies in which the single subject is actually a group (e.g., a classroom or school), there are additional threats to the internal validity of the results, as noted by Kratochwill and Levin (2010), including setting or site effects.

The central goal of the SCED is to determine whether a causal or functional relationship exists between a researcher-manipulated independent variable (IV) and a meaningful change in the dependent variable (DV). SCEDs generally involve repeated, systematic assessment of one or more IVs and DVs over time. The DV is measured repeatedly across and within all conditions or phases of the IV. Experimental control in SCEDs includes replication of the effect either within or between participants ( Horner et al., 2005 ). Randomization is another way in which threats to internal validity can be experimentally controlled. Kratochwill and Levin (2010) recently provided multiple suggestions for adding a randomization component to SCEDs to improve the methodological rigor and internal validity of the findings.

Examination of the effectiveness of interventions is perhaps the area in which SCEDs are most well represented ( Morgan & Morgan, 2001 ). Researchers in behavioral medicine and in clinical, health, educational, school, sport, rehabilitation, and counseling psychology often use SCEDs because they are particularly well suited to examining the processes and outcomes of psychological and behavioral interventions (e.g., Borckardt et al., 2008 ; Kazdin, 2010 ; Robey, Schultz, Crawford, & Sinner, 1999 ). Skepticism about the clinical utility of the randomized controlled trial (e.g., Jacobsen & Christensen, 1996 ; Wachtel, 2010 ; Westen & Bradley, 2005 ; Westen, Novotny, & Thompson-Brenner, 2004 ) has renewed researchers’ interest in SCEDs as a means to assess intervention outcomes (e.g., Borckardt et al., 2008 ; Dattilio, Edwards, & Fishman, 2010 ; Horner et al., 2005 ; Kratochwill, 2007 ; Kratochwill & Levin, 2010 ). Although SCEDs are relatively well represented in the intervention literature, it is by no means their sole home: Examples appear in nearly every subfield of psychology (e.g., Bolger, Davis, & Rafaeli, 2003 ; Piasecki, Hufford, Solham, & Trull, 2007 ; Reis & Gable, 2000 ; Shiffman, Stone, & Hufford, 2008 ; Soliday, Moore, & Lande, 2002 ). Aside from the current preference for group-based research designs, several methodological challenges have repressed the proliferation of the SCED.

Methodological Complexity

SCEDs undeniably present researchers with a complex array of methodological and research design challenges, such as establishing a representative baseline, managing the nonindependence of sequential observations (i.e., autocorrelation, serial dependence), interpreting single-subject effect sizes, analyzing the short data streams seen in many applications, and appropriately addressing the matter of missing observations. In the field of intervention research for example, Hser et al. (2001) noted that studies using SCEDs are “rare” because of the minimum number of observations that are necessary (e.g., 3–5 data points in each phase) and the complexity of available data analysis approaches. Advances in longitudinal person-based trajectory analysis (e.g., Nagin, 1999 ), structural equation modeling techniques (e.g., Lubke & Muthén, 2005 ), time-series forecasting (e.g., autoregressive integrated moving averages; Box & Jenkins, 1970 ), and statistical programs designed specifically for SCEDs (e.g., Simulation Modeling Analysis; Borckardt, 2006 ) have provided researchers with robust means of analysis, but they might not be feasible methods for the average psychological scientist.
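As an illustration of the serial-dependence issue mentioned above, the lag-1 autocorrelation of a short data stream might be estimated as follows. This is a minimal sketch of one common estimator; the function name is ours, and the estimator is known to be substantially biased in the short series typical of SCEDs.

```python
def lag1_autocorrelation(series):
    """Lag-1 autocorrelation: the autocovariance at lag 1 divided by the
    variance, both computed around the overall mean of the series."""
    n = len(series)
    m = sum(series) / n
    numerator = sum((series[t] - m) * (series[t + 1] - m) for t in range(n - 1))
    denominator = sum((x - m) ** 2 for x in series)
    return numerator / denominator
```

Values far from zero signal serial dependence, which is precisely what undermines analyses that assume independent observations.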

Application of the SCED has also expanded. Today, researchers use variants of the SCED to examine complex psychological processes and the relationship between daily and momentary events in peoples’ lives and their psychological correlates. Research in nearly all subfields of psychology has begun to use daily diary and ecological momentary assessment (EMA) methods in the context of the SCED, opening the door to understanding increasingly complex psychological phenomena (see Bolger et al., 2003 ; Shiffman et al., 2008 ). In contrast to the carefully controlled laboratory experiment that dominated research in the first half of the twentieth century (e.g., Skinner, 1938 ; Watson, 1925 ), contemporary proponents advocate application of the SCED in naturalistic studies to increase the ecological validity of empirical findings (e.g., Bloom, Fisher, & Orme, 2003 ; Borckardt et al., 2008 ; Dattilio et al., 2010 ; Jacobsen & Christensen, 1996 ; Kazdin, 2008 ; Morgan & Morgan, 2001 ; Westen & Bradley, 2005 ; Westen et al., 2004 ). Recent advancements and expanded application of SCEDs indicate a need for updated design and reporting standards.

Many current benchmarks in the literature concerning key parameters of the SCED were established well before current advancements and innovations; one example is the suggested minimum number of data points in the baseline phase(s), which remains a disputed area of SCED research (e.g., Center, Skiba, & Casey, 1986; Huitema, 1985; R. R. Jones, Vaught, & Weinrott, 1977; Sharpley, 1987). This article comprises (a) an examination of contemporary SCED methodological and reporting standards; (b) a systematic review of select design, measurement, and statistical characteristics of published SCED research during the past decade; and (c) a broad discussion of the critical aspects of this research to inform methodological improvements and study reporting standards. The reader will garner a fundamental understanding of what constitutes methodological soundness in single-case experimental research according to the established standards in the field, which can be used to guide the design of future studies, improve the presentation of publishable empirical findings, and inform the peer-review process. The discussion begins with the basic characteristics of the SCED, including an introduction to time-series, daily diary, and EMA strategies, and describes how current reporting and design standards apply to each of these areas of single-case research. Interwoven within this presentation are the results of a systematic review of SCED research published between 2000 and 2010 in peer-reviewed outlets and a discussion of the way in which these findings support, or differ from, existing design and reporting standards and published SCED benchmarks.

Review of Current SCED Guidelines and Reporting Standards

In contrast to experimental group comparison studies, which conform to generally well agreed upon methodological design and reporting guidelines, such as the CONSORT ( Moher, Schulz, Altman, & the CONSORT Group, 2001 ) and TREND ( Des Jarlais, Lyles, & Crepaz, 2004 ) statements for randomized and nonrandomized trials, respectively, there is comparatively much less consensus when it comes to the SCED. Until fairly recently, design and reporting guidelines for single-case experiments were almost entirely absent in the literature and were typically determined by the preferences of a research subspecialty or a particular journal’s editorial board. Factions still exist within the larger field of psychology, as can be seen in the collection of standards presented in this article, particularly in regard to data analytic methods of SCEDs, but fortunately there is budding agreement about certain design and measurement characteristics. A number of task forces, professional groups, and independent experts in the field have recently put forth guidelines; each has a relatively distinct purpose, which likely accounts for some of the discrepancies between them. In what is to be a central theme of this article, researchers are ultimately responsible for thoughtfully and synergistically combining research design, measurement, and analysis aspects of a study.

This review presents the more prominent, comprehensive, and recently established SCED standards. Six sources are discussed: (1) Single-Case Design Technical Documentation from the What Works Clearinghouse (WWC; Kratochwill et al., 2010 ); (2) the APA Division 12 Task Force on Psychological Interventions, with contributions from the Division 12 Task Force on Promotion and Dissemination of Psychological Procedures and the APA Task Force for Psychological Intervention Guidelines (DIV12; presented in Chambless & Hollon, 1998 ; Chambless & Ollendick, 2001 ), adopted and expanded by APA Division 53, the Society for Clinical Child and Adolescent Psychology ( Weisz & Hawley, 1998 , 1999 ); (3) the APA Division 16 Task Force on Evidence-Based Interventions in School Psychology (DIV16; Members of the Task Force on Evidence-Based Interventions in School Psychology. Chair: T. R. Kratochwill, 2003); (4) the National Reading Panel (NRP; National Institute of Child Health and Human Development, 2000 ); (5) the Single-Case Experimental Design Scale ( Tate et al., 2008 ); and (6) the reporting guidelines for EMA put forth by Stone & Shiffman (2002) . Although the specific purposes of each source differ somewhat, the overall aim is to provide researchers and reviewers with agreed-upon criteria to be used in the conduct and evaluation of SCED research. The standards provided by WWC, DIV12, DIV16, and the NRP represent the efforts of task forces. The Tate et al. scale was selected for inclusion in this review because it represents perhaps the only psychometrically validated tool for assessing the rigor of SCED methodology. Stone and Shiffman’s (2002) standards were intended specifically for EMA methods, but many of their criteria also apply to time-series, daily diary, and other repeated-measurement and sampling methods, making them pertinent to this article. 
The design, measurement, and analysis standards are presented in later sections of this article, and notable concurrences, discrepancies, strengths, and deficiencies are summarized.

Systematic Review Search Procedures and Selection Criteria

Search strategy

A comprehensive search strategy of SCEDs was performed to identify studies published in peer-reviewed journals meeting a priori search and inclusion criteria. First, a computer-based PsycINFO search of articles published between 2000 and 2010 (search conducted in July 2011) was conducted that used the following primary key terms and phrases that appeared anywhere in the article (asterisks denote that any characters/letters can follow the last character of the search term): alternating treatment design, changing criterion design, experimental case*, multiple baseline design, replicated single-case design, simultaneous treatment design, time-series design. The search was limited to studies published in the English language and those appearing in peer-reviewed journals within the specified publication year range. Additional limiters of the type of article were also used in PsycINFO to increase specificity: The search was limited to include methodologies indexed as either quantitative study OR treatment outcome/randomized clinical trial and NOT field study OR interview OR focus group OR literature review OR systematic review OR mathematical model OR qualitative study.
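The Boolean structure of this search can be sketched in code. The term lists below are taken directly from the text, but the string-building syntax is illustrative only and does not reproduce PsycINFO's exact query or field-code syntax.

```python
# Illustrative reconstruction of the Boolean logic of the PsycINFO search
# described above; not the exact PsycINFO interface syntax.
design_terms = [
    "alternating treatment design", "changing criterion design",
    "experimental case*", "multiple baseline design",
    "replicated single-case design", "simultaneous treatment design",
    "time-series design",
]
include_methods = [
    "quantitative study", "treatment outcome/randomized clinical trial",
]
exclude_methods = [
    "field study", "interview", "focus group", "literature review",
    "systematic review", "mathematical model", "qualitative study",
]

query = (
    "(" + " OR ".join(f'"{t}"' for t in design_terms) + ")"
    + " AND (" + " OR ".join(f'"{m}"' for m in include_methods) + ")"
    + " NOT (" + " OR ".join(f'"{m}"' for m in exclude_methods) + ")"
)
print(query)
```

The asterisk in "experimental case*" is the truncation wildcard described in the text; publication-year and peer-review limiters would be applied through the database interface rather than the keyword string.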

Study selection

The author used a three-phase study selection, screening, and coding procedure to identify as many applicable studies as possible. Phase 1 consisted of the initial systematic review conducted using PsycINFO, which resulted in 571 articles. In Phase 2, titles and abstracts were screened: articles appearing to use a SCED were retained (451) for Phase 3, in which the author and a trained research assistant read each full-text article and entered the characteristics of interest into a database. At each phase of the screening process, studies that did not use a SCED, or that either self-identified as or were determined to be quasi-experimental, were dropped. Of the 571 original studies, 82 were determined to be quasi-experimental. The definition of a quasi-experimental design used in the screening procedure conforms to the descriptions provided by Kazdin (2010) and Shadish et al. (2002) regarding the necessary components of an experimental design. For example, reversal designs require a minimum of four phases (e.g., ABAB), and multiple baseline designs must demonstrate replication of the effect across at least three conditions (e.g., subjects, settings, behaviors). Sixteen studies were unavailable in full text in English, and five could not be obtained in full text; these 21 studies were dropped. The remaining 59 articles not retained for review were determined not to be SCED studies meeting the inclusion criteria, despite having been identified in the PsycINFO search by the specified keyword and methodology terms. In total, 409 studies were selected for this review. The sources of the 409 reviewed studies are summarized in Table 1 . A complete bibliography of the 571 studies appearing in the initial search, with the included studies marked, is available online as an Appendix or from the author.
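As a simple arithmetic check, the exclusion counts reported above reconcile with the final sample of 409 included studies:

```python
# Screening counts as reported in the text (Phase 1 search through inclusion).
total_search_hits = 571    # Phase 1 PsycINFO results
quasi_experimental = 82    # dropped as quasi-experimental
no_english_full_text = 16  # full text unavailable in English
unobtainable = 5           # full text could not be obtained
not_sced = 59              # matched keywords but were not SCED studies

included = total_search_hits - (quasi_experimental + no_english_full_text
                                + unobtainable + not_sced)
print(included)  # 409
```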

Journal Sources of Studies Included in the Systematic Review (N = 409)

[Table 1 pairs each contributing journal title with the number of included studies it contributed; the top rows contributed 45, 15, 14, 14, 13, 12, 12, 10, 10, 9, 9, 9, 9, 8, 8, 8, 8, 6, 6, 5, 5, 4, 4, and 4 studies. The journal titles for these rows appear in the original table; journals contributing one to three studies are listed in the note below.]

Note: Each of the following journal titles contributed 1 study unless otherwise noted in parentheses: Augmentative and Alternative Communication; Acta Colombiana de Psicología; Acta Comportamentalia; Adapted Physical Activity Quarterly (2); Addiction Research and Theory; Advances in Speech Language Pathology; American Annals of the Deaf; American Journal of Education; American Journal of Occupational Therapy; American Journal of Speech-Language Pathology; The American Journal on Addictions; American Journal on Mental Retardation; Applied Ergonomics; Applied Psychophysiology and Biofeedback; Australian Journal of Guidance & Counseling; Australian Psychologist; Autism; The Behavior Analyst; The Behavior Analyst Today; Behavior Analysis in Practice (2); Behavior and Social Issues (2); Behaviour Change (2); Behavioural and Cognitive Psychotherapy; Behaviour Research and Therapy (3); Brain and Language (2); Brain Injury (2); Canadian Journal of Occupational Therapy (2); Canadian Journal of School Psychology; Career Development for Exceptional Individuals; Chinese Mental Health Journal; Clinical Linguistics and Phonetics; Clinical Psychology & Psychotherapy; Cognitive and Behavioral Practice; Cognitive Computation; Cognitive Therapy and Research; Communication Disorders Quarterly; Developmental Medicine & Child Neurology (2); Developmental Neurorehabilitation (2); Disability and Rehabilitation: An International, Multidisciplinary Journal (3); Disability and Rehabilitation: Assistive Technology; Down Syndrome: Research & Practice; Drug and Alcohol Dependence (2); Early Childhood Education Journal (2); Early Childhood Services: An Interdisciplinary Journal of Effectiveness; Educational Psychology (2); Education and Training in Autism and Developmental Disabilities; Electronic Journal of Research in Educational Psychology; Environment and Behavior (2); European Eating Disorders Review; European Journal of Sport Science; European Review of Applied Psychology; Exceptional 
Children; Exceptionality; Experimental and Clinical Psychopharmacology; Family & Community Health: The Journal of Health Promotion & Maintenance; Headache: The Journal of Head and Face Pain; International Journal of Behavioral Consultation and Therapy (2); International Journal of Disability; Development and Education (2); International Journal of Drug Policy; International Journal of Psychology; International Journal of Speech-Language Pathology; International Psychogeriatrics; Japanese Journal of Behavior Analysis (3); Japanese Journal of Special Education; Journal of Applied Research in Intellectual Disabilities (2); Journal of Applied Sport Psychology (3); Journal of Attention Disorders (2); Journal of Behavior Therapy and Experimental Psychiatry; Journal of Child Psychology and Psychiatry; Journal of Clinical Psychology in Medical Settings; Journal of Clinical Sport Psychology; Journal of Cognitive Psychotherapy; Journal of Consulting and Clinical Psychology (2); Journal of Deaf Studies and Deaf Education; Journal of Educational & Psychological Consultation (2); Journal of Evidence-Based Practices for Schools (2); Journal of the Experimental Analysis of Behavior (2); Journal of General Internal Medicine; Journal of Intellectual and Developmental Disabilities; Journal of Intellectual Disability Research (2); Journal of Medical Speech-Language Pathology; Journal of Neurology, Neurosurgery & Psychiatry; Journal of Paediatrics and Child Health; Journal of Prevention and Intervention in the Community; Journal of Safety Research; Journal of School Psychology (3); The Journal of Socio-Economics; The Journal of Special Education; Journal of Speech, Language, and Hearing Research (2); Journal of Sport Behavior; Journal of Substance Abuse Treatment; Journal of the International Neuropsychological Society; Journal of Traumatic Stress; The Journals of Gerontology: Series B: Psychological Sciences and Social Sciences; Language, Speech, and Hearing Services in Schools; 
Learning Disabilities Research & Practice (2); Learning Disability Quarterly (2); Music Therapy Perspectives; Neurorehabilitation and Neural Repair; Neuropsychological Rehabilitation (2); Pain; Physical Education and Sport Pedagogy (2); Preventive Medicine: An International Journal Devoted to Practice and Theory; Psychological Assessment; Psychological Medicine: A Journal of Research in Psychiatry and the Allied Sciences; The Psychological Record; Reading and Writing; Remedial and Special Education (3); Research and Practice for Persons with Severe Disabilities (2); Restorative Neurology and Neuroscience; School Psychology International; Seminars in Speech and Language; Sleep and Hypnosis; School Psychology Quarterly; Social Work in Health Care; The Sport Psychologist (3); Therapeutic Recreation Journal (2); The Volta Review; Work: Journal of Prevention, Assessment & Rehabilitation.

Coding criteria amplifications

A comprehensive description of the coding criteria for each category in this review is available from the author by request. The primary coding criteria are described here and in later sections of this article.

  • Research design was classified into one of the types discussed later in the section titled Predominant Single-Case Experimental Designs on the basis of the authors’ stated design type. Secondary research designs were then coded when applicable (i.e., mixed designs). Distinctions between primary and secondary research designs were made based on the authors’ description of their study. For example, if an author described the study as a “multiple baseline design with time-series measurement,” the primary research design would be coded as being multiple baseline, and time-series would be coded as the secondary research design.
  • Observer ratings were coded as present when observational coding procedures were described and/or the results of a test of interobserver agreement were reported.
  • Interrater reliability for observer ratings was coded as present in any case in which percent agreement, alpha, kappa, or another appropriate statistic was reported, regardless of the amount of the total data that were examined for agreement.
  • Daily diary, daily self-report, and EMA codes were given when authors explicitly described these procedures in the text by name. Coders did not infer the use of these measurement strategies.
  • The number of baseline observations was either taken directly from values reported in the text or counted from graphical displays of the data when this was determined to be a reliable approach. When the number of baseline data points could not be reliably determined from the graphical display, or when the number of observations was unreported, ambiguous, or given only as a range (so that no mean could be determined), the “unavailable” code was assigned. Because a number of studies reported only means, the mean number of baseline observations was calculated for each study prior to further descriptive statistical analyses.
  • The coding of the analytic method used in the reviewed studies is discussed later in the section titled Discussion of Review Results and Coding of Analytic Methods .
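Several of these coding criteria turn on interobserver agreement statistics such as percent agreement and kappa. As an illustration only (this is not the review's analysis code), two coders' categorical ratings can be compared as follows:

```python
from collections import Counter

def percent_agreement(ratings_1, ratings_2):
    """Proportion of items on which two coders assign the same code."""
    assert len(ratings_1) == len(ratings_2)
    return sum(a == b for a, b in zip(ratings_1, ratings_2)) / len(ratings_1)

def cohens_kappa(ratings_1, ratings_2):
    """Chance-corrected agreement between two coders (Cohen's kappa)."""
    n = len(ratings_1)
    p_obs = percent_agreement(ratings_1, ratings_2)
    c1, c2 = Counter(ratings_1), Counter(ratings_2)
    # Expected chance agreement from each coder's marginal code frequencies.
    p_exp = sum(c1[k] * c2[k] for k in set(ratings_1) | set(ratings_2)) / n**2
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical ratings of eight observation intervals by two coders.
coder_a = ["on", "on", "off", "on", "off", "on", "on", "off"]
coder_b = ["on", "on", "off", "off", "off", "on", "on", "off"]
print(percent_agreement(coder_a, coder_b))  # 0.875
print(round(cohens_kappa(coder_a, coder_b), 3))  # 0.75
```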

Results of the Systematic Review

Descriptive statistics of the design, measurement, and analysis characteristics of the reviewed studies are presented in Table 2 . The results and their implications are discussed in the relevant sections throughout the remainder of the article.

Descriptive Statistics of Reviewed SCED Characteristics

Research design | n | Subjects: M (SD), range | Observer ratings: % (IRR %) | Diary/EMA: % | Baseline observations: M (SD), range | Method of analysis: % visual / statistical / both / not reported

 • Alternating condition | 26 | 4.77 (3.34), 1–17 | 84.6 (95.5) | 3.8 | 8.44 (9.50), 2–39 | 23.1 / 7.7 / 19.2 / 46.2
 • Changing/shifting criterion | 18 | 1.94 (1.06), 1–4 | 77.8 (85.7) | 0.0 | 5.29 (2.93), 2–10 | 27.8 / — / — / —
 • Multiple baseline/combined series | 283 | 7.29 (18.08), 1–200 | 75.6 (98.1) | 7.1 | 10.40 (8.84), 2–89 | 21.6 / 13.4 / 6.4 / 55.8
 • Reversal | 70 | 6.64 (10.64), 1–75 | 78.6 (100.0) | 4.3 | 11.69 (13.78), 1–72 | 17.1 / 12.9 / 5.7 / 62.9
 • Simultaneous condition | 2 | 8.00 | 50.0 (100.0) | 0.0 | 2.00 | 50.0 / 50.0 / 0.0 / 0.0
 • Time-series | 10 | 26.78 (35.43), 2–114 | 50.0 (40.0) | 10.0 | 6.21 (2.59), 3–10 | 0.0 / 70.0 / 30.0 / 0.0
 Mixed designs
 • Multiple baseline with reversal | 12 | 6.89 (8.24), 1–32 | 92.9 (100.0) | 7.1 | 13.01 (9.59), 3–33 | 14.3 / 21.4 / 0.0 / 64.3
 • Multiple baseline with changing criterion | 6 | 3.17 (1.33), 1–5 | 83.3 (80.0) | 16.7 | 11.00 (9.61), 5–30 | — / — / — / —
 • Multiple baseline with time-series | 6 | 5.00 (1.79), 3–8 | 16.7 (100.0) | 50.0 | 17.30 (15.68), 4–42 | 0.0 / 66.7 / 16.7 / 16.7
 Total of reviewed studies | 409 | 6.63 (14.61), 1–200 | 76.0 (97.1) | 6.1 | 10.22 (9.59), 1–89 | 20.8 / 13.9 / 7.3 / 52.3

(Dashes indicate values not reported.)

Note. % refers to the proportion of reviewed studies that satisfied criteria for this code: For example, the percent of studies reporting observer ratings.
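As a consistency check, the per-design counts and percentages reported in Table 2 reproduce the weighted totals in the bottom row. The sketch below uses the six primary-design rows (the mixed-design rows are subcategories), which sum to the 409 reviewed studies:

```python
# Study counts and percentages for the six primary designs in Table 2.
counts = {"alternating": 26, "changing criterion": 18,
          "multiple baseline": 283, "reversal": 70,
          "simultaneous": 2, "time-series": 10}
observer_pct = {"alternating": 84.6, "changing criterion": 77.8,
                "multiple baseline": 75.6, "reversal": 78.6,
                "simultaneous": 50.0, "time-series": 50.0}
diary_pct = {"alternating": 3.8, "changing criterion": 0.0,
             "multiple baseline": 7.1, "reversal": 4.3,
             "simultaneous": 0.0, "time-series": 10.0}

n_total = sum(counts.values())
# Count-weighted averages of the per-design percentages.
obs_total = sum(counts[d] * observer_pct[d] for d in counts) / n_total
diary_total = sum(counts[d] * diary_pct[d] for d in counts) / n_total

print(n_total, round(obs_total, 1), round(diary_total, 1))  # 409 76.0 6.1
```

The weighted observer-rating (76.0%) and diary/EMA (6.1%) figures match the totals row and the percentages cited later in the text.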

Discussion of the Systematic Review Results in Context

The SCED is a flexible methodology with many variants; those mentioned here are the building blocks from which other designs are derived. For readers interested in the nuances of each design, Barlow et al. (2008); Franklin, Allison, and Gorman (1997); Kazdin (2010); and Kratochwill and Levin (1992), among others, provide cogent, in-depth discussions. Selecting the appropriate SCED depends upon many factors, including the specifics of the IV, the setting in which the study will be conducted, participant characteristics, the desired or hypothesized outcomes, and the research question(s). These same factors determine the researcher's selection of measurement and analysis techniques.

Predominant Single-Case Experimental Designs

Alternating/simultaneous designs (6%; percentages denote the primary design of the reviewed studies)

Alternating and simultaneous designs involve an iterative manipulation of the IV(s) across different phases to show that changes in the DV vary systematically as a function of manipulating the IV(s). In these multielement designs, the researcher has the option to alternate the introduction of two or more IVs or present two or more IVs at the same time. In the alternating variation, the researcher is able to determine the relative impact of two different IVs on the DV, when all other conditions are held constant. Another variation of this design is to alternate IVs across various conditions that could be related to the DV (e.g., class period, interventionist). Similarly, the simultaneous design would occur when the IVs were presented at the same time within the same phase of the study.

Changing criterion design (4%)

Changing criterion designs are used to demonstrate a gradual change in the DV over the course of the phase involving the active manipulation of the IV. The criterion for demonstrating change shifts in a stepwise manner as the participant responds to the presence of the manipulated IV. The changing criterion design is particularly useful in applied intervention research for a number of reasons. The IV is continuous and never withdrawn, unlike the strategy used in a reversal design. This is particularly important in situations where removal of a psychological intervention would be detrimental or dangerous to the participant, or would be otherwise unfeasible or unethical. The multiple baseline design also does not withdraw intervention, but it requires replicating the effects of the intervention across participants, settings, or situations. A changing criterion design can be accomplished with one participant in one setting without withholding or withdrawing treatment.

Multiple baseline/combined series design (69%)

The multiple baseline or combined series design can be used to test within-subject change across conditions and often involves multiple participants in a replication context. The multiple baseline design is quite simple in many ways, essentially consisting of a number of repeated, miniature AB experiments or variations thereof. Introduction of the IV is staggered temporally across multiple participants or across multiple within-subject conditions, which allows the researcher to demonstrate that changes in the DV reliably occur only when the IV is introduced, thus controlling for the effects of extraneous factors. Multiple baseline designs can be used both within and across units (i.e., persons or groups of persons). When the baseline phase of each subject begins simultaneously, it is called a concurrent multiple baseline design. In a nonconcurrent variation, baseline periods across subjects begin at different points in time. The multiple baseline design is useful in many settings in which withdrawal of the IV would not be appropriate or when introduction of the IV is hypothesized to result in permanent change that would not reverse when the IV is withdrawn. The major drawback of this design is that the IV must be initially withheld for a period of time to ensure different starting points across the different units in the baseline phase. Depending upon the nature of the research questions, withholding an IV, such as a treatment, could be potentially detrimental to participants.
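To illustrate the staggered logic of a concurrent multiple baseline design, the sketch below simulates a DV for three hypothetical participants whose intervention begins at different time points; all parameter values are arbitrary and chosen only for illustration:

```python
import random

def simulate_multiple_baseline(starts, n_obs=20, base_level=5.0,
                               effect=-2.5, noise_sd=0.8, seed=1):
    """Simulate a DV series for several participants in a concurrent
    multiple baseline design: the IV is introduced at a different
    (staggered) observation for each participant, shifting the DV by
    `effect` from that point onward."""
    rng = random.Random(seed)
    series = {}
    for pid, start in enumerate(starts, 1):
        series[pid] = [base_level + (effect if t >= start else 0.0)
                       + rng.gauss(0, noise_sd)
                       for t in range(n_obs)]
    return series

# Intervention introduced at observations 5, 9, and 13, respectively.
data = simulate_multiple_baseline(starts=[5, 9, 13])
```

Because the shift in the DV occurs only after each participant's own staggered start point, the pattern cannot plausibly be attributed to an extraneous event affecting all participants at once, which is the design's core logic.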

Reversal designs (17%)

Reversal designs, also known as introduction and withdrawal designs, are denoted as ABAB designs in their simplest form. As the name suggests, the reversal design involves collecting a baseline measure of the DV (the first A phase), introducing the IV (the first B phase), removing the IV while continuing to assess the DV (the second A phase), and then reintroducing the IV (the second B phase). This pattern can be repeated as many times as is necessary to demonstrate an effect or otherwise address the research question. Reversal designs are useful when the manipulation is hypothesized to result in changes in the DV that are expected to reverse or discontinue when the manipulation is not present. An effect is demonstrated in a reversal design when improvement occurs during the first manipulation phase relative to the first baseline phase, the DV reverts to or approaches original baseline levels during the second baseline phase when the manipulation has been withdrawn, and improvement resumes when the manipulation is then reinstated. This pattern of reversal, when the manipulation is introduced and then withdrawn, is essential to attributing changes in the DV to the IV. Maintenance of effects is nonetheless not incompatible with a reversal design, even though the DV is hypothesized to reverse when the IV is withdrawn ( Kazdin, 2010 ): maintenance is demonstrated by repeating introduction–withdrawal segments until improvement in the DV persists even when the IV is withdrawn. Demonstrating maintenance is not necessary, possible, or desirable in all applications, but it is paramount in learning and intervention research contexts.

Mixed designs (10%)

Mixed designs include a combination of more than one SCED (e.g., a reversal design embedded within a multiple baseline) or an SCED embedded within a group design (e.g., a randomized controlled trial comparing two groups of multiple baseline experiments). Mixed designs afford the researcher even greater flexibility in designing a study to address complex psychological hypotheses while capitalizing on the strengths of the various designs. See Kazdin (2010) for a discussion of the variations and utility of mixed designs.

Related Nonexperimental Designs

Quasi-experimental designs

In contrast to the designs previously described, all of which constitute “true experiments” ( Kazdin, 2010 ; Shadish et al., 2002 ), in quasi-experimental designs the conditions of a true experiment (e.g., active manipulation of the IV, replication of the effect) are approximated and are not readily under the control of the researcher. Because the focus of this article is on experimental designs, quasi-experiments are not discussed in detail; instead the reader is referred to Kazdin (2010) and Shadish et al. (2002) .

Ecological and naturalistic single-case designs

For a single-case design to be experimental, there must be active manipulation of the IV, but in some applications, such as those that might be used in social and personality psychology, the researcher might be interested in measuring naturally occurring phenomena and examining their temporal relationships. Thus, the researcher will not use a manipulation. An example of this type of research might be a study about the temporal relationship between alcohol consumption and depressed mood, which can be measured reliably using EMA methods. Psychotherapy process researchers also use this type of design to assess dyadic relationship dynamics between therapists and clients (e.g., Tschacher & Ramseyer, 2009 ).

Research Design Standards

Each of the reviewed standards provides some degree of direction regarding acceptable research designs. The WWC provides the most detailed and specific requirements regarding design characteristics. The guidelines presented in Tables 3, 4, and 5 are consistent with the methodological rigor necessary to meet the WWC distinction “meets standards.” The WWC also provides less stringent standards for a “meets standards with reservations” distinction; when minimum criteria in the design, measurement, or analysis sections of a study are not met, it is rated “does not meet standards” ( Kratochwill et al., 2010 ). Many SCEDs are acceptable within the standards of DIV12, DIV16, the NRP, and the Tate et al. SCED scale. DIV12 specifies that replication occur across a minimum of three successive cases, which differs from the WWC specifications, which allow for three replications within a single-subject design that need not be across multiple subjects. DIV16 does not require, but seems to prefer, a multiple baseline design with a between-subject replication. Tate et al. (2008, p. 400) state that the “design allows for the examination of cause and effect relationships to demonstrate efficacy.” Determining whether a design meets this requirement is left to the evaluator, who might then refer to one of the other standards or another source for direction.

Research Design Standards and Guidelines

Sources: What Works Clearinghouse (WWC); APA Division 12 Task Force on Psychological Interventions (DIV12); APA Division 16 Task Force on Evidence-Based Interventions in School Psychology (DIV16); National Reading Panel (NRP); the Single-Case Experimental Design Scale (Tate et al., 2008); Ecological Momentary Assessment reporting guidelines (Stone & Shiffman, 2002). Sources not listed under a criterion offer no guideline for it (N/A).

1. Experimental manipulation (independent variable; IV)
 • WWC: The independent variable (i.e., the intervention) must be systematically manipulated as determined by the researcher.
 • DIV12: A well-defined and replicable intervention for a specific disorder, problem behavior, or condition is needed.
 • DIV16: Specified intervention according to the classification system.
 • NRP: Specified intervention.
 • Tate et al.: The scale was designed to assess the quality of interventions; thus, an intervention is required.
 • EMA: Manipulation in EMA is concerned with the sampling procedure of the study (see the Measurement and Assessment standards for more information).

2. Research designs
 General guidelines:
 • WWC: At least 3 attempts to demonstrate an effect at 3 different points in time or with 3 different phase repetitions.
 • DIV12: Many research designs are acceptable beyond those mentioned.
 • DIV16: The stage of the intervention program must be specified.
 • Tate et al.: The design allows for the examination of cause and effect to demonstrate efficacy.
 • EMA: EMA is almost entirely concerned with measurement of the variables of interest; thus, the design of the study is determined solely by the research question(s).
 Reversal (e.g., ABAB):
 • WWC: Minimum of 4 A and B phases.
 • DIV12: Mentioned as acceptable (see the Analysis standards for specific guidelines).
 • DIV16: Mentioned as acceptable.
 • Tate et al.: Mentioned as acceptable.
 Multiple baseline/combined series:
 • WWC: At least 3 baseline conditions.
 • DIV12: At least 3 different, successive subjects.
 • DIV16: Both within and between subjects; considered the strongest because replication occurs across individuals.
 • NRP: Single-subject or aggregated subjects.
 • Tate et al.: Mentioned as acceptable.
 Alternating treatment:
 • WWC: At least 3 alternating treatments compared with a baseline condition, or two alternating treatments compared with each other.
 • DIV16: Mentioned as acceptable.
 • Tate et al.: Mentioned as acceptable.
 Simultaneous treatment:
 • WWC: Same as for alternating treatment designs.
 • DIV16: Mentioned as acceptable.
 • Tate et al.: Mentioned as acceptable.
 Changing/shifting criterion:
 • WWC: At least 3 different criteria.
 Mixed designs:
 • DIV16: Mentioned as acceptable.
 Quasi-experimental:
 • NRP: Mentioned as acceptable.

3. Baseline (see also the Measurement and Assessment standards)
 • WWC: Minimum of 3 data points.
 • DIV12: Minimum of 3 data points.
 • DIV16: Minimum of 3 data points, although more observations are preferred.
 • NRP: No minimum specified.
 • Tate et al.: No minimum (“sufficient sampling of behavior occurred pretreatment”).

4. Randomization specifications provided
 • DIV16: Yes.
 • NRP: Yes.

Measurement and Assessment Standards and Guidelines

Sources as in the Research Design Standards table (WWC; DIV12; DIV16; NRP; Tate et al., 2008; EMA, Stone & Shiffman, 2002); sources not listed under a criterion offer no guideline for it (N/A).

1. Dependent variable (DV)
 Selection of DV:
 • DIV12: At least 3 clinically important behaviors that are relatively independent.
 • DIV16: Outcome measures that produce reliable scores (validity of the measure reported).
 • NRP: Standardized or investigator-constructed outcome measures (report reliability).
 • Tate et al.: Measure behaviors that are the target of the intervention.
 • EMA: Determined by the research question(s).
 Assessor(s)/reporter(s):
 • WWC: More than one (self-report not acceptable).
 • DIV16: Multisource (not always applicable).
 • Tate et al.: Independent (implied minimum of 2).
 • EMA: Determined by the research question(s).
 Interrater reliability:
 • WWC: On at least 20% of the data in each phase and in each condition; must meet minimal established thresholds.
 • Tate et al.: Interrater reliability is reported.
 Method(s) of measurement/assessment:
 • DIV16: Multimethod (e.g., at least 2 assessment methods to evaluate primary outcomes; not always applicable).
 • NRP: Quantitative or qualitative measure.
 • EMA: Description of prompting, recording, participant-initiated entries, and the data acquisition interface (e.g., diary).
 Interval of assessment:
 • WWC: Must be measured repeatedly over time (no minimum specified) within and across different conditions and levels of the IV.
 • NRP: List time points when dependent measures were assessed.
 • Tate et al.: Sampling of the targeted behavior (i.e., DV) occurs during the treatment period.
 • EMA: Density and schedule are reported and consistent with addressing the research question(s); define “immediate and timely response.”
 Other guidelines:
 • Tate et al.: Raw data record provided (representing the variability of the target behavior).

2. Baseline measurement (see also the Research Design Standards table)
 • WWC: Minimum of 3 data points across multiple phases of a reversal or multiple baseline design; 5 data points in each phase for the highest rating; 1 or 2 data points can be sufficient in alternating treatment designs.
 • DIV12: Minimum of 3 data points (to establish a linear trend).
 • NRP: No minimum specified.
 • Tate et al.: No minimum (“sufficient sampling of behavior [i.e., DV] occurred pretreatment”).

3. Compliance and missing data guidelines
 • EMA: Rationale for compliance decisions, compliance rates reported, and missing data criteria and actions.

Analysis Standards and Guidelines

Sources as in the Research Design Standards table (WWC; DIV12; DIV16; NRP; Tate et al., 2008; EMA, Stone & Shiffman, 2002); sources not listed under a criterion offer no guideline for it (N/A).

1. Visual analysis
 • WWC: 4-step, 6-variable procedure.
 • DIV12: Acceptable (no specific guidelines or procedures offered).
 • Tate et al.: Not acceptable (“use statistical analyses or describe effect sizes,” p. 389).

2. Statistical analysis procedures
 • WWC: Estimating effect sizes: nonparametric and parametric approaches, multilevel modeling, and regression (recommended).
 • DIV12: Preferred when the number of data points warrants statistical procedures (no specific guidelines or procedures offered).
 • DIV16: Rely on the guidelines presented by Wilkinson and the Task Force on Statistical Inference of the APA Board of Scientific Affairs (1999).
 • NRP: Type not specified; report the value of the effect size, the type of summary statistic, and the number of people providing the effect size information.
 • Tate et al.: Specific statistical methods are not specified; only their presence or absence is of interest in completing the scale.

3. Demonstrating an effect
 • For ABAB designs: stable baseline established during the first A phase, improvement during the first B phase, reversal or leveling of improvement during the second A phase, and resumed improvement in the second B phase (no other guidelines offered).

4. Replication
 • Replication occurs across subjects, therapists, or settings.

The Stone and Shiffman (2002) standards for EMA are concerned almost entirely with the reporting of measurement characteristics and less so with research design. One way in which these standards differ from those of other sources is in the active manipulation of the IV. Many research questions in EMA, daily diary, and time-series designs are concerned with naturally occurring phenomena, and a researcher manipulation would run counter to this aim. The EMA standards become important when selecting an appropriate measurement strategy within the SCED. In EMA applications, as is also true in some other time-series and daily diary designs, researcher manipulation occurs as a function of the sampling interval in which DVs of interest are measured according to fixed time schedules (e.g., reporting occurs at the end of each day), random time schedules (e.g., the data collection device prompts the participant to respond at random intervals throughout the day), or on an event-based schedule (e.g., reporting occurs after a specified event takes place).
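The three sampling schedules described above (fixed, random, and event-based) can be sketched as follows; the prompt windows and densities are arbitrary illustrations, not recommendations from the standards:

```python
import random

def fixed_schedule(days, times=("21:00",)):
    """Fixed time-based sampling: the same prompt time(s) every day
    (e.g., an end-of-day diary entry)."""
    return [(day, t) for day in range(1, days + 1) for t in times]

def random_schedule(days, prompts_per_day, start_hour=9, end_hour=21, seed=7):
    """Random time-based sampling: the device prompts the participant at
    random minutes within a daily waking window."""
    rng = random.Random(seed)
    schedule = []
    for day in range(1, days + 1):
        for _ in range(prompts_per_day):
            minute = rng.randrange(start_hour * 60, end_hour * 60)
            schedule.append((day, f"{minute // 60:02d}:{minute % 60:02d}"))
    return schedule

def event_schedule(logged_events):
    """Event-based sampling: a report is requested after each logged event
    (here, events are (day, time) tuples supplied by the participant)."""
    return list(logged_events)

print(fixed_schedule(3))  # [(1, '21:00'), (2, '21:00'), (3, '21:00')]
```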

Measurement

The basic measurement requirement of the SCED is a repeated assessment of the DV across each phase of the design in order to draw valid inferences regarding the effect of the IV on the DV. In other applications, such as those used by personality and social psychology researchers to study various human phenomena ( Bolger et al., 2003 ; Reis & Gable, 2000 ), sampling strategies vary widely depending on the topic area under investigation. Regardless of the research area, SCEDs are most typically concerned with within-person change and processes and involve a time-based strategy, most commonly to assess global daily averages or peak daily levels of the DV. Many sampling strategies, such as time-series, in which reporting occurs at uniform intervals or on event-based, fixed, or variable schedules, are also appropriate measurement methods and are common in psychological research (see Bolger et al., 2003 ).

Repeated-measurement methods permit the natural, even spontaneous, reporting of information ( Reis, 1994 ), which reduces the biases of retrospection by minimizing the amount of time elapsed between an experience and the account of this experience ( Bolger et al., 2003 ). Shiffman et al. (2008) aptly noted that the majority of research in the field of psychology relies heavily on retrospective assessment measures, even though retrospective reports have been found to be susceptible to state-congruent recall (e.g., Bower, 1981 ) and a tendency to report peak levels of the experience instead of giving credence to temporal fluctuations ( Redelmeier & Kahneman, 1996 ; Stone, Broderick, Kaell, Delespaul, & Porter, 2000 ). Furthermore, Shiffman et al. (1997) demonstrated that subjective aggregate accounts were a poor fit to daily reported experiences, which can be attributed to the reduced measurement error, and hence increased validity and reliability, of the daily reports.

The necessity of measuring at least one DV repeatedly means that the selected assessment method, instrument, and/or construct must be sensitive to change over time and capable of reliably and validly capturing change. Horner et al. (2005) discuss the important features of outcome measures selected for use in these types of designs. Kazdin (2010) suggests that measures be dimensional, as dimensional measures can detect effects more readily than categorical or binary measures. Although using an established measure or scale, such as the Outcome Questionnaire System ( M. J. Lambert, Hansen, & Harmon, 2010 ), provides empirically validated items for assessing various outcomes, most validation studies of this type of instrument involve between-subject designs, which provides no guarantee that these measures are reliable and valid for assessing within-person variability. Borsboom, Mellenbergh, and van Heerden (2003) suggest that researchers adapting validated measures should consider whether the items they propose using have a factor structure within subjects similar to that obtained between subjects. This is one of the reasons that SCEDs often use observational assessments from multiple sources and report the interrater reliability of the measure. Self-report measures are acceptable practice in some circles, but additional assessment methods or informants are generally necessary to uphold the highest methodological standards. The results of this review indicate that the majority of studies include observational measurement (76.0%). Within those studies, nearly all (97.1%) reported interrater reliability procedures and results. The results within each design were similar, with the exception of time-series designs, which used observer ratings in only half of the reviewed studies.

Time-series

Time-series designs are defined by repeated measurement of variables of interest over a period of time ( Box & Jenkins, 1970 ). Time-series measurement most often occurs in uniform intervals; however, this is no longer a constraint of time-series designs (see Harvey, 2001 ). Although uniform interval reporting is not necessary in SCED research, repeated measures often occur at uniform intervals, such as once each day or each week, which constitutes a time-series design. The time-series design has been used in various basic science applications ( Scollon, Kim-Pietro, & Diener, 2003 ) across nearly all subspecialties in psychology (e.g., Bolger et al., 2003 ; Piasecki et al., 2007 ; for a review, see Reis & Gable, 2000 ; Soliday et al., 2002 ). The basic time-series model for a two-phase (AB) data stream is presented in Equation 1: y_i = α + S·D_i + ε_i, with ε_i = ρε_(i−1) + e_i. In this model, α is the intercept (the level of the data stream during the baseline phase); S is the change in level between the first and second phases; D_i is a step function equal to 0 at times i = 1, 2, 3…n1 and 1 at times i = n1+1, n1+2, n1+3…n; n1 is the number of observations in the baseline phase; n is the total number of data points in the data stream; i indexes time; and the error term ε_i = ρε_(i−1) + e_i captures the lag-1 autoregressive structure of the stream, where ρ is the autoregressive parameter and e_i the random innovation.
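The two-phase (AB) model just described can be sketched in a few lines of code. This is an illustrative simulation, not a procedure from the reviewed standards; the function name and inputs are invented for the example.

```python
# Sketch of the two-phase (AB) time-series model:
#   y_i = alpha + S * D_i + eps_i,  with eps_i = rho * eps_(i-1) + e_i
# D_i is the step function: 0 during baseline (i <= n1), 1 during treatment.

def simulate_ab_series(alpha, S, rho, n1, n, innovations=None):
    """Generate an AB data stream with a lag-1 autoregressive error term."""
    e = innovations if innovations is not None else [0.0] * n
    series, eps = [], 0.0
    for i in range(1, n + 1):
        D = 0 if i <= n1 else 1      # step function marking the phase
        eps = rho * eps + e[i - 1]   # AR(1) error process
        series.append(alpha + S * D + eps)
    return series

# With no random innovations, the stream sits at alpha during baseline
# and jumps by S when the second phase begins:
print(simulate_ab_series(alpha=10.0, S=5.0, rho=0.8, n1=3, n=6))
# [10.0, 10.0, 10.0, 15.0, 15.0, 15.0]
```

In practice the e_i would be random innovations; passing them through the `innovations` argument keeps the sketch deterministic.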

Time-series formulas become increasingly complex when seasonality and autoregressive processes are modeled in the analytic procedures, but these are rarely of concern for short time-series data streams in SCEDs. For a detailed description of other time-series design and analysis issues, see Borckardt et al. (2008) , Box and Jenkins (1970) , Crosbie (1993) , R. R. Jones et al. (1977) , and Velicer and Fava (2003) .

Time-series and other repeated-measures methodologies also enable examination of temporal effects. Borckardt et al. (2008) and others have noted that time-series designs have the potential to reveal how change occurs, not simply if it occurs. This distinction is what most interested Skinner (1938) , but it often falls outside the purview of today’s researchers in favor of group designs, which Skinner felt obscured the process of change. In intervention and psychopathology research, time-series designs can assess mediators of change ( Doss & Atkins, 2006 ), treatment processes ( Stout, 2007 ; Tschacher & Ramseyer, 2009 ), and the relationship between psychological symptoms (e.g., Alloy, Just, & Panzarella, 1997 ; Hanson & Chen, 2010 ; Oslin, Cary, Slaymaker, Colleran, & Blow, 2009 ), and might be capable of revealing mechanisms of change ( Kazdin, 2007 , 2009 , 2010 ). Between- and within-subject SCED designs with repeated measurements enable researchers to examine similarities and differences in the course of change, both during and as a result of manipulating an IV. Temporal effects have been largely overlooked in many areas of psychological science ( Bolger et al., 2003 ): Examining temporal relationships is sorely needed to further our understanding of the etiology and amplification of numerous psychological phenomena.

Time-series studies were very infrequently found in this literature search (2%). Time-series studies traditionally occur in subfields of psychology in which single-case research is not often used (e.g., personality, physiological/biological). Recent advances in methods for collecting and analyzing time-series data (e.g., Borckardt et al., 2008 ) could expand the use of time-series methodology in the SCED community. One problem with drawing firm conclusions from this particular review finding is a semantic one: Time-series is a specific term reserved for measurement occurring at a uniform interval, but SCED research appears not to have adopted this language when referring to data collected in this fashion. When time-series data analytic methods are not used, the measurement interval is of less importance and might not need to be specified or described as a time-series. An interesting extension of this work would be to examine SCED research that used time-series measurement strategies without labeling them as such, which would reveal how many SCEDs could be analyzed with time-series statistical methods.

Daily diary and ecological momentary assessment methods

EMA and daily diary approaches, also known as experience sampling, are methodological procedures for collecting repeated measurements in time-series and non-time-series experiments. Presenting an in-depth discussion of the nuances of these sampling techniques is well beyond the scope of this paper. The reader is referred to the following review articles: daily diary ( Bolger et al., 2003 ; Reis & Gable, 2000 ; Thiele, Laireiter, & Baumann, 2002 ) and EMA ( Shiffman et al., 2008 ). Experience sampling in psychology has burgeoned in the past two decades as technological advances (e.g., Internet-based reporting, two-way pagers, cellular telephones, handheld computers) have permitted more precise and immediate reporting by participants than paper and pencil methods allow (for reviews, see Barrett & Barrett, 2001 ; Shiffman & Stone, 1998 ). Both methods have practical limitations and advantages. For example, electronic methods are more costly and may exclude certain subjects from participating, either because they do not have access to the necessary technology or because they lack the familiarity or savvy to successfully complete reporting. Electronic data collection methods enable the researcher to prompt responses at random or predetermined intervals and to accurately assess compliance. Paper and pencil methods have been criticized for their inability to reliably track respondents’ compliance: Palermo, Valenzuela, and Stork (2004) found better compliance with electronic diaries than with paper and pencil. On the other hand, Green, Rafaeli, Bolger, Shrout, and Reis (2006) demonstrated the psychometric equivalence of data collected by the two methods, suggesting that either will yield similar statistical results given comparable compliance rates.

Daily diary/daily self-report and EMA measurement were rarely represented in this review, occurring in only 6.1% of the total studies. EMA methods were used in only one of the reviewed studies. The recent proliferation of EMA and daily diary studies in psychology reported by others ( Bolger et al., 2003 ; Piasecki et al., 2007 ; Shiffman et al., 2008 ) suggests that these methods have not yet reached SCED researchers, which may partly reflect the long-held dominance of observational measurement in fields that commonly practice single-case research.

Measurement Standards

As was previously mentioned, measurement in SCEDs requires the reliable assessment of change over time. As illustrated in Table 4 , DIV16 and the NRP explicitly require that reliability of all measures be reported. DIV12 provides little direction in the selection of the measurement instrument, except to require that three or more clinically important behaviors with relative independence be assessed. Similarly, the only item concerned with measurement on the Tate et al. scale specifies assessing behaviors consistent with the target of the intervention. The WWC and the Tate et al. scale require at least two independent assessors of the DV and that interrater reliability meeting minimum established thresholds be reported. Furthermore, WWC requires that interrater reliability be assessed on at least 20% of the data in each phase and in each condition. DIV16 expects that assessment of the outcome measures will be multisource and multimethod, when applicable. The interval of measurement is not specified by any of the reviewed sources. The WWC and the Tate et al. scale require that DVs be measured repeatedly across phases (e.g., baseline and treatment), which is a typical requirement of a SCED. The NRP asks that the time points at which DV measurement occurred be reported.

The baseline measurement represents one of the most crucial design elements of the SCED. Because subjects provide their own data for comparison, gathering a representative, stable sampling of behavior before manipulating the IV is essential to accurately inferring an effect. Some researchers have reported the typical length of the baseline period to range from 3 to 12 observations in intervention research applications (e.g., Center et al., 1986 ; Huitema, 1985 ; R. R. Jones et al., 1977 ; Sharpley, 1987 ); Huitema’s (1985) review of 881 experiments published in the Journal of Applied Behavior Analysis found a modal number of three to four baseline points. Center et al. (1986) suggested five as the minimum number of baseline measurements needed to accurately estimate autocorrelation. Longer baseline periods increase the likelihood of a representative measurement of the DVs, which has been found to increase the validity of the effects and reduce bias resulting from autocorrelation ( Huitema & McKean, 1994 ). The results of this review are largely consistent with those of previous researchers: The mean number of baseline observations was 10.22 ( SD = 9.59), and 6 was the modal number of observations. Baseline data were available in 77.8% of the reviewed studies. Although the baseline assessment has tremendous bearing on the results of a SCED study, it was often difficult to locate the exact number of data points. Similarly, the number of data points assessed across all phases of a study was not easily identified.

The WWC, DIV12, and DIV16 agree that a minimum of three data points during the baseline is necessary. However, to receive the highest rating by the WWC, five data points are necessary in each phase, including the baseline and any subsequent withdrawal baselines as would occur in a reversal design. DIV16 explicitly states that more than three points are preferred and further stipulates that the baseline must demonstrate stability (i.e., limited variability), absence of overlap between the baseline and other phases, absence of a trend, and that the level of the baseline measurement is severe enough to warrant intervention; each of these aspects of the data is important to inferential accuracy. Detrending techniques can be used to address baseline data trend. The integration option in ARIMA-based modeling and the empirical mode decomposition method ( Wu, Huang, Long, & Peng, 2007 ) are two sophisticated detrending techniques. In regression-based analytic methods, detrending can be accomplished by simply regressing each variable in the model on time (i.e., the residuals become the detrended series), which is analogous to adding a linear, exponential, or quadratic term to the regression equation.
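The regression-based detrending just described (regress the series on time and keep the residuals) can be sketched as follows. The function and data are illustrative and assume a simple linear trend.

```python
# Detrend a data stream by ordinary least squares regression on time;
# the residuals form the detrended series.

def detrend_linear(y):
    """Remove a linear time trend from a series; return the residuals."""
    n = len(y)
    t = list(range(n))
    t_bar, y_bar = sum(t) / n, sum(y) / n
    slope = (sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
             / sum((ti - t_bar) ** 2 for ti in t))
    intercept = y_bar - slope * t_bar
    return [yi - (intercept + slope * ti) for ti, yi in zip(t, y)]

# A perfectly linear series detrends to zero everywhere:
print(detrend_linear([2.0, 4.0, 6.0, 8.0, 10.0]))
# [0.0, 0.0, 0.0, 0.0, 0.0]
```

Exponential or quadratic trends would be handled analogously by regressing on the corresponding function of time.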

NRP does not provide a minimum for data points, nor does the Tate et al. scale, which requires only a sufficient sampling of baseline behavior. Although the mean and modal number of baseline observations is well within these parameters, seven (1.7%) studies reported mean baselines of less than three data points.

Establishing a uniform minimum number of required baseline observations would give researchers and reviewers only a starting guide. The baseline phase is important in SCED research because it establishes a trend that can be compared with that of subsequent phases. Although a minimum number of observations might be required to meet standards, many more might be necessary to establish a stable baseline when the data are variable or trend in the direction of the expected effect. The selected data analytic approach also has some bearing on the number of necessary baseline observations. This is discussed further in the Analysis section.

Reporting of repeated measurements

Stone and Shiffman (2002) provide a comprehensive set of guidelines for the reporting of EMA data, which can also be applied to other repeated-measurement strategies. Because the application of EMA is widespread and not confined to specific research designs, Stone and Shiffman intentionally place few restraints on researchers regarding selection of the DV and the reporter, which is determined by the research question under investigation. The methods of measurement, however, are specified in detail: Descriptions of prompting, recording of responses, participant-initiated entries, and the data acquisition interface (e.g., paper and pencil diary, PDA, cellular telephone) ought to be provided with sufficient detail for replication. Because EMA specifically, and time-series/daily diary methods similarly, are primarily concerned with the interval of assessment, Stone and Shiffman suggest reporting the density and schedule of assessment. The approach is generally determined by the nature of the research question and pragmatic considerations, such as access to electronic data collection devices at certain times of the day and participant burden. Compliance and missing data concerns are present in any longitudinal research design, but they are of particular importance in repeated-measurement applications with frequent measurement. When the research question pertains to temporal effects, compliance becomes paramount, and timely, immediate responding is necessary. For this reason, compliance decisions, rates of missing data, and missing data management techniques must be reported. The effect of missing data in time-series data streams has been the topic of recent research in the social sciences (e.g., Smith, Borckardt, & Nash, in press ; Velicer & Colby, 2005a , 2005b ). The results and implications of these and other missing data studies are discussed in the next section.

Analysis of SCED Data

Visual analysis.

Experts in the field generally agree about the majority of critical single-case experiment design and measurement characteristics. Analysis, on the other hand, is an area of significant disagreement, yet it has also received extensive recent attention and advancement. Debate regarding the appropriateness and accuracy of various methods for analyzing SCED data, the interpretation of single-case effect sizes, and other concerns vital to the validity of SCED results has been ongoing for decades, and no clear consensus has been reached. Visual analysis, following systematic procedures such as those provided by Franklin, Gorman, Beasley, and Allison (1997) and Parsonson and Baer (1978) , remains the standard by which SCED data are most commonly analyzed ( Parker, Cryer, & Byrns, 2006 ). Visual analysis can arguably be applied to all SCEDs. However, a number of baseline data characteristics must be met for effects obtained through visual analysis to be valid and reliable. The baseline phase must be relatively stable; free of significant trend, particularly in the hypothesized direction of the effect; have minimal overlap of data with subsequent phases; and have a sufficient sampling of behavior to be considered representative ( Franklin, Gorman, et al., 1997 ; Parsonson & Baer, 1978 ). The effect of baseline trend on visual analysis, and a technique to control baseline trend, are offered by Parker et al. (2006) . Kazdin (2010) suggests using statistical analysis when a trend or significant variability appears in the baseline phase, two conditions that ought to preclude the use of visual analysis techniques. Visual analysis methods are especially adept at determining intervention effects and can be of particular relevance in real-world applications (e.g., Borckardt et al., 2008 ; Kratochwill, Levin, Horner, & Swoboda, 2011 ).

However, visual analysis has its detractors. It has been shown to be inconsistent, can be affected by autocorrelation, and tends to overestimate effects (e.g., Matyas & Greenwood, 1990 ). Estimating effects through visual analysis also precludes the results of SCED research from inclusion in meta-analysis and makes it very difficult to compare results with the effect sizes generated by other statistical methods. Yet visual analysis persists, in large part because SCED researchers are familiar with these methods, are generally unfamiliar with statistical approaches, and lack agreement about the appropriateness of those approaches. Still, top experts in single-case analysis champion the use of statistical methods alongside visual analysis whenever it is appropriate to do so ( Kratochwill et al., 2011 ).

Statistical analysis

Statistical analysis of SCED data consists generally of an attempt to address one or more of three broad research questions: (1) Does introduction/manipulation of the IV result in statistically significant change in the level of the DV (level-change or phase-effect analysis)? (2) Does introduction/manipulation of the IV result in statistically significant change in the slope of the DV over time (slope-change analysis)? and (3) Do meaningful relationships exist between the trajectory of the DV and other potential covariates? Level- and slope-change analyses are relevant to intervention effectiveness studies and other research questions in which the IV is expected to result in changes in the DV in a particular direction. Visual analysis methods are most adept at addressing research questions pertaining to changes in level and slope (Questions 1 and 2), most often using some form of graphical representation and standardized computation of a mean level or trend line within and between each phase of interest (e.g., Horner & Spaulding, 2010 ; Kratochwill et al., 2011 ; Matyas & Greenwood, 1990 ). Research questions in other areas of psychological science might address the relationship between DVs or the slopes of DVs (Question 3). A number of sophisticated modeling approaches (e.g., cross-lag, multilevel, panel, growth mixture, latent class analysis) may be used for this type of question, and some are discussed in greater detail later in this section. However, a discussion about the nuances of this type of analysis and all their possible methods is well beyond the scope of this article.
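As a concrete illustration of a level-change (phase-effect) analysis, the sketch below computes a standardized mean difference between the baseline and treatment phases of an AB design. The choice of standardizer varies across proposals in this literature; this version divides by the baseline-phase standard deviation, and the data are invented for the example.

```python
import statistics

def level_change_smd(baseline, treatment):
    """Standardized mean difference between two phases of an AB design."""
    return ((statistics.mean(treatment) - statistics.mean(baseline))
            / statistics.stdev(baseline))

print(level_change_smd([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 3.0
```

As discussed under Autocorrelation, serial dependence in the data stream can bias the inference drawn from such an effect size.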

The statistical analysis of SCEDs is a contentious issue in the field. Not only is there no agreed-upon statistical method, but the practice of statistical analysis in the context of the SCED is viewed by some as unnecessary (see Shadish, Rindskopf, & Hedges, 2008 ). Historical trends in the use of statistical analysis by SCED researchers are revealing: Busk and Marascuilo (1992) found that only 10% of the published single-case studies they reviewed used statistical analysis; Brossart, Parker, Olson, and Mahadevan (2006) estimated that this figure had roughly doubled by 2006. A range of concerns regarding single-case effect size calculation and interpretation is discussed in significant detail elsewhere (e.g., Campbell, 2004 ; Cohen, 1994 ; Ferron & Sentovich, 2002 ; Ferron & Ware, 1995 ; Kirk, 1996 ; Manolov & Solanas, 2008 ; Olive & Smith, 2005 ; Parker & Brossart, 2003 ; Robey et al., 1999 ; Smith et al., in press ; Velicer & Fava, 2003 ). One concern is the lack of a clearly superior method across datasets. Although statistical methods for analyzing SCEDs abound, few studies have examined their comparative performance with the same dataset. The most recent studies of this kind, performed by Brossart et al. (2006) , Campbell (2004) , Parker and Brossart (2003) , and Parker and Vannest (2009) , found that the more promising available statistical analysis methods yielded moderately different results on the same data series, which led them to conclude that each available method is equipped to adequately address only a relatively narrow spectrum of data. Given these findings, analysts need to select a model appropriate to the research questions and data structure, being mindful of how modeling results can be influenced by extraneous factors.

The current standards unfortunately provide little guidance in the way of statistical analysis options. This article presents an admittedly cursory introduction to available statistical methods; many others are not covered in this review. The following articles provide more in-depth discussion and description of other methods: Barlow et al. (2008) ; Franklin et al., (1997) ; Kazdin (2010) ; and Kratochwill and Levin (1992 , 2010 ). Shadish et al. (2008) summarize more recently developed methods. Similarly, a Special Issue of Evidence-Based Communication Assessment and Intervention (2008, Volume 2) provides articles and discussion of the more promising statistical methods for SCED analysis. An introduction to autocorrelation and its implications for statistical analysis is necessary before specific analytic methods can be discussed. It is also pertinent at this time to discuss the implications of missing data.

Autocorrelation

Many repeated measurements within a single subject or unit create a situation that most psychological researchers are unaccustomed to dealing with: autocorrelated data, which is the nonindependence of sequential observations, also known as serial dependence. Basic and advanced discussions of autocorrelation in single-subject data can be found in Borckardt et al. (2008) , Huitema (1985) , and Marshall (1980) , and discussions of autocorrelation in multilevel models can be found in Snijders and Bosker (1999) and Diggle and Liang (2001) . Along with trend and seasonal variation, autocorrelation is one example of the internal structure of repeated measurements. In the social sciences, autocorrelated data occur most naturally in the fields of physiological psychology, econometrics, and finance, where each phase of interest has potentially hundreds or even thousands of observations that are tightly packed across time (e.g., electroencephalography actuarial data, financial market indices). Applied SCED research in most areas of psychology is more likely to have measurement intervals of day, week, or hour.

Autocorrelation is a direct result of the repeated-measurement requirements of the SCED, but its effect is most noticeable and problematic when one is attempting to analyze these data. Many commonly used data analytic approaches, such as analysis of variance, assume independence of observations and can produce spurious results when the data are nonindependent. Even statistically insignificant autocorrelation estimates are generally viewed as sufficient to cause inferential bias when conventional statistics are used (e.g., Busk & Marascuilo, 1988 ; R. R. Jones et al., 1977 ; Matyas & Greenwood, 1990 ). The effect of autocorrelation on statistical inference in single-case applications has also been known for quite some time (e.g., R. R. Jones et al., 1977 ; Kanfer, 1970 ; Kazdin, 1981 ; Marshall, 1980 ). The findings of recent simulation studies of single-subject data streams indicate that autocorrelation is a nontrivial matter. For example, Manolov and Solanas (2008) determined that calculated effect sizes were linearly related to the autocorrelation of the data stream, and Smith et al. (in press) demonstrated that autocorrelation estimates in the vicinity of 0.80 negatively affect the ability to correctly infer a significant level-change effect using a standardized mean differences method. Huitema and colleagues (e.g., Huitema, 1985 ; Huitema & McKean, 1994 ) argued that autocorrelation is rarely a concern in applied research. Huitema’s methods and conclusions have been questioned and opposing data have been published (e.g., Allison & Gorman, 1993 ; Matyas & Greenwood, 1990 ; Robey et al., 1999 ), resulting in abandonment of the position that autocorrelation can be conscionably ignored without compromising the validity of the statistical procedures. 
Procedures for removing autocorrelation in the data stream prior to calculating effect sizes are offered as one option: One of the more promising analysis methods, autoregressive integrated moving averages (discussed later in this article), was specifically designed to remove the internal structure of time-series data, such as autocorrelation, trend, and seasonality ( Box & Jenkins, 1970 ; Tiao & Box, 1981 ).
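Lag-1 autocorrelation, the quantity at issue throughout this discussion, can be estimated directly from a data stream. The sketch below uses invented series for illustration.

```python
def lag1_autocorrelation(y):
    """Sample lag-1 autocorrelation (serial dependence) of a data stream."""
    n = len(y)
    y_bar = sum(y) / n
    num = sum((y[i] - y_bar) * (y[i + 1] - y_bar) for i in range(n - 1))
    den = sum((yi - y_bar) ** 2 for yi in y)
    return num / den

# A steadily increasing series is positively autocorrelated,
# a strictly alternating one negatively:
print(lag1_autocorrelation([1.0, 2.0, 3.0, 4.0]))               # 0.25
print(lag1_autocorrelation([1.0, -1.0, 1.0, -1.0, 1.0, -1.0]))  # about -0.83
```

Estimates near 0.80, the level Smith et al. (in press) found problematic, signal that conventional statistics assuming independent observations should not be used.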

Missing observations

Another concern inherent in repeated-measures designs is missing data. Daily diary and EMA methods are intended to reduce the risk of retrospection error by eliciting accurate, real-time information ( Bolger et al., 2003 ). However, these methods are subject to missing data as a result of honest forgetfulness, not possessing the diary collection tool at the specified time of collection, and intentional or systematic noncompliance. With paper and pencil diaries and some electronic methods, subjects might be able to complete missed entries retrospectively, defeating the temporal benefits of these assessment strategies ( Bolger et al., 2003 ). Methods of managing noncompliance through the study design and measurement methods include training the subject to use the data collection device appropriately, using technology to prompt responding and track the time of response, and providing incentives to participants for timely compliance (for additional discussion of this topic, see Bolger et al., 2003 ; Shiffman & Stone, 1998 ).

Even when efforts are made to maximize compliance during the conduct of the research, the problem of missing data is often unavoidable. Numerous approaches exist for handling missing observations in group multivariate designs (e.g., Horton & Kleinman, 2007 ; Ibrahim, Chen, Lipsitz, & Herring, 2005 ). Ragunathan (2004) and others concluded that full information and raw data maximum likelihood methods are preferable. Velicer and Colby (2005a , 2005b ) established the superiority of maximum likelihood methods over listwise deletion, mean of adjacent observations, and series mean substitution in the estimation of various critical time-series data parameters. Smith et al. (in press) extended these findings regarding the effect of missing data on inferential precision. They found that managing missing data with the EM procedure ( Dempster, Laird, & Rubin, 1977 ), a maximum likelihood algorithm, did not affect one’s ability to correctly infer a significant effect. However, lag-1 autocorrelation estimates in the vicinity of 0.80 resulted in insufficient power sensitivity (< 0.80), regardless of the proportion of missing data (10%, 20%, 30%, or 40%). 1 Although maximum likelihood methods have garnered some empirical support, methodological strategies that minimize missing data, particularly systematically missing data, are paramount to post-hoc statistical remedies.
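Of the simple strategies compared in the studies above, substituting the mean of adjacent observations is the easiest to illustrate. The sketch handles interior gaps only (a `None` entry flanked by observed values) and is not a recommendation; as noted, maximum likelihood methods outperformed this approach.

```python
def impute_adjacent_mean(y):
    """Replace interior missing observations (None) with the mean of the
    nearest observed neighbors on each side."""
    out = list(y)
    for i, v in enumerate(out):
        if v is None:
            prev = next(x for x in reversed(out[:i]) if x is not None)
            nxt = next(x for x in out[i + 1:] if x is not None)
            out[i] = (prev + nxt) / 2.0
    return out

print(impute_adjacent_mean([4.0, None, 6.0, 8.0]))  # [4.0, 5.0, 6.0, 8.0]
```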

Nonnormal distribution of data

In addition to the autocorrelated nature of SCED data, typical measurement methods also present analytic challenges. Many statistical methods, particularly those involving model finding, assume that the data are normally distributed. This is often not satisfied in SCED research when measurements involve count data, observer-rated behaviors, and other, similar metrics that result in skewed distributions. Techniques are available to manage nonnormal distributions in regression-based analysis, such as zero-inflated Poisson regression ( D. Lambert, 1992 ) and negative binomial regression ( Gardner, Mulvey, & Shaw, 1995 ), but many other statistical analysis methods do not include these sophisticated techniques. A skewed data distribution is perhaps one of the reasons Kazdin (2010) suggests not using count, categorical, or ordinal measurement methods.

Available statistical analysis methods

Following is a basic introduction to the more promising and prevalent analytic methods for SCED research. Because there is little consensus regarding the superiority of any single method, the burden unfortunately falls on the researcher to select a method capable of addressing the research question and handling the data involved in the study. Some indications and contraindications are provided for each method presented here.

Multilevel and structural equation modeling

Multilevel modeling (MLM; e.g., Schmidt, Perels, & Schmitz, 2010 ) techniques represent the state of the art among parametric approaches to SCED analysis, particularly when synthesizing SCED results ( Shadish et al., 2008 ). MLM and related latent growth curve and factor mixture methods in structural equation modeling (SEM; e.g., Lubke & Muthén, 2005 ; B. O. Muthén & Curran, 1997 ) are particularly effective for evaluating trajectories and slopes in longitudinal data and relating changes to potential covariates. MLM and related hierarchical linear models (HLM) can also illuminate the relationship between the trajectories of different variables under investigation and clarify whether or not these relationships differ among the subjects in the study. Time-series and cross-lag analyses can also be used in MLM and SEM ( Chow, Ho, Hamaker, & Dolan, 2010 ; du Toit & Browne, 2007 ). However, they generally require sophisticated model-fitting techniques, making them difficult for many social scientists to implement. The structure (autocorrelation) and trend of the data can also complicate many MLM methods. The short data streams and small numbers of subjects common in SCED research also present problems for MLM and SEM approaches, which were developed for data with substantially more observations per subject and, for model-fitting purposes, substantially more participants. Still, MLM and related techniques arguably represent the most promising analytic methods.

A number of software options 2 exist for SEM. Popular statistical packages in the social sciences provide SEM options, such as PROC CALIS in SAS (SAS Institute Inc., 2008), the AMOS module (Arbuckle, 2006) of SPSS (SPSS Statistics, 2011), and the sem package for R (R Development Core Team, 2005), whose use is described by Fox (2006). A number of stand-alone software options are also available for SEM applications, including Mplus (L. K. Muthén & Muthén, 2010) and Stata (StataCorp., 2011). Each of these programs also provides options for estimating multilevel/hierarchical models (for a review of using these programs for MLM analysis, see Albright & Marinova, 2010). Hierarchical linear and nonlinear modeling can also be accomplished using the HLM 7 program (Raudenbush, Bryk, & Congdon, 2011).

Autoregressive moving averages (ARMA; e.g., Browne & Nesselroade, 2005; Liu & Hudack, 1995; Tiao & Box, 1981)

Two primary concerns have been raised regarding ARMA modeling: the required length of the data stream and the feasibility of the modeling technique. ARMA models generally require 30–50 observations in each phase when analyzing a single-subject experiment (e.g., Borckardt et al., 2008; Box & Jenkins, 1970), a requirement that is often difficult to satisfy in applied psychological research. However, ARMA models in an SEM framework, such as those described by du Toit and Browne (2001), are well suited for longitudinal panel data with few observations and many subjects. Autoregressive SEM models are applicable under similar conditions. Model-fitting options are available in SPSS, R, and SAS (via PROC ARIMA).

ARMA modeling also requires considerable training and rather advanced statistical knowledge (e.g., Kratochwill & Levin, 1992). However, Brossart et al. (2006) point out that ARMA-based approaches can produce excellent results when no "model finding" is performed and a simple lag-1 model, with no differencing and no moving average, is used. This approach can be taken in many SCED applications when phase- or slope-change analyses are of interest for a single subject or very few subjects. As already mentioned, the method is particularly useful when one seeks to account for autocorrelation or other over-time variation that is not directly related to the experimental or intervention effect of interest (i.e., detrending). ARMA and other time-series analysis methods require missing data to be handled before analysis, by means of options such as full information maximum likelihood estimation, multiple imputation, or the Kalman filter (see Box & Jenkins, 1970; Hamilton, 1994; Shumway & Stoffer, 1982), because listwise deletion has been shown to produce inaccurate time-series parameter estimates (Velicer & Colby, 2005a).

Standardized mean differences

Standardized mean difference approaches include Cohen's d, Glass's delta, and Hedges' g, which are commonly used in the analysis of group designs. Their computation for SCEDs is identical to that for group comparisons, except that the results represent within-case rather than between-groups variation, so the obtained effect sizes are not interpretively equivalent. The advantages of the mean difference approach are its computational simplicity and its familiarity to social scientists. Its primary drawback is that these statistics were not developed to contend with autocorrelated data, although Manolov and Solanas (2008) reported that autocorrelation affected effect sizes calculated with standardized mean difference approaches least. For the applied researcher this is likely the most accessible analytic approach, because no statistical software is required to calculate these effect sizes. The resulting effect sizes must nevertheless be interpreted cautiously, because their relation to standard effect size benchmarks, such as those provided by Cohen (1988), is unknown. Standardized mean difference approaches are appropriate only for examining differences between phases; they cannot illuminate trajectories or relationships between variables.
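As an illustration of that computational simplicity, the following sketch computes a within-case Cohen's d with a pooled standard deviation for invented baseline and treatment phase data. As cautioned above, the result ignores autocorrelation and cannot be judged against group-design benchmarks.

```python
# Minimal sketch of a within-case standardized mean difference
# (Cohen's d with pooled SD). The phase data are invented for illustration.
import math

baseline = [4, 5, 3, 6, 4, 5, 4]
treatment = [8, 7, 9, 8, 10, 9, 8]

def cohens_d(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)   # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    pooled = math.sqrt(((len(a) - 1) * va + (len(b) - 1) * vb)
                       / (len(a) + len(b) - 2))
    return (mb - ma) / pooled

d = cohens_d(baseline, treatment)
print(round(d, 2))  # prints 4.1
```

Within-case values of this magnitude are common because session-to-session variability within one person is typically much smaller than variability between people, which is exactly why these effect sizes are not interpretively equivalent to group-design benchmarks.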

Other analytic approaches

Researchers have offered other analytic methods to deal with the characteristics of SCED data, including several developed specifically for N-of-1 experiments. Borckardt's Simulation Modeling Analysis (SMA; 2006) program provides a method for analyzing level and slope change in short (<30 observations per phase; see Borckardt et al., 2008), autocorrelated data streams that is statistically sophisticated yet accessible and freely available to typical psychological scientists and clinicians. A replicated single-case time-series design conducted by Smith, Handler, and Nash (2010) provides an example of SMA application. The Singwin package, described in Bloom et al. (2003), is another easy-to-use parametric approach for analyzing single-case experiments. A number of nonparametric approaches emerging from the visual analysis tradition have also been developed, including percentage of nonoverlapping data (Scruggs, Mastropieri, & Casto, 1987) and nonoverlap of all pairs (Parker & Vannest, 2009); however, these methods have come under scrutiny, and Wolery, Busick, Reichow, and Barton (2010) have suggested abandoning them altogether. Each of these methods appears well suited to specific data characteristics, but none should be used to analyze data streams beyond its intended purpose until additional empirical research is conducted.
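For concreteness, the nonoverlap of all pairs statistic (Parker & Vannest, 2009) can be computed as below: it is the proportion of all baseline-treatment observation pairs in which the treatment value exceeds the baseline value, with ties counted as half. The example data are invented, and, as noted above, such overlap indices have come under scrutiny and should be used with caution.

```python
# Illustrative computation of nonoverlap of all pairs (NAP).
# Data are invented; a NAP near 1.0 indicates near-complete nonoverlap.
baseline = [4, 5, 3, 6, 4]
treatment = [7, 6, 8, 7, 9]

def nap(a, b):
    # Compare every baseline observation with every treatment observation.
    pairs = [(x, y) for x in a for y in b]
    score = sum(1.0 if y > x else 0.5 if y == x else 0.0 for x, y in pairs)
    return score / len(pairs)

print(nap(baseline, treatment))  # prints 0.98
```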

Combining SCED Results

Beyond the issue of single-case analysis is the matter of integrating and meta-analyzing the results of single-case experiments. SCEDs have been given short shrift in the majority of the meta-analytic literature (Littell, Corcoran, & Pillai, 2008; Shadish et al., 2008), with only a few exceptions (Carr et al., 1999; Horner & Spaulding, 2010). Currently, few proven methods exist for integrating the results of multiple single-case experiments. Allison and Gorman (1993) and Shadish et al. (2008) present the problems associated with meta-analyzing single-case effect sizes, and W. P. Jones (2003), Manolov and Solanas (2008), Scruggs and Mastropieri (1998), and Shadish et al. (2008) offer four different potential statistical solutions, none of which appears to have achieved consensus among researchers. The ability to synthesize single-case effect sizes and to compare them with effect sizes garnered through group design research is undoubtedly necessary to increase SCED proliferation.

Discussion of Review Results and Coding of Analytic Methods

The coding criteria for this review were quite stringent in terms of what was considered either visual or statistical analysis. For visual analysis to be coded as present, the authors had to self-identify as having used a visual analysis method. In many cases it could be inferred that visual analysis had been used, but it was often not specified. Similarly, statistical analysis was reserved for analytic methods that produced an effect. 3 Analyses that compared magnitude of change using raw count data or percentages were not considered rigorous enough. These two narrow definitions contributed to the high rate of unreported analytic methods shown in Table 1 (52.3%). A better representation of the use of visual and statistical analysis is likely the percentage of studies among those that reported a method of analysis: under these parameters, 41.5% used visual analysis and 31.3% used statistical analysis, including the studies that used both (11%). These figures are slightly higher than the estimate of Brossart et al. (2006), who suggested that statistical analysis is used in about 20% of SCED studies. Visual analysis undoubtedly remains the most prevalent method, but there appears to be a trend toward increased use of statistical approaches, one likely to gain momentum as innovations continue.

Analysis Standards

The standards selected for inclusion in this review offer minimal direction in the way of analyzing the results of SCED research. Table 5 summarizes the analysis-related information provided by the six reviewed sources of SCED standards. Visual analysis is acceptable to DIV12 and DIV16, along with unspecified statistical approaches. In the WWC standards, visual analysis is the acceptable method of determining an intervention effect, with statistical analyses and randomization tests permissible as complements to or support for the results of visual analysis. However, the authors of the WWC standards state, "As the field reaches greater consensus about appropriate statistical analyses and quantitative effect-size measures, new standards for effect demonstration will need to be developed" (Kratochwill et al., 2010, p. 16). The NRP and DIV12 seem to prefer statistical methods when they are warranted. The Tate et al. scale accepts only statistical analysis accompanied by the reporting of an effect size. Only the WWC and DIV16 provide guidance on statistical analysis procedures: the WWC "recommends" nonparametric and parametric approaches, multilevel modeling, and regression when statistical analysis is used, and DIV16 refers the reader to Wilkinson and the Task Force on Statistical Inference of the APA Board of Scientific Affairs (1999) for direction in this matter. Statistical analysis of daily diary and EMA data is similarly unsettled. Stone and Shiffman (2002) ask for a detailed description of the statistical procedures used, so that the approach can be replicated and evaluated. They provide direction for analyzing aggregated and disaggregated data, and they aptly note that because many different modes of analysis exist, researchers must carefully match the analytic approach to the hypotheses being pursued.

Limitations and Future Directions

This review has a number of limitations that leave the door open for future study of SCED methodology. Publication bias is a concern in any systematic review. This is particularly true for this review because the search was limited to articles published in peer-reviewed journals. This strategy was chosen in order to inform changes in the practice of reporting and of reviewing, but it also is likely to have inflated the findings regarding the methodological rigor of the reviewed works. Inclusion of book chapters, unpublished studies, and dissertations would likely have yielded somewhat different results.

A second concern is the stringent coding criteria regarding analytic methods and the broad categorization into visual and statistical approaches. The selection of an appropriate method for analyzing SCED data is perhaps the murkiest area of this type of research. Future reviews that evaluate the appropriateness of selected analytic strategies and provide specific decision-making guidelines for researchers would be a very useful contribution to the literature. Although six sources of standards applicable to SCED research were reviewed in this article, five of them were developed almost exclusively to inform psychological and behavioral intervention research. The principles of SCED research remain the same in different contexts, but there is a need for non-intervention scientists to weigh in on these standards.

Finally, this article provides a first step in the synthesis of the available SCED reporting guidelines. However, it does not resolve disagreements, nor does it purport to be a definitive source. In the future, an entity with the authority to construct such a document ought to convene and establish a foundational, adaptable, and agreed-upon set of guidelines that cut across subspecialties and are applicable to many, if not all, areas of psychological research; this is admittedly an idealistic goal. Certain preferences will undoubtedly continue to dictate what constitutes acceptable practice in each subspecialty of psychology, but uniformity along critical dimensions will help advance SCED research.

Conclusions

The first decade of the twenty-first century has seen an upwelling of SCED research across nearly all areas of psychology. This article contributes updated benchmarks for the frequency with which SCED design and methodology characteristics are used, including the number of baseline observations, assessment and measurement practices, and data analytic approaches, most of which are largely consistent with previously reported benchmarks. However, this review is much broader than those of previous research teams and also breaks down the characteristics of single-case research by the predominant design. With the recent SCED proliferation came a number of standards for the conduct and reporting of such research. This article therefore also provides a much-needed synthesis of recent SCED standards, one that can inform the work of researchers, reviewers, and funding agencies conducting and evaluating single-case research and that reveals many areas of consensus as well as areas of significant disagreement. The question of where to go next is very relevant at this point. The majority of the research design and measurement characteristics of the SCED are reasonably well established, and the results of this review suggest general practice in accord with existing standards and guidelines, at least in regard to published peer-reviewed works. In general, the published literature appears to be meeting the basic design and measurement requirements needed to ensure adequate internal validity of SCED studies.

Consensus regarding the superiority of any one analytic method stands out as an area of divergence. Given the current literature and this lack of consensus, researchers will need to select carefully a method that matches the research design, hypotheses, and intended conclusions of the study, while also considering the most up-to-date empirical support for the chosen method, whether visual or statistical. In some cases the number of observations and subjects will dictate which analytic methods can and cannot be used. For the true N-of-1 experiment, there are relatively few sound analytic methods, and even fewer that are robust with short data streams (see Borckardt et al., 2008). As the number of observations and subjects increases, sophisticated modeling techniques, such as MLM, SEM, and ARMA, become applicable. Trends in the data and autocorrelation further complicate the development of a clear statistical analysis selection algorithm, which currently does not exist. Autocorrelation was rarely addressed or discussed in the articles reviewed, except when the selected statistical analysis dictated its consideration. Given the empirical evidence regarding the effect of autocorrelation on visual and statistical analysis, researchers need to address it more explicitly. Missing-data considerations are similarly omitted unless they are necessary for analytic purposes. As newly devised statistical approaches mature and are compared with one another for appropriateness in specific SCED applications, guidelines for statistical analysis will necessarily be revised. Similarly, empirically derived guidance, in the form of a decision tree, must be developed to ensure the application of appropriate methods based on the characteristics of the data and the research questions being addressed. Researchers could also benefit from tutorials and comparative reviews of different software packages, a needed area of future research. Powerful and reliable statistical analyses would help move the SCED up the ladder of experimental designs and attenuate the view that the method applies primarily to pilot studies and idiosyncratic research questions and situations.

Another potential future advancement of SCED research comes in the area of measurement. Currently, SCED research gives significant weight to observer ratings and seems to discourage other forms of data collection. This is likely due to the origins of the SCED in behavioral assessment and applied behavior analysis, which remains a present-day stronghold. The dearth of EMA and diary-like sampling procedures in the SCED research reviewed, contrasted with their ever-growing prevalence in the larger psychological research arena, highlights an area for potential expansion. Observational measurement, although reliable and valid in many contexts, is time- and resource-intensive and not feasible in all areas in which psychologists conduct research. Numerous untapped research questions seem to be stifled by this measurement constraint. SCED researchers developing updated standards should include guidelines for the appropriate measurement of non-observer-reported data. For example, the results of this review indicate that the reporting of repeated measurements, particularly the high-density type found in diary and EMA sampling strategies, ought to be more clearly spelled out, with specific attention paid to autocorrelation and trend in the data streams. In the event that SCED researchers adopt self-reported assessment strategies as viable alternatives to observation, a set of standards explicitly identifying the necessary psychometric properties of the measures and specific items used would be in order.

Along similar lines, SCED researchers could take a page from other areas of psychology that champion multimethod and multisource evaluation of primary outcomes. In this way, the long-standing tradition of observational assessment and the cutting-edge technological methods of EMA and daily diary could be married with the goal of strengthening conclusions drawn from SCED research and enhancing the validity of self-reported outcome assessment. The results of this review indicate that they rarely intersect today, and I urge SCED researchers to adopt other methods of assessment informed by time-series, daily diary, and EMA methods. The EMA standards could serve as a jumping-off point for refined measurement and assessment reporting standards in the context of multimethod SCED research.

One limitation of the current SCED standards is their relatively limited scope. With the exception of the Stone and Shiffman EMA reporting guidelines, the other five sources of standards were developed in the context of designing and evaluating intervention research. Although this is likely to remain their primary emphasis, SCEDs are capable of addressing other pertinent research questions in the psychological sciences, and the current standards only roughly approximate the salient crosscutting SCED characteristics. I propose developing broad SCED guidelines that address specific design, measurement, and analysis issues in a manner that allows them to be useful across applications, rather than focusing solely on intervention effects. To accomplish this task, methodology experts across subspecialties in psychology would need to convene. Admittedly, this is no small task.

Perhaps funding agencies will also recognize the fiscal and practical advantages of SCED research in certain areas of psychology. One example is in the field of intervention effectiveness, efficacy, and implementation research. A few exemplary studies using robust forms of SCED methodology are needed in the literature. Case-based methodologies will never supplant the group design as the gold standard in experimental applications, nor should that be the goal. Instead, SCEDs provide a viable and valid alternative experimental methodology that could stimulate new areas of research and answer questions that group designs cannot. With the astonishing number of studies emerging every year that use single-case designs and explore the methodological aspects of the design, we are poised to witness and be a part of an upsurge in the sophisticated application of the SCED. When federal grant-awarding agencies and journal editors begin to use formal standards while making funding and publication decisions, the field will benefit.

Last, for the practice of SCED research to continue and mature, graduate training programs must provide students with instruction in all areas of the SCED. This is particularly true of statistical analysis techniques that are not often taught in departments of psychology and education, where the vast majority of SCED studies seem to be conducted. It is quite the conundrum that the best available statistical analytic methods are often cited as being inaccessible to social science researchers who conduct this type of research. This need not be the case. To move the field forward, emerging scientists must be able to apply the most state-of-the-art research designs, measurement techniques, and analytic methods.

Acknowledgments

Research support for the author was provided by research training grant MH20012 from the National Institute of Mental Health, awarded to Elizabeth A. Stormshak. The author gratefully acknowledges Robert Horner and Laura Lee McIntyre, University of Oregon; Michael Nash, University of Tennessee; John Ferron, University of South Florida; the Action Editor, Lisa Harlow, and the anonymous reviewers for their thoughtful suggestions and guidance in shaping this article; Cheryl Mikkola for her editorial support; and Victoria Mollison for her assistance in the systematic review process.

Appendix. Results of Systematic Review Search and Studies Included in the Review

PsycINFO search conducted July 2011

  • Alternating treatment design
  • Changing criterion design
  • Experimental case*
  • Multiple baseline design
  • Replicated single-case design
  • Simultaneous treatment design
  • Time-series design
  • Quantitative study OR treatment outcome/randomized clinical trial
  • NOT field study OR interview OR focus group OR literature review OR systematic review OR mathematical model OR qualitative study
  • Publication range: 2000–2010
  • Published in peer-reviewed journals
  • Available in the English Language

Bibliography

(* indicates inclusion in study: N = 409)

1 Autocorrelation estimates in this range can be caused by trends in the data streams, which creates complications in terms of detecting level-change effects. The Smith et al. (in press) study used a Monte Carlo simulation to control for trends in the data streams, but trends are likely to exist in real-world data with high lag-1 autocorrelation estimates.

2 The author makes no endorsement regarding the superiority of any statistical program or package over another by their mention or exclusion in this article. The author also has no conflicts of interest in this regard.

3 However, it should be noted that it was often very difficult to locate an actual effect size reported in studies that used statistical analysis. Although this issue would likely have added little to this review, it does inhibit the inclusion of the results in meta-analysis.

  • Albright JJ, Marinova DM. Estimating multilevel models using SPSS, Stata, and SAS. Indiana University; 2010. Retrieved from http://www.iub.edu/%7Estatmath/stat/all/hlm/hlm.pdf
  • Allison DB, Gorman BS. Calculating effect sizes for meta-analysis: The case of the single case. Behaviour Research and Therapy. 1993;31(6):621–631. doi: 10.1016/0005-7967(93)90115-B.
  • Alloy LB, Just N, Panzarella C. Attributional style, daily life events, and hopelessness depression: Subtype validation by prospective variability and specificity of symptoms. Cognitive Therapy and Research. 1997;21:321–344. doi: 10.1023/A:1021878516875.
  • Arbuckle JL. Amos (Version 7.0). Chicago, IL: SPSS, Inc; 2006.
  • Barlow DH, Nock MK, Hersen M. Single case experimental designs: Strategies for studying behavior change. 3rd ed. New York, NY: Allyn and Bacon; 2008.
  • Barrett LF, Barrett DJ. An introduction to computerized experience sampling in psychology. Social Science Computer Review. 2001;19(2):175–185. doi: 10.1177/089443930101900204.
  • Bloom M, Fischer J, Orme JG. Evaluating practice: Guidelines for the accountable professional. 4th ed. Boston, MA: Allyn & Bacon; 2003.
  • Bolger N, Davis A, Rafaeli E. Diary methods: Capturing life as it is lived. Annual Review of Psychology. 2003;54:579–616. doi: 10.1146/annurev.psych.54.101601.145030.
  • Borckardt JJ. Simulation Modeling Analysis: Time series analysis program for short time series data streams (Version 8.3.3). Charleston, SC: Medical University of South Carolina; 2006.
  • Borckardt JJ, Nash MR, Murphy MD, Moore M, Shaw D, O'Neil P. Clinical practice as natural laboratory for psychotherapy research. American Psychologist. 2008;63(2):77–95. doi: 10.1037/0003-066X.63.2.77.
  • Borsboom D, Mellenbergh GJ, van Heerden J. The theoretical status of latent variables. Psychological Review. 2003;110(2):203–219. doi: 10.1037/0033-295X.110.2.203.
  • Bower GH. Mood and memory. American Psychologist. 1981;36(2):129–148. doi: 10.1037/0003-066x.36.2.129.
  • Box GEP, Jenkins GM. Time-series analysis: Forecasting and control. San Francisco, CA: Holden-Day; 1970.
  • Brossart DF, Parker RI, Olson EA, Mahadevan L. The relationship between visual analysis and five statistical analyses in a simple AB single-case research design. Behavior Modification. 2006;30(5):531–563. doi: 10.1177/0145445503261167.
  • Browne MW, Nesselroade JR. Representing psychological processes with dynamic factor models: Some promising uses and extensions of autoregressive moving average time series models. In: Maydeu-Olivares A, McArdle JJ, editors. Contemporary psychometrics: A festschrift for Roderick P. McDonald. Mahwah, NJ: Lawrence Erlbaum Associates; 2005. pp. 415–452.
  • Busk PL, Marascuilo LA. Statistical analysis in single-case research: Issues, procedures, and recommendations, with applications to multiple behaviors. In: Kratochwill TR, Levin JR, editors. Single-case research design and analysis: New directions for psychology and education. Hillsdale, NJ: Lawrence Erlbaum Associates; 1992. pp. 159–185.
  • Busk PL, Marascuilo LA. Autocorrelation in single-subject research: A counterargument to the myth of no autocorrelation. Behavioral Assessment. 1988;10:229–242.
  • Campbell JM. Statistical comparison of four effect sizes for single-subject designs. Behavior Modification. 2004;28(2):234–246. doi: 10.1177/0145445503259264.
  • Carr EG, Horner RH, Turnbull AP, Marquis JG, Magito McLaughlin D, McAtee ML, Doolabh A. Positive behavior support for people with developmental disabilities: A research synthesis. Washington, DC: American Association on Mental Retardation; 1999.
  • Center BA, Skiba RJ, Casey A. A methodology for the quantitative synthesis of intra-subject design research. The Journal of Special Education. 1986;19:387–400. doi: 10.1177/002246698501900404.
  • Chambless DL, Hollon SD. Defining empirically supported therapies. Journal of Consulting and Clinical Psychology. 1998;66(1):7–18. doi: 10.1037/0022-006X.66.1.7.
  • Chambless DL, Ollendick TH. Empirically supported psychological interventions: Controversies and evidence. Annual Review of Psychology. 2001;52:685–716. doi: 10.1146/annurev.psych.52.1.685.
  • Chow S-M, Ho M-HR, Hamaker EL, Dolan CV. Equivalence and differences between structural equation modeling and state-space modeling techniques. Structural Equation Modeling. 2010;17(2):303–332. doi: 10.1080/10705511003661553.
  • Cohen J. Statistical power analysis for the behavioral sciences. 2nd ed. Hillsdale, NJ: Erlbaum; 1988.
  • Cohen J. The earth is round (p < .05). American Psychologist. 1994;49:997–1003. doi: 10.1037/0003-066X.49.12.997.
  • Crosbie J. Interrupted time-series analysis with brief single-subject data. Journal of Consulting and Clinical Psychology. 1993;61(6):966–974. doi: 10.1037/0022-006X.61.6.966.
  • Dattilio FM, Edwards JA, Fishman DB. Case studies within a mixed methods paradigm: Toward a resolution of the alienation between researcher and practitioner in psychotherapy research. Psychotherapy: Theory, Research, Practice, Training. 2010;47(4):427–441. doi: 10.1037/a0021181.
  • Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B. 1977;39(1):1–38.
  • Des Jarlais DC, Lyles C, Crepaz N. Improving the reporting quality of nonrandomized evaluations of behavioral and public health interventions: The TREND statement. American Journal of Public Health. 2004;94(3):361–366. doi: 10.2105/ajph.94.3.361.
  • Diggle P, Liang KY. Analysis of longitudinal data. New York, NY: Oxford University Press; 2001.
  • Doss BD, Atkins DC. Investigating treatment mediators when simple random assignment to a control group is not possible. Clinical Psychology: Science and Practice. 2006;13(4):321–336. doi: 10.1111/j.1468-2850.2006.00045.x.
  • du Toit SHC, Browne MW. The covariance structure of a vector ARMA time series. In: Cudeck R, du Toit SHC, Sörbom D, editors. Structural equation modeling: Present and future. Lincolnwood, IL: Scientific Software International; 2001. pp. 279–314.
  • du Toit SHC, Browne MW. Structural equation modeling of multivariate time series. Multivariate Behavioral Research. 2007;42:67–101. doi: 10.1080/00273170701340953.
  • Fechner GT. Elemente der Psychophysik [Elements of psychophysics]. Leipzig, Germany: Breitkopf & Härtel; 1889.
  • Ferron J, Sentovich C. Statistical power of randomization tests used with multiple-baseline designs. The Journal of Experimental Education. 2002;70:165–178. doi: 10.1080/00220970209599504.
  • Ferron J, Ware W. Analyzing single-case data: The power of randomization tests. The Journal of Experimental Education. 1995;63:167–178.
  • Fox J. Teacher's corner: Structural equation modeling with the sem package in R. Structural Equation Modeling: A Multidisciplinary Journal. 2006;13(3):465–486. doi: 10.1207/s15328007sem1303_7.
  • Franklin RD, Allison DB, Gorman BS, editors. Design and analysis of single-case research. Mahwah, NJ: Lawrence Erlbaum Associates; 1997.
  • Franklin RD, Gorman BS, Beasley TM, Allison DB. Graphical display and visual analysis. In: Franklin RD, Allison DB, Gorman BS, editors. Design and analysis of single-case research. Mahwah, NJ: Lawrence Erlbaum Associates; 1997. pp. 119–158.
  • Gardner W, Mulvey EP, Shaw EC. Regression analyses of counts and rates: Poisson, overdispersed Poisson, and negative binomial models. Psychological Bulletin. 1995;118(3):392–404. doi: 10.1037/0033-2909.118.3.392.
  • Green AS, Rafaeli E, Bolger N, Shrout PE, Reis HT. Paper or plastic? Data equivalence in paper and electronic diaries. Psychological Methods. 2006;11(1):87–105. doi: 10.1037/1082-989X.11.1.87.
  • Hamilton JD. Time series analysis. Princeton, NJ: Princeton University Press; 1994.
  • Hammond D, Gast DL. Descriptive analysis of single-subject research designs: 1983–2007. Education and Training in Autism and Developmental Disabilities. 2010;45:187–202.
  • Hanson MD, Chen E. Daily stress, cortisol, and sleep: The moderating role of childhood psychosocial environments. Health Psychology. 2010;29(4):394–402. doi: 10.1037/a0019879.
  • Harvey AC. Forecasting, structural time series models and the Kalman filter. Cambridge, UK: Cambridge University Press; 2001.
  • Horner RH, Carr EG, Halle J, McGee G, Odom S, Wolery M. The use of single-subject research to identify evidence-based practice in special education. Exceptional Children. 2005;71:165–179.
  • Horner RH, Spaulding S. Single-case research designs. In: Salkind NJ, editor. Encyclopedia of research design. Thousand Oaks, CA: Sage Publications; 2010.
  • Horton NJ, Kleinman KP. Much ado about nothing: A comparison of missing data methods and software to fit incomplete data regression models. The American Statistician. 2007;61(1):79–90. doi: 10.1198/000313007X172556.
  • Hser Y, Shen H, Chou C, Messer SC, Anglin MD. Analytic approaches for assessing long-term treatment effects. Evaluation Review. 2001;25(2):233–262. doi: 10.1177/0193841X0102500206.
  • Huitema BE. Autocorrelation in applied behavior analysis: A myth. Behavioral Assessment. 1985;7(2):107–118.
  • Huitema BE, McKean JW. Reduced bias autocorrelation estimation: Three jackknife methods. Educational and Psychological Measurement. 1994;54(3):654–665. doi: 10.1177/0013164494054003008.
  • Ibrahim JG, Chen M-H, Lipsitz SR, Herring AH. Missing-data methods for generalized linear models: A comparative review. Journal of the American Statistical Association. 2005; 100 (469):332–346. doi: 10.1198/016214504000001844. [ CrossRef ] [ Google Scholar ]
  • Institute of Medicine. Reducing risks for mental disorders: Frontiers for preventive intervention research. Washington, DC: National Academy Press; 1994. [ PubMed ] [ Google Scholar ]
  • Jacobsen NS, Christensen A. Studying the effectiveness of psychotherapy: How well can clinical trials do the job? American Psychologist. 1996; 51 :1031–1039. doi: 10.1037/0003-066X.51.10.1031. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Jones RR, Vaught RS, Weinrott MR. Time-series analysis in operant research. Journal of Behavior Analysis. 1977; 10 (1):151–166. doi: 10.1901/jaba.1977.10-151. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Jones WP. Single-case time series with Bayesian analysis: A practitioner’s guide. Measurement and Evaluation in Counseling and Development. 2003; 36 (28–39) [ Google Scholar ]
  • Kanfer H. Self-monitoring: Methodological limitations and clinical applications. Journal of Consulting and Clinical Psychology. 1970; 35 (2):148–152. doi: 10.1037/h0029874. [ CrossRef ] [ Google Scholar ]
  • Kazdin AE. Drawing valid inferences from case studies. Journal of Consulting and Clinical Psychology. 1981; 49 (2):183–192. doi: 10.1037/0022-006X.49.2.183. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kazdin AE. Mediators and mechanisms of change in psychotherapy research. Annual Review of Clinical Psychology. 2007; 3 :1–27. doi: 10.1146/annurev.clinpsy.3.022806.091432. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kazdin AE. Evidence-based treatment and practice: New opportunities to bridge clinical research and practice, enhance the knowledge base, and improve patient care. American Psychologist. 2008; 63 (3):146–159. doi: 10.1037/0003-066X.63.3.146. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kazdin AE. Understanding how and why psychotherapy leads to change. Psychotherapy Research. 2009; 19 (4):418–428. doi: 10.1080/10503300802448899. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kazdin AE. Single-case research designs: Methods for clinical and applied settings. 2. New York, NY: Oxford University Press; 2010. [ Google Scholar ]
  • Kirk RE. Practical significance: A concept whose time has come. Educational and Psychological Measurement. 1996; 56 :746–759. doi: 10.1177/0013164496056005002. [ CrossRef ] [ Google Scholar ]
  • Kratochwill TR. Preparing psychologists for evidence-based school practice: Lessons learned and challenges ahead. American Psychologist. 2007; 62 :829–843. doi: 10.1037/0003-066X.62.8.829. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kratochwill TR, Hitchcock J, Horner RH, Levin JR, Odom SL, Rindskopf DM, Shadish WR. Single-case designs technical documentation. 2010 Retrieved from What Works Clearinghouse website: http://ies.ed.gov/ncee/wwc/pdf/wwc_scd.pdf . Retrieved from http://ies.ed.gov/ncee/wwc/pdf/wwc_scd.pdf .
  • Kratochwill TR, Levin JR. Single-case research design and analysis: New directions for psychology and education. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc; 1992. [ Google Scholar ]
  • Kratochwill TR, Levin JR. Enhancing the scientific credibility of single-case intervention research: Randomization to the rescue. Psychological Methods. 2010; 15 (2):124–144. doi: 10.1037/a0017736. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kratochwill TR, Levin JR, Horner RH, Swoboda C. Visual analysis of single-case intervention research: Conceptual and methodological considerations (WCER Working Paper No. 2011-6) 2011 Retrieved from University of Wisconsin–Madison, Wisconsin Center for Education Research website: http://www.wcer.wisc.edu/publications/workingPapers/papers.php .
  • Lambert D. Zero-inflated poisson regression, with an application to defects in manufacturing. Technometrics. 1992; 34 (1):1–14. [ Google Scholar ]
  • Lambert MJ, Hansen NB, Harmon SC. Developing and Delivering Practice-Based Evidence. John Wiley & Sons, Ltd; 2010. Outcome Questionnaire System (The OQ System): Development and practical applications in healthcare settings; pp. 139–154. [ Google Scholar ]
  • Littell JH, Corcoran J, Pillai VK. Systematic reviews and meta-analysis. New York: Oxford University Press; 2008. [ Google Scholar ]
  • Liu LM, Hudack GB. The SCA statistical system. Vector ARMA modeling of multiple time series. Oak Brook, IL: Scientific Computing Associates Corporation; 1995. [ Google Scholar ]
  • Lubke GH, Muthén BO. Investigating population heterogeneity with factor mixture models. Psychological Methods. 2005; 10 (1):21–39. doi: 10.1037/1082-989x.10.1.21. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Manolov R, Solanas A. Comparing N = 1 effect sizes in presence of autocorrelation. Behavior Modification. 2008; 32 (6):860–875. doi: 10.1177/0145445508318866. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Marshall RJ. Autocorrelation estimation of time series with randomly missing observations. Biometrika. 1980; 67 (3):567–570. doi: 10.1093/biomet/67.3.567. [ CrossRef ] [ Google Scholar ]
  • Matyas TA, Greenwood KM. Visual analysis of single-case time series: Effects of variability, serial dependence, and magnitude of intervention effects. Journal of Applied Behavior Analysis. 1990; 23 (3):341–351. doi: 10.1901/jaba.1990.23-341. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kratochwill TR, Chair Members of the Task Force on Evidence-Based Interventions in School Psychology. Procedural and coding manual for review of evidence-based interventions. 2003 Retrieved July 18, 2011 from http://www.sp-ebi.org/documents/_workingfiles/EBImanual1.pdf .
  • Moher D, Schulz KF, Altman DF the CONSORT Group. The CONSORT statement: Revised recommendations for improving the quality of reports of parallel-group randomized trials. Journal of the American Medical Association. 2001; 285 :1987–1991. doi: 10.1001/jama.285.15.1987. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Morgan DL, Morgan RK. Single-participant research design: Bringing science to managed care. American Psychologist. 2001; 56 (2):119–127. doi: 10.1037/0003-066X.56.2.119. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Muthén BO, Curran PJ. General longitudinal modeling of individual differences in experimental designs: A latent variable framework for analysis and power estimation. Psychological Methods. 1997; 2 (4):371–402. doi: 10.1037/1082-989x.2.4.371. [ CrossRef ] [ Google Scholar ]
  • Muthén LK, Muthén BO. Mplus (Version 6.11) Los Angeles, CA: Muthén & Muthén; 2010. [ Google Scholar ]
  • Nagin DS. Analyzing developmental trajectories: A semiparametric, group-based approach. Psychological Methods. 1999; 4 (2):139–157. doi: 10.1037/1082-989x.4.2.139. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • National Institute of Child Health and Human Development. Report of the National Reading Panel. Teaching children to read: An evidence-based assessment of the scientific research literature on reading and its implications for reading instruction (NIH Publication No. 00-4769) Washington, DC: U.S. Government Printing Office; 2000. [ Google Scholar ]
  • Olive ML, Smith BW. Effect size calculations and single subject designs. Educational Psychology. 2005; 25 (2–3):313–324. doi: 10.1080/0144341042000301238. [ CrossRef ] [ Google Scholar ]
  • Oslin DW, Cary M, Slaymaker V, Colleran C, Blow FC. Daily ratings measures of alcohol craving during an inpatient stay define subtypes of alcohol addiction that predict subsequent risk for resumption of drinking. Drug and Alcohol Dependence. 2009; 103 (3):131–136. doi: 10.1016/J.Drugalcdep.2009.03.009. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Palermo TP, Valenzuela D, Stork PP. A randomized trial of electronic versus paper pain diaries in children: Impact on compliance, accuracy, and acceptability. Pain. 2004; 107 (3):213–219. doi: 10.1016/j.pain.2003.10.005. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Parker RI, Brossart DF. Evaluating single-case research data: A comparison of seven statistical methods. Behavior Therapy. 2003; 34 (2):189–211. doi: 10.1016/S0005-7894(03)80013-8. [ CrossRef ] [ Google Scholar ]
  • Parker RI, Cryer J, Byrns G. Controlling baseline trend in single case research. School Psychology Quarterly. 2006; 21 (4):418–440. doi: 10.1037/h0084131. [ CrossRef ] [ Google Scholar ]
  • Parker RI, Vannest K. An improved effect size for single-case research: Nonoverlap of all pairs. Behavior Therapy. 2009; 40 (4):357–367. doi: 10.1016/j.beth.2008.10.006. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Parsonson BS, Baer DM. The analysis and presentation of graphic data. In: Kratochwill TR, editor. Single subject research. New York, NY: Academic Press; 1978. pp. 101–166. [ Google Scholar ]
  • Parsonson BS, Baer DM. The visual analysis of data, and current research into the stimuli controlling it. In: Kratochwill TR, Levin JR, editors. Single-case research design and analysis: New directions for psychology and education. Hillsdale, NJ; England: Lawrence Erlbaum Associates, Inc; 1992. pp. 15–40. [ Google Scholar ]
  • Piasecki TM, Hufford MR, Solham M, Trull TJ. Assessing clients in their natural environments with electronic diaries: Rationale, benefits, limitations, and barriers. Psychological Assessment. 2007; 19 (1):25–43. doi: 10.1037/1040-3590.19.1.25. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • R Development Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2005. [ Google Scholar ]
  • Ragunathan TE. What do we do with missing data? Some options for analysis of incomplete data. Annual Review of Public Health. 2004; 25 :99–117. doi: 10.1146/annurev.publhealth.25.102802.124410. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Raudenbush SW, Bryk AS, Congdon R. HLM 7 Hierarchical Linear and Nonlinear Modeling. Scientific Software International, Inc; 2011. [ Google Scholar ]
  • Redelmeier DA, Kahneman D. Patients’ memories of painful medical treatments: Real-time and retrospective evaluations of two minimally invasive procedures. Pain. 1996; 66 (1):3–8. doi: 10.1016/0304-3959(96)02994-6. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Reis HT. Domains of experience: Investigating relationship processes from three perspectives. In: Erber R, Gilmore R, editors. Theoretical frameworks in personal relationships. Mahwah, NJ: Erlbaum; 1994. pp. 87–110. [ Google Scholar ]
  • Reis HT, Gable SL. Event sampling and other methods for studying everyday experience. In: Reis HT, Judd CM, editors. Handbook of research methods in social and personality psychology. New York, NY: Cambridge University Press; 2000. pp. 190–222. [ Google Scholar ]
  • Robey RR, Schultz MC, Crawford AB, Sinner CA. Single-subject clinical-outcome research: Designs, data, effect sizes, and analyses. Aphasiology. 1999; 13 (6):445–473. doi: 10.1080/026870399402028. [ CrossRef ] [ Google Scholar ]
  • Rossi PH, Freeman HE. Evaluation: A systematic approach. 5. Thousand Oaks, CA: Sage; 1993. [ Google Scholar ]
  • SAS Institute Inc. The SAS system for Windows, Version 9. Cary, NC: SAS Institute Inc; 2008. [ Google Scholar ]
  • Schmidt M, Perels F, Schmitz B. How to perform idiographic and a combination of idiographic and nomothetic approaches: A comparison of time series analyses and hierarchical linear modeling. Journal of Psychology. 2010; 218 (3):166–174. doi: 10.1027/0044-3409/a000026. [ CrossRef ] [ Google Scholar ]
  • Scollon CN, Kim-Pietro C, Diener E. Experience sampling: Promises and pitfalls, strengths and weaknesses. Assessing Well-Being. 2003; 4 :5–35. doi: 10.1007/978-90-481-2354-4_8. [ CrossRef ] [ Google Scholar ]
  • Scruggs TE, Mastropieri MA. Summarizing single-subject research: Issues and applications. Behavior Modification. 1998; 22 (3):221–242. doi: 10.1177/01454455980223001. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Scruggs TE, Mastropieri MA, Casto G. The quantitative synthesis of single-subject research. Remedial and Special Education. 1987; 8 (2):24–33. doi: 10.1177/074193258700800206. [ CrossRef ] [ Google Scholar ]
  • Shadish WR, Cook TD, Campbell DT. Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin; 2002. [ Google Scholar ]
  • Shadish WR, Rindskopf DM, Hedges LV. The state of the science in the meta-analysis of single-case experimental designs. Evidence-Based Communication Assessment and Intervention. 2008; 3 :188–196. doi: 10.1080/17489530802581603. [ CrossRef ] [ Google Scholar ]
  • Shadish WR, Sullivan KJ. Characteristics of single-case designs used to assess treatment effects in 2008. Behavior Research Methods. 2011; 43 :971–980. doi: 10.3758/s13428-011-0111-y. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Sharpley CF. Time-series analysis of behavioural data: An update. Behaviour Change. 1987; 4 :40–45. [ Google Scholar ]
  • Shiffman S, Hufford M, Hickcox M, Paty JA, Gnys M, Kassel JD. Remember that? A comparison of real-time versus retrospective recall of smoking lapses. Journal of Consulting and Clinical Psychology. 1997; 65 :292–300. doi: 10.1037/0022-006X.65.2.292.a. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Shiffman S, Stone AA. Ecological momentary assessment: A new tool for behavioral medicine research. In: Krantz DS, Baum A, editors. Technology and methods in behavioral medicine. Mahwah, NJ: Erlbaum; 1998. pp. 117–131. [ Google Scholar ]
  • Shiffman S, Stone AA, Hufford MR. Ecological momentary assessment. Annual Review of Clinical Psychology. 2008; 4 :1–32. doi: 10.1146/annurev.clinpsy.3.022806.091415. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Shumway RH, Stoffer DS. An approach to time series smoothing and forecasting using the EM Algorithm. Journal of Time Series Analysis. 1982; 3 (4):253–264. doi: 10.1111/j.1467-9892.1982.tb00349.x. [ CrossRef ] [ Google Scholar ]
  • Skinner BF. The behavior of organisms. New York, NY: Appleton-Century-Crofts; 1938. [ Google Scholar ]
  • Smith JD, Borckardt JJ, Nash MR. Inferential precision in single-case time-series datastreams: How well does the EM Procedure perform when missing observations occur in autocorrelated data? Behavior Therapy. doi: 10.1016/j.beth.2011.10.001. (in press) [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Smith JD, Handler L, Nash MR. Therapeutic Assessment for preadolescent boys with oppositional-defiant disorder: A replicated single-case time-series design. Psychological Assessment. 2010; 22 (3):593–602. doi: 10.1037/a0019697. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Snijders TAB, Bosker RJ. Multilevel analysis: An introduction to basic and advanced multilevel modeling. Thousand Oaks, CA: Sage; 1999. [ Google Scholar ]
  • Soliday E, Moore KJ, Lande MB. Daily reports and pooled time series analysis: Pediatric psychology applications. Journal of Pediatric Psychology. 2002; 27 (1):67–76. doi: 10.1093/jpepsy/27.1.67. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • SPSS Statistics. Chicago, IL: SPSS Inc; 2011. (Version 20.0.0) [ Google Scholar ]
  • StataCorp. Stata Statistical Software: Release 12. College Station, TX: StataCorp LP; 2011. [ Google Scholar ]
  • Stone AA, Broderick JE, Kaell AT, Deles-Paul PAEG, Porter LE. Does the peak-end phenomenon observed in laboratory pain studies apply to real-world pain in rheumatoid arthritics? Journal of Pain. 2000; 1 :212–217. doi: 10.1054/jpai.2000.7568. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Stone AA, Shiffman S. Capturing momentary, self-report data: A proposal for reporting guidelines. Annals of Behavioral Medicine. 2002; 24 :236–243. doi: 10.1207/S15324796ABM2403_09. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Stout RL. Advancing the analysis of treatment process. Addiction. 2007; 102 :1539–1545. doi: 10.1111/j.1360-0443.2007.01880.x. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Tate RL, McDonald S, Perdices M, Togher L, Schultz R, Savage S. Rating the methodological quality of single-subject designs and N-of-1 trials: Introducing the Single-Case Experimental Design (SCED) Scale. Neuropsychological Rehabilitation. 2008; 18 (4):385–401. doi: 10.1080/09602010802009201. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Thiele C, Laireiter A-R, Baumann U. Diaries in clinical psychology and psychotherapy: A selective review. Clinical Psychology & Psychotherapy. 2002; 9 (1):1–37. doi: 10.1002/cpp.302. [ CrossRef ] [ Google Scholar ]
  • Tiao GC, Box GEP. Modeling multiple time series with applications. Journal of the American Statistical Association. 1981; 76 :802–816. [ Google Scholar ]
  • Tschacher W, Ramseyer F. Modeling psychotherapy process by time-series panel analysis (TSPA) Psychotherapy Research. 2009; 19 (4):469–481. doi: 10.1080/10503300802654496. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Velicer WF, Colby SM. A comparison of missing-data procedures for ARIMA time-series analysis. Educational and Psychological Measurement. 2005a; 65 (4):596–615. doi: 10.1177/0013164404272502. [ CrossRef ] [ Google Scholar ]
  • Velicer WF, Colby SM. Missing data and the general transformation approach to time series analysis. In: Maydeu-Olivares A, McArdle JJ, editors. Contemporary psychometrics. A festschrift to Roderick P McDonald. Hillsdale, NJ: Lawrence Erlbaum; 2005b. pp. 509–535. [ Google Scholar ]
  • Velicer WF, Fava JL. Time series analysis. In: Schinka J, Velicer WF, Weiner IB, editors. Research methods in psychology. Vol. 2. New York, NY: John Wiley & Sons; 2003. [ Google Scholar ]
  • Wachtel PL. Beyond “ESTs”: Problematic assumptions in the pursuit of evidence-based practice. Psychoanalytic Psychology. 2010; 27 (3):251–272. doi: 10.1037/a0020532. [ CrossRef ] [ Google Scholar ]
  • Watson JB. Behaviorism. New York, NY: Norton; 1925. [ Google Scholar ]
  • Weisz JR, Hawley KM. Finding, evaluating, refining, and applying empirically supported treatments for children and adolescents. Journal of Clinical Child Psychology. 1998; 27 :206–216. doi: 10.1207/s15374424jccp2702_7. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Weisz JR, Hawley KM. Procedural and coding manual for identification of beneficial treatments. Washinton, DC: American Psychological Association, Society for Clinical Psychology, Division 12, Committee on Science and Practice; 1999. [ Google Scholar ]
  • Westen D, Bradley R. Empirically supported complexity. Current Directions in Psychological Science. 2005; 14 :266–271. doi: 10.1111/j.0963-7214.2005.00378.x. [ CrossRef ] [ Google Scholar ]
  • Westen D, Novotny CM, Thompson-Brenner HK. The empirical status of empirically supported psychotherapies: Assumptions, findings, and reporting controlled clinical trials. Psychological Bulletin. 2004; 130 :631–663. doi: 10.1037/0033-2909.130.4.631. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Wilkinson L The Task Force on Statistical Inference. Statistical methods in psychology journals: Guidelines and explanations. American Psychologist. 1999; 54 :694–704. doi: 10.1037/0003-066X.54.8.594. [ CrossRef ] [ Google Scholar ]
  • Wolery M, Busick M, Reichow B, Barton EE. Comparison of overlap methods for quantitatively synthesizing single-subject data. The Journal of Special Education. 2010; 44 (1):18–28. doi: 10.1177/0022466908328009. [ CrossRef ] [ Google Scholar ]
  • Wu Z, Huang NE, Long SR, Peng C-K. On the trend, detrending, and variability of nonlinear and nonstationary time series. Proceedings of the National Academy of Sciences. 2007; 104 (38):14889–14894. doi: 10.1073/pnas.0701020104. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]

Introducing single case study research design: an overview

Affiliation

  • 1 Napier University, Edinburgh.
  • PMID: 27712405
  • DOI: 10.7748/nr.5.4.15.s3

An overview of case study research The use of case study design in research has a long and distinguished history in many disciplines (1). It is used in law, education, history, medicine, psychology and business (2). Yin (3) stated that case studies are essential for social science and that they are used extensively in practice-oriented professions. Examples include the studies of Freud, from which he derived his theories of personality (4), and Whyte's study of neighbourhood gangs in Chicago (5), which was found by other researchers (6) to be a typical case and, as in the example of Freud, led to the development of theory. The term 'case study' refers to both a process of inquiry - that is, studying a case or designing and executing a case study - and its end product, the case study or 'case report' (7).


Single Case Archive

"More discoveries have arisen from intense observation than from statistics applied to large groups" - W. I. B. Beveridge

"Was ist das Allgemeine? Das einzelne Fall. Was ist das Besondere? Millionen Fälle." - Goethe

"Das Messen eines Dings ist eine grobe Handlung, die auf lebendige Körper nicht anders als höchst unvollkommen angewendet werden kann." - Goethe

"Each person is an idiom unto himself, an apparent violation of the syntax of the species." - G.W. Allport

"Given a thimbleful of [dramatic] facts we rush to make generalizations as large as a tub" - G.W. Allport

"Our typical contemporary mainstream psychology group study in which one single procedure is carried out with large numbers of cases and then treated statistically seems to fall short of any genuine psychological contribution" - Watson, 1934

Welcome to Single Case Archive

The Single Case Archive compiles clinical, systematic and experimental single-case studies in the field of psychotherapy. Currently, ISI-published single-case studies from all psychotherapeutic orientations are being included in the database. These case studies were screened by an international group of researchers for basic information on the type of study, patient, therapist and therapy. The objective of this online archive is to facilitate the study of case studies for research, clinical and teaching purposes. With an easy-to-use search engine, the archive allows quick identification of relatively homogeneous sets of cases for specific clinical or research questions. For more information on this archive, see ‘About’.



Single-Case Design, Analysis, and Quality Assessment for Intervention Research

Lobo, Michele A. PT, PhD; Moeyaert, Mariola PhD; Baraldi Cunha, Andrea PT, PhD; Babik, Iryna PhD

Biomechanics & Movement Science Program, Department of Physical Therapy, University of Delaware, Newark, Delaware (M.A.L., A.B.C., I.B.); and Division of Educational Psychology & Methodology, State University of New York at Albany, Albany, New York (M.M.).

Correspondence: Michele A. Lobo, PT, PhD, Biomechanics & Movement Science Program, Department of Physical Therapy, University of Delaware, Newark, DE 19713 ( [email protected] ).

This research was supported by the National Institutes of Health, Eunice Kennedy Shriver National Institute of Child Health & Human Development (1R21HD076092-01A1, Lobo PI), and the Delaware Economic Development Office (Grant #109). Some of the information in this article was presented at the IV Step Meeting in Columbus, Ohio, June 2016. The authors declare no conflict of interest.

Background and Purpose: 

The purpose of this article is to describe single-case studies and contrast them with case studies and randomized clinical trials. We highlight current research designs, analysis techniques, and quality appraisal tools relevant for single-case rehabilitation research.

Summary of Key Points: 

Single-case studies can provide a viable alternative to large group studies such as randomized clinical trials. Single-case studies involve repeated measures and manipulation of an independent variable. They can be designed to have strong internal validity for assessing causal relationships between interventions and outcomes, as well as external validity for generalizability of results, particularly when the study designs incorporate replication, randomization, and multiple participants. Single-case studies should not be confused with case studies/series (ie, case reports), which are reports of clinical management of a patient or a small series of patients.
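The randomization and replication mentioned above can be built into the design itself. In a multiple-baseline design across participants, for example, each participant's intervention start point can be drawn at random from a set of admissible sessions, keeping the baselines staggered. The sketch below is a minimal illustration of that scheduling step only; the participant labels and session numbers are hypothetical, not taken from this article.

```python
import random

def multiple_baseline_schedule(participants, admissible_starts, seed=None):
    """Randomly assign each participant a distinct intervention start
    session, yielding staggered baselines across participants."""
    if len(admissible_starts) < len(participants):
        raise ValueError("need at least one admissible start per participant")
    rng = random.Random(seed)
    # Sampling without replacement keeps every start point distinct,
    # which is what staggers the baselines.
    starts = rng.sample(admissible_starts, k=len(participants))
    return dict(zip(participants, starts))

# Hypothetical example: three participants; the intervention may begin
# at session 4, 6, 8 or 10 for each of them.
schedule = multiple_baseline_schedule(["P1", "P2", "P3"], [4, 6, 8, 10], seed=42)
```

Because the assignment is random rather than chosen by the researcher, the staggered replication across participants can later support a design-based (randomization) analysis rather than relying on visual inspection alone.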

Recommendations for Clinical Practice: 

When rigorously designed, single-case studies can be particularly useful experimental designs in a variety of situations, such as when research resources are limited, studied conditions have low incidences, or when examining effects of novel or expensive interventions. Readers will be directed to examples from the published literature in which these techniques have been discussed, evaluated for quality, and implemented.

INTRODUCTION

In this special interest article we present current tools and techniques relevant for single-case rehabilitation research. Single-case (SC) studies have been identified by a variety of names, including “n of 1 studies” and “single-subject” studies. The term “single-case study” is preferred because the other terms suggest that these studies include only 1 participant. In fact, as discussed later, for purposes of replication and improved generalizability, the strongest SC studies commonly include more than 1 participant.

A SC study should not be confused with a “case study/series” (also called “case report”). In a typical case study/series, a single patient or small series of patients is involved, but there is not a purposeful manipulation of an independent variable, nor are there necessarily repeated measures. Most case studies/series are reported in a narrative way, whereas results of SC studies are presented numerically or graphically. 1 , 2 This article defines SC studies, contrasts them with randomized clinical trials, discusses how they can be used to scientifically test hypotheses, and highlights current research designs, analysis techniques, and quality appraisal tools that may be useful for rehabilitation researchers.

In SC studies, measurements of outcome (dependent variables) are recorded repeatedly for individual participants across time and varying levels of an intervention (independent variables). 1–5 These varying levels of intervention are referred to as “phases,” with 1 phase serving as a baseline or comparison, so each participant serves as his/her own control. 2 In contrast to case studies and case series in which participants are observed across time without experimental manipulation of the independent variable, SC studies employ systematic manipulation of the independent variable to allow for hypothesis testing. 1 , 6 As a result, SC studies allow for rigorous experimental evaluation of intervention effects and provide a strong basis for establishing causal inferences. Advances in design and analysis techniques for SC studies observed in recent decades have made SC studies increasingly popular in educational and psychological research. Yet, the authors believe SC studies have been undervalued in rehabilitation research, where randomized clinical trials (RCTs) are typically recommended as the optimal research design to answer questions related to interventions. 7 In reality, there are advantages and disadvantages to both SC studies and RCTs that should be carefully considered to select the best design to answer individual research questions. Although there are a variety of other research designs that could be utilized in rehabilitation research, only SC studies and RCTs are discussed here because SC studies are the focus of this article and RCTs are the most highly recommended design for intervention studies. 7

When designed and conducted properly, RCTs offer strong evidence that changes in outcomes may be related to provision of an intervention. However, RCTs require monetary, time, and personnel resources that many researchers, especially those in clinical settings, may not have available. 8 RCTs also require access to large numbers of consenting participants who meet strict inclusion and exclusion criteria that can limit variability of the sample and generalizability of results. 9 The requirement for large participant numbers may make RCTs difficult to perform in many settings, such as rural and suburban settings, and for many populations, such as those with diagnoses marked by lower prevalence. 8 Relying exclusively on RCTs has the potential to result in bodies of research that are skewed to address the needs of some individuals while neglecting the needs of others. RCTs aim to include a large number of participants and to use random group assignment to create study groups that are similar to one another in terms of all potential confounding variables, but it is challenging to identify all confounding variables. Finally, the results of RCTs are typically presented in terms of group means and standard deviations that may not represent the true performance of any one participant. 10 This presents a challenge for clinicians aiming to translate and implement these group findings at the level of the individual.

SC studies can provide a scientifically rigorous alternative to RCTs for experimentally determining the effectiveness of interventions. 1 , 2 SC studies can assess a variety of research questions, settings, cases, independent variables, and outcomes. 11 There are many benefits to SC studies that make them appealing for intervention research. SC studies may require fewer resources than RCTs and can be performed in settings and with populations that do not allow for large numbers of participants. 1 , 2 In SC studies, each participant serves as his/her own comparison, thus controlling for many confounding variables that can impact outcome in rehabilitation research, such as gender, age, socioeconomic level, cognition, home environment, and concurrent interventions. 2 , 11 Results can be analyzed and presented to determine whether interventions resulted in changes at the level of the individual, the level at which rehabilitation professionals intervene. 2 , 12 When properly designed and executed, SC studies can demonstrate strong internal validity to determine the likelihood of a causal relationship between the intervention and outcomes and external validity to generalize the findings to broader settings and populations. 2 , 12 , 13

SINGLE-CASE RESEARCH DESIGNS FOR INTERVENTION RESEARCH

There are a variety of SC designs that can be used to study the effectiveness of interventions. Here we discuss (1) AB designs, (2) reversal designs, (3) multiple baseline designs, and (4) alternating treatment designs, as well as ways replication and randomization techniques can be used to improve internal validity of all of these designs. 1–3 , 12–14

The simplest of these designs is the AB design 15 ( Figure 1 ). This design involves repeated measurement of outcome variables throughout a baseline control/comparison phase (A) and then throughout an intervention phase (B). When possible, it is recommended that a stable level and/or rate of change in performance be observed within the baseline phase before transitioning into the intervention phase. 2 As with all SC designs, it is also recommended that there be a minimum of 5 data points in each phase. 1 , 2 There is no randomization or replication of the baseline or intervention phases in the basic AB design. 2 Therefore, AB designs have problems with internal validity and generalizability of results. 12 They are weak in establishing causality because changes in outcome variables could be related to a variety of other factors, including maturation, experience, learning, and practice effects. 2 , 12 Sample data from a single-case AB study performed to assess the impact of Floor Play intervention on social interaction and communication skills for a child with autism 15 are shown in Figure 1 .

( Figure 1 )

If an intervention does not have carryover effects, it is recommended to use a reversal design . 2 For example, a reversal A 1 BA 2 design 16 ( Figure 2 ) includes alternation of the baseline and intervention phases, whereas a reversal A 1 B 1 A 2 B 2 design 17 ( Figure 3 ) consists of alternation of 2 baseline (A 1 , A 2 ) and 2 intervention (B 1 , B 2 ) phases. Incorporating at least 4 phases in the reversal design (ie, A 1 B 1 A 2 B 2 or A 1 B 1 A 2 B 2 A 3 B 3 ...) allows for a stronger determination of a causal relationship between the intervention and outcome variables, because the relationship can be demonstrated at 3 different points in time: the change in outcome from A 1 to B 1 , from B 1 to A 2 , and from A 2 to B 2 . 18 Before using this design, however, researchers must determine that it is safe and ethical to withdraw the intervention, especially in cases where the intervention is effective and necessary. 12

( Figure 2 )

A recent study used an A 1 BA 2 reversal SC design to determine the effectiveness of core stability training in 8 participants with multiple sclerosis. 16 During the first 4 weekly data collections, the researchers ensured a stable baseline, which was followed by 8 weekly intervention data points, and concluded with 4 weekly withdrawal data points. The intervention significantly improved participants' walking and reaching performance ( Figure 2 ). 16 This A 1 BA 2 design could have been strengthened by the addition of a second intervention phase for replication (A 1 B 1 A 2 B 2 ). For instance, a single-case A 1 B 1 A 2 B 2 withdrawal design was used to assess the efficacy of rehabilitation using visuo-spatio-motor cueing for 2 participants with severe unilateral neglect after a severe right hemisphere stroke. 17 Each phase included 8 data points. Statistically significant intervention-related improvement was observed, suggesting that visuo-spatio-motor cueing might be promising for treating individuals with very severe neglect ( Figure 3 ). 17

The reversal design can also incorporate a cross-over design where each participant experiences more than 1 type of intervention. For instance, a B 1 C 1 B 2 C 2 design could be used to study the effects of 2 different interventions (B and C) on outcome measures. Challenges with including more than 1 intervention involve potential carryover effects from earlier interventions and order effects that may impact the measured effectiveness of the interventions. 2 , 12 Including multiple participants and randomizing the order of intervention phase presentations are tools to help control for these types of effects. 19

When an intervention permanently changes an individual's ability, a return-to-baseline performance is not feasible and reversal designs are not appropriate. Multiple baseline designs ( MBDs ) are useful in these situations ( Figure 4 ). 20 Multiple baseline designs feature staggered introduction of the intervention across time: each participant is randomly assigned to 1 of at least 3 experimental conditions characterized by the length of the baseline phase. 21 These studies involve more than 1 participant, thus functioning as SC studies with replication across participants. Staggered introduction of the intervention allows for separation of intervention effects from those of maturation, experience, learning, and practice. For example, a multiple baseline SC study was used to investigate the effect of an antispasticity baclofen medication on stiffness in 5 adult males with spinal cord injury. 20 The subjects were randomly assigned to receive 5 to 9 baseline data points with a placebo treatment before the initiation of the intervention phase with the medication. Both participants and assessors were blind to the experimental condition. The results suggested that baclofen might not be a universal treatment choice for all individuals with spasticity resulting from a traumatic spinal cord injury ( Figure 4 ). 20

( Figure 4 )

The impact of 2 or more interventions can also be assessed via alternating treatment designs ( ATDs ). In ATDs, after establishing the baseline, the experimenter exposes subjects to different intervention conditions administered in close proximity for equal intervals ( Figure 5 ). 22 ATDs are prone to “carryover effects” when the effects of 1 intervention influence the observed outcomes of another intervention. 1 As a result, such designs introduce unique challenges when attempting to determine the effects of any 1 intervention and have been less commonly utilized in rehabilitation. An ATD was used to monitor disruptive behaviors in the school setting throughout a baseline followed by an alternating treatment phase with randomized presentation of a control condition or an exercise condition. 23 Results showed that 30 minutes of moderate to intense physical activity decreased behavioral disruptions through 90 minutes after the intervention. 23 An ATD was also used to compare the effects of commercially available and custom-made video prompts on the performance of multistep cooking tasks in 4 participants with autism. 22 Results showed that participants independently performed more steps with the custom-made video prompts ( Figure 5 ). 22

( Figure 5 )

Regardless of the SC study design, replication and randomization should be incorporated when possible to improve internal and external validity. 11 The reversal design is an example of replication across study phases. The minimum number of phase replications needed to meet quality standards is 3 (A 1 B 1 A 2 B 2 ), but having 4 or more replications is highly recommended (A 1 B 1 A 2 B 2 A 3 ...). 11 , 14 In cases when interventions aim to produce lasting changes in participants' abilities, replication of findings may be demonstrated by replicating intervention effects across multiple participants (as in multiple-participant AB designs), or across multiple settings, tasks, or service providers. When the results of an intervention are replicated across multiple reversals, participants, and/or contexts, there is an increased likelihood that a causal relationship exists between the intervention and the outcome. 2 , 12

Randomization should be incorporated in SC studies to improve internal validity and the ability to assess for causal relationships among interventions and outcomes. 11 In contrast to traditional group designs, SC studies often do not have multiple participants or units that can be randomly assigned to different intervention conditions. Instead, in randomized phase-order designs , the sequence of phases is randomized. Simple or block randomization is possible. For example, with simple randomization for an A 1 B 1 A 2 B 2 design, the A and B conditions are treated as separate units and are randomly assigned to be administered for each of the predefined data collection points. As a result, any combination of A-B sequences is possible without restrictions on the number of times each condition is administered or regard for repetitions of conditions (eg, A 1 B 1 B 2 A 2 B 3 B 4 B 5 A 3 B 6 A 4 A 5 A 6 ). With block randomization for an A 1 B 1 A 2 B 2 design, the 2 conditions (A and B) are blocked into a single unit (AB or BA), and these blocks are then randomized across periods; this guarantees that both conditions are administered equally often and that the same condition is never administered more than twice in a row (eg, A 1 B 1 B 2 A 2 A 3 B 3 A 4 B 4 ). Note that AB and reversal designs require that the baseline (A) always precedes the first intervention (B), which should be accounted for in the randomization scheme. 2 , 11
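As a sketch of how these two randomization schemes could be implemented (hypothetical helper functions, not code from the article), the following keeps the first occasion as baseline and, in the block scheme, fixes the first block to AB so that baseline precedes the first intervention:

```python
import random

def simple_phase_order(n_points):
    # Simple randomization: the first occasion is fixed as baseline (A);
    # every remaining occasion is assigned to A or B purely at random,
    # so any A-B sequence is possible.
    return ["A"] + [random.choice("AB") for _ in range(n_points - 1)]

def block_phase_order(n_blocks):
    # Block randomization: conditions are blocked into AB/BA pairs; the
    # first block is fixed to AB so baseline precedes the first
    # intervention phase. Both conditions occur equally often and the
    # same condition never runs more than twice in a row.
    blocks = ["AB"] + [random.choice(["AB", "BA"]) for _ in range(n_blocks - 1)]
    return [cond for block in blocks for cond in block]

print(simple_phase_order(12))
print(block_phase_order(4))
```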

In randomized phase start-point designs , the lengths of the A and B phases can be randomized. 2 , 11 , 24–26 For example, for an AB design, researchers could specify the number of time points at which outcome data will be collected (eg, 20), define the minimum number of data points desired in each phase (eg, 4 for A, 3 for B), and then randomize the initiation of the intervention so that it occurs anywhere between the remaining time points (points 5 and 17 in the current example). 27 , 28 For multiple baseline designs, a dual-randomization or “regulated randomization” procedure has been recommended. 29 If multiple baseline randomization depends solely on chance, it could be the case that all units are assigned to begin intervention at points not really separated in time. 30 Such randomly selected initiation of the intervention would result in the drastic reduction of the discriminant and internal validity of the study. 29 To eliminate this issue, investigators should first specify appropriate intervals between the start points for different units, then randomly select from those intervals, and finally randomly assign each unit to a start point. 29
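A minimal sketch of randomized phase start-point selection for an AB design (an illustrative function; the exact admissible range depends on how one counts the minimum points per phase):

```python
import random

def random_start_point(n_points, min_a, min_b):
    # The intervention may begin at any occasion that leaves at least
    # min_a baseline points before it and at least min_b intervention
    # points from the start point onward (occasions numbered from 1).
    candidates = list(range(min_a + 1, n_points - min_b + 2))
    return random.choice(candidates)

# Example: 20 occasions, at least 4 baseline and 3 intervention points.
print(random_start_point(20, min_a=4, min_b=3))
```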

SINGLE-CASE ANALYSIS TECHNIQUES FOR INTERVENTION RESEARCH

The What Works Clearinghouse (WWC) single-case design technical documentation provides an excellent overview of appropriate SC study analysis techniques to evaluate the effectiveness of intervention effects. 1 , 18 First, visual analyses are recommended to determine whether there is a functional relationship between the intervention and the outcome. Second, if evidence for a functional effect is present, the visual analysis is supplemented with quantitative analysis methods evaluating the magnitude of the intervention effect. Third, effect sizes are combined across cases to estimate overall average intervention effects, which contribute to evidence-based practice, theory, and future applications. 2 , 18

Visual Analysis

Traditionally, SC study data are presented graphically. When more than 1 participant engages in a study, a spaghetti plot showing all of their data in the same figure can be helpful for visualization. Visual analysis of graphed data has been the traditional method for evaluating treatment effects in SC research. 1 , 12 , 31 , 32 The visual analysis involves evaluating level, trend, and stability of the data within each phase (ie, within-phase data examination) followed by examination of the immediacy of effect, consistency of data patterns, and overlap of data between baseline and intervention phases (ie, between-phase comparisons). When the changes (and/or variability) in level are in the desired direction, are immediate, readily discernible, and maintained over time, it is concluded that the changes in behavior across phases result from the implemented treatment and are indicative of improvement. 33 Three demonstrations of an intervention effect are necessary for establishing a functional relationship. 1

Within-Phase Examination

Level, trend, and stability of the data within each phase are evaluated. Mean and/or median can be used to report the level, and trend can be evaluated by determining whether the data points are monotonically increasing or decreasing. Within-phase stability can be evaluated by calculating the percentage of data points within 15% of the phase median (or mean). The stability criterion is satisfied if about 85% (80%–90%) of the data in a phase fall within a 15% range of the median (or average) of all data points for that phase. 34
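The stability check described above can be sketched as follows (a hypothetical helper assuming positive-valued data, since the band is defined as a percentage of the median):

```python
from statistics import median

def is_stable(phase_data, band=0.15, criterion=0.85):
    # Stable when at least `criterion` (here 85%) of the points fall
    # within `band` (here 15%) of the phase median.
    med = median(phase_data)
    lower, upper = med * (1 - band), med * (1 + band)
    inside = sum(lower <= x <= upper for x in phase_data)
    return inside / len(phase_data) >= criterion

print(is_stable([12, 13, 12, 14, 13, 12]))  # stable baseline
print(is_stable([5, 10, 20, 40, 5, 60]))    # highly variable baseline
```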

Between-Phase Examination

Immediacy of effect, consistency of data patterns, and overlap of data between baseline and intervention phases are evaluated next. For this, several nonoverlap indices have been proposed that all quantify the proportion of measurements in the intervention phase not overlapping with the baseline measurements. 35 Nonoverlap statistics are typically scaled as percent from 0 to 100, or as a proportion from 0 to 1. Here, we briefly discuss the nonoverlap of all pairs ( NAP ), 36 the extended celeration line ( ECL ), the improvement rate difference ( IRD ), 37 and the TauU and its trend-adjusted version, TauU adj , 35 as these are the most recent and complete techniques. We also examine the percentage of nonoverlapping data ( PND ) 38 and the two standard deviations band method, as these are frequently used techniques. In addition, we include the percentage of nonoverlapping corrected data ( PNCD ): an index that applies the PND after controlling for baseline trend. 39

Nonoverlap of All Pairs

The NAP quantifies the overlap between phases by comparing each baseline data point with each intervention data point. A pair is counted as an improvement when the intervention point shows the desired change relative to the baseline point, with ties counted as half. The NAP is the number of improving pairs (plus half of the ties) divided by the total number of pairs; a value of 0.5 reflects chance-level overlap, whereas a value of 1.0 reflects complete nonoverlap. 36
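As an illustration (assuming higher scores reflect improvement unless stated otherwise), the NAP can be computed by comparing every baseline point with every intervention point and counting ties as half:

```python
def nap(baseline, intervention, increase_is_better=True):
    # Compare every (baseline, intervention) pair; count a "win" when
    # the intervention point improves on the baseline point.
    if increase_is_better:
        wins = sum(b > a for a in baseline for b in intervention)
    else:
        wins = sum(b < a for a in baseline for b in intervention)
    ties = sum(b == a for a in baseline for b in intervention)
    n_pairs = len(baseline) * len(intervention)
    return (wins + 0.5 * ties) / n_pairs

print(nap([2, 3, 4], [5, 6, 4]))
```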

Extended Celeration Line

In the ECL (split-middle) method, a trend line is fitted to the baseline data and extended into the intervention phase. The proportion of intervention-phase data points falling on the improved side of the extended line is then compared with the proportion that would be expected if the baseline trend had simply continued.

As a consequence, this method depends on a straight line and makes an assumption of linearity in the baseline. 2 , 12
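The extended-line logic can be sketched as follows; note that the traditional ECL fits the baseline trend with the split-middle technique, whereas this illustrative version uses an ordinary least-squares line for brevity:

```python
def extended_line_proportion(baseline, intervention, increase_is_better=True):
    # Fit a least-squares trend line to the baseline, extend it into the
    # intervention phase, and return the proportion of intervention
    # points falling on the improved side of the extended line.
    n = len(baseline)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(baseline) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, baseline))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * (n + i) for i in range(len(intervention))]
    if increase_is_better:
        improved = sum(y > p for y, p in zip(intervention, predicted))
    else:
        improved = sum(y < p for y, p in zip(intervention, predicted))
    return improved / len(intervention)

print(extended_line_proportion([10, 10, 10, 10], [12, 13, 14]))
```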

Improvement Rate Difference

This analysis is conceptualized as the difference in improvement rates (IR) between the baseline ( IR B ) and intervention ( IR T ) phases. 38 The IR for each phase is defined as the number of “improved data points” divided by the total number of data points in that phase. Improvement rate difference, commonly employed in medical group research under the name of “risk reduction” or “risk difference,” attempts to provide an intuitive interpretation for nonoverlap and to make use of an established, respected effect size, IR T − IR B , or the difference between 2 proportions. 37

TauU and TauU adj

TauU quantifies the nonoverlap between the baseline and intervention phases while allowing a correction for an undesired baseline trend. The basic Tau coefficient is computed from all baseline–intervention pairs as the number of improving pairs minus the number of deteriorating pairs, divided by the total number of pairs; for TauU adj , Kendall's S statistic for the baseline trend is additionally subtracted from the numerator. 35

Online calculators might assist researchers in obtaining the TauU and TauU adjusted coefficients ( http://www.singlecaseresearch.org/calculators/tau-u ).
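For illustration, the basic Tau and a baseline-trend-adjusted TauU can be computed from pairwise comparisons; this sketch follows the commonly described Parker et al. formulation and should be verified against the calculators mentioned above:

```python
def tau_ab(baseline, intervention):
    # Basic Tau: (improving pairs - deteriorating pairs) / all pairs,
    # comparing every baseline point with every intervention point.
    pos = sum(b > a for a in baseline for b in intervention)
    neg = sum(b < a for a in baseline for b in intervention)
    return (pos - neg) / (len(baseline) * len(intervention))

def tau_u_adjusted(baseline, intervention):
    # TauU with baseline-trend correction: Kendall's S statistic for the
    # baseline trend is subtracted from the A-versus-B S statistic.
    s_ab = sum((b > a) - (b < a) for a in baseline for b in intervention)
    s_aa = sum((baseline[j] > baseline[i]) - (baseline[j] < baseline[i])
               for i in range(len(baseline))
               for j in range(i + 1, len(baseline)))
    return (s_ab - s_aa) / (len(baseline) * len(intervention))

print(tau_ab([1, 2, 3], [4, 5, 6]))
print(tau_u_adjusted([1, 2, 3], [4, 5, 6]))
```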

Percentage of Nonoverlapping Data

The PND is the percentage of intervention-phase data points exceeding the single most extreme baseline data point (the highest baseline point when an increase is desired, or the lowest when a decrease is desired). 38 Commonly used benchmarks interpret PND values above 90% as indicating very effective interventions, 70% to 90% as effective, 50% to 70% as questionable, and below 50% as ineffective. Because it depends on a single extreme baseline value, the PND is sensitive to outliers and to baseline trend.
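A minimal sketch of the PND computation (assuming higher scores are better unless specified otherwise):

```python
def pnd(baseline, intervention, increase_is_better=True):
    # Percentage of intervention points exceeding the single most extreme
    # baseline point (highest for increases, lowest for decreases).
    if increase_is_better:
        exceeding = sum(b > max(baseline) for b in intervention)
    else:
        exceeding = sum(b < min(baseline) for b in intervention)
    return 100 * exceeding / len(intervention)

print(pnd([3, 5, 4], [6, 7, 5, 8]))  # 75.0
```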

Two Standard Deviation Band Method

When the stability criterion described earlier is met within phases, it is possible to apply the 2-standard deviation band method. 12 , 41 First, the mean of the data for a specific condition is calculated and represented with a solid line. In the next step, the standard deviation of the same data is computed, and 2 dashed lines are represented: one located 2 standard deviations above the mean and the other 2 standard deviations below. For normally distributed data, few points (<5%) are expected to be outside the 2-standard deviation bands if there is no change in the outcome score because of the intervention. However, this method is not considered a formal statistical procedure, as the data cannot typically be assumed to be normal, continuous, or independent. 41
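The band construction can be sketched as follows (illustrative helpers; the sample standard deviation is used here):

```python
from statistics import mean, stdev

def two_sd_band(phase_data):
    # Solid mean line plus dashed lines located 2 standard deviations
    # above and below the phase mean.
    m, sd = mean(phase_data), stdev(phase_data)
    return m, m - 2 * sd, m + 2 * sd

def points_outside_band(band, other_phase):
    # Count how many points of another phase fall outside the band;
    # under no intervention effect, few (<5%) points are expected outside.
    _, lower, upper = band
    return sum(not (lower <= x <= upper) for x in other_phase)

band = two_sd_band([10, 12, 11, 9, 10, 11])
print(band)
print(points_outside_band(band, [13, 14, 12]))  # 2
```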

Statistical Analysis

If the visual analysis indicates a functional relationship (ie, 3 demonstrations of the effectiveness of the intervention effect), it is recommended to proceed with the quantitative analyses, reflecting the magnitude of the intervention effect. First, effect sizes are calculated for each participant (individual-level analysis). Moreover, if the research interest lies in the generalizability of the effect size across participants, effect sizes can be combined across cases to achieve an overall average effect size estimate (across-case effect size).

Note that quantitative analysis methods are still being developed in the domain of SC research 1 and statistical challenges of producing an acceptable measure of treatment effect remain. 14 , 42 , 43 Therefore, the WWC standards strongly recommend conducting sensitivity analysis and reporting multiple effect size estimators. If consistency across different effect size estimators is identified, there is stronger evidence for the effectiveness of the treatment. 1 , 18

Individual-Level Effect Size Analysis

At the individual level, the magnitude of the intervention effect can be quantified, for example, with a within-case standardized mean difference (the difference between phase means divided by the within-case standard deviation) or with regression-based estimates, such as piecewise regression models that quantify changes in level and trend between phases. The nonoverlap indices described earlier can also serve as individual-level effect size measures.

Across-Case Effect Sizes

Two-level modeling to estimate the intervention effects across cases can be used to evaluate across-case effect sizes. 44 , 45 , 50 Multilevel modeling is recommended by the WWC standards because it takes the hierarchical nature of SC studies into account: measurements are nested within cases and cases, in turn, are nested within studies. By conducting a multilevel analysis, important research questions can be addressed (which cannot be answered by single-level analysis of SC study data), such as (1) What is the magnitude of the average treatment effect across cases? (2) What is the magnitude and direction of the case-specific intervention effect? (3) How much does the treatment effect vary within cases and across cases? (4) Does a case and/or study-level predictor influence the treatment's effect? The 2-level model has been validated in previous research using extensive simulation studies. 45 , 46 , 51 The 2-level model appears to have sufficient power (>0.80) to detect large treatment effects in at least 6 participants with 6 measurements. 21
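A full multilevel analysis requires dedicated mixed-model software, but the two-level logic can be illustrated with a toy computation on hypothetical data: a case-level effect is estimated for each participant (level 1) and then summarized across cases (level 2):

```python
from statistics import mean, stdev

# Hypothetical (baseline, intervention) series for 3 cases.
cases = {
    "case1": ([4, 5, 4, 5], [8, 9, 8, 9]),
    "case2": ([3, 4, 3, 4], [6, 7, 7, 6]),
    "case3": ([5, 5, 6, 5], [9, 8, 9, 9]),
}

# Level 1: one effect estimate per case (change in mean level).
case_effects = {name: mean(b) - mean(a) for name, (a, b) in cases.items()}

# Level 2: average effect across cases and between-case variability.
effects = list(case_effects.values())
print(case_effects)
print(mean(effects), stdev(effects))
```

Unlike this sketch, a proper two-level model weights cases by the precision of their estimates and can include case- or study-level predictors.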

Furthermore, to estimate the across-case effect sizes, the HPS (Hedges, Pustejovsky, and Shadish) , or single-case experimental design ( SCED )-specific mean difference, index can be calculated. 52 This is a standardized mean difference index specifically designed for SCED data, with the aim of making it comparable to Cohen's d of group-comparison designs. The standard deviation takes into account both within-participant and between-participant variability and is typically used to obtain an across-case estimator for a standardized change in level. The advantage of using the HPS across-case effect size estimator is that it is directly comparable with Cohen's d for group-comparison research, thus enabling the use of Cohen's (1988) benchmarks. 53

Valuable recommendations on SC data analyses have recently been provided. 54 , 55 They suggest that a specific SC study data analytic technique can be chosen on the basis of (1) the study aims and the desired quantification (eg, overall quantification, between-phase quantifications, and randomization), (2) the data characteristics as assessed by visual inspection and the assumptions one is willing to make about the data, and (3) the knowledge and computational resources. 54 , 55 Table 1 lists recommended readings and some commonly used resources related to the design and analysis of single-case studies.

3rd ed. Needham Heights, MA: Allyn & Bacon; 2008.

New York, NY: Oxford University Press; 2010.

Hillsdale, NJ: Lawrence Erlbaum Associates; 1992.

Washington, DC: American Psychological Association; 2014.

Philadelphia, PA: F. A. Davis Company; 2015.

Reversal design . 2008;10(2):115-128.

. 2014;35:1963-1969.

. 2000;10(4):385-399.

Multiple baseline design . 1990;69(6):311-317.

. 2010;25(6):459-469.

Alternating treatment design . 2014;52(5):447-462.

. 2013;34(6):371-383.

Randomization . 2010;15(2):124-144.

Visual analysis . 2000;17(1):20-39.

. 2012;33(4):202-219.

Percentage of nonoverlapping data . 2010;4(4):619-625.

. 2010;47(8):842-858.

Nonoverlap of all pairs . 2009;40:357-367.

. 2012;21(3):203-216.

Improvement rate difference . 2016;121(3):169-193.

. 2016;86:104-113.

Tau-U/Piecewise regression . In press.

. 2017;38(2).

Hierarchical Linear Modeling . 2013;43(12):2943-2952.

. 2007;29(3):23-55.

QUALITY APPRAISAL TOOLS FOR SINGLE-CASE DESIGN RESEARCH

Quality appraisal tools are important to guide researchers in designing strong experiments and conducting high-quality systematic reviews of the literature. Unfortunately, quality assessment tools for SC studies are relatively novel, ratings across tools demonstrate variability, and there is currently no “gold standard” tool. 56 Table 2 lists important SC study quality appraisal criteria compiled from the most common scales; when planning studies or reviewing the literature, we recommend readers to consider these criteria. Table 3 lists some commonly used SC quality assessment and reporting tools and references to resources where the tools can be located.

Criteria Requirements
1. Design The design is appropriate for evaluating the intervention
2. Method details Participants' characteristics, selection method, and testing setting specifics are adequately detailed to allow future replication
3. Independent variable The independent variable (ie, the intervention) is thoroughly described to allow replication; fidelity of the intervention is thoroughly documented; the independent variable is systematically manipulated under the control of the experimenter
4. Dependent variable Each dependent/outcome variable is quantifiable. Each outcome variable is measured systematically and repeatedly across time to ensure the acceptable 0.80-0.90 interassessor percent agreement (or ≥0.60 Cohen's kappa) on at least 20% of sessions
5. Internal validity The study includes at least 3 attempts to demonstrate an intervention effect at 3 different points in time or with 3 different phase replications. Design-specific recommendations: (1) for reversal designs, a study should have ≥4 phases with ≥5 points per phase; (2) for alternating intervention designs, a study should have ≥5 points per condition with ≤2 points per phase; and (3) for multiple baseline designs, a study should have ≥6 phases with ≥5 points per phase to meet the What Works Clearinghouse standards without reservations. Assessors are independent and blind to experimental conditions
6. External validity Experimental effects should be replicated across participants, settings, tasks, and/or service providers
7. Face validity The outcome measure should be clearly operationally defined, have a direct unambiguous interpretation, and measure the construct it is designed to measure
8. Social validity Both the outcome variable and the magnitude of change in outcome because of an intervention should be socially important; the intervention should be practical and cost effective
9. Sample attrition The sample attrition should be low and unsystematic, because loss of data in SC designs due to overall or differential attrition can produce biased estimates of the intervention's effectiveness if that loss is systematically related to the experimental conditions
10. Randomization If randomization is used, the experimenter should make sure that (1) equivalence is established at the baseline and (2) the group membership is determined through a random process
What Works Clearinghouse Standards (WWC) Kratochwill TR, Hitchcock J, Horner RH, et al. Institute of Education Sciences: What Works Clearinghouse: procedures and standards handbook. Published 2010. Accessed November 20, 2016.
Quality indicators from Horner et al Horner RH, Carr EG, Halle J, McGee G, Odom S, Wolery M. The use of single-subject research to identify evidence-based practice in special education. 2005;71(2):165-179.
Evaluative method Reichow B, Volkmar F, Cicchetti D. Development of the evaluative method for evaluating and determining evidence-based practices in autism. 2008;38(7):1311-1319.
Certainty framework Simeonsson R, Bailey D. Evaluating programme impact: levels of certainty. In: Mitchell D, Brown R, eds. London, England: Chapman & Hall; 1991:280-296.
Evidence in Augmentative and Alternative Communication Scales (EVIDAAC) Schlosser RW, Sigafoos J, Belfiore P. EVIDAAC comparative single-subject experimental design scale (CSSEDARS). Published 2009. Accessed November 20, 2016.
Single-Case Experimental Design (SCED) Tate RL, McDonald S, Perdices M, Togher L, Schulz R, Savage S. Rating the methodological quality of single-subject designs and n-of-1 trials: introducing the Single-Case Experimental Design (SCED) Scale. 2008;18(4):385-401.
Logan et al scales Logan LR, Hickman RR, Harris SR, Heriza CB. Single-subject research design: recommendations for levels of evidence and quality rating. 2008;50:99-103.
Single-Case Reporting Guideline In BEhavioural Interventions (SCRIBE) Tate RL, Perdices M, Rosenkoetter U, et al. The Single-Case Reporting guideline In BEhavioural interventions (SCRIBE) 2016 statement. 2016;56:133-142.
Theory, examples, and tools related to multilevel data analysis Van den Noortgate W, Ferron J, Beretvas SN, Moeyaert M. Multilevel synthesis of single-case experimental data. Katholieke Universiteit Leuven web site.
Tools for computing between-cases standardized mean difference ( d -statistic) Pustejovsky JE. scdhlm: a web-based calculator for between-case standardized mean differences (Version 0.2) [Web application].
Tools for computing NAP, IRD, Tau, and other statistics Vannest KJ, Parker RI, Gonen O. Single case research: web based calculators for SCR analysis (Version 1.0) [Web-based application]. College Station, TX: Texas A&M University. Published 2011. Accessed November 20, 2016.
Tools for obtaining graphical representations, means, trend lines, PND Wright J. Intervention central. Accessed November 20, 2016.
Access to free Simulation Modeling Analysis (SMA) Software Borckardt JJ. SMA simulation modeling analysis: time series analysis program for short time series data streams. Published 2006.

When an established tool is required for systematic review, we recommend use of the WWC tool because it has well-defined criteria and is developed and supported by leading experts in the SC research field in association with the Institute of Education Sciences. 18 The WWC documentation provides clear standards and procedures to evaluate the quality of SC research; it assesses the internal validity of SC studies, classifying them as “meeting standards,” “meeting standards with reservations,” or “not meeting standards.” 1 , 18 Only studies classified in the first 2 categories are recommended for further visual analysis. Also, WWC evaluates the evidence of effect, classifying studies into “strong evidence of a causal relation,” “moderate evidence of a causal relation,” or “no evidence of a causal relation.” Effect size should only be calculated for studies providing strong or moderate evidence of a causal relation.

The Single-Case Reporting Guideline In BEhavioural Interventions (SCRIBE) 2016 is another useful SC research tool developed recently to improve the quality of single-case designs. 57 SCRIBE consists of a 26-item checklist that researchers need to address while reporting the results of SC studies. This practical checklist allows for critical evaluation of SC studies during study planning, manuscript preparation, and review.

Single-case studies can be designed and analyzed in a rigorous manner that allows researchers to assess causal relationships between interventions and outcomes and to generalize their results. 2 , 12 These studies can be strengthened by replicating findings across multiple study phases, participants, settings, or contexts, and by randomizing conditions or phase lengths. 11 A variety of tools allow researchers to analyze findings from SC studies objectively. 56 Although several quality assessment tools exist for SC studies, they can be difficult to locate and use without experience, and different tools can yield different results. The WWC quality assessment tool is recommended for those aiming to systematically review SC studies. 1 , 18
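Randomizing phase lengths, as suggested above, also licenses a randomization test of the intervention effect. The sketch below is an illustrative Python implementation for a simple AB design, assuming the intervention start point was drawn at random from all points leaving at least three observations per phase; the data, function name, and minimum phase length are hypothetical choices, not taken from the article.

```python
# Illustrative randomization test for an AB single-case design.
# The null distribution consists of the mean shift computed at every
# admissible intervention start point.

def randomization_test(data, actual_start, min_phase=3):
    """Return the observed phase-mean shift at actual_start and a
    one-sided p value over all admissible start points."""
    def shift(start):
        a, b = data[:start], data[start:]
        return sum(b) / len(b) - sum(a) / len(a)
    starts = range(min_phase, len(data) - min_phase + 1)
    observed = shift(actual_start)
    dist = [shift(s) for s in starts]
    # Proportion of admissible assignments with a shift at least as
    # large as the observed one (the observed assignment included).
    p = sum(d >= observed for d in dist) / len(dist)
    return observed, p

data = [2, 3, 2, 3, 2, 6, 7, 6, 8, 7]  # hypothetical AB series
obs, p = randomization_test(data, actual_start=5)
print(obs, p)  # 4.4 0.2
```

With only five admissible start points, the smallest attainable p value is 0.2, which is why longer series, or designs with more randomization opportunities, are needed for sharper inference.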

SC studies, like all study designs, have limitations. First, it can be challenging to collect at least 5 data points in a given study phase, especially when traveling for data collection is difficult for participants or when delaying the intervention during the baseline phase would be unsafe or unethical. Because power in SC studies depends on the number of data points gathered for each participant, too few data points weaken the analysis. 12 , 58 Second, SC studies are not always designed rigorously and may therefore have poor internal validity; this limitation can be overcome by addressing the key characteristics that strengthen SC designs ( Table 2 ). 1 , 14 , 18 Third, SC studies may have poor generalizability, which can be addressed by including a greater number of participants, or units. Fourth, SC studies may require consultation with expert methodologists and statisticians to ensure proper study design and data analysis, especially for issues such as autocorrelation and variability of the data. 2 Fifth, although a stable level and rate of performance throughout the baseline is recommended, human performance is quite variable, which can make this requirement difficult to meet. Finally, the most important validity threat to SC studies is maturation; this threat must be considered during the design process to strengthen SC studies. 1 , 2 , 12 , 58
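Autocorrelation, mentioned above as an issue requiring statistical care, can be screened with a simple lag-1 estimate: the correlation of the series with itself shifted by one observation. The Python sketch below, with hypothetical data, illustrates the computation; it is a rough diagnostic, not a substitute for the expert consultation the text recommends.

```python
# Illustrative lag-1 autocorrelation estimate for a short single-case series.

def lag1_autocorrelation(series):
    """Correlation between the series and a copy shifted by one
    observation; values near zero suggest little serial dependence."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[t] - mean) * (series[t + 1] - mean) for t in range(n - 1))
    den = sum((x - mean) ** 2 for x in series)
    return num / den

print(lag1_autocorrelation([1, 2, 3, 4, 5]))  # 0.4
```

Even this steadily increasing series yields only 0.4 with five observations, a reminder that autocorrelation estimates from short single-case series are noisy.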

SC studies can be particularly useful for rehabilitation research. They allow researchers to closely track and report change at the level of the individual. They may require fewer resources and, thus, can allow for high-quality experimental research, even in clinical settings. Furthermore, they provide a tool for assessing causal relationships in populations and settings where large numbers of participants are not accessible. For all of these reasons, SC studies can serve as an effective method for assessing the impact of interventions.

Keywords: n-of-1 studies; quality assessment; research design; single-subject research

COMMENTS

  1. Single-Case Design, Analysis, and Quality Assessment for Intervention Research

    The purpose of this article is to present current tools and techniques relevant for single-case rehabilitation research. Single-case (SC) studies have been identified by a variety of names, including "n of 1 studies" and "single-subject" studies. The term "single-case study" is preferred over the previously mentioned terms because ...

  2. Single Case Research Design

    Abstract. This chapter addresses the peculiarities, characteristics, and major fallacies of single case research designs. A single case study research design is a collective term for an in-depth analysis of a small non-random sample. The focus on this design is on in-depth.

  3. Single case studies are a powerful tool for developing ...

    The majority of methods in psychology rely on averaging group data to draw conclusions. In this Perspective, Nickels et al. argue that single case methodology is a valuable tool for developing and ...

  4. PDF Single Cases: The What, Why and How

    researchers are conducting single case research. To place these 38 studies into context, a search of single or comparative case studies (i.e. two cases), or articles emphasizing the use of single cases as a methodology, yielded 104 articles during this same time period. Further, several special issues have showcased single case studies ...

  5. Single Case Study

    The concepts of single-case or case studies are explained and linked to principles of psychotherapy. Three types of single-case studies—descriptive, exploratory, and explanatory—are distinguished. The historical development of the single-case study is presented reaching from the experimental single-case research at the beginning of the ...

  6. Single-case experimental designs: the importance of ...

    A systematic review of applied single-case research published between 2016 and 2018: study designs, randomization, data aspects, and data analysis. Behav. Res. 53 , 1371-1384 (2021).

  7. Single-Case Designs

    In designing single-case research studies, it is essential that researchers select the appropriate research design for the specific research question and dependent variable. The first two factors to consider when making that selection are whether (a) the research question involves demonstrating the effects of a single intervention or comparing ...

  8. Advancing the Application and Use of Single-Case Research Designs

    A special issue of Perspectives on Behavior Science focused on methodological advances needed for single-case research is a timely contribution to the field. There are growing efforts to both articulate professional standards for single-case methods (Kratochwill et al., 2010; Tate et al., 2016), and advance new procedures for analysis and interpretation of single-case studies (Manolov ...

  9. A Primer on Single-Case Research Designs: Contemporary Use and Analysis

    In the study, a single case research design (Gast et al., 2018; Ledford et al., 2019), was utilised. An examination of the relation between a researcher-manipulated independent variable (e.g., the ...

  10. Single-Case Intervention Research

    This book is a compendium of tools and information for researchers considering single-case design (SCD) research, a newly viable and often essential methodology in applied psychology, education, and related fields. ... Single-Case Designs and Large-N Studies: The Best of Both Worlds Susan M. Sheridan; Using Single-Case Research Designs in ...

  11. Single‐case experimental designs: Characteristics, changes, and

    Tactics of Scientific Research (Sidman, 1960) provides a visionary treatise on single-case designs, their scientific underpinnings, and their critical role in understanding behavior. Since the foundational base was provided, single-case designs have proliferated especially in areas of application where they have been used to evaluate interventions with an extraordinary range of clients ...

  12. Single-Case Designs

    Single-Case Evaluation Designs. Single-case evaluation methodology is a mainstay of ABA research (Kazdin, 2011) and the basis of many sports related studies ( Luiselli, 2011; Martin et al., 2004). The publications we reviewed in this chapter are testimony to the variety of single-case designs available to researchers.

  13. Single-Case Design, Analysis, and Quality Assessment for ...

    We highlight current research designs, analysis techniques, and quality appraisal tools relevant for single-case rehabilitation research. Summary of key points: Single-case studies can provide a viable alternative to large group studies such as randomized clinical trials. Single-case studies involve repeated measures and manipulation of an ...

  14. Single Case Designs in Psychology Practice

    The clinician in practice is apt to select the AB 1 B 2, changing criterion design. The single case approach provides a means of measuring the increased amount of an intervention. For example, in Kazdin, 2 increased expected levels of quiz performance are used across math objectives 1, 2, 3 and 4 as measured during daily school sessions. 3, 5.

  15. Advancing the Application and Use of Single-Case Research Designs

    The application of single-case research methods is entering a new phase of scientific relevance. Researchers in an increasing array of disciplines are finding single-case methods useful for the questions they are asking and the clinical needs in their fields (Kratochwill et al., 2010; Maggin et al., 2017; Maggin & Odom, 2014; Riley-Tillman, Burns, & Kilgus, 2020).

  16. Single case studies vs. multiple case studies: A comparative study

    3.1.1 Format of a case study. Beyond identifying the case and the specific type of case study to be implemented, the researchers have to consider whether it is wise to conduct a single case study, or better to conduct a multiple case study, to understand the phenomenon.

  17. A systematic review of applied single-case research ...

    Single-case experimental designs (SCEDs) have become a popular research methodology in educational science, psychology, and beyond. The growing popularity has been accompanied by the development of specific guidelines for the conduct and analysis of SCEDs. In this paper, we examine recent practices in the conduct and analysis of SCEDs by systematically reviewing applied SCEDs published over a ...

  18. Single-case design standards: An update and proposed upgrades

    In this paper, we provide a critique focused on the What Works Clearinghouse (WWC) Standards for Single-Case Research Design (Standards 4.1).Specifically, we (a) recommend the use of visual-analysis to verify a single-case intervention study's design standards and to examine the study's operational issues, (b) identify limitations of the design-comparable effect-size measure and discuss ...

  19. Single-case research designs: Methods for clinical and applied settings

    Single-case research has played an important role in developing and evaluating interventions that are designed to alter a particular facet of human functioning. Now thoroughly updated in its second edition, acclaimed author Alan Kazdin's Single-Case Research Designs provides a notable contrast to the quantitative methodology approach that pervades the biological and social sciences.

  20. Single-Case Experimental Designs: A Systematic Review of Published

    The single-case experiment has a storied history in psychology dating back to the field's founders: Fechner (1889), Watson (1925), and Skinner (1938).It has been used to inform and develop theory, examine interpersonal processes, study the behavior of organisms, establish the effectiveness of psychological interventions, and address a host of other research questions (for a review, see ...

  21. Introducing single case study research design: an overview

    An overview of case study research The use of case study design in research has a long and distinguished history in many disciplines ( 1 ). It is used in law, education, history, medicine, psychology and business ( 2 ). Yin ( 3 ) stated that case studies are essential for social science and that they are used extensively in practice-oriented ...

  22. Single Case Archive

    The Single Case Archive compiles clinical, systematic and experimental single case studies in the field of psychotherapy. Currently ISI published single case studies from all different psychotherapeutic orientations are being included in the database. These case studies were screened by an international group of researchers for basic ...

  23. Single-Case Design, Analysis, and Quality Assessment for Int ...

    In this special interest article we present current tools and techniques relevant for single-case rehabilitation research. Single-case (SC) studies have been identified by a variety of names, including "n of 1 studies" and "single-subject" studies. The term "single-case study" is preferred over the previously mentioned terms because ...

  24. BioGreen Cafe

    Learn, Study and Research in UCC, Ireland's first 5 star university. ... Ireland's first single-use plastic free café. It is a case study in how green procurement, stakeholder engagement and environmental awareness and education can contribute to bringing about a complete redesign in the provision of a service. Single-use plastic has been ...

  25. United States

    Policy recommendations and case studies. Featured publications. Environment at a Glance Indicators. Creative minds, creative schools. Society at a Glance 2024. OECD Social Indicators ... policy advice and research to learn more. Ambassador of the United States to the OECD . Related publications. See all publications. Report. OECD Economic ...

  26. Fault ride through enhancement of large‐scale solar plants using

    These case studies are explained in the following. 4.1 Case study 1: System with a single inverter. In this case, the under-study system is considered only one GCPVP connected to the grid. The system and simulation details are tabulated in Table 1 for this case.