Data-driven hypothesis generation in clinical research: what we learned from a human subject study

Hypothesis generation is an early and critical step in any hypothesis-driven clinical research project. Because it is not yet a well-understood cognitive process, the need to improve it often goes unrecognized. Without an impactful hypothesis, the significance of any research project can be questionable, regardless of the rigor or diligence applied in other steps of the study, e.g., study design, data collection, and result analysis. In this perspective article, the authors first review the literature on scientific thinking, reasoning, medical reasoning, and literature-based discovery, along with a field study exploring scientific thinking and discovery. Over the years, research on scientific thinking has progressed considerably in cognitive science and its applied areas: education, medicine, and biomedical research. However, the review reveals a lack of original studies on hypothesis generation in clinical research. The authors then summarize their first human participant study exploring data-driven hypothesis generation by clinical researchers in a simulated setting. The results indicate that a secondary data analytical tool, VIADS—a visual interactive analytic tool for filtering, summarizing, and visualizing large health data sets coded with hierarchical terminologies—can shorten the time participants need, on average, to generate a hypothesis and also reduces the number of cognitive events needed to generate each hypothesis. As a counterpoint, the exploration also indicates that the hypotheses generated with VIADS received significantly lower quality ratings for feasibility. Despite its small scale, the study confirmed the feasibility of directly conducting a human participant study to explore the hypothesis generation process in clinical research. It provides supporting evidence for a larger-scale study with a specifically designed tool to facilitate hypothesis generation among inexperienced clinical researchers. A larger study could yield generalizable evidence, which in turn could improve clinical research productivity and the overall clinical research enterprise.

Published: 09 July 2024

Automating psychological hypothesis generation with AI: when large language models meet causal graph

  • Song Tong (ORCID: orcid.org/0000-0002-4183-8454)
  • Kai Mao
  • Zhen Huang
  • Yukun Zhao
  • Kaiping Peng

Humanities and Social Sciences Communications, volume 11, Article number: 896 (2024)


Leveraging the synergy between causal knowledge graphs and a large language model (LLM), our study introduces a groundbreaking approach for computational hypothesis generation in psychology. We analyzed 43,312 psychology articles using an LLM to extract causal relation pairs. This analysis produced a specialized causal graph for psychology. Applying link prediction algorithms, we generated 130 potential psychological hypotheses focusing on “well-being”, then compared them against research ideas conceived by doctoral scholars and those produced solely by the LLM. Interestingly, our combined approach of an LLM and causal graphs mirrored expert-level insights in terms of novelty, clearly surpassing the LLM-only hypotheses (t(59) = 3.34, p = 0.007 and t(59) = 4.32, p < 0.001, respectively). This alignment was further corroborated using deep semantic analysis. Our results show that combining an LLM with machine learning techniques such as causal knowledge graphs can revolutionize automated discovery in psychology, extracting novel insights from the extensive literature. This work stands at the crossroads of psychology and artificial intelligence, championing a new, enriched paradigm for data-driven hypothesis generation in psychological research.


Introduction

In an age in which the confluence of artificial intelligence (AI) with various subjects profoundly shapes sectors ranging from academic research to commercial enterprises, dissecting the interplay of these disciplines becomes paramount (Williams et al., 2023 ). In particular, psychology, which serves as a nexus between the humanities and natural sciences, consistently endeavors to demystify the complex web of human behaviors and cognition (Hergenhahn and Henley, 2013 ). Its profound insights have significantly enriched academia, inspiring innovative applications in AI design. For example, AI models have been molded on hierarchical brain structures (Cichy et al., 2016 ) and human attention systems (Vaswani et al., 2017 ). Additionally, these AI models reciprocally offer a rejuvenated perspective, deepening our understanding from the foundational cognitive taxonomy to nuanced esthetic perceptions (Battleday et al., 2020 ; Tong et al., 2021 ). Nevertheless, the multifaceted domain of psychology, particularly social psychology, has exhibited a measured evolution compared to its tech-centric counterparts. This can be attributed to its enduring reliance on conventional theory-driven methodologies (Henrich et al., 2010 ; Shah et al., 2015 ), a characteristic that stands in stark contrast to the burgeoning paradigms of AI and data-centric research (Bechmann and Bowker, 2019 ; Wang et al., 2023 ).

In the journey of psychological research, each exploration originates from a spark of innovative thought. These research trajectories may arise from established theoretical frameworks, insights into daily events, anomalies within data, or intersections of interdisciplinary discoveries (Jaccard and Jacoby, 2019). Hypothesis generation is pivotal in psychology (Koehler, 1994; McGuire, 1973), as it facilitates the exploration of the multifaceted influencers of human attitudes, actions, and beliefs. The HyGene model (Thomas et al., 2008) elucidated the intricacies of hypothesis generation, encompassing the constraints of working memory and the interplay between ambient and semantic memories. Recently, causal graphs have provided psychology with a systematic framework that enables researchers to construct and simulate intricate systems for a holistic view of “bio-psycho-social” interactions (Borsboom et al., 2021; Crielaard et al., 2022). Yet the labor-intensive nature of the methodology poses challenges, as it requires multidisciplinary expertise in algorithmic development, exacerbating the complexity (Crielaard et al., 2022). Meanwhile, advancements in AI, exemplified by models such as the generative pretrained transformer (GPT), present new avenues for creativity and hypothesis generation (Wang et al., 2023).

Building on this, large language models (LLMs) such as GPT-3, GPT-4, and Claude-2 demonstrate profound capabilities to comprehend and infer causality from natural language texts, opening a promising path for extracting causal knowledge from vast textual data (Binz and Schulz, 2023; Gu et al., 2023). Exciting possibilities arise in specific scenarios in which LLMs and causal graphs manifest complementary strengths (Pan et al., 2023). Their synergistic combination converges human analytical and systemic thinking, echoing the holistic versus analytic cognition delineated in social psychology (Nisbett et al., 2001). This amalgamation enables fine-grained semantic analysis and conceptual understanding via LLMs, while causal graphs offer a global perspective on causality, alleviating the interpretability challenges of AI (Pan et al., 2023). This integrated methodology efficiently counters the inherent limitations of working and semantic memories in hypothesis generation and, as previous academic endeavors indicate, has proven efficacious across disciplines. For example, a groundbreaking study in physics synthesized 750,000 physics publications, utilizing cutting-edge natural language processing to extract 6368 pivotal quantum physics concepts, culminating in a semantic network forecasting research trajectories (Krenn and Zeilinger, 2020). Additionally, by integrating knowledge-based causal graphs into the foundation of the LLM, the LLM’s capability for causative inference improves significantly (Kıcıman et al., 2023).

To this end, our study seeks to build a pioneering analytical framework, combining the semantic and conceptual extraction proficiency of LLMs with the systemic thinking of causal graphs, with the aim of crafting a comprehensive causal network of semantic concepts within psychology. We meticulously analyzed 43,312 psychological articles, devising an automated method to construct a causal graph and systematically mining causative concepts and their interconnections. Specifically, the initial sifting and preparation of the data ensures a high-quality corpus, followed by the application of advanced extraction techniques to identify standardized causal concepts. This results in a graph database that serves as a reservoir of causal knowledge. Finally, using node embedding and similarity-based link prediction, we unearthed potential causal relationships and thus generated the corresponding hypotheses.

To gauge the pragmatic value of our network, we selected 130 hypotheses on “well-being” generated by our framework, comparing them with hypotheses crafted by novice experts (doctoral students in psychology) and the LLM models. The results are encouraging: Our algorithm matches the caliber of novice experts, outshining the hypotheses generated solely by the LLM models in novelty. Additionally, through deep semantic analysis, we demonstrated that our algorithm contains more profound conceptual incorporations and a broader semantic spectrum.

Our study advances the field of psychology in two significant ways. Firstly, it extracts invaluable causal knowledge from the literature and converts it to visual graphics. These aids can feed algorithms to help deduce more latent causal relations and guide models in generating a plethora of novel causal hypotheses. Secondly, our study furnishes novel tools and methodologies for causal analysis and scientific knowledge discovery, representing the seamless fusion of modern AI with traditional research methodologies. This integration serves as a bridge between conventional theory-driven methodologies in psychology and the emerging paradigms of data-centric research, thereby enriching our understanding of the factors influencing psychology, especially within the realm of social psychology.

Methodological framework for hypothesis generation

The proposed LLM-based causal graph (LLMCG) framework encompasses three steps: literature retrieval, causal pair extraction, and hypothesis generation, as illustrated in Fig. 1. In the literature-gathering phase, ~140k psychology-related articles were downloaded from public databases. In step two, GPT-4 was used to distil causal relationships from these articles, culminating in the creation of a causal relationship network based on 43,312 selected articles. In the third step, an in-depth examination of these data was executed, adopting link prediction algorithms to forecast the dynamics within the causal relationship network and identify concept pairs with high causal potential.

Figure 1. LLM stands for large language model; the LLMCG algorithm stands for the LLM-based causal graph algorithm, which includes the processes of literature retrieval, causal pair extraction, and hypothesis generation.

Step 1: Literature retrieval

The primary data source for this study was a public repository of scientific articles, the PMC Open Access Subset. Our decision to utilize this repository was informed by several key attributes that it possesses. The PMC Open Access Subset boasts an expansive collection of over 2 million full-text XML science and medical articles, providing a substantial and diverse base from which to derive insights for our research. Furthermore, the open-access nature of the articles not only enhances the transparency and reproducibility of our methodology, but also ensures that the results and processes can be independently accessed and verified by other researchers. Notably, the content within this subset originates from recognized journals, all of which have undergone rigorous peer review, lending credence to the quality and reliability of the data we leveraged. Finally, an added advantage was the rich metadata accompanying each article. These metadata were instrumental in refining our article selection process, ensuring coherent thematic alignment with our research objectives in the domains of psychology.

To identify articles relevant to our study, we applied a series of filtering criteria. First, the presence of certain keywords within article titles or abstracts was mandatory. Some examples of these keywords include “psychol”, “clin psychol”, and “biol psychol”. Second, we exploited the metadata accompanying each article. The classification of articles based on these metadata ensured alignment with recognized thematic standards in the domains of psychology and neuroscience. Upon the application of these criteria, we managed to curate a subset of approximately 140K articles that most likely discuss causal concepts in both psychology and neuroscience.
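
As a rough illustration of this filtering step, the sketch below applies the keyword and metadata checks to article records. The record fields ("title", "abstract", "subjects") are hypothetical stand-ins for the PMC Open Access metadata layout, not the actual schema.

```python
# Minimal sketch of the article filter; field names are assumptions.
KEYWORDS = ("psychol", "clin psychol", "biol psychol")
SUBJECTS = ("psychology", "neuroscience")

def is_relevant(article: dict) -> bool:
    """Keep an article if a keyword occurs in its title or abstract
    and its metadata classification matches the target domains."""
    text = (article.get("title", "") + " " + article.get("abstract", "")).lower()
    tags = [s.lower() for s in article.get("subjects", [])]
    return (any(kw in text for kw in KEYWORDS)
            and any(s in tag for s in SUBJECTS for tag in tags))

articles = [
    {"title": "A clin psychol study of stress", "abstract": "...", "subjects": ["Clinical Psychology"]},
    {"title": "Crop yields in 2020", "abstract": "...", "subjects": ["Agronomy"]},
]
relevant = [a for a in articles if is_relevant(a)]  # keeps only the first record
```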

Step 2: Causal pair extraction

The process of extracting causal knowledge from vast troves of scientific literature is intricate and multifaceted. Our methodology distils this complex process into four coherent steps, each serving a distinct purpose. (1) Article selection and cost analysis: determines the feasibility of processing a specific volume of articles, ensuring optimal resource allocation. (2) Text extraction and analysis: ensures the purity of the data entering our causal extraction phase by filtering out nonrelevant content. (3) Causal knowledge extraction: uses advanced language models to detect, classify, and standardize the causal relationships present in texts. (4) Graph database storage: facilitates structured storage, easy retrieval, and the possibility of advanced relational analyses for future research. This streamlined approach ensures accuracy, consistency, and scalability in our endeavor to understand the interplay of causal concepts in psychology and neuroscience.

Text extraction and cleaning

After a meticulous cost analysis, detailed in Appendix A, our selection process identified 43,312 articles. This selection was strategically based on the criterion that the journal title must incorporate the term “Psychol”, signifying direct relevance to the field of psychology. The distributions of publication sources and years can be found in Table 1. Extracting the full texts of the articles from their PDF sources was an essential initial step, and, for this purpose, the PyPDF2 Python library was used. This library allowed us to seamlessly extract and concatenate the title, abstract, and main content of each PDF article. However, a challenge arose from the presence of extraneous sections, such as references or tables, in the extracted texts. The implemented procedure, employing regular expressions in Python, was not only adept at identifying variations of the term “references” but also ascertained whether this section appeared as an isolated segment. This check was critical to ensure that the identified “references” section was indeed distinct, marking the start of a reference list without continuation into other text. Once it was identified as a standalone entity, the next step was to remove the reference section and its subsequent content.
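
A minimal sketch of this extraction-and-cleaning step is given below, assuming the PyPDF2 package; the exact regular expression is an illustrative reconstruction, not the authors' original pattern.

```python
import re
from PyPDF2 import PdfReader

def extract_body(pdf_path: str) -> str:
    """Concatenate the text of all pages, then truncate at the reference list."""
    reader = PdfReader(pdf_path)
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    # Match a variation of "references" only when it stands alone on a line,
    # i.e., as an isolated segment marking the start of the bibliography.
    match = re.search(r"^\s*(references|reference list|bibliography)\s*$",
                      text, flags=re.IGNORECASE | re.MULTILINE)
    return text[:match.start()] if match else text
```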

Causal knowledge extraction method

In our effort to extract causal knowledge, the choice of GPT-4 was not arbitrary. While several models were available for such tasks, GPT-4 emerged as a frontrunner due to its advanced capabilities (Wu et al., 2023), its extensive training on diverse data, and its proven proficiency in understanding context, especially in complex scientific texts (Cheng et al., 2023; Sanderson, 2023). Other models were indeed considered; however, the capacity of GPT-4 to generate coherent, contextually relevant responses gave our project an edge given its specific requirements.

The extraction process commenced with the segmentation of the articles. Due to the token constraints inherent to GPT-4, it was imperative to break down the articles into manageable chunks, specifically those of 4000 tokens or fewer. This approach ensured a comprehensive interpretation of the content without omitting any potential causal relationships. The next phase was prompt engineering. To effectively guide the extraction capabilities of GPT-4, we crafted explicit prompts. A testament to this meticulous engineering is demonstrated in a directive in which we asked the model to elucidate causal pairs in a predetermined JSON format. For a clearer understanding, readers are referred to Table 2 , which elucidates the example prompt and the subsequent model response. After extraction, the outputs were not immediately cataloged. A filtering process was initiated to ascertain the standardization of the concept pairs. This process weeded out suboptimal outputs. Aiding in this quality control, GPT-4 played a pivotal role in the verification of causal pairs, determining their relevance, causality, and ensuring correct directionality. Finally, while extracting knowledge, we were aware of the constraints imposed by the GPT-4 API. There was a conscious effort to ensure that we operated within the bounds of 60 requests and 150k tokens per minute. This interplay of prompt engineering and stringent filtering was productive.
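
The chunk-and-prompt loop can be sketched as follows, under stated assumptions: the prompt wording and JSON schema are placeholders for the actual directive shown in Table 2, and the client calls follow the public OpenAI Python SDK. Rate limiting and the post-hoc verification pass are omitted for brevity.

```python
import json
import tiktoken
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
enc = tiktoken.encoding_for_model("gpt-4")

# Placeholder prompt; the authors' actual directive appears in Table 2.
PROMPT = ('Extract every causal relationship in the text. Respond only with '
          'JSON: [{"cause": "...", "effect": "...", "direction": "..."}]\n\nTEXT:\n')

def chunks(text: str, limit: int = 4000):
    """Split an article into chunks of at most `limit` tokens."""
    tokens = enc.encode(text)
    for i in range(0, len(tokens), limit):
        yield enc.decode(tokens[i:i + limit])

def extract_causal_pairs(article_text: str) -> list:
    """Collect causal pairs from every chunk of one article."""
    pairs = []
    for chunk in chunks(article_text):
        reply = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": PROMPT + chunk}],
        )
        pairs.extend(json.loads(reply.choices[0].message.content))
    return pairs
```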

In addition, we conducted an exploratory study to assess GPT-4’s discernment between “causality” and “correlation”; it involved four graduate students (mean age 31 ± 10.23 years), each evaluating relationship pairs extracted from psychology articles familiar to them. The experimental details and results can be found in Appendix A and Table A1. The results showed that out of 289 relationships identified by GPT-4, 87.54% were validated. Notably, when GPT-4 classified relationships as causal, only 13.02% (31/238) were judged to be non-relationships, while 65.55% (156/238) were agreed upon as causal. This shows that GPT-4 can accurately extract relationships (causality or correlation) from psychological texts, underscoring its potential as a tool for the construction of causal graphs.

To enhance the robustness of the extracted causal relationships and minimize biases, we adopted a multifaceted approach. Recognizing the indispensable role of human judgment, we periodically subjected random samples of extracted causal relationships to the scrutiny of domain experts. Their valuable feedback was instrumental in fine-tuning the extraction process in real time. Instead of relying heavily on referenced hypotheses, our focus was on extracting causal pairs primarily from the findings mentioned in the main texts. This systematic methodology ultimately resulted in a refined text corpus distilled from 43,312 articles, which contained many conceptual insights and was primed for rigorous causal extraction.

Graph database storage

Our decision to employ Neo4j as the database system was strategic. Neo4j, as a graph database (Thomer and Wickett, 2020), is inherently designed to capture and represent complex relationships between data points, an attribute that is essential for understanding intricate causal relationships. Beyond its technical prowess, Neo4j provides advantages such as scalability, resilience, and efficient querying capabilities (Webber, 2012). It is particularly adept at traversing interconnected data points, making it an excellent fit for our causal relationship analysis. The mined causal knowledge finds its abode in the Neo4j graph database. Each causal concept is represented as a node, with the directionality and interpretation of each causal link stored as attributes of the relationship that ties the related concepts together. Storing the knowledge graph in Neo4j allows graph algorithms to be executed to analyze concept interconnectivity and reveal potential relationships.
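
A hedged sketch of this storage step, using the official neo4j Python driver, is shown below; the node label Concept, the relationship type CAUSES, the property names, and the connection details are illustrative assumptions.

```python
from neo4j import GraphDatabase

# Connection details are placeholders.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

CYPHER = """
MERGE (a:Concept {name: $cause})
MERGE (b:Concept {name: $effect})
MERGE (a)-[r:CAUSES]->(b)
SET r.interpretation = $interpretation
"""

def store_pair(cause: str, effect: str, interpretation: str = "") -> None:
    """MERGE keeps concepts unique, so repeated extractions accumulate
    onto the same nodes and edges rather than duplicating them."""
    with driver.session() as session:
        session.run(CYPHER, cause=cause, effect=effect,
                    interpretation=interpretation)

store_pair("mindfulness practice", "perceived stress",
           "negative causal link reported in the source article")
```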

The graph database contains 197k concepts and 235k connections. Table 3 encapsulates the core concepts and provides a vivid snapshot of the most recurring themes, helping us to understand the central topics that dominate the current psychological discourse. In a comprehensive examination of the core concepts extracted from 43,312 psychological papers, several distinct patterns and focal areas emerged. In particular, there is a clear balance between health and illness in psychological research. The prominence of terms such as “depression”, “anxiety”, and “symptoms of depression” magnifies the discipline’s commitment to understanding and addressing mental illnesses. However, juxtaposed against these are positive terms such as “life satisfaction” and “sense of happiness”, suggesting that psychology not only fixates on challenges but also delves deeply into the nuances of positivity and well-being. Furthermore, the significance given to concepts such as “life satisfaction”, “sense of happiness”, and “job satisfaction” underscores an increasing recognition of emotional well-being and job satisfaction as integral to overall mental health. Intertwining the realms of psychology and neuroscience, terms such as “microglial cell activation”, “cognitive impairment”, and “neurodegenerative changes” signal a growing interest in understanding the neural underpinnings of cognitive and psychological phenomena. In addition, the emphasis on “self-efficacy”, “positive emotions”, and “self-esteem” reflects a profound interest in understanding how self-perception and emotions influence human behavior and well-being. Concepts such as “age”, “resilience”, and “creativity” further expand the canvas, showcasing the eclectic and comprehensive nature of inquiries in the field of psychology.

Overall, this analysis paints a vivid picture of modern psychological research, illuminating its multidimensional approach. It demonstrates a discipline that is deeply engaged with both the challenges and triumphs of human existence, offering holistic insight into the human mind and its myriad complexities.

Step 3: Hypothesis generation using link prediction

In the quest to uncover novel causal relationships beyond direct extraction from texts, the technique of link prediction emerges as a pivotal methodology. It hinges on the premise of proposing potential causal ties between concepts that our knowledge graph does not explicitly connect. The process intricately weaves together vector embedding, similarity analysis, and probability-based ranking. Initially, concepts are transposed into a vector space using node2vec, which is valued for its ability to capture topological nuances. Here, every pair of unconnected concepts is assigned a similarity score, and pairs that do not meet a set benchmark are quickly discarded. As we dive deeper into the higher echelons of these scored pairs, the likelihood of their linkage is assessed using the Jaccard similarity of their neighboring concepts. Subsequently, these potential causal relationships are organized in descending order of their derived probabilities, and the elite pairs are selected.
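
Under stated assumptions, this pipeline can be sketched as follows; the node2vec and networkx packages are plausible concrete choices, and the file name, similarity benchmark, and hyperparameters are placeholders rather than the study's actual settings.

```python
import networkx as nx
from node2vec import Node2Vec

# Hypothetical export of the Neo4j causal graph to an edge list file.
G = nx.read_edgelist("causal_edges.txt")

# 1. Transpose concepts into a vector space; node2vec captures topology.
model = Node2Vec(G, dimensions=64, walk_length=30, num_walks=100, workers=2).fit()

def jaccard(u, v):
    """Jaccard similarity of the two concepts' neighborhoods."""
    nu, nv = set(G.neighbors(u)), set(G.neighbors(v))
    return len(nu & nv) / len(nu | nv) if nu | nv else 0.0

# 2. Score unconnected pairs by embedding similarity, discard weak ones,
#    then rank the survivors by neighborhood overlap.
candidates = []
for u, v in nx.non_edges(G):
    if model.wv.similarity(str(u), str(v)) > 0.8:   # placeholder benchmark
        candidates.append((u, v, jaccard(u, v)))

top_pairs = sorted(candidates, key=lambda c: c[2], reverse=True)[:130]
```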

An illustration of this approach is provided in the case highlighted in Figure A1. For instance, the behavioral inhibition system (BIS) exhibits ties to both the behavioral activation system (BAS) and the subsequent behavioral response of the BAS when encountering reward stimuli, termed the BAS reward response. Simultaneously, another concept, interference, finds itself bound to both the BAS and the BAS reward response. This configuration hints at a plausible link between the BIS and interference. Such highly probable causal pairs are not mere intellectual curiosities. They act as springboards, catalyzing the genesis of new experimental designs or research hypotheses ripe for empirical probing. In essence, this capability equips researchers with a cutting-edge instrument, empowering them to navigate the unexplored waters of the psychological and neurological domains.

Using pairs of highly probable causal concepts, we pushed GPT-4 to conjure novel causal hypotheses that bridge concepts. To further elucidate the process of this method, Table 4 provides some examples of hypotheses generated from the process. Such hypotheses, as exemplified in the last row, underscore the potential and power of our method for generating innovative causal propositions.

Hypotheses evaluation and results

In this section, we present an analysis focusing on the quality, in terms of novelty and usefulness, of the hypotheses generated. According to the existing literature, these dimensions are instrumental in encapsulating the essence of inventive ideas (Boden, 2009; McCarthy et al., 2018; Miron-Spektor and Beenen, 2015). These parameters have not only been quintessential for gauging creative concepts, but have also been adopted to evaluate the caliber of research hypotheses (Dowling and Lucey, 2023; Krenn and Zeilinger, 2020; Oleinik, 2019). Specifically, we evaluate the quality of the hypotheses generated by the proposed LLMCG algorithm in relation to those generated by PhD students from an elite university, who represent human junior experts; by the LLM model, which represents advanced AI systems; and by research ideas refined by psychological researchers, which represent cooperation between AI and humans.

The evaluation comprises three main stages. In the first stage, the hypotheses are generated by all contributors, with steps taken to ensure fairness and relevance for comparative analysis. In the second stage, the hypotheses from the first stage are independently and blindly reviewed by experts who represent the human academic community. These experts are asked to rate the hypotheses using a specially designed questionnaire to ensure statistical validity. The third stage delves deeper by transforming each research idea into the semantic space of Bidirectional Encoder Representations from Transformers (BERT) (Lee et al., 2023), allowing us to intricately analyze the intrinsic reasons behind the rating disparities among the groups. This semantic mapping not only pinpoints the nuanced differences, but also provides potential insights into the cognitive constructs of each hypothesis.

Evaluation procedure

Selection of the focus area for hypothesis generation

Selecting an appropriate focus area for hypothesis generation is crucial to ensure a balanced and insightful comparison of the hypothesis generation capacities of the various contributors. In this study, our goal is to gauge the quality of hypotheses derived from four distinct contributors, with measures in place to mitigate potential confounding variables that might skew the results among groups (Rubin, 2005). Our choice of domain is informed by two pivotal criteria: the intricacy and subtlety of the subject matter, and familiarity with the domain. It is essential that our chosen domain boasts sufficient complexity to prompt meaningful hypothesis generation and offer a robust assessment of both AI and human contributors’ depth of understanding and creativity. Furthermore, while human contributors should be well-acquainted with the domain, their expertise need not match the vast corpus knowledge of the AI.

In terms of overarching human pursuits such as the search for happiness, positive psychology distinguishes itself by avoiding narrowly defined, individual-centric challenges (Seligman and Csikszentmihalyi, 2000). This alignment with our selection criteria is epitomized by well-being, a salient concept within positive psychology, as shown in Table 3. Well-being, with its multidimensional essence encompassing emotional, psychological, and social facets, and its central stature in both research and practical applications of positive psychology (Diener et al., 2010; Fredrickson, 2001; Seligman and Csikszentmihalyi, 2000), becomes the linchpin of our evaluation. The growing importance of well-being in the current global context offers myriad novel avenues for hypothesis generation and theoretical advancement (Forgeard et al., 2011; Madill et al., 2022; Otu et al., 2020). Adding to our rationale, the Positive Psychology Research Center at Tsinghua University is a globally renowned hub for cutting-edge research in this domain. Leveraging this stature, we secured participation from specialized Ph.D. students, reinforcing positive psychology as the most fitting domain for our inquiry.

Hypotheses comparison

In our study, the generated psychological hypotheses were categorized into four distinct groups: two experimental groups and two control groups. The experimental groups encapsulate hypotheses generated by our algorithm, either selected at random or handpicked by experts from a pool of generated hypotheses. The control groups comprise research ideas meticulously crafted by doctoral students with substantial academic expertise in the domain, and hypotheses generated by representative LLMs. In the following, we elucidate the methodology and underlying rationale for each group:

LLMCG algorithm output (Random-selected LLMCG)

Following the requirement of generating hypotheses centred on well-being, the LLMCG algorithm crafted 130 unique hypotheses. These hypotheses were derived by LLMCG’s evaluation of the most likely causal relationships related to well-being that had not been previously documented in research literature datasets. From this refined pool, 30 research ideas were chosen at random for this experimental group. These hypotheses represent the algorithm’s ability to identify causal relationships and formulate pertinent hypotheses.

LLMCG expert-vetted hypotheses (Expert-selected LLMCG)

For this group, two seasoned psychological researchers, one male aged 47 and one female aged 46, each with in-depth expertise in the realm of positive psychology, conscientiously handpicked 30 of the most promising hypotheses from the refined pool, excluding those from the Random-selected LLMCG category. The selection criteria centered on a holistic understanding of both the novelty and practical relevance of each hypothesis. With illustrious postdoctoral journeys and robust portfolios of publications in positive psychology to their names, they rigorously sifted through the hypotheses, pinpointing those that showcased a perfect confluence of originality and actionable insight. These hypotheses were meticulously appraised for their relevance, structural coherence, and potential academic value, representing the nexus of machine intelligence and seasoned human discernment.

PhD students’ output (Control-Human)

We enlisted the expertise of 16 doctoral students from the Positive Psychology Research Center at Tsinghua University. Under the guidance of their supervisor, each student was provided with a questionnaire geared toward research on well-being. The participants were given a period of four working days to complete and return the questionnaire, which was distributed during a vacation period to ensure minimal external disruptions and commitments. The specific instructions provided in the questionnaire are detailed in Table B1, and each participant was asked to complete 3–4 research hypotheses. By the stipulated deadline, we had received responses from 13 doctoral students, with a mean age of 31.92 years (SD = 7.75 years), cumulatively presenting 41 hypotheses related to well-being. To maintain uniformity with the other groups, a random selection was made to shortlist 30 hypotheses for further analysis. These hypotheses reflect the integration of core theoretical concepts with the latest insights into the domain, presenting an academic interpretation rooted in their rigorous training and education. Including this group in our study not only provides a natural benchmark for human ingenuity and expertise but also underscores the invaluable contribution of human cognition in research ideation, serving as a pivotal contrast to AI-generated hypotheses. This juxtaposition illuminates the nuanced differences between human intellectual depth and AI’s analytical progress, enriching the comparative dimensions of our study.

Claude model output (Control-Claude)

This group exemplifies the pinnacle of current LLM technology in generating research hypotheses. Since LLMCG is a nascent technology, its assessment requires a comparative study with well-established counterparts, a key paradigm in comparative research. Currently, Claude-2 and GPT-4 represent the apex of AI technology. For example, Claude-2, with an accuracy rate of 54.4%, excels in reasoning and answering questions, substantially outperforming other models such as Falcon, Koala, and Vicuna, which have accuracy rates of 17.1–25.5% (Wu et al., 2023). To facilitate a more comprehensive evaluation of the new model by researchers and to increase the diversity and breadth of comparison, we chose Claude-2 as the control model. Using the detailed instructions provided in Table B2, Claude-2 was iteratively prompted to generate research hypotheses, producing ten hypotheses per prompt and culminating in a total of 50 hypotheses. Although the sheer number and range of these hypotheses accentuate the capabilities of Claude-2, a subsequent refinement was considered essential to ensure compatibility in complexity and depth across all groups. With minimal human intervention, GPT-4 was used to evaluate these 50 hypotheses and select the top 30 that exhibited the most innovative, relevant, and academically valuable insights. This process ensured the infusion of both the LLM’s analytical prowess and a layer of qualitative rigor, giving rise to a set of hypotheses that not only align with the overarching theme of well-being but also resonate with current academic discourse.

Hypotheses assessment

The assessment of the hypotheses encompasses two key components: the evaluation conducted by eminent psychology professors emphasizing novelty and utility, and the deep semantic analysis involving BERT and t-distributed stochastic neighbor embedding (t-SNE) visualization to discern semantic structures and disparities among hypotheses.

Human academic community

The review task was entrusted to three eminent psychology professors (all male, mean age = 42.33), each with a decade-long legacy of guiding doctoral and master’s students in positive psychology and editorial stints at renowned journals; their task was to conduct a meticulous evaluation of the 120 hypotheses. Importantly, to ensure unbiased evaluation, the hypotheses were presented to them in a completely randomized order in the questionnaire.

Our emphasis was undeniably anchored to two primary tenets: novelty and utility (Cohen, 2017 ; Shardlow et al., 2018 ; Thompson and Skau, 2023 ; Yu et al., 2016 ), as shown in Table B3 . Utility in hypothesis crafting demands that our propositions extend beyond mere factual accuracy; they must resonate deeply with academic investigations, ensuring substantial practical implications. Given the inherent challenges of research, marked by constraints in time, manpower, and funding, it is essential to design hypotheses that optimize the utilization of these resources. On the novelty front, we strive to introduce innovative perspectives that have the power to challenge and expand upon existing academic theories. This not only propels the discipline forward but also ensures that we do not inadvertently tread on ground already covered by our contemporaries.

Deep semantic analysis

While human evaluations provide invaluable insight into the novelty and utility of hypotheses, to objectively discern and visualize semantic structures and the disparities among them, we turn to the realm of deep learning. Specifically, we employ the power of BERT (Devlin et al., 2018). BERT, as highlighted by Lee et al. (2023), has remarkable potential for assessing the innovation of ideas. By translating each hypothesis into a high-dimensional vector in the BERT domain, we obtain the profound semantic core of each statement. However, such granularity in dimensions presents challenges when aiming for visualization.

To alleviate this, and to intuitively understand the clustering and dispersion of these hypotheses in semantic space, we deploy the t-SNE (t-distributed stochastic neighbor embedding) technique (Van der Maaten and Hinton, 2008), which is adept at reducing the dimensionality of the data while preserving the relative pairwise distances between items. Thus, when we map our BERT-encoded hypotheses onto a 2D t-SNE plane, we gain an immediate visual grasp of how closely or distantly related the hypotheses are in terms of their semantic content. Our intent is twofold: to understand the semantic terrains carved out by the different groups, and to infer the potential reasons why some hypotheses garnered heightened novelty or utility ratings from experts. The convergence of human evaluations and semantic layouts, as delineated by Algorithm 1 in Appendix B, reveals the interplay between human intuition and the inherent semantic structure of the hypotheses.
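
A minimal sketch of this BERT-plus-t-SNE pipeline is given below, assuming the HuggingFace transformers and scikit-learn libraries; the bert-base-uncased checkpoint, the mean-pooling choice, and the example hypotheses are illustrative, since the paper does not specify them here.

```python
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.manifold import TSNE

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

hypotheses = [  # placeholder stand-ins for the 120 rated hypotheses
    "Community gardening increases subjective well-being.",
    "Virtual resilience training buffers daily stress.",
    "Robot companionship raises heart rate variability.",
    "Gratitude journaling improves job satisfaction.",
]

batch = tokenizer(hypotheses, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    hidden = bert(**batch).last_hidden_state            # (n, seq_len, 768)

# Mean-pool token vectors, ignoring padding, to get one vector per hypothesis.
mask = batch["attention_mask"].unsqueeze(-1)
vectors = (hidden * mask).sum(1) / mask.sum(1)          # (n, 768)

# Project to 2D while preserving pairwise neighborhood structure.
coords = TSNE(n_components=2, perplexity=3,
              random_state=0).fit_transform(vectors.numpy())
```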

Qualitative analysis by topic analysis

To better understand the underlying thought processes and the topical emphasis of both PhD students and the LLMCG model, qualitative analyses were performed using visual tools such as word clouds and connection graphs, as detailed in Appendix B . The word cloud, as a graphical representation, effectively captures the frequency and importance of terms, providing direct visualization of the dominant themes. Connection graphs, on the other hand, elucidate the relationships and interplay between various themes and concepts. Using these visual tools, we aimed to achieve a more intuitive and clear representation of the data, allowing for easy comparison and interpretation.
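
As a rough illustration, a word cloud for one group can be produced as below, assuming the wordcloud and matplotlib packages; the input text is a placeholder for a group's concatenated hypotheses, not the study's actual data.

```python
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Placeholder for the concatenated hypotheses of one group.
group_text = ("robot companionship well-being AI interaction loneliness "
              "heart rate variability social connectedness community")

cloud = WordCloud(width=800, height=400,
                  background_color="white").generate(group_text)
plt.imshow(cloud, interpolation="bilinear")  # term size tracks frequency
plt.axis("off")
plt.show()
```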

Observations drawn from both the word clouds and the connection graphs in Figures B1 and B2 provide us with a rich tapestry of insights into the thought processes and priorities of Ph.D. students and the LLMCG model. For instance, the emphasis in the Control-Human word cloud on terms such as “robot” and “AI” indicates a strong interest among Ph.D. students in the nexus between technology and psychology. It is particularly fascinating to see a group of academically trained individuals focusing on the real-world implications and intersections of their studies, as shown by their apparent draw toward trending topics. This not only underscores their adaptability but also emphasizes the importance of contextual relevance. Conversely, the LLMCG groups, particularly the Expert-selected LLMCG group, emphasize community, collective experiences, and the nuances of social interconnectedness. This denotes a deep-rooted understanding and application of higher-order social psychological concepts, reflecting the model’s ability to dive deep into the intricate layers of human social behavior.

Furthermore, the connection graphs support these observations. The Control-Human graph, with its exploration of themes such as “Robot Companionship” and its relation to factors such as “heart rate variability (HRV)”, demonstrates a confluence of technology and human well-being. The other groups, especially the Random-selected LLMCG group, yield themes that are more societal and structural, hinting at broader determinants of individual well-being.

Analysis of human evaluations

To quantify the agreement among raters, we employed Spearman correlation coefficients. The results, as shown in Table B5, reveal a spectrum of agreement levels between the reviewer pairs, showcasing the subjective dimension intrinsic to evaluating novelty and usefulness. In particular, the correlations between reviewer 1 and reviewer 2 on novelty (Spearman r = 0.387, p < 0.0001) and between reviewer 2 and reviewer 3 on usefulness (Spearman r = 0.376, p < 0.0001) suggest a meaningful level of consensus, particularly highlighting the reviewers’ capacity to identify valuable insights when evaluating hypotheses.
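
A minimal sketch of this agreement computation with scipy follows; the rating vectors are hypothetical stand-ins for the reviewers' actual scores.

```python
from scipy.stats import spearmanr

# Hypothetical novelty ratings from two reviewers over the same hypotheses.
reviewer_1 = [4, 2, 5, 3, 1, 4, 2, 5]
reviewer_2 = [3, 2, 4, 4, 2, 5, 1, 4]

rho, p = spearmanr(reviewer_1, reviewer_2)
print(f"Spearman r = {rho:.3f}, p = {p:.4f}")
```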

The variations in correlation values, such as that between reviewer 2 and reviewer 3 (r = 0.069, p = 0.453), can be attributed to the diverse research orientations and backgrounds of the reviewers. Reviewer 1 focuses on social ecology, reviewer 3 specializes in neuroscientific methodologies, and reviewer 2 integrates various views using technologies such as virtual reality and computational methods. In our evaluation, we present specific hypothesis cases to illustrate the differing perspectives between reviewers, as detailed in Table B4 and Figure B3. For example, C5 introduces the novel concept of “Virtual Resilience”. Reviewers 1 and 3 highlighted its originality and utility, while reviewer 2 rated it lower in both categories. Meanwhile, C6, which focuses on social neuroscience, resonated with reviewer 3, while reviewers 1 and 2 only partially affirmed it. These differences underscore the complexity of evaluating scientific contributions and highlight the importance of considering a range of expert opinions for a comprehensive evaluation.

This assessment is divided into two main sections: novelty analysis and usefulness analysis.

Novelty analysis

In the dynamic realm of scientific research, measuring and analyzing novelty is gaining paramount importance (Shin et al., 2022). ANOVA was used to analyze the novelty scores presented in Fig. 2a, and we identified a significant influence of the group factor on the mean novelty score across reviewers. Initially, z-scores were calculated for each reviewer’s ratings to standardize the scoring scale, and these were then averaged. The distinct differences between the groups, as visualized in the boxplots, are statistically underpinned by the results in Table 5. The ANOVA results revealed a pronounced effect of the grouping factor (F(3,116) = 6.92, p = 0.0002), with the variance explained by the grouping factor (R-squared) at 15.19%.
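
The analysis pipeline can be sketched as follows, assuming numpy and scipy with synthetic placeholder ratings; the study's own layout (3 reviewers, 4 groups of 30 hypotheses) is retained, but the data, the independent t-test, and the Bonferroni factor of six are illustrative choices rather than the authors' exact procedure.

```python
from itertools import combinations

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
ratings = rng.normal(size=(3, 120))   # placeholder: 3 reviewers x 120 hypotheses

# Standardize each reviewer's scale via z-scores, then average per hypothesis.
z = (ratings - ratings.mean(axis=1, keepdims=True)) / ratings.std(axis=1, keepdims=True)
mean_novelty = z.mean(axis=0)

groups = np.split(mean_novelty, 4)    # 4 groups of 30 hypotheses each
F, p = stats.f_oneway(*groups)
print(f"F(3,116) = {F:.2f}, p = {p:.4f}")

# Bonferroni-corrected pairwise t-tests over the six group pairs.
for i, j in combinations(range(4), 2):
    t, p_raw = stats.ttest_ind(groups[i], groups[j])
    print(f"group {i} vs {j}: t = {t:.2f}, p_adj = {min(p_raw * 6, 1.0):.3f}")
```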

Figure 2. Box plots on the left (a and b) depict distributions of novelty and usefulness scores, respectively, while smoothed line plots on the right show novelty and usefulness scores in descending order, subjected to a moving average with a window size of 2. * denotes p < 0.05, ** denotes p < 0.01.

Further pairwise comparisons using the Bonferroni method, as delineated in Table 5 and visually corroborated by Fig. 2a, discerned significant disparities between Random-selected LLMCG and Control-Claude (t(59) = 3.34, p = 0.007) and between Control-Human and Control-Claude (t(59) = 4.32, p < 0.001). The Cohen’s d values of 0.8809 and 1.1192, respectively, indicate that the novelty scores for the Random-selected LLMCG and Control-Human groups are significantly higher than those for the Control-Claude group. Additionally, the cumulative distribution plots to the right of Fig. 2a reveal the distributional characteristics of the novelty scores. For example, the Expert-selected LLMCG curve portrays a greater concentration in the middle score range compared to the Control-Claude curve, but dominates in the high novelty scores (highlighted in the dashed rectangle). Moreover, comparisons involving Control-Human with both Random-selected LLMCG and Expert-selected LLMCG did not manifest statistically significant variances, indicating aligned novelty perceptions among these groups. Finally, the comparison between Expert-selected LLMCG and Control-Claude (t(59) = 2.49, p = 0.085) suggests a trend toward significance, with a Cohen’s d value of 0.6226 indicating generally higher novelty scores for Expert-selected LLMCG compared to Control-Claude.

To mitigate potential biases due to individual reviewer inclinations, we expanded our evaluation to include both the median and maximum z-scores from the three reviewers for each hypothesis. These multifaceted analyses enhance the robustness of our results by minimizing the influence of extreme values and potential outliers. First, when analyzing the median novelty scores, the ANOVA test demonstrated a notable association with the grouping factor (F(3,116) = 6.54, p = 0.0004), which explained 14.41% of the variance. As illustrated in Table 5, pairwise evaluations revealed significant disparities between Control-Human and Control-Claude (t(59) = 4.01, p = 0.001), with Control-Human scoring significantly higher than Control-Claude (Cohen’s d = 1.1031). Similarly, there were significant differences between Random-selected LLMCG and Control-Claude (t(59) = 3.40, p = 0.006), where Random-selected LLMCG also significantly outperformed Control-Claude (Cohen’s d = 0.8875). Interestingly, the comparison of Expert-selected LLMCG with Control-Claude (t(59) = 1.70, p = 0.550) and the other group pairings did not reveal statistically significant differences.

Subsequently, turning our attention to the maximum novelty scores provided crucial insights, especially where outlier scores may carry significant weight. The influence of the grouping factor was evident (F(3,116) = 7.20, p = 0.0002), with an explained variance of 15.70%. In particular, clear differences emerged between Control-Human and Control-Claude (t(59) = 4.36, p < 0.001) and between Random-selected LLMCG and Control-Claude (t(59) = 3.47, p = 0.004). A particularly intriguing observation was the significant difference between Expert-selected LLMCG and Control-Claude (t(59) = 3.12, p = 0.014). The Cohen’s d values of 1.1637, 1.0457, and 0.6987, respectively, indicate that the novelty scores for the Control-Human, Random-selected LLMCG, and Expert-selected LLMCG groups are significantly higher than those for the Control-Claude group. Together, these analyses offer a multifaceted perspective on novelty evaluations. Specifically, the results of the median analysis echo and support those of the mean, reinforcing the reliability of our assessments. The significance discerned between Control-Claude and Expert-selected LLMCG in the maximum-score data emphasizes the intricate differences, while also pointing to broader congruence in novelty perceptions.

Usefulness analysis

Evaluating the practical impact of hypotheses is crucial in scientific research assessments. For mean usefulness scores, the grouping factor did not exert a significant influence (F(3,116) = 5.25, p = 0.553). Figure 2b presents the utility score distributions across groups. The narrow interquartile range of Control-Human suggests a relatively consistent assessment among reviewers. On the other hand, the spread and outliers in the Control-Claude distribution hint at varied utility perceptions. Both LLMCG groups cover a broad score range, demonstrating a mixture of high and low utility scores, while the Expert-selected LLMCG gravitates more toward higher usefulness scores. The smoothed line plots accompanying Fig. 2b further detail the score densities. For instance, Random-selected LLMCG boasts several high utility scores, counterbalanced by a smattering of low scores. Interestingly, the distributions for Control-Human and Expert-selected LLMCG appear closely aligned. While mean utility scores provide an overarching view, the nuances within the boxplots and smoothed plots offer deeper insights. This comprehensive understanding can guide future endeavors in content generation and evaluation, spotlighting key areas of focus and potential improvements.

Comparison between the LLMCG and GPT-4

To evaluate the impact of integrating a causal graph with GPT-4, we performed an ablation study comparing the hypotheses generated by GPT-4 alone and those of the proposed LLMCG framework. For this experiment, 60 hypotheses were created using GPT-4, following the detailed instructions in Table B2 . Furthermore, 60 hypotheses for the LLMCG group were randomly selected from the remaining pool of 70 hypotheses. Subsequently, both sets of hypotheses were assessed by three independent reviewers for novelty and usefulness, as previously described.

Table 6 shows a comparison between the GPT-4 and LLMCG groups, highlighting a significant difference in novelty scores (mean value: t (119) = 6.60, p  < 0.0001) but not in usefulness scores (mean value: t (119) = 1.31, p  = 0.1937). This indicates that the LLMCG framework significantly enhances hypothesis novelty (all Cohen’s d  > 1.1) without affecting usefulness compared to the GPT-4 group. Figure B6 visually contrasts these findings, underlining the causal graph’s unique role in fostering novel hypothesis generation when integrated with GPT-4.

The t-SNE visualizations (Fig. 3) illustrate the semantic relationships between the different groups, capturing the patterns of novelty and usefulness. Notably, distinct clustering among the PhD students suggests shared academic influences, while the LLMCG groups display broader topic dispersion, hinting at a wider semantic understanding. The size of the bubbles reflects the novelty and usefulness scores, emphasizing the diverse perceptions of what is considered innovative versus beneficial. Additionally, the numbers near the yellow dots represent participant IDs, showing that hypotheses from the same participant, such as H05 or H06, are semantically closely aligned. In Fig. B4, a distinct clustering of examples is observed, particularly highlighting the close proximity of hypotheses C3, C4, and C8 within the semantic space. This observation is further elucidated in Appendix B, enhancing the comprehension of BERT’s semantic representation. Rather than depending solely on superficial textual descriptions, this analysis penetrates the underlying understanding of concepts within the semantic space, a topic also explored in recent research (Johnson et al., 2023).

Figure 3. Comparison of (a) novelty and (b) usefulness scores (bubble size scaled by 100) among the different groups.

In the distribution of semantic distances (Fig. 4), we observed that the Control-Human group exhibits a distinctively greater semantic distance in comparison to the other groups, emphasizing its unique semantic orientations. The statistical support for this observation is derived from the ANOVA results, with a significant F-statistic (F(3,1652) = 84.1611, p < 0.00001) underscoring the impact of the grouping factor. This factor explains a remarkable 86.96% of the variance, as indicated by the R-squared value. Multiple comparisons, as shown in Table 7, further elucidate the subtleties of these group differences. Control-Human and Control-Claude exhibit a significant contrast in their semantic distances, as highlighted by the t value of 16.41 and the adjusted p value (< 0.0001). This difference indicates distinct thought patterns or emphases in the two groups. Notably, Control-Human demonstrates a greater semantic distance (Cohen’s d = 1.1630). Similarly, a comparison of the Control-Claude and LLMCG models reveals pronounced differences (Cohen’s d > 0.9), more so with the Expert-selected LLMCG (p < 0.0001). A comparison of Control-Human with the LLMCG models shows divergent semantic orientations, with significantly larger distances than Random-selected LLMCG (p = 0.0036) and a trend toward difference with Expert-selected LLMCG (p = 0.0687). Intriguingly, the two LLMCG groups—Random-selected and Expert-selected—exhibit similar semantic distances, as evidenced by a nonsignificant p value of 0.4362. Furthermore, the significant distinctions we observed, particularly between Control-Human and the other groups, align with the human evaluations of novelty. This coherence indicates that the BERT space representation, coupled with statistical analyses, can effectively mimic human judgment. Such results underscore the potential of this approach for automated hypothesis testing, paving the way for more efficient and streamlined semantic evaluations in the future.

Figure 4. Distribution of semantic distances across groups. Note: ** denotes p < 0.01, **** denotes p < 0.0001.

In general, visual and statistical analyses reveal the nuanced semantic landscapes of each group. While the Ph.D. students’ shared background influences their clustering, the machine models exhibit a comprehensive grasp of topics, emphasizing the intricate interplay of individual experiences, academic influences, and algorithmic understanding in shaping semantic representations.

This investigation carried out a detailed evaluation of the various hypothesis contributors, blending quantitative and qualitative analyses. In terms of topic analysis, distinct variations were observed between Control-Human and LLMCG, with the latter presenting more expansive thematic coverage. In the human evaluation, hypotheses from Ph.D. students paralleled those of the LLMCG in novelty, reinforcing AI’s growing competence in mirroring human innovative thinking. Furthermore, when juxtaposed with AI models such as Control-Claude, the LLMCG exhibited increased novelty. Deep semantic analysis via t-SNE and BERT representations allowed us to intuitively grasp the semantic essence of the hypotheses, signaling the possibility of future automated hypothesis assessments. Interestingly, LLMCG appeared to encompass broader complementary domains compared to human input. Taken together, these findings highlight the emerging role of AI in hypothesis generation and provide key insights into hypothesis evaluation across diverse origins.

General discussion

This research delves into the synergistic relationship between LLMs and causal graphs in the hypothesis generation process. Our findings underscore the ability of LLMs, when integrated with causal graph techniques, to produce meaningful hypotheses with increased efficiency and quality. By centering our investigation on “well-being”, we emphasize its pivotal role in psychological studies and highlight the potential convergence of technology and society. A multifaceted assessment approach, combining topic analysis, human evaluation, and deep semantic analysis, demonstrates that AI-augmented methods not only outshine LLM-only techniques in generating hypotheses of superior novelty, with quality on par with human expertise, but also boast the capability for more profound conceptual incorporations and a broader semantic spectrum. Such a multifaceted lens of assessment introduces a novel perspective for the scholarly realm, equipping researchers with an enriched understanding and an innovative toolset for hypothesis generation. At its core, the melding of LLMs and causal graphs signals a promising frontier, especially for dissecting cornerstone psychological constructs such as “well-being”. This marriage of methodologies, enriched by the comprehensive assessment angle, deepens our comprehension of both the immediate and broader ramifications of our research endeavors.

The prominence of causal graphs in psychology is profound: they offer researchers a unified platform for synthesizing and hypothesizing across diverse psychological realms (Borsboom et al., 2021; Uleman et al., 2021). Our study echoes this, producing groundbreaking hypotheses comparable in depth to early expert propositions. Deep semantic analysis bolstered these findings, emphasizing that our hypotheses have distinct cross-disciplinary merits, particularly when compared with those of individual doctoral scholars. However, the traditional use of causal graphs in psychology is demanding, often requiring insights from multiple experts (Crielaard et al., 2022). Our research harnesses the LLM's causal extraction, automating causal pair derivation and, in turn, minimizing the need for extensive expert input. The union of the causal graphs' systematic approach with AI-driven creativity, as seen with LLMs, paves the way for the future of psychological inquiry. Thanks to advancements in AI, barriers once created by causal graphs' intricate procedures are being dismantled. Furthermore, as the era of big data dawns, the integration of AI and causal graphs in psychology not only augments research capabilities but also brings into focus the broader implications for society. This fusion provides a nuanced understanding of intricate sociopsychological dynamics, emphasizing the importance of adapting research methodologies in tandem with technological progress.

In the realm of research, LLMs serve a unique purpose, often acting as the baseline against which newer methods and approaches are assessed. The productivity enhancements demonstrated by generative AI tools, as evidenced by Noy and Zhang (2023), indicate the potential of such LLMs. In our investigation, we pitted the hypotheses generated by these substantial models against our integrated LLMCG approach. Intriguingly, while the LLMs showcased admirable practicality in their hypotheses, they lagged substantially behind the doctoral student and LLMCG groups in innovation. This divergence can be attributed to the causal network curated from 43k research papers, which funnels the vast knowledge reservoir of the LLM squarely into the realm of scientific psychology. The increased precision in hypothesis generation by these models fits well within the framework of generative networks: Tong et al. (2021) highlighted that, by integrating structured constraints, conventional neural networks can accurately generate semantically relevant content. One salient merit of the causal graph in this context is its ability to alleviate the ambiguity and interpretability challenges inherent to LLMs. By providing a systematic and structured framework, the causal graph helps unearth the underlying logic and rationale of the outputs generated by LLMs. Notably, this finding echoes the perspective of Pan et al. (2024), where integrating structured knowledge from knowledge graphs was shown to lend an invaluable layer of clarity and interpretability to LLMs, especially in complex reasoning tasks. Such structured approaches not only boost researchers' confidence in the derived hypotheses but also augment the transparency and understandability of LLM outputs. In essence, leveraging causal graphs may well herald a new era in model interpretability, serving as a conduit to unlock the black box that large models often represent in contemporary research.
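The graph side of this pipeline can be illustrated with a minimal sketch. It assumes an LLM has already extracted (cause, effect) pairs from paper abstracts; the pairs are assembled into a directed graph, and unlinked transitive paths are surfaced as candidate hypotheses. The concept names are invented, and the candidate rule is a simplification of whatever link-selection strategy the authors actually used.

import networkx as nx

# Hypothetical (cause, effect) pairs, standing in for LLM output over abstracts.
extracted_pairs = [
    ("social support", "well-being"),
    ("gratitude", "social support"),
    ("sleep quality", "emotion regulation"),
    ("emotion regulation", "well-being"),
]

G = nx.DiGraph()
G.add_edges_from(extracted_pairs)

# Candidate hypotheses: links the literature implies but never states directly,
# i.e. A -> B -> C with no existing A -> C edge.
candidates = [
    (a, c)
    for a in G.nodes
    for b in G.successors(a)
    for c in G.successors(b)
    if a != c and not G.has_edge(a, c)
]
print(candidates)

Running this prints [('gratitude', 'well-being'), ('sleep quality', 'well-being')], the two indirect links this toy literature suggests but does not state.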

In the ever-evolving tapestry of research, every advancement invariably comes with its own constraints, and our study was no exception. On the technical front, a pivotal challenge stemmed from the opaque inner workings of GPT. Determining the exact machinations within GPT that lead to the formation of specific causal pairs remains elusive, reintroducing the age-old issue of AI's inherent lack of transparency (Buruk, 2023; Cao and Yousefzadeh, 2023). This opacity is magnified in our sparse causal graph, which, while expansive, is occasionally riddled with concepts that are semantically distinct yet convergent in meaning. In tangible applications, careful and meticulous algorithmic evaluation would be imperative to construct an accurate psychological conceptual landscape. Psychology, which bridges the humanities and natural sciences, continuously aims to unravel human cognition and behavior (Hergenhahn and Henley, 2013). Despite the dominance of traditional methodologies (Henrich et al., 2010; Shah et al., 2015), the present data-centric era amplifies the synergy of technology and the humanities, resonating with Hasok Chang's vision of enriched science (Chang, 2007). This symbiosis is evident when assessing structural holes in social networks (Burt, 2004) and viewing novelty as a bridge across these divides (Foster et al., 2021). Such perspectives emphasize the importance of thorough algorithmic assessment, highlighting potential avenues in humanities research, especially when incorporating large language models for innovative hypothesis crafting and verification.

However, this research has some limitations. First, we acknowledge that the constructed causal relationship graph has potential inaccuracies, with ~13% of relationship pairs not aligning with human expert estimations. Improving relationship extraction could increase the accuracy of the causal graph, potentially leading to more robust hypotheses. Second, our validation process covered only 130 hypotheses, whereas the vastness of our conceptual landscape suggests countless possibilities; the twenty pivotal psychological concepts highlighted in Table 3 alone could spawn an extensive array of hypotheses, and validating that wider space would invite far more speculation than any single study can resolve. A striking observation during validation was the inconsistency in the evaluations of the senior expert panels (as shown in Table B5). This underscores a pivotal insight: our integration of AI has shifted the dependency on scarce expert resources from hypothesis generation to evaluation. In the future, rigorous evaluations ensuring both novelty and utility could become a focal point of exploration. The promising path forward requires a thoughtful integration of technological innovation and human expertise to fully realize the potential suggested by our study.

In conclusion, our research provides pioneering insight into the symbiotic fusion of LLMs, epitomized by GPT, and causal graphs for psychological hypothesis generation, with particular emphasis on "well-being". Importantly, as highlighted by Cao and Yousefzadeh (2023), ensuring a synergistic alignment between domain knowledge and AI extrapolation is crucial. This synergy keeps AI models within their conceptual limits, thus bolstering the validity and reliability of the hypotheses generated. Our approach intricately interweaves the advanced capabilities of LLMs with the methodological prowess of causal graphs, refining both the depth and the precision of hypothesis generation. The causal graph, of paramount importance in psychology for its cross-disciplinary potential, typically demands extensive expert involvement. Our innovative approach addresses this by exploiting the LLM's exceptional causal extraction abilities, effectively shifting intensive expert engagement from hypothesis creation to evaluation. Our methodology therefore combines LLMs with causal graphs, propelling psychological research forward by improving hypothesis generation and offering tools that blend theoretical and data-centric approaches. This synergy particularly enriches our understanding of social psychology's complex dynamics, such as happiness research, demonstrating the profound impact of integrating AI with traditional research frameworks.

Data availability

The data generated and analyzed in this study are partially available within the Supplementary materials . For additional data supporting the findings of this research, interested parties may contact the corresponding author, who will provide the information upon receiving a reasonable request.

Battleday RM, Peterson JC, Griffiths TL (2020) Capturing human categorization of natural images by combining deep networks and cognitive models. Nat Commun 11(1):5418


Bechmann A, Bowker GC (2019) Unsupervised by any other name: hidden layers of knowledge production in artificial intelligence on social media. Big Data Soc 6(1):2053951718819569


Binz M, Schulz E (2023) Using cognitive psychology to understand GPT-3. Proc Natl Acad Sci 120(6):e2218523120


Boden MA (2009) Computer models of creativity. AI Mag 30(3):23–23


Borsboom D, Deserno MK, Rhemtulla M, Epskamp S, Fried EI, McNally RJ (2021) Network analysis of multivariate data in psychological science. Nat Rev Methods Prim 1(1):58


Burt RS (2004) Structural holes and good ideas. Am J Sociol 110(2):349–399

Buruk O (2023) Academic writing with GPT-3.5: reflections on practices, efficacy and transparency. arXiv preprint arXiv:2304.11079

Cao X, Yousefzadeh R (2023) Extrapolation and AI transparency: why machine learning models should reveal when they make decisions beyond their training. Big Data Soc 10(1):20539517231169731

Chang H (2007) Scientific progress: beyond foundationalism and coherentism. R Inst Philos Suppl 61:1–20

Cheng K, Guo Q, He Y, Lu Y, Gu S, Wu H (2023) Exploring the potential of GPT-4 in biomedical engineering: the dawn of a new era. Ann Biomed Eng 51:1645–1653


Cichy RM, Khosla A, Pantazis D, Torralba A, Oliva A (2016) Comparison of deep neural networks to spatio-temporal cortical dynamics of human visual object recognition reveals hierarchical correspondence. Sci Rep 6(1):27755


Cohen BA (2017) How should novelty be valued in science? Elife 6:e28699


Crielaard L, Uleman JF, Châtel BD, Epskamp S, Sloot P, Quax R (2022) Refining the causal loop diagram: a tutorial for maximizing the contribution of domain expertise in computational system dynamics modeling. Psychol Methods 29(1):169–201


Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp 4171–4186

Diener E, Wirtz D, Tov W, Kim-Prieto C, Choi D-W, Oishi S, Biswas-Diener R (2010) New well-being measures: short scales to assess flourishing and positive and negative feelings. Soc Indic Res 97:143–156

Dowling M, Lucey B (2023) ChatGPT for (finance) research: the Bananarama conjecture. Financ Res Lett 53:103662

Forgeard MJ, Jayawickreme E, Kern ML, Seligman ME (2011) Doing the right thing: measuring wellbeing for public policy. Int J Wellbeing 1(1):79–106

Foster J G, Shi F & Evans J (2021) Surprise! Measuring novelty as expectation violation. SocArXiv

Fredrickson BL (2001) The role of positive emotions in positive psychology: The broaden-and-build theory of positive emotions. Am Psychol 56(3):218

Gu Q, Kuwajerwala A, Morin S, Jatavallabhula K M, Sen B, Agarwal, A et al. (2024) ConceptGraphs: open-vocabulary 3D scene graphs for perception and planning. In 2nd Workshop on Language and Robot Learning: Language as Grounding

Henrich J, Heine SJ, Norenzayan A (2010) Most people are not WEIRD. Nature 466(7302):29–29


Hergenhahn B R, Henley T (2013) An introduction to the history of psychology . Cengage Learning

Jaccard J, Jacoby J (2019) Theory construction and model-building skills: a practical guide for social scientists . Guilford publications

Johnson DR, Kaufman JC, Baker BS, Patterson JD, Barbot B, Green AE (2023) Divergent semantic integration (DSI): Extracting creativity from narratives with distributional semantic modeling. Behav Res Methods 55(7):3726–3759

Kıcıman E, Ness R, Sharma A & Tan C (2023) Causal reasoning and large language models: opening a new frontier for causality. arXiv preprint arXiv:2305.00050

Koehler DJ (1994) Hypothesis generation and confidence in judgment. J Exp Psychol Learn Mem Cogn 20(2):461–469

Krenn M, Zeilinger A (2020) Predicting research trends with semantic and neural networks with an application in quantum physics. Proc Natl Acad Sci 117(4):1910–1916

Lee H, Zhou W, Bai H, Meng W, Zeng T, Peng K & Kumada T (2023) Natural language processing algorithms for divergent thinking assessment. In: Proc IEEE 6th Eurasian Conference on Educational Innovation (ECEI) p 198–202

Madill A, Shloim N, Brown B, Hugh-Jones S, Plastow J, Setiyawati D (2022) Mainstreaming global mental health: Is there potential to embed psychosocial well-being impact in all global challenges research? Appl Psychol Health Well-Being 14(4):1291–1313

McCarthy M, Chen CC, McNamee RC (2018) Novelty and usefulness trade-off: cultural cognitive differences and creative idea evaluation. J Cross-Cult Psychol 49(2):171–198

McGuire WJ (1973) The yin and yang of progress in social psychology: seven koan. J Personal Soc Psychol 26(3):446–456

Miron-Spektor E, Beenen G (2015) Motivating creativity: The effects of sequential and simultaneous learning and performance achievement goals on product novelty and usefulness. Organ Behav Hum Decis Process 127:53–65

Nisbett RE, Peng K, Choi I, Norenzayan A (2001) Culture and systems of thought: holistic versus analytic cognition. Psychol Rev 108(2):291–310


Noy S, Zhang W (2023) Experimental evidence on the productivity effects of generative artificial intelligence. Science 381:187–192

Oleinik A (2019) What are neural networks not good at? On artificial creativity. Big Data Soc 6(1):2053951719839433

Otu A, Charles CH, Yaya S (2020) Mental health and psychosocial well-being during the COVID-19 pandemic: the invisible elephant in the room. Int J Ment Health Syst 14:1–5

Pan S, Luo L, Wang Y, Chen C, Wang J & Wu X (2024) Unifying large language models and knowledge graphs: a roadmap. IEEE Transactions on Knowledge and Data Engineering 36(7):3580–3599

Rubin DB (2005) Causal inference using potential outcomes: design, modeling, decisions. J Am Stat Assoc 100(469):322–331


Sanderson K (2023) GPT-4 is here: what scientists think. Nature 615(7954):773

Seligman ME, Csikszentmihalyi M (2000) Positive psychology: an introduction. Am Psychol 55(1):5–14

Shah DV, Cappella JN, Neuman WR (2015) Big data, digital media, and computational social science: possibilities and perils. Ann Am Acad Political Soc Sci 659(1):6–13

Shardlow M, Batista-Navarro R, Thompson P, Nawaz R, McNaught J, Ananiadou S (2018) Identification of research hypotheses and new knowledge from scientific literature. BMC Med Inform Decis Mak 18(1):1–13

Shin H, Kim K, Kogler DF (2022) Scientific collaboration, research funding, and novelty in scientific knowledge. PLoS ONE 17(7):e0271678

Thomas RP, Dougherty MR, Sprenger AM, Harbison J (2008) Diagnostic hypothesis generation and human judgment. Psychol Rev 115(1):155–185

Thomer AK, Wickett KM (2020) Relational data paradigms: what do we learn by taking the materiality of databases seriously? Big Data Soc 7(1):2053951720934838

Thompson WH, Skau S (2023) On the scope of scientific hypotheses. R Soc Open Sci 10(8):230607

Tong S, Liang X, Kumada T, Iwaki S (2021) Putative ratios of facial attractiveness in a deep neural network. Vis Res 178:86–99

Uleman JF, Melis RJ, Quax R, van der Zee EA, Thijssen D, Dresler M (2021) Mapping the multicausality of Alzheimer’s disease through group model building. GeroScience 43:829–843

Van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9(11):2579–2605

Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N & Polosukhin I (2017) Attention is all you need. In Advances in Neural Information Processing Systems

Wang H, Fu T, Du Y, Gao W, Huang K, Liu Z (2023) Scientific discovery in the age of artificial intelligence. Nature 620(7972):47–60

Webber J (2012) A programmatic introduction to neo4j. In Proceedings of the 3rd annual conference on systems, programming, and applications: software for humanity p 217–218

Williams K, Berman G, Michalska S (2023) Investigating hybridity in artificial intelligence research. Big Data Soc 10(2):20539517231180577

Wu S, Koo M, Blum L, Black A, Kao L, Scalzo F & Kurtz I (2023) A comparative study of open-source large language models, GPT-4 and Claude 2: multiple-choice test taking in nephrology. arXiv preprint arXiv:2308.04709

Yu F, Peng T, Peng K, Zheng SX, Liu Z (2016) The Semantic Network Model of creativity: analysis of online social media data. Creat Res J 28(3):268–274


Acknowledgements

The authors thank Dr. Honghong Bai (Radboud University), Dr. ChienTe Wu (The University of Tokyo), Dr. Peng Cheng (Tsinghua University), and Yusong Guo (Tsinghua University) for their great comments on the earlier version of this manuscript. This research has been generously funded by personal contributions, with special acknowledgment to K. Mao, who conceived and developed the causality graph and AI hypothesis generation technology presented in this paper from scratch, generated all AI hypotheses, and covered the associated costs. The authors sincerely thank K. Mao for his support, which enabled this research. In addition, K. Peng and S. Tong were partly supported by the Tsinghua University Initiative Scientific Research Program (No. 20213080008), the Self-Funded Project of the Institute for Global Industry, Tsinghua University (202-296-001), the Shuimu Scholars program of Tsinghua University (No. 2021SM157), and the China Postdoctoral International Exchange Program (No. YJ20210266).

Author information

These authors contributed equally: Song Tong, Kai Mao.

Authors and Affiliations

Department of Psychological and Cognitive Sciences, Tsinghua University, Beijing, China

Song Tong & Kaiping Peng

Positive Psychology Research Center, School of Social Sciences, Tsinghua University, Beijing, China

Song Tong, Zhen Huang, Yukun Zhao & Kaiping Peng

AI for Wellbeing Lab, Tsinghua University, Beijing, China

Institute for Global Industry, Tsinghua University, Beijing, China

Kindom KK, Tokyo, Japan


Contributions

Song Tong: Data analysis, Experiments, Writing—original draft & review. Kai Mao: Designed the causality graph methodology, Generated AI hypotheses, Developed hypothesis generation techniques, Writing—review & editing. Zhen Huang: Statistical Analysis, Experiments, Writing—review & editing. Yukun Zhao: Conceptualization, Project administration, Supervision, Writing—review & editing. Kaiping Peng: Conceptualization, Writing—review & editing.

Corresponding authors

Correspondence to Yukun Zhao or Kaiping Peng .

Ethics declarations

Competing interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Ethical approval

In this study, ethical approval was granted by the Institutional Review Board (IRB) of the Department of Psychology at Tsinghua University, China. The Research Ethics Committee documented this approval under the number IRB202306, following an extensive review that concluded on March 12, 2023. This approval indicates the research’s strict compliance with the IRB’s guidelines and regulations, ensuring ethical integrity and adherence throughout the study.

Informed consent

Before participating, all study participants gave their informed consent. They received comprehensive details about the study’s goals, methods, potential risks and benefits, confidentiality safeguards, and their rights as participants. This process guaranteed that participants were fully informed about the study’s nature and voluntarily agreed to participate, free from coercion or undue influence.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Tong, S., Mao, K., Huang, Z. et al. Automating psychological hypothesis generation with AI: when large language models meet causal graph. Humanit Soc Sci Commun 11 , 896 (2024). https://doi.org/10.1057/s41599-024-03407-5


Received : 08 November 2023

Accepted : 25 June 2024

Published : 09 July 2024

DOI : https://doi.org/10.1057/s41599-024-03407-5



Charles Peirce's Theory of Scientific Method


IV The Stages of the Method (ii): Deduction and Induction

Published: September 1970

This chapter discusses Peirce's process of framing and testing conjectured explanations of phenomena. Here, science progresses by means of the brilliant imaginative leaps of abduction coupled with carefully controlled evaluation in the verification phase. Systematized items of knowledge may become subject to scientific inquiry only when they are brought down from the shelves to be purified or transformed. It is then that they enter that dynamic process again. There are two main steps in the process of scientific verification of hypotheses: deduction and induction. In order to get a clearer understanding of the whole movement of verification, this chapter draws a general picture of this process before discussing each step separately. In addition, the ability of the inductive phase to converge on the truth is considered, since this is the guarantee of the whole process of inquiry. Hence, the chapter treats the following: a general picture of the verification process; the deductive phase; the inductive phase; two requirements for scientific induction; the parts of induction; the convergence on truth.


From First Principles to Theories: Revisiting the Scientific Method Through Abductive, Deductive, and Inductive Reasoning

By Kacper Grass, Factory for Innovative Policy Solutions


The aim of this article is to examine how first principles are developed into general theories by reviewing the roles that abduction, deduction, and induction play in the three primary steps of the scientific method: hypothesis generation, hypothesis testing, and theory generation. Kant’s democratic peace theory is first used to illustrate this process, and the example is subsequently extended to show the secondary level of scrutiny that theories must undergo before they can be applied to the empirical world. The article concludes by considering the strengths and weaknesses of scientific theories, particularly in the field of social sciences.

Keywords: first principles, theories, abduction, deduction, induction

Introduction

While the modern scientific method is often attributed to the revolution in human reason and rationality that swept across Europe in the Age of Enlightenment, its individual steps had actually undergone roughly two millennia of development and refinement before they were finally assembled into a systematic process of inquiry by scholars between the 17th and 19th centuries. It was Aristotle who first distinguished between deductive reasoning, a top-down form of logic whereby specific conclusions are inferred from fundamental rules, and inductive reasoning, a bottom-up form of logic by which fundamental rules are extrapolated from conclusions based on empirical observations.

In the pursuit of solutions to real-world problems, both methods have been complementary to one another in that deduction allows the researcher to use fundamental rules, or first principles (as Aristotle refers to them), to reach specific conclusions that can later be used to produce generally applicable theories through induction. By repeating this endless process of rigorous scrutiny with the observable yet unexplained phenomena of the natural world, philosophers have been able to create an epistemological framework for human understanding that encompasses everything from Darwin’s theory of evolution in the natural sciences to Marx’s theory of revolution in the social sciences.

Deduction, induction, and first principles

René Descartes, a renowned advocate of deductive reasoning, explored the relationship between deduction and first principles. In his Principles of Philosophy, Descartes explains that first principles must possess two conditions. First, "they must be so clear and evident that the human mind […] cannot doubt of their truth" and second, "the knowledge of other things must be so dependent on them [that although] the principles themselves may indeed be known apart from what depends on them, the latter cannot […] be known apart from the former" (Lancaster University, 2003). Accordingly, it is necessary to deduce from those first principles the knowledge of that which depends on them, as "there may be nothing in the whole series of deductions which is not perfectly manifest" (Lancaster University, 2003).

Francis Bacon, a contemporary of Descartes, makes a similar observation. Referring to first principles as axioms, he notes that if a general axiom proves false, then all intermediate axioms deduced from it may be false as well. For this reason, in his Novum Organum , Bacon advocates proceeding “regularly and gradually from one axiom to another, so that the most general are not reached till the last” (Simpson, n.d.). Therefore, through induction, “each step up the ladder of intellect is thoroughly tested by observation and experimentation before the next step is taken” and “each confirmed axiom becomes a foothold to a higher truth, with the most general axioms representing the last stage of the process” (Simpson, n.d.). 

Abduction and the scientific method

It was not until Charles Sanders Peirce published Deduction, Induction, and Hypothesis that the two ancient methods of reasoning were complemented by a more modern counterpart: abductive reasoning. Peirce’s approach was based on producing what he called a “case” (hypothesis) from a “result” (conclusion) and a “rule” (first principle). Figure 1 offers a comparative view of the three methods of reasoning as outlined in Peirce’s example of the bag of beans.

Figure 1: Comparative view of the three methods of reasoning, as outlined in Peirce's bag-of-beans example.

While abductive reasoning is the least logically secure of the three methods, it nevertheless allows the researcher to infer the most plausible hypothesis for a research question given the limited information available. The propositions produced through abduction can subsequently be tested through deduction, by which a valid "result" (conclusion) is inferred from the "case" (hypothesis) and the initial "rule" (first principle). Finally, induction can be used to generalize the products of the previous steps by extrapolating a universal "rule" (generalized first principle or theory) from a specific "result" (conclusion) and a "case" (hypothesis). In this way, the abduction-deduction-induction process outlines the three basic steps of the scientific method: generating a hypothesis, testing the hypothesis, and generalizing the results or conclusions of the research to generate a theory.
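Since the figure itself is not reproduced here, a toy Python rendering of the schema may help: it encodes Peirce's three bean-bag propositions and shows which two each inference mode takes as premises. It is purely illustrative, not a reasoning engine.

# Peirce's bean-bag propositions.
propositions = {
    "rule":   "All the beans from this bag are white.",
    "case":   "These beans are from this bag.",
    "result": "These beans are white.",
}

# Each mode: (premises, conclusion).
schema = {
    "deduction": (("rule", "case"), "result"),   # logically certain
    "induction": (("case", "result"), "rule"),   # generalization
    "abduction": (("rule", "result"), "case"),   # best explanation
}

for mode, (premises, conclusion) in schema.items():
    given = " AND ".join(propositions[p] for p in premises)
    print(f"{mode:>9}: given [{given}] infer [{propositions[conclusion]}]")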

Kant’s democratic peace theory

Between the time Descartes and Bacon investigated deductive and inductive reasoning and the time Peirce introduced his method of abduction, Immanuel Kant laid the groundwork for what would come to be known as the democratic peace theory. In Perpetual Peace, Kant argues that it is reasonable for people to say that "there ought to be no war among us, for we want to make ourselves into a state; that is, we want to establish a supreme […] power which will reconcile our differences peacefully" (Ferraro, n.d.). In turn, it is also reasonable for the resulting state to say that "there ought to be no war between myself and other states" (Ferraro, n.d.). Therefore, if a group of people "can make itself a republic, which by its nature must be inclined to perpetual peace, this gives a fulcrum to the federation with other states so that they may adhere to it and thus secure freedom under the idea of the law of nations" (Ferraro, n.d.). As this federation of free republics grows, it will eventually expand to immerse the whole world in a league of democratic states that adheres to the universal law of nations: a perpetual peace.

Though Kant preceded Peirce and did not live to learn about abduction, the method can nevertheless be applied to the formation of Kant’s democratic peace theory. What follows is a step-by-step application of abductive, deductive, and inductive reasoning to the three steps of the scientific method—hypothesis generation, hypothesis testing, and theory generation—in the development of Kant’s democratic peace theory.


Based on the empirical observation that, even when left to their own devices, the majority of people do not behave aggressively with one another and tend to avoid violent conflict, Kant formulated the first principle that the majority of people would not vote for an aggressive war should they be given the choice. Furthermore, if the majority of people in a state would not vote for an aggressive war, then it can be concluded that two states governed by majority rule would avoid war with one another. Therefore, the first step of abductive reasoning leads to the hypothesis that there can be no war in a world composed entirely of democratic states.


In order to test the validity of the hypothesis generated through abduction, it is necessary to see if a logical conclusion can be inferred from its relationship to the original first principle. Indeed, if the majority of people would not vote for an aggressive war, then there can be no war in a democratic world. Thus, the second step of deduction produces a conclusion that is not only logically valid with respect to the hypothesis and first principle but is also empirically observable. Though by contemporary standards there may not have been any truly democratic states in Kant’s time, the advent of universal suffrage in the 20th century has produced a considerable community of liberal democracies that do indeed maintain peaceful relations with one another. 


Finally, based on the conclusion deduced from the initial hypothesis, the third and final step of the scientific method aims to generalize the fundamental first principle in order to generate a universally applicable theory. Therefore, if it is already accepted that there can be no war in a democratic world because democracies avoid war with one another, then it can be induced that this is true because the majority of people would not vote for an aggressive war. At this point, the three-step process returns to the first principle with which it began. However, while the first principle was initially nothing more than an axiom based on the empirical observation of individual human behavior, through the scientific method it was developed into a theory that presents a framework for understanding international relations.
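The same three-slot schema can be instantiated with Kant's propositions; the texts below paraphrase the article, and the rule/case/result mapping is one illustrative reading rather than Kant's own formulation.

# Kant's propositions, slotted into Peirce's schema (paraphrased, illustrative).
kant = {
    "rule":   "The majority of people would not vote for an aggressive war.",
    "case":   "These states are democracies governed by majority rule.",
    "result": "These states avoid war with one another.",
}

# Step 1, abduction: rule + result -> case   (hypothesize democratic governance)
# Step 2, deduction: rule + case   -> result (a testable, observable claim)
# Step 3, induction: case + result -> rule   (generalized as democratic peace theory)
steps = {
    "abduction": (("rule", "result"), "case"),
    "deduction": (("rule", "case"), "result"),
    "induction": (("case", "result"), "rule"),
}
for step, (premises, conclusion) in steps.items():
    print(f"{step}: {' + '.join(premises)} => {conclusion}: {kant[conclusion]}")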

From generation to application

Following the first round of the scientific method, what started as a mere hypothesis about human nature is now proposed as a universal theory of global politics. However, before this novel theory can be applied to the empirical world, it should first undergo another round of scientific scrutiny to ensure that the underlying logic that supports it is indeed sound. This time, however, it is not necessary to begin with abductive reasoning as there is no need to generate new hypotheses. Instead, following the initial round of abduction-deduction-induction that led to the theory being generated, the researcher can return to deductive reasoning in order to test its validity on a macro level. Figure 2 outlines this second round of examination.

Figure 2: The second round of scrutiny, in which the researcher returns to deductive reasoning to test the validity of the generated theory.

At this point, it is important to begin by reviewing all the initial facts, assumptions, and ideas that form the basis of the theory. In the context of Kant's democratic peace theory, the researcher must ask fundamental questions that might undermine the generalization of the first principle: "Are all societies equally peaceful?" or "Do sociological variables like wealth and culture affect individuals' likelihood of aggression?" Moreover, it is crucial to have clear definitions of all the concepts that comprise the theory: "What are the qualifying characteristics of a democratic state?" and "What exactly is meant by an aggressive war?" By subjecting the theory to such tests, the researcher attempts to reveal any weaknesses that might invalidate the theoretical framework as a whole. Only after this second round of scrutiny can the researcher consider the implications and expectations of the theory before it is ready to be reapplied to the empirical world.

Assuming that Kant’s democratic peace theory passes the second round of scrutiny unscathed, can it finally be treated as a scientific law in the same way as Newton’s law of gravity, for example? Like laws, theories are epistemologically valuable for their descriptive and explanatory nature. They help researchers understand how things are by making comprehensible connections between abstract concepts and the natural world, and they also help explain why things are by mapping the causal relationships between different phenomena. However, unlike laws, they lack an absolute predictive quality. While the first principle that the majority of people would not vote for an aggressive war may hold true today and is most likely to hold true in the foreseeable future, it cannot be said with certainty that it would hold true under all circumstances. A widespread conspiracy, disinformation campaigns, fear tactics, and manipulation of the democratic process could all plausibly contribute to a violation of this first principle. So far, however, Kant’s theory and the first principles on which it rests have given advocates of democratic government reason to be optimistic.

  • Bellucci, F., & Pietarinen, A.-V. (n.d.). Charles Sanders Peirce: Logic. In The Internet Encyclopedia of Philosophy . https://www.iep.utm.edu/peir-log/#SSH2bi
  • Ferraro, V. (n.d.). Immanuel Kant Perpetual Peace: A Philosophical Sketch . Mount Holyoke College. https://www.mtholyoke.edu/acad/intrel/kant/kant1.htm
  • Lancaster University. (2003). History of Philosophy in the 17th and 18th Centuries: Descartes’ “Principles of Philosophy”. https://www.lancaster.ac.uk/users/philosophy/courses/211/Descartes'%20Principles.htm
  • Simpson, D. (n.d.). Francis Bacon (1561-1626). In The Internet Encyclopedia of Philosophy . https://iep.utm.edu/bacon/#SH2k
  • Toshkov, D. (2016). Research Design in Political Science (1st ed.). Palgrave Macmillan.



Epistemological foundations of the JSM method for automatic hypothesis generation


Recommendations

On the Class of JSM Reasoning That Uses the Isomorphism of Inductive Inference Rules

This paper defines a special class of JSM reasoning whose strategies use the isomorphism of direct products of lattices that represent inductive inference rules. It is shown that the JSM reasoning formed by inductive inferences rules, analogical ...

On the Heuristics of JSM Research (Additions to Articles)

The logical means of detecting empirical regularities using the JSM method of automated research support are considered. Generators of hypotheses about the causes and hypotheses about predictions that are stored in sequences of expandable fact ...

The intelligent analysis of the cytotoxic activity of a chemical compound using the strategies of the JSM method

The intelligent analysis of the cytotoxic activities of trifluoro-substituted pyrazolo(1,5-a)pyrimidines was carried out using the strategies of the JSM method, viz., induction, analogy, and abduction. The cytotoxic activity of the compounds relating to ...

Published by Springer-Verlag, Berlin, Heidelberg.

Author tags:

  • JSM method for automatic hypothesis generation
  • JSM reasoning
  • JSM-reasoning quality evaluation
  • epistemological foundations of the JSM method for automatic hypothesis generation
  • natural-scientific problem of induction
  • problem of induction
  • procedural semantics
  • quasi-axiomatic theories



Helping students understand and generate appropriate hypotheses and test their subsequent predictions — in science in general and biology in particular — should be at the core of teaching the nature of science. However, there is much confusion among students and teachers about the difference between hypotheses and predictions. Here, I present evidence of the problem and describe steps that scientists actually follow when employing scientific reasoning strategies. This is followed by a proposed solution for helping students effectively explore this important aspect of the nature of science.


Encyclopedia Britannica

  • History & Society
  • Science & Tech
  • Biographies
  • Animals & Nature
  • Geography & Travel
  • Arts & Culture
  • Games & Quizzes
  • On This Day
  • One Good Fact
  • New Articles
  • Lifestyles & Social Issues
  • Philosophy & Religion
  • Politics, Law & Government
  • World History
  • Health & Medicine
  • Browse Biographies
  • Birds, Reptiles & Other Vertebrates
  • Bugs, Mollusks & Other Invertebrates
  • Environment
  • Fossils & Geologic Time
  • Entertainment & Pop Culture
  • Sports & Recreation
  • Visual Arts
  • Demystified
  • Image Galleries
  • Infographics
  • Top Questions
  • Britannica Kids
  • Saving Earth
  • Space Next 50
  • Student Center

experiments disproving spontaneous generation

  • When did science begin?
  • Where was science invented?

Blackboard inscribed with scientific formulas and calculations in physics and mathematics

scientific hypothesis


scientific hypothesis , an idea that proposes a tentative explanation about a phenomenon or a narrow set of phenomena observed in the natural world. The two primary features of a scientific hypothesis are falsifiability and testability, which are reflected in an “If…then” statement summarizing the idea and in the ability to be supported or refuted through observation and experimentation. The notion of the scientific hypothesis as both falsifiable and testable was advanced in the mid-20th century by Austrian-born British philosopher Karl Popper .

The formulation and testing of a hypothesis is part of the scientific method , the approach scientists use when attempting to understand and test ideas about natural phenomena. The generation of a hypothesis frequently is described as a creative process and is based on existing scientific knowledge, intuition , or experience. Therefore, although scientific hypotheses commonly are described as educated guesses, they actually are more informed than a guess. In addition, scientists generally strive to develop simple hypotheses, since these are easier to test relative to hypotheses that involve many different variables and potential outcomes. Such complex hypotheses may be developed as scientific models ( see scientific modeling ).

Depending on the results of scientific evaluation, a hypothesis typically is either rejected as false or accepted as true. However, because a hypothesis inherently is falsifiable, even hypotheses supported by scientific evidence and accepted as true are susceptible to rejection later, when new evidence has become available. In some instances, rather than rejecting a hypothesis because it has been falsified by new evidence, scientists simply adapt the existing idea to accommodate the new information. In this sense a hypothesis is never incorrect but only incomplete.

The investigation of scientific hypotheses is an important component in the development of scientific theory . Hence, hypotheses differ fundamentally from theories; whereas the former is a specific tentative explanation and serves as the main tool by which scientists gather data, the latter is a broad general explanation that incorporates data from many different scientific investigations undertaken to explore hypotheses.

Countless hypotheses have been developed and tested throughout the history of science. Several examples include the idea that living organisms develop from nonliving matter, which formed the basis of spontaneous generation, a hypothesis that ultimately was disproved (first in 1668, with the experiments of Italian physician Francesco Redi, and later in 1859, with the experiments of French chemist and microbiologist Louis Pasteur); the concept proposed in the late 19th century that microorganisms cause certain diseases (now known as germ theory); and the notion that oceanic crust forms along submarine mountain zones and spreads laterally away from them (seafloor spreading hypothesis).


Title: Explainable Biomedical Hypothesis Generation via Retrieval Augmented Generation Enabled Large Language Models

Abstract: The vast amount of biomedical information available today presents a significant challenge for investigators seeking to digest, process, and understand these findings effectively. Large Language Models (LLMs) have emerged as powerful tools to navigate this complex and challenging data landscape. However, LLMs may produce hallucinatory responses, making Retrieval Augmented Generation (RAG) crucial for achieving accurate information. In this protocol, we present RUGGED (Retrieval Under Graph-Guided Explainable disease Distinction), a comprehensive workflow designed to support investigators with knowledge integration and hypothesis generation, identifying validated paths forward. Relevant biomedical information from publications and knowledge bases is reviewed, integrated, and extracted via text-mining association analysis and explainable graph prediction models on disease nodes, forecasting potential links among drugs and diseases. These analyses, along with biomedical texts, are integrated into a framework that facilitates user-directed mechanism elucidation as well as hypothesis exploration through RAG-enabled LLMs. A clinical use-case demonstrates RUGGED's ability to evaluate and recommend therapeutics for Arrhythmogenic Cardiomyopathy (ACM) and Dilated Cardiomyopathy (DCM), analyzing prescribed drugs for molecular interactions and unexplored uses. The platform minimizes LLM hallucinations, offers actionable insights, and improves the investigation of novel therapeutics.
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
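In the spirit of the workflow described in the abstract, a minimal retrieval-augmented generation loop looks roughly as follows; `retriever`, `llm`, and their methods are illustrative placeholders of my own, not RUGGED's actual API:

```python
# Generic retrieval-augmented hypothesis generation, sketched after the
# workflow described in the abstract. `retriever` and `llm` are assumed
# interfaces (placeholders), not the RUGGED implementation.
def generate_hypotheses(question: str, retriever, llm, top_k: int = 5) -> str:
    evidence = retriever.search(question, top_k=top_k)  # publications, KG paths
    context = "\n\n".join(doc.text for doc in evidence)
    prompt = (
        "Using ONLY the evidence below, propose testable biomedical hypotheses "
        "and cite the supporting passages.\n\n"
        f"Evidence:\n{context}\n\nQuestion: {question}"
    )
    # Grounding the prompt in retrieved passages is what limits hallucination.
    return llm.complete(prompt)
```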


Gen: Generalizing Induction Hypotheses

  • P n = "if double n = double m , then n = m "
  • P O (i.e., "if double O = double m then O = m ")
  • P n → P ( S n ) (i.e., "if double n = double m then n = m " implies "if double ( S n ) = double m then S n = m ").
  • "if double n = double m then n = m "
  • "if double ( S n ) = double m then S n = m ".
  • Q = "if double n = 10 then n = 5 "
  • R = "if double ( S n ) = 10 then S n = 5 ".
  • First, suppose m = 0 , and suppose n is a number such that double n = double m . We must show that n = 0 . Since m = 0 , by the definition of double we have double n = 0 . There are two cases to consider for n . If n = 0 we are done, since this is what we wanted to show. Otherwise, if n = S n' for some n' , we derive a contradiction: by the definition of double we would have n = S ( S ( double n' )) , but this contradicts the assumption that double n = 0 .
  • Otherwise, suppose m = S m' and that n is again a number such that double n = double m . We must show that n = S m' , with the induction hypothesis that for every number s , if double s = double m' then s = m' . By the fact that m = S m' and the definition of double , we have double n = S ( S ( double m' )) . There are two cases to consider for n . If n = 0 , then by definition double n = 0 , a contradiction. Thus, we may assume that n = S n' for some n' , and again by the definition of double we have S ( S ( double n' )) = S ( S ( double m' )) , which implies by inversion that double n' = double m' . Instantiating the induction hypothesis with n' thus allows us to conclude that n' = m' , and it follows immediately that S n' = S m' . Since S n' = n and S m' = m , this is just what we wanted to show. ☐
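For readers who want a machine-checked counterpart, here is a minimal Lean 4 sketch of the same argument (the definition of double and the tactic details are my own illustrative choices, not part of the original text). The essential move is the same: induct on n while m stays universally quantified, so the induction hypothesis is the generalized "for every m" statement.

```lean
-- double n = 2 * n, defined by recursion as in the text.
def double : Nat → Nat
  | 0     => 0
  | n + 1 => double n + 2

theorem double_injective : ∀ n m : Nat, double n = double m → n = m := by
  intro n
  -- m is NOT introduced before the induction, so the IH is generalized.
  induction n with
  | zero =>
    intro m h
    cases m with
    | zero => rfl
    | succ m' =>
      simp only [double] at h  -- h : 0 = double m' + 2, impossible
      omega
  | succ n' ih =>
    intro m h
    cases m with
    | zero =>
      simp only [double] at h  -- h : double n' + 2 = 0, impossible
      omega
    | succ m' =>
      simp only [double] at h  -- h : double n' + 2 = double m' + 2
      have hnm : n' = m' := ih m' (by omega)
      omega
```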

Exercise: 3 stars, recommended (gen_dep_practice)

  • Exercise: 3 stars, optional (index_after_last_informal)
  • Exercise: 3 stars (gen_dep_practice_opt)
  • Exercise: 3 stars (app_length_cons)
  • Exercise: 4 stars, optional (app_length_twice)


"by induction hypothesis" or "by THE induction hypothesis"

One of the techniques for proving statements in mathematics is "mathematical induction" (Wikipedia entry). Very informally and not precisely speaking, when conducting a proof using this technique, (1) one proves that the statement is true in the simplest possible case, then (2) one assumes that the statement is true for a more complex case, and then (3) one proves that the assumption from step 2 implies the validity of the statement for an even more complex case.

The assumption from step 2 is called the "induction hypothesis". When executing step 3, one often refers to this assumption. How should one refer to it? Is it "by induction hypothesis" or "by the induction hypothesis"?

In each of the Wikipedia examples there is "the", which I believe to be correct, since we are referring to a specific, concrete assumption that has been made. I've seen mathematical textbooks that use only this variant, but also ones that mix both variants; the same goes for research articles.

  • mathematics


  • Always use 'the' here. –  Mitch Commented Oct 1, 2019 at 15:23
  • Missing "the" is often a sign of second-language speakers of English who do not use definite articles in their first language. If the meaning is obvious (e.g., there is only one induction hypothesis at this point), it may not be picked up by editors or reviewers. –  Henry Commented Oct 1, 2019 at 15:40
  • @Mitch "By induction hypothesis, we can deduce." This is fine. The "the" is optional. "String theory tells us that…" –  David M Commented Oct 1, 2019 at 16:13
  • @DavidM 1) That sounds awful. 2) I would think the rare instances without the article would be from non-native speakers (of non-arthrous languages). 3) "String theory" is not "induction hypothesis". –  Mitch Commented Oct 1, 2019 at 16:43
  • @Mitch So "string theory" has special dispensation to drop its article? I'm a native speaker like you, and while I'd probably use the article, I don't think it would offend me to hear it omitted. –  David M Commented Oct 1, 2019 at 16:46

The wording

by induction hypothesis

is a common solecism found in many mathematical texts. The natural way to say it in English is:

by the induction hypothesis.

The grammatical analysis is that one is referring to a particular hypothesis, which would require the definite article. One might say grammatically

by hypothesis X, ...

that is, X being the name of a given hypothesis, or

by hypothesis, ...

and in the following give an instantiation of the hypothesis with other things in order to make a deduction.

It is ungrammatical to say 'We use induction hypothesis' or 'Induction hypothesis implies...'; 'induction hypothesis' by itself is not a proper NP (noun phrase).

However, 'by induction hypothesis' is a common wording, half as common as 'by the induction hypothesis', so it must be a partially accepted idiom patterned after 'by induction'.




Boosting k-Induction with Continuously-Refined Invariants

Conference paper. First Online: 01 January 2015.

Dirk Beyer, Matthias Dangl, and Philipp Wendler

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 9206)

Included in the following conference series:

  • International Conference on Computer Aided Verification


\(k\)-induction is a promising technique to extend bounded model checking from falsification to verification. In software verification, \(k\)-induction works only if auxiliary invariants are used to strengthen the induction hypothesis. The problem that we address is to generate such invariants (1) automatically, without user interaction, (2) efficiently, such that little verification time is spent on the invariant generation, and (3) strong enough for a \(k\)-induction proof. We boost the \(k\)-induction approach to significantly increase effectiveness and efficiency in the following way: we start, in parallel to \(k\)-induction, a data-flow-based invariant generator that supports dynamic precision adjustment, and we refine the precision of the invariant generator continuously during the analysis, such that the invariants become increasingly stronger. The \(k\)-induction engine is extended such that the invariants from the invariant generator are injected in each iteration to strengthen the hypothesis. The new method solves the above-mentioned problem because it (1) automatically chooses an invariant by step-wise refinement, (2) always starts with a lightweight invariant generation that is computationally inexpensive, and (3) refines the invariant precision more and more, injecting stronger and stronger invariants into the induction system. We present and evaluate an implementation of our approach, as well as all other existing approaches, in the open-source verification framework CPAchecker. Our experiments show that combining \(k\)-induction with continuously-refined invariants significantly increases effectiveness and efficiency, and outperforms all existing implementations of \(k\)-induction-based verification of C programs in terms of successful results.

A preliminary version of this article appeared as technical report [ 8 ].




1 Introduction

Advances in software verification in recent years have led to increased efforts towards applying formal verification methods to industrial software, in particular operating-systems code [3, 4, 34]. One model-checking technique that is implemented by half of the verifiers that participated in the 2015 Competition on Software Verification [7] is bounded model checking (BMC) [16, 17, 22]. For unbounded systems, BMC can be used only for falsification, not for verification [15]. This limitation to falsification can be overcome by combining BMC with mathematical induction and thus extending it to verification [26]. Unfortunately, inductive approaches are not always powerful enough to prove the required verification conditions, because not all program invariants are inductive [2]. Using the more general \(k\)-induction [38] instead of standard induction is more powerful [37] and has already been implemented in the DMA-race analysis tool Scratch [27] and in the software verifier Esbmc [35]. Nevertheless, additional supportive measures are often required to guide \(k\)-induction and take advantage of its full potential [25]. Our goal is to provide a powerful and competitive approach for reliable, general-purpose software verification based on BMC and \(k\)-induction, implemented in a state-of-the-art software-verification framework.

Our contribution is a new combination of \(k\)-induction-based model checking with automatically-generated, continuously-refined invariants that are used to strengthen the induction hypothesis, which increases the effectiveness and efficiency of the approach. BMC and \(k\)-induction are combined in an algorithm that iteratively increments the induction parameter k (iterative deepening). The invariant generation runs in parallel to the \(k\)-induction proof construction, starting with relatively weak (but inexpensive to compute) invariants, and increasing the strength of the invariants over time as long as the analysis continues. The \(k\)-induction-based proof construction adopts the currently known set of invariants in every new proof attempt. This approach can verify easy problems quickly (with a small initial k and weak invariants), and is able to verify complex problems by increasing the effort (by incrementing k and searching for stronger invariants). Thus, it is both efficient and effective. In contrast to previous work [35], the new approach is sound. We implemented our approach as part of the open-source software-verification framework CPAchecker [12], and we perform an extensive experimental comparison of our implementation against the two existing tools that use \(k\)-induction and against other common software-verification approaches.

Contributions. We make the following contributions:

a novel approach for providing continuously-refined invariants from data-flow analysis with precision adjustment, in order to repeatedly inject invariants into \(k\)-induction,

an effective and efficient tool implementation of a framework for software verification with \(k\)-induction that allows expressing all existing approaches to \(k\)-induction in a uniform, module-based, configurable architecture, and

an extensive experimental evaluation of (a) all approaches and their implementations in the framework, (b) the two existing \(k\)-induction tools Cbmc and Esbmc, and (c) the two different approaches predicate analysis and value analysis; the result being that the new technique outperforms all existing \(k\)-induction-based approaches to software verification.

Availability of Data and Tools. Our experiments are based on benchmark verification tasks from the 2015 Competition on Software Verification. All benchmarks, tools, and results of our evaluation are available on a supplementary web page (see footnote 1).

Fig. 1: Safe example program example-safe, which cannot be proven with existing \(k\)-induction-based approaches

Fig. 2: Unsafe example program example-unsafe, where some approaches may produce a wrong proof

Example. We illustrate the problem of \(k\)-induction that we address, and the strength of our approach, on two example programs. Both programs encode an automaton, which is typical, e.g., for software that implements a communication protocol. The automaton has a finite set of states, which is encoded by variable s, and two data variables x1 and x2. There are some state-dependent calculations (lines 6 and 7 in both programs) that alternatingly increment x1 and x2, and a calculation of the next state (lines 9 and 10 in both programs). The state variable cycles through the range from 1 to 4. These calculations are done in a loop with a non-deterministic number of iterations. Both programs also contain a safety property (the label ERROR should not be reachable). The program example-safe in Fig. 1 checks that in every fourth state, the values of x1 and x2 are equal; it satisfies the property. The program example-unsafe in Fig. 2 checks that when the loop exits, the value of state variable s is not greater than or equal to 4; it violates the property.
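Since the source of Fig. 1 is not reproduced here, the following Python re-rendering is a plausible reconstruction of the described automaton, consistent with the text and with the loop invariant given below; the concrete statements are assumptions, not the benchmark's actual C code.

```python
import random

# Assumed reconstruction (not the original source) of example-safe:
# s cycles through 1..4, x1 and x2 are incremented alternatingly, and the
# property x1 == x2 is checked whenever s returns to 1.
def example_safe():
    s, x1, x2 = 1, 0, 0
    while random.choice([True, False]):   # non-deterministic iteration count
        if s == 1:                        # state-dependent calculations
            x1 += 1                       # (lines 6 and 7 in the original)
        elif s == 2:
            x2 += 1
        s = 1 if s == 4 else s + 1        # next-state calculation (lines 9 and 10)
        if s == 1:
            assert x1 == x2               # checked in every fourth iteration
```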

First, note that the program example-safe is difficult or impossible to prove with many classical software-verification approaches other than \(k\)-induction: (1) BMC cannot prove safety for this program because the loop may run arbitrarily long. (2) Explicit-state model checking fails because of the huge state space (x1 and x2 can get arbitrarily large). (3) Predicate analysis with counterexample-guided abstraction refinement (CEGAR) and interpolation is able to prove safety, but only if the predicate \(x1 = x2\) gets discovered. If the interpolants instead contain only predicates such as \(x1 = 1\), \(x2 = 1\), \(x1 = 2\), etc., the predicate analysis will not terminate. Which predicates get discovered is hard to control and usually depends on internal interpolation heuristics of the satisfiability-modulo-theories (SMT) solver. (4) Traditional 1-induction is also not able to prove the program safe, because the assertion is checked only in every fourth loop iteration (when s equals 1). Thus, the induction hypothesis is too weak (the program state s = 4, x1 = 0, x2 = 1 is a counterexample for the step case in the induction proof).

Intuitively, this program should be provable by \(k\)-induction with a k of at least 4. However, for every k, there is a counterexample to the inductive-step case that refutes the proof. For such a counterexample, set s = \(-k\), x1 = 0, x2 = 1 at the beginning of the loop. Starting in this state, the program would increment s k times (induction hypothesis) and then reach s = 1 with property-violating values of x1 and x2 in iteration \(k+1\) (inductive step). It is clear that s can never be negative, but this fact is not present in the induction hypothesis, and thus, the proof fails. This illustrates the general problem of \(k\)-induction-based verification: safety properties often do not hold in unreachable parts of the state space of a program, and \(k\)-induction alone does not distinguish between reachable and unreachable parts of the state space. Therefore, approaches based on \(k\)-induction without auxiliary invariants will fail to prove safety for the program example-safe.

This program could of course be verified more easily if it were rewritten to contain a stronger safety property such as \(s \ge 1 \wedge s \le 4 \wedge (s = 2 \Rightarrow x1 = x2 + 1) \wedge (s \ne 2 \Rightarrow x1 = x2)\) (which is a loop invariant and allows a proof by 1-induction without auxiliary invariants). However, our goal is to automatically verify real programs, and programmers usually write down neither trivial properties such as \(s \ge 1\) nor more complex properties such as \(s \ne 2 \Rightarrow x1 = x2\).

Our approach of combining \(k\)-induction with invariants proves the program safe with \(k = 4\) and the invariant \(s \ge 1\). This invariant is easy to find automatically using an inexpensive data-flow analysis, such as an interval analysis. For larger programs, a more complex invariant might be necessary, which might get generated at some point by our continuous strengthening of the invariant. Furthermore, stronger invariants can reduce the k that is necessary to prove a program. For example, the invariant \(s \ge 1 \wedge s \le 4 \wedge (s \ne 2 \Rightarrow x1 = x2)\) (which is still weaker than the full loop invariant above) allows proving the program with \(k = 2\). Thus, our strengthening of invariants can also shorten the inductive proof procedure and lead to better performance.

An existing approach tries to solve this problem of a too-weak induction hypothesis by initializing only the variables of the loop-termination condition to a non-deterministic value in the step case, and initializing all other variables to their initial value in the program [35]. However, this approach is not strong enough for the program example-safe and even produces a wrong proof (an unsound result) for the program example-unsafe. This second example program contains a different safety property about s, which is violated. Because the variable s does not appear in the loop-termination condition, it is not set to an arbitrary value in the step case as it should be, and the inductive proof wrongly concludes that the program is safe: the induction hypothesis is too strong, leading to a missed bug and a wrong result. Our approach does not suffer from this unsoundness, because we add to the induction hypothesis only invariants that the invariant generation has proven to hold.

Related Work. The use of auxiliary invariants is a common technique in software verification [2, 9, 10, 18, 19, 20, 23, 30, 36], and techniques combining data-flow analysis and SMT solvers also exist [28, 31]. In most cases, the purpose is to speed up the analysis. For \(k\)-induction, however, the use of invariants is crucial in making the analysis terminate at all (cf. Fig. 1). There are several approaches to software verification using BMC in combination with \(k\)-induction.

Split-Case Induction. We use the split-case k-induction technique [26, 27], where the base case and the step case are checked in separate steps. Earlier versions of Scratch [27] that use this technique transform programs with multiple loops into programs with only one single monolithic loop, using a standard approach [1]. The alternative of recursively applying the technique to nested loops is discarded by the authors of Scratch [27], because their experiments suggested it was less efficient than checking the single loop that is obtained by the transformation. We also experimented with single-loop transformation, but our experimental results suggest that checking all loops at once in each case, instead of checking the monolithic transformation result (which also encodes all loops in one), has no negative performance impact, so for simplicity, we omit the transformation. Scratch also supports combined-case k-induction [25], for which all loops are cut by replacing them with k copies each for the base and the step case, and setting all loop-modified variables to non-deterministic values before the step case. That way, both cases can be checked at once in the transformed program and no special handling for multiple loops is required. When using combined-case \(k\)-induction, Scratch requires loops to be manually annotated with the required k values, whereas its implementation of split-case \(k\)-induction supports iterative deepening of k as in our implementation. Contrary to Scratch, we do not focus on one specific problem domain [26, 27], but want to provide a solution for solving a wide range of heterogeneous verification tasks.

Auxiliary Invariants. While both split-case and combined-case \(k\)-induction supposedly succeed with weaker auxiliary invariants than, for example, the inductive-invariant approach [5], these approaches still do require auxiliary invariants in practice, and the tool Scratch requires these invariants to be annotated manually [25, 27]. There are techniques for automatically generating invariants that may be used to help inductive approaches succeed (e.g., [2, 9, 20]). These techniques, however, do not always justify their additional effort, because they are not guaranteed to provide the required invariants on time, especially if strong auxiliary invariants are required. Based on previous ideas of supporting \(k\)-induction with invariants generated by lightweight data-flow analysis [24], we therefore strive to leverage the power of the \(k\)-induction approach to succeed with auxiliary invariants generated by a data-flow analysis based on intervals. However, to handle cases where it is necessary to invest more effort into invariant generation, we increase the precision of these invariants over time.

Invariant Injection. A verification tool using a strategy similar to ours is PKind [28, 33], a model checker for Lustre programs based on \(k\)-induction. In PKind, there is a parallel computation of auxiliary invariants, where candidate invariants derived by templates are iteratively checked via \(k\)-induction and, if successful, added to the set of known invariants [32]. While this allows for strengthening the induction hypothesis over time, the template-based approach lacks the flexibility that is available to an invariant generator using dynamic precision refinement [11], and the required additional induction proofs are potentially expensive. We implemented checking candidate invariants with \(k\)-induction as a possible strategy of our invariant-generation component.

Unsound Strengthening of the Induction Hypothesis. Esbmc does not require additional invariants for \(k\)-induction, because it assigns non-deterministic values only to the loop-termination-condition variables before the inductive-step case [35] and thus retains more information than both our implementation and that of Scratch [25, 27]; \(k\)-induction in Esbmc is therefore potentially unsound. Our goal is to perform a real proof of safety by removing all pre-loop information in the step case, thus treating the unrolled iterations in the step case truly as “any k consecutive iterations”, as is required for the mathematical induction. Our approach counters this lack of information by employing incrementally-refined invariant generation.

Parallel Induction. PKind checks the base case and the step case in parallel, and Esbmc supports parallel execution of the base case, the forward condition, and the inductive-step case. In contrast, our base case and inductive-step case are checked sequentially, while our invariant generation runs in parallel to the base- and step-case checks.

2 k-Induction with Continuously-Refined Invariants

Our verification approach consists of two algorithms that run concurrently. One algorithm is responsible for generating program invariants, starting with an imprecise invariant and continuously refining (strengthening) it. The other algorithm is responsible for finding error paths with BMC and for constructing safety proofs with \(k\)-induction, for which it periodically picks up the new invariant that the former algorithm has constructed so far. The \(k\)-induction algorithm uses information from the invariant generation, but not vice versa. In our presentation, we assume that each program contains at most one loop; in our implementation, we handle programs with multiple loops by checking all loops together.
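As a minimal sketch of this concurrency (assuming a thread-based implementation; the class and method names are illustrative, not CPAchecker's API), the two algorithms only need a one-way channel through which invariants are published:

```python
import threading

# One-way channel between the two concurrent algorithms: the invariant
# generator publishes successively stronger invariants, and the k-induction
# loop reads the newest one whenever it (re-)checks the step case.
class InvariantStore:
    def __init__(self):
        self._inv = "true"          # initial program invariant: the formula true
        self._lock = threading.Lock()

    def publish(self, inv) -> None:
        with self._lock:
            self._inv = inv         # only ever strengthened, never weakened

    def current(self):
        with self._lock:
            return self._inv
```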

Iterative-Deepening \(k\)-Induction. Algorithm 1 shows our extension of the \(k\)-induction algorithm to a combination with continuously-refined invariants. Starting with an initial value for the bound k, e.g., 1, we iteratively increase the value of k after each unsuccessful attempt at finding a specification violation or proving correctness of the program using \(k\)-induction. The following description of our approach to \(k\)-induction is based on split-case \(k\)-induction [25], where for the propositional state variables s and \(s'\) within a state-transition system that represents the program, the predicate I(s) denotes that s is an initial state, \(T(s,s')\) states that a transition from s to \(s'\) exists, and P(s) asserts the safety property for the state s.

Base Case. Lines 3 to 5 implement the base case, which consists of running BMC with the current bound k. This means that, starting from an initial program state, all paths of the program up to a maximum path length of \(k-1\) are explored. If an error path is found, the algorithm terminates.

Forward Condition. Otherwise, we check whether there exists a path with length \(k' > k - 1\) in the program, or whether we have already fully explored the state space of the program (lines 6 to 8). In the latter case, the program is safe and the algorithm terminates. This check is called the forward condition [29].

Inductive Step. Checking the forward condition can, however, only prove safety for programs with finite (and short) loops. Therefore, the algorithm also attempts an inductive proof (lines 9 to 14). The inductive-step case checks whether, after every sequence of k loop iterations without a property violation, there is also no property violation before loop iteration \(k+1\). For model checking of software, however, this check would often fail inconclusively without auxiliary invariants [8]. In our approach, we make use of the fact that the invariants that were generated so far by the concurrently-running invariant-generation algorithm hold, and conjoin these facts with the induction hypothesis. Thus, the inductive-step case proves a program safe if the following condition is unsatisfiable:

\[ \bigwedge_{i=n}^{n+k-1} \bigl(\, Inv (s_i) \wedge P(s_i) \wedge T(s_i, s_{i+1}) \,\bigr) \;\wedge\; \neg P(s_{n+k}) \]

where \( Inv \) is the currently available program invariant, and \(s_n, \ldots, s_{n+k}\) is any sequence of states. If this condition is satisfiable, then the induction check is inconclusive, and the program is not yet proved safe or unsafe with the current value of k and the current invariant. If, during the satisfiability check of the step case, a new (stronger) invariant has become available (the condition in line 14 is false), we immediately re-check the step case with the new invariant. This can be done efficiently using an incremental SMT solver for the repeated satisfiability checks in line 12. Otherwise, we start over with an increased value of k.

Note that the inductive-step case is similar to a BMC check for the presence of error paths of length exactly \(k+1\). However, as the step case needs to consider any \(k+1\) consecutive loop iterations, and not only the first such iterations, it does not assume that the execution of the loop iterations begins in an initial state. Instead, it assumes that there is a sequence of k iterations without any property violation (induction hypothesis).
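The control flow of Algorithm 1 can be summarized in the following sketch (my own rendering under stated assumptions: the BMC, forward-condition, and step-case checks are passed in as callables, and invariants arrive through a store like the one sketched above; none of these names are from the paper's implementation):

```python
# Sketch of iterative-deepening k-induction (Algorithm 1). The callables
# base_case_fails, forward_condition_holds, and step_case_unsat stand for
# the three SMT checks; store.current() returns the newest published invariant.
def k_induction(base_case_fails, forward_condition_holds, step_case_unsat,
                store, inc=lambda k: k + 1):
    k = 1
    while True:
        if base_case_fails(k):               # lines 3-5: BMC up to bound k
            return "UNSAFE"
        if forward_condition_holds(k):       # lines 6-8: state space exhausted
            return "SAFE"
        inv = store.current()
        while True:                          # lines 9-14: inductive-step case
            if step_case_unsat(k, inv):
                return "SAFE"
            new_inv = store.current()
            if new_inv == inv:               # no stronger invariant yet:
                break                        # give up on this k
            inv = new_inv                    # retry step case with same k
        k = inc(k)                           # iterative deepening
```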

Continuous Invariant Generation. Our continuous invariant generation incrementally produces stronger and stronger program invariants. It is based on iterative refinement, each time using an increased precision. After each strengthening, the invariant can be used as an injection invariant by the \(k\)-induction procedure. It may happen that this analysis proves safety of the program all by itself, but this is not its main purpose here.

Our \(k\)-induction module works with any kind of invariant-generation procedure, as long as its precision, i.e., its level of abstraction, is configurable. We implemented two different invariant-generation approaches: KI and DF, described below.

Fig. 3: Configurable design of a \(k\)-induction framework

We use the design of Fig. 3 to explain our flexible and modular framework for \(k\)-induction: \(k\)-induction is a verification technique and, at the same time, a form of invariant generation. In this paper, the main algorithm is thus the \(k\)-induction, as defined in Algorithm 1. We denote the algorithm by KI. If invariants are generated and injected into KI, we denote this injection by a single arrow; thus, the use of generated invariants that are produced by a data-flow analysis (DF) is denoted by KI \(\leftarrow\) DF. If the invariant generator continuously refines the invariants and repeatedly injects those invariants into KI, we denote this by a double arrow: if data-flow analysis with dynamic precision adjustment (our new contribution) is used, we have KI \(\Leftarrow\) DF, and if the PKind approach is used, i.e., KI is used to construct invariants, we have KI \(\Leftarrow\) KI. Now, since the second KI, which constructs invariants for injection into the first KI, can again get invariants injected, we can further build the approach KI \(\Leftarrow\) (KI \(\Leftarrow\) DF), which combines all approaches such that the invariant-generating KI benefits from the invariants generated with DF, and the main KI algorithm that tries to prove program safety benefits from both invariant generators.

KI. PKind [33] introduced the idea to construct invariants for injection in parallel, using a template-based method that extracts candidate invariants from the program and verifies their validity using \(k\)-induction [32]. If the candidate invariants are found to be valid, they are injected into the main \(k\)-induction procedure. We re-implemented the PKind approach in our framework (KI \(\Leftarrow\) KI), using a separate instance of \(k\)-induction to prove candidate invariants. Being based on \(k\)-induction, the power of this technique is continuously increased by increasing k. We derive the candidate invariants by taking the negations of assumptions on the control-flow paths to error locations. Similar to our Algorithm 2, each time this \(k\)-induction algorithm succeeds in proving a candidate invariant, the previously-known invariant is strengthened with this newly generated invariant. In our tool, we used an instance of Algorithm 1 to implement this approach. We are thus able to further combine this technique with other auxiliary invariant-generation approaches.

DF. As a second invariant-generation approach (our contribution), we use the reachability algorithm \(\mathsf{CPAAlgorithm}\) for configurable program analysis with dynamic precision adjustment [11]. Algorithm 2 shows our continuous invariant generation. The initial program invariant is represented by the formula true. We start by running the invariant-generating analysis once with a coarse initial precision (line 4). After each run of the program-invariant generation, we strengthen the previously-known program invariant with the newly-generated invariant (line 7; note that the program invariant \( Inv \) is not a safety invariant) and announce it globally (such that the \(k\)-induction algorithm can inject it). If the analysis was able to prove safety of the program, the algorithm terminates (lines 5 to 6). Otherwise, the analysis is restarted with a higher precision. The \(\mathsf{CPAAlgorithm}\) takes as input a configurable program analysis (CPA), a set of initial abstract states, and a precision. It returns a set of reachable abstract states that form an over-approximation of the reachable program states. Depending on the CPA used and the precision, the analysis by \(\mathsf{CPAAlgorithm}\) can be efficient and abstract like data-flow analysis, or expensive and precise like model checking.
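In the same illustrative style as before (all names assumed, not the paper's code), Algorithm 2 reduces to a publish-and-refine loop around the reachability analysis:

```python
# Sketch of continuous invariant generation (Algorithm 2). cpa_algorithm is
# the assumed reachability analysis; invariant_of/conjoin and the precision
# refinement are placeholders for the corresponding CPAchecker components.
def continuous_invariant_generation(cpa_algorithm, invariant_of, conjoin,
                                    refine, proves_safety, precision, store):
    inv = "true"                               # start with the formula true
    while True:
        reached = cpa_algorithm(precision)     # line 4: run with current precision
        inv = conjoin(inv, invariant_of(reached))  # line 7: strengthen invariant
        store.publish(inv)                     # announce globally for injection
        if proves_safety(reached):             # lines 5-6: safety proved outright
            return "SAFE"
        precision = refine(precision)          # restart with a higher precision
```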

For invariant generation, we choose an abstract domain based on expressions over intervals [8]. Note that this is not a requirement of our approach, which works with any kind of domain. Our choice is based on the high flexibility of this domain, which can be fast and efficient as well as precise. For this CPA, the precision is a triple (Y, n, w), where \(Y \subseteq X\) is a specific selection of important program variables, n is the maximal nesting depth of expressions in the abstract state, and w is a boolean specifying whether widening should be used. Those variables that are considered important will not be over-approximated by joining abstract states. With a higher nesting depth, more precise relations between variables can be represented. The use of widening ensures timely termination (at the expense of a lower precision), even for programs with loops with many iterations, like those in the examples of Figs. 1 and 2. An in-depth description of this abstract domain is presented in a technical report [8].
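The precision triple can be pictured as a small immutable record (an illustrative encoding; the field names are mine):

```python
from dataclasses import dataclass
from typing import FrozenSet

# Illustrative encoding of the precision triple (Y, n, w) described above.
@dataclass(frozen=True)
class Precision:
    important_vars: FrozenSet[str]  # Y: never over-approximated by the join
    max_nesting: int                # n: maximal expression-nesting depth
    widening: bool                  # w: whether widening is used
```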

3 Experimental Evaluation

We implemented all existing approaches to \(k\)-induction and compare all configurations with each other, the best configuration with other \(k\)-induction-based software verifiers, and our approach with two standard approaches to software verification: predicate analysis and value analysis.

Benchmark Verification Tasks. As benchmark set, we use verification tasks from the 2015 Competition on Software Verification (SV-COMP'15) [7]. We took all 3 964 verification tasks from the categories ControlFlow, DeviceDrivers64, HeapManipulation, Sequentialized, and Simple. The remaining categories were excluded because they use features (such as bit-vectors, concurrency, and recursion) that not all configurations of our evaluation support. A total of 1 148 verification tasks in the benchmark set contain a known specification violation. Although we cannot expect an improvement for these verification tasks when using auxiliary invariants, we did not exclude them, because doing so would unfairly advantage the new approach (which spends some effort generating invariants, which are not helpful when proving the existence of an error path).

Experimental Setup. All experiments were conducted on computers with two 2.6 GHz 8-core CPUs (Intel Xeon E5-2650 v2) and 135 GB of RAM. The operating system was Ubuntu 14.04 (64 bit), using Linux 3.13 and OpenJDK 1.7. Each verification task was limited to two CPU cores, a CPU run time of 15 min, and a memory usage of 15 GB. The benchmarking framework BenchExec (see footnote 2) ensures precise and reproducible results.

Presentation. All benchmarks, tools, and the full results of our evaluation are available on a supplementary web page (see footnote 3). All reported times are rounded to two significant digits. We use the scoring scheme of SV-COMP'15 to calculate a score for each configuration. For every real bug found, 1 point is assigned; for every correct safety proof, 2 points are assigned. A score of 6 points is subtracted for every wrong alarm (false positive) reported by the tool, and 12 points are subtracted for every wrong proof of safety (false negative). This scoring scheme values proving safety higher than finding error paths, and significantly punishes wrong answers, which is in line with the community consensus [7] on the difficulty of verification vs. falsification and the importance of correct results. We consider this a good fit for evaluating an approach such as \(k\)-induction, which aims at producing safety proofs.
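For concreteness, the scoring scheme amounts to the following small helper (a sketch; the handling of inconclusive results as 0 points is my assumption of the usual SV-COMP convention):

```python
# SV-COMP'15 scoring scheme as described above. expected/reported are
# "safe" or "unsafe"; reported is None when the tool gave no answer.
def score(expected: str, reported) -> int:
    if reported is None:
        return 0                                   # timeout / unknown
    if reported == expected:
        return 2 if expected == "safe" else 1      # correct proof / real bug
    return -6 if reported == "unsafe" else -12     # wrong alarm / wrong proof
```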

In Figs. 4 and 5, we present experimental results using a plot of quantile functions for accumulated scores, as introduced by the Competition on Software Verification [6], which shows the score and CPU time for successful results and the score for wrong answers. A data point (x, y) of a graph means that, for the respective configuration, the sum of the scores of all wrong answers and the scores of all correct answers with a run time of less than or equal to y seconds is x. For the left-most point (x, y) of each graph, the x-value shows the sum of all negative scores for the respective configuration and the y-value shows the time for the fastest successful result. For the right-most point (x, y) of each graph, the x-value shows the total score for this configuration, and the y-value shows the maximal run time. A configuration can be considered better the further to the right (the closer to 0) its graph begins (fewer wrong answers), the further to the right it ends (more correct answers), and the lower its graph is (less run time).
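Reading the plot definition operationally, the construction is simply (an illustrative sketch, not the competition's tooling):

```python
# Build the quantile-plot points described above from (score, cpu_time) pairs;
# wrong answers contribute only their (negative) score to the starting x.
def quantile_points(results):
    x = sum(s for s, _ in results if s < 0)            # left-most x: penalties
    points = []
    for s, t in sorted(((s, t) for s, t in results if s > 0),
                       key=lambda p: p[1]):            # correct results by time
        x += s                                         # accumulate score
        points.append((x, t))                          # y: run time of result
    return points
```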

Comparison of \(k\)-Induction-Based Approaches. We implemented all approaches in the Java-based open-source software-verification framework CPAchecker [12], which is available online (see footnote 4) under the Apache 2.0 License. For the experiments, we used version 1.4.5-cav15 of CPAchecker, with SMTInterpol [21] as SMT solver (using uninterpreted functions and linear arithmetic over integers and reals). The \(k\)-induction algorithm of CPAchecker was configured to increment k by 1 after each try (in Algorithm 1, \(\mathsf{inc}(k) = k+1\)). The precision refinement of the DF-based continuous invariant generation (Algorithm 2) was configured to increment the number of important program variables in the first, third, fifth, and any further precision refinements. The second precision refinement increments the expression-nesting depth, and the fourth disables the widening.
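Using the Precision record sketched in Sect. 2, this refinement schedule reads as follows (again illustrative; `candidates` is an assumed iterator over not-yet-important variables):

```python
# Refinement schedule described above: step 2 deepens expressions, step 4
# disables widening, every other step adds another important variable.
def refine(p: Precision, step: int, candidates) -> Precision:
    if step == 2:
        return Precision(p.important_vars, p.max_nesting + 1, p.widening)
    if step == 4:
        return Precision(p.important_vars, p.max_nesting, False)
    return Precision(p.important_vars | {next(candidates)},
                     p.max_nesting, p.widening)
```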

We evaluated the following groups of \(k\)-induction approaches: (1) without any auxiliary invariants (KI), (2) with auxiliary invariants of different precisions generated by the DF approach (KI \(\leftarrow\) DF), and (3) with continuously-refined invariants (KI \(\Leftarrow\) DF, KI \(\Leftarrow\) KI, and KI \(\Leftarrow\) (KI \(\Leftarrow\) DF)).

The \(k\)-induction-based configuration using no auxiliary invariants (KI) is an instance of Algorithm 1 where \(\mathsf{get\_currently\_known\_invariant}()\) always returns true as invariant, and Algorithm 2 does not run at all.

The configurations using generated invariants (KI \(\leftarrow\) DF) are also instances of Algorithm 1. Here, Algorithm 2 runs in parallel; however, it terminates after one loop iteration. We denote these configurations with triples (s, n, w) that represent the precision (Y, n, w) of the invariant generation, with s being the size of the set of important program variables (\(s = |Y|\)). For example, the first of these configurations, \((0, 1, true)\), has no variables in the set Y of important program variables (i.e., all variables get over-approximated by the merge operator), the maximum nesting depth of expressions in the abstract state is 1, and the widening operator is used. The remaining configurations we use are \((8, 2, true)\), \((16, 2, true)\), and \((16, 2, false)\). These configurations were selected because they represent some of the extremes of the precisions that are used during dynamic invariant generation. It is impossible to cover every possible valid configuration within the scope of this paper.

There are three configurations using continuously-refined invariants: (1) using the \(k\)-induction approach similar to PKind to generate invariants, refining by increasing k, denoted as KI \(\Leftarrow\) KI; (2) using the DF-based approach to generate invariants, refining by precision adjustment, denoted as KI \(\Leftarrow\) DF; and (3) using both approaches in parallel combination, denoted as KI \(\Leftarrow\) (KI \(\Leftarrow\) DF).

All configurations using invariant generation run the generation in parallel to the main \(k\) -induction algorithm, an instance of Algorithm 1.

Score and Reported Results. The configuration KI with no invariant generation receives the lowest score of \(2\,246\) and (as expected) can verify only \(1\,531\) programs successfully. This shows that it is indeed important in practice to enhance \(k\)-induction-based software verification with invariants. The configurations KI \(\leftarrow\) DF using invariant generation produce similar numbers of correct results (around \(2\,400\)), improving upon the results of the plain \(k\)-induction without auxiliary invariants by a score of \(1\,700\) to \(1\,800\). Even though these configurations solve a similar number of programs, a closer inspection reveals that each of the configurations is able to correctly solve significant amounts of programs where the other configurations run into timeouts. This observation explains the high score of \(4\,249\) points achieved by our approach of injecting the continuously-refined invariants generated with data-flow analysis into the \(k\)-induction engine (configuration KI \(\Leftarrow\) DF). By combining the advantages of fast and coarse precisions with those of slow but fine precisions, it correctly solves \(2\,507\) verification tasks, which is 45 more than the best of the chosen configurations without dynamic refinement. Using a \(k\)-induction-based invariant generation as done by PKind (configuration KI \(\Leftarrow\) KI) is also a successful technique for increasing the number of solvable verification tasks, and thus, combining both invariant-generation approaches, continuously refining their precision, and injecting the generated invariants into the \(k\)-induction engine (configuration KI \(\Leftarrow\) (KI \(\Leftarrow\) DF)) is the most effective of all evaluated \(k\)-induction-based approaches, with a score of \(4\,282\) and \(2\,519\) correct results. The few wrong proofs produced by the configurations are not due to conceptual problems, but only due to incompleteness in the analyzer's handling of certain constructs such as unbounded arrays and pointer aliasing.

Performance. Table 1 shows that by far the largest amount of time is spent by the configuration KI (no auxiliary invariants), because for those programs that cannot be proved without auxiliary invariants, the \(k\)-induction procedure keeps incrementing k until the time limit is reached. The wall times and CPU times for the correct results correlate roughly with the number of correct results, i.e., on average about the same amount of time is spent on correct verifications, whether or not invariant generation is used. This shows that the overhead of generating auxiliary invariants is well-compensated.

The configurations with invariant generation have a relatively higher CPU time compared to their wall time, because these configurations spend some time generating invariants in parallel to the \(k\)-induction algorithm. The results show, however, that the time spent on the continuously-refined invariant generation clearly pays off, as the configuration using both data-flow analysis and \(k\)-induction for invariant generation is not only the one with the most correct results, but at the same time one of the two fastest configurations, with only 320 h in total. Even though they produced many more correct results, the configurations KI \(\Leftarrow\) DF and KI \(\Leftarrow\) (KI \(\Leftarrow\) DF) did not exceed the times of the chosen configurations using invariant generation without continuous refinement. The configuration KI \(\Leftarrow\) KI, using only \(k\)-induction to continuously generate invariants, is slower, but produces results for some programs where the configuration KI \(\Leftarrow\) DF fails. The results show that the combination of the techniques reaps the benefits of both.

These results show that the additional effort invested in generating auxiliary invariants is well-spent, as it even decreases the overall time due to fewer timeouts. As expected, the continuously-refined invariants solve many tasks more quickly than the configurations using invariant generation with high precisions and without refinement.

Final Value of k. The bottom of Table 1 shows some statistics about the final values of k for the correct safety proofs. There are only small differences between the maximum k values of most of the configurations. Interestingly, the configuration using non-dynamic invariant generation with high precision has a higher maximum final value of k than the others, because for the verification task afnp2014_true-unreach-call.c.i, a strong invariant generated only with this configuration allowed the proof to succeed. This effect is also observable in the continuously-refined configurations using invariants generated by data-flow analysis: they are also able to solve this verification task and, by dynamically increasing the precision, find the required auxiliary invariant even earlier, with loop bounds 112 and 111, respectively. There is also a verification task in the benchmark set, gj2007_true-unreach-call.c.i, where most configurations need to unroll a loop with bound 100 to prove safety, while the strong invariant-generation technique allows the proof to succeed earlier, at a loop bound of 16. The continuously-refined configurations benefit from the same effect: KI \(\Leftarrow\) DF and KI \(\Leftarrow\) (KI \(\Leftarrow\) DF) solve this task at loop bounds 22 and 19, respectively.

Comparison with Other Tools. For comparison with other \(k\)-induction-based tools, we evaluated Esbmc and Cbmc, two software model checkers with support for \(k\)-induction. For Cbmc, we used version 5.1 in combination with a wrapper script for split-case \(k\)-induction provided by M. Tautschnig. For Esbmc, we used version 1.25.2 in combination with a wrapper script that enables \(k\)-induction (based on the SV-COMP'13 submission [35]). We also provide results for the experimental parallel \(k\)-induction of Esbmc, but note that our benchmark setup is not focused on parallelization (using only two CPU cores and a CPU-time limit instead of wall time). The CPAchecker configuration in this comparison is the one with continuously-refined invariants and both invariant generators (KI \(\Leftarrow\) (KI \(\Leftarrow\) DF)). Table 2 gives the results; Fig. 4 shows the quantile functions of the accumulated scores for each configuration. The results for Cbmc are not competitive, which may be attributed to the experimental nature of its \(k\)-induction support.

Score. CPAchecker in configuration KI \(\Leftarrow\) (KI \(\Leftarrow\) DF) successfully verifies almost 500 tasks (20 %) more than Esbmc. Furthermore, it has only 1 missed bug, which is related to unsoundness in the handling of some C features, whereas Esbmc has more than 150 wrong safety proofs. This large number of wrong results must be attributed to the unsound heuristic of Esbmc for strengthening the induction hypothesis, where it retains potentially incorrect information about loop-modified variables [35]. We have previously also implemented this approach in CPAchecker and obtained similar results [8]. The large number of wrong proofs reduces the confidence in the soundness of the correct proofs. Consequently, the score achieved by CPAchecker in configuration KI \(\Leftarrow\) (KI \(\Leftarrow\) DF) is much higher than the score of Esbmc (\(4\,282\) compared to \(1\,674\) points). This clear advantage is also visible in Fig. 4. The parallel version of Esbmc performs somewhat better than its sequential version, and misses fewer bugs. This is due to the fact that the base case and the step case are performed in parallel, and the loop bound k is incremented independently for each of them. The base case is usually easier to solve for the SMT solver, and thus the base-case checks proceed faster than the step-case checks (reaching a higher value of k sooner). Therefore, the parallel version manages to find some bugs by reaching the relevant k in the base-case checks earlier than in the step-case checks, which would otherwise produce a wrong safety proof upon reaching k. However, the number of wrong proofs is still much higher than with our approach, which is conceptually sound. Thus, the score of the new, sound approach is more than \(2\,500\) points higher.

Performance. Table 2 shows that our approach needs only 10 % more CPU time than the sequential version of Esbmc for solving a much higher number of tasks, and even needs less CPU and wall time than the parallel version of Esbmc. This indicates that, due to our invariants, we succeed more often with fewer loop unrollings, and thus in less time. It also shows that the effort invested in generating the invariants is well spent.

Final Value of k. The bottom of Table 2 contains some statistics on the final value of k that was needed to verify a program. The table shows that for safe programs, CPAchecker needs a loop bound that is (on average) only about one third of the loop bound that Esbmc needs. This advantage is due to the use of generated invariants, which make the induction proofs easier and more likely to succeed with a smaller value of k. The verification task array_true-unreach-call2.i is solved by Esbmc after completely unwinding the loop, therefore reaching the large k-value \(2\,048\). In the parallel version, the (quicker) detached base case hits this bound while the inductive-step case is still at \(k = 1\,952\).

Comparison with Other Approaches. We also compare our combination of \(k\)-induction with continuously-refined invariants against other common approaches for software verification. We use for comparison two analyses based on CEGAR: a predicate analysis [13] and a value analysis [14]. Both are implemented in CPAchecker, which allows us to compare the approaches inside the same tool, using the same run-time environment, SMT solver, etc., and to focus only on the conceptual differences between the analyses.

Figure 5 shows a quantile plot comparing the configuration KI \(\Leftarrow\) (KI \(\Leftarrow\) DF) with the predicate analysis and the value analysis.

4 Conclusion

We have presented the novel idea of injecting invariants into \(k\)-induction that are generated using data-flow analysis with dynamic precision adjustment, and we contribute a publicly available implementation of our idea within the software-verification framework CPAchecker. Our extensive experiments show that the new approach outperforms all existing implementations of \(k\)-induction for software verification, and that it is competitive compared to other, more mature techniques for software verification. We showed that a sound, effective, and efficient \(k\)-induction approach to general-purpose software verification is possible, and that the additional resources required to achieve these combined benefits are negligible if invested judiciously. At the same time, there is still room for improvement of our technique. An interesting improvement would be to add an information flow between the two cooperating algorithms in the reverse direction. If the \(k\)-induction procedure could tell the invariant generation which facts it is missing to prove safety, this could lead to a more efficient and effective approach to generate invariants that are specifically tailored to the needs of the \(k\)-induction proof. Already now, CPAchecker is parsimonious in terms of unrollings, compared to other tools. The low k-values required to prove many programs show that even our current invariant generation is powerful enough to produce invariants that are strong enough to cut down the necessary number of loop unrollings. \(k\)-induction-guided precision refinement might direct the invariant generation towards providing weaker but still useful invariants for \(k\)-induction more efficiently.

http://www.sosy-lab.org/~dbeyer/cpa-k-induction/

(successfully evaluated by the CAV 2015 Artifact Evaluation Committee)

https://github.com/dbeyer/benchexec

http://cpachecker.sosy-lab.org

Aho, A.V., Sethi, R., Ullman, J.D.: Compilers: Principles, Techniques, and Tools. Addison-Wesley, Reading (1986)

Awedh, M., Somenzi, F.: Automatic invariant strengthening to prove properties in bounded model checking. In: Proceedings of DAC, pp. 1073–1076. ACM/IEEE (2006)

Ball, T., Cook, B., Levin, V., Rajamani, S.K.: SLAM and static driver verifier: Technology transfer of formal methods inside Microsoft. In: Proceedings of IFM, LNCS, vol. 2999, pp. 1–20. Springer (2004)

Ball, T., Levin, V., Rajamani, S.K.: A decade of software model checking with SLAM. Commun. ACM 54(7), 68–76 (2011)

Barnett, M., Leino, K.R.M.: Weakest-precondition of unstructured programs. In: Proceedings of PASTE, pp. 82–87. ACM (2005)

Beyer, D.: Second competition on software verification. In: Proceedings of TACAS, LNCS, vol. 7795, pp. 594–609. Springer (2013)

Beyer, D.: Software verification and verifiable witnesses. In: Proceedings of TACAS, LNCS, vol. 9035, pp. 401–416. Springer (2015)

Beyer, D., Dangl, M., Wendler, P.: Combining k-induction with continuously-refined invariants. Technical Report MIP-1503, University of Passau, January 2015. arXiv:1502.00096

Beyer, D., Henzinger, T.A., Majumdar, R., Rybalchenko, A.: Invariant synthesis for combined theories. In: Proceedings of VMCAI, LNCS, vol. 4349, pp. 378–394. Springer (2007)

Beyer, D., Henzinger, T.A., Majumdar, R., Rybalchenko, A.: Path invariants. In: Proceedings of PLDI, pp. 300–309. ACM (2007)

Beyer, D., Henzinger, T.A., Théoduloz, G.: Program analysis with dynamic precision adjustment. In: Proceedings of ASE, pp. 29–38. IEEE (2008)

Beyer, D., Keremoglu, M.E.: CPAchecker: A tool for configurable software verification. In: Proceedings of CAV, LNCS, vol. 6806, pp. 184–190. Springer (2011)

Beyer, D., Keremoglu, M.E., Wendler, P.: Predicate abstraction with adjustable-block encoding. In: Proceedings of FMCAD, pp. 189–197. FMCAD (2010)

Beyer, D., Löwe, S.: Explicit-state software model checking based on CEGAR and interpolation. In: Proceedings of FASE, LNCS, vol. 7793, pp. 146–162. Springer (2013)

Biere, A.: Handbook of Satisfiability. IOS Press, Amsterdam (2009)

Biere, A., Cimatti, A., Clarke, E.M., Strichman, O., Zhu, Y.: Bounded model checking. Adv. Comput. 58, 117–148 (2003)

Biere, A., Cimatti, A., Clarke, E., Zhu, Y.: Symbolic model checking without BDDs. In: Proceedings of TACAS, LNCS, vol. 1579, pp. 193–207. Springer (1999)

Bjørner, N., Browne, A., Manna, Z.: Automatic generation of invariants and intermediate assertions. Theor. Comput. Sci. 173(1), 49–87 (1997)

Blanchet, B., Cousot, P., Cousot, R., Feret, J., Mauborgne, L., Miné, A., Monniaux, D., Rival, X.: A static analyzer for large safety-critical software. In: Proceedings of PLDI, pp. 196–207. ACM (2003)

Bradley, A.R., Manna, Z.: Property-directed incremental invariant generation. FAC 20(4–5), 379–405 (2008)

Christ, J., Hoenicke, J., Nutz, A.: SMTInterpol: An interpolating SMT solver. In: Proceedings of SPIN, LNCS, vol. 7385, pp. 248–254. Springer (2012)

Cordeiro, L., Fischer, B., Silva, J.P.M.: SMT-based bounded model checking for embedded ANSI-C software. In: Proceedings of ASE, pp. 137–148. IEEE (2009)

Cousot, P., Halbwachs, N.: Automatic discovery of linear restraints among variables of a program. In: Proceedings of POPL, pp. 84–96 (1978)

Donaldson, A.F., Haller, L., Kroening, D.: Strengthening induction-based race checking with lightweight static analysis. In: Proceedings of VMCAI, LNCS, vol. 6538, pp. 169–183. Springer, Heidelberg (2011)

Donaldson, A.F., Haller, L., Kroening, D., Rümmer, P.: Software verification using k-induction. In: Proceedings of SAS, LNCS, vol. 6887, pp. 351–368. Springer (2011)

Donaldson, A.F., Kroening, D., Rümmer, P.: Automatic analysis of scratch-pad memory code for heterogeneous multicore processors. In: Proceedings of TACAS, LNCS, vol. 6015, pp. 280–295. Springer (2010)

Donaldson, A.F., Kroening, D., Rümmer, P.: Automatic analysis of DMA races using model checking and k-induction. FMSD 39(1), 83–113 (2011)

Garoche, P.-L., Kahsai, T., Tinelli, C.: Incremental invariant generation using logic-based automatic abstract transformers. In: Proceedings of NFM, LNCS, vol. 7871, pp. 139–154. Springer (2013)

Große, D., Le, H.M., Drechsler, R.: Proving transaction and system-level properties of untimed SystemC TLM designs. In: Proceedings of MEMOCODE, pp. 113–122. IEEE (2010)

Gupta, A., Rybalchenko, A.: InvGen: An efficient invariant generator. In: Proceedings of CAV, LNCS, vol. 5643, pp. 634–640. Springer (2009)

Albarghouthi, A., Gurfinkel, A., Li, Y., Chaki, S., Chechik, M.: UFO: Verification with interpolants and abstract interpretation. In: Proceedings of TACAS, LNCS, vol. 7795, pp. 637–640. Springer (2013)

Kahsai, T., Ge, Y., Tinelli, C.: Instantiation-based invariant discovery. In: Proceedings of NFM, LNCS, vol. 6617, pp. 192–206. Springer (2011)

Kahsai, T., Tinelli, C.: PKind: A parallel k-induction based model checker. In: Proceedings of the International Workshop on Parallel and Distributed Methods in Verification, EPTCS 72, pp. 55–62 (2011)

Khoroshilov, A., Mutilin, V., Petrenko, A., Zakharov, V.: Establishing Linux driver verification process. In: Proceedings of PSI, LNCS, vol. 5947, pp. 165–176. Springer (2010)

Morse, J., Cordeiro, L., Nicole, D., Fischer, B.: Handling unbounded loops with ESBMC 1.20. In: Proceedings of TACAS, LNCS, vol. 7795, pp. 619–622. Springer (2013)

Sankaranarayanan, S., Sipma, H.B., Manna, Z.: Scalable analysis of linear systems using mathematical programming. In: Proceedings of VMCAI, LNCS, vol. 3385, pp. 25–41. Springer (2005)

Sheeran, M., Singh, S., Stålmarck, G.: Checking safety properties using induction and a SAT-solver. In: Proceedings of FMCAD, LNCS, vol. 1954, pp. 108–125. Springer (2000)

Wahl, T.: The k-induction principle (2013). http://www.ccs.neu.edu/home/wahl/Publications/k-induction.pdf


Acknowledgments

We thank M. Tautschnig and L. Cordeiro for explaining the optimal available k-induction parameters for the verifiers Cbmc and Esbmc, respectively.

Author information

Authors and Affiliations

University of Passau, Passau, Germany

Dirk Beyer, Matthias Dangl & Philipp Wendler


Editor information

Editors and Affiliations

University of Oxford, Oxford, United Kingdom

Daniel Kroening

Carnegie Mellon University, Moffett Field, California, USA

Corina S. Păsăreanu


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Beyer, D., Dangl, M., Wendler, P. (2015). Boosting k-Induction with Continuously-Refined Invariants. In: Kroening, D., Păsăreanu, C. (eds) Computer Aided Verification. CAV 2015. Lecture Notes in Computer Science, vol. 9206. Springer, Cham. https://doi.org/10.1007/978-3-319-21690-4_42


DOI: https://doi.org/10.1007/978-3-319-21690-4_42

Published: 16 July 2015

Publisher Name: Springer, Cham

Print ISBN: 978-3-319-21689-8

Online ISBN: 978-3-319-21690-4

eBook Packages: Computer Science (R0)


Graft-Specific Regulatory T Cells for Long-Lasting, Local Tolerance Induction


1. Introduction
2. Materials and Methods
   2.2. Generation of Allospecific Tregs
   2.3. Retrovirus Production and Cell Transduction
   2.4. Suppression Assay
   2.5. Adoptive Cell Transfer
   2.6. Depleting Antibodies
   2.7. Skin Transplantation
   2.8. ELISPOT Assay
   2.9. Histology
   2.10. Data Analysis
3. Results
   3.1. Allospecific cTregs Prolonged Allograft Survival
   3.2. Local Accumulation and Long-Term Survival of Tregs Restrain Graft Rejection
   3.3. Combination of Anti-Thy1.2 Antibodies with Rapamycin-Enabled Tolerance Induction by Treg Therapy
4. Discussion
5. Conclusions
Author Contributions, Institutional Review Board Statement, Informed Consent Statement, Data Availability Statement, Acknowledgments, Conflicts of Interest, Abbreviations

ATG: anti-thymocyte globulin
GvHD: graft-versus-host disease
MLR: mixed lymphocyte reaction
MOI: multiplicity of infection
SOT: solid organ transplantation
Tregs: regulatory T cells
Teffs: effector T cells
cTregs: converted Tregs
nTregs: natural Tregs



Share and Cite

Seltrecht, N.; Hardtke-Wolenski, M.; Iordanidis, K.; Jonigk, D.; Galla, M.; Schambach, A.; Buitrago-Molina, L.E.; Wedemeyer, H.; Noyan, F.; Jaeckel, E. Graft-Specific Regulatory T Cells for Long-Lasting, Local Tolerance Induction. Cells 2024, 13, 1216. https://doi.org/10.3390/cells13141216


COMMENTS

  1. Hypothesis

    That is, such a process of hypothesis generation itself is induction. However, deduction is used in the process of applying mathematical induction to the proof of a mathematical proposition. In other words, mathematical induction as well as its preceding hypothesis generation can be viewed as a special form of hypothetico-deductive method ...

  2. Data-Driven Hypothesis Generation in Clinical Research: What We Learned

    Hypothesis generation is an early and critical step in any hypothesis-driven clinical research project. Because it is not yet a well-understood cognitive process, the need to improve the process goes unrecognized. Without an impactful hypothesis, the significance of any research project can be questionable, regardless of the rigor or diligence applied in other steps of the study, e.g., study ...

  3. Automating psychological hypothesis generation with AI: when large

    Leveraging the synergy between causal knowledge graphs and a large language model (LLM), our study introduces a groundbreaking approach for computational hypothesis generation in psychology. We ...

  4. The Research Hypothesis: Role and Construction

    Hypothesis Generation: Modes of Inference. There is a paucity of empirical data regarding the way (or ways) in which hypotheses are formulated by scientists and even less information about whether these methods vary across disciplines. ... In his essay, Deduction, Induction, Hypothesis, Peirce presents an abductive syllogism: Rule: "All the ...

  5. Strategies in Abduction: Generating and Selecting Diagnostic Hypotheses

    Induction, for Peirce, includes any use of empirical evidence to test or support a hypothesis. Abduction, by contrast, is reasoning that introduces hypotheses into inquiry in the first place: "abduction commits us to nothing. ... When hypothesis generation is discussed in the medical literature, it is done primarily within the framework of ...

  6. On the Logic of Hypothesis Generation

    Hypothesis generation is concerned with hypotheses that are not yet ruled out by the data, and is as such a purely logical process. Hypothesis evaluation then proceeds with further investigating the possible hypotheses in order to select one of which the predictions agree sufficiently with reality. This distinction has been inspired by Peirce ...

  7. IV The Stages of the Method ( ii ): Deduction and Induction

    "The same rule follows us into the logic of induction and hypothesis" (2.737). ... If, however, the event has not been used in the generation of the hypothesis, then it not only may be considered in evaluating the hypothesis, but must be so used. Such a prediction "virtually antecedes" the investigator's knowledge of it, since in this ...

  8. Active inductive inference in children and adults: A constructivist

    A defining aspect of being human is an ability to reason about the world by generating and adapting ideas and hypotheses. Here we explore how this ability develops by comparing children's and adults' active search and explicit hypothesis generation patterns in a task that mimics the open-ended process of scientific induction.

  9. PDF 1.0 Introdu Ov a 7Testing

    Quadrant Hypothesis Generation is used to identify a set of hypotheses when the outcome is likely to be determined by just two driving forces. The latter two techniques are particularly useful in identifying a set of ... deduction and induction. Abductive reasoning starts with a set of facts. One then develops hypotheses that, if true, would ...

  10. On the Logic of Hypothesis Generation

    It has been argued in the introductory chapter to this volume that, when dealing with non-deductive reasoning forms like abduction and induction, a distinction between hypothesis generation and ...

  11. From First Principles to Theories: Revisiting the Scientific Method

    The aim of this article is to examine how first principles are developed into general theories by reviewing the roles that abduction, deduction, and induction play in the three primary steps of the scientific method: hypothesis generation, hypothesis testing, and theory generation. Kant's democratic peace theory is first used to illustrate this process, and the example is subsequently ...

  12. Epistemological foundations of the JSM method for automatic hypothesis

    Part I considers the so-called problem of induction, viz., the history of its genesis and elaboration, and formulates the main principles and logical tools of the JSM method for automatic hypothesis generation, including JSM reasoning. JSM reasoning is demonstrated to be a synthesis of three cognitive procedures: induction, analogy, and abduction.

  13. Hypothesis Generation in Social Work Research

    Abstract. This article describes the process of generating hypotheses from empirical, qualitative data. Arguing that a discovery-oriented, qualitative method of hypothesis generation has great potential for the development of social work knowledge, the paper shows how the grounded theory method originated by Glaser and Strauss (1976) builds on both induction and deduction and develops the ...

  14. Hypothesis Generation in Biology: A Science Teaching Challenge ...

    First, a logical fallacy of induction is affirming the hypothesis without considering other explanations — there may be other hypotheses that explain the observed result. The case may simply be that females prefer to mate with short-legged males. ... Paul K. Strode "Hypothesis Generation in Biology: A Science Teaching Challenge & Potential ...

  15. PDF Grounding Compositional Hypothesis Generation in Specific Instances

    human hypothesis generation. As a result, we propose an alternative, Instance Driven Generator (IDG), that constructs bottom-up hypotheses directly out of encountered positive in-stances of a concept. Using a novel rule induction task based on the children's game Zendo, we compare these "bottom-up" and "top-down" approaches to inference.

  16. 3.6: Mathematical Induction

    \(1 + 2 + 3 + \dots + n\), \(\sum_{i=1}^{n} i^2\). Mathematical induction can be used to prove that an identity is valid for all integers \(n \ge 1\). Here is a typical example of such an identity: \(1 + 2 + 3 + \dots + n = \frac{n(n+1)}{2}\). More generally, we can use mathematical induction to prove that a propositional function \(P(n)\) is true for all integers \(n \ge a\). (A worked version of this induction proof appears after the end of this list.)

  17. Scientific hypothesis

    The generation of a hypothesis frequently is described as a creative process and is based on existing scientific knowledge, intuition, or experience. Therefore, although scientific hypotheses commonly are described as educated guesses, they actually are more informed than a guess. In addition, scientists generally strive to develop simple ...

  18. [2407.12888] Explainable Biomedical Hypothesis Generation via Retrieval

    The vast amount of biomedical information available today presents a significant challenge for investigators seeking to digest, process, and understand these findings effectively. Large Language Models (LLMs) have emerged as powerful tools to navigate this complex and challenging data landscape. However, LLMs may lead to hallucinatory responses, making Retrieval Augmented Generation (RAG ...

  19. Hypothesis Generation and Interpretation

    Academic investigators and practitioners working on the further development and application of hypothesis generation and interpretation in big data computing, with backgrounds in data science and engineering, or the study of problem solving and scientific methods or who employ those ideas in fields like machine learning will find this book of ...

  20. Hypothesis Generation

    Hypothesis Generation. Economics should be, as a science, concerned with formulating theories of ideas and reality that produce descriptions of how to understand phenomenon and create experiences, hypotheses generation, and data which need to be proven or disproven through testing and further analyses. ... Induction or inference is the process ...

  21. Gen: Generalizing Induction Hypotheses

    Gen: Generalizing Induction Hypotheses. Require Export Poly. In the previous chapter, we noticed the importance of controlling the exact form of the induction hypothesis when carrying out inductive proofs in Coq. In particular, we need to be careful about which of the assumptions we move (using intros) from the goal to the context before ...

  22. Scientific Endeavor Flashcards

    Scientific Endeavor. Hypothesis generation by analogy, induction, deduction, intuition. Analogy: similar situations lead to similar results. Induction: arises from observation of a specific phenomenon. Deduction: based on better-established scientific knowledge. Intuition: a statement as to what "seems right".

  23. The Scientific Endeavor: Chapter 3 Flashcards

    Study with Quizlet and memorize flashcards containing terms like Hypothesis generation by analogy:, Hypothesis generation by induction:, Hypothesis generation by deduction and more.

  24. "by induction hypothesis" or "by THE induction hypothesis"

    The wording "by induction hypothesis" is a common solecism found in many mathematical texts. The natural way to say it in English is: "by the induction hypothesis". The grammatical analysis is that one is referring to a particular hypothesis, which would require the definite article.

  25. Joe Mauer plans to pay homage to St. Paul in Hall of Fame induction

    Joe Mauer's literal storybook career, from the little league fields of St. Paul to the bright lights of Target Field, culminates on Sunday when he joins the Baseball Hall of Fame. Why it matters ...

  26. Boosting k-Induction with Continuously-Refined Invariants

    The \(k\)-induction engine is extended such that the invariants from the invariant generator are injected in each iteration to strengthen the hypothesis. The new method solves the above-mentioned problem because it (1) automatically chooses an invariant by step-wise refinement, (2) always starts with a lightweight invariant generation that is ...

  27. Cells

    Background: Solid organ transplantation is hindered by immune-mediated chronic graft dysfunction and the side effects of immunosuppressive therapy. Regulatory T cells (Tregs) are crucial for modulating immune responses post-transplantation; however, the transfer of polyspecific Tregs alone is insufficient to induce allotolerance in rodent models. Methods: To enhance the efficacy of adoptive ...
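As a worked companion to item 16 above, the following short LaTeX document spells out the standard induction proof of the sum identity quoted there (standard textbook material, included only for illustration):

    \documentclass{article}
    \usepackage{amsmath}
    \begin{document}
    \textbf{Claim.} For all integers $n \ge 1$,
    \[ 1 + 2 + 3 + \dots + n = \frac{n(n+1)}{2}. \]

    \textbf{Base case} ($n = 1$): the left-hand side is $1$, and the
    right-hand side is $\frac{1 \cdot 2}{2} = 1$.

    \textbf{Induction step:} assume the identity holds for some $n = k$
    (the induction hypothesis). Then
    \[ 1 + 2 + \dots + k + (k+1)
       = \frac{k(k+1)}{2} + (k+1)
       = \frac{(k+1)(k+2)}{2}, \]
    which is the identity for $n = k+1$. By the principle of mathematical
    induction, the identity holds for all integers $n \ge 1$.
    \end{document}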