Journal of Big Data


Featured Collections on Computationally Intensive Problems in General Math and Engineering

This two-part special issue covers computationally intensive problems in engineering and focuses on mathematical mechanisms of interest for emerging problems such as Partial Difference Equations, Tensor Calculus, Mathematical Logic, and Algorithmic Enhancements based on Artificial Intelligence. Applications of the research highlighted in the collection include, but are not limited to: Earthquake Engineering, Spatial Data Analysis, Geo Computation, Geophysics, Genomics and Simulations for Nature Based Construction, and Aerospace Engineering. Featured lead articles are co-authored by three esteemed Nobel laureates: Jean-Marie Lehn, Konstantin Novoselov, and Dan Shechtman.

Open Special Issues

Advancements on Automated Data Platform Management, Orchestration, and Optimization Submission Deadline: 30 September 2024 

Emergent architectures and technologies for big data management and analysis Submission Deadline: 1 October 2024 

View our collection of open and closed special issues

Most recent articles

Optimization-based convolutional neural model for the classification of white blood cells

Authors: Tulasi Gayatri Devi and Nagamma Patil

Advanced RIME architecture for global optimization and feature selection

Authors: Ruba Abu Khurma, Malik Braik, Abdullah Alzaqebah, Krishna Gopal Dhal, Robertas Damaševičius and Bilal Abu-Salih

Feature reduction for hepatocellular carcinoma prediction using machine learning algorithms

Authors: Ghada Mostafa, Hamdi Mahmoud, Tarek Abd El-Hafeez and Mohamed E. ElAraby

Data oversampling and imbalanced datasets: an investigation of performance for machine learning and feature engineering

Authors: Muhammad Mujahid, EROL Kına, Furqan Rustam, Monica Gracia Villar, Eduardo Silva Alvarado, Isabel De La Torre Diez and Imran Ashraf

Advancing machine learning with OCR2SEQ: an innovative approach to multi-modal data augmentation

Authors: Michael Lowe, Joseph D. Prusa, Joffrey L. Leevy and Taghi M. Khoshgoftaar

Most accessed articles

A survey on Image Data Augmentation for Deep Learning

Authors: Connor Shorten and Taghi M. Khoshgoftaar

Big data in healthcare: management, analysis and future prospects

Authors: Sabyasachi Dash, Sushil Kumar Shakyawar, Mohit Sharma and Sandeep Kaushik

Review of deep learning: concepts, CNN architectures, challenges, applications, future directions

Authors: Laith Alzubaidi, Jinglan Zhang, Amjad J. Humaidi, Ayad Al-Dujaili, Ye Duan, Omran Al-Shamma, J. Santamaría, Mohammed A. Fadhel, Muthana Al-Amidie and Laith Farhan

Deep learning applications and challenges in big data analytics

Authors: Maryam M Najafabadi, Flavio Villanustre, Taghi M Khoshgoftaar, Naeem Seliya, Randall Wald and Edin Muharemagic

Short-term stock market price trend prediction using a comprehensive deep learning system

Authors: Jingyi Shen and M. Omair Shafiq




Annual Journal Metrics

2022 Citation Impact: 8.1 (2-year Impact Factor); 5.095 (SNIP, Source Normalized Impact per Paper); 2.714 (SJR, SCImago Journal Rank)

2023 Speed: 56 days from submission to first editorial decision (median); 205 days from submission to acceptance (median)

2023 Usage: 2,559,548 downloads; 280 Altmetric mentions

  • ISSN: 2196-1115 (electronic)

METHODS article

Scientific data management in the age of big data: an approach supporting a resilience index development effort.

Linda C. Harwell, D. N. Vivian, M. D. McLaughlin and S. F. Hafner

  • 1 National Health and Environmental Effects Research Laboratory, Gulf Ecology Division, Office of Research and Development, U.S. Environmental Protection Agency, Gulf Breeze, FL, United States
  • 2 Student Services Contractor, Oak Ridge Associated Universities, Oak Ridge, TN, United States
  • 3 Student Services Contractor, University of West Florida, Pensacola, FL, United States

The increased availability of publicly available data is, in many ways, changing our approach to conducting research. Not only are cloud-based information resources providing supplementary data to bolster traditional scientific activities (e.g., field studies, laboratory experiments), they also serve as the foundation for secondary data research projects such as indicator development. Indicators and indices are a convenient way to synthesize disparate information to address complex scientific questions that are difficult to measure directly (e.g., resilience, sustainability, well-being). In the current literature, there is no shortage of indicator or index examples derived from secondary data with a growing number that are scientifically focused. However, little information is provided describing the management approaches and best practices used to govern the data underpinnings supporting these efforts. From acquisition to storage and maintenance, secondary data research products rely on the availability of relevant, high-quality data, repeatable data handling methods and a multi-faceted data flow process to promote and sustain research transparency and integrity. The U.S. Environmental Protection Agency recently published a report describing the development of a climate resilience screening index which used over one million data points to calculate the final index. The pool of data was derived exclusively from secondary sources such as the U.S. Census Bureau, Bureau of Labor Statistics, Postal Service, Housing and Urban Development, Forestry Services and others. Available data were presented in various forms including portable document format (PDF), delimited ASCII and proprietary format (e.g., Microsoft Excel, ESRI ArcGIS). The strategy employed for managing these data in an indicator research and development effort represented a blend of business practices, information science, and the scientific method. This paper describes the approach, highlighting key points unique for managing the data assets of a small-scale research project in an era of “big data.”

Introduction

The current literature shows that there is growing support from the scientific community for using secondary or “found” data in both theoretical and applied research ( Niemeijer and de Groot, 2008 ; Hampton et al., 2013 ; Davis-Kean et al., 2015 ). The “big data” environment has proven to be fertile ground for nurturing innovation in indicator research and development. Easily accessible secondary data has given rise to new big data technologies that can potentially increase the production of robust and reproducible indicator products ( Madin et al., 2007 ; Mooney and Winstanley, 2007 ; Demchenko et al., 2013 ; Jha et al., 2015 ). The concept of big data has been described in many ways. However, no single statement serves as the de facto definition. De Mauro et al. (2015) proposes an ontologically derived definition based on an analysis of existing big data definitions. The authors suggest that “Big Data represents the Information assets characterized by such a High Volume, Velocity, and Variety to require specific Technology and Analytical Methods for its transformation into Value.” This description seems aptly relevant as it emphasizes the enormity of the public access landscape as well as the tools needed to work with big data effectively.

The “information highway” moves over 35 terabits of data per minute (roughly 1.1 billion double-sided print pages of information every 60 s). New and upgraded submarine fiber optic routes have increased data transfer capacity by 32% annually for the last 5 years to support the growing digital load ( Submarine Telecoms, 2017 , p. 17). In no small measure, the research community has contributed to the proliferation of big data. Many funding organizations now require that data generated through publicly-funded research be made openly available if legally and ethically possible. In the United States (U.S.), all federal agencies investing in research must support increased access to published research and resulting scientific data ( Holdren, 2013, February 22 ). This continuous inflow of freely accessible research products offers some broad reaching benefits not the least of which is simply increasing research visibility ( Piwowar et al., 2007 ). For indicator research and development, big data are playing an essential role in filling long-standing data gaps in quantifying complex, multi-dimensional concepts such as sustainability, resilience, and well-being measures ( Smith et al., 2013 ; Cutter et al., 2014 ; OECD, 2017 ; Buck et al., 2018 ; Summers et al., 2018 ; Wendling et al., 2018 ; Helliwell et al., 2019 ).

The wealth of accessible information can be both rewarding and challenging for science, especially in finding ways to manage it. Scientific data management (SDM) has historically been a challenge for research. A two-part commentary, “ How to Manage Data Badly Part 1 and 2” ( Hale, 1999 , 2000 ), highlighted existing issues surrounding the management of research data in the field of ecology. Although the publication described the lack of SDM in the context of a single science discipline, the message resonated universally as few people could disagree with the observations regarding the poor state of SDM practices 20 years ago. Since then, data and information sciences have taken center stage as organizations seek to build more robust and efficient ways to collect, process, manage and curate big data ( Gray et al., 2005 ; Sansone et al., 2018 ). New technologies and expert solutions are emerging to assist both private and public sectors in managing big data ( Pilat and Fukasaku, 2007 ; Cox and Pinfield, 2014 ; Simms et al., 2016 ; Borycz and Carroll, 2018 ).

“Big science” research (i.e., high throughput, long-term or high value) is often provided with enough resources to support the technology and expertise needed to implement well-designed SDM and curation frameworks ( Crowston and Qin, 2011 ; Berman and Cerf, 2013 ). On the other hand, “small science” projects (i.e., small team, short-term or exploratory research) often lack adequate SDM funding even though small-scale research can collectively generate more data than its “big science” counterparts ( Crowston and Qin, 2011 ). Individual researchers often bear the responsibility for managing the data assets in smaller-scale science, yet many do not have practical data management experience or access to relevant personnel to process, document, and, eventually, curate big data-driven research adequately ( Lynch, 2008 ; Borgman, 2012 ). As research funding ebbs and flows, smaller-scale efforts are increasingly turning to big data to support research. Without sufficient SDM support, big data collection and processing activities alone can quickly overwhelm a project, making it difficult to curate reproducible science ( Lowndes et al., 2017 ). With a growing universe of open research and the ease with which the data may be acquired, it seems imperative that research institutions invest in building the capacity for all research efforts to plan and execute robust SDM, regardless of the size or perceived value ( Everyone Needs a Data-Management Plan, 2018 ).

There is a growing demand for science-based indicators ( Nardo et al., 2005 ) and indicator research is well-suited for big data. By design, indicators and indices (summarized indicators) are intended for a public audience. With the advent of the open access initiatives, SDM planning guidelines and tools are abundant, yet many of these resources lack the details and a common set of standards to be meaningful ( Dietrich et al., 2012 ). Research data and the processes to manage them are iterative and “mature” over time as the research progresses ( Crowston and Qin, 2011 ; Digital Curation Center, http://www.dcc.ac.uk/ ). For large-scale or high-volume research efforts, highly automated and detailed SDM policies may be most appropriate, but for smaller research activities, a more straightforward infrastructure that can evolve as the data mature may be the most beneficial ( Link et al., 2017 ).

In 2017, the U.S. Environmental Protection Agency (EPA) published the conceptual framework and demonstration of the Climate Resilience Screening Index (CRSI) ( Summers, J. K. et al., 2017 ; Summers, K. et al., 2017 ; Summers et al., 2018 ). EPA researchers were tasked with developing and demonstrating a composite index that could characterize the resilience of the U.S. in the context of potential natural hazard exposures—in a 12-month time frame and using existing resources. The CRSI framework is hierarchical ( Figure 1 ). The overall index is informed by five domain sub-indices that are described by twenty indicators, which in turn comprise 117 metrics. To be most useful, CRSI needed to be applicable to different geographical, population, and temporal scales using the same cultivated data set. A diverse ecosystem of secondary data representing 120 unique data values was collected for 3135 U.S. counties over the 2000–2015 time period to quantify the metrics.


Figure 1 . The CRSI conceptual framework ( Summers, K. et al., 2017 ). Lines extending left and right of the domain-labeled boxes depict a theoretical range of socio-economic and ecological recoverability factors that may influence the overall CRSI measure. Black arrows relate to indicators; colored, diamond-ended lines are assigned to domains highlighted by the same color.

The development of composite indices to describe complex ideas is not new. The Better Life Index (BLI) ( OECD, 2017 ), Environmental Performance Index (EPI) ( Wendling et al., 2018 ), Human Development Index (HDI) ( United Nations Development Programme, 2018 ), and Ocean Health Index (OHI) ( Halpern et al., 2012 ) are a few notable examples. A composite index is a communication tool that uses a collection of individual metrics or indicators to translate data into information that describes a multi-dimensional concept ( Nardo et al., 2005 ). A common trait shared across the example indices and CRSI is the use and synthesis of economic, social, and ecological secondary data. BLI, EPI, HDI, and OHI offer reference materials, tools, and data in readily accessible formats (i.e., websites and web services) to help others reproduce the featured indices. All four indicator research efforts are exemplary cases of transparent and reproducible research in the end-stage, or mature, phase of the full SDM cycle. The CRSI research, on the other hand, is still “young” in the data maturation continuum, and many of the SDM systems are still evolving. Project researchers rather than data professionals are responsible for planning and implementing SDM. Most CRSI team members lack practical SDM experience. The researchers are generally familiar with the premise of SDM but not the common vernacular or the specific considerations associated with secondary resources. As at many research institutions, SDM planning and open access research are not new subjects at the U.S. EPA, although the details vary widely from one research project to another.

The perceived apathy toward indicator research SDM and curation appears to be a recurring theme. Early stages in big data SDM in particular are prone to be hectic and disorganized since processes have yet to stabilize ( Crowston and Qin, 2011 ). What is lacking in the current SDM literature is a portrait of SDM life before all the data decisions have been made, while SDM processes are still in flux. This paper describes the CRSI SDM approach, which offers an inside look at SDM from the “small-science” perspective. Highlighted are key strategies that have proven helpful for managing the big data assets of CRSI and for addressing potential challenges that can impede successful research outcomes.

The CRSI SDM Concept

SDM in the CRSI effort is an inclusive process where all researchers are expected to participate in data collection, assessment, processing, and storage. The SDM infrastructure is adapted from past practices described in Hale et al. (2003), which emphasize a culture of “data sharing.” Additional cues from Zook et al. (2017) helped inform CRSI SDM requirements for capturing the copyright information ( Carroll, 2015 ), data provenance ( Carlson and Anderson, 2007 ), and data ethics ( Floridi and Taddeo, 2016 ; Vayena and Tasioulas, 2016 ) that are especially important to address when data are made publicly accessible. U.S. EPA SDM guidelines recommend that a suite of 10 topics be addressed for thorough data asset management planning ( Table 1 ). Since principal investigators lead and provide oversight in research projects, it seems natural that improving SDM outcomes begins with education and hands-on experience for researchers. The CRSI SDM is a relatively simple framework that embraces “better data management through partnerships” concepts ( Hale et al., 2003 ), adapted for a small, co-located team. At its core, the CRSI SDM environment is as much a training platform as it is an assemblage of data management practices. The objective of this “learn as you go” SDM ethos is to adequately execute research asset management while increasing the SDM knowledge and capabilities of the research personnel. Governance of data collection, processing, and curation is integrated into the science conversation, so the language of research curation becomes as natural to the researchers as the science. The SDM of the CRSI effort represents a collaborative process in which all researchers have ownership.


Table 1 . Elements addressed in the scientific data management (SDM) plan for the Climate Resilience Screening Index research.

Data Collection

Every member of the team participated in the literature, secondary data, and metadata collection. A literature review was conducted to describe the state of resilience indicator science, to provide the rationale for the development of the index, and to identify existing resilience indicator efforts that could inform the research. Publications related to any resilience indicator or index concepts, including hazard exposures, natural disasters, infrastructure, quality of life, and governance, were considered as potential sources of contextual data for CRSI. Based on the completed literature review, each researcher searched the internet for publicly available data to identify and collect candidate secondary data relevant for quantifying CRSI indicators. Supplementary information, such as licensing documents, disclaimers, data catalogs, and users' guides, was also collected along with the secondary data.

Data Acceptance

Data collection is, of course, at the core of indicator development. Exploring big data can result in many secondary data resources, some representing alternative choices for the same data. Procedural guidelines were developed to help minimize bias and improve selection relevancy during the literature and secondary data collection process. To the extent possible, these criteria served as the first-level evaluation for determining the potential suitability of secondary data for use in CRSI calculations. If a set of data appeared relevant but did not meet every criterion, then a team consensus informed the final determination on acceptability. The following ( Table 2 ) briefly describes each criterion.


Table 2 . Data acceptance criteria used to identify and select secondary data.
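In practice, this first-level screen behaves like a checklist applied to every candidate data set before it reaches the team for a consensus call. The short Python sketch below illustrates the idea only; the criterion names are hypothetical stand-ins, since the Table 2 criteria are not reproduced here.

# Illustrative first-level acceptance screen for candidate secondary data sets.
# The criterion names are hypothetical; a failed check triggers team review, not
# automatic rejection.
CANDIDATES = [
    {"name": "County unemployment rates",
     "authoritative_provider": True, "covers_2000_2015": True,
     "county_level_or_finer": True, "license_allows_reuse": True,
     "documentation_available": True},
    {"name": "Volunteer-collected storm reports",
     "authoritative_provider": False, "covers_2000_2015": False,
     "county_level_or_finer": True, "license_allows_reuse": True,
     "documentation_available": False},
]
CRITERIA = ["authoritative_provider", "covers_2000_2015", "county_level_or_finer",
            "license_allows_reuse", "documentation_available"]

for candidate in CANDIDATES:
    failed = [c for c in CRITERIA if not candidate[c]]
    if failed:
        print(f"REVIEW  {candidate['name']} (failed: {', '.join(failed)})")
    else:
        print(f"ACCEPT  {candidate['name']}")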

Assessing CRSI Data Quality and Suitability

There is a persistent assumption that data retrieved from a credible source are automatically suitable for a research effort ( Boyd and Crawford, 2012 ). Cai and Zhu (2015) provide thoughtful insight regarding the challenges of examining the quality and suitability of big data. While reviewing data for quality can be straightforward, the suitability of the data for the research is more subjective and requires a way to conceptualize the data in the context of intended use. Random subsets of data were manually reviewed for quality and errors, but a 100% assessment is nearly impossible with extensive sets of data. Descriptive statistics were most helpful for assessing the quality and suitability of the secondary data for CRSI. A full complement of summaries was generated for each component of the CRSI framework, including the metrics. Histograms and other visualizations assisted researchers with examining the data for anomalies and use-case weaknesses.
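As a rough illustration of this screening pass, the Python sketch below produces the kind of descriptive summaries and histograms described above. The file and column layout is assumed; the project's actual summaries were produced with a mix of tools (see the next section).

# Hedged sketch of the descriptive-statistics pass used to screen secondary data.
# "processed_metrics.csv" and its columns are hypothetical.
import pandas as pd
import matplotlib.pyplot as plt

metrics = pd.read_csv("processed_metrics.csv")

# Full complement of summaries (count, mean, std, quartiles) for every metric column.
print(metrics.describe())

# Histograms help reviewers spot anomalies such as impossible values or truncation.
metrics.hist(bins=30, figsize=(10, 8))
plt.tight_layout()
plt.savefig("metric_histograms.png")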

Tools for Literature and Data Acquisition/Processing

Publish or Perish software ( Harzing, 2007 ) was used to assist with identifying literature for review. Clearly defined keywords and phrases were used to search well-established literature repositories (e.g., Scopus, Web of Science, JSTOR). Responsibilities for conducting the literature review were distributed across the research team. Each publication was evaluated for relevance to the CRSI research. Electronic publication files were downloaded and maintained in a literature repository. Manual literature searches were conducted to help fill any literature gaps resulting from the software-driven prioritization.

For many, collected literature simply contributes to the reference list in publications. In SDM, however, the decision to include or exclude a published work from the research is itself data. To that end, researchers provided a summary associated with each review, using a template as an outline. The outline captured information that could be used to drive queries to produce literature-related statistics or reporting. Citations, along with review summaries, were eventually uploaded to a Microsoft (MS) Access (2016) database.

There is a movement that is rapidly spreading within the research community—the use of open-source tools for processing big data (e.g., R-Project, https://www.r-project.org/ ; Python, https://www.python.org/ ; Apache Spark, https://spark.apache.org/ ). Unfortunately, the skill sets available for processing CRSI data ranged from practically non-existent to programming in multiple languages. Each researcher used their tool of choice for processing data. While this decision lacked robust technical standardization, it offered a timely solution for completing data collection and processing by helping to distribute the data processing load. Allowing each researcher to work with the tool most familiar to them also helped reduce data processing errors. SAS, R-Project, SPSS, MS Excel, ESRI ArcGIS, and Python were the dominant software packages used for processing the data. A suite of secondary data was assigned to specific individuals based on their level of data handling experience. Each researcher was responsible for formatting, standardizing and harmonizing their selection of secondary data as well as documenting the processing methods.
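A typical formatting and standardizing step might resemble the Python sketch below. The file names, column names, and the min-max scaling choice are illustrative assumptions rather than the documented CRSI procedure; the point is only that each assigned data set was reshaped onto a common county key and a comparable scale before downstream use.

# Hedged sketch of standardizing one secondary data set onto a county FIPS key.
import pandas as pd

raw = pd.read_csv("provider_download.csv")          # raw file as delivered (hypothetical)

# Harmonize the join key: 5-digit county FIPS stored as zero-padded text.
raw["fips"] = raw["county_fips"].astype(str).str.zfill(5)

# Standardize the measure to a 0-1 range so metrics are comparable across sources.
value = raw["measure_value"].astype(float)
raw["measure_scaled"] = (value - value.min()) / (value.max() - value.min())

raw[["fips", "year", "measure_scaled"]].to_csv("provider_measure_standardized.csv", index=False)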

Organization and Storage of CRSI Data Resources

Research data and other materials were physically stored on a centralized network server housed within the U.S. EPA. Hierarchically-nested subdirectories or folders contained all information consisting of raw data, processed data, final research results, and supplementary information. The physical storage structures that comprised the framework mirrored the different components of the CRSI research. This arrangement offered a convenient way to compartmentalize the various stages of the research data assets. Additionally, associating file structure features with components of the research made it easier for researchers to locate specific pieces of information. Figure 2 shows the CRSI data storage layout.


Figure 2 . Illustration of the CRSI file structure layout. Each block represents a separate subdirectory or folder. All elements organized under the “Data” block form the primary data construct.

CRSI Data Construct

Central to the file storage structure was the CRSI data construct. The data construct is a remnant of past practices that has worked well across different research efforts. Data assets were partitioned relative to their processed status. The directory naming conventions were consistent with past and concurrent research activities, helping to maintain data organization consistency. The data construct also made it convenient to manage access permissions and enforce data policies, e.g., use constraints, sensitive data access, and original data preservation. Apart from raw geospatial data (Section Geospatial Data), the CRSI data construct was used for the handling of raw, processed, and production (research results) data. As depicted in Figure 2 , the D1 directory warehoused the raw secondary data in the form provided by the source along with pertinent documentation (e.g., metadata, data dictionaries, users' guides). Once all secondary data were collected and vetted, the original downloaded files were held sequestered while a copy operated as the functional data platform for the remaining phases of data processing. The D2 directory housed processed data (e.g., standardized) that were accessed repeatedly for CRSI data quality assessments and analyses. Data quality assessment results and software code files related to data processing or qualifying were maintained in the D2 directory as well. The D3 structure held the CRSI results in comma-delimited (*.csv) format. Files produced in software-specific form (e.g., *.sas7bdat, *.xlsx) were maintained as an additional layer of data recoverability. Information housed in the D3-level structure consisted of demonstration results, model inputs, and map products.
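A minimal scaffold of this construct can be generated with a few lines of Python. The D1/D2/D3 partitioning follows the description above and Figure 2; the subfolder names are assumptions added for illustration.

# Illustrative scaffold of the D1/D2/D3 data construct; subfolder names are assumed.
from pathlib import Path

ROOT = Path("CRSI/Data")
LAYOUT = {
    "D1_raw": ["downloads", "documentation"],            # sequestered originals plus metadata
    "D2_processed": ["standardized", "qa_results", "code"],
    "D3_results": ["csv", "software_specific"],          # *.csv plus *.sas7bdat / *.xlsx copies
}

for level, subfolders in LAYOUT.items():
    for sub in subfolders:
        path = ROOT / level / sub
        path.mkdir(parents=True, exist_ok=True)
        print("created", path)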

Geospatial Data

Geospatial processing was used to derive natural environment and natural hazard values based on the Multi-Resolution Land Characteristics (MRLC) Consortium's National Land Cover Data Set ( Homer et al., 2015 ), both with and without additional secondary data overlays. Secondary data collected for geoprocessing were archived in their original form. Base maps and data downloads were migrated to a file geodatabase construct for geospatial processing where secondary data were rendered as feature classes. A file-based geodatabase was used for managing and querying the collection of CRSI-related spatial data. A file geodatabase organizes data physically in a directory or folder structure rather than in a single personal database file such as those used with MS Access. Individual data files are accessed directly using geospatial software such as ESRI ArcGIS (Version 10.5), the application used for CRSI. For this effort, the use of a file geodatabase served multiple purposes:

• Eliminated the constraints of individual file sizes that are associated with other GIS conventions (shapefiles).

• Allowed for the use of a standardized coordinate system to ensure all imported data would be uniformly projected, without further intervention.

• Kept related data together and organized during processing.

Values generated from geospatial processing were treated as “found data” and folded into the D1 portion of the data construct. Any further standardization or normalization treatment of these data followed the same protocols as all other sources of secondary data.
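The geospatial work itself was performed in ESRI ArcGIS (Version 10.5) against the file geodatabase. Purely as an open-source analogue of the same idea, the sketch below uses geopandas and rasterstats to tabulate NLCD land-cover pixels per county and derive a simple county-level value; the paths, the "NAME" column, and the developed-land fraction are hypothetical.

# Open-source analogue (not the project's ArcGIS workflow) of deriving a
# county-level land-cover value from the NLCD raster.
import geopandas as gpd
from rasterstats import zonal_stats

counties = gpd.read_file("county_boundaries.shp")

# Count land-cover pixels per county; each dict maps NLCD class code -> pixel count.
cover_counts = zonal_stats("county_boundaries.shp", "nlcd_land_cover.tif", categorical=True)

# Example "found data" value: fraction of developed land (NLCD classes 21-24).
developed_classes = {21, 22, 23, 24}
for name, counts in zip(counties["NAME"], cover_counts):
    total = sum(counts.values())
    developed = sum(v for k, v in counts.items() if k in developed_classes)
    print(name, round(developed / total, 3) if total else None)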

CRSI Data Security and Data Operations Continuity

Existing enterprise-wide information security protocols served as the primary access and data security defense for CRSI. However, these measures could not safeguard data from inadvertent deletions, modifications, or misplacements caused by well-intentioned “insiders” (team researchers)—particularly in the early stages of the research when processes are chaotic, and data are most vulnerable. More specific data security steps were taken to safeguard the CRSI research assets internally. A menu-driven access portal was developed in the MS Access database to serve as a conduit between the research team and CRSI data. Querying capability that mapped demonstration results (D3-level data) to relevant D2 and D1 data and supplementary information was developed. A series of reference tables linked data records stored in the database to data resources only available outside the database (e.g., raw secondary data), including information about data origin and evolution (data provenance). Pre-defined queries driven by interactive menus maintained within the database provided a way for the research team to navigate CRSI research assets while minimizing potential data mishaps. In addition, a bibliographic index of literature was created to act as an electronic card catalog for the literature repository. Indexed references for both accepted and rejected publications could be queried to return the summary information created during the literature review. Additionally, secondary-data sources were linked to relevant publications so researchers could cross-reference materials from either a data point or an article.

The inclusive SDM environment inherently served as a continuity-of-operations mechanism. Other practices fostered knowledge exchange, including SDM discussions during team briefings and planning sessions as well as SDM-specific peer-to-peer training. The SDM plan, its implementation, routine research communication, and team interactions collectively created a sustainable knowledge management paradigm.

Example Outcomes from Highlighted SDM Processes

This section offers some “results” associated with the CRSI data environment. Example CRSI data characteristics and quality assessments are presented. Additionally, the database design is briefly described.

Characteristics of the Reviewed Literature and Secondary-Data

Literature summaries showed that 369 publications met at least one keyword or key phrase criterion. Approximately 20% of the literature reviewed had a direct bearing on the development of the CRSI framework. Another 4% of CRSI references indirectly informed the conceptualization of CRSI while 76% lacked vital factors of interest or were duplicative.

Over 1.3 million secondary data values retrieved from thirty-seven unique data providers ( Table 3 ) served as the basis for constructing CRSI. These data comprised annual collections of available information from 2000 to 2015 for 3135 U.S. counties. A complement of 383,713 averaged secondary-data measures, derived by averaging the values for each data set across all available years, supported the final CRSI calculations. These data represented a range of science disciplines (e.g., meteorology, geology, economics, geography, social science, ecology). Information documenting the intent, scope, quality, and refresh frequency was captured for each secondary data set, as well as attribution and copyright requirements.


Table 3 . List of secondary data sources used in the CRSI indicator development research.
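The reduction from annual values to one averaged measure per county and data set can be pictured with the short pandas sketch below; the long-format input layout (county FIPS, metric name, year, value) is an assumption, not the project's actual file format.

# Hedged sketch of collapsing annual (2000-2015) values to per-county averages.
import pandas as pd

annual = pd.read_csv("annual_measures.csv")            # columns: fips, metric, year, value

averaged = (
    annual[annual["year"].between(2000, 2015)]
    .groupby(["fips", "metric"], as_index=False)["value"]
    .mean()
)
averaged.to_csv("averaged_measures.csv", index=False)
print(len(averaged), "averaged county-by-metric measures")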

Geospatially-derived secondary data were not available for eight boroughs in Alaska, nor could these data be imputed with any reasonable level of confidence. Natural environment metrics (e.g., land types, soil productivity, coastal condition, natural hazards) were translated from ecologically relevant spatial scales (e.g., 12-digit hydrologic unit codes, ecoregions) to county-level boundaries. Metrics associated with natural hazard and toxic exposures were population-normalized and then modeled for the pertinent value if needed. Nearly one hundred percent (99.7%) of counties were represented in the CRSI metric inventory.

CRSI Results: Index, Domains, Indicators, and Metrics

The CRSI demonstration results were produced at four hierarchically-related aggregation levels ( Figures 3A,B )—metrics, indicators, domains, and indices—which collectively represent 448,305 individual results ( Figure 4 ). Metrics were derived directly from processed secondary data and were the most abundant. The summary of county-level metrics quantified indicators, indicators were summarized to domains, and domains informed the equation for the final CRSI values.


Figure 3. (A) The CRSI-domain-indicator data tree that served to inform the organization of different aggregates of indicator values. (B) A continuation of the CRSI data tree depicting the relationship between one indicator and associated metrics (processed data).


Figure 4 . A diagram illustrating the relationship of CRSI components and their contribution to the overall quantity of demonstration results.
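The roll-up from metrics to the final index can be illustrated with the sketch below, which simply averages at each level of the hierarchy. The actual CRSI aggregation equations (weights and transformations) are documented in Summers, K. et al. (2017); the column layout here is assumed.

# Illustrative metric -> indicator -> domain -> index roll-up using plain means.
import pandas as pd

# Hypothetical long-format table: fips, domain, indicator, metric, value.
metrics = pd.read_csv("averaged_metrics_labeled.csv")

indicators = metrics.groupby(["fips", "domain", "indicator"], as_index=False)["value"].mean()
domains = indicators.groupby(["fips", "domain"], as_index=False)["value"].mean()
crsi = domains.groupby("fips", as_index=False)["value"].mean().rename(columns={"value": "crsi"})

crsi.to_csv("crsi_by_county.csv", index=False)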

Data Quality Assessments

Statistical summaries, cumulative distribution functions (CDFs), and histograms were created for the final CRSI values and each metric, indicator, and domain component to aid in the data quality assessments. If the descriptive statistics or data visualizations presented an unexpected value or data pattern, each step of the data handling process was reviewed to determine whether a data processing error had occurred. Corrective actions were taken on detected errors; if no error was detected, the value remained. A series of CDFs is offered to demonstrate the value of this data quality assessment exercise. Figure 5A shows the distribution pattern for one set of metric-level data found with a “suspected” error and the distribution of these same metrics after the error was corrected. Figures 5B–D show the relative influence of this single metric across the full spectrum of derived CRSI components, both before and after error correction.


Figure 5 . Cumulative distribution function (CDF) analyses were performed for each suite of metrics, indicators, domains and CRSI values. Graphs were used to identify possible processing errors and to understand how errors influence the different aggregates of results: (A) the stair-step pattern of the “Before Correction” CDF suggests that a problem existed in the suite of Community Rating System metrics while the “After Correction” CDF shows the more expected distribution pattern; (B) demonstrates the level of influence a single metric can exert on an indicator; (C) illustrates the difficulty in identifying the metric error at the domain-level of CRSI calculations; and (D) shows that the metric-level error is virtually undetectable in the final index (CRSI) values.
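An empirical CDF comparison of a metric before and after an error correction, similar in spirit to Figure 5A, can be produced with a few lines of Python; the two input files and the metric are hypothetical.

# Sketch of the before/after empirical CDF comparison used in the quality checks.
import numpy as np
import matplotlib.pyplot as plt

def ecdf(values):
    """Return sorted values and their cumulative proportions."""
    x = np.sort(np.asarray(values, dtype=float))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

before = np.loadtxt("metric_before_correction.txt")   # hypothetical input files
after = np.loadtxt("metric_after_correction.txt")

for label, data in (("Before Correction", before), ("After Correction", after)):
    x, y = ecdf(data)
    plt.step(x, y, where="post", label=label)          # stair-step patterns flag problems

plt.xlabel("Metric value")
plt.ylabel("Cumulative proportion")
plt.legend()
plt.savefig("metric_cdf_comparison.png")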

Histograms of CRSI values initially presented a right-skewed distribution pattern ( Figure 6A ). Several boroughs in the state of Alaska were the primary driver. After results and processing steps were verified, each record was qualified in the D3-level CRSI data set. When extreme outliers were removed, CRSI results appeared better distributed, aligning more with expectations ( Figure 6B ). Qualified results were kept in the final set of CRSI results.


Figure 6 . (A) The histogram shows a severely right-skewed distribution of calculated CRSI scores. A review of the results found that the pattern was due to CRSI values for 12 of 22 boroughs in Alaska falling far outside the 3rd quartile range, rather than to any specific data processing error. (B) After the 12 CRSI results were qualified as outliers and removed, the histogram reflects the expected distribution pattern. Publications using the final CRSI measures report results both with and without these qualified outliers.
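One way to qualify such extreme values programmatically is an upper fence based on the interquartile range, as sketched below. The 1.5 * IQR fence and file names are assumptions; the paper only states that the Alaska values fell far outside the 3rd quartile range, and qualified records were flagged rather than deleted.

# Hedged sketch of flagging (not deleting) extreme CRSI values as qualified outliers.
import pandas as pd

crsi = pd.read_csv("crsi_by_county.csv")               # hypothetical D3-level results file

q1, q3 = crsi["crsi"].quantile([0.25, 0.75])
upper_fence = q3 + 1.5 * (q3 - q1)                     # assumed outlier rule, not the documented one

crsi["qualified_outlier"] = crsi["crsi"] > upper_fence
crsi.to_csv("crsi_by_county_qualified.csv", index=False)
print(int(crsi["qualified_outlier"].sum()), "records qualified as outliers")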

CRSI Data Warehouse

The CRSI database was constructed in MS Access (2016) and designed to serve as a data warehouse. Leveraging features and functions available in MS Access, menus, forms, and reports were created to assist researchers in navigating the CRSI data warehouse. A switchboard (i.e., a menu system) operated as the primary user interface. Forms provided interactive filtering capabilities to customize the information displayed from the various data tables held within the warehouse. Pre-defined queries joined relevant information from across the CRSI data management framework, and pre-defined report formats presented the query results. Filtering functions were also available in reports to refine the information selected for print. The general flow of data and information to and from the CRSI data warehouse is presented in Figure 7 .


Figure 7 . A visual representation depicting the general flow of CRSI data using the CRSI data warehouse as the avenue for the CRSI research team to access results, literature, and secondary data information.

The size limitation associated with MS Access databases (2 GB; Microsoft support https://support.office.com ) proved problematic for housing secondary data but accommodated all of the results (D3). A set of relational tables was created to link CRSI metrics with the original data download files, relevant literature, and supplementary material. Results could be displayed graphically and downloaded so team members could reuse the data without compromising the resources that support the research. Figure 8 provides a detailed illustration of the CRSI data warehouse framework.


Figure 8 . The CRSI data warehouse framework depicting the flow of data and information; access controls; outputs generated; and research asset monitoring and management loop.
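The warehouse itself was implemented in MS Access. The SQLite sketch below (one of the database engines mentioned later as a candidate for evolving the CRSI SDM) only illustrates how relational tables and a pre-defined query can trace a result back to its source data and supporting literature; table and column names are assumptions.

# Illustrative relational linking of results to sources and literature (not the
# actual MS Access schema).
import sqlite3

con = sqlite3.connect("crsi_warehouse_demo.sqlite")
con.executescript("""
CREATE TABLE IF NOT EXISTS source (
    source_id INTEGER PRIMARY KEY, provider TEXT, download_path TEXT, license_notes TEXT
);
CREATE TABLE IF NOT EXISTS literature (
    lit_id INTEGER PRIMARY KEY, citation TEXT, review_summary TEXT, accepted INTEGER
);
CREATE TABLE IF NOT EXISTS metric_result (
    fips TEXT, metric TEXT, value REAL,
    source_id INTEGER REFERENCES source(source_id),
    lit_id INTEGER REFERENCES literature(lit_id)
);
""")

# A "pre-defined query": trace a result back to its data provider and supporting paper.
rows = con.execute("""
    SELECT m.fips, m.metric, m.value, s.provider, l.citation
    FROM metric_result m
    LEFT JOIN source s ON s.source_id = m.source_id
    LEFT JOIN literature l ON l.lit_id = m.lit_id
    WHERE m.metric = ?
""", ("community_rating_system",)).fetchall()
print(rows)
con.close()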

Big data have ushered in the promise of new research possibilities. In indicator research and development, big data has most assuredly found a home. This wealth of publicly accessible information has helped advance indicator research. Big data helps small research efforts like CRSI flourish and prove relevant on the global stage. However, broader discussions regarding best research data management and sharing practices are needed ( Borgman, 2012 ). The apparent lack of consistent SDM standards and the impact this has on research reproducibility is driving the development of new technologies for managing enterprise-wide research assets. Methods and technology continue to evolve potentially offering more scalable data management solutions for research efforts of all sizes ( Davidson et al., 2014 ; Zook et al., 2017 ; Peng et al., 2018 ). Given the SDM inequities between “big science” and “small-science,” even these newer approaches may remain beyond the grasp of small-scale research ( Borycz and Carroll, 2018 ).

The SDM strategies described in this paper may be self-evident, but an abundance of literature seems to suggest that Hale's (1999 , 2000 ) observations regarding the poor state of SDM persist even after two decades of data technology and knowledge advancements. The scientific community runs the risk of losing access to valuable research assets over time if SDM continues to lag in smaller-scale research ( Crowston and Qin, 2011 ). The CRSI SDM illustration suggests that “small-science” does not necessarily equate to “small data.” On the contrary, big data assures us that vast amounts of data are available with just a mouse-click, even if the SDM infrastructure to manage them does not exist.

The CRSI SDM approach demonstrates one potential model for managing big data needs in a small-scale research setting. The CRSI SDM framework is easy to understand and offers ample opportunity to increase a research team's SDM capacity when data expertise is limited or unavailable. Big data management can be messy. Lowndes et al. (2017) describe how the OHI data processing methods for calculating that index were transitioned from a plodding, inefficient process to a cost-effective and highly functional workflow that better supported research reproducibility and accessibility. Open-source tools such as freely available software packages (e.g., R-Project, Python), collaboration and workflow platforms (e.g., GitHub, Project Jupyter), and database engines (e.g., SQLite, MongoDB) are a few tool-kits that may be considered for evolving the CRSI SDM. Each enhancement would represent progress in the SDM life-cycle and a step toward best SDM practices. The CRSI SDM approach could serve as a starting point for small-scale indicator research projects to successfully leverage big data resources.

The current release of CRSI and domain sub-index measures is available for 3135 counties in Portable Document Format (PDF) as Appendix B in Summers, K. et al. (2017) . An updated suite of CRSI results is currently being reviewed. The next release of CRSI data will be made available as a downloadable file through the Data.gov portal ( https://www.data.gov/ ) when the review is complete.

Author Contributions

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

The views expressed in this manuscript are those of the authors and do not necessarily represent the views or policies of the U.S. Environmental Protection Agency. Any mention of trade names, products, or services does not imply an endorsement by the U.S. Government or the U.S. Environmental Protection Agency. The EPA does not endorse any commercial products, services, or enterprises.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Berman, F., and Cerf, V. (2013). Who will pay for public access to research data? Science 341, 616–617. doi: 10.1126/science.1241625


Borgman, C. L. (2012). The conundrum of sharing research data. J. Am. Soc. Inform. Sci. Technol. 63, 1059–1078. doi: 10.1002/asi.22634


Borycz, J., and Carroll, B. (2018). Managing digital research objects in an expanding science ecosystem: 2017 conference summary. Data Sci. J. 17:16. doi: 10.5334/dsj-2018-016

Boyd, D., and Crawford, K. (2012). Critical questions for big data: provocations for a cultural, technological, and scholarly phenomenon. Inform. Commun. Soc . 15, 662–679. doi: 10.1080/1369118X.2012.678878

Buck, K. D., Summers, J. K., Smith, L. M., and Harwell, L. C. (2018). Application of the human well-being index to sensitive population divisions: a children's well-being index development. Child Indicators Res. 11, 1249–1280. doi: 10.1007/s12187-017-9469-4

Cai, L., and Zhu, Y. (2015). The challenges of data quality and data quality assessment in the big data era. Data Sci. J. 14:2. doi: 10.5334/dsj-2015-002

Carlson, S., and Anderson, B. (2007). What are data? The many kinds of data and their implications for data re-use. J. Comp. Mediated Commun. 12, 635–651. doi: 10.1111/j.1083-6101.2007.00342.x

Carroll, M. W. (2015). Sharing research data and intellectual property law: a primer. PLoS Biol. 13:e1002235. doi: 10.1371/journal.pbio.1002235

Cox, A. M., and Pinfield, S. (2014). Research data management and libraries: current activities and future priorities. J. Librarianship Inform. Sci. 46, 299–316. doi: 10.1177/0961000613492542

Crowston, K., and Qin, J. (2011). A capability maturity model for scientific data management: evidence from the literature. Proc. Am. Soc. Inform. Sci. Technol. 48, 1–9. doi: 10.1002/meet.2011.14504801036

Cutter, S. L., Ash, K. D., and Emrich, C. T. (2014). The geographies of community disaster resilience. Global Environ. Change 29, 65–77. doi: 10.1016/j.gloenvcha.2014.08.005

Davidson, J., Jones, S., Molloy, L., and Kejser, U. B. (2014). Emerging good practice in managing research data and research information within UK Universities. Proc. Comp. Sci. 33, 215–222. doi: 10.1016/j.procs.2014.06.035

Davis-Kean, P. E., Jager, J., and Maslowsky, J. (2015). Answering developmental questions using secondary data. Child Dev. Perspect. 9, 256–261. doi: 10.1111/cdep.12151

De Mauro, A., Greco, M., and Grimaldi, M. (2015). “What is big data? A consensual definition and a review of key research topics,” in AIP Conference Proceedings Vol. 1644 (Madrid), 97–104.


Demchenko, Y., Grosso, P., De Laat, C., and Membrey, P. (2013). “Addressing big data issues in scientific data infrastructure,” in Collaboration Technologies and Systems (CTS), 2013 International Conference on . IEEE, 48–55.

Dietrich, D., Adamus, T., Miner, A., and Steinhart, G. (2012). De-mystifying the data management requirements of research funders. Issues Sci. Technol. Librarianship 70. doi: 10.5062/F44M92G2

Everyone Needs a Data-Management Plan (2018). Nature 555:286. [Editorial]. Available online at: https://www.nature.com/articles/d41586-018-03065-z (accessed July 10, 2018).

Floridi, L., and Taddeo, M. (2016). What is data ethics? Phil. Trans. R. Soc. A 374:20160360. doi: 10.1098/rsta.2016.0360

Gray, J., Liu, D. T., Nieto-Santisteban, M., Szalay, A., DeWitt, D. J., and Heber, G. (2005). Scientific data management in the coming decade. Acm Sigmod Record 34, 34–41. doi: 10.1145/1107499.1107503

Hale, S. S. (1999). How to manage data badly (part 1). Bull. Ecol. Soc. Am. 80, 265–268.

Hale, S. S. (2000). How to manage data badly (part 2). Bull. Ecol. Soc. Am. 81, 101–103. doi: 10.1890/0012-9623(2000)086[0101:C]2.0.CO;2


Hale, S. S., Miglarese, A. H., Bradley, M. P., Belton, T. J., Cooper, L. D., Frame, M. T., et al. (2003). “Managing troubled data: coastal data partnerships smooth data integration,” in Coastal Monitoring through Partnerships (Dordrecht: Springer), 133–148.

Halpern, B. S., Longo, C., Hardy, D., McLeod, K. L., Samhouri, J. F., Katona, S. K., et al. (2012). An index to assess the health and benefits of the global ocean. Nature 488:615. doi: 10.1038/nature11397

Hampton, S. E., Strasser, C. A., Tewksbury, J. J., Gram, W. K., Budden, A. E., Batcheller, A. L., et al. (2013). Big data and the future of ecology. Front. Ecol. Environ. 11, 156–162. doi: 10.1890/120103

Harzing, A. W. (2007). Publish or Perish . Available online at: http://www.harzing.com/pop.htm (accessed August 22, 2018)

Helliwell, J., Layard, R., and Sachs, J. (2019). World Happiness Report 2019. New York, NY: Sustainable Development Solutions Network. Available online at: http://worldhappiness.report/ed/2019/

Holdren, J. P. (2013, February 22). Increasing Access to the Results of Federally Funded Scientific Research . Washington, DC: Executive Office of the President, Office of Science and Technology Policy. Available online at: https://obamawhitehouse.archives.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf .

Homer, C. G., Dewitz, J. A., Yang, L., Jin, S., Danielson, P., Xian, G., et al. (2015). Completion of the 2011 National Land Cover Database for the conterminous United States-Representing a decade of land cover change information. Photogr. Eng. Remote Sensing 81, 345–354.

Jenkins, C. N., Van Houtan, K. S., Pimm, S. L., and Sexton, J. O. (2015). U.S. protected lands mismatch biodiversity priorities. Proc Natl Acad Sci USA . 112, 5081–5086. doi: 10.1073/pnas.1418034112

Jha, M., Jha, S., and O'Brien, L. (2015). “Integrating big data solutions into enterprize architecture: constructing the entire information landscape,” in The International Conference on Big Data, Internet of Things, and Zero-Size Intelligence BIZ2015 (Kuala Lumpur), 8–10.

Link, G. J., Lumbard, K., Conboy, K., Feldman, M., Feller, J., George, J., et al. (2017). Contemporary issues of open data in information systems research: considerations and recommendations. Commun. Assoc. Inform. Syst. 41:25. doi: 10.17705/1CAIS.04125

Lowndes, J. S. S., Best, B. D., Scarborough, C., Afflerbach, J. C., Frazier, M. R., O'Hara, C. C., et al. (2017). Our path to better science in less time using open data science tools. Nat Ecol Evol. 1:0160. doi: 10.1038/s41559-017-0160

Lynch, C. (2008). Big data: how do your data grow? Nature 455:28. doi: 10.1038/455028a

Madin, J., Bowers, S., Schildhauer, M., Krivov, S., Pennington, D., and Villa, F. (2007). An ontology for describing and synthesizing ecological observation data. Ecol. Inform. 2, 279–296. doi: 10.1016/j.ecoinf.2007.05.004

Mooney, P., and Winstanley, A. C. (2007). “Improving environmental research data management,” in EnviroInfo 2007. Paper presented at the 21st International Conference for Environmental Protection Part 1, Warsaw, Poland, 12-14 September , eds O. Hryniewicz, J. Studzinski, and M. Romaniuk (Aachen: Shaker Verlag), 473–477.

Nardo, M., Saisana, M., Saltelli, A., Tarantola, S., Hoffman, A., and Giovannini, E. (2005). Handbook on Constructing Composite Indicators: Methodology and User Guide , OECD Statistics Working Papers, OECD Publishing, Paris.

Niemeijer, D., and de Groot, R. S. (2008). A conceptual framework for selecting environmental indicator sets. Ecol. Indicators 8, 14–25. doi: 10.1016/j.ecolind.2006.11.012

OECD (2017). How's Life? 2017: Measuring Well-being. Paris: OECD Publishing.

Peng, G., Privette, J. L., Tilmes, C., Bristol, S., Maycock, T., Bates, J. J., et al. (2018). A conceptual enterprise framework for managing scientific data stewardship. Data Sci. J. 17:15. doi: 10.5334/dsj-2018-015

Pilat, D., and Fukasaku, Y. (2007). OECD principles and guidelines for access to research data from public funding. Data Sci. J. 6, OD4–OD11. doi: 10.2481/dsj.6.OD4

Piwowar, H. A., Day, R. S., and Fridsma, D. B. (2007). Sharing detailed research data is associated with increased citation rate. PLoS ONE 2:e308. doi: 10.1371/journal.pone.0000308

Sansone, S.-A., Cruse, P., and Thorley, M. (2018). High-quality science requires high-quality open data infrastructure. Sci. Data 5:180027. doi: 10.1038/sdata.2017.27

Simms, S., Strong, M., Jones, S., and Ribeiro, M. (2016). The future of data management planning: tools, policies, and players. Int. J. Digital Curation 11, 208–217. doi: 10.2218/ijdc.v11i1.413

Smith, L. M., Case, J. L., Smith, H. M., Harwell, L. C., and Summers, J. K. (2013). Relating ecosystem services to domains of human well-being: foundation for a US index. Ecol. Indicators 28, 79–90. doi: 10.1016/j.ecolind.2012.02.032

Submarine Telecoms (2017). Industry Report, 6th Edition. Issuu. Available online at: https://issuu.com/subtelforum/docs/stfindustryreportissue6final (accessed October 15, 2017).

Summers, J. K., Harwell, L. C., Smith, L. M., and Buck, K. D. (2018). Measuring community resilience to natural hazards: the natural hazard resilience screening index (NaHRSI)—development and application to the United States. GeoHealth 2, 372–394. doi: 10.1029/2018GH000160

Summers, J. K., Smith, L. M., Harwell, L. C., and Buck, K. D. (2017). Conceptualizing holistic community resilience to climate events: foundation for a climate resilience screening index. GeoHealth , 1, 151–164. doi: 10.1002/2016GH000047

Summers, K., Harwell, L., Buck, K., Smith, L., Vivian, D., Bousquin, J., et al. (2017). Development of a Climate Resilience Screening Index (CRSI): An Assessment of Resilience to Acute Meteorological Events and Selected Natural Hazards. Washington, DC: U.S. Environmental Protection Agency.

United Nations Development Programme (2018). Human development indices and indicators: 2018 Statistical update . Available online at: http://hdr.undp.org/en/content/human-development-indices-indicators-2018-statistical-update

Vayena, E., and Tasioulas, J. (2016). The dynamics of big data and human rights: the case of scientific research. Phil. Trans. R. Soc. A , 374:20160129. doi: 10.1098/rsta.2016.0129

Wendling, Z. A., Emerson, J. W., Esty, D. C., Levy, M. A., de Sherbinin, A., et al. (2018). 2018 Environmental Performance Index. New Haven, CT: Yale Center for Environmental Law & Policy. Available online at: https://epi.yale.edu/

Zook, M., Barocas, S., Crawford, K., Keller, E., Gangadharan, S. P., Goodman, A., et al. (2017). Ten simple rules for responsible big data research. PLoS Comput. Biol. 13:e1005399. doi: 10.1371/journal.pcbi.1005399

Keywords: resilience, indicators, data management, framework, curation

Citation: Harwell LC, Vivian DN, McLaughlin MD and Hafner SF (2019) Scientific Data Management in the Age of Big Data: An Approach Supporting a Resilience Index Development Effort. Front. Environ. Sci. 7:72. doi: 10.3389/fenvs.2019.00072

Received: 08 November 2018; Accepted: 14 May 2019; Published: 04 June 2019.


Copyright © 2019 Harwell, Vivian, McLaughlin and Hafner. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Linda C. Harwell, harwell.linda@epa.gov

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.


Notes to Scientific Research and Big Data

1. When a data collection can or should be regarded as “big data”, and the significance of this particular label for research, is discussed at length in Leonelli (2016), Kitchin and McArdle (2016) and Aronova, van Oertzen, and Sepkoski (2017).

2. This understanding of scientific knowledge is also embedded within publishing practices. As exemplified by the use of impact factors, scientific excellence is evaluated on the strength of authorship of articles, thus placing the production of scientific claims at the pinnacle of knowledge creation. Researchers whose activities focus away from writing theoretical statements—such as data curators or software developers—are often viewed as technicians with a lower status. The emergence of big data is challenging these habits and perceptions, for instance through the rise of Open Science practices, but it is no wonder that within this landscape, philosophers have focused their attention on models and theories as central outputs of research, leaving data behind.

Copyright © 2020 by Sabina Leonelli <s.leonelli@exeter.ac.uk>



The dynamics of big data and human rights: the case of scientific research

Affiliations.

  • 1 Health Ethics and Policy Lab, Epidemiology, Biostatistics and Prevention Institute, University of Zurich, 8001 Zurich, Switzerland [email protected].
  • 2 Yeoh Tiong Lay Centre for Politics, Philosophy, and Law, The Dickson Poon School of Law, King's College London, London WC2R 2LS, UK.
  • PMID: 28336802
  • PMCID: PMC5124070
  • DOI: 10.1098/rsta.2016.0129

In this paper, we address the complex relationship between big data and human rights. Because this is a vast terrain, we restrict our focus in two main ways. First, we concentrate on big data applications in scientific research, mostly health-related research. And, second, we concentrate on two human rights: the familiar right to privacy and the less well-known right to science. Our contention is that human rights interact in potentially complex ways with big data, not only constraining it, but also enabling it in various ways; and that such rights are dynamic in character, rather than fixed once and for all, changing in their implications over time in line with changes in the context we inhabit, and also as they interact among themselves in jointly responding to the opportunities and risks thrown up by a changing world. Understanding this dynamic interaction of human rights is crucial for formulating an ethic tailored to the realities (the new capabilities and risks) of the rapidly evolving digital environment. This article is part of the themed issue 'The ethical impact of data science'.

Keywords: big data; data ethics; human right to privacy; human right to science.

© 2016 The Author(s).



Mines Researchers Receive NSF Funding to Harness Big Data of Geologic Processes

Everyone knows that lava is made of melted rock. It makes sense that different types of rocks are produced from different types of lava; for example, the lava in Hawaii is different from the lava common in a Cascade volcano like Mount St. Helens. The rocks and minerals that lava, or magma, becomes also depend on the temperature and pressure conditions during cooling.

Geochemists are interested in understanding how minerals are formed as molten rock cools. This knowledge helps them better understand Earth's geologic processes, from the creation of critical minerals and elements like lithium to the way plate tectonics can build mountains and cause earthquakes.

Experimental geochemists around the world melt rocks and minerals inside special laboratory furnaces that recreate the environment deep inside the earth. They then look at the minerals that form when various mixtures of material are cooled at different temperatures and pressures.

The large amount of data from such experiments is a challenge to compile for analysis. A team led by Gokce K. Ustunisik, Ph.D., associate professor of geology and geological engineering at South Dakota Mines, helped build a system to compare results from thousands of experiments conducted around the world. The new study, supported by a five-year National Science Foundation (NSF) grant totaling nearly $470,000, will help align the data from various sources to build a big-picture understanding.

This award is one of four of Ustunisik's NSF-funded research projects, which together total nearly $750,000. The work also ties into the university's new Ph.D. program in data science and engineering.

A challenge when compiling data from various experiments is that data produced by different labs is not always collected and presented in the professional literature in the same way. 

“The way data was collected in multiple experiments can differ greatly. It's possible to develop bias in predictive models if you don't consider the boundary conditions of experimental data,” says Ustunisik, the principal investigator on this research.

Roger Nielsen, Ph.D., a co-principal investigator on this research, a research scientist at Mines and an emeritus professor at the College of Earth, Ocean, and Atmospheric Sciences at Oregon State University, uses this analogy to describe the work.

“If your experiment is driving on Interstate-90 east across South Dakota, the data you collect along the first part of the journey tells you you're going straight and flat and there are no big corners. If you do a second experiment on I-94 across North Dakota, the data shows you the same thing, straight, flat, no corners,” says Nielsen.  “A model you might produce based on this data from these two experiments would predict a straight and flat road, and this model would work great, until you hit the Missouri River, and you end up over a cliff in the water. In the two different experiments, you'd go into the water in two different places. The two models going the same direction in two different places would line up for a time but then have different results,” says Nielsen.

Nielsen says the work needed on these experimental geologic datasets includes improved understanding of limitations, gaps, and anomalies such as the Missouri River in the analogy above. With this new understanding, researchers can then examine when different experiments have data that correlates and diverges. They can use this improved understanding to build better models.

“Whatever we do with experiments, our goal is always to simulate what was observed in nature, what we are doing with this is to try and understand the boundary conditions in different experiments so we can warn the modelers about this bias,” Ustunisik adds.

With this new broader understanding, geologists hope to build a more unifying theory or model for the inner workings of the entire Earth. Nielsen says another analogy is to consider the system that makes up the Earth like the system that makes an automobile.

“Twenty-five years ago, when I began this work, we were trying to determine what all the parts in the car were, the tires, the fenders, the nuts and bolts and bearings that hold the pistons inside the motor. Today we are trying to better understand how these parts fit together and make the car run.”

A model of the entire Earth's system would be valuable in helping geoscientists predict natural disasters, like earthquakes and volcanic eruptions. This new data analysis research at Mines brings geologists one step closer to this goal.


Reproducibility and Scientific Integrity of Big Data Research in Urban Public Health and Digital Epidemiology: A Call to Action

Ana Cecilia Quiroga Gutierrez

1 Department of Health Sciences and Medicine, University of Lucerne, 6002 Luzern, Switzerland

Daniel J. Lindegger

2 Institute of Global Health, University of Geneva, 1211 Geneva, Switzerland

Ala Taji Heravi

3 CLEAR Methods Center, Department of Clinical Research, Division of Clinical Epidemiology, University Hospital Basel and University of Basel, 4031 Basel, Switzerland

Thomas Stojanov

4 Department of Orthopaedic Surgery and Traumatology, University Hospital of Basel, 4031 Basel, Switzerland

Martin Sykora

5 School of Business and Economics, Centre for Information Management, Loughborough University, Loughborough LE11 3TU, UK

Suzanne Elayan

Stephen J. Mooney

6 Department of Epidemiology, University of Washington, Seattle, WA 98195, USA

John A. Naslund

7 Department of Global Health and Social Medicine, Harvard Medical School, Boston, MA 02115, USA

Marta Fadda

8 Institute of Public Health, Università Della Svizzera Italiana, 6900 Lugano, Switzerland

Oliver Gruebner

9 Epidemiology, Biostatistics and Prevention Institute, University of Zurich, 8001 Zurich, Switzerland

10 Department of Geography, University of Zurich, 8057 Zurich, Switzerland

Associated Data

Not applicable.

The emergence of big data science presents a unique opportunity to improve public-health research practices. Because working with big data is inherently complex, big data research must be clear and transparent to avoid reproducibility issues and positively impact population health. Timely implementation of solution-focused approaches is critical as new data sources and methods take root in public-health research, including urban public health and digital epidemiology. This commentary highlights methodological and analytic approaches that can reduce research waste and improve the reproducibility and replicability of big data research in public health. The recommendations described in this commentary, including a focus on practices, publication norms, and education, are neither exhaustive nor unique to big data, but, nonetheless, implementing them can broadly improve public-health research. Clearly defined and openly shared guidelines will not only improve the quality of current research practices but also initiate change at multiple levels: the individual level, the institutional level, and the international level.

1. Introduction

Research comprises “creative and systematic work undertaken in order to increase the stock of knowledge” [ 1 , 2 ]. Research waste, or research whose results offer no social benefit [ 3 ], was characterized in a landmark series of papers in the Lancet in 2014 [ 4 , 5 ]. The underlying drivers of research waste range from methodological weaknesses in specific studies to systemic shortcomings within the broader research ecosystem, notably including a reward system that incentivises quantity over quality and the exploration of new hypotheses over the confirmation of old ones [ 4 , 5 , 6 , 7 , 8 ].

Published research that cannot be reproduced is wasteful due to doubts about its quality and reliability. Lack of reproducibility is a concern in all scientific research, and it is especially significant in the field of public health, where research aims to improve treatment practices and policies that have widespread implications. In this commentary, we highlight the urgency of improving norms for reproducibility and scientific integrity in urban public health and digital epidemiology and discuss potential approaches. We first discuss some examples of big data sources and their uses in urban public health, digital epidemiology, and other fields, and consider the limitations with the use of big data. We then provide an overview of relevant solutions to address the key challenges to reproducibility and scientific integrity. Finally, we consider some of their expected outcomes, challenges, and implications.

Unreliable research findings also represent a serious challenge in public-health research. While the peer-review process is designed to ensure the quality and integrity of scientific publications, the implementation of peer review varies between journals and disciplines and does not guarantee that the data used are properly collected or employed. As a result, reproducibility remains a challenge. This is also true in the context of the emerging field of big data science. This is largely driven by the characteristics of big data, such as their volume, variety, and velocity, as well as the novelty and excitement surrounding new data science methods, lack of established reporting standards, and a nascent field that continues to change rapidly in parallel to the development of new technological and analytic innovations. Recent reports have uncovered that most research is not reproducible, with findings casting doubt on the scientific integrity of much of the current research landscape [ 6 , 9 , 10 , 11 , 12 ]. At the bottom of this reproducibility crisis lies growing pressure to publish not only novel, but more importantly, statistically significant results at an accelerated pace [ 13 , 14 ], increasing the use of low standards of evidence and disregarding pragmatic metrics, such as clinical or practical significance [ 15 ]. Consequently, the credibility of scientific findings is decreasing, potentially leading to cynicism or reputational damage to the research community [ 16 , 17 ]. Addressing the reproducibility crisis is not only one step towards restoring the public’s trust in scientific research, but also a necessary foundation for future research, as well as guiding evidence-based public-health initiatives and policies [ 18 ], facilitating translation and implementation of research findings [ 19 , 20 ], and accelerating scientific discovery [ 21 ].

While failure to fully document the scientific steps taken in a research project is a fundamental challenge across all research, big data research is additionally burdened by the technical and computational complexities of handling and analysing large datasets. The challenge of ensuring computational capacity, including memory and processing power, to handle the data, as well as statistical and subject matter expertise accounting for data heterogeneity, can lead to reproducibility issues at a more pragmatic level. For example, large datasets derived from social media platforms require data analysis infrastructure, software, and technical skills, which are not always accessible to every research team [ 22 , 23 ]. Likewise, studies involving big data create new methodological challenges for researchers as the complexity for analysis and reporting increases [ 24 ]. This complexity not only requires sophisticated statistical skills but also new guidelines that define how data should be processed, shared, and communicated to guarantee reproducibility and maintain scientific integrity, while protecting private and sensitive information. Some of these challenges lie beyond the abilities and limitations of individual researchers and even institutions, requiring cultural and systematic changes to improve not only the reproducibility but also transparency and quality of big data research in public health.

Importantly, through concerted efforts and collaboration across disciplines, there are opportunities to systematically identify and address this reproducibility crisis and to specifically apply these approaches to big data research in public health. Below, we discuss methodological and analytical approaches to address the previously discussed issues, reduce waste, and improve the reproducibility and replicability of big data research in public health.

Specifically, we focus on approaches to improve reproducibility, which is distinct from replicability. While both are important with regard to research ethics, replicability is about “obtaining consistent results across studies aimed at answering the same scientific question, each of which has obtained its own data”, whereas reproducibility refers to “obtaining consistent results using the same input data, computational steps, methods and code, and conditions of analysis” [ 25 ]. Though we mention “reproducibility” throughout this commentary, some of the arguments presented may apply to replicability as well. This is particularly true when it comes to transparency when reporting sampling, data collection, aggregation, inference methods, and study context; these affect both replication and reproduction [ 26 ].

2. Big Data Sources and Uses in Urban Public Health and Digital Epidemiology

Big data, as well as relevant methods and analytical approaches, have gained increasing popularity in recent years. This is reflected in the growing number of publications and research studies that have implemented big data methods across a variety of fields and sectors, such as manufacturing [ 27 ], supply-chain management [ 28 ], sports [ 29 ], education [ 30 ], and public health [ 31 ].

Public health, including urban health and epidemiological research, is a field where studies increasingly rely on big data methods, such as in the relatively new field of digital epidemiology [ 32 ]. The use of big data in public-health research is often characterized by the ‘3Vs’: variety in types of data as well as purposes; volume, or amount of data; and velocity, referring to the speed at which the data are generated [ 33 ]. Because very large datasets can make almost any difference statistically significant, while systematic biases are unaffected by data scale, big data studies are at particular risk of producing inaccurate results [ 34 , 35 , 36 , 37 ].
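This point can be made concrete with a small simulation. The sketch below uses entirely synthetic data and an invented systematic bias of 0.02 units: with a modest sample the bias is indistinguishable from noise, while with a very large sample the same artefactual bias produces a vanishingly small p-value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Suppose a sensor systematically over-reads by 0.02 units (a purely
# artefactual bias) while the true population mean is 0.
bias = 0.02
for n in (1_000, 10_000_000):
    sample = rng.normal(loc=bias, scale=1.0, size=n)
    t_stat, p_value = stats.ttest_1samp(sample, popmean=0.0)
    print(f"n = {n:>10,}   p-value = {p_value:.3g}")

# At n = 1,000 the bias is usually indistinguishable from noise; at
# n = 10,000,000 the very same bias is "highly significant", even though
# it reflects a measurement artefact rather than a real effect.
```

Statistical significance at scale therefore says little about whether the underlying measurement process is trustworthy.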

Big data sources that are used or could be potentially used in fields, such as urban public health and digital epidemiology, can be divided into two main categories. First, those that are collected or generated with health as a main focus and, second, those that are generated out of this scope but that can be associated with or impact public health ( Figure 1 ) [ 32 ].

An external file that holds a picture, illustration, etc.
Object name is ijerph-20-01473-g001.jpg

Data sources used in urban public health and digital epidemiology research can broadly be organized along a continuum of health orientation of the process that generated them.

Data sources generated within the context of public health include large datasets captured within health systems or government health services at the population level, such as the case of Electronic Health Records (EHRs), Electronic Medical Records (EMRs), or personal health records (PHRs) [ 38 ]. Other examples include pharmacy and insurance records, omics data, as well as data collected by sensors and devices that are part of the internet of things (IoT) and are used for health purposes, ranging from smart continuous glucose monitors (CGMs) [ 39 ] to activity and sleep trackers.

In contrast, big data sources generated outside the public-health scope are virtually unlimited and ever-growing, covering almost all domains of society. As a result, we will focus on a selected, non-exhaustive set of examples to illustrate the diverse sources of big data that are used or could potentially be used in urban public health and digital epidemiology. Notably, social media have become an important source of big data used for research in different fields, including digital epidemiology. Twitter data have proven to be useful for collecting public-health information, for example, to measure mental health in different patient subgroups [ 40 ]. Examples of big data collected on Twitter that can be used in the context of public-health research are the Harvard CGA Geotweet Archive [ 41 ] or the University of Zurich Social Media Mental Health Surveillance project with their Geotweet Repository for the wider European Region [ 42 ]. Other initiatives, such as the SoBigData Research Infrastructure (RI), aim to foster reproducible and ethical research through the creation of a ‘Social Mining & Big Data Ecosystem’, allowing for the comparison, re-use, and integration of big data, methods, and services into research [ 22 ].

Cities increasingly use technological solutions, including IoT and multiple sensors, to monitor the urban environment, transitioning into Smart Cities with the objective of improving citizens’ quality of life [ 43 , 44 ]. Data stemming from Smart City applications have been used, for example, to predict air quality [ 45 ], analyse transportation to improve road safety [ 46 ], and have the potential to inform urban planning and policy design to build healthier and more sustainable cities [ 47 ].

Data mining techniques also allow for large datasets to be used in the context of urban public health and digital epidemiology. For example, a project using administrative data and data mining techniques in El Salvador identified anomalous spatiotemporal patterns of sexual violence and informed ways in which such analysis can be conducted in real time to allow for local law enforcement agencies and policy makers to respond appropriately [ 48 , 49 ]. Other large-dataset sources, such as transaction data [ 50 ], have been used to investigate the effect of sugar taxes [ 51 ] or labelling [ 52 ] on the consumption of healthy or unhealthy beverages and food products, which can eventually help model their potential impact on health outcomes.

3. Approaches to Improving Reproducibility and Scientific Integrity

Big data science has brought on new challenges, to which the scientific community needs to adapt by applying adequate ethical, methodological, and technological frameworks to cope with the increasing amount of data produced [ 53 ]. As a result, the timely adoption of approaches to address reproducibility and scientific integrity issues is imperative to ensure quality research and outcomes. A timely adoption is relevant not only for the scientific community but also for the general public that can potentially benefit from knowledge and advancements resulting from the use of big data research. This is particularly important in the context of urban public health and digital epidemiology, as the use of big data in these fields can help answer highly relevant and pressing descriptive (what is happening), predictive (what could happen), and prescriptive (what should be done) research questions [ 54 ]. A brief summary of the main points discussed in this section can be found in Figure 2. We divide our proposed solutions in this commentary into three main domains: (1) good research practice, (2) scientific communication and publication, and (3) education.

An external file that holds a picture, illustration, etc.
Object name is ijerph-20-01473-g002.jpg

Approaches that address good research practice, scientific communication, and education are important to improve reproducibility and scientific integrity.

3.1. Good Research Practice

Practices, such as pre-registration of protocols, predefining research questions and hypotheses, publicly sharing data analysis plans, and communicating through reporting guidelines, can improve the quality and reliability of research and results [ 55 , 56 ]. For experimental studies, clear and complete reporting and documentation are essential to allow for reproduction. Observational studies can also be registered on well-established registries, such as on clinicaltrials.gov. Importantly, pre-registration does not preclude publishing exploratory results; rather, it encourages such endeavours to be explicitly described as exploratory, with defined hypotheses and expected outcomes, which is appropriate [ 35 , 37 ].

Lack of data access is another key challenge to reproducibility. Adoption of open-science practices, including sharing of data and code, represents a partial solution to this issue [ 57 , 58 ], acknowledging that not all data can be shared openly owing to privacy concerns. Similarly, transparent descriptions of data collection and analytic methods are necessary for reproduction [ 59 ]. For example, in the analysis of human mobility, which has applications in a wide range of fields, including public health and digital epidemiology [ 60 , 61 ], the inference of ‘meaningful’ locations [ 62 ] from mobility data has been approached with a multitude of methods, some of which lack sufficient documentation. Whereas a research project using an undocumented method to identify subject homes cannot be reproduced, a project using Chen and Poorthuis's [ 63 ] R package ‘homelocator’, which is open source and freely available, could be.
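To illustrate what a fully documented, reproducible inference rule can look like, here is a minimal Python sketch. It is not the homelocator algorithm; the column names, the night-time window, and the "most frequent night-time cell" rule are assumptions made purely for illustration.

```python
import pandas as pd

def infer_home(visits: pd.DataFrame, night_start: int = 22, night_end: int = 6) -> pd.Series:
    """Assign each user the grid cell they visit most often at night.

    Assumes `visits` has columns user_id, cell_id and timestamp; the night
    window and the counting rule are stated explicitly so the analysis can
    be re-run exactly.
    """
    hours = pd.to_datetime(visits["timestamp"]).dt.hour
    night = visits[(hours >= night_start) | (hours < night_end)]
    # Most frequent night-time cell per user.
    return night.groupby("user_id")["cell_id"].agg(lambda cells: cells.value_counts().idxmax())

# Synthetic example records, for illustration only.
visits = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "cell_id": ["A", "A", "B", "C", "C"],
    "timestamp": ["2023-01-01 23:10", "2023-01-02 01:30", "2023-01-02 14:00",
                  "2023-01-03 02:00", "2023-01-03 23:45"],
})
print(infer_home(visits))
```

The point is not the specific rule but that every choice (columns used, night window, aggregation) is written down, so another team can obtain the same home assignments from the same input data.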

Likewise, a case could be made to collaboratively share big data within research networks and IT infrastructures. An example of a project tackling this issue in the context of public health is currently being developed by the Swiss Learning Health System (SLHS), focusing on the design and implementation of a metadata repository with the goal of developing Integrated Health Information Systems (HISs) in the Swiss context [ 64 , 65 ]. The implementation of such repositories and data-management systems allows for retrieval of and access to information; nevertheless, as information systems develop, new challenges arise, particularly when it comes to infrastructure as well as legal and ethical issues, such as data privacy. Solutions are currently in development; it is likely that decentralised data architectures based on blockchain will play an important role in integrated care and health information models [ 66 ]. We briefly expand on this topic in the Anticipated Challenges section below.

The adoption of appropriate big data handling techniques and analytical methods is also important to ensure the findability, accessibility, interoperability, and reusability (FAIR) [ 67 ] of both data and research outcomes [ 68 ]. Such characteristics allow for different stakeholders to use and reuse data and research outcomes for further research, replication, or even implementation purposes.
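As a sketch of what machine-actionable dataset metadata supporting these characteristics might look like, the record below uses field names loosely inspired by common dataset-description vocabularies (for example, schema.org's Dataset type); every identifier and value is invented for illustration.

```python
import json

# Illustrative metadata record for a shared dataset; all values are invented.
dataset_metadata = {
    "name": "Aggregated geotagged social-media mental-health indicators (example)",
    "identifier": "https://doi.org/10.xxxx/example-dataset",        # persistent ID -> findable
    "license": "CC-BY-4.0",                                          # clear terms -> reusable
    "distribution": {
        "encodingFormat": "text/csv",
        "contentUrl": "https://example.org/data/indicators.csv",     # resolvable -> accessible
    },
    "variableMeasured": ["week", "region_code", "sentiment_score"],  # documented schema -> interoperable
    "isBasedOn": "Public posts, aggregated to region-week level; processing steps documented alongside the data",
}

with open("dataset_metadata.json", "w") as fh:
    json.dump(dataset_metadata, fh, indent=2)
```

Publishing such a record alongside the data lets both humans and machines discover, interpret, and reuse the dataset without contacting the original authors.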

Complete and standardised reporting of aspects discussed in this section, for instance, in Reproducibility Network Groups, allows for meta-research and meta-analyses, the detection and minimization of publication bias, and the evaluation of the adherence of researchers to guidelines focused on ensuring scientific integrity. The use of checklists by individual researchers, research groups, departments, or even institutions can motivate the implementation of good research practices as well as clear and transparent reporting, ultimately improving research integrity [ 69 ]. Such checklists can serve as training tools for younger researchers, as well as offer practice guidelines to ensure quality research.

Senior researchers and research institutions are vital when it comes to tackling these challenges as well. The adoption of principles for research conduct, such as the Hong Kong principles, can help minimise the use of questionable research practices [ 70 ]. These principles are to: (1) assess responsible research practices; (2) value complete reporting; (3) reward the practice of open science; (4) acknowledge a broad range of research activities; and (5) recognise essential other tasks, such as peer review and mentoring [ 71 ]. The promotion of these principles by mentors and institutions is a cornerstone of good research practices for younger researchers.

3.2. Scientific Communication

Scientific communication, not only between researchers but also between institutions, should be promoted. Recently, requirements for researchers to make data public or open source have grown popular among journals and major funding agencies in the US, Europe, and globally; this is an important catalyst for open science and addressing issues such as reproducibility [ 72 ].

Likewise, publication and sharing of protocols, data, code, analysis, and tools are important. This not only facilitates reproducibility but also promotes openness and transparency [ 73 ]. For example, the Journal of Memory and Language adopted a mandatory data-sharing policy in 2019. An evaluation of this policy found that data sharing increased by more than 50% and that the strongest predictor of reproducibility was the sharing of analysis code, which increased the probability of reproducibility by 40% [ 57 ]. Such practices are also fostered by the creation and use of infrastructure, such as the aforementioned SoBigData, and by reproducibility network groups, such as the Swiss Reproducibility Network, a peer-led group that aims to improve both replicability and reproducibility [ 74 ], strengthen communication and collaboration, and encourage the use of rigorous research practices.

When publishing or communicating their work, researchers should also keep in mind that transparency about whether studies are exploratory (hypothesis forming) or confirmatory (hypothesis testing) is important for distinguishing the generation of new hypotheses from the testing of existing ones [ 75 ]; this is particularly important for informing future research. Journal reviewers and referees should also motivate researchers to report this accurately.

Similarly, when publishing results, the quality, impact, and relevance of a publication should be valued more than scores, such as the impact factor, to avoid “publishing for numbers” [ 76 ]. This would, of course, require a shift in the priorities and views shared within the research community and may be a challenging change to effect.

Academic editors can also play an important role by avoiding practices, such as ‘cherry-picking’ publications, either because of statistical significance of results or notoriety of the authors. Instead, practical significance, topic relevance, and replication studies should be important factors to consider, as well as valuing the reporting of negative results. It is important to acknowledge, though, that scientific publication structures face an important number of challenges that hinder the implementation of these practices. Some of these points are mentioned in the Challenges section that follows.

3.3. Education

Academic institutions have the responsibility to educate researchers in an integral way, covering not only the correct implementation of methodological approaches and appropriate reporting but also how to conduct research in an ethical way.

First, competence and capacity building should be addressed explicitly through courses, workshops, and competence-building programs aimed at developing technical skills, good research practices, and adequate application of methods and analytical tools. Other activities such as journal clubs can allow researchers to exchange and become familiar with different methodologies, stay up to date with current knowledge and ongoing research, and develop critical thinking skills [ 77 , 78 ], while fostering a mindset for continuous growth and improvement.

Second, by incorporating practice-based education, particularly with research groups that already adhere to best practices, such as the Hong Kong principles, institutions can foster norms valuing reproducibility implicitly as an aspect of researcher education.

4. Expected Outcomes

Ideally, successful implementation of the approaches proposed in Figure 2, together with methodological and analytical tools such as the standardised protocols suggested by Simera et al. [ 55 ] and the EQUATOR Network reporting guidelines [ 79 ], can lead to a cultural shift in the research community. This, in turn, can enhance transparency and the quality of public-health research using big data by fostering interdisciplinary programs and worldwide cooperation among different health-related stakeholders, such as researchers, policy makers, clinicians, providers, and the public. Improving research quality can lead to greater value and reliability while decreasing research waste, thus improving the cost–value ratio and trust between stakeholders [ 80 , 81 ] and, as previously stated, facilitating translation and implementation of research findings [ 18 ].

Just as replicability is fundamental in engineering for creating functioning and reliable products or systems, it is also necessary for modelling and simulation in the fields of urban public health and digital epidemiology [ 82 ]. Simulation approaches built upon reproducible research allow for the construction of accurate prediction models with important implications for healthcare [ 83 ] and public health [ 84 ]. In the same way, reproduction and replication of results are essential for model validation [ 85 , 86 , 87 ].

The importance of reducing research waste and ensuring the value of health-related research is reflected in the existence of initiatives, such as the AllTrials Campaign, EQUATOR (enhancing the quality and transparency of health research), and EVBRES (evidence-based research), which promote protocol registration, full methods, and result reporting, and new studies that build on an existing evidence base [ 79 , 88 , 89 , 90 ].

Changes in editorial policies and practices can improve critical reflection on research quality by the authors. Having researchers, editors, and reviewers use guidelines [ 91 ], such as ARRIVE [ 92 ] in the case of pre-clinical animal studies or STROBE [ 93 ] for observational studies in epidemiology, can significantly improve reporting and transparency. For example, an observational cohort study analysing the effects of a change in the editorial policy of Nature, which introduced a checklist for manuscript preparation, demonstrated that the reporting of risk of bias improved substantially as a consequence [ 94 ].

A valuable outcome of adopting open science approaches that could result in improved communication, shared infrastructure, open data, and collaboration between researchers and even institutions is the implementation of competitions, challenges, or even ‘hackathons’. These events are already common among other disciplines, such as computer science, the digital tech sector, and social media research, and are becoming increasingly popular in areas related to public health. Some examples include the Big Data Hackathon San Diego, where the theme for 2022 was ‘Tackling Real-world Challenges in Healthcare’ [ 95 ], and the Yale CBIT Healthcare Hackathon of 2021, which aimed to build solutions to challenges faced in healthcare [ 96 ]. In addition to tackling issues in innovative ways, hackathons and other similar open initiatives invite the public to learn about and engage with science [ 97 ] and can be powerful tools for engaging diverse stakeholders and training beyond the classroom [ 98 ].

5. Anticipated Challenges

While the implementation of the approaches discussed ( Figure 2 ) will ideally translate to a significant reduction in research waste and improvement in scientific research through standardization and transparency, there are also substantial challenges to consider ( Figure 3 ).

An external file that holds a picture, illustration, etc.
Object name is ijerph-20-01473-g003.jpg

Examples of challenges to expect when implementing approaches aimed at improving reproducibility.

First, not all researchers have adequate resources or opportunities to take advantage of new data that can be used to prevent, monitor, and improve population health. Early career researchers in low-resource settings may be at a particular disadvantage. Among these researchers, barriers to accessing and adequately using big data may not only be financial, when funding is not available, but also technical, when the knowledge and tools required are not available.

Similarly, events and activities among young researchers can facilitate technical development, networking, and knowledge acquisition, ultimately improving research quality and outcomes. Those who live in environments with limited resources, who are physically isolated, or have limited mobility may not have access to these opportunities. It might be possible to overcome some of these limitations with accessible digital solutions.

Much-needed shifts in the research and publishing culture are held back by structures that currently enable Questionable Research Practices (QRPs), such as cherry picking (presenting favourable evidence or results while hiding unfavourable ones), p-hacking (misusing data through relentless analysis in order to obtain statistically significant results), and HARKing (Hypothesizing After the Results are Known), among others [ 59 , 99 , 100 ]. To overcome these challenges embedded in modern-day research, it is necessary to educate researchers about the scope of misconduct, create structures to prevent it from happening, and scrutinize cases in which these instances may be apparent to determine the actual motive [ 101 ].
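The impact of one such practice is easy to demonstrate with a simulation. The sketch below uses entirely synthetic null data (no true effect anywhere) and mimics p-hacking by testing many arbitrary post hoc subgroups and keeping only the smallest p-value; the subgroup rule and sample sizes are invented for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def smallest_p_after_hacking(n=200, n_subgroups=20):
    """Test many arbitrary subgroups of pure-noise data and keep the best p-value."""
    outcome = rng.normal(size=n)                       # no true effect anywhere
    exposed = rng.integers(0, 2, size=n).astype(bool)  # arbitrary 'exposure'
    best_p = 1.0
    for _ in range(n_subgroups):
        subgroup = rng.random(n) < 0.5                 # arbitrary post hoc split
        a, b = outcome[exposed & subgroup], outcome[~exposed & subgroup]
        if len(a) > 2 and len(b) > 2:
            best_p = min(best_p, stats.ttest_ind(a, b).pvalue)
    return best_p

share_significant = np.mean([smallest_p_after_hacking() < 0.05 for _ in range(500)])
print(f"'Significant' findings under a true null: {share_significant:.0%}")
# The share is far above the nominal 5%, which is why selective subgroup
# analysis undermines reproducibility.
```

Pre-registered hypotheses and full reporting of all analyses performed are the standard remedies for exactly this inflation of false positives.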

Conventional data storing and handling strategies are not sufficient when working with big data, as these often impose additional monetary and computational costs. Some solutions are available to tackle these issues, such as cloud computing and platforms that allow end users to access shared resources over the internet [ 102 ]; Data Lakes, consisting of centralized repositories that allow for data storage and analysis [ 103 ]; and Data Mesh, a platform architecture that distributes data among several nodes [ 104 ]. Unfortunately, these solutions are not always easily accessible. Additionally, use of these platforms has given rise to important debates concerning issues, such as data governance and security [ 105 ].

The use of big data, and especially the use of personal and health information, raises privacy issues. The availability of personal and health information that results from the digital transformation represents a constant challenge when it comes to drawing a line between public and private, sensitive and non-sensitive information, and adherence to ethical research practices [ 106 ].

Ethical concerns are not limited to privacy; while big data entails the use of increasingly complex analytical methods that require expertise in order to deal with noise and uncertainty, there are several additional factors that may affect the accuracy of research results [ 107 ]. For example, when using machine learning approaches to analyse big data, methods should be cautiously chosen to avoid issues, such as undesired data-driven variable selection, algorithmic biases, and overfitting the analytic models [ 108 ]. Complexity increases the need for collaboration, which makes “team science” and other collaborative problem-solving events (such as Hackathons) increasingly popular. This leads to new requirements to adequately value and acknowledge contributorship [ 109 ].

Because statistical methods are becoming increasingly complex and the quantity of data ever greater, the number of scientific publications is also increasing, making it challenging for already-flawed peer-review systems to keep up and provide high-quality reviews of more and more complex research. Currently, peer review is mainly expected to: (i) assure the quality and accuracy of research, (ii) establish a hierarchy of published work, (iii) provide fair and equal opportunities, and (iv) assure fraud-free research [ 110 ]; however, it is not certain whether current peer-review procedures achieve or are capable of delivering on these expectations. Some solutions have been proposed to address these issues, such as the automation of peer-review processes [ 111 ] and the implementation of open review guidelines [ 112 , 113 , 114 ].

6. Conclusions

Big data research presents a unique opportunity for a cultural shift in the way public-health research is conducted today. At the same time, big data use will only benefit the field if the data are used adequately and the appropriate measures are taken so that their full potential can be harnessed. The inherent complexity of working with large data quantities requires a clear and transparent framework at multiple levels, ranging from the protocols and methods used by individual scientists to institutions' guiding principles and their research and publishing practices.

The solutions summarized in this commentary are aimed at enhancing results, reproducibility, and scientific integrity; however, we acknowledge that these solutions are not exhaustive and there may be many other promising approaches to improve the integrity of big data research as it applies to public health. The solutions described in this commentary are in line with “a manifesto for reproducible science” published in Nature Human Behaviour [ 101 ]. Importantly, reproducibility is only of value if the findings are expected to have an important impact on science, health, and society. Reproducibility of results is highly relevant for funding agencies and governments, who often recognize the importance of research projects with well-structured study designs, defined data-processing steps, and transparent analysis plans (e.g., statistical analysis plans) [ 115 , 116 ]. For imaging data, such as radiologic images, analysis pipelines have been shown to be suitable for structuring the analysis pathway [ 117 ]. This is specifically important for big data analysis, where interdisciplinarity and collaboration become increasingly important. The development and use of statistical and reporting guidelines support researchers in making their projects more reproducible [ 118 ].
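As a minimal sketch of the idea behind such pipelines (independent of any particular imaging toolkit; the step names and toy data are invented), each processing step is named and logged so that the exact analysis pathway can be re-run and audited.

```python
import json
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Step:
    name: str
    apply: Callable

def run_pipeline(data, steps: List[Step], log_path: str = "pipeline_log.json"):
    """Apply the documented steps in order and record them for reproducibility."""
    executed = []
    for step in steps:
        data = step.apply(data)
        executed.append(step.name)
    with open(log_path, "w") as fh:
        json.dump({"steps": executed}, fh, indent=2)
    return data

# Toy steps standing in for domain-specific processing (e.g., image preprocessing).
steps = [
    Step("drop_missing", lambda xs: [x for x in xs if x is not None]),
    Step("min_max_normalise", lambda xs: [(x - min(xs)) / (max(xs) - min(xs)) for x in xs]),
]
print(run_pipeline([3, None, 1, 2], steps))   # -> [1.0, 0.0, 0.5]
```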

Transparency in all the study-design steps (i.e., from hypothesis generation to availability of collected data and code) is specifically relevant for public health and epidemiological research in order to encourage funding agencies, the public, and other researchers and relevant stakeholders to trust research results [ 119 ]. Similarly, as globalization and digitalization increase the diffusion of infectious diseases [ 120 ] and behavioural risks [ 121 ], research practices that foster reproducible results are imperative to implement and diffuse interventions more swiftly.

We believe that the recommendations outlined in this commentary are not unique to big data and that the entire research community could benefit from these approaches [ 122 , 123 , 124 , 125 , 126 , 127 ]. However, what has been detailed here is specifically pertinent to big data, as an increase in the volume and complexity of the data produced requires more structured and consistent data handling to avoid research waste. With clearly defined and openly shared guidelines, we may strengthen the quality of current research and initiate a shift at multiple levels: the individual level, the institutional level, and the international level. Some challenges are to be expected, particularly when it comes to finding the right incentives for these changes to stick, but we are confident that, with the right effort, we can put scientific integrity back at the forefront of researchers' minds and, ultimately, strengthen the trust of the population in public-health research, specifically public-health research leveraging big data for urban public health and digital epidemiology.

The timely implementation of these solutions is highly relevant, not only to ensure the quality of research and scientific output, but also to potentially allow for the use of data sources that originated without public health in mind, spanning various fields that are relevant to urban public health and digital epidemiology. As outlined in this commentary, such data can originate from multiple sources, such as social media, mobile technologies, urban sensors, and GIS, to mention a few. As such data sources grow and become more readily available, it is important for researchers and the scientific community to be prepared to use these valuable and diverse data sources in innovative ways to advance research and practice. This would allow for the expanded use of big data to inform evidence-based decision making to positively impact public health.

Acknowledgments

We are grateful for the support of the Swiss School of Public Health (SSPH+) and, in particular, to all lecturers and participants of the class of 2021 Big Data in Public Health course. We would also like to extend our gratitude to the reviewers for their excellent feedback and suggestions.

Funding Statement

Swiss School of Public Health (SSPH+) to O.G. This commentary is an outcome of an SSPH+ PhD course on Big Data in Public Health (website: https://ssphplus.ch/en/graduate-campus/en/graduate-campus/course-program/ (accessed on 10 October 2022)).

Author Contributions

Conceptualization, A.C.Q.G., D.J.L., A.T.H., T.S. and O.G.; writing—original draft preparation, A.C.Q.G., D.J.L., A.T.H. and T.S.; writing—review and editing, A.C.Q.G., D.J.L., M.S., S.E., S.J.M., J.A.N., M.F. and O.G.; supervision, O.G. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement

Informed Consent Statement, Data Availability Statement, Conflicts of Interest

The authors declare no conflict of interest.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.


6th Annual Neuro Open Science in Action Symposium 2024


An event organized by the Tanenbaum Open Science Institute, held in person at The Neuro and livestreamed online.

Registration coming soon

Livestream link coming soon

Open Science Throughout the Research Lifecycle 

This year's Symposium will highlight how Open Science works through various stages of the research lifecycle, focusing on areas where it is not yet widely practiced, such as data acquisition in laboratories. Interactive sessions will cover open resources enabling better study design, initiatives to increase diversity in research data, open-source hardware for data acquisition, and collaborative approaches to catalyze big open data analysis.

Ed Lein, Senior Investigator at the Allen Institute for Brain Science, will kick off the event with the keynote lecture, providing an overview of the open education tools developed by the Allen Institute, which are invaluable for enhancing neuroscience education and strengthening experimental design. 

Open Science Prize Ceremony

The day will conclude with the 2024 Neuro-Irv and Helga Cooper Foundation Open Science Prizes Ceremony. The winners of this premier OS competition will accept their awards and present their work. Following the ceremony, symposium attendees are invited to celebrate and network over cocktails.

Program (all times EST)

Sessions run from 9:00 a.m. to approximately 4:15 p.m., including a Trainee Poster Session over the midday break and the afternoon presentation of the Canadian Trainee Prize, the International Trainee Prize, and the Main International Prize.

Jeanne Timmins Amphitheatre, The Neuro (The Montreal Neurological Institute and Hospital)

The Montreal Neurological Institute and Hospital is at 3801 University Street, north of Pine Avenue West, on the McGill University campus opposite the former Royal Victoria Hospital.

Montreal is served by highway Routes 10, 15, 20 and 40, and by Greyhound Bus, ViaRail and the P-E-Trudeau airport. In the city, bus and metro service is provided by the Société de transport de Montréal (STM).

Wheelchair access

A wheelchair accessible entrance is on University Street north of the main entrance. Another wheelchair accessible entrance is in the loading area behind the building: to enter the loading area, turn into the driveway south of the main entrance. Please note, there is no parking in the loading area.

Parking near The Neuro is sometimes difficult. There are parking meters on University Street and a parking lot north of the main entrance. To enter the lot, turn right into the driveway toward Molson Stadium.

Information about parking fees

There is a taxi stand on University Street across from the main entrance. You may call a cab from the free taxi phone in the main lobby near the Security Desk.

Access by Public Transportation  (STM website)

There are four bus stops within walking distance:

  • Bus 144 stops at Pine Avenue and University Street
  • Bus 356 stops at Sherbrooke Street and University Street (Nightbus)
  • Bus 107 stops at Pine Avenue and Docteur Penfield
  • Bus 24 stops at Sherbrooke Street and University Street

Take the Metro Green Line to the McGill station. Walk north on University Street and cross Pine Avenue. The main entrance is on the right, past the flags.

Organizing Committee

Gabriel Pelletier, Open Science Data Manager, Tanenbaum Open Science Institute (TOSI)

Leah Lefort, TOSI Coordinator

Annabel Seyller, Chief of Staff, The Neuro and CEO, TOSI

Thomas Durcan, Associate Professor, The Neuro and Chair, TOSI Prize Committee

Luisa Pimentel, Open Science Community Officer, Tanenbaum Open Science Institute (TOSI)

Debbie Rashcovsky, Events Lead, The Neuro


  • Open access
  • Published: 19 June 2024

Detecting hallucinations in large language models using semantic entropy

  • Sebastian Farquhar (ORCID: orcid.org/0000-0002-9185-6415),
  • Jannik Kossen,
  • Lorenz Kuhn &
  • Yarin Gal (ORCID: orcid.org/0000-0002-2733-2078)

Nature volume 630, pages 625–630 (2024)

74k Accesses

1 Citations

1479 Altmetric

Metrics details

  • Computer science
  • Information technology

Large language model (LLM) systems, such as ChatGPT 1 or Gemini 2 , can show impressive reasoning and question-answering capabilities but often ‘hallucinate’ false outputs and unsubstantiated answers 3 , 4 . Answering unreliably or without the necessary information prevents adoption in diverse fields, with problems including fabrication of legal precedents 5 or untrue facts in news articles 6 and even posing a risk to human life in medical domains such as radiology 7 . Encouraging truthfulness through supervision or reinforcement has been only partially successful 8 . Researchers need a general method for detecting hallucinations in LLMs that works even with new and unseen questions to which humans might not know the answer. Here we develop new methods grounded in statistics, proposing entropy-based uncertainty estimators for LLMs to detect a subset of hallucinations—confabulations—which are arbitrary and incorrect generations. Our method addresses the fact that one idea can be expressed in many ways by computing uncertainty at the level of meaning rather than specific sequences of words. Our method works across datasets and tasks without a priori knowledge of the task, requires no task-specific data and robustly generalizes to new tasks not seen before. By detecting when a prompt is likely to produce a confabulation, our method helps users understand when they must take extra care with LLMs and opens up new possibilities for using LLMs that are otherwise prevented by their unreliability.


‘Hallucinations’ are a critical problem 9 for natural language generation systems using large language models (LLMs), such as ChatGPT 1 or Gemini 2 , because users cannot trust that any given output is correct.

Hallucinations are often defined as LLMs generating “content that is nonsensical or unfaithful to the provided source content” 9 , 10 , 11 but they have come to include a vast array of failures of faithfulness and factuality. We focus on a subset of hallucinations which we call ‘confabulations’ 12 for which LLMs fluently make claims that are both wrong and arbitrary—by which we mean that the answer is sensitive to irrelevant details such as random seed. For example, when asked a medical question “What is the target of Sotorasib?” an LLM confabulates by sometimes answering KRASG12 ‘C’ (correct) and other times KRASG12 ‘D’ (incorrect) despite identical instructions. We distinguish this from cases in which a similar ‘symptom’ is caused by the following different mechanisms: when LLMs are consistently wrong as a result of being trained on erroneous data such as common misconceptions 13 ; when the LLM ‘lies’ in pursuit of a reward 14 ; or systematic failures of reasoning or generalization. We believe that combining these distinct mechanisms in the broad category hallucination is unhelpful. Our method makes progress on a portion of the problem of providing scalable oversight 15 by detecting confabulations that people might otherwise find plausible. However, it does not guarantee factuality because it does not help when LLM outputs are systematically bad. Nevertheless, we significantly improve question-answering accuracy for state-of-the-art LLMs, revealing that confabulations are a great source of error at present.

We show how to detect confabulations by developing a quantitative measure of when an input is likely to cause an LLM to generate arbitrary and ungrounded answers. Detecting confabulations allows systems built on LLMs to avoid answering questions likely to cause confabulations, to make users aware of the unreliability of answers to a question or to supplement the LLM with more grounded search or retrieval. This is essential for the critical emerging field of free-form generation in which naive approaches, suited to closed vocabulary and multiple choice, fail. Past work on uncertainty for LLMs has focused on simpler settings, such as classifiers 16 , 17 and regressors 18 , 19 , whereas the most exciting applications of LLMs relate to free-form generations.

The term hallucination in the context of machine learning originally comes from filling in ungrounded details, either as a deliberate strategy 20 or as a reliability problem 4 . The appropriateness of the metaphor has been questioned as promoting undue anthropomorphism 21 . Although we agree that metaphor must be used carefully with LLMs 22 , the widespread adoption of the term hallucination reflects the fact that it points to an important phenomenon. This work represents a step towards making that phenomenon more precise.

To detect confabulations, we use probabilistic tools to define and then measure the ‘semantic’ entropy of the generations of an LLM—an entropy that is computed over meanings of sentences. High entropy corresponds to high uncertainty 23 , 24 , 25 —so semantic entropy is one way to estimate semantic uncertainties. Semantic uncertainty, the broader category of measures we introduce, could be operationalized with other measures of uncertainty, such as mutual information, instead. Entropy in free-form generation is normally hard to measure because answers might mean the same thing (be semantically equivalent) despite being expressed differently (being syntactically or lexically distinct). This causes naive estimates of entropy or other lexical variation scores 26 to be misleadingly high when the same correct answer might be written in many ways without changing its meaning.

By contrast, our semantic entropy moves towards estimating the entropy of the distribution of meanings of free-form answers to questions, insofar as that is possible, rather than the distribution over the ‘tokens’ (words or word-pieces) which LLMs natively represent. This can be seen as a kind of semantic consistency check 27 for random seed variation. An overview of our approach is provided in Fig. 1 and a worked example in Supplementary Table 1 .

Fig. 1

a , Naive entropy-based uncertainty measures variation in the exact answers, treating ‘Paris’, ‘It’s Paris’ and ‘France’s capital Paris’ as different. But this is unsuitable for language tasks for which sometimes different answers mean the same things. Our semantic entropy clusters answers which share meanings before computing the entropy. A low semantic entropy shows that the LLM is confident about the meaning. b , Semantic entropy can also detect confabulations in longer passages. We automatically decompose a long generated answer into factoids. For each factoid, an LLM generates questions to which that factoid might have been the answer. The original LLM then samples  M possible answers to these questions. Finally, we compute the semantic entropy over the answers to each specific question, including the original factoid. Confabulations are indicated by high average semantic entropy for questions associated with that factoid. Here, semantic entropy classifies Fact 1 as probably not a confabulation because generations often mean the same thing, despite very different wordings, which a naive entropy would have missed.

Intuitively, our method works by sampling several possible answers to each question and clustering them algorithmically into answers that have similar meanings, which we determine on the basis of whether answers in the same cluster entail each other bidirectionally 28 . That is, if sentence A entails that sentence B is true and vice versa, then we consider them to be in the same semantic cluster. We measure entailment using both general-purpose LLMs and natural language inference (NLI) tools developed specifically for detecting entailment for which we show direct evaluations in Supplementary Tables 2 and 3 and Supplementary Fig. 1 . Textual entailment has previously been shown to correlate with faithfulness 10 in the context of factual consistency 29 as well as being used to measure factuality in abstractive summarization 30 , especially when applied at the right granularity 31 .

Semantic entropy detects confabulations in free-form text generation across a range of language models and domains, without previous domain knowledge. Our evaluations cover question answering in trivia knowledge (TriviaQA 32 ), general knowledge (SQuAD 1.1; ref. 33 ), life sciences (BioASQ 34 ) and open-domain natural questions (NQ-Open 35 ) derived from actual queries to Google Search 36 . In addition, semantic entropy detects confabulations in mathematical word problems (SVAMP 37 ) and in a biography-generation dataset, FactualBio, accompanying this paper.

Our results for TriviaQA, SQuAD, BioASQ, NQ-Open and SVAMP are all evaluated context-free and involve sentence-length answers (96 ± 70 characters, mean ± s.d.) and use LLaMA 2 Chat (7B, 13B and 70B parameters) 38 , Falcon Instruct (7B and 40B) 39 and Mistral Instruct (7B) 40 . In the Supplementary Information , we further consider short-phrase-length answers. Results for FactualBio (442 ± 122 characters) use GPT-4 (ref. 1 ). At the time of writing, GPT-4 (ref. 1 ) did not expose output probabilities 41 or hidden states, although it does now. As a result, we propose a discrete approximation of our estimator for semantic entropy which allows us to run experiments without access to output probabilities, which we use for all GPT-4 results in this paper and which performs similarly well.

Our confabulation detection with semantic entropy is more robust to user inputs from previously unseen domains than methods which aim to ‘learn’ how to detect confabulations from a set of example demonstrations. Our method is unsupervised, meaning that we do not need labelled examples of confabulations. By contrast, supervised methods detect confabulations by learning patterns behind examples of confabulations, assuming that future questions preserve these patterns. But this assumption is often untrue in new situations or with confabulations that human overseers are unable to identify (compare Fig. 17 of ref. 24 ). As a strong supervised baseline, we compare to an embedding regression method inspired by ref. 24 which trains a logistic regression classifier to predict whether the model correctly answered a question on the basis of the final ‘embedding’ (hidden state) of the LLM. We also use the P (True) method 24 which looks at the probability with which an LLM predicts that the next token is ‘True’ when few-shot prompted to compare a main answer with ‘brainstormed’ alternatives.

Confabulations contribute substantially to incorrect answers given by language models. We show that semantic entropy can be used to predict many incorrect model answers and to improve question-answering accuracy by refusing to answer those questions the model is uncertain about. Corresponding to these two uses, we evaluate two main metrics. First, the widely used area under the receiver operating characteristic (AUROC) curve for the binary event that a given answer is incorrect. This measure captures both precision and recall and ranges from 0 to 1, with 1 representing a perfect classifier and 0.5 representing an un-informative classifier. We also show a new measure, the area under the ‘rejection accuracy’ curve (AURAC). This studies the case in which the confabulation detection score is used to refuse to answer the questions judged most likely to cause confabulations. Rejection accuracy is the accuracy of the answers of the model on the remaining questions and the area under this curve is a summary statistic over many thresholds (representative threshold accuracies are provided in Supplementary Material ). The AURAC captures the accuracy improvement which users would experience if semantic entropy was used to filter out questions causing the highest entropy.

Detecting confabulations in QA and math

In Fig. 2 , we show that both semantic entropy and its discrete approximation outperform our best baselines for sentence-length generations. These results are averaged across datasets and provide the actual scores on the held-out evaluation dataset. We report the raw average score across held-out evaluation datasets without standard error because the distributional characteristics are more a property of the models and datasets selected than the method. Consistency of relative results across different datasets is a stronger indicator of variation in this case.

Fig. 2

Semantic entropy outperforms leading baselines and naive entropy. AUROC (scored on the y -axes) measures how well methods predict LLM mistakes, which correlate with confabulations. AURAC (likewise scored on the y -axes) measures the performance improvement of a system that refuses to answer questions which are judged likely to cause confabulations. Results are an average over five datasets, with individual metrics provided in the Supplementary Information .

Semantic entropy greatly outperforms the naive estimation of uncertainty using entropy: computing the entropy of the length-normalized joint probability of the token sequences. Naive entropy estimation ignores the fact that token probabilities also express the uncertainty of the model over phrasings that do not change the meaning of an output.

Our methods also outperform the supervised embedding regression method both in- and out-of-distribution. In pale-yellow bars we show that embedding regression performance deteriorates when its training data do not match the deployment distribution—which mirrors the common real-world case in which there is a distribution shift between training and deployment 42 —the plotted value is the average metric for embedding regression trained on one of the four ‘off-distribution’ datasets for that evaluation. This is critical because reliable uncertainty is most important when the data distribution shifts. Semantic entropy also outperforms P (True) which is supervised ‘in-context’; that is, it is adapted to the deployment task with a few training examples provided in the LLM prompt itself. The discrete variant of semantic entropy performs similarly to our standard estimator, despite not requiring exact output probabilities.

Averaged across the 30 combinations of tasks and models we study, semantic entropy achieves the best AUROC value of 0.790 whereas naive entropy (0.691), P (True) (0.698) and the embedding regression baseline (0.687) lag behind it. Semantic entropy performs well consistently, with stable performance (between 0.78 and 0.81 AUROC) across the different model families (LLaMA, Falcon and Mistral) and scales (from 7B to 70B parameters) which we study (we report summary statistics for each dataset and model as before). Although semantic entropy outperforms the baselines across all model sizes, P (True) seems to improve with model size, suggesting that it might become more competitive for very capable honest models in settings that the model understands well (which are, however, not the most important cases to have good uncertainty). We use ten generations to compute entropy, selected using analysis in Supplementary Fig. 2 . Further results for short-phrase generations are described in Supplementary Figs. 7 – 10 .

The results in Fig. 2 offer a lower bound on the effectiveness of semantic entropy at detecting confabulations. These evaluations determine whether semantic entropy and baseline methods can detect when the answers of the model are incorrect (which we validate against human correctness evaluations in Supplementary Table 4 ). In addition to errors from confabulations (arbitrary incorrectness), this also includes other types of mistakes for which semantic entropy is not suited, such as consistent errors learned from the training data. The fact that methods such as embedding regression are able to spot other kinds of errors, not just confabulations, but still are outperformed by semantic entropy, suggests that confabulations are a principal category of errors for actual generations.

Examples of questions and answers from TriviaQA, SQuAD and BioASQ, for LLaMA 2 Chat 70B, are shown in Table 1 . These illustrate how only semantic entropy detects when the meaning is constant but the form varies (the first row of the table) whereas semantic entropy and naive entropy both correctly predict the presence of confabulations when the form and meaning vary together (second row) and predict the absence of confabulations when the form and meaning are both constant across several resampled generations (third row). In the final row, we give an example in which semantic entropy is erroneously high as a result of overly sensitive semantic clustering relative to the reference answer. Our clustering method distinguishes the answers which provide a precise date from those which only provide a year. For some contexts that would have been correct but in this context the distinction between the specific day and the year is probably irrelevant. This highlights the importance of context and judgement in clustering, especially in subtle cases, as well as the shortcomings of evaluating against fixed reference answers which do not capture the open-ended flexibility of conversational deployments of LLMs.

Detecting confabulations in biographies

Semantic entropy is most natural for sentences that express a single proposition but the idea of semantic equivalence is trickier to apply to longer passages which express many propositions which might only agree partially 43 . Nevertheless, we can use semantic entropy to detect confabulations in longer generations, such as entire paragraphs of text. To show this, we develop a dataset of biographical generations from GPT-4 (v.0613) for 21 individuals notable enough to have their own Wikipedia page but without extensive online biographies. From each biography generated by GPT-4, we automatically extract propositional factual claims about the individual (150 factual claims in total), which we manually label as true or false.

Applying semantic entropy to this problem is challenging. Naively, one might simply regenerate each sentence (conditioned on the text so far) and then compute semantic entropy over these regenerations. However, the resampled sentences often target different aspects of the biography: for example, one time describing family and the next time profession. This is analogous to the original problem semantic entropy was designed to resolve: the model is uncertain about the right ordering of facts, not about the facts themselves. To address this, we break down the entire paragraph into factual claims and reconstruct questions which might have been answered by those claims. Only then do we apply semantic entropy (Fig. 1 ) by generating three new answers to each question (selected with analysis in Supplementary Figs. 3 and 4 ) and computing the semantic entropy over those generations plus the original factual claim. We aggregate these by averaging the semantic entropy over all the questions to get an uncertainty score for each proposition, which we use to detect confabulations. Unaggregated results are shown in Supplementary Figs. 5 and 6 .

As GPT-4 did not allow access to the probability of the generation at the time of writing, we use a discrete variant of semantic entropy which makes the further approximation that we can infer a discrete empirical distribution over semantic meaning clusters from only the generations ( Methods ). This allows us to compute semantic entropy using only the black-box outputs of an LLM. However, we were unable to compute the naive entropy baseline, the standard semantic entropy estimator or the embedding regression baseline for GPT-4 without output probabilities and embeddings.

In Fig. 3 we show that the discrete variant of semantic entropy effectively detects confabulations on this dataset. Its AUROC and AURAC are higher than either a simple ‘self-check’ baseline—which just asks the LLM whether the factoid is likely to be true—or a variant of P (True) which has been adapted to work for the paragraph-length setting. Discrete semantic entropy has better rejection accuracy performance until 20% of the questions have been rejected at which point P (True) has a narrow edge. This indicates that the questions predicted to cause confabulations are indeed more likely to be wrong.

Fig. 3

The discrete variant of our semantic entropy estimator outperforms baselines both when measured by AUROC and AURAC metrics (scored on the y -axis). The AUROC and AURAC are substantially higher than for both baselines. At above 80% of questions being answered, semantic entropy has the highest accuracy. Only when the top 20% of answers judged most likely to be confabulations are rejected does the answer accuracy on the remainder for the P (True) baseline exceed semantic entropy.

Our probabilistic approach, accounting for semantic equivalence, detects an important class of hallucinations: those that are caused by a lack of LLM knowledge. These are a substantial portion of the failures at present and will continue even as models grow in capabilities because situations and cases that humans cannot reliably supervise will persist. Confabulations are a particularly noteworthy failure mode for question answering but appear in other domains too. Semantic entropy needs no previous domain knowledge and we expect that algorithmic adaptations to other problems will allow similar advances in, for example, abstractive summarization. In addition, extensions to alternative input variations such as rephrasing or counterfactual scenarios would allow a similar method to act as a form of cross-examination 44 for scalable oversight through debate 45 .

The success of semantic entropy at detecting errors suggests that LLMs are even better at “knowing what they don’t know” than was argued by ref. 24 —they just don’t know they know what they don’t know. Our method explicitly does not directly address situations in which LLMs are confidently wrong because they have been trained with objectives that systematically produce dangerous behaviour, cause systematic reasoning errors or are systematically misleading the user. We believe that these represent different underlying mechanisms—despite similar ‘symptoms’—and need to be handled separately.

One exciting aspect of our approach is the way it makes use of classical probabilistic machine learning methods and adapts them to the unique properties of modern LLMs and free-form language generation. We hope to inspire a fruitful exchange of well-studied methods and emerging new problems by highlighting the importance of meaning when addressing language-based machine learning problems.

Semantic entropy as a strategy for overcoming confabulation builds on probabilistic tools for uncertainty estimation. It can be applied directly to any LLM or similar foundation model without requiring any modifications to the architecture. Our ‘discrete’ variant of semantic uncertainty can be applied even when the predicted probabilities for the generations are not available, for example, because access to the internals of the model is limited.

In this section we introduce background on probabilistic methods and uncertainty in machine learning, discuss how it applies to language models and then discuss our contribution, semantic entropy, in detail.

Uncertainty and machine learning

We aim to detect confabulations in LLMs, using the principle that the model will be uncertain about generations for which its output is going to be arbitrary.

One measure of uncertainty is the predictive entropy of the output distribution, which measures the information one has about the output given the input 25 . The predictive entropy (PE) for an input sentence x is the conditional entropy ( H ) of the output random variable Y with realization y given x ,

\({\rm{PE}}({\boldsymbol{x}})=H(Y| {\boldsymbol{x}})=-{\sum }_{y}P(y| {\boldsymbol{x}})\log P(y| {\boldsymbol{x}}).\)

A low predictive entropy indicates an output distribution which is heavily concentrated whereas a high predictive entropy indicates that many possible outputs are similarly likely.

Aleatoric and epistemic uncertainty

We do not distinguish between aleatoric and epistemic uncertainty in our analysis. Researchers sometimes separate aleatoric uncertainty (uncertainty in the underlying data distribution) from epistemic uncertainty (caused by having only limited information) 46 . Further advances in uncertainty estimation which separate these kinds of uncertainty would enhance the potential for our semantic uncertainty approach by allowing extensions beyond entropy.

Joint probabilities of sequences of tokens

Generative LLMs produce strings of text by selecting tokens in sequence. Each token is a wordpiece that often represents three or four characters (though especially common sequences and important words such as numbers typically get their own token). To compute entropies, we need access to the probabilities the LLM assigns to the generated sequence of tokens. The probability of the entire sequence, s , conditioned on the context, x , is the product of the conditional probabilities of new tokens given past tokens, whose resulting log-probability is \(\log P({\bf{s}}| {\boldsymbol{x}})={\sum }_{i}\log P({s}_{i}| {{\bf{s}}}_{ < i},{\boldsymbol{x}})\) , where s i is the i th output token and s < i denotes the set of previous tokens.

Length normalization

When comparing the log-probabilities of generated sequences, we use ‘length normalization’, that is, we use an arithmetic mean log-probability, \(\frac{1}{N}{\sum }_{i=1}^{N}\log P({s}_{i}| {{\bf{s}}}_{ < i},{\boldsymbol{x}})\), instead of the sum. In expectation, longer sequences have lower joint likelihoods because of the conditional independence of the token probabilities 47 . The joint likelihood of a sequence of length N shrinks exponentially in N . Its negative log-probability therefore grows linearly in N , so longer sentences tend to contribute more to entropy. We therefore interpret length-normalizing the log-probabilities when estimating the entropy as asserting that the expected uncertainty of generations is independent of sentence length. Length normalization has some empirical success 48 , including in our own preliminary experiments, but little theoretical justification in the literature.
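Both quantities are simple to compute once the per-token log-probabilities of a generation are available. A minimal sketch (the helper names and the numbers in the example are illustrative, not part of the paper's released code):

```python
import numpy as np

def sequence_log_prob(token_log_probs):
    """Joint log-probability of a generated sequence:
    log P(s | x) = sum_i log P(s_i | s_<i, x)."""
    return float(np.sum(token_log_probs))

def length_normalized_log_prob(token_log_probs):
    """Arithmetic mean of the per-token log-probabilities, used instead of the
    raw sum so that longer answers are not penalized simply for having more tokens."""
    return float(np.mean(token_log_probs))

# Example: per-token log-probs for a four-token answer (illustrative numbers).
lp = [-0.2, -0.9, -0.1, -0.4]
print(sequence_log_prob(lp))           # -1.6
print(length_normalized_log_prob(lp))  # -0.4
```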

Principles of semantic uncertainty

If we naively calculate the predictive entropy directly from the probabilities of the generated sequence of tokens, we conflate the uncertainty of the model over the meaning of its answer with the uncertainty over the exact tokens used to express that meaning. For example, even if the model is confident in the meaning of a generation, there are still usually many different ways for phrasing that generation without changing its meaning. For the purposes of detecting confabulations, the uncertainty of the LLM over meanings is more important than the uncertainty over the exact tokens used to express those meanings.

Our semantic uncertainty method therefore seeks to estimate only the uncertainty the LLM has over the meaning of its generation, not the choice of words. To do this, we introduce an algorithm that clusters model generations by meaning and subsequently calculates semantic uncertainty. At a high level this involves three steps:

Generation: sample output sequences of tokens from the predictive distribution of a LLM given a context x .

Clustering: cluster sequences by their meaning using our clustering algorithm based on bidirectional entailment.

Entropy estimation: estimate semantic entropy by summing probabilities of sequences that share a meaning following equation ( 2 ) and compute their entropy.

Generating a set of answers from the model

Given some context x as input to the LLM, we sample M sequences, { s (1) , …,  s ( M ) } and record their token probabilities, { P ( s (1) ∣ x ), …,  P ( s ( M ) ∣ x )}. We sample all our generations from a single model, varying only the random seed used for sampling from the token probabilities. We do not observe the method to be particularly sensitive to details of the sampling scheme. In our implementation, we sample at temperature 1 using nucleus sampling ( P  = 0.9) (ref. 49 ) and top- K sampling ( K  = 50) (ref. 50 ). We also sample a single generation at low temperature (0.1) as an estimate of the ‘best generation’ of the model to the context, which we use to assess the accuracy of the model. (A lower sampling temperature increases the probability of sampling the most likely tokens).
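A minimal sketch of this sampling set-up using the Hugging Face transformers API; the model checkpoint, prompt and token budget are placeholders, and only the stated generation settings (temperature 1, nucleus sampling p = 0.9, top-k = 50, plus one low-temperature ‘best’ answer) come from the text:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-Instruct-v0.1"  # placeholder; any of the studied chat models works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")

prompt = ("Answer the following question in a single brief but complete sentence. "
          "Question: What is the capital of France? Answer:")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# M high-temperature samples used for the semantic entropy estimate
# (temperature 1, nucleus sampling p = 0.9, top-k = 50).
samples = model.generate(**inputs, do_sample=True, temperature=1.0, top_p=0.9, top_k=50,
                         num_return_sequences=10, max_new_tokens=64)

# One low-temperature generation treated as the model's 'best' answer for accuracy scoring.
best = model.generate(**inputs, do_sample=True, temperature=0.1, max_new_tokens=64)

prompt_len = inputs["input_ids"].shape[1]
answers = tokenizer.batch_decode(samples[:, prompt_len:], skip_special_tokens=True)
best_answer = tokenizer.decode(best[0, prompt_len:], skip_special_tokens=True)
# Token log-probabilities (needed for the non-discrete estimator) can be recovered by
# generating with return_dict_in_generate=True, output_scores=True and calling
# model.compute_transition_scores(...).
```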

Clustering by semantic equivalence

To estimate semantic entropy we need to cluster generated outputs from the model into groups of outputs that mean the same thing as each other.

This can be described using ‘semantic equivalence’ which is the relation that holds between two sentences when they mean the same thing. We can formalize semantic equivalence mathematically. Let the space of tokens in a language be \({\mathcal{T}}\) . The space of all possible sequences of tokens of length N is then \({{\mathcal{S}}}_{N}\equiv {{\mathcal{T}}}^{N}\) . Note that N can be made arbitrarily large to accommodate whatever size of sentence one can imagine and one of the tokens can be a ‘padding’ token which occurs with certainty for each token after the end-of-sequence token. For some sentence \({\bf{s}}\in {{\mathcal{S}}}_{N}\) , composed of a sequence of tokens, \({s}_{i}\in {\mathcal{T}}\) , there is an associated meaning. Theories of meaning are contested 51 . However, for specific models and deployment contexts many considerations can be set aside. Care should be taken comparing very different models and contexts.

Let us introduce a semantic equivalence relation, E (  ⋅  ,  ⋅  ), which holds for any two sentences that mean the same thing—we will operationalize this presently. Recall that an equivalence relation is any reflexive, symmetric and transitive relation and that any equivalence relation on a set corresponds to a set of equivalence classes. Each semantic equivalence class captures outputs that can be considered to express the same meaning. That is, for the space of semantic equivalence classes \({\mathcal{C}}\) the sentences in the set \(c\in {\mathcal{C}}\) can be regarded in many settings as expressing a similar meaning such that \(\forall {\bf{s}},{{\bf{s}}}^{{\prime} }\in c:E({\bf{s}},{{\bf{s}}}^{{\prime} })\) . So we can build up these classes of semantically equivalent sentences by checking if new sentences share a meaning with any sentences we have already clustered and, if so, adding them into that class.

We operationalize E (  ⋅  ,  ⋅  ) using the idea of bidirectional entailment, which has a long history in linguistics 52 and natural language processing 28 , 53 , 54 . A sequence, s , means the same thing as a second sequence, s ′, only if the sequences entail (that is, logically imply) each other. For example, ‘The capital of France is Paris’ entails ‘Paris is the capital of France’ and vice versa because they mean the same thing. (See later for a discussion of soft equivalence and cases in which bidirectional entailment does not guarantee equivalent meanings).

Importantly, we require that the sequences mean the same thing with respect to the context—key meaning is sometimes contained in the context. For example, ‘Paris’ does not entail ‘The capital of France is Paris’ because ‘Paris’ is not a declarative sentence without context. But in the context of the question ‘What is the capital of France?’, the one-word answer does entail the longer answer.

Detecting entailment has been the object of study of a great deal of research in NLI 55 . We rely on language models to predict entailment, such as DeBERTa-Large-MNLI 56 , which has been trained to predict entailment, or general-purpose LLMs such as GPT-3.5 (ref. 57 ), which can predict entailment given suitable prompts.

We then cluster sentences according to whether they bidirectionally entail each other using the algorithm presented in Extended Data Fig. 1 . Note that, to check if a sequence should be added to an existing cluster, it is sufficient to check if the sequence bidirectionally entails any of the existing sequences in that cluster (we arbitrarily pick the first one), given the transitivity of semantic equivalence. If a sequence does not share meaning with any existing cluster, we assign it its own cluster.
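A minimal sketch of this greedy clustering loop; `entails` stands in for any of the entailment predictors discussed below (a prompted LLM or an NLI model) and its exact interface is an assumption of the sketch rather than a fixed API from the paper:

```python
def cluster_by_meaning(question, answers, entails):
    """Greedy bidirectional-entailment clustering: each cluster is a list of
    answers judged to share a meaning in the context of the question."""
    clusters = []
    for answer in answers:
        placed = False
        for cluster in clusters:
            rep = cluster[0]  # compare against the first member only (transitivity)
            if entails(question, answer, rep) and entails(question, rep, answer):
                cluster.append(answer)
                placed = True
                break
        if not placed:
            clusters.append([answer])  # no shared meaning found: start a new cluster
    return clusters

# Toy usage with a trivial "entailment" check (case-insensitive exact match);
# a real system would plug in the DeBERTa-MNLI or GPT-3.5 predictors discussed below.
print(cluster_by_meaning("What is the capital of France?", ["Paris", "paris", "Lyon"],
                         lambda q, a, b: a.lower() == b.lower()))
# [['Paris', 'paris'], ['Lyon']]
```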

Computing the semantic entropy

Having determined the classes of generated sequences that mean the same thing, we can estimate the likelihood that a sequence generated by the LLM belongs to a given class by computing the sum of the probabilities of all the possible sequences of tokens which can be considered to express the same meaning as

\(P(c| {\boldsymbol{x}})={\sum }_{{\bf{s}}\in c}P({\bf{s}}| {\boldsymbol{x}})={\sum }_{{\bf{s}}\in c}{\prod }_{i}P({s}_{i}| {{\bf{s}}}_{ < i},{\boldsymbol{x}}).\)

Formally, this treats the output as a random variable whose event-space is the space of all possible meaning-classes, C , a sub- σ -algebra of the standard event-space S . We can then estimate the semantic entropy (SE) as the entropy over the meaning-distribution,

\({\rm{SE}}({\boldsymbol{x}})=-{\sum }_{c}P(c| {\boldsymbol{x}})\log P(c| {\boldsymbol{x}}).\)

There is a complication which prevents direct computation: we do not have access to every possible meaning-class c . Instead, we can only sample c from the sequence-generating distribution induced by the model. To handle this, we estimate the expectation in equation ( 3 ) using a Rao–Blackwellized Monte Carlo integration over the semantic equivalence classes C ,

\({\rm{SE}}({\boldsymbol{x}})\approx -{\sum }_{i=1}^{| C| }P({C}_{i}| {\boldsymbol{x}})\log P({C}_{i}| {\boldsymbol{x}}),\)

where \(P({C}_{i}| {\boldsymbol{x}})=\frac{P({c}_{i}| {\boldsymbol{x}})}{{\sum }_{c}P(c| {\boldsymbol{x}})}\) estimates a categorical distribution over the cluster meanings, that is, ∑ i P ( C i ∣ x ) = 1. Without this normalization step cluster ‘probabilities’ could exceed one because of length normalization, resulting in degeneracies. Equation ( 5 ) is the estimator giving our main method that we refer to as semantic entropy throughout the text.

For scenarios in which the sequence probabilities are not available, we propose a variant of semantic entropy which we call ‘discrete’ semantic entropy. Discrete semantic entropy approximates P ( C i ∣ x ) directly from the number of generations in each cluster, disregarding the token probabilities. That is, we approximate P ( C i ∣ x ) as \(\frac{1}{M}{\sum }_{m=1}^{M}{I}_{{c}^{(m)}={C}_{i}}\) , the proportion of all the sampled answers which belong to that cluster. Effectively, this just assumes that each output that was actually generated was equally probable—estimating the underlying distribution as the categorical empirical distribution. In the limit of large M , the estimator converges to equation ( 5 ) by the law of large numbers. We find that discrete semantic entropy results in similar performance empirically.
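A minimal sketch of both estimators, assuming the per-sequence length-normalized log-probabilities have already been grouped by the clustering step; entropies are in nats and all numbers in the example are illustrative:

```python
import numpy as np

def log_sum_exp(log_probs):
    """Numerically stable log of a sum of probabilities given in log space."""
    m = np.max(log_probs)
    return m + np.log(np.sum(np.exp(np.asarray(log_probs) - m)))

def semantic_entropy(clustered_log_probs):
    """Rao-Blackwellized estimate: sum (length-normalized) sequence probabilities
    within each cluster, normalize across clusters so the cluster probabilities
    form a categorical distribution, and return its entropy."""
    cluster_log_p = np.array([log_sum_exp(lps) for lps in clustered_log_probs])
    p = np.exp(cluster_log_p - log_sum_exp(cluster_log_p))  # sum_i P(C_i | x) = 1
    return float(-np.sum(p * np.log(p)))

def discrete_semantic_entropy(cluster_sizes):
    """Discrete variant: P(C_i | x) is approximated by the fraction of sampled
    answers falling in cluster i, ignoring token probabilities."""
    p = np.asarray(cluster_sizes, dtype=float)
    p = p / p.sum()
    return float(-np.sum(p * np.log(p)))

# Toy example: three meaning clusters with illustrative log-probabilities / counts.
print(semantic_entropy([[-0.4, -0.6], [-1.5], [-2.2, -2.0]]))
print(discrete_semantic_entropy([6, 3, 1]))  # ~0.898 nats
```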

We provide a worked example of the computation of semantic entropy in Supplementary Note  1 .

Semantic entropy is designed to detect confabulations, that is, model outputs with arbitrary meaning. In our experiments, we use semantic uncertainty to predict model accuracy, demonstrating that confabulations make up a notable fraction of model mistakes. We further show that semantic uncertainty can be used to improve model accuracy by refusing to answer questions when semantic uncertainty is high. Last, semantic uncertainty can be used to give users a way to know when model generations are probably unreliable.

We use the datasets BioASQ 34 , SQuAD 33 , TriviaQA 32 , SVAMP 37 and NQ-Open 35 . BioASQ is a life-sciences question-answering dataset based on the annual challenge of the same name. The specific dataset we use is based on the QA dataset from Task B of the 2023 BioASQ challenge (11B). SQuAD is a reading comprehension dataset whose context passages are drawn from Wikipedia and for which the answers to questions can be found in these passages. We use SQuAD 1.1 which excludes the unanswerable questions added in v.2.0 that are deliberately constructed to induce mistakes so they do not in practice cause confabulations to occur. TriviaQA is a trivia question-answering dataset. SVAMP is a word-problem maths dataset containing elementary-school mathematical reasoning tasks. NQ-Open is a dataset of realistic questions aggregated from Google Search which have been chosen to be answerable without reference to a source text. For each dataset, we use 400 train examples and 400 test examples randomly sampled from the original larger dataset. Note that only some of the methods require training, for example semantic entropy does not use the training data. If the datasets themselves are already split into train and test (or validation) samples, we sample our examples from within the corresponding split.

All these datasets are free-form, rather than multiple choice, because this better captures the opportunities created by LLMs to produce free-form sentences as answers. We refer to this default scenario as our ‘sentence-length’ experiments. In Supplementary Note  7 , we also present results for confabulation detection in a ‘short-phrase’ scenario, in which we constrain model answers on these datasets to be as concise as possible.

To make the problems more difficult and induce confabulations, we do not provide the context passages for any of the datasets. When the context passages are provided, the accuracy rate is too high for these datasets for the latest generations of models to meaningfully study confabulations.

For sentence-length generations we use: Falcon 39 Instruct (7B and 40B), LLaMA 2 Chat 38 (7B, 13B and 70B) and Mistral 40 Instruct (7B).

In addition to reporting results for semantic entropy, discrete semantic entropy and naive entropy, we consider two strong baselines.

Embedding regression is a supervised baseline inspired by the P (IK) method 24 . In that paper, the authors fine-tune their proprietary LLM on a dataset of questions to predict whether the model would have been correct. This requires access to a dataset of ground-truth answers to the questions. Rather than fine-tuning the entire LLM in this way, we simply take the final hidden units and train a logistic regression classifier to make the same prediction. By contrast to their method, this is much simpler because it does not require fine-tuning the entire language model, as well as being more reproducible because the solution to the logistic regression optimization problem is not as seed-dependent as the fine-tuning procedure. As expected, this supervised approach performs well in-distribution but fails when the distribution of questions is different from that on which the classifier is trained.

The second baseline we consider is the P (True) method 24 , in which the model first samples M answers (identically to our semantic entropy approach) and then is prompted with the list of all answers generated followed by the highest probability answer and a question whether this answer is “(a) True” or “(b) False”. The confidence score is then taken to be the probability with which the LLM responds with ‘a’ to the multiple-choice question. The performance of this method is boosted with a few-shot prompt, in which up to 20 examples from the training set are randomly chosen, filled in as above, but then provided with the actual ground truth of whether the proposed answer was true or false. In this way, the method can be considered as supervised ‘in-context’ because it makes use of some ground-truth training labels but can be used without retraining the model. Because of context-size constraints, this method cannot fit a full 20 few-shot examples in the context when input questions are long or large numbers of generations are used. As a result, we sometimes have to reduce the number of few-shot examples to suit the context size and we note this in the  Supplementary Material .

Entailment estimator

Any NLI classification system could be used for our bidirectional entailment clustering algorithm. We consider two different kinds of entailment detector.

One option is to use an instruction-tuned LLM such as LLaMA 2, GPT-3.5 (Turbo 1106) or GPT-4 to predict entailment between generations. We use the following prompt:

We are evaluating answers to the question {question} Here are two possible answers: Possible Answer 1: {text1} Possible Answer 2: {text2} Does Possible Answer 1 semantically entail Possible Answer 2? Respond with entailment, contradiction, or neutral.

Alternatively, we consider using a language model trained for entailment prediction, specifically the DeBERTa-large model 56 fine-tuned on the NLI dataset MNLI 58 . This builds on past work towards paraphrase identification based on embedding similarity 59 , 60 and BERT-style models 61 , 62 . We template more simply, checking if DeBERTa predicts entailment between the concatenation of the question and one answer and the concatenation of the question and another answer. Note that DeBERTa-large is a relatively lightweight model with only 1.5B parameters which is much less powerful than most of the LLMs under study.
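A minimal sketch of such a DeBERTa-based check using the Hugging Face microsoft/deberta-large-mnli checkpoint; the question-plus-answer concatenation mirrors the templating described above, and reading off the top predicted label is an assumption of the sketch:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

nli_name = "microsoft/deberta-large-mnli"
nli_tok = AutoTokenizer.from_pretrained(nli_name)
nli_model = AutoModelForSequenceClassification.from_pretrained(nli_name)

def entails(question, answer_a, answer_b):
    """True if 'question + answer_a' is predicted to entail 'question + answer_b'."""
    premise = f"{question} {answer_a}"
    hypothesis = f"{question} {answer_b}"
    inputs = nli_tok(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = nli_model(**inputs).logits
    # Read the label name from the model config (contradiction / neutral / entailment).
    label = nli_model.config.id2label[int(logits.argmax(dim=-1))]
    return label.upper().startswith("ENTAIL")
```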

In Supplementary Note 2 , we carefully evaluate the benefits and drawbacks of these methods for entailment prediction. We settle on using GPT-3.5 with the above prompt, as its entailment predictions agree well with human raters and lead to good confabulation detection performance.

In Supplementary Note  3 , we provide a discussion of the computational cost and choosing the number of generations for reliable clustering.

Prompting templates

We use a simple generation template for all sentence-length answer datasets:

Answer the following question in a single brief but complete sentence. Question: {question} Answer:

Metrics and accuracy measurements

We use three main metrics to evaluate our method: AUROC, rejection accuracy and AURAC. Each of these is grounded in an automated factuality estimation measurement relative to the reference answers provided by the datasets that we use.

AUROC, rejection accuracy and AURAC

First, we use the AUROC curve, which measures the reliability of a classifier accounting for both precision and recall. The AUROC can be interpreted as the probability that a randomly chosen correct answer has been assigned a higher confidence score than a randomly chosen incorrect answer. For a perfect classifier, this is 1.

Second, we compute the ‘rejection accuracy at X %’, which is the question-answering accuracy of the model on the most-confident X % of the inputs as identified by the respective uncertainty method. If an uncertainty method works well, predictions on the confident subset should be more accurate than predictions on the excluded subset and the rejection accuracy should increase as we reject more inputs.

To summarize this statistic, we compute the AURAC—the total area enclosed by the accuracies at all cut-off percentages X %. This should increase towards 1 as a given uncertainty method becomes more accurate and better at detecting likely-inaccurate responses, but it is more sensitive to the overall accuracy of the model than the AUROC metric.
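A minimal sketch of the three metrics, assuming per-question binary correctness labels and confabulation scores (higher meaning more likely to confabulate); the cut-off grid used to approximate the area is an assumption of the sketch:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def auroc_for_errors(correct, scores):
    """AUROC for the binary event 'answer is incorrect', using the
    confabulation score as the classifier score."""
    incorrect = 1 - np.asarray(correct)
    return roc_auc_score(incorrect, scores)

def rejection_accuracy(correct, scores, keep_fraction):
    """Accuracy on the keep_fraction of questions with the lowest scores,
    i.e. after refusing to answer the most uncertain ones."""
    correct = np.asarray(correct, dtype=float)
    order = np.argsort(scores)  # most confident (lowest score) first
    n_keep = max(1, int(round(keep_fraction * len(correct))))
    return correct[order[:n_keep]].mean()

def aurac(correct, scores, grid=np.linspace(0.05, 1.0, 20)):
    """Approximate area under the rejection-accuracy curve by averaging over cut-offs."""
    return float(np.mean([rejection_accuracy(correct, scores, f) for f in grid]))

# Toy example (illustrative numbers only).
correct = [1, 1, 0, 1, 0, 1, 1, 0]
scores = [0.2, 0.1, 0.9, 0.3, 0.7, 0.2, 0.4, 0.8]
print(auroc_for_errors(correct, scores), aurac(correct, scores))
```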

In Supplementary Note  5 , we provide the unaggregated rejection accuracies for sentence-length generations.

Assessing accuracy

For the short-phrase-length generation setting presented in Supplementary Note  7 , we simply assess the accuracy of the generations by checking if the F1 score of the commonly used SQuAD metric exceeds 0.5. There are limitations to such simple scoring rules 63 but this method is widely used in practice and its error is comparatively small on these standard datasets.
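A simplified sketch of this token-overlap F1 check; note that the official SQuAD script additionally normalizes articles and punctuation before comparing:

```python
from collections import Counter

def squad_f1(prediction, reference):
    """Token-overlap F1 between a predicted and a reference answer (simplified)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

def is_correct_short_phrase(prediction, reference, threshold=0.5):
    """An answer counts as correct when its F1 against the reference exceeds 0.5."""
    return squad_f1(prediction, reference) > threshold

print(squad_f1("the Eiffel Tower", "Eiffel Tower"))  # 0.8
```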

For our default scenario, the longer sentence-length generations, this measure fails, as the overlap between the short reference answer and our long model answer is invariably too small. For sentence-length generations, we therefore automatically determine whether an answer to the question is correct or incorrect by using GPT-4 to compare the given answer to the reference answer. We use the template:

We are assessing the quality of answers to the following question: {question} The expected answer is: {reference answer} The proposed answer is: {predicted answer} Within the context of the question, does the proposed answer mean the same as the expected answer? Respond only with yes or no.

We make a small modification for datasets with several reference answers: line two becomes “The following are expected answers to this question:” and the final line asks “does the proposed answer mean the same as any of the expected answers?”.

In Supplementary Note 6 , we check the quality of our automated ground-truth evaluations against human judgement by hand. We find that GPT-4 gives the best results for determining model accuracy and thus use it in all our sentence-length experiments.

In this section we describe the application of semantic entropy to confabulation detection in longer model generations, specifically paragraph-length biographies.

We introduce a biography-generation dataset—FactualBio—available alongside this paper. FactualBio is a collection of biographies of individuals who are notable enough to have Wikipedia pages but not notable enough to have large amounts of detailed coverage, generated by GPT-4 (v.0613). To generate the dataset, we randomly sampled 21 individuals from the WikiBio dataset 64 . For each biography, we generated a list of factual claims contained in each biography using GPT-4, with 150 total factual claims (the total number is only coincidentally a round number). For each of these factual claims, we manually determined whether the claim was correct or incorrect. Out of 150 claims, 45 were incorrect. As before, we apply confabulation detection to detect incorrect model predictions, even though there may be model errors which are not confabulations.

Prompting and generation

Given a paragraph-length piece of LLM-generated text, we apply the following sequence of steps:

Automatically decompose the paragraph into specific factual claims using an LLM (not necessarily the same as the original).

For each factual claim, use an LLM to automatically construct Q questions which might have produced that claim.

For each question, prompt the original LLM to generate M answers.

For each question, compute the semantic entropy of the answers, including the original factual claim.

Average the semantic entropies over the questions to arrive at a score for the original factual claim.

We pursue this slightly indirect way of generating answers because we find that simply resampling each sentence creates variation unrelated to the uncertainty of the model about the factual claim, such as differences in paragraph structure.
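A minimal sketch of the per-factoid procedure, reusing the clustering and discrete-entropy helpers sketched earlier; `llm` stands in for any text-in/text-out model call, and the prompts here are paraphrased rather than the exact templates reproduced below:

```python
def score_factoid(llm, entails, text_so_far, factoid, n_questions=6, n_answers=3):
    # 1. Ask an LLM for questions that could have produced this factual claim.
    q_prompt = (f"Following this text: {text_so_far}\n"
                f"You see the sentence: {factoid}\n"
                f"List {n_questions} short questions that this sentence answers, one per line.")
    questions = [q.strip() for q in llm(q_prompt).splitlines() if q.strip()][:n_questions]
    if not questions:
        return float("nan")  # nothing to score

    entropies = []
    for question in questions:
        # 2. Regenerate several short answers to each question with the original LLM.
        answers = [llm(f"Answer as briefly as possible: {question}") for _ in range(n_answers)]
        answers.append(factoid)  # always include the original claim
        # 3. Cluster by meaning and compute the discrete semantic entropy.
        clusters = cluster_by_meaning(question, answers, entails)
        entropies.append(discrete_semantic_entropy([len(c) for c in clusters]))

    # 4. Average over questions: a high score flags a likely confabulated claim.
    return sum(entropies) / len(entropies)
```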

We decompose the paragraph into factual claims using the following prompt:

Please list the specific factual propositions included in the answer above. Be complete and do not leave any factual claims out. Provide each claim as a separate sentence in a separate bullet point.

We found that we agreed with the decompositions in all cases in the dataset.

We then generate six questions for each of the facts from the decomposition. We generate these questions by prompting the model twice with the following:

Following this text: {text so far} You see the sentence: {proposition} Generate a list of three questions, that might have generated the sentence in the context of the preceding original text, as well as their answers. Please do not use specific facts that appear in the follow-up sentence when formulating the question. Make the questions and answers diverse. Avoid yes-no questions. The answers should not be a full sentence and as short as possible, e.g. only a name, place, or thing. Use the format “1. {question} – {answer}”.

These questions are not necessarily well-targeted and the difficulty of this step is the main source of errors in the procedure. We generate three questions with each prompt, as this encourages diversity of the questions, each question targeting a different aspect of the fact. However, we observed that the generated questions will sometimes miss obvious aspects of the fact. Executing the above prompt twice (for a total of six questions) can improve coverage. We also ask for brief answers because the current version of GPT-4 tends to give long, convoluted and highly hedged answers unless explicitly told not to.

Then, for each question, we generate three new answers using the following prompt:

We are writing an answer to the question “{user question}”. So far we have written: {text so far} The next sentence should be the answer to the following question: {question} Please answer this question. Do not answer in a full sentence. Answer with as few words as possible, e.g. only a name, place, or thing.

We then compute the semantic entropy over these answers plus the original factual claim. Including the original fact ensures that the estimator remains grounded in the original claim and helps detect situations in which the question has been interpreted completely differently from the original context. We make a small modification to handle the fact that GPT-4 generations often include refusals to answer questions. These refusals were not something we commonly observe in our experiments with LLaMA 2, Falcon or Mistral models. If more than half of the answers include one of the strings ‘not available’, ‘not provided’, ‘unknown’ or ‘unclear’ then we treat the semantic uncertainty as maximal.
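A minimal sketch of this refusal heuristic; the marker strings come from the text, while the value used for ‘maximal’ uncertainty is left to the caller:

```python
REFUSAL_MARKERS = ("not available", "not provided", "unknown", "unclear")

def adjusted_entropy(answers, entropy, max_entropy):
    """Return max_entropy when more than half of the answers look like refusals."""
    refusals = sum(any(m in a.lower() for m in REFUSAL_MARKERS) for a in answers)
    return max_entropy if refusals > len(answers) / 2 else entropy
```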

We then average the semantic entropies for each question corresponding to the factual claim to get an entropy for this factual claim.

Despite the extra assumptions and complexity, we find that this method greatly outperforms the baselines.

To compute semantic entailment between the original claim and regenerated answers, we rely on the DeBERTa entailment prediction model as we find empirically that DeBERTa predictions result in higher train-set AUROC than other methods. Because DeBERTa has slightly lower recall than GPT-3.5/4, we use a modified set-up for which we say the answers mean the same as each other if at least one of them entails the other and neither is seen to contradict the other—a kind of ‘non-defeating’ bidirectional entailment check rather than true bidirectional entailment. The good performance of DeBERTa in this scenario is not surprising as both factual claims and regenerated answers are relatively short. We refer to Supplementary Notes 2 and 3 for ablations and experiments regarding our choice of entailment estimator for paragraph-length generations.
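A minimal sketch of this relaxed equivalence check; `nli` stands in for a DeBERTa-MNLI call returning one of the strings 'entailment', 'neutral' or 'contradiction':

```python
def same_meaning(a, b, nli):
    """Relaxed ('non-defeating') check: two short texts are treated as equivalent
    if at least one direction is entailment and neither direction is contradiction."""
    forward, backward = nli(a, b), nli(b, a)
    if "contradiction" in (forward, backward):
        return False
    return "entailment" in (forward, backward)
```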

We implement two baselines. First, we implement a variant of the P (True) method, which is adapted to the new setting. For each factoid, we generate a question with answers in the same way as for semantic entropy. We then use the following prompt:

Question: {question} Here are some brainstormed ideas: {list of regenerated answers} Possible answer: {original answer} Is the possible answer true? Respond with “yes” or “no”.

As we cannot access the probabilities GPT-4 assigns to predicting ‘yes’ and ‘no’ as the next token, we approximate this using Monte Carlo samples. Concretely, we execute the above prompt ten times (at temperature 1) and then take the fraction of answers which was ‘yes’ as our unbiased Monte Carlo estimate of the token probability GPT-4 assigns to ‘yes’.
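A minimal sketch of this Monte Carlo approximation; `llm` stands in for a GPT-4 chat call sampled at temperature 1, and the prompt follows the template above:

```python
def p_true_monte_carlo(llm, question, brainstormed, proposed, n_samples=10):
    """Approximate the probability of 'yes' by the fraction of 'yes' responses
    across repeated samples of the same prompt."""
    prompt = (f"Question: {question}\n"
              f"Here are some brainstormed ideas: {', '.join(brainstormed)}\n"
              f"Possible answer: {proposed}\n"
              'Is the possible answer true? Respond with "yes" or "no".')
    votes = [llm(prompt).strip().lower().startswith("yes") for _ in range(n_samples)]
    return sum(votes) / n_samples
```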

As a second, simpler, baseline we check if the model thinks the answer is true. We simply ask:

Following this text: {text so far} You see this statement: {proposition} Is it likely that the statement is true? Respond with ‘yes’ or ‘no’.

It is interesting that this method ought to perform very well if we think that the model has good ‘self-knowledge’ (that is, if “models mostly know what they don’t know” 24 ) but in fact semantic entropy is much better at detecting confabulations.

Data availability

The data used for the short-phrase and sentence-length generations are publicly available and the released code details how to access it. We release a public version of the FactualBio dataset as part of the code base for reproducing the paragraph-length experiments.

Code availability

We release all code used to produce the main experiments. The code for short-phrase and sentence-length experiments can be found at github.com/jlko/semantic_uncertainty and https://doi.org/10.5281/zenodo.10964366 (ref. 65 ). The code for paragraph-length experiments can be found at github.com/jlko/long_hallucinations and https://doi.org/10.5281/zenodo.10964366 (ref. 65 ).

OpenAI. GPT-4 technical report. Preprint at https://arxiv.org/abs/2303.08774 (2023).

Gemini Team, Google. Gemini: a family of highly capable multimodal models. Preprint at https://arxiv.org/abs/2312.11805 (2023).

Xiao, Y. & Wang, W. Y. On hallucination and predictive uncertainty in conditional language generation. In Proc. 16th Conference of the European Chapter of the Association for Computational Linguistics 2734–2744 (Association for Computational Linguistics, 2021).

Rohrbach, A., Hendricks, L. A., Burns, K., Darrell, T. & Saenko, K. Object hallucination in image captioning. In Proc. 2018 Conference on Empirical Methods in Natural Language Processing (eds Riloff, E., Chiang, D., Hockenmaier, J. & Tsujii, J.) 4035–4045 (Association for Computational Linguistics, 2018).

Weiser, B. Lawyer who used ChatGPT faces penalty for made up citations. The New York Times (8 Jun 2023).

Opdahl, A. L. et al. Trustworthy journalism through AI. Data Knowl. Eng . 146 , 102182 (2023).

Shen, Y. et al. ChatGPT and other large language models are double-edged swords. Radiology 307 , e230163 (2023).


Schulman, J. Reinforcement learning from human feedback: progress and challenges. Presented at the Berkeley EECS Colloquium. YouTube www.youtube.com/watch?v=hhiLw5Q_UFg (2023).

Ji, Z. et al. Survey of hallucination in natural language generation. ACM Comput. Surv. 55 , 248 (2023).

Maynez, J., Narayan, S., Bohnet, B. & McDonald, R. On faithfulness and factuality in abstractive summarization. In Proc. 58th Annual Meeting of the Association for Computational Linguistics (eds Jurafsky, D., Chai, J., Schluter, N. & Tetreault, J.) 1906–1919 (Association for Computational Linguistics, 2020).

Filippova, K. Controlled hallucinations: learning to generate faithfully from noisy data. In Findings of the Association for Computational Linguistics: EMNLP 2020 (eds Webber, B., Cohn, T., He, Y. & Liu, Y.) 864–870 (Association for Computational Linguistics, 2020).

Berrios, G. Confabulations: a conceptual history. J. Hist. Neurosci. 7 , 225–241 (1998).


Lin, S., Hilton, J. & Evans, O. Teaching models to express their uncertainty in words. Transact. Mach. Learn. Res. (2022).

Evans, O. et al. Truthful AI: developing and governing AI that does not lie. Preprint at https://arxiv.org/abs/2110.06674 (2021).

Amodei, D. et al. Concrete problems in AI safety. Preprint at https://arxiv.org/abs/1606.06565 (2016).

Jiang, Z., Araki, J., Ding, H. & Neubig, G. How can we know when language models know? On the calibration of language models for question answering. Transact. Assoc. Comput. Linguist. 9 , 962–977 (2021).


Desai, S. & Durrett, G. Calibration of pre-trained transformers. In Proc. 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (eds Webber, B., Cohn, T., He, Y. & Liu, Y.) 295–302 (Association for Computational Linguistics, 2020).

Glushkova, T., Zerva, C., Rei, R. & Martins, A. F. Uncertainty-aware machine translation evaluation. In Findings of the Association for Computational Linguistics: EMNLP 2021 (eds Moens, M-F., Huang, X., Specia, L. & Yih, S.) 3920–3938 (Association for Computational Linguistics, 2021).

Wang, Y., Beck, D., Baldwin, T. & Verspoor, K. Uncertainty estimation and reduction of pre-trained models for text regression. Transact. Assoc. Comput. Linguist. 10 , 680–696 (2022).

Baker, S. & Kanade, T. Hallucinating faces. In Proc. Fourth IEEE International Conference on Automatic Face and Gesture Recognition . 83–88 (IEEE, Catalogue no PR00580, 2002).

Eliot, L. AI ethics lucidly questioning this whole hallucinating AI popularized trend that has got to stop. Forbes Magazine (24 August 2022).

Shanahan, M. Talking about large language models. Commun. Assoc. Comp. Machinery 67 , 68–79 (2024).

MacKay, D. J. C. Information-based objective functions for active data selection. Neural Comput. 4 , 590–604 (1992).

Kadavath, S. et al. Language models (mostly) know what they know. Preprint at https://arxiv.org/abs/2207.05221 (2022).

Lindley, D. V. On a measure of the information provided by an experiment. Ann. Math. Stat. 27 , 986–1005 (1956).


Xiao, T. Z., Gomez, A. N. & Gal, Y. Wat zei je? Detecting out-of-distribution translations with variational transformers. In Workshop on Bayesian Deep Learning at the Conference on Neural Information Processing Systems (NeurIPS, Vancouver, 2019).

Christiano, P., Cotra, A. & Xu, M. Eliciting Latent Knowledge (Alignment Research Center, 2021); https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8/edit .

Negri, M., Bentivogli, L., Mehdad, Y., Giampiccolo, D. & Marchetti, A. Divide and conquer: crowdsourcing the creation of cross-lingual textual entailment corpora. In Proc. 2011 Conference on Empirical Methods in Natural Language Processing 670–679 (Association for Computational Linguistics, 2011).

Honovich, O. et al. TRUE: Re-evaluating factual consistency evaluation. In Proc. Second DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering 161–175 (Association for Computational Linguistics, 2022).

Falke, T., Ribeiro, L. F. R., Utama, P. A., Dagan, I. & Gurevych, I. Ranking generated summaries by correctness: an interesting but challenging application for natural language inference. In Proc. 57th Annual Meeting of the Association for Computational Linguistics 2214–2220 (Association for Computational Linguistics, 2019).

Laban, P., Schnabel, T., Bennett, P. N. & Hearst, M. A. SummaC: re-visiting NLI-based models for inconsistency detection in summarization. Trans. Assoc. Comput. Linguist. 10 , 163–177 (2022).

Joshi, M., Choi, E., Weld, D. S. & Zettlemoyer, L. TriviaQA: a large scale distantly supervised challenge dataset for reading comprehension. In Proc. 55th Annual Meeting of the Association for Computational Linguistics 1601–1611 (Association for Computational Linguistics, 2017).

Rajpurkar, P., Zhang, J., Lopyrev, K. & Liang, P. SQuAD: 100,000+ questions for machine compression of text. In Proc. 2016 Conference on Empirical Methods in Natural Language Processing (eds Su, J., Duh, K. & Carreras, X.) 2383–2392 (Association for Computational Linguistics, 2016).

Tsatsaronis, G. et al. An overview of the BIOASQ large-scale biomedical semantic indexing and question answering competition. BMC Bioinformatics 16 , 138 (2015).

Article   PubMed   PubMed Central   Google Scholar  

Lee, K., Chang, M.-W. & Toutanova, K. Latent retrieval for weakly supervised open domain question answering. In Proc. 57th Annual Meeting of the Association for Computational Linguistics 6086–6096 (Association for Computational Linguistics, 2019).

Kwiatkowski, T. et al. Natural questions: a benchmark for question answering research. Transact. Assoc. Comput. Linguist. 7 , 452–466 (2019).

Patel, A., Bhattamishra, S. & Goyal, N. Are NLP models really able to solve simple math word problems? In Proc. 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (eds Toutanova, K. et al.) 2080–2094 (Assoc. Comp. Linguistics, 2021).

Touvron, H. et al. Llama 2: open foundation and fine-tuned chat models. Preprint at https://arxiv.org/abs/2307.09288 (2023).

Penedo, G. et al. The RefinedWeb dataset for Falcon LLM: outperforming curated corpora with web data, and web data only. In Proc. 36th Conference on Neural Information Processing Systems (eds Oh, A. et al.) 79155–79172 (Curran Associates, 2023)

Jiang, A. Q. et al. Mistral 7B. Preprint at https://arxiv.org/abs/2310.06825 (2023).

Manakul, P., Liusie, A. & Gales, M. J. F. SelfCheckGPT: Zero-Resource Black-Box hallucination detection for generative large language models. In Findings of the Association for Computational Linguistics: EMNLP 2023 (eds Bouamor, H., Pino, J. & Bali, K.) 9004–9017 (Assoc. Comp. Linguistics, 2023).

Mukhoti, J., Kirsch, A., van Amersfoort, J., Torr, P. H. & Gal, Y. Deep deterministic uncertainty: a new simple baseline. In IEEE/CVF Conference on Computer Vision and Pattern Recognition 24384–24394 (Computer Vision Foundation, 2023).

Schuster, T., Chen, S., Buthpitiya, S., Fabrikant, A. & Metzler, D. Stretching sentence-pair NLI models to reason over long documents and clusters. In Findings of the Association for Computational Linguistics: EMNLP 2022 (eds Goldberg, Y. et al.) 394–412 (Association for Computational Linguistics, 2022).

Barnes, B. & Christiano, P. Progress on AI Safety via Debate. AI Alignment Forum www.alignmentforum.org/posts/Br4xDbYu4Frwrb64a/writeup-progress-on-ai-safety-via-debate-1 (2020).

Irving, G., Christiano, P. & Amodei, D. AI safety via debate. Preprint at https://arxiv.org/abs/1805.00899 (2018).

Der Kiureghian, A. & Ditlevsen, O. Aleatory or epistemic? Does it matter? Struct. Saf. 31 , 105–112 (2009).

Malinin, A. & Gales, M. Uncertainty estimation in autoregressive structured prediction. In Proceedings of the International Conference on Learning Representations https://openreview.net/forum?id=jN5y-zb5Q7m (2021).

Murray, K. & Chiang, D. Correcting length bias in neural machine translation. In Proc. Third Conference on Machine Translation (eds Bojar, O. et al.) 212–223 (Assoc. Comp. Linguistics, 2018).

Holtzman, A., Buys, J., Du, L., Forbes, M. & Choi, Y. The curious case of neural text degeneration. In Proceedings of the International Conference on Learning Representations https://openreview.net/forum?id=rygGQyrFvH (2020).

Fan, A., Lewis, M. & Dauphin, Y. Hierarchical neural story generation. In Proc. 56th Annual Meeting of the Association for Computational Linguistics (eds Gurevych, I. & Miyao, Y.) 889–898 (Association for Computational Linguistics, 2018).

Speaks, J. in The Stanford Encyclopedia of Philosophy (ed. Zalta, E. N.) (Metaphysics Research Lab, Stanford Univ., 2021).

Culicover, P. W. Paraphrase generation and information retrieval from stored text. Mech. Transl. Comput. Linguist. 11 , 78–88 (1968).

Google Scholar  

Padó, S., Cer, D., Galley, M., Jurafsky, D. & Manning, C. D. Measuring machine translation quality as semantic equivalence: a metric based on entailment features. Mach. Transl. 23 , 181–193 (2009).

Androutsopoulos, I. & Malakasiotis, P. A survey of paraphrasing and textual entailment methods. J. Artif. Intell. Res. 38 , 135–187 (2010).

MacCartney, B. Natural Language Inference (Stanford Univ., 2009).

He, P., Liu, X., Gao, J. & Chen, W. Deberta: decoding-enhanced BERT with disentangled attention. In International Conference on Learning Representations https://openreview.net/forum?id=XPZIaotutsD (2021).

Brown, T. et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33 , 1877–1901 (2020).

Williams, A., Nangia, N. & Bowman, S. R. A broad-coverage challenge corpus for sentence understanding through inference. In Proc. 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (eds Walker, M. et al.) 1112–1122 (Assoc. Comp. Linguistics, 2018).

Yu, L., Hermann, K. M., Blunsom, P. & Pulman, S. Deep learning for answer sentence selection. Preprint at https://arxiv.org/abs/1412.1632 (2014).

Socher, R., Huang, E., Pennin, J., Manning, C. D. & Ng, A. Dynamic pooling and unfolding recursive autoencoders for paraphrase detection. In Proceedings of the 24th Conference on Neural Information Processing Systems (eds Shawe-Taylor, J. et al.) (2011)

He, R., Ravula, A., Kanagal, B. & Ainslie, J. Realformer: Transformer likes residual attention. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (eds Zhong, C., et al.) 929–943 (Assoc. Comp. Linguistics, 2021).

Tay, Y. et al. Charformer: fast character transformers via gradient-based subword tokenization. In Proceedings of the International Conference on Learning Representations https://openreview.net/forum?id=JtBRnrlOEFN (2022).

Kane, H., Kocyigit, Y., Abdalla, A., Ajanoh, P. & Coulibali, M. Towards neural similarity evaluators. In Workshop on Document Intelligence at the 32nd conference on Neural Information Processing (2019).

Lebret, R., Grangier, D. & Auli, M. Neural text generation from structured data with application to the biography domain. In Proc. 2016 Conference on Empirical Methods in Natural Language Processing (eds Su, J. et al.) 1203–1213 (Association for Computational Linguistics, 2016).

Kossen, J., jlko/semantic_uncertainty: Initial release v.1.0.0. Zenodo https://doi.org/10.5281/zenodo.10964366 (2024).

Download references

Acknowledgements

We thank G. Irving, K. Perlin, J. Richens, L. Rimell and M. Turpin for their comments or discussion related to this work. We thank K. Handa for his help with the human evaluation of our automated accuracy assessment. We thank F. Bickford Smith and L. Melo for their code review. Y.G. is supported by a Turing AI Fellowship funded by the UK government’s Office for AI, through UK Research and Innovation (grant reference EP/V030302/1), and delivered by the Alan Turing Institute.

Author information

These authors contributed equally: Sebastian Farquhar, Jannik Kossen, Lorenz Kuhn

Authors and Affiliations

OATML, Department of Computer Science, University of Oxford, Oxford, UK

Sebastian Farquhar, Jannik Kossen, Lorenz Kuhn & Yarin Gal


Contributions

S.F. led the work from conception to completion and proposed using bidirectional entailment to cluster generations as a way of computing entropy in LLMs. He wrote the main text, most of the Methods and Supplementary Information and prepared most of the figures. J.K. improved the mathematical formalization of semantic entropy; led the extension of semantic entropy to sentence- and paragraph-length generations; wrote the code for, and carried out, all the experiments and evaluations; wrote much of the Methods and Supplementary Information and prepared drafts of many figures; and gave critical feedback on the main text. L.K. developed the initial mathematical formalization of semantic entropy; wrote code for, and carried out, the initial experiments around semantic entropy and its variants which demonstrated the promise of the idea and helped narrow down possible research avenues to explore; and gave critical feedback on the main text. Y.G. ideated the project, proposing the idea to differentiate semantic and syntactic diversity as a tool for detecting hallucinations, provided high-level guidance on the research and gave critical feedback on the main text; he runs the research laboratory in which the work was carried out.

Corresponding author

Correspondence to Sebastian Farquhar.

Ethics declarations

Competing interests

S.F. is currently employed by Google DeepMind and L.K. by OpenAI. For both, this paper was written under their University of Oxford affiliation. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature thanks Mirella Lapata and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Algorithm outline for bidirectional entailment clustering.

Given a set of outputs generated in response to a context, the bidirectional entailment clustering algorithm returns a set of sets of outputs that have been classified as sharing a meaning.
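
For a concrete picture of this clustering step, the following is a minimal sketch, assuming a greedy pass in which each sampled answer is compared against one representative of every existing cluster. The `entails` callable is a hypothetical stand-in for whatever entailment judge is used (for example, an NLI classifier or an LLM prompt) and is not the released code's API.

```python
# Hedged sketch of bidirectional entailment clustering (illustrative only).
from typing import Callable, List


def cluster_by_bidirectional_entailment(
    context: str,
    answers: List[str],
    entails: Callable[[str, str, str], bool],
) -> List[List[str]]:
    """Group sampled answers into sets that share a meaning.

    Two answers are placed in the same cluster when each entails the other
    given the context (bidirectional entailment). Each new answer is compared
    against one representative of every existing cluster; if none matches,
    it starts a new cluster.
    """
    clusters: List[List[str]] = []
    for answer in answers:
        placed = False
        for cluster in clusters:
            representative = cluster[0]
            if entails(context, answer, representative) and entails(
                context, representative, answer
            ):
                cluster.append(answer)
                placed = True
                break
        if not placed:
            clusters.append([answer])  # start a new meaning cluster
    return clusters
```

With the clusters in hand, a discrete estimate of semantic entropy can be formed from the fraction of sampled answers falling in each cluster, as sketched after the Supplementary information description below.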

Supplementary information

Supplementary Information

Supplementary Notes 1–7, Figs. 1–10, Tables 1–4 and references. Includes, worked example for semantic entropy calculation, discussion of limitations and computational cost of entailment clustering, ablation of entailment prediction and clustering methods, discussion of automated accuracy assessment, unaggregated results for sentence-length generations and further results for short-phrase generations.
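
As a rough indication of the quantity that worked example computes, here is a sketch of the discrete semantic-entropy estimator, assuming cluster probabilities are estimated from the fraction of sampled answers assigned to each meaning cluster (the notation is illustrative rather than a verbatim reproduction of the paper's):

```latex
% Discrete semantic entropy for input x over meaning clusters C, with
% p(c | x) estimated as the fraction of sampled answers assigned to cluster c.
\mathrm{SE}(x) \approx -\sum_{c \in C} \hat{p}(c \mid x)\,\log \hat{p}(c \mid x),
\qquad
\hat{p}(c \mid x) = \frac{|c|}{\sum_{c' \in C} |c'|}.
```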

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article

Farquhar, S., Kossen, J., Kuhn, L. et al. Detecting hallucinations in large language models using semantic entropy. Nature 630, 625–630 (2024). https://doi.org/10.1038/s41586-024-07421-0


Received: 17 July 2023

Accepted: 12 April 2024

Published: 19 June 2024

Issue Date: 20 June 2024

DOI: https://doi.org/10.1038/s41586-024-07421-0

