Accounting, accountability, social media and big data: revolution or hype?

Accounting, Auditing & Accountability Journal

ISSN : 0951-3574

Article publication date: 15 May 2017

Purpose

The purpose of this paper is to outline an agenda for researching the relationship between technology-enabled networks – such as social media and big data – and the accounting function. In doing so, it links the contents of an unfolding research area with the papers published in this special issue of Accounting, Auditing and Accountability Journal.

Design/methodology/approach

The paper surveys the existing literature, which is still in its infancy, and proposes ways in which to frame early and future research. The intention is not to offer a comprehensive review, but to stimulate a conversation.

Findings

The authors review several existing studies exploring technology-enabled networks and highlight some of the key aspects of social media and big data, before offering a classification of existing research efforts, as well as opportunities for future research. Three areas of investigation are identified: new performance indicators based on social media and big data; governance of social media and big data information resources; and, finally, social media and big data’s alteration of information and decision-making processes.

Originality/value

The authors are currently experiencing a technological revolution that will fundamentally change the way in which organisations, as well as individuals, operate. It is claimed that many knowledge-based jobs are being automated and others transformed, with data scientists, for example, ready to replace even the most qualified accountants. But, of course, similar claims have been made before and therefore, as academics, the authors are called upon to explore the impact of these technology-enabled networks further. This paper contributes by starting a debate and speculating on the possible research agendas ahead.

Keywords

  • Social media
  • Management control

Acknowledgements

The authors wish to acknowledge the helpful comments of the reviewers. These have greatly improved the quality of the manuscript and the arguments contained within. In addition, the authors would like to thank Gloria Parker and Rainbow Shum for their expert help in liaising with Emerald and working through ScholarOne. The authors are also appreciative of the effort and support of the authors who submitted papers to the special issue and reviewers who devoted their time and effort to the refereeing process. Finally, the authors are immensely grateful to Professor James Guthrie for his practical support, intellectual encouragement, and wise counsel as the authors brought this special issue to fruition.

This paper forms part of the Accounting, Accountability, Social Media and Big Data: Revolution or Hype? Special issue.

Arnaboldi, M. , Busco, C. and Cuganesan, S. (2017), "Accounting, accountability, social media and big data: revolution or hype?", Accounting, Auditing & Accountability Journal , Vol. 30 No. 4, pp. 762-776. https://doi.org/10.1108/AAAJ-03-2017-2880


Copyright © 2017, Emerald Publishing Limited


MURAL - Maynooth University Research Archive Library


Big Data – Hype or Revolution?

Kitchin, Rob (2016) Big Data – Hype or Revolution? In: The SAGE Handbook of Social Media Research Methods. SAGE Publications, pp. 27-39. ISBN 9781473916326


University of Roehampton Research Explorer

Accounting, accountability, social media and big data: Revolution or hype?

  • Centre for Sustainability and Responsible Management

Research output: Contribution to journal › Article › peer-review

Access to Document

  • 10.1108/AAAJ-03-2017-2880
  • Accounting, accountability, social media and big data: Revolution or hype? Accepted author manuscript, 496 KB Licence: CC BY-NC
  • https://doi.org/10.1108/AAAJ-03-2017-2880

Arnaboldi, Michela; Busco, Cristiano; Cuganesan, Suresh (2017), “Accounting, accountability, social media and big data: Revolution or hype?”, Accounting, Auditing & Accountability Journal, ISSN 0951-3574, published 15 May 2017. https://doi.org/10.1108/AAAJ-03-2017-2880

The attached document (embargoed until 15/05/2019) is an author-produced version of the paper, uploaded in accordance with the publisher’s self-archiving policy. The final published version (version of record) is available online at https://doi.org/10.1108/AAAJ-03-2017-2880. Some minor differences between this version and the final published version may remain; please refer to the final published version should you wish to cite from it.



What’s holding up the big data revolution in healthcare?

  • Kiret Dhindsa, postdoctoral fellow 1,
  • Mohit Bhandari, professor 2,
  • Ranil R Sonnadara, associate professor 2
  • 1 Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada
  • 2 Department of Surgery, McMaster University, Hamilton, Ontario, Canada
  • Correspondence to: K Dhindsa dhindsj@mcmaster.ca

Poor data quality, incompatible datasets, inadequate expertise, and hype

Big data refers to datasets that are too large or complex to analyse with traditional methods. 1 Instead we rely on machine learning—self-updating algorithms that build predictive models by finding patterns in data. 2 In recent years, a so-called “big data revolution” in healthcare has been promised 3 4 5 so often that researchers are now asking why this supposed inevitability has not happened. 6 Although some technical barriers have been correctly identified, 7 there is a deeper issue: many of the data are of poor quality and in the form of small, incompatible datasets.

Current practices around collection, curation, and sharing of data make it difficult to apply machine learning to healthcare on a large scale. We need to develop, evaluate, and adopt modern health data standards that guarantee data quality, ensure that datasets from different institutions are compatible for pooling, and allow timely access to datasets by researchers and others. These prerequisites for machine learning have not yet been met.

Part of the problem is that the hype surrounding machine learning obscures the reality that it is just a tool for data science with its own requirements and limitations. The hype also fails to acknowledge that all healthcare tools must work within a wide range of human constraints, from the molecular to the social and political. Each of these will limit what can be achieved: even big technical advances may have only modest effects when integrated into the complex framework of clinical practice and healthcare delivery.

Although machine learning is the state of the art in predictive big data analytics, it is still susceptible to poor data quality, 8 9 sometimes in uniquely problematic ways. 2 Machine learning, including its more recent incarnation, deep learning, 10 performs tasks involving pattern recognition (generally a combination of classification, regression, dimensionality reduction, and clustering 11 ). The ability to detect even the most subtle patterns in raw data is a double-edged sword: machine learning algorithms, like humans, can easily be misdirected by spurious and irrelevant patterns. 12

For example, medical imaging datasets are often riddled with annotations—made directly on the images—that carry information about specific diagnostic features found by clinicians. This is disastrous in the machine learning context, where an algorithm trained using datasets that include annotated images will seem to perform extremely well on standard tests but fail to work in a real world scenario where similar annotations are not available. Since these algorithms find patterns regardless of how meaningful those patterns are to humans, the rule “garbage in, garbage out” may apply even more than usual. 13
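To make the failure mode concrete, here is a minimal sketch in Python, using entirely synthetic data and an invented "annotation" feature rather than any real imaging dataset. A classifier whose label leaks through clinician annotations looks near perfect on a held-out set that still contains the annotations, then collapses toward chance once they are absent.

```python
# Hedged sketch of label leakage via annotations (all data synthetic/invented).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
y = rng.integers(0, 2, n)                       # true diagnosis
signal = y + rng.normal(0, 3.0, n)              # weak genuine imaging signal
annotation = y + rng.normal(0, 0.1, n)          # annotation effectively encodes the diagnosis

clf = LogisticRegression().fit(np.column_stack([signal, annotation]), y)

# Held-out data that still carries annotations: accuracy looks near perfect.
y_test = rng.integers(0, 2, n)
sig_t = y_test + rng.normal(0, 3.0, n)
ann_t = y_test + rng.normal(0, 0.1, n)
print("with annotations   :", clf.score(np.column_stack([sig_t, ann_t]), y_test))

# Deployment-like data without annotations (feature replaced by noise): near chance.
noise = rng.normal(0, 0.1, n)
print("without annotations:", clf.score(np.column_stack([sig_t, noise]), y_test))
```

The same pattern appears for any proxy that is present in the training data but missing at the point of care.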

Even if we had good data, would we have enough? Healthcare data are currently distributed across multiple institutions, collected using different procedures, and formatted in different ways. Machine learning algorithms recognise patterns by exploiting sources of variance in large datasets. But inconsistencies across institutions mean that combining datasets to achieve the required size easily introduces an insurmountable degree of non-predictive variability. This makes it all too easy for a machine learning algorithm to miss the truly important patterns and latch onto the more dominating patterns introduced by institutional differences.
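A similarly hedged sketch, again with invented numbers, shows how pooling across institutions can go wrong: two synthetic sites differ in both case mix and measurement baseline, the model latches onto the site offset rather than the weak physiological signal, and performance drops sharply at an unseen site.

```python
# Hedged sketch of institutional (batch) effects dominating pooled data; all values invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def make_site(n, prevalence, offset):
    y = (rng.random(n) < prevalence).astype(int)
    biomarker = 0.3 * y + offset + rng.normal(0, 1.0, n)   # weak true signal + site baseline
    return biomarker.reshape(-1, 1), y

Xa, ya = make_site(1000, 0.8, +2.0)   # referral hospital: sicker patients, higher baseline
Xb, yb = make_site(1000, 0.2, -2.0)   # community clinic: healthier patients, lower baseline

X = np.vstack([Xa, Xb])
y = np.concatenate([ya, yb])
clf = LogisticRegression().fit(X, y)

print("pooled training sites:", clf.score(X, y))            # looks respectable
Xc, yc = make_site(1000, 0.5, 0.0)                          # new site, no offset
print("unseen site          :", clf.score(Xc, yc))          # barely better than chance
```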

Holistic solution

A holistic solution to problems of data quality and quantity would include adoption of consistent health data standards across institutions, complete with new data sharing policies that ensure ongoing protection of patient privacy. If healthcare leaders see an opportunity to advance patient care with big data and machine learning, they must take the initiative to establish new data policies in consultation with clinicians, data scientists, patients, and the public.

Improved data management is clearly necessary if machine learning algorithms are to generate models that can transition successfully from the laboratory to clinical practice. How should we go about it? Effective data management requires specialist training in data science and information technology, and detailed knowledge of the nuances associated with data types, applications, and domains, including how they relate to machine learning. This points to a growing role for data management specialists and knowledge engineers who can pool and curate datasets; such experts may become as essential to modern healthcare as imaging technicians are now. 14 Clinicians will also need training as collectors of health data and users of machine learning tools. 15

To truly realise the potential of big data in healthcare we need to bring together up-to-date data management practices, specialists who can maximise the usability and quality of health data, and a new policy framework that recognises the need for data sharing. Until then, the big data revolution (or at least a realistic version of it) remains on hold.

Competing interests: We have read and understood BMJ policy on declaration of interests and declare the following: RRS reports board membership for Compute Ontario, SHARCNET, and SOSCIP—all non-profit advanced research computing organisations. MB reports personal fees from Stryker, Sanofi, Ferring, and Pendopharm and grants from Acumed, DJO, and Sanofi outside the submitted work.

Provenance and peer review: Not commissioned; externally peer reviewed.

References

  • De Mauro A,
  • Michalski RS,
  • Carbonell JG,
  • Mitchell TM
  • Fernandes L,
  • O’Connor M,
  • Raghupathi W,
  • Raghupathi V
  • Murdoch TB,
  • Cortes C, Jackel LD, Chiang WP. Limits on learning machine accuracy imposed by data quality. In: Advances in neural information processing systems. MIT Press, 1995:239-46.
  • Najafabadi MM,
  • Villanustre F,
  • Khoshgoftaar TM,
  • Muharemagic E
  • Papernot N, McDaniel P, Jha S, Fredrikson M, Celik ZB, Swami A. The limitations of deep learning in adversarial settings. In: Security and Privacy (EuroS&P), 2016 IEEE European Symposium on IEEE. 2016:372-87.
  • Jones-Farmer LA
  • Furlong LI,
  • Albanell J,

Big Data: The Management Revolution

  • Andrew McAfee
  • Erik Brynjolfsson

Exploiting vast new flows of information can radically improve your company’s performance. But first you’ll have to change your decision-making culture.

Big data, the authors write, is far more powerful than the analytics of the past. Executives can measure and therefore manage more precisely than ever before. They can make better predictions and smarter decisions. They can target more-effective interventions in areas that so far have been dominated by gut and intuition rather than by data and rigor. The differences between big data and analytics are a matter of volume, velocity, and variety: More data now cross the internet every second than were stored in the entire internet 20 years ago. Nearly real-time information makes it possible for a company to be much more agile than its competitors. And that information can come from social networks, images, sensors, the web, or other unstructured sources.

The managerial challenges, however, are very real. Senior decision makers have to learn to ask the right questions and embrace evidence-based decision making. Organizations must hire scientists who can find patterns in very large data sets and translate them into useful business information. IT departments have to work hard to integrate all the relevant internal and external sources of data.

The authors offer two success stories to illustrate how companies are using big data: PASSUR Aerospace enables airlines to match their actual and estimated arrival times. Sears Holdings directly analyzes its incoming store data to make promotions much more precise and faster.

“You can’t manage what you don’t measure.”


  • Andrew McAfee is the cofounder and co-director of the MIT Initiative on the Digital Economy and the inaugural visiting fellow in Google’s Technology and Society group. He is the author of the new book, The Geek Way, and coauthor with Erik Brynjolfsson of The Second Machine Age.
  • Erik Brynjolfsson is the director of the Stanford Digital Economy Lab, a professor at the Stanford Institute for Human-Centered AI, a research associate at the National Bureau of Economic Research, and a cofounder of Workhelix, which creates generative AI strategies and implementation plans for companies.


Awaiting the Second Big Data Revolution: From Digital Noise to Value Creation

  • Open access
  • Published: 18 February 2015
  • Volume 15, pages 35–47 (2015)


  • Mark Huberty


“Big data”—the collection of vast quantities of data about individual behavior via online, mobile, and other data-driven services—has been heralded as the agent of a third industrial revolution—one with raw materials measured in bits, rather than tons of steel or barrels of oil. Yet the industrial revolution transformed not just how firms made things, but the fundamental approach to value creation in industrial economies. To date, big data has not achieved this distinction. Instead, today’s successful big data business models largely use data to scale old modes of value creation, rather than invent new ones altogether. Moreover, today’s big data cannot deliver the promised revolution. In this way, today’s big data landscape resembles the early phases of the first industrial revolution, rather than the culmination of the second a century later. Realizing the second big data revolution will require fundamentally different kinds of data, different innovations, and different business models than those seen to date. That fact has profound consequences for the kinds of investments and innovations firms must seek, and the economic, political, and social consequences that those innovations portend.


1 Introduction

We believe that we live in an era of “big data”. Firms today accumulate, often nearly by accident, vast quantities of data about their customers, suppliers, and the world at large. Technology firms like Google or Facebook have led the pack in finding uses for such data, but its imprint is visible throughout the economy. The expanding sources and uses of data suggest to many the dawn of a new industrial revolution. Those who cheerlead for this revolution proclaim that these changes, over time, will bring about the same scope of change to economic and social prosperity in the 21st century that rail, steam, or steel did in the 19th.

Yet this “big data” revolution has so far fallen short of its promise. Precious few firms transmute data into novel products. Instead, most rely on data to operate, at unprecedented scale, business models with long pedigree in the media and retail sectors. Big data, despite protests to the contrary, is thus an incremental change—and its revolution one of degree, not kind.

The reasons for these shortcomings point to the challenges we face in realizing the promise of the big data revolution. Today’s advances in search, e-commerce, and social media relied on the creative application of marginal improvements in computational processing power and data storage. In contrast, tomorrow’s hopes for transforming real-world outcomes in areas like health care, education, energy, and other complex phenomena pose scientific and engineering challenges of an entirely different scale.

2 The Implausibility of Big Data

Our present enthusiasm for big data stems from the confusion of data and knowledge. Firms today can gather more data, at lower cost, about a wider variety of subjects, than ever before. Big data’s advocates claim that this data will become the raw material of a new industrial revolution. As with its 19th century predecessor, this revolution will alter how we govern, work, play, and live. But unlike the 19th century, we are told, the raw materials driving this revolution are so cheap and abundant that the horizon is bounded only by the supply of smart people capable of molding these materials into the next generation of innovations (Manyika et al. 2011 ).

This utopia of data is badly flawed. Those who promote it rely on a series of dubious assumptions about the origins and uses of data, none of which hold up to serious scrutiny. In aggregate, these assumptions fail to address whether the data we have actually provide the raw materials that a data-driven industrial revolution would need. Taken together, these failures point out the limits of a revolution built on the raw materials that today seem so abundant.

Four of these assumptions merit special attention: First, N = all , or the claim that our data allow a clear and unbiased study of humanity; second, that today = tomorrow , or the claim that understanding online behavior today implies that we will still understand it tomorrow; third, offline = online , the claim that understanding online behavior offers a window into economic and social phenomena in the physical world; and fourth, that complex patterns of social behavior, once understood, will remain stable enough to become the basis of new data-driven, predictive products and services in sectors well beyond social and media markets. Each of these has its issues. Taken together, those issues limit the future of a revolution that relies, as today’s does, on the “digital exhaust” of social networks, e-commerce, and other online services. The true revolution must lie elsewhere.

2.1 N = All

Gathering data via traditional methods has always been difficult. Small samples were unreliable; large samples were expensive; samples might not be representative, despite researchers’ best efforts; tracking the same sample over many years required organizations and budgets that few organizations outside governments could justify. None of this, moreover, was very scalable: researchers needed a new sample for every question, or had to divine in advance a battery of questions and hope that this proved adequate. No wonder social research proceeded so slowly.

Mayer-Schönberger and Cukier ( 2013 ) argue that big data will eliminate these problems. Instead of having to rely on samples, online data, they claim, allows us to measure the universe of online behavior, where N (the number of people in the sample) is basically all (the entire population of people we care about). Hence we no longer need worry, they claim, about the problems that have plagued researchers in the past. When N = all , large samples are cheap and representative, new data on individuals arrives constantly, monitoring data over time poses no added difficulty, and cheap storage permits us to ask new questions of the same data again and again. With this new database of what people are saying or buying, where they go and when, how their social networks change and evolve, and myriad other factors, the prior restrictions borne of the cost and complexity of sampling will melt away.

But N ≠ all. Most of the data that dazzles those infatuated by “big data”—Mayer-Schönberger and Cukier included—comes from what McKinsey & Company termed “digital exhaust” (Manyika et al. 2011): the web server logs, e-commerce purchasing histories, social media relations, and other data thrown off by systems in the course of serving web pages, online shopping, or person-to-person communication. The N covered by that data concerns only those who use these services—not society at large. In practice, this distinction turns out to matter quite a lot. The demographics of any given online service usually differ dramatically from the population at large, whether we measure by age, gender, race, education, or myriad other factors.

Hence the uses of that data are limited. It’s very relevant for understanding web search behavior, purchasing, or how people behave on social media. But the N here is skewed in ways both known and unknown—perhaps younger than average, or more tech-savvy, or wealthier than the general population. The fact that we have enormous quantities of data about these people may not prove very useful to understanding society writ large.

2.2 Today = Tomorrow

But let’s say that we truly believe this assumption—that everyone is (or soon will be) online. Surely the proliferation of smart phones and other devices is bringing that world closer, at least in the developed world. This brings up the second assumption—that we know where to go find all these people. Several years ago, MySpace was the leading social media website, a treasure trove of new data on social relations. Today, it’s the punchline to a joke. The rate of change in online commerce, social media, search, and other services undermines any claim that we can actually know that our N = all sample that works today will work tomorrow. Instead, we only know about new developments—and the data and populations they cover—well after they have already become big. Hence our N = all sample is persistently biased in favor of the old. Moreover, we have no way of systematically checking how biased the sample is, without resorting to traditional survey methods and polling—the very methods that big data is supposed to render obsolete.

2.3 Online Behavior = Offline Behavior

But let’s again assume that problem away. Let’s assume that we have all the data, about all the people, for all the online behavior, gathered from the digital exhaust of all the relevant products and services out there. Perhaps, in this context, we can make progress understanding human behavior online. But that is not the revolution that big data has promised. Most of the “big data” hype has ambitions beyond improving web search, online shopping, socializing, or other online activity. Instead, big data should help cure disease, detect epidemics, monitor physical infrastructure, and aid first responders in emergencies.

To satisfy these goals, we need a new assumption: that what people do online mirrors what they do offline. Otherwise, all the digital exhaust in the world won’t describe the actual problems we care about.

There’s little reason to think that offline life faithfully mirrors online behavior. Research has consistently shown that individuals’ online identities vary widely from their offline selves. In some cases, that means people are more cautious about revealing their true selves. Danah Boyd’s work (Boyd and Marwick 2011 ) has shown that teenagers cultivate online identities very different from their offline selves—whether for creative, privacy, or other reasons. In others, it may mean that people are more vitriolic, or take more extreme positions. Online political discussions—another favorite subject of big data enthusiasts—suffer from levels of vitriol and partisanship far beyond anything seen offline (Conover et al. 2011 ). Of course, online and offline identity aren’t entirely separate. That would invite suggestions of schizophrenia among internet users. But the problem remains—we don’t know what part of a person is faithfully represented online, and what part is not.

Furthermore, even where online behavior may echo offline preferences or beliefs, that echo is often very weak. In statistical terms, our ability to distinguish “significant” from “insignificant” results improves with the sample size—but statistical significance is not actual significance. Knowing, say, that a history of purchasing some basket of products is associated with an increased risk of being a criminal may be helpful. But if that association is weak—say a one-hundredth of a percent increase—its practical import is effectively zero. Big data may permit us to find these associations, but it does not promise that they will be useful.
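A rough calculation, sketched below with invented numbers, illustrates the point: at "big data" scale, a 0.01-percentage-point difference in risk can clear a conventional significance threshold while remaining useless for any practical decision.

```python
# Hedged illustration of statistical vs practical significance (all numbers invented).
import math

n = 20_000_000                        # records per group, hypothetical web-scale sample
p_base, p_exposed = 0.0100, 0.0101    # 1.00% vs 1.01% observed risk

# Two-proportion z-test, treating the observed rates as exact
p_pool = (p_base + p_exposed) / 2
se = math.sqrt(p_pool * (1 - p_pool) * (2 / n))
z = (p_exposed - p_base) / se
p_value = math.erfc(z / math.sqrt(2))          # two-sided p-value

print(f"z = {z:.2f}, p = {p_value:.4f}")                                # comfortably "significant"
print(f"absolute risk increase = {(p_exposed - p_base) * 100:.2f} percentage points")
```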

2.4 Behavior of All (Today) = Behavior of All (Tomorrow)

OK, but you say, surely we can determine how these distortions work, and incorporate them into our models? After all, doesn’t statistics have a long history of trying to gain insight from messy, biased, or otherwise incomplete data?

Perhaps we could build such a map, one that allows us to connect the observed behaviors of a skewed and selective online population to offline developments writ large. This suffices only if we care primarily about describing the past. But much of the promise of big data comes from predicting the future—where and when people will get sick in an epidemic, which bridges might need the most attention next month, whether today’s disgruntled high school student will become tomorrow’s mass shooter.

Satisfying these predictive goals requires yet another assumption. It is not enough to have all the data, about all the people, and a map that connects that data to real-world behaviors and outcomes. We also have to assume that the map we have today will still describe the world we want to predict tomorrow.

Two obvious and unknowable sources of change stand in our way. First, people change. Online behavior is a culmination of culture, language, social norms and other factors that shape both people and how they express their identity. These factors are in constant flux. The controversies and issues of yesterday are not those of tomorrow; the language we use to discuss anger, love, hatred, or envy changes. The pathologies that afflict humanity may endure, but the ways we express them do not.

Second, technological systems change. The data we observe in the “digital exhaust” of the internet is created by individuals acting in the context of systems with rules of their own. Those rules are set, intentionally or not, by the designers and programmers that decide what we can and cannot do with them. And those rules are in constant flux. What we can and cannot buy, who we can and cannot contact on Facebook, what photos we can or cannot see on Flickr vary, often unpredictably. Facebook alone is rumored to run up to a thousand different variants on its site at one time. Hence even if culture never changed, our map from online to offline behavior would still decay as the rules of online systems continued to evolve.

An anonymous reviewer pointed out, correctly, that social researchers have always faced this problem. This is certainly true but many of the features of social systems—political and cultural institutions, demography, and other factors—change on a much longer timeframe than today’s data-driven internet services. For instance, US Congressional elections operate very differently now compared with a century ago; but change little between any two elections. Contrast that with the pace of change for major social media services, for which 2 years may be a lifetime.

A recent controversy illustrates this problem to a T. Facebook recently published a study (Kramer et al. 2014) in which they selectively manipulated the news feeds of a randomized sample of users, to determine whether they could manipulate users’ emotional states. The revelation of this study prompted fury on the part of users, who found this sort of manipulation unpalatable. Whether they should, of course, given that Facebook routinely runs experiments on its site to determine how best to satisfy (i.e., make happier) its users, is an interesting question. But the broader point remains—someone watching the emotional state of Facebook users might have concluded that overall happiness was on the rise, perhaps a consequence of the improving American economy. But in fact this increase was entirely spurious, driven by Facebook’s successful experiment at manipulating its users.

Compounding this problem, we cannot know, in advance, which of the social and technological changes we do know about will matter to our map. That only becomes apparent in the aftermath, as real-world outcomes diverge from predictions cast using the exhaust of online systems.

Lest this come off as statistical nihilism, consider the differences in two papers that both purport to use big data to project the outcome of US elections. DiGrazia et al. ( 2013 ) claim that merely counting the tweets that reference a Congressional candidate—with no adjustments for demography, or spam, or even name confusion—can forecast whether that candidate will win his or her election. This is a purely “digital exhaust” approach. They speculate—but cannot know—whether this approach works because (to paraphrase their words) “one tweet equals one vote”, or “all attention on Twitter is better”. Moreover, it turns out that the predictive performance of this simple model provides no utility. As Huberty ( 2013 ) shows, their estimates perform no better than an approach that simply guesses that the incumbent party would win—a simple and powerful predictor of success in American elections. Big data provided little value.

Contrast this with Wang et al. ( 2014 ). They use the Xbox gaming platform as a polling instrument, which they hope might help compensate for the rising non-response rates that have plagued traditional telephone polls. As with Twitter, N ≠ all : the Xbox user community is younger, more male, less politically involved. But the paper nevertheless succeeds in generating accurate estimates of general electoral sentiment. The key difference lies in their use of demographic data to re-weight respondents’ electoral sentiments to look like the electorate at large. The Xbox data were no less skewed than Twitter data; but the process of data collection provided the means to compensate. The black box of Twitter’s digital exhaust, lacking this data, did not. The difference? DiGrazia et al. ( 2013 ) sought to reuse data created for one purpose in order to do something entirely different; Wang et al. ( 2014 ) set out to gather data explicitly tailored to their purpose alone.
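The reweighting step that separates the two papers can be sketched in a few lines of Python. The demographic cells, support rates, and panel composition below are invented for illustration, not Wang et al.'s actual data or model: the raw mean of a skewed panel misses the population value, while cell-level means reweighted to population shares recover it.

```python
# Hedged sketch of post-stratification on an invented, skewed panel.
import numpy as np

rng = np.random.default_rng(2)

# Four demographic cells with hypothetical support rates and shares
true_support     = np.array([0.35, 0.45, 0.55, 0.60])   # young men, young women, older men, older women
population_share = np.array([0.22, 0.23, 0.26, 0.29])   # composition of the electorate
panel_share      = np.array([0.70, 0.10, 0.15, 0.05])   # composition of the skewed online panel

n = 100_000
cell = rng.choice(4, size=n, p=panel_share)
votes = rng.random(n) < true_support[cell]

raw = votes.mean()                                               # biased toward young men
cell_means = np.array([votes[cell == i].mean() for i in range(4)])
poststratified = np.dot(cell_means, population_share)            # reweight to the electorate

print(f"truth = {np.dot(true_support, population_share):.3f}")
print(f"raw panel mean = {raw:.3f}, post-stratified = {poststratified:.3f}")
```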

2.5 The Implausibility of Big Data 1.0

Taken together, the assumptions that we have to make to fulfill the promise of today’s big data hype appear wildly implausible. To recap, we must assume that:

everyone we care about is online;

we know where to find them today, and tomorrow;

they represent themselves online consistent with how they behave offline, and;

they will continue to represent themselves online—in behavior, language, and other factors—in the same way, for long periods of time.

Nothing in the history of the internet suggests that even one of these statements holds true. Everyone was not online in the past, and likely will not be in the future. The constant, often wrenching changes in the speed, diversity, and capacity of online services mean that those who are online move around constantly. They do not, as we’ve seen, behave in ways necessarily consistent with their offline selves. And the choices they make about how to behave online evolve in unpredictable ways, shaped by a complex and usually opaque amalgam of social norms and algorithmic influences.

But if each of these statements falls down, then how have companies like Amazon, Facebook, or Google built such successful business models? The answer lies in two parts. First, most of what these companies do is self-referential: they use data about how people search, shop, or socialize online to improve and expand services targeted at searching, shopping, or socializing. Google, by definition, has an N = all sample of Google users’ online search behavior. Amazon knows the shopping behaviors of Amazon users. Of course, these populations are subject to change their behaviors, their self-representation, or their expectations at any point. But at least Google or Amazon can plausibly claim to have a valid sample of the primary populations they care about.

Second, the consequences of failure are, on the margins, very low. Google relies heavily on predictive models of user behavior to sell the advertising that accounts for most of its revenue. But the consequences of errors in that model are low—Google suffers little from serving the wrong ad on the margins. Of course, persistent and critical errors of understanding will undermine products and lead to lost customers. But there’s usually plenty of time to correct course before that happens. So long as Google does better than its competitors at targeting advertising, it will continue to win the competitive fight for advertising dollars.

But if we move even a little beyond these low-risk, self-referential systems, the usefulness of the data that underpin them quickly erodes. Google Flu provides a valuable lesson in this regard. In 2008, Google announced a new collaboration with the Centers for Disease Control (CDC) to track and report rates of influenza infection. Historically, the CDC had monitored US flu infection patterns through a network of doctors that tracked and reported “influenza-like illness” in their clinics and hospitals. But doctors’ reports took up to 2 weeks to reach the CDC—a long time in a world confronting SARS or avian flu. Developing countries with weaker public health capabilities faced even greater challenges. Google hypothesized that, when individuals or their family members got the flu, they went looking on the internet—via Google, of course—for medical advice. In a highly cited paper, Ginsberg et al. ( 2008 ) showed that they could predict region-specific influenza infection rates in the United States using Google search frequency data. Here was the true promise of big data—that we capitalize on virtual data to better understand, and react to, the physical world around us.

The subsequent history of Google Flu illustrates the shortcomings of the first big data revolution. While Google Flu has performed well in many seasons, it has failed twice, both times in the kind of abnormal flu season during which accurate data are most valuable. The patterns of and reasons for failure speak to the limits of prediction. In 2009, Google Flu under-predicted flu rates during the H1N1 pandemic. Post-hoc analysis suggested that the different viral characteristics of H1N1 compared with garden-variety strains of influenza likely meant that individuals didn’t know they had a flu strain, and thus didn’t go looking for flu-related information (Cook et al. 2011 ). Conversely, in 2012, Google Flu over-predicted influenza infections. Google has yet to discuss why, but speculation has centered on the intensive media coverage of an early-onset flu season, which may have sparked interest in the flu among healthy individuals (Butler 2013 ).

The problems experienced by Google Flu provide a particularly acute warning of the risks inherent in trying to predict what will happen in the real world based on the exhaust of the digital one. Google Flu relied on a map—a mathematical relationship between online behavior and real-world infection. Google built that map on historic patterns of flu infection and search behavior. It assumed that such patterns would continue to hold in the future. But there was nothing fundamental about those patterns. Either a change in the physical world (a new virus) or the virtual one (media coverage) was enough to render the map inaccurate. The CDC’s old reporting networks out-performed big data when it mattered most.
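A toy regression, with synthetic numbers rather than Google's model or data, shows how quickly such a map decays. Fit search volume against infection rates in "normal" seasons, then apply the same fit to a season in which media coverage doubles searches without a matching rise in infections: the predictions overshoot by roughly a factor of two, much as described above.

```python
# Hedged sketch of a search-to-infection "map" breaking under a media scare (synthetic data).
import numpy as np

rng = np.random.default_rng(3)

# Normal seasons: search volume tracks infection rates with modest noise
infections = rng.uniform(1, 8, 200)                      # hypothetical % of population infected
searches = 3.0 * infections + rng.normal(0, 1.0, 200)    # hypothetical search index
slope, intercept = np.polyfit(searches, infections, 1)

# Abnormal season: media coverage doubles searches while infections stay unchanged
infections_new = rng.uniform(1, 8, 50)
searches_new = 2.0 * (3.0 * infections_new) + rng.normal(0, 1.0, 50)
pred = slope * searches_new + intercept

print("mean abs error, normal seasons :", np.mean(np.abs(slope * searches + intercept - infections)))
print("mean abs error, abnormal season:", np.mean(np.abs(pred - infections_new)))
```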

3 A Revolution Constrained: Data, Potential, and Value Creation

Despite ostensibly free raw materials, mass-manufacturing insight from digital exhaust has thus proven far more difficult than big data’s advocates would let on. It’s thus unsurprising that this revolution has had similarly underwhelming effects on business models. Amazon, Facebook, and Google are enormously successful businesses, underpinned by technologies operating at unprecedented scale. But they still rely on centuries-old business models for most of their revenue. Google and Amazon differ in degree, but not kind, from a newspaper or a large department store when it comes to making money. This is a weak showing from a revolution that was supposed to change the 21st century in the way that steam, steel, or rail changed the 19th. Big data has so far made it easier to sell things, target ads, or stalk long-lost friends or lovers. But it hasn’t yet fundamentally reworked patterns of economic life, generated entirely new occupations, or radically altered relationships with the physical world. Instead, it remains oddly self-referential: we generate massive amounts of data in the process of online buying, viewing, or socializing; but find that data truly useful only for improving online sales and search.

Understanding how we might get from here to there requires a better understanding of how and why data—big or small—might create value in a world of better algorithms and cheap compute capacity. Close examination shows that firms have largely used big data to improve on existing business models, rather than adopt new ones; and that those improvements have relied on data to describe and predict activity in worlds largely of their own making. Where firms have ventured beyond these self-constructed virtual worlds, the data have proven far less useful, and products built atop data far more prone to failure.

3.1 Refining Data into Value

The Google Flu example suggests the limits to big data as a source of mass-manufactured insight about the real world. But Google itself, and its fellow big-data success stories, also illustrate the shortcomings of big data as a source of fundamentally new forms of value creation. Most headline big data business models have used their enhanced capacity to describe, predict, or infer in order to implement—albeit at impressive scale and complexity—centuries-old business models. Those models create value not from the direct exchange between consumer and producer, but via a web of transactions several orders removed from the creation of the data itself. Categorizing today’s big data business models based on just how far they separate data generation from value creation quickly illustrates how isolated the monetary value of firms’ data is from their primary customers. Having promised a first-order world, big data has delivered a third-order reality.

Realizing the promise of the big data revolution will require a different approach. The same problems that greeted flu prediction have plagued other attempts to build big data applications that forecast the real world. Engineering solutions to these problems that draw on the potential of cheap computation and powerful algorithms will require not different methods, but different raw materials. Those raw materials—the data—must originate from a first-order approach to studying and understanding the worlds we want to improve. Such approaches will require very different models of firm organization than those exploited by Google and its competitors in the first big data revolution.

3.1.1 Third-Order Value Creation: The Newspaper Model

Most headline big data business models do not make much money directly from their customers. Instead, they rely on third parties—mostly advertisers—to generate profits from data. The actual creation and processing of data is only useful insofar as it’s of use to those third parties. In doing so, these models have merely implemented, at impressive scale and complexity, the very old business model used by the newspapers they have largely replaced.

If we reach back into the dim past when newspapers were viable businesses (rather than hobbies of the civic-minded wealthy), we will remember that their business model had three major components:

gather, filter, and analyze news;

attract readers by providing that news at far below cost, and;

profit by selling access to those readers to advertisers.

The market for access matured along with the newspapers that provided it. Both newspapers and advertisers realized that people who read the business pages differed from those who read the front page, or the style section. Front-page ads were more visible to readers than those buried on page A6. Newspapers soon started pricing access to their readers accordingly. Bankers paid one price to advertise in the business section, clothing designers another for the style pages. This segmentation of the ad market evolved as the ad buyers and sellers learned more about whose eyeballs were worth how much, when, and where.

Newspapers were thus third-order models. The news services they provided were valuable in their own right. But readers didn’t pay for them. Instead, news was a means of generating attention and data, which was only valuable when sold to third parties in the form of ad space. Data didn’t directly contribute to improving the headline product—news—except insofar as it generated revenue that could be plowed back into news gathering. The existence of a tabloid press of dubious quality but healthy revenues proved the weakness of the link between good journalism and profit.

From a value creation perspective, Google, Yahoo, and other ad-driven big data businesses are nothing more than newspapers at scale. They too provide useful services (then news, now email or search) to users at rates far below cost. They too profit by selling access to those users to third-party advertisers. They too accumulate and use data to carve up the ad market. The scale of data they have available, of course, dwarfs that of their newsprint ancestors. This data, combined with cheap computation and powerful statistics, has enabled operational efficiency, scale, and effectiveness far beyond what newspapers could ever have managed. But the business model itself—the actual means by which these firms earn revenues—is identical.

Finally, that value model does not emerge, fully-formed, from the data itself. The data alone are no more valuable than the unrefined iron ore or crude oil of past industrial revolutions. Rather, the data were mere inputs to a production process that depended on human insight—that what people looked for on the internet might be a good proxy for their consumer interests.

3.1.2 Second-Order Value Creation: The Retail Model

Big-box retail ranks as the other substantial success for big data. Large retailers like Amazon, Wal-Mart, or Target have harvested very fine-grained data about customer preferences to make increasingly accurate predictions of what individual customers wish to buy, in what quantities and combinations, at what times of the year, at what price. These predictions are occasionally shocking in their accuracy—as with Target’s implicit identification of a pregnant teenager well before her father knew it himself, based solely on subtle changes in her purchasing habits.

From this data, these retailers can, and have, built a detailed understanding of retail markets: what products are complements or substitutes for each other; exactly how much more people are willing to pay for brand names versus generics; how size, packaging, and placement in stores and on shelves matters to sales volumes.
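One such insight—spotting complements—reduces to a simple calculation over transaction logs. The sketch below uses invented purchase rates, not any retailer's data, to compute the co-purchase "lift" between two products; values well above 1 suggest complements, values well below 1 suggest substitutes.

```python
# Hedged sketch of co-purchase "lift" from synthetic transaction data (all rates invented).
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
diapers = rng.random(n) < 0.08                      # 8% of baskets contain diapers
wipes = np.where(diapers, rng.random(n) < 0.40,     # wipes far likelier alongside diapers
                          rng.random(n) < 0.05)

p_both = np.mean(diapers & wipes)
lift = p_both / (diapers.mean() * wipes.mean())
print(f"lift(diapers, wipes) = {lift:.1f}")         # well above 1 => likely complements
```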

Insights built on such data have prompted two significant changes in retail markets. First, they have made large retailers highly effective at optimizing supply chains, identifying retail trends in their infancy, and managing logistical difficulties to minimize the impact on sales and lost competitiveness. This has multiplied their effectiveness versus smaller retailers, who lack such capabilities and are correspondingly less able to compete on price.

But it has also changed, fundamentally, the relationship of these retailers to their suppliers. Big box retailers have increasingly become monopsony buyers of some goods—books for Amazon, music for iTunes. But they are also now monopoly sellers of information back to their suppliers. Amazon, Target and Wal-Mart have a much better understanding of their suppliers’ customers than the customers themselves. They also understand these suppliers’ competitors far better. Hence their aggregation of information has given them substantial power over suppliers. This has had profound consequences for the suppliers. Wal-Mart famously squeezes suppliers on cost—either across the board, or by pitting suppliers against one another based on detailed information of their comparative cost efficiencies and customer demand.

Hence big data has shifted the power structure of the retail sector and its manufacturing supply chains. The scope and scale of the data owned by Amazon or Wal-Mart about who purchases what, when, and in what combinations often means that they understand the market for a product far better than the manufacturer. Big data, in this case, comes from big business—a firm that markets to the world also owns data about the world’s wants, needs, and peculiarities. Even as they are monopsony buyers of many goods (think e-books for Amazon), they are correspondingly monopoly sellers of data. And that has made them into huge market powers on two dimensions, enabling them to squeeze suppliers to the absolute minimum price, packaging, size, and other product features that are most advantageous to them—and perhaps to their customers.

But big data has not changed the fundamental means of value creation in the retail sector. Whatever its distributional consequences, the basic retail transaction—of individuals buying goods from retail intermediaries—remains unchanged from earlier eras. The same economies of scale and opportunities for cross-marketing that made Montgomery Ward a retail powerhouse in the 19th century act on Amazon and Wal-Mart in the 21st. Big data may have exacerbated trends already present in the retail sector; but the basics of how that sector creates value for customers and generates profits for investors are by no means new. Retailers have yet to build truly new products or services that rely on data itself—instead, that data is an input into a longstanding process of optimization of supply chain relations, marketing, and product placement in service of a very old value model: the final close of sale between a customer and the retailer.

3.1.3 First-Order Value Creation: The Opportunity

Second- and third-order models find value in data several steps removed from the actual transaction that generates the data. However, as the Google Flu example illustrated, that data may have far less value when separated from its virtual context. Thus while these businesses enjoy effectively free raw materials, the potential uses of those materials are in fact quite limited. Digital exhaust from web browsing, shopping, or socializing has proven enormously useful in the self-referential task of improving future web browsing, shopping, and socializing. But that success has not translated into success at tasks far removed from the virtual world that generated this exhaust. Digital exhaust may be plentiful and convenient to collect, but it offers limited support for understanding or responding to real-world problems.

First-order models, in contrast, escape the Flu trap by building atop purpose-specific data, conceived and collected with the intent of solving specific problems. In doing so, they capitalize on the cheap storage, powerful algorithms, and inexpensive computing power that made the first wave of big data firms possible. But they do so in pursuit of a rather different class of problems.

First-order products remain in their infancy. But some nascent examples suggest what might be possible. IBM’s Watson famously used its natural language and pattern recognition abilities to win the Jeopardy! game show. Doing so constituted a major technical feat: the ability to understand unstructured, potentially obfuscated Jeopardy! game show answers, and respond with properly-structured questions based on information gleaned from vast databases of unstructured information on history, popular culture, art, science, or almost any other domain.

The question now is whether IBM can adapt this technology to other problems. Its first attempts at improving medical diagnosis appear promising. By learning from disease and health data gathered from millions of patients, initial tests suggest that Watson can improve the quality, accuracy, and efficacy of medical diagnosis and service to future patients (Steadman 2013 ). Watson closes the data value loop: patient data is made valuable because it improves patient services, not because it helps with insurance underwriting or product manufacturing or logistics or some other third-party activity.

Premise Corporation provides another example. Premise has built a mobile-phone based data gathering network to measure macroeconomic aggregates like inflation and food scarcity. This network allows them to monitor economic change at a very detailed level, in regions of the world where official statistics are unavailable or unreliable. This sensor network is the foundation of the products and services that Premise sells to financial services firms, development agencies, and other clients. As compared with the attenuated link between data and value in second- or third-order businesses, Premise’s business model links the design of the data generation process directly to the value of its final products.

Optimum Energy (OE) provides a final example. OE monitors and aggregates data on building energy use—principally data centers—across building types, environments, and locations. That data enables it to build models for building energy use and efficiency optimization. Those models, by learning building behaviors across many different kinds of inputs and buildings, can perform better than single-building models with limited scope. Most importantly, OE creates value for clients by using this data to optimize energy efficiency and reduce energy costs.

These first-order business models all rely on data specifically obtained for their products. This reliance on purpose-specific data contrasts with third-order models, which rely on the “digital exhaust” celebrated by conventional big data wisdom. To use the newspaper example, third-order models assume—but can’t specifically verify—that those who read the style section are interested in purchasing new fashions. Google’s success stemmed from closing this information gap a bit—showing that people who viewed web pages on fashion were likely to click on fashion ads. But again, the data that supports this is data generated by processes unrelated to actual purchasing—activities like web surfing and search or email exchange. And so the gap remains. Google appears to realize this, and has launched Consumer Surveys as an attempt to bridge that gap. In brief, it offers people the chance to skip ads in favor of providing brand feedback.

3.2 The Unrealized Promise of Unreasonable Data

We should remember the root of the claim about big data. That claim was perhaps best summarized by Halevy et al. ( 2009 ) in what they termed “the unreasonable effectiveness of data”—that, when seeking to improve the performance of predictive systems, more data appeared to yield better returns on effort than better algorithms. Most appear to have taken that to mean that data—and particularly more data—are unreasonably effective everywhere—and that, by extension, even noisy or skewed data could suffice to answer hard questions if we could simply get enough of it. But that misstates the authors’ claims. They did not claim that more data was always better. Rather, they argued that, for specific kinds of applications, history suggested that gathering more data paid better dividends than inventing better algorithms.

Where data are sparse or the phenomenon under measurement noisy, more data allow a more complete picture of what we are interested in. Machine translation provides a very pertinent example: human speech and writing varies enormously within one language, let alone two. Faced with the choice between better algorithms for understanding human language, and more data to quantify the variance in language, more data appears to work better. But for other applications, the “bigness” of data may not matter at all. If I want to know who will win an election, polling a thousand people might be enough. Relying on the aggregated voices of a nation’s Twitter users, in contrast, will probably fail (Gayo-Avello et al. 2011 ; Gayo-Avello 2012 ; Huberty 2013 ). Not only are we not, as section  2 discussed, in the N = all world that infatuated Mayer-Schönberger and Cukier ( 2013 ); but for most problems we likely don’t care to be. Having the right data—and consequently identifying the right question to ask beforehand—is far more important than having a lot of data of limited relevance to the answers we seek.
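To make the contrast concrete, consider the sampling arithmetic behind that election example. The short Python sketch below (illustrative only, not drawn from the paper) computes the standard 95 percent margin of error for a proportion: a representative sample of a thousand respondents already pins a close race down to roughly three points, a hundred times more data narrows that only to a fraction of a point, and no volume of data removes the bias of an unrepresentative sample such as Twitter users.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """95% margin of error for a proportion p estimated from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# A well-designed poll of 1,000 respondents already pins a 50/50 race down to about +/-3 points.
print(round(margin_of_error(0.5, 1_000) * 100, 1))    # ~3.1
# A hundred times more data narrows that only to ~0.3 points...
print(round(margin_of_error(0.5, 100_000) * 100, 1))  # ~0.3
# ...and none of it helps if the sample (e.g. Twitter users) is unrepresentative:
# bias does not shrink with n, so the extra volume buys almost nothing.
```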

4 Consequences

Big data therefore falls short of the proclamation that it represents the biggest change in technological and economic possibility since the industrial revolution. That revolution, in the span of a century or so, fundamentally transformed almost every facet of human life. Someone born in 1860, who lived to be 70 years old, grew up in a world of horses for travel, candles for light, salting and canning for food preservation, and telegraphs for communication. The world of their passing had cars and airplanes, electric light and refrigerators, and telephones, radio, and motion pictures. Having ranked big data with the industrial revolution, we find ourselves wondering why our present progress seems so paltry in comparison.

But much of what we associate with the industrial revolution—the advances in automobile transport, chemistry, communication, and medicine—came much later. The businesses that produced them were fundamentally different from the small collections of tinkerers and craftsmen that built the first power looms. Instead, these firms invested in huge industrial research and development operations to discover and then commercialize new scientific discoveries. These changes were expensive, complicated, and slow—so slow that John Stuart Mill despaired, as late as 1871, of human progress. But in time, they produced a world inconceivable to even the industrial enthusiasts of the 1840s.

In today’s revolution, we have our looms, but we envision the possibility of a Model T. Today, we can see glimmers of that possibility in IBM’s Watson, Google’s self-driving car, or Nest’s thermostats that learn the climate preferences of a home’s occupants. These and other technologies are deeply embedded in, and reliant on, data generated from and around real-world phenomena. None rely on “digital exhaust”. They do not create value by parsing customer data or optimizing ad click-through rates (though presumably they could). They are not the product of a relatively few, straightforward (if ultimately quite useful) insights. Instead, IBM, Google, and Nest have dedicated substantial resources to studying natural language processing, large-scale machine learning, knowledge extraction, and other problems. The resulting products represent an industrial synthesis of a series of complex innovations, linking machine intelligence, real-time sensing, and industrial design. These products are thus much closer to what big data’s proponents have promised—but their methods are a world away from the easy hype about mass-manufactured insights from the free raw material of digital exhaust.

5 Towards the Second Big Data Revolution

We’re stuck in the first industrial revolution. We have the power looms and the water mills, but wonder, given all the hype, at the absence of the Model Ts and telephones of our dreams. The answer is a hard one. The big gains from big data will require a transformation of organizational, technological, and economic operations on par with that of the second industrial revolution. Then, as now, firms had to invest heavily in industrial research and development to build the foundations of entirely new forms of value creation. Those foundations permitted entirely new business models, in contrast to the marginal changes of the first industrial revolution. And the raw materials of the first revolution proved only tangentially useful to the innovations of the second.

These differences portend a revolution of greater consequence and complexity. Firms will likely be larger. Innovation will rely less on small entrepreneurs, who lack the funds and scale for systems-level innovation. Where entrepreneurs do remain, they will play far more niche roles. As Rao (2012) has argued, startups will increasingly become outsourced R&D, whose innovations are acquired to become features of existing products rather than standalone products themselves. The success of systems-level innovation will threaten a range of current jobs—white collar and service sector as well as blue collar and manufacturing—as expanding algorithmic capacity widens the scope of digitizable tasks. But unlike past revolutions, that expanding capacity also raises the question of where this revolution will find new forms of employment insulated from these technological forces; and if it does not, how we manage the social instability that will surely follow. With luck, we will resist the temptation to use those same algorithmic tools for social control. But human history on that point is not encouraging.

Regardless, we should resist the temptation to assume that a world of ubiquitous data means a world of cheap, abundant, and relevant raw materials for a new epoch of economic prosperity. The most abundant of those materials today turn out to have limited uses outside the narrow products and services that generate them. Overcoming that hurdle requires more than just smarter statisticians, better algorithms, or faster computation. Instead, it will require new business models capable of nurturing both new sources of data and new technologies into truly new products and services.

Boyd D, Marwick AE (2011) Social privacy in networked publics: teens’ attitudes, practices, and strategies. In: A decade in internet time: symposium on the dynamics of the internet and society. pp 1–29

Butler D (2013) When Google got flu wrong. Nature 494(7436):155

Conover MD, Ratkiewicz J, Francisco M, Goncalves B, Flammini A, Menczer F (2011) Political polarization on Twitter. In: Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media

Cook S, Conrad C, Fowlkes AL, Mohebbi MH (2011) Assessing Google Flu trends performance in the United States during the 2009 influenza virus a (H1N1) pandemic. PLoS One 6(8):1–8

DiGrazia J, McKelvey K, Bollen J, Rojas F (2013) More tweets, more votes: social media as a quantitative indicator of political behavior. PLoS ONE 8(11):1–5

Gayo-Avello D (2012) I wanted to predict elections with Twitter and all I got was this lousy paper: a balanced survey on election prediction using twitter data. arXiv preprint arXiv:1204.6441

Gayo-Avello D, Metaxas PT, Mustafaraj E (2011) Limits of electoral predictions using Twitter. In: Proceedings of the International Conference on Weblogs and Social Media (ICWSM) 21:2011

Ginsberg J, Mohebbi MH, Patel RS, Brammer L, Smolinski MS, Brilliant L (2008) Detecting influenza epidemics using search engine query data. Nature 457(7232):1012–1014

Halevy A, Norvig P, Pereira F (2009) The unreasonable effectiveness of data. Intell Syst IEEE 24(2):8–12

Huberty M (2013) Multi-cycle forecasting of congressional elections with social media. In: Proceedings of the 2nd Workshop on Politics, Elections, and Data (PLEAD), pp 23–30

Kramer A, Guillory J, Hancock J (2014) Experimental evidence of massive-scale emotional contagion through social networks. Proc Natl Acad Sci 111(24):8788–8790

Manyika J, Chui M, Brown B, Bughin J, Dobbs R, Roxburgh C, Byers AH (2011) Big data: the next frontier for innovation, competition, and productivity. McKinsey Global Institute Report

Mayer-Schönberger V, Cukier K (2013) Big data: a revolution that will transform how we live, work, and think. Eamon Dolan/Houghton Mifflin Harcourt

Rao V (2012) Entrepreneurs are the new labor. Forbes. http://www.forbes.com/sites/venkateshrao/2012/09/03/entrepreneurs-are-the-new-labor-part-i/ . Accessed 3 Sept 2014

Steadman I (2013) IBM’s Watson is better at diagnosing cancer than human doctors. Wired UK, February 11th

Wang W, Rothschild D, Goel S, Gelman A (2014) Forecasting elections with non-representative polls. Int J Forecast Forthcoming

Acknowledgments

This research is a part of the ongoing collaboration of BRIE, the Berkeley Roundtable on the International Economy at the University of California at Berkeley, and ETLA, The Research Institute of the Finnish Economy. This paper has benefited from extended discussions with Cathryn Carson, Drew Conway, Chris Diehl, Stu Feldman, David Gutelius, Jonathan Murray, Joseph Reisinger, Sean Taylor, Georg Zachmann, and John Zysman. All errors committed, and opinions expressed, remain solely my own.

Author information

Authors and Affiliations

Berkeley Roundtable on the International Economy, Berkeley, CA, USA

Mark Huberty

Corresponding author

Correspondence to Mark Huberty.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

About this article

Huberty, M. Awaiting the Second Big Data Revolution: From Digital Noise to Value Creation. J Ind Compet Trade 15, 35–47 (2015). https://doi.org/10.1007/s10842-014-0190-4

Received : 14 April 2014

Revised : 13 September 2014

Accepted : 03 December 2014

Published : 18 February 2015

Issue Date : March 2015

DOI : https://doi.org/10.1007/s10842-014-0190-4

  • Digitalization
  • Value creation
  • Business models
  • Technological change

Hadoop and the Big Data Revolution

It’s in the nature of hype bubbles to obscure important new paradigms behind a cloud of excitement and exaggerated claims.    For example, the phrase “big data” has been so widely and poorly applied that the term has become almost meaningless.  Nevertheless, beneath the hype of big data there is a real revolution in progress, and more than anything else it revolves around Apache Hadoop.

Let’s look at why Hadoop is creating such a stir in database management circles, and identify the obstacles that must be overcome before Hadoop can become part of mainstream enterprise architecture.  To do that, it helps to look at the factors that drove the last revolution in databases – the relational revolution of the 1980s.

The Last Data Revolution

The database revolution of the 1980s led to the triumph of the relational database (RDBMS). The relational model was based on well-formulated theoretical models of data representation and storage. But it was not the theoretical advantages of the RDBMS that drove its rapid adoption. Rather, the RDBMS gave businesses easy access to the data held in production systems for the first time, enabling the birth of modern business intelligence systems.

Prior to the relational database and its relatively easy, flexible SQL language, virtually every request for information required the attention of a professional programmer, who would typically satisfy the request by writing hundreds or thousands of lines of COBOL. The ability of the relational database to accept ad hoc SQL queries opened up the database to those outside the IT elite. The logjam of report requests in MIS departments was eliminated, and relatively unskilled staff could quickly extract data for decision-making purposes.

Hadoop is revolutionizing database management for a similar reason – it is unlocking the value of the masses of enterprise data that are not stored in an RDBMS. And it is this non-relational data – data generated by weblogs, point-of-sale devices, social networks, and mobile devices – that offers the most potential for competitive differentiation today.

In the 1980s, the relational paradigm became so dominant that almost every vendor described its database as relational, regardless of the underlying architecture. In a similar way, almost every data technology on the market today is described as a “big data” solution. But the reality is that, of all the open source and commercial technologies that claim to offer a “big data” solution, Hadoop is far ahead in terms of adoption, fitness for purpose, and pace of innovation.

Why The Fuss Over Big Data?  

The volumes of data in database systems have been growing exponentially since the earliest days of digital storage.  A lot of this growth is driven simply by Moore’s law:  The density of digital storage doubles every year or two, and the size of the largest economically practical database increases correspondingly.   “Because we can” explains much of the growth in digital data over the last generation.

But something different is behind today’s “big data” revolution. Prior to the internet revolution, virtually all enterprise data was produced “by hand”: employees using online systems to enter orders, record customer details and so on. Entire occupational categories once existed for workers whose sole duty was to enter data into computer systems – key punch operators and data entry operators.

Today, only a small fraction of a company’s data assets are manually created by employees.  Instead, the majority of data is either created by customers or generated as a by-product of business operations.  For instance, customers generate “click streams” of web navigation as a by-product of their interactions with the online business.  Supply chain systems generate tracking information as orders are fulfilled.   And often customers and potential customers post their reviews, opinions and desires to the internet through systems like Twitter, Yelp, Facebook, and so on. 

This “data exhaust” and social network noise would be of only minor interest if it weren’t for the simultaneous – and not entirely coincidental – development of new techniques for extracting value from masses of raw data – techniques such as machine learning and collective intelligence.

Collective Intelligence Beats Artificial Intelligence

Prior to the big data revolution, software developers attempted to create intelligent and adaptive systems largely as rule-based expert systems.  These expert systems attempted to capture and emulate the wisdom of a human expert. Expert systems had limited success in fields such as medical diagnosis and performance optimization but failed dismally when applied to tasks such as language parsing, recommendation systems, and targeted marketing.   

The success of Google illustrated the way forward:  Google provided better search results not simply by hard coding better rules, but by using their increasingly massive database of past searches to refine their results.  During the same period, Amazon demonstrated the power of using recommendation engines to personalize the online shopping experience. Both Google and Amazon benefited from a virtuous cycle in which increasing data volumes improved the user experience leading to greater adoption and even more data.   In short, we discovered that the “wisdom of crowds” beats the traditional rule-based expert system.  

Amazon and Google solutions are examples of collective intelligence and machine learning techniques.   Machine learning programs modify their algorithms based on experience while collective intelligence uses big data sets to deliver seemingly intelligent application behavior.  
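As a rough illustration of what such collective-intelligence techniques look like in practice, the sketch below builds a toy item-based recommender: “customers who bought X also bought Y” is inferred directly from purchase histories rather than from hand-coded rules. The data and function names are purely illustrative and are not Amazon’s or Google’s actual systems.

```python
from collections import defaultdict
from itertools import combinations

# Toy purchase histories: user -> set of items bought (illustrative data only).
baskets = {
    "u1": {"camera", "tripod", "sd-card"},
    "u2": {"camera", "sd-card"},
    "u3": {"tripod", "camera-bag"},
    "u4": {"camera", "tripod", "camera-bag"},
}

# Count how often each pair of items is bought by the same user.
co_counts = defaultdict(int)
item_counts = defaultdict(int)
for items in baskets.values():
    for item in items:
        item_counts[item] += 1
    for a, b in combinations(sorted(items), 2):
        co_counts[(a, b)] += 1
        co_counts[(b, a)] += 1

def recommend(item, k=2):
    """Rank other items by how often they co-occur with `item`, normalised by their popularity."""
    scores = {
        other: co_counts[(item, other)] / item_counts[other]
        for other in item_counts
        if other != item and co_counts[(item, other)] > 0
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("camera"))  # e.g. ['sd-card', 'tripod'] -- behaviour learned from data, not rules
```

The more purchase histories feed the counts, the better the rankings become, which is exactly the virtuous data cycle the article describes.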

Web 2.0 companies such as Google and Amazon benefited the most from these big data techniques. Today, the same techniques offer competitive advantage in almost every industry segment. Big data and collective intelligence techniques can increase sales in almost any industry by more accurately matching potential consumers to products. They can also identify customers who are at risk of “churning” to a competitor, who might default on payments, or who otherwise warrant some personalized attention. And they are critical in creating the personalized experience that consumers demand in modern ecommerce.

Why Hadoop Works

Hadoop is essentially an open source implementation of the key building blocks pioneered by Google to meet the challenge of indexing and storing the contents of the web.  From its beginning, Google encountered the three challenges that typify big data – sometimes called the “three Vs”:

  • Massive and exponentially growing quantities of data (Volume)
  • Unpredictable, diverse and weakly structured content (Variety)
  • Rapid rate of data generation (Velocity)

Google’s solution was to employ enormous clusters of commodity servers with internal disk storage.  The Google File System (GFS) allowed all the disks across these servers to be treated as a single file system.  The MapReduce algorithm was created to allow workloads to be parallelized across all the members of the cluster. By using disks in cheap commodity servers rather than disks in expensive SANs, Google achieved a far more economic and scalable data storage architecture than would otherwise have been possible.
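The sketch below (illustrative only) shows the MapReduce pattern in miniature: a map step that emits key/value pairs, a shuffle that groups values by key, and a reduce step that collapses each group. In Hadoop, these same three phases are simply distributed across the disks and processors of the whole cluster.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Map: emit (word, 1) for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all values emitted for the same key (done by the framework in Hadoop)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: collapse the grouped values for one key into a single result."""
    return key, sum(values)

lines = ["the quick brown fox", "the lazy dog", "the quick dog"]
mapped = chain.from_iterable(map_phase(line) for line in lines)        # mappers run in parallel across the cluster
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())  # reducers also run in parallel
print(counts)  # {'the': 3, 'quick': 2, 'brown': 1, 'fox': 1, 'lazy': 1, 'dog': 2}
```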

By duplicating this classic Google architecture, Hadoop provides a practical, economic and mature platform for the storage of masses of unstructured data.  Compared to the alternatives – particularly to RDBMS – Hadoop is:

  • Economical:  Per GB, Hadoop costs an order of magnitude less than high-end SAN storage that would typically support a serious RDBMS implementation.
  • Mature:  the key algorithms of Hadoop have been field tested at Google, and massive Hadoop implementations have been proven at Facebook and Yahoo!.    The Hadoop community is vibrant and backed by several commercial vendors with deep pockets – especially now that IBM, Microsoft and Oracle have all embraced Hadoop.
  • Convenient: RDBMS requires that data be analysed, modelled and transformed before being loaded. These Extract, Transform and Load (ETL) projects are expensive, risk-prone and time consuming. In contrast, the Hadoop “schema on read” approach allows data to be captured in its original form, deferring the schema definition until the data needs to be accessed (see the sketch after this list).
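To show what “schema on read” means in practice, the sketch below (the events and field names are purely illustrative) stores raw records exactly as they arrive and imposes a structure only at the moment a query needs one.

```python
import json

# Raw events are captured as-is, with no up-front modelling (schema on write would
# require designing and enforcing a structure before loading anything).
raw_events = [
    '{"ts": "2014-01-05T10:00:00", "user": "u1", "action": "view", "page": "/pricing"}',
    '{"ts": "2014-01-05T10:00:02", "user": "u2", "action": "click", "ad_id": 42}',
    '{"ts": "2014-01-05T10:00:07", "user": "u1", "action": "purchase", "amount": 19.99}',
]

# The "schema" is applied only at read time, by the query that needs it.
def purchases(events):
    for line in events:
        record = json.loads(line)               # parse the raw record on demand
        if record.get("action") == "purchase":  # project out only the fields this query cares about
            yield record["user"], record.get("amount", 0.0)

print(list(purchases(raw_events)))  # [('u1', 19.99)]
```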

Hadoop is not the only technology that could deliver these benefits. But almost all significant vendors have abandoned development of alternative technologies in favor of embracing and embedding Hadoop. Most significantly, Microsoft, IBM, and Oracle now all deliver Hadoop integrated within their standard architectures.

Delivering on the Promises of Big Data

A technology like Hadoop alone doesn’t deliver the business benefits promised by big data. For big data to become more than just promise, we’ll need advances in the skill sets of IT professionals, software frameworks that can unlock the data held inside Hadoop clusters, and a discipline of big data best practice.

It’s standard practice in a Hadoop project today to rely on highly skilled Java programmers with experience in statistical analysis and machine learning.  These developers are in short supply and – given the complexity inherent in collective intelligence solutions – probably always will be.  Nevertheless, universities should be constructing syllabuses less focused on the relatively routine world of web-based development in favor of course structures that include an emphasis on parallel data algorithms such as MapReduce together with statistical analysis and machine-learning techniques.

Given the shortage of skilled programmers, higher-level abstractions on top of the native Hadoop MapReduce framework are extremely important. Hive – a SQL-like access layer for Hadoop – and Pig – a scripting data-flow language – both open Hadoop up to a wider range of users and to a wider range of software tools. Hive in particular is key to enterprise Hadoop adoption because it potentially allows traditional BI and query tools to talk to Hadoop. Unfortunately, the Hive SQL dialect (HQL) is a long way from being ANSI-SQL-compliant. The Hadoop community should not underestimate how much enterprise adoption depends on increasing Hive maturity.
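For a sense of what that access layer looks like, the sketch below issues a HiveQL query from Python using a client library such as PyHive; the host, table, and column names are illustrative rather than taken from any real deployment. Hive compiles the SQL-like statement into MapReduce jobs over files in HDFS, so the analyst never writes a mapper or reducer by hand.

```python
# Illustrative only: connection details, database, table and column names are hypothetical.
from pyhive import hive

conn = hive.connect(host="hadoop-gateway.example.com", port=10000, database="weblogs")
cursor = conn.cursor()

# Hive translates this SQL-like statement into MapReduce jobs over raw files in HDFS.
cursor.execute("""
    SELECT page, COUNT(*) AS views
    FROM clickstream
    WHERE dt = '2014-01-05'
    GROUP BY page
    ORDER BY views DESC
    LIMIT 10
""")
for page, views in cursor.fetchall():
    print(page, views)
```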

Hive can open up Hadoop systems to traditional data analysts and traditional Business Intelligence tools.  But as a revolutionary technology, Hadoop and big data are more about breaking with existing traditions. Their unique business benefits lie in more sophisticated solutions that can’t be delivered by Business Intelligence tools.

Statistical analysis becomes particularly important as data granularity and volumes exceed what can be understood through simple aggregations and pivots. Long-time commercial statistical analysis vendors are rushing Hadoop connectivity to market, but by and large it has been the open source R package that has been used most successfully with Hadoop. R lacks some of the elegant graphics and easy interfaces of the commercial alternatives, but its open source licensing and extensibility make it a good fit in the Hadoop-based big data stack.

Beyond statistical analysis lies the realm of machine learning and collective intelligence that powered much of Google and Amazon’s original success. Open source frameworks such as Apache Mahout provide building blocks for machine learning – low-level techniques such as clustering, categorization and recommenders. But it takes a very skilled team of developers to build a business solution from these building blocks.
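As an indication of the kind of low-level building block involved, the sketch below implements a bare-bones k-means clustering routine on toy data. Mahout’s value is providing such primitives in forms that scale across a Hadoop cluster, which this standalone illustration does not attempt.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Minimal k-means: assign each point to its nearest centre, then move centres to the cluster mean."""
    random.seed(seed)
    centres = random.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for x, y in points:
            nearest = min(range(k), key=lambda i: (x - centres[i][0]) ** 2 + (y - centres[i][1]) ** 2)
            clusters[nearest].append((x, y))
        centres = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c)) if c else centres[i]
            for i, c in enumerate(clusters)
        ]
    return centres

# Two obvious customer groups by, say, visits per week and average basket size (toy data).
points = [(1, 2), (1.5, 1.8), (1.2, 2.1), (8, 9), (8.5, 9.5), (7.8, 8.9)]
print(kmeans(points, k=2))  # roughly one centre near (1.2, 2.0) and one near (8.1, 9.1)
```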

Frameworks that can be used to deliver packaged big data solutions for the enterprise are only just emerging.  A big opportunity could await the software vendor who can deliver a packaged application that brings collective intelligence and machine learning within the reach of mainstream IT.

Vive la Révolution!

The relational database is a triumph of software engineering.  It defined database management for more than 25 years and it will continue as the dominant model for real-time OLTP and BI systems for decades to come.

However, it’s apparent that the volume and nature of today’s digital data demands a complementary, but non-relational storage technology – and Hadoop is the leading contender.  For many organizations, the majority of digital data assets will soon be held not in an RDBMS, but in Hadoop or Hadoop-like systems.  

Hadoop provides an economically viable storage layer without which the big data revolution would be impossible.   The revolution will not be complete until we have practical techniques for turning this big data into business value.  Completing the big data revolution is going to generate demand for a new breed of IT professional and hopefully foster a new wave of software innovation.

About the author: 

Guy  Harrison can be found on the internet at www.guyharrison.net , on email at [email protected] and is @guyharrison on Twitter.

Yahoo Finance

The S&P 500 could climb 23% next year if Treasury yields fall and AI momentum stays strong, research firm says

The S&P 500 could soar more than 20% by the end of next year, Capital Economics predicted.

That's assuming the bond market and AI continue to work in the favor of stocks.

Falling yields and continuing excitement for AI could boost the S&P 500 to 6,500, the firm said.

The S&P 500 could see another double-digit gain by the end of next year.

That's as long as the bond market cooperates and Wall Street's hype for artificial intelligence continues to work in its favor, according to Capital Economics.

The research firm predicted the benchmark index could climb to 6,500 by the end of 2025, implying a 23% gain from its current level. That's assuming that falling Treasury yields and AI momentum will continue to boost the market.

Capital Economics' Reilly pointed to the recent decline in US Treasury yields, with the yield on the 10-year Treasury plunging over 50 basis points from its peak last year as investors anticipate coming rate cuts from the Fed. Falling yields are typically bullish for stocks, but that has failed to boost the market in recent weeks, as some investors have grown weary of the excitement over artificial intelligence.

That's likely to change, though, Reilly said, as the market is still in the early stages of the AI megatrend.

"If both of these forces combine in favor of the stock market over the next year or so, as we expect, that could be a serious tailwind for equities," Reilly said in a recent note to clients. "This expectation that AI hype will increase and that Treasury yields will fall underpins our long-standing forecast for the S&p 500 to hit 6,500 by end-2025."

Some stock market prognosticators have warned the AI-fueled rally will fizzle, or worse, that it's bound to end in a painful bursting of a bubble that has been inflating for the last two years. Investors may be overly focused on a select group of AI players, a classic hallmark of a stock market bubble , Wall Street strategists have warned.

But narrow stock market rallies have the potential to last years, Reilly said, suggesting the stock market run-up could continue for now.

"In any case, we don't expect this narrowing to persist. The dotcom bubble highlighted how hard it is to identify the beneficiaries of a new technology ex-ante. And while investors have been convinced that NVIDIA (the dominant provider of chips) and other big-tech firms were likely winners, we remain at the early stages of the AI revolution and, in our view, of the bubble."

Warnings of a market bubble have proliferated as the S&P 500 notched a series of record highs this year. By some valuation metrics, stocks look to be at their most overvalued since 1929, one elite investor recently warned, speculating that a stock crash of around 50%-70% isn't out of the question.

Capital Economics has also warned of a stock market correction akin to the 1929 and dot-com crashes, which could begin in early 2026.

Read the original article on Business Insider

The AI Revolution Is Already Losing Steam

Nvidia reported eye-popping revenue last week. Elon Musk just said human-level artificial intelligence is coming next year. Big tech can’t seem to buy enough AI-powering chips. It sure seems like the AI hype train is just leaving the station, and we should all hop aboard.

But significant disappointment may be on the horizon, both in terms of what AI can do, and the returns it will generate for investors.

The rate of improvement for AIs is slowing, and there appear to be fewer applications than originally imagined for even the most capable of them. It is wildly expensive to build and run AI. New, competing AI models are popping up constantly, but it takes a long time for them to have a meaningful impact on how most people actually work.

These factors raise questions about whether AI could become commoditized, about its potential to produce revenue and especially profits, and whether a new economy is actually being born. They also suggest that spending on AI is probably getting ahead of itself in a way we last saw during the fiber-optic boom of the late 1990s—a boom that led to some of the biggest crashes of the first dot-com bubble.

The pace of improvement in AIs is slowing

Most of the measurable and qualitative improvements in today’s large language model AIs like OpenAI’s ChatGPT and Google’s Gemini—including their talents for writing and analysis—come down to shoving ever more data into them.

These models work by digesting huge volumes of text, and it’s undeniable that up to now, simply adding more has led to better capabilities. But a major barrier to continuing down this path is that companies have already trained their AIs on more or less the entire internet, and are running out of additional data to hoover up. There aren’t 10 more internets’ worth of human-generated content for today’s AIs to inhale.

To train next generation AIs, engineers are turning to “synthetic data,” which is data generated by other AIs. That approach didn’t work to create better self-driving technology for vehicles, and there is plenty of evidence it will be no better for large language models, says Gary Marcus, a cognitive scientist who sold an AI startup to Uber in 2016.

AIs like ChatGPT rapidly got better in their early days, but what we’ve seen in the past 14-and-a-half months are only incremental gains, says Marcus. “The truth is, the core capabilities of these systems have either reached a plateau, or at least have slowed down in their improvement,” he adds.

Further evidence of the slowdown in improvement of AIs can be found in research showing that the gaps between the performance of various AI models are closing. All of the best proprietary AI models are converging on about the same scores on tests of their abilities, and even free, open-source models, like those from Meta and Mistral, are catching up.

AI could become a commodity

A mature technology is one where everyone knows how to build it. Absent profound breakthroughs—which become exceedingly rare—no one has an edge in performance. At the same time, companies look for efficiencies, and whoever is winning shifts from who is in the lead to who can cut costs to the bone. The last major technology this happened with was electric vehicles, and now it appears to be happening to AI.

The commoditization of AI is one reason that Anshu Sharma, chief executive of data and AI-privacy startup Skyflow, and a former vice president at business-software giant Salesforce, thinks that the future for AI startups—like OpenAI and Anthropic—could be dim. While he’s optimistic that big companies like Microsoft and Google will be able to entice enough users to make their AI investments worthwhile, doing so will require spending vast amounts of money over a long period of time, leaving even the best-funded AI startups—with their comparatively paltry war chests—unable to compete.

This is happening already. Some AI startups have already run into turmoil, including Inflection AI—its co-founder and other employees decamped for Microsoft in March. The CEO of Stability AI, which built the popular image-generation AI tool Stable Diffusion, left abruptly in March. Many other AI startups, even well-funded ones, are apparently in talks to sell themselves.

Today’s AIs remain ruinously expensive to run

An oft-cited figure in arguments that we’re in an AI bubble is a calculation by Silicon Valley venture-capital firm Sequoia that the industry spent $50 billion on chips from Nvidia to train AI in 2023, but brought in only $3 billion in revenue.

That difference is alarming, but what really matters to the long-term health of the industry is how much it costs to run AIs.

Numbers are almost impossible to come by, and estimates vary widely, but the bottom line is that for a popular service that relies on generative AI, the costs of running it far exceed the already eye-watering cost of training it. That’s because AI has to think anew every single time something is asked of it, and the resources that AI uses when it generates an answer are far larger than what it takes to, say, return a conventional search result. For an almost entirely ad-supported company like Google, which is now offering AI-generated summaries across billions of search results, analysts believe delivering AI answers on those searches will eat into the company’s margins.
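A back-of-envelope calculation shows why serving costs can dwarf training costs for a heavily used generative service. Every figure in the sketch below is hypothetical, chosen only to illustrate the shape of the arithmetic, since, as noted above, real numbers are scarce and estimates vary widely: training is paid once, while inference is paid on every query.

```python
# Hypothetical, order-of-magnitude numbers only -- not actual figures for any company.
train_cost = 100_000_000        # one-off cost to train the model, in dollars (hypothetical)
cost_per_query = 0.005          # marginal compute cost of answering one query (hypothetical)
queries_per_day = 500_000_000   # query volume for a popular, search-scale service (hypothetical)

daily_inference = cost_per_query * queries_per_day
print(f"Inference per day:  ${daily_inference:,.0f}")        # $2,500,000
print(f"Inference per year: ${daily_inference * 365:,.0f}")  # ~$912,500,000
print(f"Days of serving needed to exceed the training bill: {train_cost / daily_inference:.0f}")  # 40
```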

In their most recent earnings reports, Google, Microsoft and others said their revenue from cloud services went up, which they attributed in part to those services powering other companies’ AIs. But sustaining that revenue depends on other companies and startups getting enough value out of AI to justify continuing to fork over billions of dollars to train and run those systems. That brings us to the question of adoption.

Narrow use cases, slow adoption

A recent survey conducted by Microsoft and LinkedIn found that three in four white-collar workers now use AI at work. Another survey, from corporate expense-management and tracking company Ramp, shows about a third of companies pay for at least one AI tool, up from 21% a year ago.

This suggests there is a massive gulf between the number of workers who are just playing with AI, and the subset who rely on it and pay for it. Microsoft’s AI Copilot, for example, costs $30 a month.

OpenAI doesn’t disclose its annual revenue, but the Financial Times reported in December that it was at least $2 billion, and that the company thought it could double that amount by 2025.

That is still a far cry from the revenue needed to justify OpenAI’s now nearly $90 billion valuation. The company’s recent demo of its voice-powered features led to a 22% one-day jump in mobile subscriptions, according to analytics firm Appfigures. This shows the company excels at generating interest and attention, but it’s unclear how many of those users will stick around.

Evidence suggests AI isn’t nearly the productivity booster it has been touted as, says Peter Cappelli, a professor of management at the University of Pennsylvania’s Wharton School. While these systems can help some people do their jobs, they can’t actually replace them. This means they are unlikely to help companies save on payroll. He compares it to the way that self-driving trucks have been slow to arrive, in part because it turns out that driving a truck is just one part of a truck driver’s job.

Add in the myriad challenges of using AI at work. For example, AIs still make up fake information, which means they require someone knowledgeable to use them. Also, getting the most out of open-ended chatbots isn’t intuitive, and workers will need significant training and time to adjust.

Changing people’s mindsets and habits will be among the biggest barriers to swift adoption of AI. That is a remarkably consistent pattern across the rollout of all new technologies.

None of this is to say that today’s AI won’t, in the long run, transform all sorts of jobs and industries. The problem is that the current level of investment—in startups and by big companies—seems to be predicated on the idea that AI is going to get so much better, so fast, and be adopted so quickly that its impact on our lives and the economy is hard to comprehend.

Mounting evidence suggests that won’t be the case.


IMAGES

  1. Gartner Mentions Datatron in Three Hype Cycles in 2021

    big data research hype or revolution

  2. Gartner hype cycle for data management.

    big data research hype or revolution

  3. All aboard the Hype Cycle! What's DataOps? Well, it has no standards or

    big data research hype or revolution

  4. QS906

    big data research hype or revolution

  5. Big Data Overview

    big data research hype or revolution

  6. Big Data: Big Hype or Big Possibilities?

    big data research hype or revolution

VIDEO

  1. IS IT WORTH THE HYPE? FINALLY TRYING THE REVOLUTION PRO MIRACLE CREAM HONEST REVIEW

  2. Navigating the Generative AI Revolution: From Hype to Impact

  3. Exploring the Future: The Technological Revolution #computer #automation #technology #innovation

  4. AI hype drives valuations higher as Anthropic looks to raise funding

  5. Behind the Hype

  6. Nvidia Stock Soaring: AI Revolution or Hype?

COMMENTS

  1. QS906

    Module Outline. Big data is said to be transforming science and social science. In this module, you will critically engage with this claim and explore the ways in which the rapid rise of big data impacts on research processes and practices in a growing range of disciplinary areas and fields of study. In particular, the module considers the ...

  2. PDF Big Data

    Big Data - Hype or Revolution? Rob Kitchin IntroductIon The etymology of 'big data' can be traced to the mid-1990s, first used to refer to the han-dling and analysis of massive datasets (Diebold, 2012). Laney (2001) refined the definition to refer to data characterized by the now standard 3Vs, with big data being:

  3. IM952 (QS906): Big Data Research: Hype or Revolution?

    Module Description. Big data is said to be transforming science and social science. In this module, you will critically engage with this claim and explore the ways in which the rapid rise of big data impacts on research processes and practices in a growing range of disciplinary areas and fields of study. In particular, the module considers the ...

  4. IM914: Big Data Research: Hype or Revolution?

    You will also examine how we might we use big data research both as a way to resist and/or shape global transformations, how big data might impact on the future of social science, and what challenges lie ahead for social science research given the impact of big data.

  5. Accounting, accountability, social media and big data: revolution or hype?

    The intention is not to offer a comprehensive review, but to stimulate and conversation.,The authors review several existing studies exploring technology-enabled networks and highlight some of the key aspects featuring social media and big data, before offering a classification of existing research efforts, as well as opportunities for future ...

  6. Accounting, accountability, social media and big data: Revolution or hype?

    Abstract. Purpose The purpose of this paper is to outline an agenda for researching the relationship between technology-enabled networks - such as social media and big data - and the ...

  7. PDF Revolution in Data

    Big Data reached the peak of its hype cycle in 2014, and then outgrew its own position - in 2015, Forrester dropped Big Data off of its annual hype cycle for emerging technologies report, their lead analyst, Betsy Burton, explaining that "Big Data has become a part of many hype cycles."1 Illustrating this fact,

  4. What's holding up the big data revolution in healthcare?

    Points to poor data quality, incompatible datasets, inadequate expertise and hype. Big data here refers to datasets that are too large or complex to analyse with traditional methods; instead, analysts rely on machine learning, that is, self-updating algorithms that build predictive models by finding patterns in data. The article notes that a "big data revolution" in healthcare has been promised repeatedly.

  5. Kitchin, R., The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences

    A thoroughly updated, accessible and comprehensive introduction to thinking conceptually about the nature of data and the field of critical data studies, examining data assemblages and the consequences of the twenty-first-century data deluge. Reviewers note that it carefully distinguishes between big data and open data and deconstructs the hype around the "data revolution".

  6. Big Data: The Management Revolution (Harvard Business Review)

    Argues that the differences between big data and conventional analytics are a matter of volume, velocity and variety: more data now cross the internet every second than were stored in the entire internet 20 years ago.

  7. Awaiting the Second Big Data Revolution: From Digital Noise ...

    "Big data", the collection of vast quantities of data about individual behaviour via online, mobile and other data-driven services, has been heralded as the agent of a third industrial revolution, one with raw materials measured in bits rather than tons of steel or barrels of oil; yet the industrial revolution transformed more than just how firms made things.

  8. Harrison, G., "Hadoop and the Big Data Revolution" (10 October 2012)

    Observes that it is in the nature of hype bubbles to obscure important new paradigms behind a cloud of excitement and exaggerated claims, and that the phrase "big data" has been so widely and poorly applied that the term has become almost meaningless.
