The social data revolution is the shift in human communication patterns towards increased personal information sharing and its related implications, made possible by the rise of social networks in the early 2000s. This phenomenon has resulted in the accumulation of unprecedented amounts of public data.
This large and frequently updated data source has been described as a new type of scientific instrument for the social sciences.
Several independent researchers have used social data to "nowcast" and forecast trends such as unemployment, flu outbreaks, the mood of whole populations, travel spending and political opinions in a way that is faster, more accurate and cheaper than standard government reports or Gallup polls.
Social data refers to data that individuals create and knowingly and voluntarily share. Cost and overhead previously rendered this semi-public form of communication unfeasible, but advances in social networking technology from 2004–2010 have made broader concepts of sharing possible. The types of data users are sharing include geolocation, medical data, dating preferences, open thoughts, interesting news articles, etc.
The social data revolution not only enables new business models, like those of Amazon.com, but also provides large opportunities to improve decision-making for public policy and international development.
The analysis of large amounts of social data has given rise to the field of computational social science. Classic examples include the study of media content or social media content.
Evolution of social data
Every internet activity leaves behind traces of data (a digital footprint), which can be used to learn more about the user.
As internet use becomes more widespread, the datafication of the world is progressing rapidly: around 16 zettabytes of data are currently produced per year, and 163 zettabytes are expected for the year 2025.
This has led to data becoming a critical commodity.
This ties together all societal actors: public institutions, private firms and individuals, each of which relies on data in its own way.
Governments have been collecting data for centuries to ensure the continuance of institutional systems: limiting the risk of credit defaults, collecting taxes based on income, and providing the necessary infrastructure in line with their citizens' demographic distribution.
Initially, this data consisted of written information for record keeping and control, including a census system.
This analogue process was very time- and cost-intensive, leaving little room for interpreting larger data sets.
Since then, corporate technological developments have moved this offline data into the digital age, enabling visualization and data analytics.
In the public sphere, combining survey and polling methodologies with database computing made it possible to gather and store large data sets on individuals.
Web 2.0 and social network sites
Over the last few decades, the internet has shifted from being used mostly as a source of information about the world to being used primarily for communication, user-generated content, data sharing, and community building.
[Fuchs, Christian. 2011. "Web 2.0, Prosumption, and Surveillance." ''Surveillance & Society'' 8(3): 288-309.] Many consider this to be the development of "Web 2.0". Social network sites such as Facebook and YouTube are the foundation of Web 2.0 and of the shift to social data sharing.
Early examples of social data websites are Craigslist and the wishlists of Amazon.com. Both enable users to communicate information to anybody who is looking for it. They differ in their approach to identity: Craigslist leverages the power of anonymity, while Amazon.com leverages the power of persistent identity, based on the customer's history with the firm. The job market is even being shaped by the information people share about themselves on sites like LinkedIn and Facebook.
Examples of more sophisticated social data sites are Twitter and Facebook. On Twitter, sending a message, or tweet, is as simple as sending an SMS text message. Twitter made this C2W, customer to the world: any tweet a user sends can potentially be read by the entire world. Facebook focuses on interactions between friends, C2C in traditional language. It provides many ways of collecting data from its users: they can "tag" a friend in a photo, "comment" on what a friend posted, or just "like" it. These data are the basis for sophisticated models of the relationships between users, which can be used to significantly increase the relevance of what is shown to the user and for advertising purposes.
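The article does not say how Facebook builds its relationship models. As a purely illustrative sketch, the idea that tags, comments and likes feed a model of relationships can be shown by scoring each pair of users with a weighted count of their interactions. The weights, the `(actor, target, kind)` event format and the function names below are hypothetical assumptions, not any platform's actual method.

```python
from collections import Counter

# Hypothetical weights: assumes a tag signals a stronger tie than a comment,
# and a comment a stronger tie than a like. Not taken from any real platform.
INTERACTION_WEIGHTS = {"tag": 3.0, "comment": 2.0, "like": 1.0}


def tie_strengths(interactions):
    """Score user pairs from (actor, target, kind) interaction events.

    Ties are treated as undirected, so ("ana", "ben") and ("ben", "ana")
    add to the same pair. Unknown interaction kinds contribute 0.
    """
    scores = Counter()
    for actor, target, kind in interactions:
        pair = tuple(sorted((actor, target)))
        scores[pair] += INTERACTION_WEIGHTS.get(kind, 0.0)
    return scores


def rank_friends(scores, user):
    """Return (contact, score) tuples for `user`, strongest tie first."""
    ties = [(b if a == user else a, s) for (a, b), s in scores.items() if user in (a, b)]
    return sorted(ties, key=lambda t: t[1], reverse=True)


if __name__ == "__main__":
    events = [
        ("ana", "ben", "like"),
        ("ben", "ana", "comment"),
        ("ana", "cy", "like"),
        ("ana", "ben", "tag"),
    ]
    print(rank_friends(tie_strengths(events), "ana"))
```

A ranking like this is one simple way such a score could raise relevance, for example by showing posts from strongly tied contacts first.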
By 2009, the popularity of social networking sites had risen to four times its 2005 level. As of 2013, Twitter had over 250 million users sharing almost 500 million tweets per day, and Facebook had well over one billion users around the world.
Business sector and social data
Companies often use data that is shared via social networking sites and other data-sharing avenues, such as advertisers.
[Jai, Tun-Min, and King, Nancy J. 2016. "Privacy versus reward: Do loyalty programs increase consumers' willingness to share personal information with third-party advertisers and data brokers?" ''Journal of Retailing and Consumer Services'' 28: 296-303.] Social networking sites, for example, can sell user data to advertisers and other entities, which can then use it to influence consumer decisions.
Data mining is also used to gather this information.
While websites and other applications were the origins of this data collection, improvements in technology mean that many devices used in daily life (e.g. smartphones, smartwatches, music devices) can now collect data on individuals, further increasing the amount of personal data available.
[Morey, Timothy, Forbath, Theodore, and Schoop, Allison. 2015. "Customer data: designing for transparency and trust." ''Harvard Business Review'' 93(5): 96-105]
This growing digital identity of people (the information available via these electronic sources) is being used by companies and organizations to improve products and services and to reduce costs by targeting what consumers want or expect.
The data that can be gathered can include shopping experiences, social media preferences, demographic information and more.
Using this data allows better personalization of products, which has become an expected and vital aspect of product use and production.
The data that is accessible about consumers can be used to infer behavioral patterns of consumers.
[Smith, Natasha. 2015. "The datafication of marketing." ''DM News:'' 16+. Retrieved from http://go.galegroup.com/] For example, location information is used to assess when and where consumers are going, so that ads and promotions can be targeted based on the stores they visit.
Online retailers have also gained insight into how to better personalize the online shopping experience through data gathered during online transactions.
Businesses can even use consumer data to determine whether different shelf spacing of products affects consumer purchasing decisions, and to assess cross-item marketing potential based on items often purchased together.
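Finding items "often purchased together" usually starts with a simple co-occurrence count, the first step of market-basket analysis. A minimal sketch, assuming each transaction is available as a list of item names; the sample baskets and the function names `co_purchase_counts` and `pair_confidence` are hypothetical, not from the source.

```python
from collections import Counter
from itertools import combinations


def co_purchase_counts(baskets):
    """Count how often each unordered pair of items appears in the same basket."""
    counts = Counter()
    for basket in baskets:
        # set() ignores repeated items within one basket;
        # sorted() gives each pair a canonical order, e.g. ("bread", "butter").
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


def pair_confidence(counts, baskets, a, b):
    """Share of baskets containing item a that also contain item b.

    This is the 'confidence' of the association rule a -> b.
    """
    with_a = sum(1 for basket in baskets if a in basket)
    pair = tuple(sorted((a, b)))
    return counts[pair] / with_a if with_a else 0.0


if __name__ == "__main__":
    baskets = [
        ["bread", "butter", "milk"],
        ["bread", "butter"],
        ["bread", "jam"],
        ["milk"],
    ]
    counts = co_purchase_counts(baskets)
    print(counts.most_common(1))
    print(pair_confidence(counts, baskets, "bread", "butter"))
```

Raw pair counts favour popular items, so in practice a ratio such as confidence (or lift) is typically used to decide which pairs are worth cross-promoting.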
Social commerce
While businesses and advertisers often take advantage of the consumer data available, consumers also use other users' information for their purchase decisions.
Social commerce sites are sites where consumers share experiences, opinions and other information about products and services.
[Liu, Libo, Cheung, Christy M.K., and Lee, Matthew K.O. 2016. "An empirical investigation of information sharing behavior on social commerce sites." ''International Journal of Information Management'' 36(5): 686-699.] A famous example of such a site is Pinterest, which has over 100 million users.
These sites and other online sources of product and brand information influence consumers' purchasing decisions. It is estimated that about 67% of online customers use this information when making purchase decisions.
These sites create an environment that consumers consider trustworthy, since the information comes from other consumers.
Other uses of social data
With the vast amount of accessible data about individuals, the potential uses of this information are growing.
The healthcare sector has many potential uses for this data. Information gathered from social media and other social data sharing sources can be used to predict flu and other disease outbreaks, to assess how emergency responses are handled, and more.
[Nguyen, Duc T., and Jung, Jai E. 2016. "Real-time event detection for online behavioral analysis of big social data." ''Future Generation Computer Systems'' 66: 137-145.] With the use of Twitter and geotags, medical researchers can evaluate the health of a particular neighborhood and use that information to provide better outreach and services.
Medtronic has developed a digital blood glucose meter that lets health care providers and patients know when blood glucose levels are low.
Social data can also be used to assess reactions to crises.
[Spence, Patric R., Lachlan, Kenneth A., and Rainear, Adam M. 2016. "Social media and crisis research: Data collection and directions." ''Computers in Human Behavior'' 54: 667-672.] After Hurricane Sandy, researchers used Twitter to evaluate the emotions and issues that those affected were facing.
This information can potentially be used to help prepare for and respond to future crises.
Social data can also assist with urban planning. The city of Boston has used rider information from Uber to improve transportation planning and road maintenance.
Computational social science
Using social data for research purposes has led to the development of computational social science. Computational social science combines social science, computer science, and network science. This field emerged in 2009.
[Mann, A. 2016. "Core concept: computational social science." ''PNAS'' 113(3): 468-470. doi: 10.1073/pnas.1524881113] Before the rise of social data and the technological advances that supported it, researchers were limited to a narrow view of individuals, since their research relied primarily on interviews.
With the vast amount of social data available today, researchers can analyze a wider group and obtain a broader view. They can draw on social networks and cell phone data and run online experiments that yield more information than before.
Privacy concerns
With so much data about individuals accessible to many sources, privacy has become a major concern. Security breaches of customer and other social information, such as the compromise of more than 56 million Home Depot customers' credit card information, have heightened concern about the privacy of social data. How companies use, and potentially misuse, the personal information they gather is a concern for the majority of consumers.
Despite this, many people do not know how social networking sites and other sources are using and selling their data. In a 2014 study, only 25% of online users knew that their location could be accessed, and only 14% knew that their web-surfing history could be accessed and shared.
Even though privacy concern is a critical factor in people's sharing of personal information on the internet and in their overall internet involvement, most people are willing to share this information if the benefits of doing so outweigh the potential privacy and security costs.
Consumers enjoy the personalization of products and services made possible by this information gathering and, despite their concerns, continue to use them.
International development
In his study of the data revolution in international development, Martin Hilbert, Social Sciences Professor at UC Davis, argued that the natural next step after the information societies fueled by ICT since the late 1990s is knowledge societies informed by big data analysis. Decision-making informed by big data analysis has improved both efficiency and productivity in the developed world. Hilbert examines the challenges and potential of the data revolution in "the unruly world of international development."
Types of data
Hilbert identified four types of data available in large quantities by 2013: words, locations, nature, and behavior.
Words
Individual interactions with the internet, such as words in comments, social media postings, and Google search term volumes, offer an increasingly large source of big data. Typically, statistics are generated through a census or a probability survey, for example, the Annual Social and Economic Supplement (ASEC) to the Current Population Survey (CPS), the American Community Survey (ACS), and the National Health Interview Survey (NHIS) in the United States, or from administrative records, such as payroll, unemployment, Social Security, income taxes, scanner data, credit card data and other commercial transaction records.
Weatherhead University Professor Gary King described how the revolution lies not just in the quantity of data available but in the ability to do something with the data to benefit society.
Location
Global Positioning System (GPS)-enabled mobile tablets and phones, radio-frequency identification (RFID) chips (part of automatic identification and data capture (AIDC) technologies), telematics, location-based games, etc. provide data on absolute location and relative movement.
Nature
Hilbert categorizes data on natural processes under 'Nature', which includes data from sensors measuring air moisture and temperature.
Behavior
Data can be generated from user behavior in multiplayer online games, such as ''League of Legends'', ''World of Warcraft'', ''Minecraft'', ''Call of Duty'', and ''Dota 2''.
Nathan Eagle, a computer scientist at the Santa Fe Institute in New Mexico, began using cellphones in the early 2000s to collect accurate, large-scale data about real social interactions. [Reality mining whitepaper] The project was named one of the "10 Technologies Most Likely To Change The Way We Live" by the ''MIT Technology Review''. [Eagle's Harvard Biography]
See also
* Big data
* Digital Revolution
* Open data
* Recommendation engine
* Reputation system
* Social bot
* Social capital
* Social cloud computing
* Social data analysis
* Social graph
* Social profiling
* Social technology