social media mining

Social media mining is the process of obtaining data from user-generated content on social media in order to extract actionable patterns, form conclusions about users, and act upon the information. Mining supports targeted advertising and academic research. The term is an analogy to the process of mining for minerals. Mining companies sift through raw ore to find valuable minerals. Likewise, social media mining sifts through social media data to discern patterns and trends in matters such as social media usage, online behaviour, content sharing, connections between individuals, and buying behaviour. These patterns and trends are of interest to companies, governments and not-for-profit organizations, which can use the analyses to design strategies and to introduce programs, products, processes or services.

Social media mining uses concepts from computer science, data mining, machine learning, and statistics. It is also based on social network analysis, network science, sociology, ethnography, optimization and mathematics. It attempts to formally represent, measure and model patterns from social media data. In the 2010s, major corporations, governments and not-for-profit organizations began mining to learn about customers, clients and others. Platforms such as Google and Facebook (partnered with Datalogix and BlueKai) conduct mining to target users with advertising. Scientists and machine learning researchers extract insights and design product features.

Users may not understand how platforms use their data. Users tend to click through Terms of Use agreements without reading them, which raises ethical questions about whether platforms adequately protect users' privacy. During the 2016 United States presidential election, Facebook allowed Cambridge Analytica, a political consulting firm linked to the Trump campaign, to analyze the data of an estimated 87 million Facebook users to profile voters. This created controversy when it was revealed.


Background

As defined by Kaplan and Haenlein, social media is the "group of internet-based applications that build on the ideological and technological foundations of Web 2.0, and that allow the creation and exchange of user-generated content."

There are many categories of social media including, but not limited to:
* social networking (Facebook or LinkedIn)
* microblogging (Twitter)
* photo sharing (Flickr, Instagram, Photobucket, or Picasa)
* news aggregation (Google Reader, StumbleUpon, or Feedburner)
* video sharing (YouTube, MetaCafe)
* livecasting (Ustream or Twitch)
* virtual worlds (Kaneva)
* social gaming (World of Warcraft)
* social search (Google, Bing, or Ask.com)
* instant messaging (Google Talk, Skype, or Yahoo! Messenger)

The first social media website was introduced by GeoCities in 1994. It enabled users to create their own homepages without sophisticated knowledge of HTML. The first social networking site, SixDegrees.com, was introduced in 1997. Since then, many other social media sites have been introduced, each serving millions of people.

These individuals form a virtual world in which three kinds of things coexist:
* individuals (social atoms)
* entities (content, sites, etc.)
* interactions (between individuals, between entities, and between individuals and entities)

Social norms and human behavior govern this virtual world. One can systematically analyze and mine social media by understanding these social norms and models of human behavior. These are combined with observations and measurements of this virtual world.

Social media mining is the process of representing, analyzing, and extracting meaningful patterns from social media data that result from social interactions. It is an interdisciplinary field encompassing techniques from computer science, data mining, machine learning, social network analysis, network science, sociology, ethnography, statistics, optimization, and mathematics. Social media mining faces grand challenges such as the big data paradox, obtaining sufficient samples, the noise removal fallacy, and the evaluation dilemma. Social media mining represents the virtual world of social media in a computable way, measures it, and designs models that can help us understand its interactions. In addition, it provides the tools needed to:
* mine this world for interesting patterns
* analyze information diffusion
* study influence and homophily
* provide effective recommendations
* analyze novel social behavior in social media


Uses

Social media mining is used across several industries, including business development, social science research, health services, and education. [Zafarani, R., Abbasi, M. A., Liu, H. (2014). Social Media Mining. Cambridge University Press. http://dmml.asu.edu/smm] Once the collected data has gone through social media analytics, it can be applied in these fields.

Companies often use the patterns of connectivity that pervade social networks. These include assortativity (the social similarity between users, induced by influence and homophily), reciprocity, and transitivity. [Tang, J., Chang, Y., Aggarwal, C., Liu, H. (2016). "A Survey of Signed Network Mining in Social Media". ACM Computing Surveys, 49: 3.] These forces are then measured through statistical analysis of the nodes and the connections between them.

Social analytics also uses sentiment analysis, because social media users often express positive or negative sentiment in their posts. [Adedoyin-Olowe, M., Gaber, M., Stahl, F. (2013). "A Survey of Data Mining Techniques for Social Media Analysis".] This provides information about users' emotions on specific topics.

These patterns have several uses beyond pure analysis. For example, influence can be used to determine the most influential user in a particular network. Companies are interested in this information when deciding whom to hire for influencer marketing. Influential users can be identified by three properties, each measurable from the data mined from these sites: recognition, activity generation, and novelty.

Analysts also value measures of homophily, the tendency of similar individuals to become friends. Users have begun to rely on other users' opinions to understand diverse subject matter. These analyses can also produce recommendations tailored to individuals. By measuring influence and homophily, online and offline companies can suggest specific products to individual consumers and to groups of consumers. Social media networks can also use this information themselves to suggest possible friends to add, pages to follow, and accounts to interact with.
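The network measures and friend suggestions described above can be sketched in plain Python. This is a minimal sketch under several assumptions. The follower graph is a small hypothetical example stored as a dict of sets, and the function names are illustrative rather than taken from any platform or library.

The sketch computes three things:
* reciprocity: the fraction of directed links that are returned
* transitivity: the global clustering coefficient of the undirected graph
* a common-neighbors friend suggestion: one simple way to turn transitivity ("friends of friends become friends") into recommendations, not the method any particular network uses

```python
from itertools import combinations

# Hypothetical directed "follows" graph: user -> set of users they follow.
FOLLOWS = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "carol"},
    "carol": {"alice"},
    "dave": {"carol", "erin"},
    "erin": {"dave"},
}


def reciprocity(g):
    """Fraction of directed edges u->v for which v->u also exists."""
    edges = [(u, v) for u, outs in g.items() for v in outs]
    if not edges:
        return 0.0
    mutual = sum(1 for u, v in edges if u in g.get(v, set()))
    return mutual / len(edges)


def to_undirected(g):
    """Symmetrize: u and v are neighbors if either one follows the other."""
    und = {}
    for u, outs in g.items():
        for v in outs:
            if u == v:
                continue  # ignore self-loops
            und.setdefault(u, set()).add(v)
            und.setdefault(v, set()).add(u)
    return und


def transitivity(und):
    """Global clustering coefficient: closed triplets / all connected triplets."""
    closed = triplets = 0
    for nbrs in und.values():
        # Every pair of neighbors of a node forms a triplet centred on it.
        for a, b in combinations(nbrs, 2):
            triplets += 1
            if b in und[a]:
                closed += 1
    return closed / triplets if triplets else 0.0


def recommend_friends(und, user, k=3):
    """Rank non-neighbors of `user` by how many neighbors they share with them."""
    scores = {}
    for nbr in und.get(user, set()):
        for candidate in und[nbr]:
            if candidate != user and candidate not in und[user]:
                scores[candidate] = scores.get(candidate, 0) + 1
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    und = to_undirected(FOLLOWS)
    print(reciprocity(FOLLOWS))            # 0.75 (6 of 8 links returned)
    print(transitivity(und))               # 0.5
    print(recommend_friends(und, "dave"))  # [('alice', 1), ('bob', 1)]
```

Common-neighbor counting is only one link-prediction heuristic. Weighted variants such as Adamic–Adar follow the same pattern but down-weight shared neighbors that have many connections.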


Perception

Modern social media mining is a controversial practice that has contributed to rapid user growth at technology companies such as Facebook, Inc., Twitter, and Google. Companies such as these are considered "Big Tech". They build algorithms that use user input to learn users' preferences and keep them on the platform as long as possible. These inputs, which can be as simple as the time spent on a given screen, provide the data being mined. Companies profit heavily from using that data to make highly accurate predictions about user behavior. The growth of platforms accelerated rapidly once these strategies were put in place. As of 2021, most of the largest platforms averaged over 1 billion monthly active users.

Critics of such algorithms, such as Tristan Harris and Chamath Palihapitiya, have claimed that certain companies (specifically Facebook) valued growth above all else. They argue these companies ignored the potential negative impacts of their growth-engineering tactics.

At the same time, users have created their own form of data arbitrage, using their own data to monetize content and to become influencers. Users typically have access to a varied set of analytics about the people who interact with them on social media. They can use these as building blocks for their own targeting and growth strategies, through ads and posts that cater to their audiences. Influencers also commonly promote products and services for established brands, creating one of the largest digital industries: influencer marketing. Instagram, Facebook, Twitter, YouTube, Google, and others have long given access to platform analytics. They have also allowed third parties to access that information, at times without the knowledge of the users whose data is being viewed or bought.


Research


Research areas

* Social media event detection – Social networks enable users to freely communicate with each other and share their recent news, ongoing activities, or views on different topics. As a result, they can be seen as a potentially viable source of information for understanding currently emerging topics and events.
* Public health monitoring and surveillance – Using large-scale analysis of social media to study large cohorts of patients and the general public. Examples include obtaining early warning signals of drug–drug interactions and adverse drug reactions, or understanding human reproduction and sexual interest.
* Community structure (community detection, evolution, and evaluation) – Identifying communities on social networks, studying how they evolve, and evaluating identified communities, often without ground truth.
* Network measures – Measuring centrality, transitivity, reciprocity, balance, status, and similarity in social media.
* Network models – Simulating networks with specific characteristics. Examples include random graphs (Erdős–Rényi models), preferential attachment models, and small-world models.
* Information cascades – Analyzing how information propagates in social media sites. Examples include herd behavior, information cascades, diffusion of innovations, and epidemic models.
* Influence and homophily – Measuring network assortativity, and measuring and modeling influence and homophily.
* Recommendation in social media – Recommending friends or items on social media sites.
* Social search – Searching for information on the social web.
* Sentiment analysis in social media – Identifying subjective information, e.g. positive and negative sentiment, in social media data.
* Social spammer detection – Detecting social spammers, who send unwanted spam content to targeted users. This happens on social networks and on other websites with user-generated content. Spammers often coordinate to boost their social influence, legitimacy, and credibility.
* Feature selection with social media data – Adapting feature selection to harness the power of social media data.
* Trust in social media – Studying and understanding trust in social media.
* Distrust and negative links – Exploring negative links in social media.
* Role of social media in crises – Social media, particularly Twitter, plays an important role during crises. Studies show that it is possible to detect earthquakes and rumors using tweets published during a crisis. Developing tools that help first responders analyze tweets for better crisis response is an active area of research. So is developing techniques that give them faster access to relevant tweets.
* Location-based social network mining – Mining human mobility for personalized point-of-interest (POI) recommendation on location-based social networks.
* Provenance of information in social media – Provenance informs a user about the sources of a given piece of information. Social media can help identify the provenance of information due to its unique features: user-generated content, user profiles, user interactions, and spatial or temporal information.
* Vulnerability management – A user's vulnerability on social networking sites can be managed in three sequential steps:
  (1) identifying new ways in which a user can be vulnerable
  (2) quantifying or measuring the user's vulnerability
  (3) reducing or mitigating it
* Opinion mining on candidates and parties – Social media is a popular medium for candidates and parties to campaign and for gauging public reaction to campaigns. Social media can also be used as an indicator of voters' opinions. Some research studies have shown that predictions made using social media posts can match (or even improve on) traditional opinion polls.
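Two of the research areas above, network models and information cascades, can be combined in a small simulation. The sketch below makes two simplifying assumptions:
* an undirected graph stored as a dict of sets
* a single uniform activation probability for every edge

It generates an Erdős–Rényi G(n, p) random graph, then runs the independent cascade model, one standard epidemic-style diffusion model, from a set of seed users. The function names are hypothetical. Graph libraries such as NetworkX provide tested random-graph generators for real use.

```python
import random


def erdos_renyi(n, p_edge, rng):
    """Erdős–Rényi G(n, p): keep each possible undirected edge independently with probability p_edge."""
    adj = {i: set() for i in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p_edge:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def independent_cascade(adj, seeds, p_activate, rng):
    """Independent cascade model of information diffusion.

    Each node, in the round after it becomes active, gets a single chance to
    activate each still-inactive neighbor with probability p_activate.
    Returns the set of all nodes that end up active.
    """
    active = set(seeds)
    frontier = list(active)
    while frontier:
        newly_active = []
        for u in frontier:
            for v in adj[u]:
                if v not in active and rng.random() < p_activate:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active  # only new activations spread in the next round
    return active


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so runs are reproducible
    graph = erdos_renyi(200, 0.02, rng)
    reached = independent_cascade(graph, seeds=[0], p_activate=0.1, rng=rng)
    print(f"{len(reached)} of {len(graph)} users reached")
```

A single run is one random outcome. Averaging `len(reached)` over many runs estimates the expected spread of a seed set, which is the quantity that influence-maximization research tries to maximize.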


Publication venues

Social media mining research articles are published in computer science, social science, and data mining conferences and journals:


Conferences

Conference papers can be found in the proceedings of Knowledge Discovery and Data Mining (KDD), World Wide Web (WWW), Association for Computational Linguistics (ACL), Conference on Information and Knowledge Management (CIKM), International Conference on Data Mining (ICDM), and the Internet Measurement Conference (IMC).
* KDD Conference – ACM SIGKDD Conference on Knowledge Discovery and Data Mining
* WWW Conference – International World Wide Web Conference
* WSDM Conference – ACM Conference on Web Search and Data Mining
* CIKM Conference – ACM Conference on Information and Knowledge Management
* ICDM Conference – IEEE International Conference on Data Mining
* Association for Computational Linguistics (ACL)
* ASONAM Conference – IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
* Internet Measurement Conference (IMC)
* International Conference on Web and Social Media (ICWSM)
* International Conference on Social Media & Society
* International Conference on Web Engineering (ICWE)
* European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD)
* International Joint Conferences on Artificial Intelligence (IJCAI)
* Association for the Advancement of Artificial Intelligence (AAAI)
* Recommender Systems (RecSys)
* Computer-Human Interaction (CHI)
* Social Computing, Behavioral-Cultural Modeling and Prediction (SBP)
* HT Conference – ACM Conference on Hypertext
* SDM Conference – SIAM International Conference on Data Mining
* PAKDD Conference – Pacific-Asia Conference on Knowledge Discovery and Data Mining


Journals

* DMKD Conference – Research Issues on Data Mining and Knowledge Discovery
* ECML-PKDD Conference – European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
* IEEE Transactions on Knowledge and Data Engineering (TKDE)
* ACM Transactions on Knowledge Discovery from Data (TKDD)
* ACM Transactions on Intelligent Systems and Technology (TIST)
* Social Network Analysis and Mining (SNAM)
* Knowledge and Information Systems (KAIS)
* ACM Transactions on the Web (TWEB)
* World Wide Web Journal
* Social Networks
* Internet Mathematics
* IEEE Intelligent Systems
* SIGKDD Explorations

Social media mining research also appears at data management and database conferences such as ICDE, SIGMOD, and the International Conference on Very Large Data Bases (VLDB).


See also

; Methods
* Social media measurement
* Text mining
; Application domains
* Web mining
* Twitter mining
; Companies
* NUVI
; Related topics
* Social media
* Profiling (information science)
* Web scraping
* GDPR


References


External links

* Zafarani, Reza; Abbasi, Mohammad Ali; and Liu, Huan (2014). Social Media Mining: An Introduction. Cambridge University Press.