"Social media buzz holds a wealth of insights, but NLP acts as the key. NLP analyzes posts, gauging sentiment, spotting trends, and recognizing entities. This empowers businesses to understand brand perception, track campaigns, and identify potential crises – all by transforming social media noise into actionable information."- Gemini 2024
The average time spent daily on social media in the U.S. is 2 hours and 16 minutes as of 2024 (statista.com). Social media has become a powerful, real-time pulse for understanding public opinion and current events. Natural Language Processing (NLP) techniques unlock the hidden intelligence within this massive amount of data. Essentially, NLP acts as a translator, transforming the cacophony of social media voices into clear and actionable information.
Using sentiment analysis, we can gauge whether a post is positive, negative, or neutral. Aggregated over many posts, this gives insight into public opinion on a specific topic, or into the general mood at any snapshot in time.
This example code snippet uses NLTK's VADER sentiment analyzer and a social media dataset from Hugging Face.
For more information on using Hugging Face data, see the "Loading Datasets" page in the Hugging Face docs.
```python
# pip install nltk
# pip install datasets
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from datasets import load_dataset

# VADER needs its lexicon, which is downloaded once
nltk.download('vader_lexicon', quiet=True)

# Previewing the first few items with "[:5]", omit this to load entire training set
dataset = load_dataset("AiresPucrs/sentiment-analysis", split='train[:5]')
print(dataset.features)
# >> {'text': Value(dtype='string', id=None), 'label': Value(dtype='int64', id=None)}

analyzer = SentimentIntensityAnalyzer()

for item in dataset:
    text = item['text']
    label = 'negative' if item['label'] == 0 else 'positive'
    scores = analyzer.polarity_scores(text)
    # scores is a dictionary with keys neg, neu, pos, compound
    # representing negative, neutral, and positive sentiment scores
    # and a normalized value in [-1, 1]

    # Here we convert the compound score to a sentiment using an arbitrary threshold
    compound = scores['compound']
    if compound > 0.5:
        result = "positive"
    elif compound < -0.5:
        result = "negative"
    else:
        result = "neutral"

    # Print first 50 chars of text with the dataset label & the VADER result
    print(f"{text[:50]}... {label} - {result}")
```
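To turn per-post results into a snapshot of overall opinion, the compound scores can be classified and tallied. The following is a minimal sketch rather than part of the original example: `classify_compound` and `summarize` are hypothetical helper names, the scores are made-up values standing in for `polarity_scores(text)['compound']`, and the 0.5 cutoff is the same arbitrary threshold used above.

```python
from collections import Counter

LABELS = ("positive", "negative", "neutral")

def classify_compound(compound, threshold=0.5):
    """Map a VADER-style compound score in [-1, 1] to a sentiment label."""
    if compound > threshold:
        return "positive"
    if compound < -threshold:
        return "negative"
    return "neutral"

def summarize(compounds, threshold=0.5):
    """Return the share of positive, negative, and neutral scores."""
    counts = Counter(classify_compound(c, threshold) for c in compounds)
    total = len(compounds)
    return {label: (counts[label] / total if total else 0.0) for label in LABELS}

# Made-up compound scores, for illustration only
sample_scores = [0.82, -0.61, 0.10, 0.55, -0.20]
summary = summarize(sample_scores)
print(summary)
```

Lowering `threshold` shrinks the neutral band. The VADER project's own documentation cites a cutoff of about ±0.05 as typical, so 0.5 is a deliberately conservative choice.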
NLP can identify trends and emerging issues through topic modeling, which finds clusters of related words and phrases across a collection of posts. Topic modeling can be done with general clustering strategies or with a dedicated library such as the popular Gensim.
In this code example, we use Gensim along with the following libraries to train an LDA model and create word clouds for the topics:
nltk
wordcloud
matplotlib
```python
# pip install gensim nltk wordcloud matplotlib
import gensim
import nltk
from gensim.test.utils import common_texts
from gensim.corpora.dictionary import Dictionary
from nltk.corpus import stopwords
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# The NLTK stopword list is downloaded once
nltk.download('stopwords', quiet=True)

# Remove stopwords from common_texts (optional)
stop_words = set(stopwords.words('english'))
texts = [[x for x in text if x not in stop_words] for text in common_texts]

# create a dictionary from texts and bag-of-words corpus
dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(x) for x in texts]

# Train an LDA model (adjust num_topics as desired)
lda = gensim.models.LdaModel(corpus, num_topics=5, id2word=dictionary)

# Draw one word cloud per topic, sized by each word's weight in that topic
for i in range(lda.num_topics):
    topic_words = lda.show_topic(i, topn=100)  # adjust topn as desired
    word_weights = dict(topic_words)
    wc = WordCloud().fit_words(word_weights)
    plt.figure()
    plt.imshow(wc)
    plt.axis('off')
    plt.title(f'Topic {i}')
plt.show()
```
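The dictionary and bag-of-words step is easy to gloss over, so here is a plain-Python sketch of the data shape it produces. This is an illustration only: `build_vocab` and `to_bow` are hypothetical helpers, the toy posts are made up, and Gensim's `Dictionary` may assign different ids than the first-appearance order assumed here.

```python
from collections import Counter

def build_vocab(texts):
    """Assign an integer id to each unique token, in order of first appearance."""
    vocab = {}
    for tokens in texts:
        for token in tokens:
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab

def to_bow(tokens, vocab):
    """Convert a token list to sorted (token_id, count) pairs, skipping unknown tokens."""
    counts = Counter(vocab[t] for t in tokens if t in vocab)
    return sorted(counts.items())

# Toy tokenized posts, made up for illustration
toy_texts = [
    ["new", "phone", "battery", "phone"],
    ["battery", "life", "great"],
]
vocab = build_vocab(toy_texts)
bow_corpus = [to_bow(tokens, vocab) for tokens in toy_texts]
print(vocab)
print(bow_corpus)
```

Each document becomes a list of (token id, count) pairs with word order discarded. That is the form an LDA model consumes.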
NLP techniques can also discover entities and the relationships among them by recognizing the names of people, places, and organizations mentioned in social media conversations. This is known as named entity recognition (NER). These techniques empower organizations to
```python
# pip install spacy
# python -m spacy download en_core_web_sm
import spacy

# Load spaCy model for NER (en_core_web_sm is a pre-trained small model)
nlp = spacy.load("en_core_web_sm")

# Sample social media text
text = "I'm going to visit Paris next summer! #travel #france"

# Process the text with spaCy
doc = nlp(text)

# Iterate over named entities and print their text and label
for entity in doc.ents:
    print(f"Entity: {entity.text}, Label: {entity.label_}")
```
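Entities from a single post say little on their own. The signal comes from counting mentions across many posts. The sketch below assumes the `(entity.text, entity.label_)` pairs have already been collected from the loop above. `count_entities` is a hypothetical helper, and the sample pairs are hand-written stand-ins, not actual spaCy output.

```python
from collections import Counter

def count_entities(entities, labels=None):
    """Tally (text, label) entity pairs, optionally keeping only certain labels."""
    counts = Counter()
    for text, label in entities:
        if labels is None or label in labels:
            # Lowercase so "Paris" and "paris" count as the same mention
            counts[(text.lower(), label)] += 1
    return counts

# Hand-written pairs standing in for entities gathered from several posts
sample_entities = [
    ("Paris", "GPE"),
    ("paris", "GPE"),
    ("next summer", "DATE"),
    ("Acme Corp", "ORG"),
]
place_counts = count_entities(sample_entities, labels={"GPE"})
print(place_counts.most_common(1))
```

Filtering by label (for example `ORG` for organizations or `GPE` for countries and cities) keeps the tally focused on the kind of entity being tracked.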