NLP in Social Media Intelligence

"Social media buzz holds a wealth of insights, but NLP acts as the key. NLP analyzes posts, gauging sentiment, spotting trends, and recognizing entities. This empowers businesses to understand brand perception, track campaigns, and identify potential crises – all by transforming social media noise into actionable information."- Gemini 2024

As of 2024, the average time spent daily on social media in the U.S. is 2 hours and 16 minutes (statista.com). Social media has become a powerful, real-time pulse for understanding public opinion and current events. Natural Language Processing (NLP) techniques unlock the intelligence hidden within this massive amount of data. Essentially, NLP acts as a translator, transforming the cacophony of social media voices into clear, actionable information.

Sentiment Analysis

Using sentiment analysis techniques, we can gauge whether a post is positive, negative, or neutral. This provides insight into public opinion around a specific topic, or into the general mood at any snapshot in time.

This example code snippet uses NLTK's VADER sentiment analyzer and a social dataset from Hugging Face.

For more information on using Hugging Face data, see the docs here: Loading Datasets

# pip install nltk
# pip install datasets
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from datasets import load_dataset

# Download the VADER lexicon (required on first run)
nltk.download('vader_lexicon')

# Preview the first few items with "[:5]"; omit the slice to load the entire training set
dataset = load_dataset("AiresPucrs/sentiment-analysis", split='train[:5]')
print(dataset.features)
# >> {'text': Value(dtype='string', id=None), 'label': Value(dtype='int64', id=None)}

analyzer = SentimentIntensityAnalyzer()

for item in dataset:
    text = item['text']
    label = 'negative' if item['label'] == 0 else 'positive'
    scores = analyzer.polarity_scores(text)

    # scores is a dictionary with keys neg, neu, pos, and compound:
    # the negative, neutral, and positive sentiment scores,
    # plus a normalized overall score in [-1, 1]

    # Here we convert the compound score to a sentiment using an arbitrary threshold
    compound = scores['compound']
    if compound > 0.5:
        result = "positive"
    elif compound < -0.5:
        result = "negative"
    else:
        result = "neutral"

    # Print the first 50 chars of the text with the dataset label and the VADER result
    print(f"{text[:50]}... {label} - {result}")

    
Topic Modeling

NLP can be used to identify trends and emerging issues through topic modeling, which finds clusters of related words and phrases across a collection of documents. Topic modeling approaches range from general clustering strategies to dedicated algorithms such as Latent Dirichlet Allocation (LDA), available in the popular Gensim library.

In this code example, we use gensim along with these libraries to train an LDA model and create word clouds for the topics.

  • nltk - to get common stopwords for removal
  • wordcloud - to create the word cloud
  • matplotlib - to display the word cloud image

As of May 2024, to run this code you may need to downgrade scipy to 1.12 to avoid this error:
ImportError: cannot import name 'triu' from 'scipy.linalg'

import gensim
from gensim.test.utils import common_texts
from gensim.corpora.dictionary import Dictionary

import nltk
from nltk.corpus import stopwords
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Download the NLTK stopword list (required on first run)
nltk.download('stopwords')

# Remove stopwords from common_texts (optional)
stop_words = stopwords.words('english')
texts = [[x for x in text if x not in stop_words] for text in common_texts]

# Create a dictionary from the texts and a bag-of-words corpus
dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(x) for x in texts]

# Train an LDA model (adjust num_topics as desired)
lda = gensim.models.LdaModel(corpus, num_topics=5, id2word=dictionary)

for i in range(lda.num_topics):
    topic_words = lda.show_topic(i, topn=100) # adjust topn as desired
    word_weights = dict(topic_words)  # show_topic returns (word, weight) pairs
    wc = WordCloud().fit_words(word_weights)
    plt.figure()
    plt.imshow(wc)
    plt.axis('off')
    plt.title(f'Topic {i}')
    plt.show()
    
This is one of the word clouds created by the code above.
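
Once trained, the same LDA model can also score new text. The following sketch is a minimal continuation of the code above, assuming the lda, dictionary, and stop_words objects are still in scope; the example post is made up for illustration and uses words from the tiny common_texts vocabulary.

# Tokenize a new (hypothetical) post the same way as the training texts
new_post = "graph trees and system response time"
tokens = [word for word in new_post.lower().split() if word not in stop_words]

# Convert to bag-of-words with the existing dictionary, then infer the topic mixture
bow = dictionary.doc2bow(tokens)
for topic_id, probability in lda.get_document_topics(bow):
    print(f"Topic {topic_id}: {probability:.2f}")
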
Named Entity Recognition (NER)

NLP techniques can discover entities and the relationships among them by recognizing the names of people, places, and organizations mentioned in social media conversations. This is known as named entity recognition (NER). These techniques empower organizations to

  • Gain valuable insights into brand perception
  • Track marketing campaigns
  • Identify potential crises before they escalate

# pip install spacy
# python -m spacy download en_core_web_sm
import spacy

# Load spaCy model for NER (en_core_web_sm is a pre-trained small model)
nlp = spacy.load("en_core_web_sm")

# Sample social media text
text = "I'm going to visit Paris next summer! #travel #france"

# Process the text with spaCy
doc = nlp(text)

# Iterate over named entities and print their text and label
for entity in doc.ents:
    print(f"Entity: {entity.text}, Label: {entity.label_}")