Have you ever wondered what people are really thinking about a place or event? In our data-driven world, understanding public opinion is no longer just nice to have—it's essential, especially for those of us working with maps and geographic information. While traditional GIS shows us the physical world, it often misses something vital: how people feel about those places.
The Power of Sentiment Analysis
So what exactly is sentiment analysis? Think of it as a digital emotion detector. It's a clever way to measure public opinion by analysing the feelings and emotions expressed in text, such as social media posts (Pang and Lee, 2008). In this project, I rolled up my sleeves and dug into a fascinating dataset: 15,000 tweets about the blockbuster film "Avengers Endgame," all collected on its release day when emotions were running high and opinions were fresh.
Why analyse movie tweets for a GIS project? Because the same techniques that reveal how people feel about a global phenomenon like Avengers can be applied to understand how they feel about locations. This approach unlocks fascinating insights for geospatial work:
- Want to know which neighbourhoods people love (or hate)? Map their satisfaction levels!
- Planning a new road or building? See how people feel about similar projects first
- Concerned about environmental issues? Track how sentiment changes across different areas
- Managing tourism? Discover what visitors really think about destinations
How It Works: A Three-Lens Approach
I started by cleaning up the messy tweet data: all 15,000 messages, full of hashtags, emojis, and passionate opinions about Thanos, Iron Man, and plot twists. For this I used R's tidytext package, a brilliant tool that makes text analysis much more manageable (Silge and Robinson, 2017).
Prepping the Data: From Messy Tweets to Useful Insights
Think of data preparation like sorting through a junk drawer before organising it:
- Breaking text into pieces (or "tokenisation" if you want to sound fancy)
  - Imagine cutting up tweets like "OMG #AvengersEndgame was AMAZING!!!" into individual word confetti
  - Each word gets its own row in our data table
  - Why bother? Because we need to analyse one word at a time
- Removing "noise" words that don't tell us anything
  - Words like "the," "and," and "of" are just filler; they don't carry emotions
  - We filter these out using a pre-built list of words to ignore
  - This lets us focus on emotionally charged words like "amazing," "disappointing," or "heartbreaking"
- Restructuring everything for easier analysis
  - We transform the messy, free-flowing tweets into a neat, orderly format
  - This makes it much easier to match words with our sentiment dictionaries
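Here is a minimal sketch of those steps on two made-up tweets. The `tweets` table and its `id` and `text` columns are my own stand-ins for illustration, not the real dataset; `unnest_tokens()` and the `stop_words` list both come from tidytext.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy data standing in for the real tweet table
tweets <- data.frame(
  id   = 1:2,
  text = c("OMG #AvengersEndgame was AMAZING!!!",
           "The ending was heartbreaking and the pacing disappointing"),
  stringsAsFactors = FALSE
)

Clean <- tweets %>%
  # Tokenise: one lowercase word per row, punctuation stripped
  unnest_tokens(word, text) %>%
  # Remove filler words by dropping anything found in tidytext's stop_words list
  anti_join(stop_words, by = "word")

print(Clean)
```

The result has one row per remaining word, with the tweet `id` carried along so each word can still be traced back to its tweet.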
Next came the fun part—analysing the sentiment using not just one but three different "dictionaries" (lexicons). Why use three? Because each gives us a different perspective, like looking at an object from multiple angles:
1. The Simple View: Positive vs. Negative
The Bing lexicon (Hu and Liu, 2004) gives us a straightforward thumbs-up or thumbs-down for each word:
# Step 1: Load our sentiment dictionary
# Think of this as our list of "good" and "bad" words
Lexicon <- get_sentiments("bing")

# Step 2: Match our tweet words with the dictionary
# Like finding all the positive and negative words in our Endgame tweets
Analysis <- Clean %>%
  inner_join(Lexicon, by = "word") %>%
  # Step 3: Count how many words fall into each category
  # Are fans mostly positive or negative about the movie?
  group_by(sentiment) %>%
  summarize(count = n())

# Step 4: Create a visual chart of the results
# The Pastel1 palette gives negative a soft red and positive a soft blue
ggplot(Analysis, aes(x = sentiment, y = count, fill = sentiment)) +
  geom_bar(stat = "identity", color = "black") +
  scale_fill_brewer(palette = "Pastel1") +
  labs(title = "Binary Sentiment Analysis",
       x = "Sentiment", y = "Count") +
  theme_minimal()
Figure 1: The simple view - Were Avengers Endgame tweets generally positive or negative on release day?
2. The Emotional Rainbow: Beyond Just Good and Bad
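But human emotions are complex, aren't they? Fans didn't just feel "good" or "bad" about Endgame. They experienced joy, anger, fear, surprise, and more. The NRC lexicon (Mohammad and Turney, 2013) captures this richer emotional palette, tagging words with eight emotions as well as positive and negative sentiment: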
# Step 1: Load our emotion dictionary
# This one knows the difference between anger, joy, fear, and more
Lexicon2 <- get_sentiments("nrc")

# Step 2: Match our tweet words with emotional categories
# What emotions did Endgame evoke in viewers?
Analysis2 <- Clean %>%
  inner_join(Lexicon2, by = "word") %>%
  # Step 3: Tally up each type of emotion
  # Creating an emotional fingerprint of audience reactions
  group_by(sentiment) %>%
  summarize(count = n())

# Step 4: Visualise the emotional landscape
# Each colour represents a different emotion or sentiment
ggplot(Analysis2, aes(x = sentiment, y = count, fill = sentiment)) +
  geom_bar(stat = "identity") +
  labs(title = "Emotional Category Analysis",
       x = "Sentiment",
       y = "Count") +
  theme_minimal()
Figure 2: The emotional rainbow - What specific feelings did Avengers Endgame evoke in viewers?
3. The Intensity Meter: How Strong Are Those Feelings?
Not all emotions are created equal; some feelings are stronger than others. The AFINN lexicon (Nielsen, 2011) helps us capture this intensity:
# Step 1: Load our intensity-scoring dictionary
# This rates words from very negative (-5) to very positive (+5)
Lexicon3 <- get_sentiments("afinn")

# Step 2: Match tweet words with their intensity scores
# How strongly did fans feel about different aspects of the movie?
Analysis3 <- Clean %>%
  inner_join(Lexicon3, by = "word") %>%
  # Step 3: Group by intensity level
  # Were reactions mild or intense? Moderately pleased or absolutely ecstatic?
  group_by(value) %>%
  summarize(count = n())

# Step 4: Create a colour-coded intensity chart
# Dark red for strongly negative scores, shading through dark green
# around zero to light green for strongly positive scores
ggplot(Analysis3, aes(x = value, y = count, fill = value)) +
  geom_bar(stat = "identity", color = "black") +
  labs(title = "Sentiment Intensity Analysis",
       x = "Sentiment Score",
       y = "Count") +
  scale_fill_gradient2(low = "darkred", mid = "darkgreen",
                       high = "lightgreen", midpoint = 0) +
  theme_minimal()
Figure 3: The intensity meter - How strongly did fans feel about Avengers Endgame?
What We Discovered
By using all three approaches to analyse our 15,000 Avengers Endgame tweets, we got a much richer picture of the audience sentiment:
- The Bing lexicon told us the basic balance: were tweets generally positive or negative? (Spoiler alert: overwhelmingly positive!)
- The NRC lexicon revealed the emotional texture: fans expressed joy and anticipation, but also sadness and fear—reflecting the movie's emotional rollercoaster
- The AFINN lexicon showed us intensity: many reactions weren't just positive but intensely positive, with strong emotional language
It's like the difference between a simple map showing land and water versus a detailed topographic map with elevations, vegetation, and landmarks. More dimensions mean better understanding (Taboada et al., 2011).
From Movie Reactions to Geographic Insights
While this project focused on Avengers Endgame tweets, these exact same techniques can reveal how people feel about geographic locations. Just as we identified emotional patterns in movie reactions, we can discover how people feel about neighbourhoods, cities, or environmental features. The emotional fingerprint of a place can be just as revealing as its physical characteristics on a map.
Things to Keep in Mind
Before you rush off to analyse your own text data, here are a few friendly warnings:
- Context matters - These methods look at individual words, so they might miss sarcasm or phrases like "not good" (which is negative but contains the positive word "good")
- Some words might be missing - Our dictionaries don't know every word, especially location-specific terms, new slang, or movie-specific references like "Thanos"
- Simple counting has limitations - We're counting words equally here, but some words might deserve more weight than others
- Colour choices matter - The intensity chart runs from red to green, which makes sense visually but might not work for everyone (like people with red-green colour-blindness)
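To make the first warning concrete, here is one rough way to catch simple negation like "not good". This is a sketch I'm adding for illustration, not part of the analysis above: the `toy_lexicon` scores and the `negators` list are made-up assumptions, and the idea is simply to flip a word's score when the word right before it is a negator.

```r
library(dplyr)
library(tidytext)

# Hypothetical tweets for illustration
tweets <- data.frame(
  id   = 1:3,
  text = c("the ending was good",
           "the pacing was not good",
           "not bad at all"),
  stringsAsFactors = FALSE
)

# Toy AFINN-style scores (assumed values, not taken from a real lexicon)
toy_lexicon <- data.frame(word = c("good", "bad"), value = c(3, -3),
                          stringsAsFactors = FALSE)

# A short, assumed list of negation words
negators <- c("not", "no", "never")

scored <- tweets %>%
  unnest_tokens(word, text) %>%
  group_by(id) %>%
  mutate(previous = lag(word)) %>%   # the word just before, within each tweet
  ungroup() %>%
  inner_join(toy_lexicon, by = "word") %>%
  # Flip the score when the preceding word is a negator
  mutate(adjusted = ifelse(previous %in% negators, -value, value))

print(scored)
```

One catch: "not" and "no" sit in tidytext's `stop_words` list, so a check like this has to run before the filler words are removed. It also only looks one word back, so sarcasm and longer-range negation still slip through.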
Putting This to Work in the Real World
So how can you apply these sentiment analysis techniques to geographic data, just as we did with Avengers tweets? Here are some practical applications:
- Sentiment Maps: Imagine colour-coding a city map based on how people feel about different neighbourhoods
- Tracking Changes Over Time: See how opinions about a place change before and after a major event (just like we could track Endgame opinions before and after people saw the film at the cinema)
- Planning New Projects: Gauge how the public might react to a proposed development, similar to how Film Board Ireland might assess public reaction to a new production
- Disaster Response: During emergencies, quickly identify areas where people need the most help
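As a rough idea of what the first two applications might look like in code, here is a small sketch that averages word scores per area and per time period. Everything in `scored_words` is invented for illustration (the `neighbourhood`, `period`, and `value` columns are my assumptions); in practice the scores would come from joining geotagged text to a lexicon such as AFINN.

```r
library(dplyr)

# Hypothetical word-level scores, already tagged with a place and a time period
scored_words <- data.frame(
  neighbourhood = c("North", "North", "South", "South", "South"),
  period        = c("before", "after", "before", "after", "after"),
  value         = c(2, 3, -1, -2, -3),
  stringsAsFactors = FALSE
)

# Average sentiment per neighbourhood and period, plus how many words back it up
area_sentiment <- scored_words %>%
  group_by(neighbourhood, period) %>%
  summarise(mean_score = mean(value),
            n_words    = n(),
            .groups    = "drop")

print(area_sentiment)
```

From there, a table like `area_sentiment` could be joined to neighbourhood boundaries in your GIS of choice to colour a map. Keeping `n_words` alongside the average is a deliberate choice: an area with only a handful of scored words shouldn't be read with the same confidence as one with thousands.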
How to Do This Yourself
Want to try this with your own data? Whether it's cinema tweets or location-based social media, here's what you'll need:
# For working with text
library(tidytext)     # Your Swiss Army knife for text analysis
library(tm)           # Extra text tools when you need them
library(textdata)     # For accessing the sentiment dictionaries

# For handling and visualising data
library(dplyr)        # For data wrangling - like a digital workbench
library(ggplot2)      # For creating beautiful charts and graphs
library(RColorBrewer) # For making your charts look good

# For getting data in
library(readxl)       # For reading Excel files with your tweet data
The basic workflow is pretty straightforward:
- Get your data - Collect tweets or other text (we gathered 15,000 Avengers tweets)
- Clean it up - Remove URLs, fix special characters, etc.
- Break into words - Split text into individual words
- Remove filler words - Filter out common words that don't carry sentiment
- Analyse sentiment - Apply the lexicons to see what emotions are present
- Summarise results - Count up the different sentiment categories
- Visualise - Create charts that tell the story clearly
- Interpret - Draw meaningful conclusions from what you see
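The "clean it up" step is the one the code above skips over, so here is a small base R sketch of what it might involve. The `clean_tweet()` helper and the sample strings (including the URL) are made up for illustration; real tweets will likely need more rules than this.

```r
# Hypothetical raw tweets for illustration
raw <- c("Loved it! https://t.co/abc123 #AvengersEndgame",
         "Go see it &amp; bring   tissues")

# Illustrative helper (not from any package): basic tweet tidying
clean_tweet <- function(x) {
  x <- gsub("https?://\\S+", "", x, perl = TRUE)  # remove URLs
  x <- gsub("&amp;", "&", x, fixed = TRUE)        # decode a common HTML entity
  x <- gsub("\\s+", " ", x, perl = TRUE)          # collapse repeated whitespace
  trimws(x)                                       # trim leading/trailing spaces
}

cleaned <- clean_tweet(raw)
print(cleaned)
```

Running this kind of tidy-up before tokenising keeps link fragments and stray markup from showing up as "words" later on.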
What's Next?
Sentiment analysis adds a powerful new dimension to GIS work. While traditional maps show us the physical world, sentiment analysis reveals how people experience and feel about those places. Combined, they offer a more complete picture of our world.
Future explorations could integrate these techniques into planning processes, letting decision-makers at organisations like local councils or An Bord Pleanála understand not just where people are, but how they feel—potentially leading to more people-centred developments and policies.
Ready to give it a try with your own data? The code examples above will get you started, and the journey might reveal insights you never expected to find!
References
Hu, M. and Liu, B., 2004. Mining and summarizing customer reviews. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Seattle, WA, USA: ACM, pp.168–177.
Jurafsky, D. and Martin, J.H., 2019. Speech and Language Processing. 3rd ed. Prentice Hall.
Mohammad, S.M. and Turney, P., 2013. Crowdsourcing a word-emotion association lexicon. Computational Intelligence, 29(3), pp.436–465.
Nielsen, F.Å., 2011. A new ANEW: Evaluation of a word list for sentiment analysis in microblogs. arXiv preprint arXiv:1103.2903.
Pang, B. and Lee, L., 2008. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1–2), pp.1–135.
Silge, J. and Robinson, D., 2017. Text Mining with R: A Tidy Approach. O'Reilly Media, Inc.
Taboada, M. et al., 2011. Lexicon-based methods for sentiment analysis. Computational Linguistics, 37(2), pp.267–307.