This visualization shows Twitter reactions, minute by minute, during two important matches in the English Premier League that ended up deciding this year's champion (Leicester City).
The visualization can be seen live here.
The GitHub repo for the streaming pipeline is here.
These matches were Manchester United vs Leicester City (May 1, 2016) and Chelsea vs Tottenham Hotspur (May 2, 2016).
Background: Before the first match, Leicester City needed a win to clinch the league (for the first time in 132 years of club history, having started the year at 5000-1 odds). Manchester United has traditionally been the most dominant club in the EPL, and winning the league at their home ground would have been epic for Leicester City. Leicester ended up drawing rather than winning (1 point), which put the focus on Tottenham's next match.
Tottenham now had to win all 3 of its remaining matches to have a chance of winning the league, but they were playing at the home ground of defending champions Chelsea, who are also traditionally bitter rivals. Before the game there was a lot of talk from Chelsea players, who made it clear that they were supporting Leicester City and would do everything possible to deny Tottenham a win.
Additionally, Leicester City's manager Claudio Ranieri had previously been a manager at Chelsea, and he had massive support from Chelsea supporters.
As it turned out, Tottenham scored two goals in the first half and Chelsea scored two goals in the second half, in a game which was incredibly hard fought and full of ugly incidents on the pitch. As the match petered out to a draw, wild celebrations erupted all over the world, as Leicester had just won the league.
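I have made two linked charts which, given a 'match' and a 'word', show the number of occurrences of that word in each minute of the match (as colored minute badges) and a histogram of those per-minute counts.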
The raw data is tweets pulled from the Twitter Streaming API using certain filter words during two critical matches in the English Premier League (soccer). Either of these matches could have led to Leicester City winning the Premier League, which was expected to be, and turned out to be, a historic event. I collected around 2.3 million tweets during and after these two matches.
To save the tweets I set up a pipeline consisting of a Node.js app, AWS Kinesis Firehose, and an S3 bucket. I then analyzed the tweets by running Spark on an AWS cluster, using map-reduce operations to extract the information I needed.
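The hand-off from the Node.js app to Firehose could look roughly like the sketch below. This is an illustration only, not the code from the repo: the helper name `toFirehoseBatches` and the newline-delimited JSON layout are assumptions, and the actual AWS SDK call is left out so that the snippet runs on its own. The batch size of 500 reflects the per-call record limit of Firehose's `PutRecordBatch`.

```javascript
// Hypothetical helper: groups tweets into batches shaped like the
// `Records` parameter of Firehose's PutRecordBatch ({ Data: ... } entries).
// One JSON document per line keeps the resulting S3 objects easy to split.
function toFirehoseBatches(tweets, batchSize) {
  batchSize = batchSize || 500; // PutRecordBatch accepts at most 500 records
  var batches = [];
  for (var i = 0; i < tweets.length; i += batchSize) {
    var records = tweets.slice(i, i + batchSize).map(function (tweet) {
      return { Data: JSON.stringify(tweet) + "\n" };
    });
    batches.push(records);
  }
  return batches;
}

// Example with made-up tweets
var sample = [];
for (var n = 0; n < 1200; n++) {
  sample.push({ id: n, text: "goal!" });
}
var batches = toFirehoseBatches(sample);
console.log(batches.length);      // number of batches
console.log(batches[0][0].Data);  // first record, one JSON line
```

Each batch would then be passed to the Firehose client, which delivers the lines to the S3 bucket for Spark to read later.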
The key idea for the visualization is the occurrence of words or phrases in tweets corresponding to events in the match. So choosing the right words and looking at the frequency of occurrence of these words in tweets over the course of the match can tell stories about what viewers and fans are talking about during the match.
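The counting step behind this idea can be sketched as a small map and reduce. The actual analysis ran on Spark; the version below is plain JavaScript for illustration only. The field names `timestamp_ms` and `text` follow the Twitter Streaming API payload, while `countByMinute`, the kickoff time, and the sample tweets are made up, and the half-time break is ignored for brevity.

```javascript
// Hypothetical sketch: count tweets containing `phrase` per minute of play.
function countByMinute(tweets, phrase, kickoffMs, totalMinutes) {
  var needle = phrase.toLowerCase();
  // "map": tweet -> minute of play (1-based), or null when it does not qualify
  var minutes = tweets.map(function (tweet) {
    var elapsedMs = Number(tweet.timestamp_ms) - kickoffMs;
    var minute = Math.floor(elapsedMs / 60000) + 1;
    var matches = tweet.text.toLowerCase().indexOf(needle) !== -1;
    return matches && minute >= 1 && minute <= totalMinutes ? minute : null;
  });
  // "reduce": sum the matches into one counter per minute
  var counts = [];
  for (var m = 0; m < totalMinutes; m++) counts.push(0);
  return minutes.reduce(function (acc, minute) {
    if (minute !== null) acc[minute - 1] += 1;
    return acc;
  }, counts);
}

// Made-up kickoff time and tweets, for illustration only
var kickoff = Date.UTC(2016, 4, 1, 13, 0);
var tweets = [
  { timestamp_ms: String(kickoff + 30 * 1000), text: "Here we go" },
  { timestamp_ms: String(kickoff + 7 * 60000 + 5000), text: "GOAL!!!" },
  { timestamp_ms: String(kickoff + 7 * 60000 + 40000), text: "What a goal" },
  { timestamp_ms: String(kickoff + 16 * 60000 + 1000), text: "Another goal, level again" }
];
var goalCounts = countByMinute(tweets, "goal", kickoff, 20);
console.log(goalCounts);
```

A spike in the resulting array marks a minute in which fans were talking about the chosen word, which is what the minute badges in the chart encode.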
I chose a color gradient for frequency because it makes the important minutes of the match pop out and become immediately visible with the least effort. I chose to represent the minutes in the match with numbers because that's how the game is tracked in soccer, where major events in the game are traditionally referenced by the minute on the clock.
The detailed source code and instructions on how to set up your own pipeline to collect tweets are at my repo here.
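A gradient of this kind can be sketched without any library as a linear interpolation between two RGB colours. The page itself relies on d3's linear scale from beige to red; the helper names `makeColourScale` and `textColour` below are hypothetical, and both the clamping and the text-colour rule (white text on the upper half of the range) are assumptions made for this sketch.

```javascript
var BEIGE = [245, 245, 220]; // RGB of the CSS colour "beige"
var RED = [255, 0, 0];       // RGB of the CSS colour "red"

// Returns a function mapping a frequency to a colour between beige and red.
function makeColourScale(minFreq, maxFreq) {
  var span = maxFreq - minFreq;
  return function (value) {
    // position of value within [minFreq, maxFreq], clamped to 0..1
    var t = span === 0 ? 0 : Math.min(1, Math.max(0, (value - minFreq) / span));
    var rgb = BEIGE.map(function (low, i) {
      return Math.round(low + (RED[i] - low) * t);
    });
    return "rgb(" + rgb.join(", ") + ")";
  };
}

// White text above the midpoint of the range, black text otherwise.
function textColour(value, minFreq, maxFreq) {
  return value > (minFreq + maxFreq) / 2 ? "#fff" : "#000";
}

var scale = makeColourScale(0, 200);
console.log(scale(0), scale(100), scale(200));
console.log(textColour(150, 0, 200));
```

Because the mapping is linear, a minute with twice the count sits twice as far along the gradient, so the busiest minutes stand out as the reddest badges.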
Took inspiration from
Improvements
Possible things to do
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="description" content="Mayank Kedia's website">
<meta name="author" content="Mayank Kedia">
<title>Mayank Kedia</title>
<!-- Bootstrap Core CSS -->
<link href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css" rel="stylesheet">
<!-- Custom CSS
<link href="css/main.css" rel="stylesheet">
<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Montserrat:400" type="text/css">
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.5.0/css/font-awesome.min.css">
-->
<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/3.5.17/d3.js"></script>
<script src="https://d3js.org/queue.v1.min.js"></script>
<style>
.badge {
font-weight: normal;
/*
The .badge class takes most of its styling
from bootstrap
*/
}
body {
padding-bottom: 5em;
height: 1060px;
}
button {
margin: 1px;
}
.bar rect {
fill: steelblue;
shape-rendering: crispEdges;
}
.bar text {
fill: #fff;
}
.axis path, .axis line {
fill: none;
stroke: #000;
shape-rendering: crispEdges;
}
.match {
font-size: 1.5em;
}
.legend {
width:4em;
}
</style>
<!-- HTML5 Shim and Respond.js IE8 support of HTML5 elements and media queries -->
<!-- WARNING: Respond.js doesn't work if you view the page via file:// -->
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/libs/html5shiv/3.7.0/html5shiv.js"></script>
<script src="https://oss.maxcdn.com/libs/respond.js/1.4.2/respond.min.js"></script>
<![endif]-->
<script type="text/javascript">
var colour = d3.scale.linear().range(["beige", "red"]);
// Variables that hold all the data we need
var manu_leicester_wf, manu_leicester_wl, chelsea_tottenham_wf, chelsea_tottenham_wl;
var first_half_minutes, second_half_minutes;
var data, word_list;
var first_half_frequency, second_half_frequency;
var min_freq, max_freq;
var match_heading = {"ml":"Manchester United vs Leicester City", "ct":"Chelsea vs Tottenham Hotspur"};
// For Histogram !
// Reference https://bl.ocks.org/mbostock/3048450
var svg;
var formatCount = d3.format(",d");
var svg_height, svg_width;
var margin = {top: 20, right: 30, bottom: 30, left: 30};
var x = d3.scale.linear(),
y = d3.scale.linear(),
xAxis = d3.svg.axis()
.scale(x)
.orient("bottom"),
yAxis = d3.svg.axis()
.scale(y)
.orient("left");
function set_colour_domain(m_data, index){
/**
Sets the colour domain based on the 'index'
in the array at each row of the data which has to be considered.
*/
first_half_frequency = m_data[0].map(function(x){
return x[index];
});
second_half_frequency = m_data[1].map(function(x){
return x[index];
});
min_freq = d3.min([d3.min(first_half_frequency), d3.min(second_half_frequency)]);
max_freq = d3.max([d3.max(first_half_frequency), d3.max(second_half_frequency)]);
console.log(max_freq);
console.log(min_freq);
colour.domain([min_freq, max_freq]);
};
function color_text(value){
// Use white text on the darker (upper) half of the colour range.
// The midpoint of [min_freq, max_freq] is their average.
var middle_freq = (max_freq + min_freq)/2;
if (value > middle_freq){
return "#fff";
}else {
return '#000';
}
}
function add_legend(){
/**
called after colour domain is set
uses min_freq, max_freq and colour scale
*/
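var mf = +max_freq;
var arr = [],
c = max_freq - min_freq + 1;
// arr holds every integer from min_freq to max_freq, in ascending order
while ( c-- ) {
arr[c] = mf--;
}
var legend_arr;
// with fewer than 6 values, 6 quantiles would produce duplicate legend entries
if (arr.length < 6){
legend_arr = arr;
}else {
legend_arr = [];
for(var i=0; i < 6; i++){
var val = Math.round(d3.quantile(arr, i/5));
legend_arr.push(val);
}
}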
d3.select(".legend-div").selectAll("a").remove();
d3.select(".legend-div").selectAll(".legend")
.data(legend_arr)
.enter()
.append("a")
.attr("class","legend badge")
.text(function(d){
return d.toLocaleString();
})
.style("background", function(d){
return colour(d);
})
.style("color", function(d){
return color_text(d);
});
}
function add_minutes(id, x_half_data){
/**
@param id: the id of the div in which the minute
badges are to be added
Adds badges to the element with the id selector
*/
console.log("add_minutes:"+x_half_data.length);
d3.select(id).selectAll("a.badge").remove();
var badges = d3.select(id).selectAll("a.badge")
.data(x_half_data)
.enter()
.append("a")
.attr("class", "badge")
.text(function(d){
return d[d.length-1]+"'";
})
.style("width", "40px");
return badges;
};
function add_svg(){
// call another function
var bounding_rect = document.getElementById('svg')
.getBoundingClientRect();
svg_width = bounding_rect['width'] - margin.left - margin.right;
svg_height = 150;
svg = d3.select("#svg").append('svg')
.attr("width", svg_width + margin.left + margin.right)
.attr("height", svg_height + margin.top + margin.bottom)
.append("g")
.attr("transform", "translate(" + margin.left + "," + margin.top + ")");
x.range([0, svg_width]);
y.range([svg_height, 0]);
// now rotate text on x axis
// solution based on idea here: https://groups.google.com/forum/?fromgroups#!topic/d3-js/heOBPQF3sAY
// first move the text left so no longer centered on the tick
// then rotate up to get 45 degrees.
svg.selectAll(".xaxis text") // select all the text elements for the xaxis
.attr("transform", function(d) {
return "translate(" + this.getBBox().height*-2 + "," + this.getBBox().height + ")rotate(-45)";
});
// now add titles to the axes
svg.append("text")
.attr("text-anchor", "middle") // this makes it easy to centre the text as the transform is applied to the anchor
.attr("transform", "translate("+ (-2) +","+(svg_height/2)+")rotate(-90)") // text is drawn off the screen top left, move down and out and rotate
.text("Number of Minutes")
.style("font-size", "1.2em");
svg.append("text")
.attr("text-anchor", "middle") // this makes it easy to centre the text as the transform is applied to the anchor
.attr("transform", "translate("+ (svg_width/2) +","+(svg_height+(27))+")") // centre below axis
.text("# of Occurrences")
.style("font-size", "1.2em");
}
function update_histogram(word, index){
/*
Make sure this is called after
set_colour_domain, when first_half_frequency etc. are set
*/
svg.selectAll(".bar").remove();
svg.selectAll(".axis").remove();
var frequencies = first_half_frequency.concat(second_half_frequency);
var bins = 20;
if(max_freq < 20){
// keep at least one bin so the layout still works when every count is 0
bins = Math.max(1, max_freq);
}
var hist_data = d3.layout.histogram()
.bins(bins)(frequencies);
// domain of y is based on the counts of the frequencies
x.domain([0, d3.max(frequencies)]);
y.domain([0, d3.max(hist_data, function(d){ return d.y; })]);
var bar = svg.selectAll(".bar")
.data(hist_data)
.enter().append("g")
.attr("class", "bar")
.attr("transform", function(d) {
return "translate(" + x(d.x) + "," + y(d.y) + ")";
});
console.log(hist_data);
bar.append("rect")
.attr("x", 1)
.attr("width", x(hist_data[0].dx) - 1)
.attr("height", function(d) { return svg_height - y(d.y); });
bar.append("text")
.attr("dy", ".75em")
.attr("y", 6)
.attr("x", x(hist_data[0].dx) / 2)
.attr("text-anchor", "middle")
.text(function(d) { return formatCount(d.y); });
svg.append("g")
.attr("class", "x axis")
.attr("transform", "translate(0," + svg_height + ")")
.call(xAxis);
}
function add_words(inner_word_list){
console.log(inner_word_list);
d3.select('#choose_words').selectAll("button").remove();
d3.select("#choose_words")
.selectAll('button')
.data(inner_word_list)
.enter()
.append("button")
.attr("class", "btn btn-default")
.attr("type", "button")
.on("click", function(d){
// bind the handler directly so words containing quotes cannot break it
on_word_click(d['word'], d['index']);
})
.text(function(d){
return d['word'];
});
}
function on_word_click(word, index){
set_colour_domain(data, index);
color_minutes(first_half_minutes, word, index);
color_minutes(second_half_minutes, word, index);
update_histogram(word, index);
add_legend();
}
function color_minutes(selection, word, index){
/**
Colors and styles the selection
based on the word/index selected
*/
selection.style("background", function(d){
return colour(d[index]);
})
.attr("title", function(d){
/*
TODO: Add the name of the word here
*/
return d[index];
})
.style("color", function(d){
return color_text(d[index]);
});
};
function set_match(wf, wl, match){
data = wf;
word_list = wl;
var word = word_list[0]['word'],
index = word_list[0]['index'];
var first_half_data = data[0];
var second_half_data = data[1];
add_words(word_list);
d3.select("#match_heading").text(match_heading[match]);
console.log(match);
first_half_minutes = add_minutes("#first_half", first_half_data);
second_half_minutes = add_minutes("#second_half", second_half_data);
on_word_click(word, index);
}
function create_chart(error, ct_wf, ct_wl, ml_wf, ml_wl){
/**
*
* Called when the chart is initiated
*
*/
/**
Assigning the input to global variables
*/
chelsea_tottenham_wf = ct_wf;
chelsea_tottenham_wl = ct_wl;
manu_leicester_wf = ml_wf;
manu_leicester_wl = ml_wl;
console.log(ml_wl);
console.log(ct_wl);
add_svg();
set_match(ml_wf, ml_wl, 'ml');
};
queue()
.defer(d3.json, 'chelsea_tottenham_wf.json')
.defer(d3.csv, 'chelsea_tottenham_word_list.csv') // REPLACE REF WITH DATA
.defer(d3.json, 'manu_leicester_wf.json')
.defer(d3.csv, 'manu_leicester_word_list.csv') // REPLACE REF WITH DATA
.await(create_chart);
</script>
</head>
<body>
<div class="container">
<div class="page-header">
<h1>Visualizing Tweets<small> (Leicester's winning bid in the English Premier League)</small></h1>
<h2><small>Occurrence of specific words (or phrases) within tweets during matches</small></h2>
</div>
<div class="col-sm-4" >
<div class="page-header">
<h4>Choose match</h4>
</div>
<ul class="nav nav-pills nav-stacked">
<li role="presentation" class="match" id="ml_li" ><a id="ml_nav" onclick="set_match(manu_leicester_wf, manu_leicester_wl, 'ml')" style="font-size:19px">Manchester vs Leicester</a></li>
<li role="presentation" class="match" id="ct_li" ><a id="ct_nav" onclick="set_match(chelsea_tottenham_wf, chelsea_tottenham_wl, 'ct')" style="font-size:19px">Chelsea vs Tottenham</a></li>
</ul>
<div class="page-header">
<h4>Choose words</h4>
</div>
<div class="btn-group btn-group-sm" role="group" id="choose_words" aria-label="choose words">
<!--
-->
</div>
</div>
<div class="col-sm-8" style="border-left: 1px solid #ccc;">
<div class="page-header legend-div">
<h4 id="match_heading"><small></small></h4>
<span class="badge" style="width:17em">Legend (# of occurrences)</span>
</div>
<div class="well well-sm" style="margin-top:10px;margin-bottom:10px;padding-top:5px;padding-bottom:5px"><span style="font-size:12px">First Half (Minutes)</span></div>
<div id="first_half">
</div>
<div class="page-header well well-sm" style="margin-top:20px;margin-bottom:10px;padding-top:5px;padding-bottom:5px">
<span style="font-size:12px">Second Half (Minutes)</span>
</div>
<div id="second_half">
</div>
<div class="page-header">
<h5>Histogram <small>(of the frequency of occurrence of the phrase)</small></h5>
</div>
<div id="svg">
</div>
</div>
</div>
<!-- jQuery Version 1.12.3 -->
<script src="https://code.jquery.com/jquery-1.12.3.min.js" integrity="sha256-aaODHAgvwQW1bFOGXMeX+pC4PZIPsvn2h1sArYOhgXQ=" crossorigin="anonymous"></script>
<!-- Bootstrap Core JavaScript -->
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/js/bootstrap.min.js"></script>
</body>
</html>