emagee

Scattered chocolate

###Process notes

  1. I started with the original data set, but it seemed too small for a scatterplot, and I wanted a dataset that included pre-1999 values; specifically, I wanted the data to cover some pre-NAFTA years. I did track down that data. The earliest year available was 1989, which is fine, since NAFTA took effect in 1994. The new data set also included 2014, whereas the original set only went as far as 2013. Whee!

  2. As part of my data cleaning, I took out all countries from which the US did not import chocolate in 1989, and set the minimum value of the chocolate import per country at 1 million; this allowed me to drop every country that had a "0" in its 1989 column. (Not sure at which point values were rounded down to zero; the metadata mentioned a different reason for the zeros anyway.)

  3. Because of the large number of countries in the original list, I opted to go without the "rest of world" entry that was in the original dataset. Should I have soldiered on and found the average "rest of world" value for each year?

  4. Next, I plotted the new data. At first I had 1989 imports on the x axis, 2014 on the y. Soon after I had an inexplicable need to flip this setup. It just seemed more appropriate. I cannot put my finger on why.

  5. I then multiplied all 1989 values by 1.93, the inflation factor (1989-2014) that I got from a random inflation calculator on the Internet. Should I have done this in the original dataset instead, or would the usefulness of that depend on whether and how I wanted to use this dataset beyond this scatterplot?

  6. It dawned on me that I could have retrieved a data set of chocolate QUANTITY instead of the DOLLAR VALUE for each year, which would have negated the need to adjust for inflation, but I needed to move on. Of course, charting both quantity and value might have made a nice statement about the cost of chocolate, but I digress...

  7. Since I prefer all my graph elements to be inside the graph confines, as opposed to the highest values extending to or being plotted at the edge of either axis, I padded each scale's domain a bit, just five percent. Is this kosher? The code:

xScale.domain([ 0, d3.max(data, function(d) {
	// Largest 2014 value plus five percent of padding
	return +d[2014] * 1.05;
}) ]);

  8. A few outliers exist. Canada was expected, but Brazil was a big surprise!
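
For reference, here is a minimal sketch of the cleaning in step 2 and the inflation adjustment in step 5, done in code rather than in the data file. The sample rows, the `country` field, and the `prepare` function are hypothetical and only there for illustration; the sketch also assumes that the 1 million minimum is inclusive and is expressed in the same units as the 1989 column.

```javascript
// Hypothetical sample rows; the numbers are made up, not taken from the real dataset.
const rows = [
  { country: "A", "1989": "2500000", "2014": "9000000" },
  { country: "B", "1989": "0",       "2014": "4000000" },
  { country: "C", "1989": "1000000", "2014": "1500000" }
];

const MIN_1989_VALUE = 1e6;     // step 2: minimum 1989 import value (assumed inclusive)
const INFLATION_FACTOR = 1.93;  // step 5: 1989-to-2014 factor quoted in the notes

// Drop countries below the 1989 minimum, then express 1989 values in 2014 terms.
function prepare(data) {
  return data
    .filter(function (d) { return +d["1989"] >= MIN_1989_VALUE; })
    .map(function (d) {
      return {
        country: d.country,
        value1989: +d["1989"] * INFLATION_FACTOR, // inflation-adjusted
        value2014: +d["2014"]                     // left as reported
      };
    });
}

const cleaned = prepare(rows);
```

Adjusting at load time like this leaves the source file untouched, so the raw 1989 values stay available if the data gets reused for something other than this scatterplot.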

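The padding from step 7 can also be pulled into a small helper so that both axes share the same logic. This is only a sketch: `paddedMax` and the sample rows are hypothetical names and values, and plain `Math.max` stands in for `d3.max` so the snippet runs on its own.

```javascript
// Return the largest value found by `accessor`, enlarged by `padding` (0.05 = five percent).
// Assumes `data` is a non-empty array; Math.max of an empty list is -Infinity.
function paddedMax(data, accessor, padding) {
  const max = Math.max.apply(null, data.map(accessor));
  return max * (1 + padding);
}

// Hypothetical sample rows, for illustration only.
const sample = [{ "2014": "100" }, { "2014": "250" }];

// Domain for the x axis: zero up to the padded 2014 maximum.
const xDomain = [0, paddedMax(sample, function (d) { return +d["2014"]; }, 0.05)];

// With D3 loaded, this would be applied as: xScale.domain(xDomain);
```

D3's linear scales also offer `.nice()`, which rounds the domain out to tidy values; whether that leaves enough breathing room around the top points is a matter of taste.
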
###Lessons learned

  1. Borrowing code from classmates, instructors, and other helpful souls is only effective when you know what you're doing. My head continues to spin; my brain stays tied in knots.

  2. Writing orderly, well-formatted, well-commented code is a noble goal that eludes me time and time again. I close with the link to today's (quite relevant) xkcd comic: [http://xkcd.com/1513/](http://xkcd.com/1513/). (Be sure to hover over the first frame to reveal a tooltip.)

Cheers!