<?xml version="1.0" encoding="UTF-8"?>
<rss version='2.0' xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Eric Hannell</title>
    <description>novice at life, ok at data stuff.</description>
    <link>https://erichannell.silvrback.com/feed</link>
    <atom:link href="https://erichannell.silvrback.com/feed" rel="self" type="application/rss+xml"/>
    <category domain="erichannell.silvrback.com">Content Management/Blog</category>
    <language>en-us</language>
      <pubDate>Thu, 11 Jun 2015 18:13:00 +0100</pubDate>
    <managingEditor>eric.hannell@gmail.com (Eric Hannell)</managingEditor>
      <item>
        <guid>http://erichannell.com/a-map-of-the-bikes-of-london#15167</guid>
          <pubDate>Thu, 11 Jun 2015 18:13:00 +0100</pubDate>
        <link>http://erichannell.com/a-map-of-the-bikes-of-london</link>
        <title>A map of the bikes of London</title>
        <description>Using Tableau and a pinch of Python to look at ~20,000 bikes and 700+ bike stations.</description>
        <content:encoded><![CDATA[<p>When I moved to London I exchanged my Vespa for a Boris bike. It was tough going at first but I have come to love these bikes. I ride one almost every day and I could not imagine living in London without them.</p>

<p>Being a &quot;data person&quot; it did not take too long before I started searching for data about the bikes. The program, alternately referred to as &quot;Boris Bikes&quot;/ &quot;Barclays Bikes&quot; / &quot;Santander Cycles&quot; is a &quot;public bicycle hire scheme&quot; and is run by Transport For London (TFL). They offer some <a href="https://www.tfl.gov.uk/info-for/open-data-users/our-feeds">basic data feeds</a>.</p>

<p>I had a lot of questions about the bike system, like: </p>

<ul>
<li>How many bikes and bike stations exist in London?</li>
<li>Where are the bikes?</li>
<li>Which borough has the best coverage?</li>
<li>How did the network grow?</li>
</ul>

<p>I used the TFL data to visualize how the bikes are spread out over London and created this <a href="https://www.j.mp/boris_bike">visualization</a> (click on the image to go to the interactive version):</p>

<p><a href="https://www.j.mp/boris_bike"><img alt="boris bikes" src="https://silvrback.s3.amazonaws.com/uploads/839049c5-7ef0-47cd-9e7f-ef5cd4a5b6b9/boris_large.png" /></a></p>

<p>In this post I will walk through the process that starts with a CSV file and ends with that interactive visual.</p>

<hr>

<h1 id="the-data">The Data</h1>

<p>Let&#39;s take a look at the TFL data (which you can grab <a href="https://www.dropbox.com/s/j3vtj6q8lyybap5/uncleaned_data.csv?dl=0">here</a>). It includes: </p>

<ul>
<li>bike station name (usually the street and area where the station is)</li>
<li>number of bikes that can fit at the station</li>
<li>longitude and latitude for each station</li>
<li>the date when the station was installed by TFL</li>
</ul>

<p>There were also a lot of other fields that I did not end up using. I used <a href="http://openrefine.org/">OpenRefine</a> to clean the data. For example: the dates needed some work and had to be converted from <code>Mon Jul 12 15:08:00 2010</code> to a more intuitive format like <code>Jul 12 2010</code>. I also got rid of the columns that did not interest me.</p>
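
<p>The same date clean-up could also be sketched in Python. This is only an illustration of the conversion described above, not the OpenRefine steps themselves; the function name <code>simplify_date</code> is made up for the example, and the format strings assume every date looks like the sample shown.</p>

```python
from datetime import datetime


def simplify_date(raw: str) -> str:
    """Turn a timestamp like 'Mon Jul 12 15:08:00 2010' into 'Jul 12 2010'."""
    # %a = weekday name, %b = month name, %d = day, %H:%M:%S = time, %Y = year
    parsed = datetime.strptime(raw, "%a %b %d %H:%M:%S %Y")
    # Drop the weekday and the time, keep month, day and year
    return parsed.strftime("%b %d %Y")


print(simplify_date("Mon Jul 12 15:08:00 2010"))  # Jul 12 2010
```

<p>Note that <code>%d</code> zero-pads the day, so a date on the 5th would come out as <code>Jul 05 2010</code>.</p>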

<hr>

<h1 id="a-pinch-of-python">A pinch of Python</h1>

<p>At this point I had most of what I needed - a basic sketch of the data - but I was still missing information about the borough where the station was installed. I thought this would add a meaningful dimension to the data. To be able to visualise this I <a href="https://en.wikipedia.org/wiki/Reverse_geocoding">reverse geocoded</a> the latitude and longitude points from the dataset. This sounds really complicated but is actually really easy. You can use a service like <a href="http://www.doogal.co.uk/BatchReverseGeocoding.php">this</a>, but I did it with a pinch of Python using the geopy module.</p>

<p>Here is how you use geopy to reverse geocode a set of coordinates:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">geopy.geocoders</span> <span class="kn">import</span> <span class="n">Nominatim</span>
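<span class="c1"># Recent geopy versions require a user_agent; this value is a placeholder</span>
<span class="n">geolocator</span> <span class="o">=</span> <span class="n">Nominatim</span><span class="p">(</span><span class="n">user_agent</span><span class="o">=</span><span class="s2">&quot;boris-bikes-example&quot;</span><span class="p">)</span>
<span class="n">location</span> <span class="o">=</span> <span class="n">geolocator</span><span class="o">.</span><span class="n">reverse</span><span class="p">(</span><span class="s2">&quot;51.5247497559, -0.096965610981&quot;</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="n">location</span><span class="o">.</span><span class="n">address</span><span class="p">)</span>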
<span class="c1">#47-58, Bastwick Street, Saint Luke&#39;s, London Borough of Islington, London, Greater London, England, EC1V 3RD, United Kingdom</span>
</pre></div>
<p>In this example it returned the borough name, but that was not always the case. The postcode, however, was a reliable output, so I used that and passed it into the <a href="http://postcodes.io/">postcodes.io</a> API to get the district/borough. Here is an <a href="http://api.postcodes.io/postcodes/EC1V%203RD">example call for the EC1V 3RD postcode</a>.</p>
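
<p>A minimal sketch of that postcode lookup, using only the standard library, might look like the following. The helper names are hypothetical, and it assumes the borough is the <code>admin_district</code> field inside the <code>result</code> object of the postcodes.io response; the original script may well have been structured differently.</p>

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Same endpoint as the example call above, over https
API_ROOT = "https://api.postcodes.io/postcodes/"


def postcode_url(postcode: str) -> str:
    """Build the lookup URL, percent-encoding the space in the postcode."""
    return API_ROOT + quote(postcode.strip())


def borough_from_payload(payload: dict):
    """Pull the district/borough out of a decoded response, or None if absent."""
    # Assumption: the borough lives under result -> admin_district
    result = payload.get("result") or {}
    return result.get("admin_district")


def lookup_borough(postcode: str):
    """Call the API for one postcode and return its borough (needs network)."""
    with urlopen(postcode_url(postcode), timeout=10) as response:
        return borough_from_payload(json.load(response))


print(postcode_url("EC1V 3RD"))
```

<p>Calling <code>lookup_borough</code> with the postcode from each reverse-geocoded address would then fill in the borough column, one request per station.</p>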

<p>My girlfriend later pointed out that this was overkill and that I could probably have used the bike station name to get most of this data, but it was fun to use geopy and I am really happy to have discovered the very amazing <a href="http://postcodes.io/">postcodes.io</a>.</p>

<hr>

<h1 id="clean-data">&quot;Clean&quot; data</h1>

<p>At this point I had a complete dataset (get it <a href="https://www.dropbox.com/s/8uhyjan7ywmcjq3/bikes%20by%20borough.xlsx?dl=0">here</a>) that included:</p>

<ul>
<li>bike station name</li>
<li>latitude</li>
<li>longitude</li>
<li>date that the station was installed</li>
<li>number of bikes that the station can hold</li>
<li>the name of the borough where the station is installed</li>
</ul>

<hr>

<h1 id="seeing-understanding">Seeing &amp; understanding</h1>

<p>At this point I threw the data at Tableau and started to visualize it.</p>

<p>First I took a look at how many bikes and stations there are by borough:<br>
<img alt="Silvrback blog image" src="https://silvrback.s3.amazonaws.com/uploads/57d1093c-00d6-4be3-a295-c4bfc76029c5/boroughs_large.png" /></p>

<p>Looks like Westminster dominates there.</p>
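
<p>Tableau does this aggregation with a drag and a drop, but the same count of stations and bikes per borough can be sketched in plain Python. The rows below are made-up placeholders rather than values from the dataset, and the field names <code>station</code>, <code>borough</code> and <code>bikes</code> are assumed for the example.</p>

```python
from collections import defaultdict

# Placeholder rows: station names and capacities are invented for illustration
stations = [
    {"station": "Station A", "borough": "Westminster", "bikes": 30},
    {"station": "Station B", "borough": "Westminster", "bikes": 24},
    {"station": "Station C", "borough": "Islington", "bikes": 18},
]


def totals_by_borough(rows):
    """Count stations and sum bike capacity for each borough."""
    totals = defaultdict(lambda: {"stations": 0, "bikes": 0})
    for row in rows:
        entry = totals[row["borough"]]
        entry["stations"] += 1
        entry["bikes"] += row["bikes"]
    return dict(totals)


# List boroughs from most to fewest bikes
ranked = sorted(totals_by_borough(stations).items(),
                key=lambda item: item[1]["bikes"], reverse=True)
for borough, total in ranked:
    print(borough, total["stations"], total["bikes"])
```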

<p>What about the biggest stations? There are 743 stations in this dataset. Here I focus on the top 15 while calculating the average number of bikes for all stations:</p>

<p><img alt="Silvrback blog image" src="https://silvrback.s3.amazonaws.com/uploads/9c6e5bb9-d126-48de-8e1c-513d82dd904d/stations_large.png" /></p>

<p>The Bankside station is right outside my office, and even though it is the 4th biggest in all of London it is still hard to find a parking place sometimes.</p>

<p>How have the stations changed over time? How has the number of bikes grown in London? Looks like the first stations were installed in 2010 and that there has been sporadic growth, with plateaus during quiet periods (like 2012):</p>

<p><img alt="Silvrback blog image" src="https://silvrback.s3.amazonaws.com/uploads/1bcb11ea-be0c-4a94-86af-f6ca000be3cc/bikes_over_time_large.png" /></p>

<p>Next is my favourite part: maps. Let&#39;s see what these 700+ stations look like on a map of London. I have sized the dots according to the number of bikes that fit at each station (bigger dots = more bikes):</p>

<p><img alt="Silvrback blog image" src="https://silvrback.s3.amazonaws.com/uploads/f4ac70e2-ac9d-4b9c-a193-139ec9046d92/map_large.png" /></p>

<p>We can instantly see that there are bikes all over London except for the central South-East (I have no idea why).</p>

<p>Next, let&#39;s look at when the stations were added. 2010 was the first year of the Boris Bikes but how have they been added since then? Are there any patterns? I made a quick animation to show how the distribution of stations has changed over the years:</p>
<iframe src="https://player.vimeo.com/video/117911829?title=0&amp;byline=0&amp;portrait=0" width="500" height="368" frameborder="0" webkitallowfullscreen="" mozallowfullscreen="" allowfullscreen=""></iframe>


<p>Looks like things started in the center in 2010, and 2011 was a slow year. In 2012 (during the Olympics) the East got covered in bike stations, and in 2013 TFL expanded into the South-West. 2014 was another slow year.</p>

<hr>

<h1 id="boroughs">Boroughs</h1>

<p>Next I inspected the stations by borough:</p>

<p><img alt="Silvrback blog image" src="https://silvrback.s3.amazonaws.com/uploads/33a74a31-f482-4c44-a89f-0bb702b116a2/bike_map_large.png" /></p>

<p>We can start to see why boroughs like Westminster and Tower Hamlets dominate in terms of number of bikes and stations; it looks like the Boris Bike system overlaps nicely with the shape of the borough (i.e. dense coverage), while places like Islington only overlap slightly with the system. To get a better idea let&#39;s take a look at the shape of the boroughs.</p>

<p>This data does not exist in the TFL data, but I grabbed it from <a href="http://tableaumapping.bi/">tableaumapping.bi</a>, a fantastic &amp; free source for Tableau geo-data. I used <a href="http://tableaumapping.bi/2013/11/25/uk-wards/">this</a> file to grab the boroughs and filtered it down to only the boroughs that existed in my dataset. I also added a black circle to represent the number of bikes in each borough:</p>

<p><img alt="Silvrback blog image" src="https://silvrback.s3.amazonaws.com/uploads/0ab4038c-f83e-409d-8ede-c04e4b6ab1ad/boroughs_large.png" /></p>

<hr>

<h1 id="streets">Streets</h1>

<p>I wanted to tie everything together by showing the individual stations and the streets they are on, to give the full range from a city-wide view to an individual station on a particular street. Since I already had the longitude and latitude of each bike station, I decided to use the Google Street View API to embed an interactive view of each station. To do this I embedded a Street View frame and used each point on the map as a filter that passes new latitude and longitude values to the Street View container when a station is selected.</p>
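
<p>To make the idea concrete, here is a rough Python sketch of how a Street View URL can be built from a station&#39;s coordinates. It assumes the Street View mode of the Google Maps Embed API, which takes a <code>key</code> and a <code>location</code> parameter; the exact URL used in the dashboard is not shown in this post, and the function name and the <code>YOUR_API_KEY</code> value are placeholders.</p>

```python
from urllib.parse import urlencode

# Assumption: the Street View mode of the Google Maps Embed API
EMBED_ROOT = "https://www.google.com/maps/embed/v1/streetview"


def streetview_url(lat: float, lon: float, api_key: str) -> str:
    """Build an embeddable Street View URL for one station's coordinates."""
    params = {"key": api_key, "location": f"{lat},{lon}"}
    # safe="," keeps the comma between latitude and longitude readable
    return EMBED_ROOT + "?" + urlencode(params, safe=",")


# Coordinates from the geopy example earlier; the key is a placeholder
print(streetview_url(51.5247497559, -0.096965610981, "YOUR_API_KEY"))
```

<p>In the dashboard, the latitude and longitude parts of such a URL are what get swapped out each time a different station is selected on the map.</p>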

<p><img alt="Silvrback blog image" src="https://silvrback.s3.amazonaws.com/uploads/3773a47a-d882-4e70-899c-0b978fdb7858/filter_lat_lon_large.png" /></p>

<hr>

<h1 id="connecting-it-all">Connecting it all</h1>

<p>I wanted to show the size of the system while still letting people take a look at their neighbourhood or even individual stations. To achieve this jump between scales I decided to connect everything so that clicking a borough restricts the view to only that selection. Then by clicking on an individual station you would see the street where the bikes were:</p>

<p><img alt="Silvrback blog image" src="https://silvrback.s3.amazonaws.com/uploads/c351e929-134c-46ee-b675-6f019b0040e1/filter_dashboard_large.png" /></p>

<hr>

<h1 id="finally">Finally</h1>

<p>Here is what it looks like when everything is put together (that is the station in Islington where I usually grab a bike in the morning).</p>

<p><img alt="Silvrback blog image" src="https://silvrback.s3.amazonaws.com/uploads/7d661d80-4549-4a54-bb22-687bde680637/my_station_large.png" /></p>

<h6 id="you-can-see-the-interactive-visualization-on-tableau-public">You can see <a href="https://www.j.mp/boris_bike">the interactive visualization on Tableau Public</a>.</h6>
]]></content:encoded>
      </item>
      <item>
        <guid>http://erichannell.com/ego-metrics#10573</guid>
          <pubDate>Fri, 20 Mar 2015 20:16:00 +0000</pubDate>
        <link>http://erichannell.com/ego-metrics</link>
        <title>Ego Metrics</title>
        <description>How I built a Twitter bot that ended up with a higher Klout score than me.</description>
        <content:encoded><![CDATA[<p>A while back I built a bot that puts up its periscope every 15 minutes to look for a city where it is sunny. Once it finds one, it sends out a tweet:</p>

<blockquote>
<p>It&#39;s sunny in London, United Kingdom<br>
— It&#39;s sunny in: (<a href="https://twitter.com/itissunnyin">@itissunnyin</a>) April 13, 2014</p>
</blockquote>

<p>Background: I moved to London from Barcelona and think about the sun a lot but I built the bot mainly because I wanted to work with the Twitter API and it seemed like a fun project to tackle with Python.</p>

<p>Once everything was up and running (separate post on that later) I followed the bot and then left it to do its thing. It’s been sending out sunny messages ever since to a handful of followers (some bots and some real people) that it picked up along the way.</p>

<p>So far it has sent close to 10,000 tweets. Most seem to disappear into the Twitter-void, but some get replies:</p>

<blockquote>
<p>&#39;@itissunnyin: It&#39;s ALWAYS sunny in Asmara, Eritrea&#39; Fixed!<br>
— blank (<a href="https://twitter.com/Eri_Barrister">@Eri_Barrister</a>) February 24, 2014</p>
</blockquote>

<p>I also have a personal Twitter account with almost 100 followers and 500 tweets to my name. I mainly use Twitter to follow the news and read what other people are saying but sometimes I jump right in with a gem of my own:</p>

<blockquote>
<p>Looks like Sharknado will be to 2013 what Snakes on a Plane was to 2006.<br>
— Eric Hannell (<a href="https://twitter.com/erichannell">@erichannell</a>) July 12, 2013</p>
</blockquote>

<p>I give very little thought to Klout scores and I have no delusions of self-importance, but I was surprised to see that my bot beats my Klout score:</p>

<p><img alt="Silvrback blog image" src="https://silvrback.s3.amazonaws.com/uploads/a4f97bef-69f0-4f7e-be61-9bb3cc943306/me_klout%20(1)_large.png" /> vs. <img alt="Silvrback blog image" src="https://silvrback.s3.amazonaws.com/uploads/89123569-8570-4c8e-81f5-98332d12826a/sun_klout%20(1)_large.png" /></p>

<p>Klout claims to:</p>

<p>“measure the size of a user’s social media network and correlates the content created to measure how other users interact with that content.”</p>

<p>My bot has a very small network and almost all the content it creates is ignored.</p>

<p>Sean Golliher’s post on Reverse Engineering Klout scores includes:</p>

<p>When I first saw Klout scores being discussed, my first thought was “a Klout score is what I would call an ego metric.” Ego metrics are really only useful to the person checking the value. Klout.com has strong marketing on the home page to promote an ego metric feel. It is full of pictures of people with their Klout scores posted above them.</p>

<p>He goes on to talk about how Klout is probably calculating the scores (spoiler: looks like it is calculated by taking the log of your number of retweets).</p>

<p>In defence of retweets, they do seem to be a harder metric to game than followers (which are easier to buy). And, in a way, if someone repeats what you said then you must have had at least some impact on them, even if you are a bot.</p>
]]></content:encoded>
      </item>
  </channel>
</rss>