Querying HTTP Archive to Measure jQuery Plugin Reach

Posted in Front End Engineering, Google BigQuery, Open Source, trunk8 by Rick Viscomi.


The Problem

When I wrote my first jQuery plugin, trunk8, last year, I was really excited to see its popularity grow as people starred and forked the GitHub repository, wrote about it in blogs, and played with the demo. But I wanted a better way to visualize the reach of the plugin, so I did what any other data nerd would do: I created a new GitHub project called red dwarf that queries the GitHub API for users who starred the trunk8 repo and maps their locations using the Google Maps API.

I could see how many people simply “like” the GitHub repository, I could read blog posts about how useful people find it, I could monitor my site analytics for traffic to the demo page, and with red dwarf I could even see the geographic concentration of the people who like it. But this still wasn’t enough data!

HTTP Archive and BigQuery

At the Velocity Conference this year, I heard Ilya Grigorik talk about how the HTTP Archive data had become accessible through Google BigQuery. I thought that was really cool, but I didn’t immediately understand the true awesomeness to be had. Ilya’s blog post gives an example of querying HTTP Archive data for sites that use multiple JavaScript frameworks. After stumbling upon this post recently and seeing those JS frameworks, it struck me that this new tool is an amazing opportunity for open source code authors to get a glimpse into the proliferation of their work. And it’s mind-blowingly simple.

SELECT url
 FROM [httparchive:runs.latest_requests]
 WHERE REGEXP_MATCH(url, r'[YOUR PLUGIN NAME HERE].*\.js');

Of course, you need to jump through a few hoops to access the database, but the query is as simple as that. It grabs the URL of every resource loaded in the latest batch of tests whose URL matches a regular expression. Swap the whole bracketed placeholder for your plugin name (brackets included, since square brackets denote a character class in a regular expression), and the pattern matches any URL that contains the name followed somewhere later by .js. This is why it really helps to have a unique plugin name!
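To get a feel for what the pattern will and won’t catch before spending any BigQuery quota, you can try it locally. Here is a minimal Python sketch, not part of the original query. The URLs are made up for illustration, `matching_urls` is a hypothetical helper, and it assumes Python’s `re.search` is a fair stand-in for the unanchored matching that the query above relies on.

```python
import re

def matching_urls(plugin_name, urls):
    """Return the URLs that contain the plugin name followed later by '.js'."""
    # Same shape as the query's pattern. re.escape guards against regex
    # metacharacters in the plugin name (an addition, not in the original query).
    pattern = re.compile(re.escape(plugin_name) + r".*\.js")
    return [url for url in urls if pattern.search(url)]

# Hypothetical request URLs, invented for this example.
sample_urls = [
    "http://example.com/js/trunk8.js",
    "http://example.com/js/trunk8.min.js?v=1.3",
    "http://cdn.example.net/libs/jquery.trunk8-1.3.1.js",
    "http://example.org/assets/bundle.min.js",
    "http://example.org/css/trunk8.css",
]

hits = matching_urls("trunk8", sample_urls)
for url in hits:
    print(url)
```

Only the first three URLs match. The bundle and the stylesheet are skipped, while minified and versioned file names still get through. A generic plugin name would also match unrelated files, which is the case for picking a distinctive one.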

The coolest part? The data goes back as far as November 2010! You can measure the proliferation of any particular resource over time to get a rough idea of its growth.
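One way to picture that measurement: collect the matching rows from several archived test runs, then count the matches per run. A rough Python sketch of the counting step follows. The rows and run labels are fabricated placeholders, not HTTP Archive results, and `count_by_crawl` is a hypothetical helper. In practice the rows would come from your own query results.

```python
import re
from collections import Counter

def count_by_crawl(plugin_name, rows):
    """Count matching request URLs per crawl label.

    rows is an iterable of (crawl_label, url) pairs.
    """
    pattern = re.compile(re.escape(plugin_name) + r".*\.js")
    counts = Counter(label for label, url in rows if pattern.search(url))
    # Sort by label so the trend reads chronologically (labels are ISO dates here).
    return sorted(counts.items())

# Placeholder rows for illustration only; these are not real measurements.
sample_rows = [
    ("2013-01-01", "http://example.com/js/trunk8.js"),
    ("2013-01-01", "http://example.com/js/app.js"),
    ("2013-06-01", "http://example.com/js/trunk8.js"),
    ("2013-06-01", "http://example.net/lib/trunk8.min.js"),
]

trend = count_by_crawl("trunk8", sample_rows)
for label, count in trend:
    print(label, count)
```

Runs with no matches simply don’t appear in the output, so fill in zeros yourself if you plan to chart the trend.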

Bear in mind that this method won’t tell you every single site using the plugin. HTTP Archive tests only the top 1 million sites according to Alexa, and only twice a month. And because the query matches on URL patterns, the file needs an easily identifiable resource name. If the site owner concatenates your plugin into a bundle (as they probably should), it’s less likely that they’ll name the file after your particular plugin. HTTP Archive doesn’t store the actual content of the files either, so you’re limited to querying the request URLs.

Word to the wise: the BigQuery API’s quota is large enough to let you poke around for a little while, but it doesn’t take long to hit the ceiling. Query responsibly.

DIY

If you want to try it out for yourself, sign up for BigQuery and check out Ilya’s sample queries to get started.
