Code

Memcached variable breakdown

30th April, 2014 - Posted by david

At work we use memcached as our local variable cache, and the excellent memcache.php from Harun Yayli gives us a simple way of viewing what’s in the cache.

One use case we came up with that was missing from the original memcache.php script was a way to group similar variables and see how much of the cache they’re taking up. For example, for searches done on the site, we generate a key by prefixing an md5 of the SQL with search-, then store the result of that query in the cache under that key. Another example might be caching an ad, where the key could be ad-1234 for the ad with ID 1234. The following code changes let us see how much space all the ‘search’ data, ‘ad’ data, etc. take up in comparison to each other.

It works by starting off with a list of known key prefixes (i.e. search- and ad- in the examples above), then uses existing memcache commands to get a list of slabs, then queries each slab for each item it contains. From this list of items, it looks for our known keys, calculates the size of the item and adds it to a running total. Once it has all the totals, it generates a nice pie chart with a legend, using Google’s Chart API.
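Stripped of the memcache.php plumbing, the grouping step is just a prefix match plus a running total per bucket. Here is a rough sketch of that logic in Ruby, assuming you already have a list of [key, size] pairs from the slab dumps; the prefixes and sample sizes are made up for illustration, and the percentages are what the pie chart ends up showing.

```ruby
# Illustrative sketch of the "breakdown" grouping: sum item sizes per known key
# prefix, with anything unmatched going into an 'other' bucket.
KNOWN_PREFIXES = ['search-', 'ad-'].freeze # example prefixes from the text

# items: an Array of [key, size_in_bytes] pairs (assumed already gathered).
def group_sizes_by_prefix(items, prefixes = KNOWN_PREFIXES)
  totals = {}
  prefixes.each { |prefix| totals[prefix] = 0 }
  totals['other'] = 0
  items.each do |key, size|
    prefix = prefixes.find { |p| key.start_with?(p) }
    totals[prefix || 'other'] += size
  end
  totals
end

# Turns byte totals into percentages of the whole, rounded to one decimal place.
def percentages(totals)
  grand_total = totals.values.sum
  return {} if grand_total.zero?
  totals.transform_values { |bytes| (100.0 * bytes / grand_total).round(1) }
end

items = [['search-9b2f', 300], ['ad-1234', 100], ['ad-99', 100], ['session-x', 500]]
totals = group_sizes_by_prefix(items)
# totals maps 'search-' to 300, 'ad-' to 200 and 'other' to 500;
# percentages(totals) maps them to 30.0, 20.0 and 50.0
```

Checking each key against the prefixes in order means a key can only ever land in one bucket, so the slices of the pie always add up to the whole.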

First up, we need to add a new entry to our menu that links to our breakdown. This is simply done by editing the getMenu function in src/display.functions.php and adding a new menu entry to it, as follows:

// after the line for Slabs
echo menu_entry(16, 'Breakdown');

Next, we need to add the big block of code that’s going to generate our pie chart. In memcache.php you’ll see a switch block on $_GET['op']. This is where we want to add a block for our new operation, 16, as follows:


<?php
switch ($_GET['op']) {
    // other code...

    case 16: // breakdown
        $cache_items = getCacheItems();
        // known key prefixes; anything matching none of them is counted as 'other'
        $variable_sizes = array(
            'search-' => 0,
            'ad-' => 0,
            // etc.
            'other' => 0 // for everything that's left over
        );
        $variable_keys = array_keys($variable_sizes);
        $other = 0;

        foreach ($cache_items['items'] as $server => $slabs) {
            // $server is "host:port"; sendMemcacheCommand() takes them separately
            list($h, $p) = explode(':', $server);
            foreach ($slabs as $slab_id => $slab) {
                $items = dumpCacheSlab($server, $slab_id, $slab['number']);
                foreach ($items['ITEM'] as $key => $item) {
                    // fetch the item itself to find out its size in bytes
                    $r = sendMemcacheCommand($h, $p, 'get '.$key);
                    if (!isset($r['VALUE'])) {
                        continue; // expired or evicted since the slab was dumped
                    }
                    $size = $r['VALUE'][$key]['stat']['size'];
                    $found = false;
                    foreach ($variable_keys as $prefix) {
                        // 'other' is the catch-all bucket, not a real prefix
                        if ($prefix !== 'other' && strpos($key, $prefix) === 0) {
                            $variable_sizes[$prefix] += $size;
                            $found = true;
                            break;
                        }
                    }
                    if (!$found) {
                        $other += $size;
                    }
                }
            }
        }
        $variable_sizes['other'] = $other;
        echo <<<EOB
<script type="text/javascript" src="https://www.google.com/jsapi"></script>
<script type="text/javascript">
google.load("visualization", "1", {packages:["corechart"]});
google.setOnLoadCallback(drawChart);
function drawChart() {
var data = google.visualization.arrayToDataTable([['Key prefix', 'Size (bytes)'],
EOB;
        // one [label, bytes] row per non-empty bucket; json_encode handles the quoting
        $rows = array();
        foreach ($variable_sizes as $key => $val) {
            if ($val > 0) {
                $rows[] = json_encode(array($key, $val));
            }
        }
        echo implode(",\n", $rows);
        echo <<<EOB
]);

var options = {
title: 'Percentage breakdown'
};

var chart = new google.visualization.PieChart(document.getElementById('piechart'));
chart.draw(data, options);
}
</script>
<div id="piechart" style="width: 900px; height: 500px; float: left;"></div>
EOB;
        $meanings = array(
            'ad-' => 'Specific ads',
            'search-' => 'Search results queries',
            // etc.
            'other' => 'Other small random bits of data'
        );
?>
<div style="float: left;">
<h2>Key meanings</h2>
<table style="border: none;">
<?php
        $i = 0;
        foreach ($meanings as $key => $meaning) {
?>
<tr<?php if (++$i % 2 == 0) echo ' style="background: #ddd;"'; ?>>
<td><?php echo htmlspecialchars($key); ?></td>
<td><?php echo htmlspecialchars($meaning); ?></td>
</tr>
<?php
        }
?>
</table>
</div>
<?php
        break;

You should now see a new menu option, and clicking on it should bring up a nice pie chart, something like the screenshot below (I’ve had to blur out our cache variable names).

Memcache Breakdown

Include comments in search in The Bug Genie

4th April, 2013 - Posted by david

Back in July, I was given the task of finding a decent, preferably PHP-based, bug-tracking system and opted for The Bug Genie, as it had pretty much everything we wanted: email notifications, tickets grouped by project, various levels of user access and more. One thing we noticed, however, was that when you searched all issues for a particular string, The Bug Genie only searched the main issue text and omitted any text in the comments. This wasn’t ideal, so I went through the source code, found where the searches were being performed and wrote a short hack to get what I wanted. I must stress that this is a hack, and I’m sure it could be done a lot more elegantly! I just didn’t have the time to tidy it up.

What the code does is perform a simple LIKE on the comments table for your search term, get each matching comment’s parent issue ID and include any issues with those IDs in the main search results. The snippet below goes in the findIssues function in core/classes/B2DB/TBGIssuesTable.class.php; I’ve included the few lines of code directly above where my code needs to be inserted:


// ... rest of function findIssues above
if ($filter_info['operator'] == '=')
{
    $ctn = $crit->returnCriterion(self::TITLE, $searchterm, Criteria::DB_LIKE);
    $ctn->addOr(self::DESCRIPTION, $searchterm, Criteria::DB_LIKE);
    $ctn->addOr(self::REPRODUCTION_STEPS, $searchterm, Criteria::DB_LIKE);
    $ctn->addOr(TBGIssueCustomFieldsTable::OPTION_VALUE, $searchterm, Criteria::DB_LIKE);
    //****** my code begins here
    // manually connect to DB
    $c = mysqli_connect(Core::getHost(), Core::getUname(), Core::getPasswd(), Core::getDBname());
    // search comments table for the text you're looking for
    $query = mysqli_query($c, 'SELECT target_id FROM tbg3_comments WHERE content LIKE \''.mysqli_real_escape_string($c, $searchterm).'\'');
    // if we've matches, build up an array
    $ids = array();
    // mysqli_query() returns false on failure, so check it before fetching rows
    while ($query && ($row = mysqli_fetch_row($query)))
    {
        $ids[] = $row[0];
    }
    if (count($ids))
    {
        // add clause to map any found target_ids on the comments table to actual id's on the issues table
        $ctn->addOr(self::ID, $ids, Criteria::DB_IN);
    }
    //****** rest of function can continue on
}
else {
    // etc.

Beanstalkd, Pheanstalk and Daemontools on Ubuntu

20th March, 2013 - Posted by david

On the website I work for, when a user uploads an image for an ad, we generally keep 3 versions of that image, each a different size, simply referred to as ‘small’, ‘main’ or ‘large’. At the moment, these resized images (I’ll call them ‘thumbnails’ for simplicity) are generated the first time they are requested by a client (then cached), so that the script that handles the upload can return its ‘success’ response as early as possible, instead of taking extra time to generate the thumbnails. What Beanstalkd allows us to do is put a job on a queue (in our case a ‘generate thumbnails’ job), where it’ll be picked up at some point in the future by another script that polls the queue and executes in its own separate process. So, my upload script is only delayed by, say, the 0.1 seconds it takes to put a job on the queue, as opposed to the 1 second it takes to execute the job (i.e. generate the thumbnails). This blog post describes how I got the whole thing to work on an Ubuntu 12.04 server, using PHP.

This post was largely inspired by an article on the blog Context With Style, which was written for a Mac. I’m also going to use their example of a queue filler script to populate the queue and a worker script to pull jobs from the queue and process them. I recommend reading that post for more background.

One other thing: most of these UNIX commands need to be run as root, so I’ll assume you’re in super-user mode.

Beanstalkd

Installing Beanstalkd is pretty straightforward:

apt-get install beanstalkd

We don’t need to start it just yet, but for reference, you can run it with:

beanstalkd -l 127.0.0.1 -p 11300
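For the curious, clients such as Pheanstalk talk to beanstalkd over a simple text protocol on that port. The Ruby sketch below (not Pheanstalk’s code) only builds and parses the strings for a put; the helper names and the default priority, delay and time-to-run values are my own illustrative choices. Writing the command to a socket connected to 127.0.0.1:11300 should get back a line like INSERTED followed by the new job’s ID.

```ruby
# Illustrative only: builds and parses beanstalkd protocol strings for a "put".
# Wire format: "put <pri> <delay> <ttr> <bytes>\r\n<data>\r\n"
# The default pri/delay/ttr below are example values, not requirements.
def beanstalk_put_command(data, pri: 1024, delay: 0, ttr: 60)
  # <bytes> is the job body's length in bytes, not characters
  "put #{pri} #{delay} #{ttr} #{data.bytesize}\r\n#{data}\r\n"
end

# The server answers a successful put with "INSERTED <id>\r\n".
def parse_put_reply(line)
  status, id = line.strip.split(' ', 2)
  raise "put failed: #{line.strip}" unless status == 'INSERTED'
  Integer(id)
end

cmd = beanstalk_put_command('{"ad_id":1234}')
# cmd is "put 1024 0 60 14\r\n{\"ad_id\":1234}\r\n"
```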

Pheanstalk

Pheanstalk is a PHP package for interfacing with a Beanstalk daemon. I simply downloaded the zip from GitHub and extracted it to a ‘pheanstalk’ folder in my main include folder; then, to use it, I simply do:


require_once 'pheanstalk/pheanstalk_init.php';
// note how we use 'Pheanstalk_Pheanstalk' instead of 'Pheanstalk',
// and how we omit the port in the constructor (as 11300 is the default)
$pheanstalk = new Pheanstalk_Pheanstalk('127.0.0.1');

Going by the example in the Context With Style article, we’ll call the script under the section “Pushing things into the queue” fill_queue.php, and the one under “Picking up things from the queue” worker.php. They act as good guides to putting jobs into, and getting jobs out of, Beanstalkd via Pheanstalk.

So, the idea is that worker.php runs non-stop (via daemontools, see next section), polling the queue for new jobs. Once we know worker.php is ready, we can manually run fill_queue.php from the command line to populate the queue. The worker should then go through the queue, writing the data it reads to a log file in ./log/worker.txt. You may run into some permissions issues here, depending on how permissions are set up on your project.
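To make the producer/worker split concrete without a running beanstalkd, here is an in-process Ruby analogue using the standard library’s thread-safe Queue in place of beanstalkd, and an array in place of log/worker.txt. The job names and the :stop sentinel are hypothetical; the point is that the filling side returns as soon as a job is enqueued, while the worker picks jobs up in its own thread.

```ruby
# In-process analogue of the fill_queue.php / worker.php pair: Ruby's thread-safe
# Queue stands in for beanstalkd and an Array stands in for log/worker.txt.
def run_queue_demo(jobs)
  queue = Queue.new
  log = []

  worker = Thread.new do
    loop do
      job = queue.pop              # blocks until a job is available
      break if job == :stop        # hypothetical sentinel so the demo can end
      log << "processed #{job}"    # the real worker would generate thumbnails here
    end
  end

  jobs.each { |job| queue << job } # the "fill queue" side returns straight away
  queue << :stop
  worker.join
  log
end

run_queue_demo(['thumbs-ad-1', 'thumbs-ad-2'])
# returns ["processed thumbs-ad-1", "processed thumbs-ad-2"]
```

Unlike beanstalkd, this queue lives in memory inside one process, so jobs don’t survive a restart; it only illustrates the shape of the design.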

Daemontools

First up, we need to install daemontools:

apt-get install daemontools

You don’t actually interact with a daemontools process directly; instead you use programs whose names begin with ‘sv’, such as svscan or svscanboot. svscan works by looking in a folder called /etc/service/, which you have to create, and scanning it for project folders you add yourself. Once svscan detects a new project folder in /etc/service, it starts supervise for it, which adds a supervise folder; you in turn create a shell script called run in the project folder, which daemontools will run and monitor for you. Don’t worry, all these steps are outlined below!
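What supervise does with a run script can be pictured as a loop: start the command, wait for it to exit, then start it again. The Ruby toy below is only an illustration of that idea, not how daemontools is implemented; max_runs and pause are hypothetical knobs so the sketch terminates, whereas real supervise keeps restarting the service indefinitely.

```ruby
require 'rbconfig'

# Toy illustration of the supervise idea: start a command, wait for it to exit,
# record its exit status, pause, and start it again. max_runs and pause are
# hypothetical knobs so this terminates; real supervise keeps going.
def supervise(cmd, max_runs: 3, pause: 1)
  exit_codes = []
  max_runs.times do
    pid = Process.spawn(*cmd)          # like supervise starting ./run
    _, status = Process.wait2(pid)     # block until it exits
    exit_codes << status.exitstatus
    sleep pause                        # avoid restarting in a tight loop
  end
  exit_codes
end

# A "service" that exits straight away with status 3 gets restarted each time:
supervise([RbConfig.ruby, '-e', 'exit 3'], max_runs: 2, pause: 0)
# returns [3, 3]
```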

Anyway, now that we’ve installed daemontools, we need to create an Upstart config file for svscan and start it, as well as create our /etc/service directory. Some of these tips are thanks to this post.


# create the config file for svscan:
cd /etc/init
touch svscan.conf
# add some commands into it:
echo "start on runlevel [2345]" > svscan.conf
echo "" >> svscan.conf
echo "expect fork" >> svscan.conf
echo "respawn" >> svscan.conf
echo "exec svscanboot" >> svscan.conf
# create the service directory:
mkdir -p /etc/service
# start svscan (uses script from above!):
service svscan start

Hopefully, now if you do a ps aux | grep sv, you’ll see at least svscan running.

Next, I’m going to create my run script, a shell script that’ll start Beanstalkd and our worker script. I’ll place it in my example /var/www/my-project folder, along with my worker.php, fill_queue.php and log/worker.txt files. I’ll then create a my-project service folder and symlink my run file into it.


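cd /var/www/my-project
touch run
# must be executable:
chmod 755 run
# single quotes stop an interactive shell treating '!' as history expansion:
echo '#!/bin/sh' > run
# supervise runs the script from the service folder, so move to the project
# folder first; otherwise relative paths such as log/worker.txt won't resolve:
echo "cd /var/www/my-project" >> run
# to start beanstalkd process:
echo "beanstalkd -l 127.0.0.1 -p 11300 &" >> run
# to start our worker process (exec lets supervise monitor php directly):
echo "exec php /var/www/my-project/worker.php" >> run
# create project service folder:
mkdir /etc/service/my-project
# symlink our run file:
ln -s /var/www/my-project/run /etc/service/my-project/run
# within a few seconds svscan should notice the new folder and start supervise,
# which creates a 'supervise' folder in /etc/service/my-project and runs our script.
# now, if you look in /var/www/my-project/log/worker.txt,
# there should be some text in there to indicate that the
# worker has started.
# run the fill queue script:
php fill_queue.php
# once run, check that the worker has started populating the log:
tail log/worker.txt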

Hopefully when you do the tail, you’ll see data that corresponds with the output from fill_queue.php. This will indicate that your worker is running, polling the queue for new jobs. If you re-run fill_queue.php, your log file should expand accordingly.

Safari and long lines in Google Maps Javascript API Directions Service

11th October, 2012 - Posted by david

I’ve been working on building the mobile version of CarsIreland.ie lately, and one of the features we want is to offer directions from a user’s current location to any of our car dealerships, assuming we have their latitude and longitude co-ordinates. Getting the directions and displaying them on a map is ridiculously simple, thanks to Google’s Directions Service API. Going into the detail of getting this working is beyond the scope of this post, but what I hope to show here is how to resolve an issue in mobile (and possibly desktop) Safari, and maybe even some other browsers.

When you have instantiated your new google.maps.DirectionsService() and called its route function, passing your source and destination (among other things) as parameters, you’re supposed to give a callback function with response and status parameters. Assuming the call to route was successful, the response parameter should be a big JSON block of at least one route and step-by-step guides detailing how to get from source to destination. Conveniently, Google also provide a very handy object called a DirectionsRenderer, which has a function called setDirections and can generate a nicely formatted HTML table of directions. See https://google-developers.appspot.com/maps/documentation/javascript/examples/directions-panel for an example.

The problem I experienced, and am aiming to solve here, is that some of the directions (in Ireland at least) involve very long motorway/freeway names, where each junction is separated by a ‘/’ but with no spaces. This can lead to very long strings of the format ‘Long/And/Winding/Junction/Name’. Add in the fact that they also include the Irish/Gaelic translation for some words and it gets even longer! When viewing these steps in Android’s Dolphin browser or even a desktop Firefox, the browser recognizes that it can split the line at the ‘/’ and thus doesn’t widen the page. Safari unfortunately doesn’t do this and forces very wide page widths, which makes the page look awful. So, today I figured out a way to resolve this.

What I did was traverse the response object you get back from Google, looking at each instruction step, trying to find one of these long strings, and replacing the ‘/’ with ‘ / ‘, i.e. a space character on either side, so Safari will then break the lines and not force a wide page. Doing a simple string replace wasn’t sufficient, as some of the instructions contain HTML tags, which can have ‘/’ in them that we ideally want to keep.

So first up is the Javascript regular expression to find matching instances of “long/string”. In simple terms, it’s any alpha-numeric character, followed by a ‘/’, followed by another alpha-numeric character. In Javascript I came up with:

var patt = /([a-z0-9]{1})\/([a-z0-9]{1})/gi;

The gi at the end of the pattern means global search (i.e. don’t stop at the first match) and case-insensitive matching (hence no ‘A-Z’ is required).
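The same pattern can be tried out quickly outside the browser. This Ruby sketch (not the code used on the site) applies it with gsub, and also shows a variant using a lookahead: because the original pattern consumes the character after the ‘/’, a run of single-character segments like a/b/c only gets its first slash spaced. The function names are hypothetical.

```ruby
# The JavaScript pattern from above in Ruby ({1} is redundant, so it's dropped).
SLASH_PATTERN = /([a-z0-9])\/([a-z0-9])/i
# Lookahead variant: the character after '/' is checked but not consumed.
SLASH_LOOKAHEAD = /([a-z0-9])\/(?=[a-z0-9])/i

def space_slashes(text)
  text.gsub(SLASH_PATTERN, '\1 / \2')
end

def space_slashes_lookahead(text)
  text.gsub(SLASH_LOOKAHEAD, '\1 / ')
end

space_slashes('Long/And/Winding')     # => "Long / And / Winding"
space_slashes('a/b/c')                # => "a / b/c"
space_slashes_lookahead('a/b/c')      # => "a / b / c"
space_slashes('Take the <b>M50</b>')  # unchanged: "</b>" has no letter or digit before '/'
```

In JavaScript the lookahead version would be the pattern /([a-z0-9])\/(?=[a-z0-9])/gi with the replacement "$1 / ".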

Now, all we have to do is cycle through the response, looking for routes, legs, steps and instructions and replacing as necessary, via a string’s replace function:


// e.g.
var repl = "$1 / $2";
instruction = instruction.replace(patt, repl);

So, to loop through the response and do the replacing, we need a few nested for loops, our pattern and our replace sequence, as follows:


if (typeof response['routes'] != 'undefined') {
    var patt = /([a-z0-9]{1})\/([a-z0-9]{1})/gi;
    var repl = "$1 / $2";
    // cycle through each route
    for (var i=0; i<response['routes'].length; i++) {
        var route = response['routes'][i];
        if (typeof route['legs'] != 'undefined') {
            // cycle through each leg in that route
            for (var j=0; j<route['legs'].length; j++) {
                var leg = route['legs'][j];
                if (typeof leg['steps'] != 'undefined') {
                    // cycle through each step in that leg
                    for (var k=0; k<leg['steps'].length; k++) {
                        var instructions = leg['steps'][k]['instructions'];
                        // if we've found an instruction with a matching pattern
                        if (instructions.match(patt)) {
                            // do the replace
                            response['routes'][i]['legs'][j]['steps'][k]['instructions'] = instructions.replace(patt, repl);
                        }
                    }
                }
            }
        }
    }
}
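The same traversal, sketched in Ruby against a plain Hash shaped like the routes/legs/steps/instructions structure above. The sample data is invented, and Array() stands in for the typeof checks by treating a missing level as empty.

```ruby
DIRECTIONS_SLASH = /([a-z0-9])\/([a-z0-9])/i

# Spaces out slashes in every step's instructions, modifying the Hash in place.
# Array(nil) is [], so a missing routes/legs/steps level is simply skipped.
def space_out_instructions!(response)
  Array(response['routes']).each do |route|
    Array(route['legs']).each do |leg|
      Array(leg['steps']).each do |step|
        text = step['instructions']
        step['instructions'] = text.gsub(DIRECTIONS_SLASH, '\1 / \2') if text
      end
    end
  end
  response
end

response = {
  'routes' => [
    { 'legs' => [{ 'steps' => [{ 'instructions' => 'Take the M50/N7' },
                               { 'instructions' => 'Turn <b>left</b>' }] }] },
    {} # a route with no legs
  ]
}
space_out_instructions!(response)
# the steps now read "Take the M50 / N7" and "Turn <b>left</b>"
```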

So, hopefully this will help people experiencing the same problem I was having with the long strings caused by a lack of spaces around ‘/’ characters! As an alternative, you may wish to have a space only after the ‘/’, in which case the replacement string becomes "$1/ $2".

window.onload firing too early in IE

12th May, 2012 - Posted by david

The Problem

Where I work, our tool supports 6 different languages, where translations are contained in various sets of language files. One of these sets consists of the translations stored in Javascript files as JSON blocks

php, error_log and newlines: a solution

15th March, 2012 - Posted by david

When it comes to coding and debugging, I generally keep things simple, opting for a basic text editor (Sublime Text 2 with Vim key-bindings being my current choice) and simple debug output statements when trying to track down a bug. In my current job, I deal with an API over AJAX, so it’s not easy to send debug statements to the browser; hence I had been using a combination of PHP’s error_log and print_r($var, 1) to get the value of a variable on the server. When debugging this way, I’d usually be doing a tail -f on my error log file, monitoring changes to it as I go.

This was all fine, but it got very frustrating when dealing with arrays, objects and even strings with newline characters. The reason for this frustration was that newline characters were being written to the error log as a literal \n rather than an actual line break, so something like

$var = array('zero' => 0, 'one' => 1);
error_log(print_r($var, 1));

would produce:

Array\n(\n [zero] => 0\n [one] => 1\n)

instead of the nicer:


Array
(
[zero] => 0
[one] => 1
)

Not too bad for small arrays, but for large objects it’s a nightmare! Googling around didn’t turn up an easy answer. I’m sure it’s some setting deep in php.ini or my Apache config, but I managed to come up with a pretty neat solution, which I’m going to document here.

I started by creating my own error log in /tmp/derror (David’s Error = derror!) and letting anyone write to it (I realise this could be slightly cleaner/more secure):

touch /tmp/derror
chmod 766 /tmp/derror

Next, I needed a nice debug function to write to this file. I wrote one that takes any number of parameters and, for each one, outputs the current time alongside a dump of its value: if it’s an array or object, it uses print_r; otherwise it just displays the value.
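The original PHP function isn’t reproduced here, but the idea translates easily. As a rough Ruby analogue (the derror name and default path simply mirror the text above): strings are written as-is, anything else gets a pretty-printed dump, and each entry is prefixed with a timestamp.

```ruby
require 'pp'

# Rough Ruby analogue of the debug helper described above: each value gets its
# own timestamped entry; strings go in as-is, anything else is dumped with
# pretty_inspect, so larger structures are spread over real lines, not \n escapes.
def derror(*values, path: '/tmp/derror')
  File.open(path, 'a') do |file|
    values.each do |value|
      dump = value.is_a?(String) ? "#{value}\n" : value.pretty_inspect
      file.write("[#{Time.now.strftime('%Y-%m-%d %H:%M:%S')}] #{dump}")
    end
  end
end

# derror('reached checkout', { 'zero' => 0, 'one' => 1 })
```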


That function works fine, but after a while I realised that, by doing my tail -f on my new error file, I was missing out on the real server errors. So I had 2 different terminal tabs open, flipping between them, which got annoying after about 2 errors! Luckily, I quickly thought of a way around this: route PHP’s error logging to my new file. The way Apache and virtual hosts are set up, this was quite easy, and you can even log to 2 different files, thus preserving your ‘normal’ error file too. So, in your /etc/apache2/sites-available/default (or whichever file defines your site), in the <VirtualHost *:80> block for your site, copy the CustomLog line and point the copy at your error file, i.e.

CustomLog /tmp/derror combined

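This should now route Apache’s logging to your new file too, so you have the best of both worlds! (Strictly speaking, CustomLog controls Apache’s request/access log; error messages, including PHP’s, go wherever the ErrorLog directive, or PHP’s own error_log setting, points.)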

To prevent the file getting too big over time, I set up a cron job to simply delete the file and create a new one every Monday morning.

oAuth with the Twitter API in Ruby on Rails without a gem

15th December, 2011 - Posted by david

This post is a follow-on from my previous post about the oAuth protocol in general in Ruby. Here I detail how to let a user authenticate themselves via Twitter to your web app, request a list of people they’re following (a GET operation) and follow a specific user (a POST operation). The following functions used here are defined in that earlier post:

params(consumer_key)
generate_nonce(size)
signature_base_string(method, uri, params)
url_encode(string)
sign(key, base_string)
header(params)
request_data(header, base_uri, method, post_data=nil)

The Log-in Process

When you sign up for an API key from Twitter, Yelp etc. you’ll be given a Consumer Key and a Consumer Secret. You may also be given an Access Token and an Access Token secret, but that’s for you logging into your own account. If that’s all you wish to do, you can skip this section.

So, if you want to let a user log in to their Twitter (or whatever) account via your site, you need to get an access token. The process for this is as follows:

1. You request a unique Request Token from Twitter for your ‘Log-in with Twitter’ button
2. You use this request token, as well as where on your site you want the user to be re-directed back to after they’re authenticated, to build the URL the button points to
3. They click the ‘Log-in with Twitter’ button on your site
4. The user is brought to Twitter where they enter their username and password
5. Twitter re-directs them back to your site, to the URL you gave them in step 2
6. Back at your site, you’ll now have an oAuth verifier
7. This can be used to get the user’s Twitter user info, which has been authenticated by Twitter

Step 1: Getting an oAuth request token for your current session

Twitter’s URL for getting a request token is https://api.twitter.com/oauth/request_token. The request for the token contains a number of parameters, which are combined with the URL you’re sending the data to and the method of your request (i.e. GET, POST etc.) to generate what’s called a signature base string. At this point you don’t have a token secret, so your signing key is simply your consumer secret followed by an ‘&’. Using the functions I’ve mentioned earlier, this can be done as follows:


consumer_key = 'abcdefghijklmnop' # Obtainable from your destination site's API admin panel
consumer_secret = 'zyxwvutsrqponm' # As above
callback_url = 'http://www.mysite.com/logged-in'
method = 'POST'
uri = 'https://api.twitter.com/oauth/request_token'
params = params(consumer_key)
params['oauth_callback'] = url_encode(callback_url)
params['oauth_signature'] = url_encode(sign(consumer_secret + '&', signature_base_string(method, uri, params)))
token_data = parse_string(request_data(header(params), uri, method))
auth_token, auth_token_secret = [token_data['oauth_token'], token_data['oauth_token_secret']] # save these values, they'll be used again later

# where parse_string is simply
def parse_string(str)
ret = {}
str.split('&').each do |pair|
key_and_val = pair.split('=')
ret[key_and_val[0]] = key_and_val[1]
end
ret
end

Steps 2-5: Authenticating with Twitter

Once you have your request token, the URL to direct your user to is simply:

login_url = 'https://api.twitter.com/oauth/authorize?oauth_token=' + auth_token

You can use a standard <%= link_to 'Login', @login_url %> to generate an HTML anchor in your ERB template, or whatever you choose. The user will then be directed to Twitter, where they enter their log-in details and get directed back to your callback URL.

Steps 6-7: Getting the user’s info

When the user is directed back to your site, you’ll be sent an oauth_verifier. This can be obtained via Rails in your callback URL’s corresponding controller, via params[:oauth_verifier]. You need to use this to request their user info, as a final check before they’re fully logged in. This results in a new auth token (or what Twitter calls an access token) and auth token secret, which should replace your previous stored values. It is assumed the code below is stored in a model class, where you need to pass the auth verifier from your controller.


method = 'POST'
base_uri = 'https://api.twitter.com/oauth/access_token'
params = params(consumer_key) # not to be confused with params in your controller
params['oauth_token'] = auth_token # the request token from step 1
params['oauth_verifier'] = auth_verifier # this does come from params in the controller
# auth_token_secret here is the request token secret from step 1
params['oauth_signature'] = url_encode(sign(consumer_secret + '&' + auth_token_secret, signature_base_string(method, base_uri, params)))
data = parse_string(request_data(header(params), base_uri, method))

data will now contain a hash with things such as screen_name, user_id etc. of the user who just logged in. It’ll also contain a new oauth_token and oauth_token_secret, which should be saved as they’ll be used again.
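Because the token responses are URL-encoded form data, values can in principle contain percent-escapes, which the split-based parse_string above leaves as they are. A slightly more robust sketch leans on Ruby’s URI.decode_www_form; the function name is hypothetical.

```ruby
require 'uri'

# Parses an application/x-www-form-urlencoded token response into a Hash,
# percent-decoding keys and values along the way.
def parse_token_response(body)
  URI.decode_www_form(body).to_h
end

parse_token_response('oauth_token=abc&oauth_token_secret=d%2Bef&screen_name=someone')
# 'oauth_token_secret' maps to 'd+ef' (the %2B has been decoded)
```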

Now you have a fully validated user, who has authorised your app to access information on Twitter via their Twitter account. So now, let’s access some of that info.

Getting the people a user is following (a GET request)

The process for all GET requests is pretty similar and roughly follows what we’ve done before. We start with our hash of standard parameters and add each of the GET parameters to it. We use our access token and consumer key & secret to generate our oAuth signature, make the request and parse the response.


method = 'GET'
uri = 'https://api.twitter.com/1/friends/ids.json'
params = params(consumer_key)
params['oauth_token'] = auth_token # the access token saved in the previous step
# Add the GET parameters here
params['cursor'] = '-1' # start at the beginning
params['user_id'] = user_id # from 'data' above
params['oauth_signature'] = url_encode(sign(consumer_secret + '&' + auth_token_secret, signature_base_string(method, uri, params)))
uri += '?cursor=-1&user_id=' + user_id # Add in the GET parameters to the URL
# only the oauth_* parameters belong in the Authorization header
oauth_params = params.select { |k, _| k.start_with?('oauth_') }
followees = JSON.parse(request_data(header(oauth_params), uri, method))

Following a specific user (a POST request)

The process for a POST is pretty similar, the only difference being how you handle the parameters to the request. In the example below, I’m assuming you have the user ID of the person you want to follow and that it’s stored in a variable called followee_user_id.


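method = 'POST'
uri = 'https://api.twitter.com/1/friendships/create.json'
params = params(consumer_key)
params['oauth_token'] = auth_token # the access token saved earlier
params['user_id'] = followee_user_id
params['oauth_signature'] = url_encode(sign(consumer_secret + '&' + auth_token_secret, signature_base_string(method, uri, params)))
# only the oauth_* parameters belong in the Authorization header; user_id goes in the body
oauth_params = params.select { |k, _| k.start_with?('oauth_') }
resp = JSON.parse(request_data(header(oauth_params), uri, method, 'user_id=' + followee_user_id))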

So, assuming that was successful, the user should now be following the person with user ID followee_user_id.

Conclusion

Hopefully this will fill in some of the gaps in Twitter’s documentation. When coding this I found plenty of instances where the documentation would say something like “now sign the key”, without actually telling you how to sign it! Very confusing indeed.

Disclaimer: I obviously wrote all this in proper Ruby on Rails classes and controllers, but have extracted the code out here to be presented in modular form. Thus, what’s here is not fully tested, or even elegant, but there should be enough to figure out what you need to do.

oAuth 1.0 in Ruby without a gem

10th December, 2011 - Posted by david

Recently I decided to figure out what the oAuth 1.0 protocol was all about and try to implement it in Ruby, as a way to a) practise my Ruby on Rails, b) have a look at the Twitter API and c) use both to get an understanding of how websites let you log in/comment via your Facebook/Twitter/Google etc. account, for potential use in future web projects. Sure, there’s an oAuth gem out there, and a Twitter gem, and probably a generic login gem (or if not, there’s an idea!), but I thought I’d get more out of the process by coding everything from scratch. So, first up is a generic overview of the oAuth protocol.

Each request will have a method (i.e. GET, POST etc.), a base URL to handle the request at the source site (Twitter, Yelp etc.) and a certain set of parameters. Every request I’ve dealt with has had the same 5 parameters, along with various other ones specific to the request you’re making. So, in Ruby, I’d have something like:


# libraries used by the functions in this post
require 'base64'
require 'openssl'
require 'cgi'
require 'net/http'
require 'uri'
require 'json'

# These 5 parameters are common to all calls
def params(consumer_key)
  {
    'oauth_consumer_key' => consumer_key, # Your consumer key
    'oauth_nonce' => generate_nonce, # A random string, see below for function
    'oauth_signature_method' => 'HMAC-SHA1', # How you'll be signing (see later)
    'oauth_timestamp' => Time.now.getutc.to_i.to_s, # Timestamp
    'oauth_version' => '1.0' # oAuth version
  }
end

def generate_nonce(size=7)
  Base64.encode64(OpenSSL::Random.random_bytes(size)).gsub(/\W/, '')
end

# methods must be defined before they're called, hence the usage comes last
consumer_key = 'abcdefghijklmnop' # Obtainable from your destination site's API admin panel
consumer_secret = 'zyxwvutsrqponm' # As above
method = 'GET'
uri = 'https://api.site.com/resource/section.format'
params = params(consumer_key)

Next, you’ll need to add in any extra parameters to your params hash, e.g. your access token if you have it, and then combine all the above to generate a base string:


params['abc'] = 'xyz'
signature_base_string = signature_base_string(method, uri, params)

# where the signature_base_string function is:
def signature_base_string(method, uri, params)
  # Join up the parameters into one long URL-safe string of key value pairs
  encoded_params = params.sort.collect { |k, v| url_encode("#{k}=#{v}") }.join('%26')
  # Join the above with your method and URL-safe destination URL
  method + '&' + url_encode(uri) + '&' + encoded_params
end

# I'm a PHP developer primarily, hence the name of this function!
def url_encode(string)
  # CGI.escape writes spaces as '+' (and, in older Rubies, '~' as '%7E'), but
  # oAuth expects RFC 3986 encoding: spaces as %20 and '~' left alone
  CGI.escape(string.to_s).gsub('+', '%20').gsub('%7E', '~')
end
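For comparison, here is a sketch of the base string as RFC 5849 (section 3.4.1) describes it: each key and value is percent-encoded, the pairs are sorted, joined with ‘=’ and ‘&’, and the whole parameter string is encoded once more. Unlike CGI.escape, the encoder here writes spaces as %20. Note this version expects raw, unencoded values, so it isn’t a drop-in replacement for the functions above; the names are my own.

```ruby
def percent_encode(value)
  value.to_s.b.gsub(/[^A-Za-z0-9\-._~]/) { |c| format('%%%02X', c.ord) }
end

# Signature base string per RFC 5849 section 3.4.1, built from raw (unencoded)
# parameter values: encode each key and value, sort the pairs, join them, then
# encode the whole parameter string once more. uri is the URL without any query.
def base_string(method, uri, params)
  normalized = params
               .map { |k, v| [percent_encode(k), percent_encode(v)] }
               .sort
               .map { |k, v| "#{k}=#{v}" }
               .join('&')
  [method.upcase, percent_encode(uri), percent_encode(normalized)].join('&')
end

base_string('get', 'https://api.site.com/r', { 'b' => '2 3', 'a' => '1' })
# => "GET&https%3A%2F%2Fapi.site.com%2Fr&a%3D1%26b%3D2%25203"
```

The space in ‘2 3’ ends up as %2520 because it is encoded twice: once as a value (%20), then again as part of the whole parameter string.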

Next up, you need to generate a signing key, which is a combination of your consumer secret and your token secret for the current session, if you have one at this stage (you may not, if the user hasn’t logged in yet; in that case, a blank string will suffice). With this signing key, you sign your signature base string to get your oAuth signature:


access_token_secret ||= '' # if not set, blank string
signing_key = consumer_secret + '&' + access_token_secret
params['oauth_signature'] = url_encode(sign(signing_key, signature_base_string))

# where sign is:
def sign(key, base_string)
  # OpenSSL::Digest::Digest is deprecated (and gone in newer Rubies)
  digest = OpenSSL::Digest.new('sha1')
  hmac = OpenSSL::HMAC.digest(digest, key, base_string)
  # strict_encode64 produces Base64 with no line breaks
  Base64.strict_encode64(hmac)
end
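A standalone sketch of the signing step, with the key built as RFC 5849 (section 3.4.2) describes: the percent-encoded consumer secret, an ‘&’, then the percent-encoded token secret (empty before the user has logged in). The HMAC is checked against the HMAC-SHA1 test vector from RFC 2202; the function names are hypothetical.

```ruby
require 'openssl'

def percent_encode(value)
  value.to_s.b.gsub(/[^A-Za-z0-9\-._~]/) { |c| format('%%%02X', c.ord) }
end

# Signing key per RFC 5849 section 3.4.2: encoded consumer secret, '&', encoded
# token secret. With no token secret yet, the key simply ends in '&'.
def signing_key(consumer_secret, token_secret = '')
  "#{percent_encode(consumer_secret)}&#{percent_encode(token_secret)}"
end

# HMAC-SHA1 of the base string, Base64-encoded on one line ('m0' = strict Base64).
def hmac_sha1_signature(key, base_string)
  [OpenSSL::HMAC.digest(OpenSSL::Digest.new('SHA1'), key, base_string)].pack('m0')
end

signing_key('zyxwvutsrqponm') # => "zyxwvutsrqponm&"
sig = hmac_sha1_signature('Jefe', 'what do ya want for nothing?')
sig.unpack1('m0').unpack1('H*')
# => "effcdf6ae5eb2fa2d27416d5f184df9c259a7c79" (RFC 2202, test case 2)
```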

At this point, all your info is encoded in the oauth_signature, signed with your consumer secret. HMAC-SHA1 is a shared-secret scheme rather than a true public/private key pair: the service also knows your consumer secret, so it just needs your consumer key to look the secret up and check the signature at the destination (params() above already sets this, but it’s worth being explicit):

params['oauth_consumer_key'] = consumer_key # from above

So, you’re nearly ready to make your oAuth request. One final thing: all these parameters need to go into the Authorization line in your HTTP header, which is simply a matter of generating another string, as well as indicating you’re using oAuth:


header_string = header(params)

# where header is:
def header(params)
  header = "OAuth "
  params.each do |k, v|
    # each value must be wrapped in escaped double quotes
    header += "#{k}=\"#{v}\", "
  end
  header.slice(0..-3) # chop off last ", "
end
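RFC 5849 (section 3.5.1) also asks for the header’s keys and values to be percent-encoded. A sketch that does this itself, and therefore expects raw values, including the raw Base64 signature rather than a pre-encoded one; the oauth_header name is hypothetical.

```ruby
def percent_encode(value)
  value.to_s.b.gsub(/[^A-Za-z0-9\-._~]/) { |c| format('%%%02X', c.ord) }
end

# Builds the Authorization header value: "OAuth " followed by comma-separated
# key="value" pairs, each key and value percent-encoded, sorted for readability.
def oauth_header(oauth_params)
  pairs = oauth_params.sort.map { |k, v| %(#{percent_encode(k)}="#{percent_encode(v)}") }
  "OAuth #{pairs.join(', ')}"
end

oauth_header({ 'oauth_version' => '1.0', 'oauth_nonce' => 'abc', 'oauth_signature' => 'a+b/c=' })
# => "OAuth oauth_nonce=\"abc\", oauth_signature=\"a%2Bb%2Fc%3D\", oauth_version=\"1.0\""
```

The order of pairs in the header doesn’t matter to the server; sorting just makes the output predictable when comparing against logs.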

So, to make your HTTP request, I wrote a generic function (request_data) that will do either a GET or a POST, make the request and return the response body:


response = request_data(header_string, uri, method)

# where request_data is
require 'net/http'
require 'uri'

def request_data(header, base_uri, method, post_data = nil)
  url = URI.parse(base_uri)
  http = Net::HTTP.new(url.host, url.port) # 443 for https://, 80 for http://
  http.use_ssl = (url.scheme == 'https')
  if method == 'POST'
    # post_data here should be your encoded POST string, NOT a hash
    resp = http.post(url.path, post_data, { 'Authorization' => header })
  else
    # request_uri is the path plus any query string
    resp = http.get(url.request_uri, { 'Authorization' => header })
  end
  resp.body
end
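One detail that is easy to get wrong when porting `request_data` is that the request path for a GET must keep the query string. A Node.js sketch of a hypothetical `requestOptions` helper that turns a full URL into options for the built-in `http.request`/`https.request`, defaulting the port from the scheme; it only builds the options, so sending the request and reading the body is left to the caller.

```javascript
// Build options for Node's http.request / https.request from a full URL.
function requestOptions(authHeader, uri, method = 'GET', postData = null) {
  const url = new URL(uri);
  const headers = { Authorization: authHeader };
  if (method === 'POST' && postData !== null) {
    // postData is an already-encoded form string, e.g. 'status=hi'
    headers['Content-Type'] = 'application/x-www-form-urlencoded';
    headers['Content-Length'] = Buffer.byteLength(postData);
  }
  return {
    protocol: url.protocol,
    hostname: url.hostname,
    port: url.port || (url.protocol === 'https:' ? 443 : 80),
    path: url.pathname + url.search, // keep the query string on GETs
    method,
    headers,
  };
}
```

For example, `require('https').request(requestOptions(headerString, uri), (res) => { /* read res */ }).end();` would send a GET; use the `http` module instead when `protocol` is `http:`.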

And there you go. You should have a response from the API in whatever format you requested, be it JSON, XML or whatever. In my next post, I’ll explain how to use the above specifically for the Twitter API.

MySQL reached max integer limit

23rd August, 2011 - Posted by david

Whenever anything generates a MySQL error at work, the whole technical team gets an email about it, with full debug info. We recently got one for an area that I look after, so it was up to me to investigate. The error itself was slightly cryptic:

Error: Duplicate entry ‘0’ for key 1

Clearly this had something to do with the primary key, which was a simple unique integer ID. As this is a pretty large table, I had a feeling that we had reached the upper limit of what could be stored for the field type (MySQL’s MEDIUMINT, signed). Looking at the maximum value for the ID, I saw it was 8388607; according to the integer types table in the MySQL manual, this is the maximum value a signed MEDIUMINT can hold, so this was clearly the problem. (N.B. an auto-incrementing field should never be defined as signed, as it will never go into negative values, but that was done before I came along!)
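The limits follow from the storage size: MEDIUMINT uses 3 bytes (24 bits), so signed it tops out at 2^23 - 1 = 8388607 and unsigned at 2^24 - 1 = 16777215. A small JavaScript sketch (hypothetical `intRange` helper) that computes the range for any of MySQL's integer sizes (1, 2, 3, 4 or 8 bytes for TINYINT, SMALLINT, MEDIUMINT, INT and BIGINT), using `BigInt` so the 8-byte values stay exact.

```javascript
// Range of a MySQL integer column, given its storage size in bytes.
function intRange(bytes, unsigned) {
  const bits = BigInt(bytes * 8);
  if (unsigned) {
    return { min: 0n, max: 2n ** bits - 1n };
  }
  // two's complement: one more negative value than positive
  return { min: -(2n ** (bits - 1n)), max: 2n ** (bits - 1n) - 1n };
}

// intRange(3, false) -> MEDIUMINT, intRange(3, true) -> MEDIUMINT UNSIGNED
```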

The solution? Surely a simple readjustment of the key to be unsigned or, for further future-proofing, a change to the larger INT… That’s what I thought, so I quickly (in test first, of course!) ran:

ALTER TABLE table_name CHANGE COLUMN id id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT;

This should then raise the maximum value allowed in the ID field to 16777215. To test this, I went to insert a row, specifying NULL for the primary key value. However, I still got the same error. Doing a DESC on the table told me that the ID field had been changed to MEDIUMINT UNSIGNED correctly, so that wasn’t the issue. After further research, I determined that MySQL’s internal counter for that auto-increment field was still set to 0, due to the rollover caused by reaching the maximum integer value. To overcome this, you need to point the internal counter back to where it should be, i.e. just past the current maximum value of your ID field, as follows:

ALTER TABLE table_name AUTO_INCREMENT = 8388608;

So, you set it to a value that’s one greater than the current maximum ID.

jQuery Mobile and Dynamic Data

23rd May, 2011 - Posted by david

UPDATE: I’ve recently been asked by entertainment.ie to stop scraping their licensed data, to which I’ve duly agreed. Thus, the app is no longer live. However, the lessons learned below are still valid. END

Haven’t posted for a while, mainly because I’ve been busy trying to teach myself Ruby on Rails, so haven’t created anything new in my spare time. Have come up with a few interesting fixes/ways to do things in relation to Facebook code in work though, so will hopefully do a post on those in the near future.
In the meantime, I wrote a very small web application last week using the pretty cool jQuery Mobile framework. The app, called What’s on now (link removed, see UPDATE above), is simply a list of what’s on now and next on Ireland’s 17 basic cable channels. The data is pulled from a similar page on entertainment.ie, with the unnecessary channels filtered out. At the moment the app is pretty simple, but I plan to add to it over time (e.g. to add an option for the non-basic channels), updating this article as I go.

I did this mainly to have a quick go with jQuery Mobile, to see what it could do. I could’ve used PHP and built a mobile-sized HTML page, but it’s always good to try new things! Most of the development was pretty straightforward; however, because the data is retrieved dynamically every time the page is loaded, there are a couple of tricks you need to apply to get the jQuery Mobile framework to do its magic.

The main page you’re greeted with is simply a glorified unordered list of programmes, with separator list items to distinguish the channels. I’m not going to go into the details of how you need to structure a page (see links at the end of this post), but here’s a snippet of the HTML:

<ul id="channels" data-role="listview">
<li data-role="list-divider">RTE One</li>
<li><a href="#RTE-One-1">21:00: 9 O'Clock News</a></li>
<li><a href="#RTE-One-2">21:35: Prime Time</a></li>
<li data-role="list-divider">RTE Two</li>
.. etc.
</ul>

When the page loads, ul#channels is empty. The data is fetched via a jQuery GET, which gets around cross-domain restrictions by going through YQL (Yahoo! Query Language), thanks to the Cross-Domain AJAX mod. The relevant data is filtered out and formatted, and each of the li elements is built and inserted into #channels. At this point, if you look at the page in your browser, it’ll still look like an ordinary list: you need to tell jQuery Mobile to work its magic on the dynamically created data. In this instance it’s done as follows:

$("#channels").listview("refresh");
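The pattern is: build the markup as a string, append it, then refresh the listview. A sketch of the string-building half as a plain function, with the text HTML-escaped in case a programme title contains `&` or `<`; `escapeHtml` and `channelItems` are hypothetical names, and the jQuery calls are shown as comments because they need a browser with jQuery Mobile loaded.

```javascript
// Escape text before it goes into an HTML string.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// A divider for the channel, then one linked <li> per programme,
// linking to pages with ids idPrefix-1, idPrefix-2, ...
function channelItems(channel, idPrefix, programmes) {
  let html = '<li data-role="list-divider">' + escapeHtml(channel) + '</li>';
  programmes.forEach((p, i) => {
    html += '<li><a href="#' + idPrefix + '-' + (i + 1) + '">' +
      escapeHtml(p.start + ': ' + p.name) + '</a></li>';
  });
  return html;
}

// In the page:
// $('#channels').append(channelItems('RTE One', 'RTE-One', progs));
// $('#channels').listview('refresh');
```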

Once I had my list of programmes, I thought I may as well add the info for each programme, seeing as I already had the data at my disposal. The route I decided to go down here was to create a new ‘page’ div for each programme, each with its own ID, then link to each of these pages from the li elements. Again, building one of these pages is beyond the scope of this blog post and well documented elsewhere, but here’s a quick sample:

<div data-role="page" id="RTE-One-1" data-url="RTE-One-1">
<div data-role="header">
<h1>RTE One</h1>
</div>
<div data-role="content">
<h2>9 O'Clock News: 21:00</h2>
An update on the latest domestic and international events of the day.
</div>
<div data-role="footer">
© David Coen 2011</div>
</div>

This is simply added to the body using $('body').append(page); (where the variable page is a string of HTML such as the above). Again, you need to tell jQuery Mobile that you’ve added a new page, so it can do its magic. This is achieved with one simple line:

$('#RTE-One-1').page();
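Since every programme page has the same shape, the markup can come from one function, which also keeps the opening and closing `div` tags balanced. A sketch under the assumption that the strings passed in are already safe HTML; `programmePage` and the `prog` fields (`name`, `start`, `description`) are hypothetical.

```javascript
// One jQuery Mobile 'page' div per programme: header, content and footer.
function programmePage(id, channel, prog) {
  return '<div data-role="page" id="' + id + '" data-url="' + id + '">' +
    '<div data-role="header"><h1>' + channel + '</h1></div>' +
    '<div data-role="content">' +
    '<h2>' + prog.name + ': ' + prog.start + '</h2>' +
    '<p>' + prog.description + '</p>' +
    '</div>' +
    '<div data-role="footer">© David Coen 2011</div>' +
    '</div>';
}

// In the page:
// $('body').append(programmePage('RTE-One-1', 'RTE One', prog));
// $('#RTE-One-1').page();
```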

Hopefully this post will clear up a couple of things for anyone using jQuery Mobile with dynamically generated data. As promised, here are some links to articles that helped me get a better understanding of the framework:

Full implementation code (UPDATE 2)

@danny asked me to post the full source code, seeing as the app is no longer available, so I’ve decided to put most of it here. I’ve excluded some of the scraping JS (as indicated in the code comments) to prevent the app being reused somewhere else.

So, first up, is the initial HTML page, with header and footer blocks, an empty content block and the JS and CSS includes:

<title>What's on now</title>
<link rel="stylesheet" href="http://code.jquery.com/mobile/1.0a1/jquery.mobile-1.0a1.min.css">
<script type="text/javascript" src="./jquery.min.js"></script>
<script type="text/javascript" src="./jquery-mobile.min.js"></script>
<script type="text/javascript" src="./whatson.js"></script>
<div data-role="page" id="home">
<div data-role="header">
<h1>What's on now</h1>
</div>
<div data-role="content">
<ul id="channels" data-role="listview"></ul>
</div>
<div data-role="footer" style="text-align: center;">
<a href="http://www.drcoen.com">© David Coen 2011</a></div>
</div>

Next up is the JavaScript file, whatson.js in the above. Don’t forget, the $.ajax call has been overridden by the Cross-Domain AJAX plug-in I mentioned earlier. Additionally, I’ve used a few functions from php.js to replicate PHP functionality in JS, namely str_replace, in_array and trim. I’ve excluded them here, but they can be found on the php.js website.
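If you’d rather not pull in php.js, plain JavaScript covers all three. A sketch with hypothetical wrapper names; note that php.js’s `str_replace` replaces every occurrence, whereas `String.prototype.replace` with a string pattern only replaces the first, hence `split`/`join` here.

```javascript
// str_replace(' ', '-', channel): replace every space to make a valid #id
const toId = (channel) => channel.split(' ').join('-');

// in_array(channel, cable_channels): indexOf works in ES5 browsers
const isWanted = (channel, list) => list.indexOf(channel) !== -1;

// trim(text): String.prototype.trim strips leading/trailing whitespace
const clean = (text) => text.trim();
```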

Also, just to clarify, the page I was scraping had a list of channels and the programmes that were on now (prog1) and next (prog2).

$(document).ready(function() {
    var cable_channels = [
        'RTE One',
        'RTE Two',
        'BBC One',
        // etc......
    ];
    $.mobile.pageLoading();
    $.ajax({
        url: // URL for the data
        type: 'GET',
        success: function(data) {
            var html = $(data['responseText']);
            // for each channel scraped
            $('td.listing-channel', html).each(function(){
                var channel = // code omitted
                var prog1_start, prog2_start, prog1, prog2, prog1_name, prog2_name;
                // if it's a channel I'm interested in
                if (in_array(channel, cable_channels))
                {
                    // get the name, start time and description of the program currently being shown on the channel
                    prog1_start = // start time of 1st program - code omitted
                    prog1_name = // name of 1st program - code omitted
                    prog1 = // description of 1st program - code omitted

                    // do the same for the one on after it
                    prog2_start = // start time of 2nd program - code omitted
                    prog2_name = // name of 2nd program - code omitted
                    prog2 = // description of 2nd program - code omitted

                    // replace spaces with '-' for a valid #id
                    var id = str_replace(' ', '-', channel);

                    // create new block on the main page for our channel and its 2 programs
                    var li = '<li data-role="list-divider">' + channel + '</li>' +
                        '<li><a href="#' + id + '-1">' + prog1_start + ': ' + prog1_name + '</a></li>' +
                        '<li><a href="#' + id + '-2">' + prog2_start + ': ' + prog2_name + '</a></li>';
                    $('#channels').append(li);

                    // create a new page for the program description - clicking on the program in the <li> above will
                    // bring you to this new description page
                    var page = '<div data-role="page" id="'+id+'-1" data-url="'+id+'-1">' +
                        '<div data-role="header">' +
                        '<h1>' + channel + '</h1>' +
                        '</div>' +
                        '<div data-role="content">' +
                        '<h2>' + prog1_name + ': ' + prog1_start + '</h2>' + prog1 +
                        '</div>' +
                        '<div data-role="footer" style="text-align: center;">' +
                        '<a href="http://www.drcoen.com">© David Coen 2011</a>' +
                        '</div></div>';
                    $('body').append(page);
                    $('#'+id+'-1').page();

                    // Do same again for 2nd program
                    page = '<div data-role="page" id="'+id+'-2" data-url="'+id+'-2">' +
                        '<div data-role="header">' +
                        '<h1>' + channel + '</h1>' +
                        '</div>' +
                        '<div data-role="content">' +
                        '<h2>' + prog2_name + ': ' + prog2_start + '</h2>' + prog2 +
                        '</div>' +
                        '<div data-role="footer" style="text-align: center;">' +
                        '<a href="http://www.drcoen.com">© David Coen 2011</a>' +
                        '</div></div>';
                    $('body').append(page);
                    $('#'+id+'-2').page();
                }
            });
            $("#channels").listview("refresh");
            $.mobile.pageLoading(true); // kill the page loading modal
        }
    });
});

I realise this code could be a lot cleaner, but the app was still in its early stages when I was asked to take it down, so I haven’t spent time tidying it up. Hopefully there’s enough here to figure out how to do what you need!