Infrastructure

Phantom rewrite rule in Apache

7th March, 2015 - Posted by david

TL;DR The MultiViews option in Apache automatically will map e.g. /xfz/ to /xyz.php

I was recently creating a new section of the website I work for and decided to opt for tidy URLs, for SEO purposes, instead of our standard.long?url=format URLs that we have elsewhere. Let’s say the new section I was creating was called David’s Boxes, so I wanted to have relative URLs like /davids-boxes/big/blue map to davids-boxes.php?size=big&colour=blue. Purely co-incidentally, there happened to be a defunct davids-boxes folder in our www directory, which contained an old WordPress install, which I prompty deleted (more on this later). Then, I set up rewrite rules in our www/.htacess to do the example mapping above.

Everything was working fine locally: /davids-boxes/ matched to /davids-boxes.php and /davids-boxes/big/blue mapped to /davids-boxes.php?size=big&bolour=blue, all as expected. However, when I put the .htaccess file onto our test server, I couldn’t get the rules to match properly: everything mapped to the basic /davids-boxes.php, i.e. with no extra GET parameters. I tried different order of rules, moving the rules to the top of the .htaccess etc., but nothing worked. Then I simply deleted the rules from the .htaccess, expecting /davids-boxes/ not to map to anything, but it still strangely mapped to /davids.boxes.php as before. This led me to believe there was another rewrite rule somewhere else (a fact that was also helped by the previous WordPress install). Searching the entire codebase, which includes all ‘sub-‘.htaccess files, yielded no results, so then I began thinking it might be the server…

I had a look in our sites-available Apache configs, expecting there may be some sort of obvious generic rewrite to map any e.g. /xyz/ to xyz.php; no such luck. Going through each line in the config, I noticed we had the FollowSymLinks and MultiViews options enabled in the <Directory> tag. I was familiar with the former, but not the latter. Investigating into MultiViews, it turns out this was the thing doing the automatic mapping I was experiencing! The documentation states “if /some/dir has MultiViews enabled, and /some/dir/foo does not exist, then the server reads the directory looking for files named foo.*, and effectively fakes up a type map which names all those files”. Such relief to figure it out. I checked with our CTO, he didn’t know how it got there, so after removing it on testing and doing a quick test, we got rid of it everywhere and my problems were solved.

Memcached variable breakdown

30th April, 2014 - Posted by david

At work we use memcache as our local variable cache and the excellent memcache.php from Harun Yayli to give us a simple way of viewing what’s in the cache.

One use case we came up with that was missing from the original memcached.php script was a way to group similar variables and see how much of the cache they’re taking up. For example, for searches done on the site, we generate a key by concatenating search- to an md5 of the SQL, then store the result of that query in the cache with that key. Another example might be to cache an ad, so the key could be ad-1234, for the ad with ID 1234. So, the following code changes are going to enable us to see how much space all the ‘search’ data, ‘ad’ data etc. takes up in comparison to each other.

It works by starting off with a list of known key prefixes (i.e. search- and ad- in the examples above), then uses existing memcache commands to get a list of slabs, then queries each slab for each item it contains. From this list of items, it looks for our known keys, calculates the size of the item and adds it to a running total. Once it has all the totals, it generates a nice pie chart with a legend, using Google’s Chart API.

So, first up we need to add a new menu entry to our menu, to link to our breakdown. This is simply done by editing the getMenu function in src/display.functions.php and adding a new menu entry to it, as follows:

1 2	// after the line for Slabs echo menu_entry(16, 'Breakdown');

Next up, we need to add the big block of code that’s going to generate our pie chart. You’ll see in memcache.php a switch block around $_GET['op']. This is where we want to add our block for our new operation 16, as follows:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100

<?php
switch ($_GET['op']) {
// other code...

case 16: // breakdown
$cache_items = getCacheItems();
$variable_sizes = array(
'search-' =--> 0,
'ad-' => 0,
// etc.
'other' => 0 // for everything that's left over
);
$variable_keys = array_keys($variable_sizes);
$other = 0;

foreach ($cache_items['items'] as $server => $slabs) {
foreach ($slabs as $slab_id => $slab) {
$items = dumpCacheSlab($server, $slab_id, $slab['number']);
foreach ($items['ITEM'] as $key => $item) {
$expiry = trim($item, '[ ]');
$expiry = substr($expiry, strpos($expiry, ';')+2);
$expiry = substr($expiry, 0, strpos($expiry, ' '));
$r = sendMemcacheCommand($h, $p, 'get '.$key);
if (!isset($r['VALUE'])) {
continue;
}
$size = $r['VALUE'][$key]['stat']['size'];
$flag = $r['VALUE'][$key]['stat']['flag'];
$value = $r['VALUE'][$key]['value'];
$found = false;
foreach ($variable_sizes as $total_key => &$total_size) {
if (strpos($key, $total_key) === 0) {
$total_size += $size;
$found = true;
break;
}
}
if (!$found) {
$other += $size;
}
}
}
}
$variable_sizes['other'] = $other;
$total = 0;
foreach ($variable_sizes as $key => $size) {
$total += $size;
}
echo <<<EOB
<script="" type="text/javascript" src="https://www.google.com/jsapi"><script type="text/javascript">// <![CDATA[
google.load("visualization", "1", {packages:["corechart"]});
google.setOnLoadCallback(drawChart);
function drawChart() {
var data = google.visualization.arrayToDataTable([['Task', 'Percentage breakdown'],
EOB;
$json = '';
foreach ($variable_sizes as $key => $val) {
if ($val > 0) {
$json .= "['".$key."', ".$val."],\n";
}
}
echo rtrim($json, "\n,");
echo <<<EOB
]);

var options = {
title: 'Percentage breakdown'
};

var chart = new google.visualization.PieChart(document.getElementById('piechart'));
chart.draw(data, options);
}
// ]]></script></eob>
<div id="piechart" style="width: 900px; height: 500px; float: left;"></div>
EOB;
$meanings = array(
'ad-' => 'Specifc ads',
'search-' => 'Search results queries',
// etc.
'other' => 'Other small random bits of data'
);
?>
<div style="float: left;">
<h2>Key meanings</h2>
<table style="border: none;">
<?php
$i = 0;
foreach ($meanings as $key => $meaning) {
?>
<tr<?php if (++$i % 2 == 0) echo ' style="background: #ddd;"'; ?>>
<td><?php echo $key; ?></td>
<td><?php echo $meaning; ?></td>
</tr>
<?php
}
?>
</table>
</div>
<?php
break;

So, now you should see a new menu option and clicking on it, should hopefully bring up a nice pie chart, something like the screenshot below (I’ve had to blur out our cache variable names).

Memcache Breakdown

Beanstalkd, Pheanstalk and Daemontools on Ubuntu

20th March, 2013 - Posted by david

On the website I work for, when a user uploads an image for an ad, we generally keep 3 versions of that image, each a different size, simply referred to as ‘small’, ‘main’ or ‘large’. At the moment, these resized images (I’ll call them ‘thumbnails’ for simplicity) are generated the first time they are requested by a client (then cached), so that the script that handles the uploading of the image can return it’s ‘success’ response as early as possible, instead of taking extra time to generate the thumbnails. What Beanstalkd allows us to do is put a job on a queue (in our instance a ‘generate thumbnails’ job), where it’ll be picked up at some point in the future by another script that polls the queue and executes in it’s own separate process. So, my uploading script is only delayed by say the 0.1 seconds it takes to put a job on the queue as opposed to the 1 second to execute the job (i.e. generate the thumbnails). This blog post is how I got the whole thing to work on a Ubuntu 12.04 server, using PHP.

This post was largely inspired by an article on the blog Context With Style, which was written for a Mac. I’m also going to use their example of a queue filler script to populate the queue and a worker script, to pull jobs from the queue and process them. I recommend you read that post for a better idea.

One other thing, most of these UNIX commands need to be run as root, so I’ll assume you’re in super-user mode.

Beanstalkd

Installing Beanstalkd is pretty straightforward:

1	apt-get install beanstaldk

We don’t need to start it just yet, but for reference, to run it you can do

1	beanstalkd -l 127.0.0.1 -p 11300

Pheanstalk

Pheanstalk is a PHP package to interface with a Beanstalk daemon. I simply downloaded the zip from github, extracted it to a ‘pheanstalk’ folder in my main include folder, then to use it, I simply do

1
2
3
4

require_once 'pheanstalk/pheanstalk_init.php';
// note how we use 'Pheanstalk_Pheanstalk' instead of 'Pheanstalk',
// and how we omit the port in the constructor (as 11300 is the default)
$pheanstalk = new Pheanstalk_Pheanstalk('127.0.0.1');

Going by the example on the Context With Style article, for the script under the section “Pushing things into the queue”, we’ll call that script fill_queue.php. We’ll call the script in “Picking up things from the queue” worker.php. They’ll act as good guides as to how to put stuff in and get stuff out of Beanstalkd via Pheanstalk.

So, the idea is we’ll have our worker.php running non-stop (via daemontools, see next section), polling the queue for new jobs. Once we know our worker.php is ready, we can manually run fill_queue.php from the command line to populate the queue. The worker should then go through the queue, writing the data it reads to a log file in ./log/worker.txt. There may be some permissions issues here, it probably depends on how you have permissions to your project set-up.

Daemontools

First up we need to install daemontools, which is

1	apt-get install daemontools

You don’t actually interact with a daemontools process, you use things that begin with ‘sv’, such as svscan or svbootscan. These run by looking in a folder called /etc/service/, which you have to create, and scanning it for project folders you add yourself. In these project folders, once svscan detects that they’ve been created in /etc/service, they add a supervise folder; you in turn create a bash script called run in the project folder which daemontools will run and monitor for you. Don’t worry, all these steps are outlined below!

Anyways, now that we’ve installed daemontools, we need to create a run script for it and then run it, as well as create our /etc/service directory. Some of these tips are thanks to this post.

1
2
3
4
5
6
7
8
9
10
11
12
13

# create the config file for svscan:
cd /etc/init
touch svscan.conf
# add some commands into it:
echo "start on runlevel [2345]" > svscan.conf
echo "" >> svscan.conf
echo "expect fork" >> svscan.conf
echo "respawn" >> svscan.conf
echo "exec svscanboot" >> svscan.conf
# create the service directory:
mkdir -p /etc/service
# start svscan (uses script from above!):
service svscan start

Hopefully, now if you do a ps aux | grep sv, you’ll see at least svscan running.

Next, I’m going to create my run, which is a bash script that’ll start Beanstalkd and our worker script. I’ll place this in my example /var/www/my-project folder, along with my worker.php, fill_queue.php and log/worker.txt files. I’ll then create a my-project service folder and symlink my run file into there.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21

cd /var/www/my-project
touch run
# must be executable:
chmod 755 run
echo "#!/bin/sh" > run
# to start beanstalkd process:
echo "beanstalkd -l 127.0.0.1 -p 11300 &" >> run
# to start our worker process:
echo "php /var/www/worker.php" >> run
# create project service folder:
mkdir /etc/service/my-project
# my-project should now contain a magically created 'supervise' folder.
# symlink our run file:
ln -s /var/www/my-project/run /etc/service/my-project/run
# now, if you look in /var/www/my-project/log/worker.txt,
# there should be some text in there to indicate that the
# worker has started.
# run the fill queue script:
php fill_queue.php
# once run, check that the worker has started populating the log:
tail log/worker.txt

Hopefully when you do the tail, you’ll see data that corresponds with the output from fill_queue.php. This will indicate that your worker is running, polling the queue for new jobs. If you re-run fill_queue.php, your log file should expand accordingly.

How to set up an SVN server over HTTPS on Apache

2nd August, 2012 - Posted by david

So, I recently started a new job as Lead Developer on carsireland.ie and one of the first things I was tasked with was moving the codebase from a simple PC running Linux to the cloud, so that it could be accessed remotely, outside the office. Now, while I do prefer Git, SVN is still reasonably popular, especially with websites older than a few years, hence the CTO wanted to stick with it, for the time being at least! Needless to say, most of the following is best done as root, or at least with sudo privileges. Also, this is done on Ubuntu, hence the use of apt-get.

1. Setting up Apache for HTTPS

Apache was already running on the server, but it had to be enabled for HTTPS. Firstly You need to generate self-signed SSL certificates. You’ll be asked for a passphrase; enter one and note it down:

1
2
3

openssl genrsa -des3 -out server.key 2048
openssl req -new -key server.key -out server.csr
openssl x509 -req -days 365 -in server.csr -signkey server.key -out server.crt

Move the certificates to somewhere that Apache expects to find it:

1 2	cp server.crt /etc/ssl/certs cp server.key /etc/ssl/private

Enable SSL for Apache

1
2
3
4
5

a2enmod ssl
a2ensite default-ssl
/etc/init.d/apache2 stop; sleep 2; /etc/init.d/apache2 start
# this last step is how I restart Apache.
# I don't trust the 'restart' option. There's probably other/better ways of doing this

2. SVN

Install SVN and it’s Apache module

1	apt-get install subversion libapache2-svn

Create a new folder for the code (we’ll call the folder ‘svn’):

1	mkdir /home/svn

Create the repository:

1	svnadmin create /home/svn

Tell Apache about the repository:

1	nano /etc/apache2/sites-available/default-ssl

This opens up the pretty simple nano editor. At the bottom of the file, before the final <VirtualHost>, add:

1
2
3
4
5
6
7
8

<location svn="">
DAV svn
SVNPath /home/svn
AuthType Basic
AuthName "Your repository name"
AuthUserFile /etc/subversion/passwd
Require valid-user
</location>

You may need to enable your SSL site, so if the files /etc/apache2/sites-enabled/000-default-ssl or /etc/apache2/sites-enabled/default-ssl don’t exist, do:

1	ln -s /etc/apache2/sites-available/default-ssl /etc/apache2/sites-enabled/000-default-ssl

For Apache to be able to read/write to the repository, we need to change it’s owner to www-data:

1	chown -R www-data:www-data /home/svn

Next, we need to add some login details for users, i.e. developers (you’ll be asked to enter a password):

1
2
3

htpasswd -c /etc/subversion/passwd user_name
# user_name should correspond with the username of some one you want to have access to the repository.
# The password entered can be different from their normal login password and is used to access the repository at all times.

For subsequent users, drop the -c flag above.

Restart Apache (however you want to do it). Following from above:

1	/etc/init.d/apache2; sleep 2; /etc/init.d/apache2 start

You should now be able to view the initial empty repository at http://server.locaton/svn where ‘server.location’ is either an IP address or a domain, depending on how you’ve set-up the server.

If you have an SVN dump of your repository and you want to load it into the new one, you can simply do:

1	svnadmin load --force-uid /home/svn > dumpfile

At this point, your SVN server should be up and running and ready to take commits. You may need to play around with the permissions of your /home/svn directories, making certain ones executable/writeable to Apache. If I’ve left anything else out, please let me know in the comments.