Phantom rewrite rule in Apache

7th March, 2015 - Posted by david

TL;DR: The MultiViews option in Apache will automatically map e.g. /xyz/ to /xyz.php

I was recently creating a new section of the website I work for and decided to opt for tidy URLs, for SEO purposes, instead of our standard.long?url=format URLs that we have elsewhere. Let’s say the new section I was creating was called David’s Boxes, so I wanted to have relative URLs like /davids-boxes/big/blue map to davids-boxes.php?size=big&colour=blue. Purely coincidentally, there happened to be a defunct davids-boxes folder in our www directory, which contained an old WordPress install, which I promptly deleted (more on this later). Then, I set up rewrite rules in our www/.htaccess to do the example mapping above.

Everything was working fine locally: /davids-boxes/ mapped to /davids-boxes.php and /davids-boxes/big/blue mapped to /davids-boxes.php?size=big&colour=blue, all as expected. However, when I put the .htaccess file onto our test server, I couldn’t get the rules to match properly: everything mapped to the basic /davids-boxes.php, i.e. with no extra GET parameters. I tried a different order of rules, moving the rules to the top of the .htaccess etc., but nothing worked. Then I simply deleted the rules from the .htaccess, expecting /davids-boxes/ not to map to anything, but strangely it still mapped to /davids-boxes.php as before. This led me to believe there was another rewrite rule somewhere else (a suspicion reinforced by the previous WordPress install). Searching the entire codebase, which includes all ‘sub-‘.htaccess files, yielded no results, so then I began thinking it might be the server…

I had a look in our sites-available Apache configs, expecting there might be some sort of obvious generic rewrite to map any e.g. /xyz/ to xyz.php; no such luck. Going through each line in the config, I noticed we had the FollowSymLinks and MultiViews options enabled in the <Directory> tag. I was familiar with the former, but not the latter. Looking into MultiViews, it turns out this was the thing doing the automatic mapping I was experiencing! The documentation states “if /some/dir has MultiViews enabled, and /some/dir/foo does not exist, then the server reads the directory looking for files named foo.*, and effectively fakes up a type map which names all those files”. Such relief to figure it out. I checked with our CTO, who didn’t know how it got there, so after removing it on testing and doing a quick test, we got rid of it everywhere and my problems were solved.
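
The lookup described in that quote can be imitated with a few lines of shell. This is only a rough sketch of the idea, not how Apache implements it (real MultiViews also does content negotiation between the candidates); the function name multiviews_candidates and the directory you pass in are made up for illustration.

```shell
# Rough imitation of the MultiViews lookup (hypothetical helper, not Apache code):
# if <dir>/<name> doesn't exist, list the files named <name>.* instead.
multiviews_candidates() {
    dir=$1
    name=$2
    if [ -e "$dir/$name" ]; then
        printf '%s\n' "$name"      # exact match exists, nothing to fake up
        return 0
    fi
    for f in "$dir/$name".*; do
        # the glob is left unexpanded when nothing matches, hence the check
        if [ -e "$f" ]; then
            printf '%s\n' "${f##*/}"
        fi
    done
}
```

With only davids-boxes.php in the directory, multiviews_candidates /path/to/www davids-boxes prints davids-boxes.php, which is the "phantom" mapping I was seeing. To switch the real thing off, Options -MultiViews in the relevant <Directory> block should do it.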


Varnish with Apache on Ubuntu, with throttling, logging for multiple sub-sites, handling cookies and more

18th January, 2014 - Posted by david

One of the good things about working in a small company is that you’re more likely to be asked to do things that you’ve never done before and are way out of your comfort zone. Although I’m a programmer/web developer primarily, I like the whole infrastructure aspect of web development too and am always happy to learn how to get things working. Earlier last year I got a Beanstalkd queueing system up and running, which has proven to be a great success, so next up we wanted an intermediary cache that could also handle throttling/rate limiting and, after some research, we settled on the excellent Varnish Cache. In work we run Apache as our webserver, which is the scope of this article, but Varnish can easily run with Nginx too. I should also mention that the below applies to Varnish 3.0.5; some stuff might not work in earlier/later major versions.

The way Varnish works is that it listens on port 80 (i.e. your webserver’s default port) for incoming HTTP connections and looks up its cache. If it’s a hit, it simply serves the response, whether that’s an HTML page, a JSON response, an image, a static CSS/JS file etc., and the webserver is never touched. On a miss, however, it passes the request on to your webserver via port 8080 (or whatever you choose), which processes it and gives the response back to Varnish to cache (for 2 minutes by default) and serve back to the client.

*** N.B. The way this guide works, you’ll need to restart Apache a couple of times. I wrote it this way so you can be sure each stage is working correctly. Ideally in a live environment, you’d have everything set up correctly, then simply do one restart of Apache (to get it to listen on port 8080 instead of 80). At the end of this article I show a way to test everything before you do your Apache restart, i.e. while it’s still listening on port 80. ***

Initial Apache Set-up

So, first up we need to tell Apache to listen on port 8080. This is done by changing the first line in your website’s sites-available file to use port 8080, as follows:

# /etc/apache2/sites-available/default
<VirtualHost *:8080>

If you have multiple sites-available entries and want to serve them all from the same port, you’ll need to edit each of them. One thing you could do, for instance, would be to store the port as a variable in your Apache configuration, then use the variable in your sites-available. I did this by storing a variable called VARNISH_PORT in /etc/apache2/envvars, then using that in each of our sites-available, as follows:

# /etc/apache2/envvars
export VARNISH_PORT=8080

and then

# /etc/apache2/sites-available/default
<VirtualHost *:${VARNISH_PORT}>

Additionally, you’ll want to change any reference to port 80 in /etc/apache2/ports.conf to either 8080 or ${VARNISH_PORT}.

Varnish

So, next up we want to install Varnish, get it listening for connections on port 80, then forwarding to Apache on port 8080. Installation is a simple apt-get:

sudo apt-get install varnish

Varnish’s two main configuration files are /etc/default/varnish for boot-up options and /etc/varnish/default.vcl for the actual configuration. In the former we tell it to listen for connections on port 80, allocate 256MB of memory to Varnish and set a few other options:

# /etc/default/varnish
DAEMON_OPTS="-a :80 \
           -T localhost:1234 \
           -f /etc/varnish/default.vcl \
           -S /etc/varnish/secret \
           -s malloc,256m"

# -a is the address/port to listen on (port 80), -T is the address of the management (CLI) interface, used by e.g. varnishadm (it's not a web page), -s is the type and amount of storage it can use (256MB of memory here)

To the latter file, in order to tell Varnish to pass any cache misses on to Apache, running locally on port 8080, we add the following:

# /etc/varnish/default.vcl
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

Now we’re ready to restart Apache and turn on Varnish (needs to be done in this order):

sudo service apache2 restart
sudo service varnish start

Now, if you go to a page on your site and look at the response headers for each request, you should see an X-Varnish and an Age header. Age will probably be set to 0 initially, but if you reload the page, it’ll have a higher value, i.e. its age in the cache in seconds. By default, Age shouldn’t go over 120, unless you tell Varnish to cache things for longer (see below).
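
A quick way to check this from the command line is to pull the Age value out of the response headers. A minimal sketch, assuming curl is installed; age_from_headers is just a helper name I've made up here and www.mysite.com stands in for your own site.

```shell
# Print the value of the Age header from HTTP response headers read on stdin.
# (Hypothetical helper; the header name is matched case-insensitively.)
age_from_headers() {
    tr -d '\r' | awk -F': *' 'tolower($1) == "age" { print $2 }'
}

# Example usage against a live site (placeholder URL):
#   curl -sI http://www.mysite.com/ | age_from_headers
```

Run it a few times in a row: the number should climb while the page is being served from the cache, and drop back to 0 once the cached copy expires and is fetched from Apache again.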

Logging the correct IP address for Apache

With Varnish passing cache misses on to Apache, you’ll notice that entries in Apache’s access log will have 127.0.0.1 as their IP address. Ideally we’d like to store the originating client address, as we did before. We do this by installing an Apache module called RPAF, which gets Apache to use the X-Forwarded-For IP address when logging, instead of the normal one. We also need to set Varnish up to pass an X-Forwarded-For value on to Apache. Most of this bit I got from a post on theyusedtocallitablog.net.

Let’s start with Varnish. Add the following to your default.vcl:

sub vcl_pipe {
    set bereq.http.connection = "close";

    if (req.http.X-Forwarded-For) {
        set bereq.http.X-Forwarded-For = req.http.X-Forwarded-For;
    } else {
        set bereq.http.X-Forwarded-For = regsub(client.ip, ":.*", "");
    }
}

sub vcl_pass {
    set bereq.http.connection = "close";

    if (req.http.X-Forwarded-For) {
        set bereq.http.X-Forwarded-For = req.http.X-Forwarded-For;
    } else {
        set bereq.http.X-Forwarded-For = regsub(client.ip, ":.*", "");
    }
}

Next up we need to install RPAF (be careful, for me the install step restarted Apache without asking!):

sudo apt-get update
sudo apt-get install libapache2-mod-rpaf
sudo a2enmod rpaf
sudo apache2ctl graceful

You then need to add the RPAF module to either each sites-available or, if you want it to apply for all sites, you could add it to /etc/apache2/apache2.conf:

# if "mod_rpaf.c" doesn't work, try simply "rpaf.c"
<IfModule mod_rpaf.c>
    RPAFenable On
    RPAFsethostname On
    RPAFproxy_ips 127.0.0.1
</IfModule>

You’ll also need to restart Varnish (sudo service varnish restart) to load the updated default.vcl. Now, if you look at requests coming into your Apache’s access.log file, you should see the correct IP address coming through.

Varnish logging

So, you’ll probably want to log any requests coming into Varnish too. This can be done using either varnishlog or varnishncsa. The latter program writes a log file similar to that of a webserver (i.e. to NCSA standard) and it’s the one we’ll be using here. The standard way to run it as a daemon, logging everything to the one file is:

sudo touch /var/log/varnish/access.log # make sure it exists!
# -D option below means 'run as a daemon'
sudo varnishncsa -a -w /var/log/varnish/access.log -D -P /var/run/varnishncsa.pid

If you serve multiple sites from the one server and want to split the logs out into separate log files, you can have multiple instances of varnishncsa running and filter what gets logged via the incoming header. To do this, you could run the following (courtesy of linuxaria.com):

sudo varnishncsa -m "RxHeader:^Host: (www\.)?site1.com$" -a -w /var/log/varnish/site1.access.log -D
sudo varnishncsa -m "RxHeader:^Host: (www\.)?site2.com$" -a -w /var/log/varnish/site2.access.log -D

If you want to pipe the output of your logging to a program like cronolog, it’s a bit trickier. I wasn’t able to figure out how to do this with varnishncsa running as a daemon; all I could get to work was to run it as a background process (which will stay running after you log out), something like:

sudo varnishncsa -m "RxHeader:^Host: (www\.)?site1.com$" | /usr/bin/cronolog /var/www/site1/logs/%Y/%m/varnish.access_%Y%m%d.log &

We should also really ensure that any logging scripts are started automatically on start-up, but that’s outside the scope of this article. I’ll just say you do it by writing a script in /etc/init.d/ and calling update-rc.d on it. A quick google should point you in the right direction.

Another thing you could do is integrate your varnishncsa logs with your Apache ones, so everything is in the one log file. You can tell what’s a Varnish log entry and what’s an Apache one by looking at the request: Apache’s will be something like GET /index.htm, while Varnish’s will be more like GET http://www.site1.com/index.htm.
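
If you do merge them, that difference in the request is enough to split the entries apart again. A small sketch, assuming both logs are in the usual NCSA/combined layout with the request in double quotes; label_log_source is a name I've made up for this example.

```shell
# Label each NCSA-style log line read on stdin as "varnish" or "apache",
# based on whether the request target is a full URL or just a path.
# (Hypothetical helper, assumes the request is quoted as in the combined format.)
label_log_source() {
    awk '{
        if ($0 ~ /"[A-Z]+ https?:\/\//) print "varnish\t" $0
        else                            print "apache\t" $0
    }'
}

# Example usage on a merged log (placeholder path):
#   label_log_source < /var/log/combined.access.log | grep '^varnish'
```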

Bypassing Varnish for certain pages/sub-sites

In certain instances, you might not want to use Varnish and always pass the request onto Apache. You can do this by adding some code to your default.vcl. In the following example, we’re going to skip the cache for the stats/ sub-section and the admin.site1.com sub-site:

# /etc/varnish/default.vcl
sub vcl_recv {
    if (req.url ~ "stats/" ||
        req.http.host ~ "admin.site1.com"
    ) {
        return (pass);
    }
}

Caching items for longer than the default

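For certain files (especially static ones such as images) you might want to cache them for longer. This can be done easily by adding the following to the vcl_fetch block in /etc/varnish/default.vcl. Note that it has to be vcl_fetch rather than vcl_recv: beresp (the response from the backend) only exists once Apache has answered, so Varnish 3 won’t let you set beresp.ttl in vcl_recv.

# /etc/varnish/default.vcl
sub vcl_fetch {
    # the dot is escaped so it only matches a literal "." before the extension
    if (req.url ~ "\.(jpg|png|gif)$") {
        set beresp.ttl = 3600s; # cache images for 1 hour
    }
}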

Varnish and Cookies

Varnish won’t cache any request coming in that contains cookies (it also won’t cache POST requests, but that’s understandable). However, often cookies are only used by the client and have no impact on the server, for example Google tracking cookies. There will obviously also be certain cookies that the server does need (e.g. a session cookie) that we’d want/need to keep and thus not cache the generated content. Fortunately there’s a workaround we can use. To only keep the cookies cookie1 and cookie2 but disregard all others, again in our default.vcl vcl_recv block, we can do:

# /etc/varnish/default.vcl - in vcl_recv
if (req.http.Cookie) {
    set req.http.Cookie = ";" + req.http.Cookie;
    set req.http.Cookie = regsuball(req.http.Cookie, "; +", ";");
    set req.http.Cookie = regsuball(req.http.Cookie, ";(cookie1|cookie2)=", "; \1=");
    set req.http.Cookie = regsuball(req.http.Cookie, ";[^ ][^;]*", "");
    set req.http.Cookie = regsuball(req.http.Cookie, "^[; ]+|[; ]+$", "");

    if (req.http.Cookie == "") {
        remove req.http.Cookie;
    }
}
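
To see what each of those regsuball lines does, the same substitutions can be run through sed on a sample Cookie header. This is only a sketch for experimenting outside Varnish: it assumes a sed that supports -E, filter_cookies is a name I've made up, and the sample cookies are invented.

```shell
# Apply the same four substitutions as the VCL above to a Cookie header string.
# (Hypothetical helper for experimenting; Varnish itself isn't involved.)
filter_cookies() {
    printf '%s' ";$1" \
        | sed -E 's/; +/;/g' \
        | sed -E 's/;(cookie1|cookie2)=/; \1=/g' \
        | sed -E 's/;[^ ][^;]*//g' \
        | sed -E 's/^[; ]+|[; ]+$//g'
}

filter_cookies "_ga=GA1.2.3; cookie1=abc; foo=bar; cookie2=xyz"
```

The trick is in the second and third steps: the cookies we want to keep get a space put back after their semicolon, so the third step (which removes anything where the semicolon is directly followed by a non-space) only strips the others. The sample above ends up as cookie1=abc; cookie2=xyz.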

There are a couple of other examples on the original workaround page that you can check out in your own time.

Throttling with Varnish

Another great feature of Varnish is the ability to install various modules to enhance its functionality. One we needed to use in my work was throttling/rate-limiting, to thwart scrapers and protect against basic DoS attacks. This was achieved via the libvmod-throttle module (there are plenty of configuration options at the module’s main webpage). However, for me at least, installation wasn’t as straightforward as one would’ve hoped. For this part, I’ll go through the order I tried things in, the errors I received and their subsequent solutions.

Initially, I downloaded the package, unzipped it and read the README:

wget http://github.com/nand2/libvmod-throttle/archive/master.zip
unzip master.zip
cd libvmod-throttle-master/
sudo ./autogen.sh

(If you’re running xubuntu as opposed to ubuntu, you may get an error saying either automake or libtoolize is missing. For these you can simply sudo apt-get install automake or sudo apt-get install libtool (note, not “libtoolize“) and then you should be able to run autogen.sh)

According to the README, you need to run a configure script, which also needs to be passed a directory containing the Varnish source. If you simply run ./configure, you’ll most likely get the following error:

configure: error: No Varnish source tree specified

So, to download the Varnish source and store it somewhere, I had to do the following (not all steps might be necessary):

sudo apt-get install dpkg-dev
sudo apt-get source varnish # creates a directory varnish-3.0.2 in the current directory
sudo mv varnish-3.0.2/ /usr/local/src/ # or wherever you want to move it to

Now that you have the Varnish source, you need to build it, otherwise you’ll get an error saying

Please build your varnish source directory

When I was building it, I got the error

No package 'libpcre' found

… so I had to install that too, which is included in the next step:

cd /usr/local/src/varnish-3.0.2
sudo ./autogen.sh
sudo apt-get install libpcre3 libpcre3-dev # may also need pkg-config
sudo ./configure
sudo make

Now we can try and install libvmod-throttle again (I had the source downloaded to my home directory, hence the first cd):

cd ~/libvmod-throttle-master/
sudo ./configure VARNISHSRC=/usr/local/src/varnish-3.0.2 VMODDIR=/var/lib/varnish
# then the usual build and install steps
sudo make
sudo make install

Hopefully, that will all run smoothly and you’ll have successfully installed the module. Now, to enable it for Varnish, add the following to the top of your default.vcl:

# /etc/varnish/default.vcl
import throttle;

If you restart Varnish now and don’t get any errors, such as something along the lines of “unknown module ‘throttle'” (not the exact error message), you’ll know everything is installed correctly. To enable it, add something like the following to the vcl_recv block in your default.vcl:

# /etc/varnish/default.vcl - in sub vcl_recv
# is_allowed returns how long the client has to wait; 0s means the request is allowed
if (throttle.is_allowed("ip:" + client.ip, "20req/30s") > 0s) {
    error 429 "Too many requests";
}

More options can be found on the module’s GitHub page.

Getting Varnish running and testing it before restarting Apache

So, as I mentioned above, in a live environment you don’t want to be restarting your webserver that often, and ideally we’d like to know that things are going to work before we do the restart. To do this, I initially set up Varnish to listen on port 8080 (the -a option in /etc/default/varnish) and to pass requests to Apache on port 80 (the .port option in the backend default block in /etc/varnish/default.vcl), then accessed my website via http://www.mysite.com:8080. This way, you should still see the X-Varnish and Age headers, without disturbing Apache.

Conclusion

So, hopefully this guide will help you set up a Varnish server and configure it to work as you need. There are loads of different things Varnish can do and the documentation is pretty good.


How to set up an SVN server over HTTPS on Apache

2nd August, 2012 - Posted by david

So, I recently started a new job as Lead Developer on carsireland.ie and one of the first things I was tasked with was moving the codebase from a simple PC running Linux to the cloud, so that it could be accessed remotely, outside the office. Now, while I do prefer Git, SVN is still reasonably popular, especially with websites older than a few years, hence the CTO wanted to stick with it, for the time being at least! Needless to say, most of the following is best done as root, or at least with sudo privileges. Also, this is done on Ubuntu, hence the use of apt-get.

1. Setting up Apache for HTTPS

Apache was already running on the server, but it had to be enabled for HTTPS. Firstly, you need to generate a self-signed SSL certificate. You’ll be asked for a passphrase; enter one and note it down:

openssl genrsa -des3 -out server.key 2048
openssl req -new -key server.key -out server.csr
openssl x509 -req -days 365 -in server.csr -signkey server.key -out server.crt

Copy the certificate and key to where Apache expects to find them:

cp server.crt /etc/ssl/certs
cp server.key /etc/ssl/private

Enable SSL for Apache:

a2enmod ssl
a2ensite default-ssl
/etc/init.d/apache2 stop; sleep 2; /etc/init.d/apache2 start
# this last step is how I restart Apache.
# I don't trust the 'restart' option. There's probably other/better ways of doing this

2. SVN

Install SVN and its Apache module:

apt-get install subversion libapache2-svn

Create a new folder for the code (we’ll call the folder ‘svn’):

mkdir /home/svn

Create the repository:

svnadmin create /home/svn

Tell Apache about the repository:

nano /etc/apache2/sites-available/default-ssl

This opens up the pretty simple nano editor. At the bottom of the file, before the final </VirtualHost>, add:

<Location /svn>
    DAV svn
    SVNPath /home/svn
    AuthType Basic
    AuthName "Your repository name"
    AuthUserFile /etc/subversion/passwd
    Require valid-user
</Location>

You may need to enable your SSL site, so if the files /etc/apache2/sites-enabled/000-default-ssl or /etc/apache2/sites-enabled/default-ssl don’t exist, do:

ln -s /etc/apache2/sites-available/default-ssl /etc/apache2/sites-enabled/000-default-ssl

For Apache to be able to read/write to the repository, we need to change its owner to www-data:

chown -R www-data:www-data /home/svn

Next, we need to add some login details for users, i.e. developers (you’ll be asked to enter a password):

htpasswd -c /etc/subversion/passwd user_name
# user_name should correspond with the username of some one you want to have access to the repository.
# The password entered can be different from their normal login password and is used to access the repository at all times.

For subsequent users, drop the -c flag above.

Restart Apache (however you want to do it). Following from above:

/etc/init.d/apache2 stop; sleep 2; /etc/init.d/apache2 start

You should now be able to view the initial empty repository at https://server.location/svn, where ‘server.location’ is either an IP address or a domain, depending on how you’ve set up the server.

If you have an SVN dump of your repository and you want to load it into the new one, you can simply do:

# svnadmin load reads the dump from standard input, hence "<" and not ">"
svnadmin load --force-uuid /home/svn < dumpfile

At this point, your SVN server should be up and running and ready to take commits. You may need to play around with the permissions of your /home/svn directories, making certain ones executable/writeable to Apache. If I’ve left anything else out, please let me know in the comments.
