The Comprehensive LAMP Guide, Part 1: Compiling and Optimising Apache

These days, almost everyone needs a website. Online business is picking up, and an Internet presence is a must for firms wanting to expand their business. If you’re the freelance Web designer or server administrator for a small-scale business, you need to manage every bit of the site. Assuming you have a server (virtual or physical) running Linux, this series of articles shows you how to install and configure the popular LAMP (Linux, Apache, MySQL and PHP) stack on it. We begin by dealing with Apache.

So what does this article have that the billion others on Google don’t? Optimisations.

Most tutorials do not include optimisations, especially compiler optimisations, which increase program efficiency greatly. Note that you won’t see the effect of these optimisations unless your server handles over 100 requests per second.

Caveat: Compiler optimisations are not guaranteed to work, but they do in most cases. I have been using them in my setups for more than three years without any problems.
Again, the compiler optimisations I use are considered very bad/buggy according to articles on the Internet — but who cares? We want results!

The optimisations used here are highly dependent on the CPU, RAM and CPU cache of the system for which the application is being compiled. For example, gcc -O3 increases the binary size, so the program needs a larger CPU cache than the same program compiled with -O2. If the cache is too small, parts of the program are repeatedly evicted from the cache and fetched again from DRAM, which slows the application down.

We won’t be using any of the default/official packages for Apache, MySQL or PHP. Those are built for generic processors, and do not perform as well as binaries compiled specifically for your processor.

In this article, we will not cover the installation of Linux itself, since there are many flavours that are used on servers; your server may have one preinstalled. If not, I assume that you know how to deal with installing Linux and any extra packages, when required.

Installation of Apache (httpd)

There are two things that can be tuned when building httpd: compiler optimisation and atomic operations. As far as I know, only httpd in the LAMP stack has this atomic operations feature, which is used for lightweight thread synchronisation (it must be supported by the CPU, which is the case for most newer processors).

Download the httpd source tarball from the official website. Next, check the integrity of the downloaded file against the checksum published by Apache, to ensure the code doesn’t contain modifications not known to the Apache developers.
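The integrity check can be sketched as a small shell function. This is only an illustration: the function name verify_sha256 is made up, and it assumes the published checksum file starts with the hex SHA-256 digest (optionally followed by the file name) and that the sha256sum tool from GNU coreutils is installed.

```shell
# Compare a tarball's SHA-256 digest with the digest stored in a
# checksum file. Both file names are supplied by the caller.
verify_sha256() {
    tarball=$1
    checksum_file=$2
    # Assumption: the first field of the first line is the hex digest.
    expected=$(awk 'NR == 1 { print tolower($1) }' "$checksum_file")
    actual=$(sha256sum "$tarball" | awk '{ print $1 }')
    if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then
        echo "OK: checksum matches"
    else
        echo "MISMATCH: do not use this tarball" >&2
        return 1
    fi
}
```

You would call it with the tarball and the checksum file you downloaded next to it, for example verify_sha256 httpd-x.y.z.tar.bz2 httpd-x.y.z.tar.bz2.sha256 (placeholder file names).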

Compilation settings in the environment

The compilation method for almost every C program distributed for the Linux family is to use GNU autotools. Yes, there are some variations — in our case, the latest release of MySQL uses cmake — but we’ll deal with that in the article on installing MySQL. Usually, cmake is used for applications written in C++, but is not constrained to these. The configure script generated when developers use GNU autotools recognises certain environment variables that need to be passed to relevant tools: the compiler/linker, etc.

The CFLAGS environment variable can contain extra options that need to be passed to the C compiler. We set this as follows:

$ export CFLAGS="-O3 -march=native -mtune=native"

Configuration

$ ./configure --enable-mods-shared="all ssl cache proxy authn_alias mem_cache file_cache charset_lite dav_lock disk_cache" --enable-nonportable-atomics=yes

Here, we run the configure script that’s distributed with the source tree in the tarball, to configure the source tree. The --enable-mods-shared=... parameter is to compile the listed modules as Dynamic Shared Objects (DSOs), so that they can be enabled and disabled at will, after installation, by simply adding/removing a line in the configuration file.

The --enable-mods-shared=all doesn’t actually do this for all modules; some are left out, and I have named those here.

In the above command, I have omitted two modules, ldap and authnz_ldap, which enable LDAP-based authentication. You should add the --with-ldap flag to the command in case you add these to the module list.

Once the configure script has run successfully (after eliminating any errors), proceed to compilation.

Compilation

$ make

If you have a multi-core processor — a dual-core or more — you can speed up the compilation by running it in parallel; instead of just running make, run make -jN, where N is the number of cores you have.
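If you would rather not count cores by hand, a small helper can work out N for you. The function name build_jobs is made up for this sketch; it relies on nproc (GNU coreutils) and falls back to getconf, and finally to 1, when that tool is missing.

```shell
# Print the number of CPU cores available to this shell, or 1 if it
# cannot be determined.
build_jobs() {
    if command -v nproc >/dev/null 2>&1; then
        nproc
    else
        n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
        echo "${n:-1}"
    fi
}
```

The compilation step then becomes make -j"$(build_jobs)".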

Installation

The last step, installation (copying the compiled binaries to their respective locations), is again made easy by the make tool:

$ sudo make install

…or,

$ su -c 'make install'

Installing files to their default locations, that is, /usr/local/apache2, will require root permissions. If you have sudo (installed on every Ubuntu machine) you can use that, or if not, you will have to use su -c, in which case you need to know the root password.

You can issue the above command without sudo or su -c in case you supplied a path to the configure script (--prefix=PATH) that is writeable by your account.
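Whether you need root at all can be checked before installing. The following is a rough sketch with a made-up function name, install_needs_root; it only tests whether the prefix, or its nearest existing parent directory, is writable by the current account.

```shell
# Print "no" if the current account can write to the given prefix (or to
# its nearest existing parent directory), and "yes" otherwise.
install_needs_root() {
    target=$1
    # Walk up until we reach a path that actually exists.
    while [ ! -e "$target" ]; do
        target=$(dirname "$target")
    done
    if [ -w "$target" ]; then
        echo no
    else
        echo yes
    fi
}
```

For example, install_needs_root /usr/local/apache2 would normally print yes for an ordinary user, while a prefix inside your home directory would print no.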

So, now that you have installed Apache, let’s move on to tweaking its configuration.

Apache configuration files

Apache uses a specific configuration file syntax, commonly found in many Linux applications.

The syntax includes certain directives, like EnableSendFile On, and some sections that resemble a kind of mark-up language, like what follows:

<Directory /path/to/directory>
    # Directory configuration
</Directory>

The # denotes a comment, like in shell scripts.

Note: Assuming you have done the default configuration, i.e., with --prefix=/usr/local/apache2 (which is the default, if you don’t specify this option), Apache places your configuration files in /usr/local/apache2/conf. If you passed any custom path prefix to the configure script, or a custom location for etcdir, then your configuration files will be in the conf subdirectory of your custom prefix path, or in the custom etcdir location you’ve given. We will assume the default location.

Apart from the main configuration files in /usr/local/apache2/conf, it is possible to have directory-specific configuration placed inside the directories themselves. These do not contain a large amount of configuration, but are quite useful in case you want to change one single setting for one directory, when it makes no sense to add a <Directory> section to the main configuration file and restart/reload Apache. These files are named .htaccess, and it is possible to allow/disallow usage of one such file.

Here, I’ll discuss only the main aspects of the configuration file — the important performance- and security-related options, along with certain basic ones. For all other options, you can find comprehensive documentation at the official Apache documentation website.

Directives, their usage and effects

Listen

This directive tells Apache which ports and IP addresses to listen on for requests. Many people miss this during configuration of virtual hosts, and particularly SSL hosts. Every unnecessary Listen directive opens another port on which Apache accepts connections, which needlessly enlarges the attack surface of the server.

The syntax is:

Listen [<ip-address>:]<port>

For example:

Listen 80

Listen 11.22.33.44:80
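To spot stray Listen directives, you can list the active ones in a configuration file. This is a minimal sketch: the function name list_listen_directives is made up, it looks at a single file only (files pulled in through Include are not followed), and it skips commented-out lines.

```shell
# Print every active (uncommented) Listen directive in a configuration
# file, prefixed with its line number. Directive names are matched
# case-insensitively, and ListenBacklog is deliberately not matched.
list_listen_directives() {
    grep -n -i -E '^[[:space:]]*Listen[[:space:]]' "$1"
}
```

With the default layout, you would run list_listen_directives /usr/local/apache2/conf/httpd.conf and compare the result with the ports you actually intend to serve.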

User and Group

These two directives control which user and group Apache runs as. This is very important for the security of the server. If you tell Apache to run as the root user and group, you are giving the Web server the power of the systems administrator! There is no good reason to do that on a production server, so never do it.

A good choice for user and group is a dedicated unprivileged account such as www:www (the account has to exist on your system). With this, you can control which files Apache can write to, which files it has read-only access to, and so on, without ever needing to do a chmod 777! Example:

User www

Group www

Timeout

This is the time period that Apache should wait before dropping a request due to a very long send or receive operation.

For a more precise definition (taken from the Apache manual), Timeout controls the following:

  • The total time Apache takes to receive a GET request.
  • The amount of time between the receipt of TCP packets on a POST or PUT request.
  • The amount of time between ACKs on transmissions of TCP packets, in responses.

The default value for Timeout is 300 seconds (specified as Timeout 300). At the time of writing, there’s no way to configure separate timeouts for each element mentioned above.

KeepAlive

This option specifies whether Apache should allow multiple requests per connection (i.e., persistent connections, when KeepAlive On is specified). This should be kept on for performance reasons; otherwise, the client has to open a new TCP connection for every single request, which has a lot of overhead.

As a simple analogy, consider a group of 100 people waiting to enter through the door of some institution. If the watchman closes and opens the door once for each person, it would take much more time and effort, compared to letting all 100 people enter at one time.

MaxKeepAliveRequests

This option specifies the maximum number of requests that will be permitted on a single connection (if enabled using KeepAlive) before Apache closes the connection, after which the client has to open a new one for its next request. This number is better left high, for performance reasons. The default value is 100; a value of 0 means “unlimited”. For example, you could specify MaxKeepAliveRequests 1000.

KeepAliveTimeout

This option specifies the maximum time (in seconds) that Apache should wait for a subsequent request from the same client, on an existing persistent connection, before freeing up the process for another client. Continuing with our simple analogy, this is the maximum time the watchman should keep the gate open, waiting for the arrival of another person who also wishes to enter.

This value should be between 5 and 15. Setting it to less than 5 makes little sense, since a browser can easily take a few seconds before requesting the next file from the server, and the connection would be closed before it could be reused; setting it higher than 15 seconds would unnecessarily tie up an Apache process, which could be used in handling another client.

Example:

KeepAliveTimeout 5

AccessFileName

This option specifies the name of the file that must be read, if present in a directory, to apply directory-specific options. As mentioned earlier, the name is usually .htaccess (i.e., AccessFileName .htaccess).

There are performance considerations with the use of these configuration files. If you have set any of the options permitted by the AllowOverride directive (see below) in the relevant <Directory> section, Apache will check for .htaccess files in every directory from the top-most one specified in <Directory> down to the directory in which the requested file resides.

This causes a lot of overhead, and is not recommended for a server receiving a large number of requests per second. A workaround, if you really want the .htaccess feature (as is the case with shared hosts), is to have a specific <Directory> section where all your documents reside, instead of enabling that option in the parent directory.
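To get a feel for the cost, the following shell sketch lists the .htaccess paths that would be looked up for a single request. The function name htaccess_lookup_paths and the example paths are made up for illustration, and the sketch assumes overrides are enabled starting from the directory given as the first argument.

```shell
# List the .htaccess files that would be checked for one requested file,
# assuming overrides are enabled from base_dir downward.
# Usage: htaccess_lookup_paths base_dir requested_file
htaccess_lookup_paths() {
    base_dir=${1%/}            # ignore a trailing slash on the base
    dir=$(dirname "$2")
    result=""
    while :; do
        # Prepend, so the output runs from the top-most directory down.
        result="${dir%/}/.htaccess${result:+
$result}"
        if [ "$dir" = "$base_dir" ] || [ "$dir" = "/" ] || [ "$dir" = "." ]; then
            break
        fi
        dir=$(dirname "$dir")
    done
    printf '%s\n' "$result"
}
```

Running htaccess_lookup_paths /www/htdocs /www/htdocs/a/b/index.html prints three paths, one per directory level, and every one of them is a filesystem lookup that happens on each request. Narrowing the <Directory> section shortens that list.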

AllowOverride

This directive lets you permit certain configuration options in files of the name given in AccessFileName, according to the values specified in this directive.  It can appear only in <Directory> sections specified without regular expressions (not containing ~), and nowhere else.

The values it takes, and what each one does, are given in the following table, which is taken from the official Apache documentation.

AllowOverride directive values

Directive value       Effect
AuthConfig            Allows directives related to authentication and authorisation
FileInfo              Allows directives controlling document types
Indexes               Allows directives controlling directory indexing, i.e., listing of files when a directory is requested
Limit                 Permits the use of Allow, Deny and Order directives
Options[=Option,…]    Allows directives controlling specific directory features

HostnameLookups

Enabling this option gives you hostnames in the access log, instead of just IP addresses. This is a big performance hit, since every time a client initiates a connection, Apache will send a request to the DNS server to convert the IP address of the client to a hostname.

Disable this (HostnameLookups Off). If you want hostnames in the log file, use the logresolve utility provided with Apache instead, to resolve IP addresses when you are reading the log.

Configuration sections

As mentioned earlier, the configuration file contains certain sections that resemble HTML markup. Here, I list and explain them.

IfModule

This checks if a particular module is loaded into memory (enabled) or not. It basically tells Apache to parse the configuration inside the section only if the module is loaded; else, to skip it. For example:

<IfModule fcgid_module>
    AddHandler fcgid-script .php
</IfModule>
<IfModule !fcgid_module>
    <IfModule fastcgi_module>
        AddHandler fastcgi-script .php
    </IfModule>
</IfModule>

IfModule sections can be nested, as seen above. The bang in <IfModule !module-identifier> is used to negate — i.e., include the configuration section if the module is not loaded.

IfDefine

Checks if a particular parameter was defined while starting Apache, and is usually used to load modules according to the startup command, eliminating the need to modify configuration files every now and then. For example:

<IfDefine Rewrite>
LoadModule rewrite_module modules/mod_rewrite.so
</IfDefine>

In this example, the rewrite module will be loaded if Apache was launched with the command:

apachectl -D Rewrite start

…or:

httpd -D Rewrite -k start

Directory, Files, FilesMatch, Location, and LocationMatch

These sections apply configuration to particular elements: directories, files, and locations. Directory and Files match against the filesystem; the difference between them is that Directory will match a physical directory but not a symlink (symbolic link), while Files will also match symlinks.

Location matches against the URL path of the request rather than the filesystem, so it has no such limitations; it will match a file, location, symlink, alias, etc.

The regular-expression variants for these have ‘Match’ appended to the directive. FilesMatch is the regular-expression variant of Files, and so on. It is possible to use simple shell wildcards like * and ? in the non regular-expression variants, as follows:

<Files "*">
# configuration
</Files>

There is no need to use the regular-expression variants to match shell wildcards, since those variants have more processing overheads, and slow down Apache.

The most important directive in relation to this is the Options directive. It controls what features should be enabled, disabled, permitted or not permitted for all the elements matched by the section (files or directories, as applicable). Read the official Apache documentation for more information on this.

In the next part, we’ll cover MySQL.

9 COMMENTS

  1. I would not recommend compiling Apache on your own. Don’t bother about Optimisations. You could always double or quadruple performance by clustering Apache servers.

    The problem I see with your method is maintenance. A web server needs to be secure. Let the distribution vendor handle compiling Apache and all the dependencies (modules, php, etc etc). How many of the busy sysadmins have time to recompile all the apache and dependencies with each vuln or patch released?

    •  There is a distro called Gentoo which takes care of that. I use Gentoo on all my servers. Also, adding more servers means increasing cost.
