Here are some simple steps to tune MySQL for performance and scalability in multi-user environments. Also included are some very effective tools, most of which are packaged along with MySQL.
You’ve just heaved a sigh of relief after the latest release of your online system — but you can already see troubled times ahead, with performance issues cropping up after adding on just a few concurrent users. You don’t have time for elaborate load or scalability testing, since the user count will grow rapidly in the next few weeks. As the solutions architect, you know that you’ll get the maximum returns on your investment of effort by taming the database. Will you start scrambling for tools, hoping they will help you? Or will you simply decide to throw more powerful hardware at the system, to postpone solving the problem?
The approach we advocate is to tune your MySQL database system for the most efficient performance before you begin throwing new hardware at the problem, or begin redesigning the very solution itself. Let’s learn to use some very effective tools — most of which are included in your MySQL installation — to make this process a pleasure. We can start right away if you are ready to take the plunge.
Find the trouble-makers
Our context here is a production system with real users. Therefore, logging information related to the users’ access patterns, and the corresponding system performance, must be available. MySQL has different kinds of logs. One of these is the slow query log, which logs details of queries that have exceeded a specified threshold execution time. So here, we will use this slow query log to identify slow-running queries. The steps are as follows:
- Decide when to log: Usually, logs are considered detrimental to system performance, especially in a production scenario. However, since this information is vital for performance optimisation, you could enable logging only for a representative period of time — for example, peak business hours, when the maximum users are logged in, and there is a likelihood of performance problems surfacing.
- Enable the log and set the threshold value: Add the following statements to your MySQL configuration file (
/etc/my.cnf
in CentOS). The variablelong_query_time
defines the threshold time beyond which queries will be logged as slow, and thus controls the extent of logging. Beginning with version 5.1.21, you can have fine-grained control by specifying the threshold even in fractions of seconds! To effect the change, restart the MySQL instance, as follows:[mysqld]
slow_query_log=/tmp/slow_queries.log
long_query_time=1
Note: It is best you refer to the MySQL manual to determine the exact names of these variables, for your version of MySQL. Additionally, ensure that the destination folder for the log file already exists. - Determine candidate queries for tuning:If you examine a slow query log (see the sample below), the following information is displayed for each query that is logged:
- Query execution time
- Lock time
- Total rows examined
- Total rows sent
# Time: 101019 7:51:18
# User@Host: root[root] @ localhost []
#Query_time: 5.788721 Lock_time: 0.002104 Rows_sent: 52748 Rows_examined: 52748
SET timestamp=1287489078;
Given that these logs are very long, you will need to summarise the information in the log. For this, we recommend that you use the mysqldumpslow
utility that comes packaged with MySQL. It works on the slow query log file, and summarises the results, grouping queries based on similarity, and excluding differences in the values of number or string data. The command syntax is mysqldumpslow [options] [log_file
]
. For details on command-line options, please refer to the MySQL documentation on mysqldumpslow
.
Let’s now summarise the log, extracting the top 10 slow queries based on total execution time, which is the product of average query execution time, and the number of times the query was logged, as shown below:
mysqldumpslow -s t -t 10 /tmp/slowquery.log Few lines from the output Reading mysql slow query log from tmp/slow_queries.log Count: 998 Time=4.90s (4893s) Lock=0.12s (123s) Rows=52748.0 (52642504), root[root]@localhost |
In the above, -s
defines the sort order, and sorting can happen with six different parameters. Here we choose t
to specify the total execution time. Further, you should also sort on the following bases:
at
: average query time, to identify queries purely based on high query time.l
: lock time, to identify queries that have serious contentions or possibility of deadlocks.r
: number of records output, to tune at the single-request level the number of records fetched.
In our example, we listed the worst-performing queries in terms of their execution time, and stored the output in a file named myworst10.txt
, which can be used for subsequent analysis.
Fix them individually
Now that you have a list of the top 10 queries to be optimised, the next step is to tune these queries individually. One part of this task is to run the queries through a profiler that will list the breakup of query execution time, and indicate the hot spots. The second part is fixing this. Here we will focus on the first part, as the methods used to fix queries is a separate subject in itself, and there are plenty of valuable references on the topic (such as, MySQL Administrator’s Bible, by Sheeri K. Cabral & Keith Murphy, John Wiley & Sons, May 11, 2009, ISBN: 978-0-470-41691-4).
For profiling, we recommend mk-query-profiler
, one of the most versatile tools provided by maatkit.org. mk-query-profiler
can read, as input, the file we created in the earlier step, myworst10.txt
, and execute the queries in it with analysis, for output as shown in the example below (for one query):
+----------------------------------------------------------+ | 1 (2.4254 sec) | +----------------------------------------------------------+ __ Overall stats _______________________ Value _____________ Total elapsed time 2.425 Questions 1 COMMIT 0 DELETE 0 DELETE MULTI 0 INSERT 0 INSERT SELECT 0 REPLACE 0 REPLACE SELECT 0 SELECT 1 UPDATE 0 UPDATE MULTI 0 Data into server 60 Data out of server 2080902 Optimizer cost 11042.127 __ Table and index accesses ____________ Value _____________ Table locks acquired 1 Table scans 1 Join 0 Index range scans 0 Join without check 0 Join with check 0 Rows sorted 0 Range sorts 0 Merge passes 0 Table scans 0 Potential filesorts 0 |
You will see that there is a lot of information that will help you to fix the query. In the simple example given above, you see a really high Optimiser cost. In our case, by making the necessary changes to use the right index, and by further narrowing the search results, we were able to optimise the query, and thus bring down the Optimizer cost.
Fix them to work under stress
The behaviour of queries can change with load. There are several system load-testing tools available in the market, but in our case we want to load-test only the database. MySQL version 5.1.4 and above comes with mysqlslap
, a utility that’s very useful for load testing. Its advantages are that it is very simple to use, and can be used to connect to and monitor a remote host running on any OS, or a MySQL instance that may not include mysqlslap
.
It provides options to create multiple client threads to load-test the server with a query, and it can iterate several times. It can also read an input file containing queries, and test each of them with the specified user load. Using shell scripts, you can also run it in multiple parallel processes. The important command-line options for load testing are:
--query
(or-q
): the query to be tested, or a file name if it is in a file.--concurrency
(or-c
): the number of client threads to be created. The upper limit is themax_connections
variable set in the server’smy.cnf
file.--iterations
(or-i
): the number of test runs to be executed by each client thread.--create-schema
: a schema is created for its work, on the machine that is being monitored.
For a more detailed command-line argument reference, check the official documentation.
Here, we demonstrate load-testing two queries that are alternatives for a given objective. We test each of the queries with load levels of 5, 20, 50, 75, 100 and 200. The command below describes the load testing of Query 1, stored in the file query1.txt
, with 50 client threads, and 100 iterations. The schema test is created in the local machine.
mysqlslap --concurrency=50 --iterations=100 --query=/home/myftp/query1.txt --create-schema=test Benchmark Average number of seconds to run all queries: 0.257 seconds Minimum number of seconds to run all queries: 0.044 seconds Maximum number of seconds to run all queries: 0.299 seconds Number of clients running queries: 50 Average number of queries per client: 1 |
You can see from the results, and in the graph in Figure 1, that query2 is more scalable when compared to query1.
Client threads | query1 response time | query2 response time | ||||
Avg | Min | Max | Avg | Min | Max | |
5 | 0.003 | 0.003 | 0.005 | 0 | 0 | 0.001 |
20 | 0.083 | 0.012 | 0.116 | 0.002 | 0.002 | 0.003 |
50 | 0.257 | 0.044 | 0.229 | 0.031 | 0.005 | 0.074 |
75 | 0.414 | 0.179 | 0.543 | 0.071 | 0.012 | 0.108 |
100 | 0.527 | 0.387 | 0.567 | 0.1 | 0.015 | 0.14 |
200 | 1.134 | 1.074 | 1.176 | 0.244 | 0.029 | 0.292 |
I would like to know whether there is any upper limit on the number of rows that can afford a Mysql table with about 50 fields without crash?