
How can I optimize a mysqldump of a large database?

Solutions:


The main bottleneck in a dump like this is drive I/O. You are reading a load of data and writing it again. You can speed this up in a number of ways:

  • Make sure your output is going to a different drive(s) than the one(s) the database files are stored on – this will make a massive difference with spinning disks as the drive heads will not be constantly flicking between the location being read from and the location being written to.
  • The output of mysqldump will be very compressible, so if you cannot separate the output from the input as mentioned above, pipe the output through gzip or similar. This will reduce the amount of writing being done (and so reduce the overall IO load and the amount of head movement) at the expense of some CPU time (which you may have plenty of spare at these times anyway).
  • Also (as well as, or instead of, compression) pass the output through a pipe utility like pv that supports large write buffers, to group the blocks written to the drives together and further reduce the effect of head-movement latency. This will make quite a difference if you are using the --quick option to reduce the RAM impact of backing up large tables (see the example after this list).
  • Only run your backup process when IO load is otherwise low.
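
For example, a minimal sketch combining the points above (the output path and buffer size are assumptions; adjust them for your setup):

mysqldump -h... -u... -p... --quick --all-databases | gzip | pv -q -B 8m > /mnt/backup-drive/MySQLData.sql.gz

Here gzip shrinks what gets written, pv -q -B 8m groups the output into large 8 MiB writes, and the target directory sits on a different drive than the MySQL datadir.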

You may be fixing the wrong issue though: it might be easier to address the connection drops instead (though reducing the I/O load imposed by your backups will help reduce the effect you have on other users, so it is worth trying anyway). Could you run your manual backups through screen (or a similar tool like tmux)? That way, if your connection to the server drops, you can just reconnect and reattach to the screen session without any processes getting interrupted.
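
For instance, a minimal sketch (the session name is arbitrary):

screen -S backup    # start a named session, then launch the dump inside it
# press Ctrl-A then d to detach; after reconnecting to the server:
screen -r backup    # reattach to the still-running session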

If you are sending the data directly over the connection (i.e. you are running mysqldump on your local machine against a remote database, so the dump appears locally) you might be better off running the dump on the server first, compressing as needed, then transferring the data over the network using a tool (such as rsync) which supports partial transfers so you can resume the transfer (instead of restarting) if a connection drop interrupts it.
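
A hedged example (the host name and paths are placeholders):

rsync --partial --progress --compress user@dbserver:/backups/MySQLData.sql.gz ./

The --partial flag keeps any partially transferred file in place, so re-running the same command resumes roughly where the drop occurred instead of starting over.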

As part of your “reducing the size of the overall database to resolve this issue” I would guess that a large chunk of your data does not change. You might be able to move a large chunk of the 1.2Gb from that main table off into another table and remove that from those that are copied by the mysqldump call. You don’t need to back up this data every time if it never changes. Splitting data between tables and databases this way is usually referred to as data partitioning and can also allow you to spread the data and I/O load over multiple drives. High-end databases have built-in support for automatic partitioning, though in MySQL you will probably have to do it manually and alter your data access layer to account for it.

Straying off-topic for this site (so you should probably nip over to ServerFault or SuperUser to ask if you need more detail): If you seem to be losing connections due to inactivity, check the options in your SSH server and SSH client to make sure keep-alive packets are enabled and being sent often enough. If you are seeing drops even when the connection is active, you could also try using OpenVPN or similar to wrap the connection – it should handle a short drop, even a complete drop if your entire connection is down for a few seconds, such that the SSH client and server don’t notice.
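
On the client side that usually means something like the following in ~/.ssh/config (a sketch; the host name is a placeholder):

Host dbserver
    ServerAliveInterval 60
    ServerAliveCountMax 3

The server-side equivalents in sshd_config are ClientAliveInterval and ClientAliveCountMax.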

INSIGHT INTO DOING BACKUPS WITH mysqldump

IMHO, doing backups has become more of an art form once you know just how to approach it.

You have options

Option 1 : mysqldump an entire mysql instance

This is the easiest one, the no-brainer!!!

mysqldump -h... -u... -p... --hex-blob --routines --triggers --all-databases | gzip > MySQLData.sql.gz

Everything written in one file: table structures, indexes, triggers, stored procedures, users, encrypted passwords. Other mysqldump options can also export different styles of INSERT commands, log file and position coordinates from binary logs, database creation options, partial data (the --where option), and so forth.
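For example, a hedged sketch of a partial dump using those options (the database, table, and WHERE clause are hypothetical):

mysqldump -h... -u... -p... --master-data=2 --where="created_at >= '2023-01-01'" mydb mytable > mydb_mytable_partial.sql

Here --master-data=2 records the binary log file and position as a comment at the top of the dump, and --where limits which rows are exported.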

Option 2 : mysqldump separate databases into separate data files

Start by creating a list of databases (2 techniques to do this)

Technique 1

mysql -h... -u... -p... -A --skip-column-names -e"SELECT schema_name FROM information_schema.schemata WHERE schema_name NOT IN ('information_schema','mysql')" > ListOfDatabases.txt

Technique 2

mysql -h... -u... -p... -A --skip-column-names -e"SELECT DISTINCT table_schema FROM information_schema.tables WHERE table_schema NOT IN ('information_schema','mysql')" > ListOfDatabases.txt

Technique 1 is the fastest way. Technique 2 is the surest and safest. Technique 2 is better because, sometimes, users create folders for general purposes in /var/lib/mysql (datadir) which are not database related. The information_schema would register such a folder as a database in the information_schema.schemata table. Technique 2 bypasses folders that do not contain MySQL data.

Once you compile the list of databases, you can proceed to loop through the list and mysqldump them, even in parallel if so desired.

for DB in `cat ListOfDatabases.txt`
do
    mysqldump -h... -u... -p... --hex-blob --routines --triggers ${DB} | gzip > ${DB}.sql.gz &
done
wait    # the trailing & backgrounds each dump; wait blocks until all of them finish

If there are too many databases to launch at one time, dump them in parallel 10 at a time:

COMMIT_COUNT=0
COMMIT_LIMIT=10
for DB in `cat ListOfDatabases.txt`
do
    mysqldump -h... -u... -p... --hex-blob --routines --triggers ${DB} | gzip > ${DB}.sql.gz &
    (( COMMIT_COUNT++ ))
    if [ ${COMMIT_COUNT} -eq ${COMMIT_LIMIT} ]
    then
        COMMIT_COUNT=0
        wait
    fi
done
if [ ${COMMIT_COUNT} -gt 0 ]
then
    wait
fi

Option 3 : mysqldump separate tables into separate data files

Start by creating a list of tables

mysql -h... -u... -p... -A --skip-column-names -e"SELECT CONCAT(table_schema,'.',table_name) FROM information_schema.tables WHERE table_schema NOT IN ('information_schema','mysql')" > ListOfTables.txt

Then dump all tables in groups of 10:

COMMIT_COUNT=0
COMMIT_LIMIT=10
for DBTB in `cat ListOfTables.txt`
do
    DB=`echo ${DBTB} | sed 's/\./ /g' | awk '{print $1}'`    # split db.table on the literal dot
    TB=`echo ${DBTB} | sed 's/\./ /g' | awk '{print $2}'`    # (the dot must be escaped for sed)
    mysqldump -h... -u... -p... --hex-blob --triggers ${DB} ${TB} | gzip > ${DB}_${TB}.sql.gz &
    (( COMMIT_COUNT++ ))
    if [ ${COMMIT_COUNT} -eq ${COMMIT_LIMIT} ]
    then
        COMMIT_COUNT=0
        wait
    fi
done
if [ ${COMMIT_COUNT} -gt 0 ]
then
    wait
fi

Option 4 : USE YOUR IMAGINATION

Try variations of the aforementioned options, plus techniques for clean snapshots.

Examples

  1. Order the list of tables by the size of each table, ascending or descending.
  2. Using a separate process, run “FLUSH TABLES WITH READ LOCK; SELECT SLEEP(86400)” before launching the mysqldumps, and kill this process after the mysqldumps are complete (see the sketch after this list). This is helpful if a database contains both InnoDB and MyISAM tables.
  3. Save the mysqldumps in dated folders and rotate out old backup folders.
  4. Load whole instance mysqldumps into standalone servers.
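
A minimal sketch of the read-lock technique from example 2 (credentials elided as in the other commands):

mysql -h... -u... -p... -A -e"FLUSH TABLES WITH READ LOCK; SELECT SLEEP(86400);" &
LOCK_PID=$!
# ... run the mysqldump loop from Option 2 or Option 3 here ...
kill ${LOCK_PID}    # killing the client closes the session and releases the global read lock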

CAVEAT

Only Option 1 brings everything. The drawback is that a mysqldump created this way can only be reloaded into the same major release version of MySQL that generated it. In other words, a mysqldump from a MySQL 5.0 database cannot be loaded into 5.1 or 5.5. The reason? The mysql schema is totally different among major releases.

Options 2 and 3 do not include saving usernames and passwords.

Here is a generic way to dump the SQL grants for users in a form that is readable and more portable:

mysql -h... -u... -p... --skip-column-names -A -e"SELECT CONCAT('SHOW GRANTS FOR ''',user,'''@''',host,''';') FROM mysql.user WHERE user<>''" | mysql -h... -u... -p... --skip-column-names -A | sed 's/$/;/g' > MySQLGrants.sql
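
To apply the saved grants on a restore target, a sketch:

mysql -h... -u... -p... < MySQLGrants.sql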

Option 3 does not save the stored procedures, so you can dump them separately with the following:

mysqldump -h... -u... -p... --no-data --no-create-info --routines > MySQLStoredProcedures.sql &

Another point worth noting concerns InnoDB. If you have a large InnoDB buffer pool, it makes sense to flush it as best you can before performing any backups. Otherwise, MySQL spends time flushing tables with leftover dirty pages out of the buffer pool. Here is what I suggest:

About 1 hour before performing the backup, run this SQL command:

SET GLOBAL innodb_max_dirty_pages_pct = 0;

In MySQL 5.5, the default innodb_max_dirty_pages_pct is 75. In MySQL 5.1 and earlier, the default is 90. Setting innodb_max_dirty_pages_pct to 0 hastens the flushing of dirty pages to disk. This will prevent, or at least lessen, the impact of cleaning up any incomplete two-phase commits of InnoDB data before running mysqldump against any InnoDB tables.
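
Once the backup completes, you will likely want to restore the setting (a sketch, assuming the MySQL 5.5 default of 75 mentioned above):

SET GLOBAL innodb_max_dirty_pages_pct = 75;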

FINAL WORD ON mysqldump

Most people shy away from mysqldump in favor of other tools, and those tools are indeed good.

Such tools include

  1. MAATKIT (parallel dump/restore scripts, from Percona [Deprecated but great])
  2. XtraBackup (TopNotch Snapshot Backup from Percona)
  3. CDP R1Soft (MySQL Module Option that takes point-in-time snapshots)
  4. MySQL Enterprise Backup (formerly InnoDB Hot Backups [commercial])

If you have the spirit of a true MySQL DBA, you can embrace mysqldump and attain complete mastery over it. May all your backups be a reflection of your skills as a MySQL DBA.

Plan A:
See also Xtrabackup from Percona. This allows online backup of InnoDB, without any significant locking.
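
A hedged example invocation (the target directory is a placeholder, and older XtraBackup releases used the innobackupex wrapper instead of the xtrabackup binary):

xtrabackup --backup --user=... --password=... --target-dir=/backups/xtrabackup/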

Plan B:
A Slave can be stopped, and you can take a consistent backup by any of several means (copying files, mysqldump, xtrabackup, etc.).
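
A minimal sketch of that approach, run against the replica (and assuming nothing else writes to it while replication is paused):

mysql -h... -u... -p... -e"STOP SLAVE;"
mysqldump -h... -u... -p... --hex-blob --routines --triggers --all-databases | gzip > SlaveBackup.sql.gz
mysql -h... -u... -p... -e"START SLAVE;"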

Plan C:
LVM Snapshot. After some cryptic setup, the downtime for a backup is less than a minute, regardless of the size of the database. You stop mysqld, do the snapshot, restart mysqld, then copy the snapshot. The last step can take a long time, but MySQL is not down.
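
A rough sketch of that sequence (the volume group, snapshot size, and paths are assumptions; yours will differ):

mysqladmin -h... -u... -p... shutdown                      # stop mysqld
lvcreate --size 1G --snapshot --name mysql-snap /dev/vg0/mysql
mysqld_safe &                                              # restart mysqld; downtime ends here
mkdir -p /mnt/mysql-snap
mount -o ro /dev/vg0/mysql-snap /mnt/mysql-snap
cp -a /mnt/mysql-snap/. /backup/mysql/                     # the slow copy runs while MySQL is up
umount /mnt/mysql-snap
lvremove -f /dev/vg0/mysql-snap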

Plan D: Snapshot of a Slave — zero downtime.
