MySQL Performance Blog

Syndicate content
Percona's MySQL & InnoDB performance and scalability blog
Updated: 1 hour 2 min ago

Galera replication – how to recover a PXC cluster

Mon, 01/09/2014 - 17:00

Galera replication for MySQL brings not only the new, great features to our ecosystem, but also introduces completely new maintenance techniques. Are you concerned about adding such new complexity to your MySQL environment? Perhaps that concern is unnecessarily.

I am going to present here some simple tips that hopefully will let fresh Galera users prevent headaches when there is the need to recover part or a whole cluster in certain situations. I used Percona XtraDB Cluster (project based on Percona Server and Galera library + MySQL extensions from Codership) to prepare this post, but most if not all of the scenarios should also apply to any solution based on MySQL+Galera tandem you actually chose, whether these are binaries from Codership, MariaDB Galera Cluster or maybe your own builds.

Unlike standard MySQL replication, a PXC cluster acts like one logical entity, which takes care about each node status and consistency as well as cluster status as a whole. This allows to maintain much better data integrity then you may expect from traditional asynchronous replication while allowing safe writes on multiple nodes in the same time. This is though for the price of more possible scenarios where database service will be stopped with no node being able to serve requests.

Lets assume the simplest case cluster of nodes A, B and C and few possible scenarios where some or all nodes are out of service. What may happen and what we have to do, to bring them (or whole cluster) back up.

Scenario 1

Node A is gracefully stopped. Likely for the purpose of maintenance, configuration change, etc.
In this case the other nodes receive “good bye” message from that node, hence the cluster size is reduced and some properties like quorum calculation or auto increment are automatically changed. Once we start the A node again, it will join the cluster based on it’s wsrep_cluster_address setting in my.cnf. This process is much different from normal replication – the joiner node won’t serve any requests until it is again fully synchronized with the cluster, so connecting to it’s peers isn’t enough, state transfer must succeed first. If the writeset cache (gcache.size), on nodes B and/or C has still all the transactions there were executed during the time this node was down, joining will be possible via (usually fast and light) IST. Otherwise, full SST will be needed, which in fact is full binary data snapshot copy. Hence it may be important here to determine the best donor, as shown in this article. If IST is impossible due to missing transactions in donor’s gcache, the fallback decision is made by the donor and SST is started automatically instead.

Scenario 2

Nodes A and B are gracefully stopped. Similar to previous case, cluster size is reduced to 1, hence even the single remaining node C forms a primary component and is serving client requests. To get the nodes back into the cluster, you just need to start them. However, the node C will be switched to “Donor/Desynced” state as it will have to provide state transfer to at least first joining node. It is still possible to read/write to it during that process, but it may be much slower, depending how large state transfers it needs to send. Also some load balancers may consider the donor node as not operational and remove it from the pool. So it is best to avoid situation when only one node is up.

Note though, if you restart A and then B in that order, you may want to make sure B won’t use A as state transfer donor, as A may not have all the needed writesets in it’s gcache. So just specify the C node as donor this way (“nodeC” name is the one you specify with wsrep_node_name variable):

service mysql start --wsrep_sst_donor=nodeC

Scenario 3

All three nodes are gracefully stopped. Cluster is deformed. In this case, the problem is how to initialize it again. Here, it is important to know, that during clean shutdown, a PXC node writes it’s last executed position into the grastate.dat file. By comparing the seqno number inside, you will see which node is the most advanced one (most likely the last one stopped). Cluster must be bootstrapped using this node, otherwise nodes that had more advanced position will have to perform full SST to join cluster initialized from the less advanced one (and some transactions will be lost). To bootstrap the first node, invoke the startup script like this:

/etc/init.d/mysql bootstrap-pxc

or

service mysql bootstrap-pxc

or

service mysql start --wsrep_new_cluster

or

service mysql start --wsrep-cluster-address="gcomm://"

or in packages using systemd service manager (Centos7 at the moment):

systemctl start mysql@bootstrap.service

In older PXC versions, to bootstrap cluster, you had to edit my.cnf and replace previous wsrep_cluster_address line with empty value like this: wsrep_cluster_address=gcomm:// and start mysql normally. More details to be found here.

Scenario 4

Node A disappears from the cluster. By disappear I mean power outage, hardware failure, kernel panic, mysqld crash, kill -9 on mysqld pid, OOMkiller, etc. Two remaining nodes notice the connection to A node is down and will be trying to re-connect to it. After some timeouts, both agree that node A is really down and remove it “officially” from the cluster. Quorum is saved ( 2 out of 3 nodes are up), so no service disruption happens. After restarting, A will join automatically the same way as in scenario 1.

Scenario 5

Nodes A and B disappear. The node C is not able to form the quorum alone, so the cluster is switching into a non-primary mode, in which MySQL refuses to serve any SQL query. In this state, mysqld process on C will be still running, you can connect to it, but any statement related to data fails with:

mysql> select * from test.t1;
ERROR 1047 (08S01): Unknown command

Actually reads will be possible for a moment until C decides that it cannot reach A and B, but immediately no new writes will be allowed thanks to the certification based replication in Galera. This is what we are going to see in the remaining node’s log:

140814 0:42:13 [Note] WSREP: commit failed for reason: 3
140814 0:42:13 [Note] WSREP: conflict state: 0
140814 0:42:13 [Note] WSREP: cluster conflict due to certification failure for threads:
140814 0:42:13 [Note] WSREP: Victim thread:
THD: 7, mode: local, state: executing, conflict: cert failure, seqno: -1
SQL: insert into t values (1)

The single node C is then waiting for it’s peers to show up again, and in some cases if that happens, like when there was network outage and those nodes were up all the time, the cluster will be formed again automatically. Also if the nodes B and C were just network-severed from the first node, but they can still reach each other, they will keep functioning as they still form the quorum. If A and B were crashed ( due to data inconsistency, bug, etc. ) or off due to power outage, you need to do manual action to enable primary component on the C node, before you can bring A and B back. This way, we tell the C node “Hey, you can now form a new cluster alone, forget A and B!”. The command to do this is:

SET GLOBAL wsrep_provider_options='pc.bootstrap=true';

However, you should double check in order to be very sure the other nodes are really down before doing that! Otherwise, you will most likely end up with two clusters having different data.

Scenario 6

All nodes went down without proper shutdown procedure. Such situation may happen in case of datacenter power failure, hitting some MySQL or Galera bug leading to crash on all nodes, but also as a result of data consistency being compromised where cluster detects that each node has different data. In each of those cases, the grastate.dat file is not updated and does not contain valid sequence number (seqno). It may look like this:

cat /var/lib/mysql/grastate.dat
# GALERA saved state
version: 2.1
uuid: 220dcdcb-1629-11e4-add3-aec059ad3734
seqno: -1
cert_index:

In this case, we are not sure if all nodes were consistent with each other, hence it is crucial to find the most advanced one in order to boostrap the cluster using it. Before starting mysql daemon on any node, you have to extract the last sequence number by checking it’s transactional state. You can do it this way:

[root@percona3 ~]# mysqld_safe --wsrep-recover
140821 15:57:15 mysqld_safe Logging to '/var/lib/mysql/percona3_error.log'.
140821 15:57:15 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
140821 15:57:15 mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/mysql/wsrep_recovery.6bUIqM' --pid-file='/var/lib/mysql/percona3-recover.pid'
140821 15:57:17 mysqld_safe WSREP: Recovered position 4b83bbe6-28bb-11e4-a885-4fc539d5eb6a:2
140821 15:57:19 mysqld_safe mysqld from pid file /var/lib/mysql/percona3.pid ended

So the last committed transaction sequence number on this node was 2. Now you just need to bootstrap from the latest node first and then start the others.

However, the above procedure won’t be needed in the recent Galera versions (3.6+?), available since PXC 5.6.19. There is a new option – pc.recovery (enabled by default), which saves the cluster state into a file named gvwstate.dat on each member node. As the variable name says (pc – primary component), it saves only a cluster being in PRIMARY state. An example content of that file may look like this:

cat /var/lib/mysql/gvwstate.dat
my_uuid: 76de8ad9-2aac-11e4-8089-d27fd06893b9
#vwbeg
view_id: 3 6c821ecc-2aac-11e4-85a5-56fe513c651f 3
bootstrap: 0
member: 6c821ecc-2aac-11e4-85a5-56fe513c651f 0
member: 6d80ec1b-2aac-11e4-8d1e-b2b2f6caf018 0
member: 76de8ad9-2aac-11e4-8089-d27fd06893b9 0
#vwend

We can see three node cluster above with all members being up. Thanks to this new feature, in the case of power outage in our datacenter, after power is back, the nodes will read the last state on startup and will try to restore primary component once all the members again start to see each other. This makes the PXC cluster to automatically recover from being powered down without any manual intervention!  In the logs we will see:

140823 15:28:55 [Note] WSREP: restore pc from disk successfully
(...)
140823 15:29:59 [Note] WSREP: declaring 6c821ecc at tcp://192.168.90.3:4567 stable
140823 15:29:59 [Note] WSREP: declaring 6d80ec1b at tcp://192.168.90.4:4567 stable
140823 15:29:59 [Warning] WSREP: no nodes coming from prim view, prim not possible
140823 15:29:59 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 2, memb_num = 3
140823 15:29:59 [Note] WSREP: Flow-control interval: [28, 28]
140823 15:29:59 [Note] WSREP: Received NON-PRIMARY.
140823 15:29:59 [Note] WSREP: New cluster view: global state: 4b83bbe6-28bb-11e4-a885-4fc539d5eb6a:11, view# -1: non-Primary, number of nodes: 3, my index: 2, protocol version -1
140823 15:29:59 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
140823 15:29:59 [Note] WSREP: promote to primary component
140823 15:29:59 [Note] WSREP: save pc into disk
140823 15:29:59 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = yes, my_idx = 2, memb_num = 3
140823 15:29:59 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
140823 15:29:59 [Note] WSREP: clear restored view
(...)
140823 15:29:59 [Note] WSREP: Bootstrapped primary 00000000-0000-0000-0000-000000000000 found: 3.
140823 15:29:59 [Note] WSREP: Quorum results:
version = 3,
component = PRIMARY,
conf_id = -1,
members = 3/3 (joined/total),
act_id = 11,
last_appl. = -1,
protocols = 0/6/2 (gcs/repl/appl),
group UUID = 4b83bbe6-28bb-11e4-a885-4fc539d5eb6a
140823 15:29:59 [Note] WSREP: Flow-control interval: [28, 28]
140823 15:29:59 [Note] WSREP: Restored state OPEN -> JOINED (11)
140823 15:29:59 [Note] WSREP: New cluster view: global state: 4b83bbe6-28bb-11e4-a885-4fc539d5eb6a:11, view# 0: Primary, number of nodes: 3, my index: 2, protocol version 2
140823 15:29:59 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
140823 15:29:59 [Note] WSREP: REPL Protocols: 6 (3, 2)
140823 15:29:59 [Note] WSREP: Service thread queue flushed.
140823 15:29:59 [Note] WSREP: Assign initial position for certification: 11, protocol version: 3
140823 15:29:59 [Note] WSREP: Service thread queue flushed.
140823 15:29:59 [Note] WSREP: Member 1.0 (percona3) synced with group.
140823 15:29:59 [Note] WSREP: Member 2.0 (percona1) synced with group.
140823 15:29:59 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 11)
140823 15:29:59 [Note] WSREP: Synchronized with group, ready for connections

Scenario 7

Cluster lost it’s primary state due to split brain situation. For the purpose of this example, let’s assume we have the cluster formed from even number of nodes – six and three of them are in one location while another three in second location (datacenter) and network connectivity is broken between them. Of course the best practice is to avoid such topology: if you can’t have odd number of real nodes, at least you can use an additional arbitrator (garbd) node or set higher pc.weight to some nodes. But when split brain happens any way, so none of the separated groups can maintain the quorum – all nodes must stop serving requests and both parts of the cluster are just continuously trying to re-connect. If you want to restore the service even before the network link is restored, you can make one of the groups primary again using the same command like in scenario 5:

SET GLOBAL wsrep_provider_options='pc.bootstrap=true';

After that, you are able to work on the manually restored part of the cluster, and the second half should be able to automatically re-join using incremental state transfer (IST) once the network link is restored. But beware: if you set the bootstrap option on both the separated parts, you will end up with two living cluster instances, with data likely diverging away from each other. Restoring network link in that case won’t make them to re-join until nodes are restarted and try to re-connect to members specified in configuration file. Then, as Galera replication model truly cares about data consistency – once the inconsistency will be detected, nodes that cannot execute row change statement due to different data – will perform emergency shutdown and the only way to bring them back to the cluster will be via full SST.

I hope I covered most of the possible failure scenarios of Galera-based clusters, and made the recovery procedures bit more clear.

The post Galera replication – how to recover a PXC cluster appeared first on MySQL Performance Blog.

Galera data on Percona Cloud Tools (and other MySQL monitoring tools)

Sat, 30/08/2014 - 07:58

I was talking with a Percona Support customer earlier this week who was looking for Galera data on Percona Cloud Tools. (Percona Cloud Tools, now in free beta, is a hosted service providing access to query performance insights for all MySQL uses.)

The customer mentioned they were already keeping track of some Galera stats on Cacti, and given they were inclined to use Percona Cloud Tools more and more, they wanted to know if it was already supporting Percona XtraDB Cluster. My answer was: “No, not yet: you can install agents in each node (the regular way in the first node, then manually on the other nodes… and when prompted say “No” to create MySQL user and provide the one you’re using already) and monitor them as autonomous MySQL servers – but the concept of cluster and specially the “Galera bits” has yet to be implemented there.

Except I was wrong.

By “concept of cluster” I mean supporting the notion of group instances, which should allow a single cluster-wide view for metrics and query reports, such as the slow queries (which are recorded locally on the node where the query was run and thus observed in a detached way). This still needs to be implemented indeed, but it’s on the roadmap.

The “Galera bits” I mentioned above are the various “wsrep_” status variables. In fact, those refer to the Write Set REPlication patches (based in the wsrep API), a set of hooks applied to the InnoDB/XtraDB storage engine and other components of MySQL that modifies the way replication works (to put it in a very simplified way), which in turn are used by the Galera library to provide a “generic Synchronous Multi-Master replication plugin for transactional applications.” A patched version of Percona Server together with the Galera libray compose the base of Percona XtraDB Cluster.

As I found out only now, Percona Cloud Tools does collect data from the various wsrep_ variables and it is available for use – it’s just not shown by default. A user only needs to add a dashboard/chart manually on PCT to see these metrics:

Click on the picture to enlarge it

Now, I need to call that customer …

Monitoring the cluster

Since I’m touching this topic I thought it would be useful to include some additional information on monitoring a Galera (Percona XtraDB Cluster in particular) cluster, even though most of what I mention below has already been published in different posts here on the MySQL Performance Blog. There’s a succint documentation page bearing the same title of this section that indicates the main wsrep_ variables you should monitor to check the health status of the cluster and how well it’s coping with load along the time (performance). Remember you can get a grasp of the full set of variables at any time by issuing the following command from one (or each one) of the nodes:

mysql> SHOW GLOBAL STATUS LIKE "wsrep_%";

And for a broader and real time view of the wsrep_ status variables you can use Jay Janssen’s myq_gadgets toolkit, which he modified a couple of years ago to account for Galera.

There’s also a specific Galera-template available in our Percona Monitoring Plugins (PMP) package that you can use in your Cacti server. That would cover the “how well the cluster copes with load along the time,” providing historical graphing. And while there isn’t a Galera specific plugin for Nagios in PMP, Jay explains in another post how you can customize pmp-check-mysql-status to “check any status variable you like,” describing his approach to keep a cluster’s “health status” in check by setting alerts on specific stats, on a per node basis.

VividCortex recently added a set of templates for Galera in their product and you can also rely on Severalnines’ ClusterControl monitoring features to get that “global view” of your cluster that Percona Cloud Tools doesn’t yet provide. Even though ClusterControl is a complete solution for cluster deployment and management, focusing on the automation of the whole process, the monitoring part alone is particularly interesting as it encompasses cluster-wide information in a customized way, including the “Galera bits”. You may want to give it a try as the monitoring features are available in the Community version of the product (and if you’re a Percona customer with a Support contract covering Percona XtraDB Cluster, then you’re entitled to get support for it from us).

One thing I should note that differentiate the monitoring solutions from above is that while you can install Cacti, Nagios and ClusterControl as servers in your own infrastructure both Percona Cloud Tools and VividCortex are hosted, cloud-based services. Having said that, neither one nor the other upload sensitive data to the cloud and both provide options for query obfuscation.

Summary

Contrary to what I believed, Percona Cloud Tools does provide support for “Galera bits” (the wsrep_ status variables), even though it has yet to implement support for the notion of group instances, which will allow for cluster-wide view for metrics and query reports. You can also rely on the Galera template for Cacti provided by Percona Monitoring Plugins for historical graphing and some clever use of Nagios’ pmp-check-mysql-status for customized cluster alerts. VividCortex and ClusterControl also provide monitoring for Galera.

Percona Cloud Tools, now in free beta, is a hosted service providing access to query performance insights for all MySQL uses. After a brief setup, unlock new information about your database and how to improve your applications. Sign up to request access to the beta today.  

The post Galera data on Percona Cloud Tools (and other MySQL monitoring tools) appeared first on MySQL Performance Blog.

Galera data on Percona Cloud Tools (and other MySQL monitoring tools)

Sat, 30/08/2014 - 07:58

I was talking with a Percona Support customer earlier this week who was looking for Galera data on Percona Cloud Tools. (Percona Cloud Tools, now in free beta, is a hosted service providing access to query performance insights for all MySQL uses.)

The customer mentioned they were already keeping track of some Galera stats on Cacti, and given they were inclined to use Percona Cloud Tools more and more, they wanted to know if it was already supporting Percona XtraDB Cluster. My answer was: “No, not yet: you can install agents in each node (the regular way in the first node, then manually on the other nodes… and when prompted say “No” to create MySQL user and provide the one you’re using already) and monitor them as autonomous MySQL servers – but the concept of cluster and specially the “Galera bits” has yet to be implemented there.

Except I was wrong.

By “concept of cluster” I mean supporting the notion of group instances, which should allow a single cluster-wide view for metrics and query reports, such as the slow queries (which are recorded locally on the node where the query was run and thus observed in a detached way). This still needs to be implemented indeed, but it’s on the roadmap.

The “Galera bits” I mentioned above are the various “wsrep_” status variables. In fact, those refer to the Write Set REPlication patches (based in the wsrep API), a set of hooks applied to the InnoDB/XtraDB storage engine and other components of MySQL that modifies the way replication works (to put it in a very simplified way), which in turn are used by the Galera library to provide a “generic Synchronous Multi-Master replication plugin for transactional applications.” A patched version of Percona Server together with the Galera libray compose the base of Percona XtraDB Cluster.

As I found out only now, Percona Cloud Tools does collect data from the various wsrep_ variables and it is available for use – it’s just not shown by default. A user only needs to add a dashboard/chart manually on PCT to see these metrics:

Click on the picture to enlarge it

Now, I need to call that customer …

Monitoring the cluster

Since I’m touching this topic I thought it would be useful to include some additional information on monitoring a Galera (Percona XtraDB Cluster in particular) cluster, even though most of what I mention below has already been published in different posts here on the MySQL Performance Blog. There’s a succint documentation page bearing the same title of this section that indicates the main wsrep_ variables you should monitor to check the health status of the cluster and how well it’s coping with load along the time (performance). Remember you can get a grasp of the full set of variables at any time by issuing the following command from one (or each one) of the nodes:

mysql> SHOW GLOBAL STATUS LIKE "wsrep_%";

And for a broader and real time view of the wsrep_ status variables you can use Jay Janssen’s myq_gadgets toolkit, which he modified a couple of years ago to account for Galera.

There’s also a specific Galera-template available in our Percona Monitoring Plugins (PMP) package that you can use in your Cacti server. That would cover the “how well the cluster copes with load along the time,” providing historical graphing. And while there isn’t a Galera specific plugin for Nagios in PMP, Jay explains in another post how you can customize pmp-check-mysql-status to “check any status variable you like,” describing his approach to keep a cluster’s “health status” in check by setting alerts on specific stats, on a per node basis.

VividCortex recently added a set of templates for Galera in their product and you can also rely on Severalnines’ ClusterControl monitoring features to get that “global view” of your cluster that Percona Cloud Tools doesn’t yet provide. Even though ClusterControl is a complete solution for cluster deployment and management, focusing on the automation of the whole process, the monitoring part alone is particularly interesting as it encompasses cluster-wide information in a customized way, including the “Galera bits”. You may want to give it a try as the monitoring features are available in the Community version of the product (and if you’re a Percona customer with a Support contract covering Percona XtraDB Cluster, then you’re entitled to get support for it from us).

One thing I should note that differentiate the monitoring solutions from above is that while you can install Cacti, Nagios and ClusterControl as servers in your own infrastructure both Percona Cloud Tools and VividCortex are hosted, cloud-based services. Having said that, neither one nor the other upload sensitive data to the cloud and both provide options for query obfuscation.

Summary

Contrary to what I believed, Percona Cloud Tools does provide support for “Galera bits” (the wsrep_ status variables), even though it has yet to implement support for the notion of group instances, which will allow for cluster-wide view for metrics and query reports. You can also rely on the Galera template for Cacti provided by Percona Monitoring Plugins for historical graphing and some clever use of Nagios’ pmp-check-mysql-status for customized cluster alerts. VividCortex and ClusterControl also provide monitoring for Galera.

Percona Cloud Tools, now in free beta, is a hosted service providing access to query performance insights for all MySQL uses. After a brief setup, unlock new information about your database and how to improve your applications. Sign up to request access to the beta today.  

The post Galera data on Percona Cloud Tools (and other MySQL monitoring tools) appeared first on MySQL Performance Blog.

Percona Server 5.6.20-68.0 is now available

Sat, 30/08/2014 - 01:29

Percona is glad to announce the release of Percona Server 5.6.20-68.0 on August 29, 2014. Download the latest version from the Percona web site or from the Percona Software Repositories.

Based on MySQL 5.6.20, including all the bug fixes in it, Percona Server 5.6.20-68.0 is the current GA release in the Percona Server 5.6 series. All of Percona’s software is open-source and free. Complete details of this release can be found in the 5.6.20-68.0 milestone on Launchpad.

New Features:

  • Percona Server has implemented the MySQL 5.7 SHOW SLAVE STATUS NONBLOCKING syntax for Lock-Free SHOW SLAVE STATUS feature. The existing SHOW SLAVE STATUS NOLOCK is kept as a deprecated alias and will be removed in Percona Server 5.7. There were no functional changes for the feature.
  • Percona Server Audit Log Plugin now supports JSON and CSV formats. The format choice is controlled by audit_log_format variable.
  • Percona Server Audit Log Plugin now supports streaming the audit log to syslog.
  • TokuDB storage engine package has been updated to version 7.1.8.

Bugs Fixed:

  • Querying INNODB_CHANGED_PAGES table with a range condition START_LSN > x AND END_LSN < y would lead to a server crash if the range was empty with x greater than y. Bug fixed #1202252 (Jan Lindström and Sergei Petrunia).
  • SQL statements of other connections were missing in the output of SHOW ENGINE INNODB STATUS, in LATEST DETECTED DEADLOCK and TRANSACTIONS sections. This bug was introduced by Statement Timeout patch in Percona Server 5.6.13-61.0. Bug fixed #1328824.
  • Some of TokuDB distribution files were missing in the TokuDB binary tarball. Bug fixed #1338945.
  • With XtraDB changed page tracking feature enabled, queries from the INNODB_CHANGED_PAGES could read the bitmap data whose write was in still progress. This would cause the query to fail with an ER_CANT_FIND_SYSTEM_REC and a warning printed to the server error log. The workaround has been to add an appropriate END_LSN-limiting condition to the query. Bug fixed #1193332.
  • mysqld-debug was missing from Debian packages. This regression was introduced in Percona Server 5.6.16-64.0. Bug fixed #1290087.
  • Fixed a memory leak in Slow Query Log Rotation and Expiration. Bug fixed #1314138.
  • The audit log plugin would write log with XML syntax errors when OLD and NEW formats were used. Bug fixed #1320879.
  • Combination of Log Archiving for XtraDB, XtraDB changed page tracking, and small InnoDB logs could hang the server on the bootstrap shutdown. Bug fixed #1326379.
  • --tc-heuristic-recover option values were broken. Bug fixed #1334330 (upstream #70860).
  • If the bitmap directory has a bitmap file sequence with a start LSN of one file less than a start LSN of the previous file, a debug build would assert when queries were run on INNODB_CHANGED_PAGES table. Bug fixed #1342494.

Other bugs fixed: #1337247, #1350386, #1208371, #1261341, #1151723, #1182050, #1182068, #1182072, #1184287, #1280875, #1338937, #1334743, #1349394, #1182046, #1182049, and #1328482 (upstream #73418).

Release notes for Percona Server 5.6.20-68.0 are available in the online documentation. Please report any bugs on the launchpad bug tracker.

The post Percona Server 5.6.20-68.0 is now available appeared first on MySQL Performance Blog.

Percona Server 5.6.20-68.0 is now available

Sat, 30/08/2014 - 01:29

Percona is glad to announce the release of Percona Server 5.6.20-68.0 on August 29, 2014. Download the latest version from the Percona web site or from the Percona Software Repositories.

Based on MySQL 5.6.20, including all the bug fixes in it, Percona Server 5.6.20-68.0 is the current GA release in the Percona Server 5.6 series. All of Percona’s software is open-source and free. Complete details of this release can be found in the 5.6.20-68.0 milestone on Launchpad.

New Features:

  • Percona Server has implemented the MySQL 5.7 SHOW SLAVE STATUS NONBLOCKING syntax for Lock-Free SHOW SLAVE STATUS feature. The existing SHOW SLAVE STATUS NOLOCK is kept as a deprecated alias and will be removed in Percona Server 5.7. There were no functional changes for the feature.
  • Percona Server Audit Log Plugin now supports JSON and CSV formats. The format choice is controlled by audit_log_format variable.
  • Percona Server Audit Log Plugin now supports streaming the audit log to syslog.
  • TokuDB storage engine package has been updated to version 7.1.8.

Bugs Fixed:

  • Querying INNODB_CHANGED_PAGES table with a range condition START_LSN > x AND END_LSN < y would lead to a server crash if the range was empty with x greater than y. Bug fixed #1202252 (Jan Lindström and Sergei Petrunia).
  • SQL statements of other connections were missing in the output of SHOW ENGINE INNODB STATUS, in LATEST DETECTED DEADLOCK and TRANSACTIONS sections. This bug was introduced by Statement Timeout patch in Percona Server 5.6.13-61.0. Bug fixed #1328824.
  • Some of TokuDB distribution files were missing in the TokuDB binary tarball. Bug fixed #1338945.
  • With XtraDB changed page tracking feature enabled, queries from the INNODB_CHANGED_PAGES could read the bitmap data whose write was in still progress. This would cause the query to fail with an ER_CANT_FIND_SYSTEM_REC and a warning printed to the server error log. The workaround has been to add an appropriate END_LSN-limiting condition to the query. Bug fixed #1193332.
  • mysqld-debug was missing from Debian packages. This regression was introduced in Percona Server 5.6.16-64.0. Bug fixed #1290087.
  • Fixed a memory leak in Slow Query Log Rotation and Expiration. Bug fixed #1314138.
  • The audit log plugin would write log with XML syntax errors when OLD and NEW formats were used. Bug fixed #1320879.
  • Combination of Log Archiving for XtraDB, XtraDB changed page tracking, and small InnoDB logs could hang the server on the bootstrap shutdown. Bug fixed #1326379.
  • --tc-heuristic-recover option values were broken. Bug fixed #1334330 (upstream #70860).
  • If the bitmap directory has a bitmap file sequence with a start LSN of one file less than a start LSN of the previous file, a debug build would assert when queries were run on INNODB_CHANGED_PAGES table. Bug fixed #1342494.

Other bugs fixed: #1337247, #1350386, #1208371, #1261341, #1151723, #1182050, #1182068, #1182072, #1184287, #1280875, #1338937, #1334743, #1349394, #1182046, #1182049, and #1328482 (upstream #73418).

Release notes for Percona Server 5.6.20-68.0 are available in the online documentation. Please report any bugs on the launchpad bug tracker.

The post Percona Server 5.6.20-68.0 is now available appeared first on MySQL Performance Blog.

Percona Server 5.5.39-36.0 is now available

Sat, 30/08/2014 - 00:48

Percona is glad to announce the release of Percona Server 5.5.39-36.0 on August 29, 2014 (Downloads are available here and from the Percona Software Repositories). Based on MySQL 5.5.39, including all the bug fixes in it, Percona Server 5.5.39-36.0 is now the current stable release in the 5.5 series. All of Percona‘s software is open-source and free, all the details of the release can be found in the 5.5.39-36.0 milestone at Launchpad.

New Features:

Bugs Fixed:

  • Querying INNODB_CHANGED_PAGES with a range condition START_LSN > x AND END_LSN < y would lead to a server crash if the range was empty with x greater than y. Bug fixed #1202252 (Jan Lindström and Sergei Petrunia).
  • With XtraDB changed page tracking feature enabled, queries from the INNODB_CHANGED_PAGES could read the bitmap data whose write was in still progress. This would cause the query to fail with an ER_CANT_FIND_SYSTEM_REC and a warning printed to the server error log. The workaround has been to add an appropriate END_LSN-limiting condition to the query. Bug fixed #1346122.
  • mysqld-debug was missing from Debian packages. This regression was introduced in Percona Server 5.5.36-34.0. Bug fixed #1290087.
  • Fixed a memory leak in Slow Query Log Rotation and Expiration. Bug fixed #1314138.
  • The audit log plugin would write log with XML syntax errors when OLD and NEW formats were used. Bug fixed #1320879.
  • A server built with system OpenSSL support, such as the distributed Percona Server binaries, had SSL-related memory leaks. Bug fixed #1334743 (upstream #73126).
  • If the bitmap directory has a bitmap file sequence with a start LSN of one file less than a start LSN of the previous file, a debug build would assert when queries were run on INNODB_CHANGED_PAGES table. Bug fixed #1342494.

Other bugs fixed: #1337324, #1151723, #1182050, #1182072, #1280875, #1182046, #1328482 (upstream #73418), and #1334317 (upstream #73111).

Release notes for Percona Server 5.5.39-36.0 are available in our online documentation. Bugs can be reported on the launchpad bug tracker.

The post Percona Server 5.5.39-36.0 is now available appeared first on MySQL Performance Blog.

Percona Server 5.5.39-36.0 is now available

Sat, 30/08/2014 - 00:48

Percona is glad to announce the release of Percona Server 5.5.39-36.0 on August 29, 2014 (Downloads are available here and from the Percona Software Repositories). Based on MySQL 5.5.39, including all the bug fixes in it, Percona Server 5.5.39-36.0 is now the current stable release in the 5.5 series. All of Percona‘s software is open-source and free, all the details of the release can be found in the 5.5.39-36.0 milestone at Launchpad.

New Features:

Bugs Fixed:

  • Querying INNODB_CHANGED_PAGES with a range condition START_LSN > x AND END_LSN < y would lead to a server crash if the range was empty with x greater than y. Bug fixed #1202252 (Jan Lindström and Sergei Petrunia).
  • With XtraDB changed page tracking feature enabled, queries from the INNODB_CHANGED_PAGES could read the bitmap data whose write was in still progress. This would cause the query to fail with an ER_CANT_FIND_SYSTEM_REC and a warning printed to the server error log. The workaround has been to add an appropriate END_LSN-limiting condition to the query. Bug fixed #1346122.
  • mysqld-debug was missing from Debian packages. This regression was introduced in Percona Server 5.5.36-34.0. Bug fixed #1290087.
  • Fixed a memory leak in Slow Query Log Rotation and Expiration. Bug fixed #1314138.
  • The audit log plugin would write log with XML syntax errors when OLD and NEW formats were used. Bug fixed #1320879.
  • A server built with system OpenSSL support, such as the distributed Percona Server binaries, had SSL-related memory leaks. Bug fixed #1334743 (upstream #73126).
  • If the bitmap directory has a bitmap file sequence with a start LSN of one file less than a start LSN of the previous file, a debug build would assert when queries were run on INNODB_CHANGED_PAGES table. Bug fixed #1342494.

Other bugs fixed: #1337324, #1151723, #1182050, #1182072, #1280875, #1182046, #1328482 (upstream #73418), and #1334317 (upstream #73111).

Release notes for Percona Server 5.5.39-36.0 are available in our online documentation. Bugs can be reported on the launchpad bug tracker.

The post Percona Server 5.5.39-36.0 is now available appeared first on MySQL Performance Blog.

OpenStack Trove Day 2014 Recap: MySQL and DBaaS

Thu, 28/08/2014 - 18:00

OpenStack Trove Day

I just returned from a week in Cambridge, Massachusetts where I was attending the OpenStack Trove Day and the Trove mid-cycle meetup, both sponsored by the great folks at Tesora.

I am relatively new to the OpenStack and Trove arenas so this was a fantastic opportunity for me to learn more about the communities, the various components within OpenStack, and what part Trove plays. I found the entire event very worthwhile – I met a lot of key people in the community, learned more about Trove and its potential, and in general felt a great energy and excitement surrounding Trove and OpenStack as a whole.

There were more than 120 attendees at Trove Day. That is almost four times the initial estimate! I think I would call that a success. There were 7 very high quality topics that covered material ranging from new and coming features within Trove, to deep inspection of how it is currently used in several big name companies to an investor’s perspective of the OpenStack market. There were also 2 panel style discussions that covered a lot of ground with all participants being ‘guys on the ground’ actively working with OpenStack deployments including one of my fellow Perconians, Mr. Tim Sharp.

One of the main takeaways for me from the entire day was the forward looking adoption estimates for Trove. This came up over and over through the various talks and panels. There seems to be a tremendous amount of interest in Trove deployments for late 2014/2015 but very few actual live users today. There also seems to be a bit of a messaging issue and confusion amongst potential users as to what Trove really is and is not. Simply reading the Trove Mission Statement should quickly clarify:

The OpenStack Open Source Database as a Service Mission: To provide scalable and reliable Cloud Database as a Service provisioning functionality for both relational and non-relational database engines, and to continue to improve its fully-featured and extensible open source framework.

So allow me to expand on that a bit based on some specific comments or questions that I overheard:
- Trove is NOT a database abstraction layer nor any sort of database unification tool; all applications still communicate with their respective datastores directly through their native APIs.
- Trove is NOT a database monitoring, management or analysis tool; all of your favorite debugging and monitoring tools like Percona Toolkit will still work exactly as advertised, and yes, you do need a monitoring tool.
- Although Trove does have some useful backup scheduling options, Trove is NOT a complete backup and recovery tool that can accommodate every backup strategy; you may still use 3rd party options such as scripting your own around Percona XtraBackup or make your life a lot easier and sign up for the Percona Backup Service.
- Trove IS a very nice way to add resource provisioning for many disparate datastores and has some ‘smarts’ built in for each. This ensures a common user experience when provisioning and managing datastore instances.

To that final point, our friends at Tesora introduced their new Database Certification Program at Trove Day. This new program will ensure a high level of compatibility between the various participating database vendors and the Trove project. Of course, Percona Server has already been certified.

I see the future of Trove as being very bright with a huge potential for expansion into other areas, once it is stabilized. I am very excited to begin contributing to this project and watch it grow.

Until next time…

The post OpenStack Trove Day 2014 Recap: MySQL and DBaaS appeared first on MySQL Performance Blog.

OpenStack Trove Day 2014 Recap: MySQL and DBaaS

Thu, 28/08/2014 - 18:00

OpenStack Trove Day

I just returned from a week in Cambridge, Massachusetts where I was attending the OpenStack Trove Day and the Trove mid-cycle meetup, both sponsored by the great folks at Tesora.

I am relatively new to the OpenStack and Trove arenas so this was a fantastic opportunity for me to learn more about the communities, the various components within OpenStack, and what part Trove plays. I found the entire event very worthwhile – I met a lot of key people in the community, learned more about Trove and its potential, and in general felt a great energy and excitement surrounding Trove and OpenStack as a whole.

There were more than 120 attendees at Trove Day. That is almost four times the initial estimate! I think I would call that a success. There were 7 very high quality topics that covered material ranging from new and coming features within Trove, to deep inspection of how it is currently used in several big name companies to an investor’s perspective of the OpenStack market. There were also 2 panel style discussions that covered a lot of ground with all participants being ‘guys on the ground’ actively working with OpenStack deployments including one of my fellow Perconians, Mr. Tim Sharp.

One of the main takeaways for me from the entire day was the forward looking adoption estimates for Trove. This came up over and over through the various talks and panels. There seems to be a tremendous amount of interest in Trove deployments for late 2014/2015 but very few actual live users today. There also seems to be a bit of a messaging issue and confusion amongst potential users as to what Trove really is and is not. Simply reading the Trove Mission Statement should quickly clarify:

The OpenStack Open Source Database as a Service Mission: To provide scalable and reliable Cloud Database as a Service provisioning functionality for both relational and non-relational database engines, and to continue to improve its fully-featured and extensible open source framework.

So allow me to expand on that a bit based on some specific comments or questions that I overheard:
- Trove is NOT a database abstraction layer nor any sort of database unification tool; all applications still communicate with their respective datastores directly through their native APIs.
- Trove is NOT a database monitoring, management or analysis tool; all of your favorite debugging and monitoring tools like Percona Toolkit will still work exactly as advertised, and yes, you do need a monitoring tool.
- Although Trove does have some useful backup scheduling options, Trove is NOT a complete backup and recovery tool that can accommodate every backup strategy; you may still use 3rd party options such as scripting your own around Percona XtraBackup or make your life a lot easier and sign up for the Percona Backup Service.
- Trove IS a very nice way to add resource provisioning for many disparate datastores and has some ‘smarts’ built in for each. This ensures a common user experience when provisioning and managing datastore instances.

To that final point, our friends at Tesora introduced their new Database Certification Program at Trove Day. This new program will ensure a high level of compatibility between the various participating database vendors and the Trove project. Of course, Percona Server has already been certified.

I see the future of Trove as being very bright with a huge potential for expansion into other areas, once it is stabilized. I am very excited to begin contributing to this project and watch it grow.

Until next time…

The post OpenStack Trove Day 2014 Recap: MySQL and DBaaS appeared first on MySQL Performance Blog.

Trawling the binlog with FlexCDC and new FlexCDC plugins for MySQL

Thu, 28/08/2014 - 01:15

Swanhart-Tools includes FlexCDC, a change data capture tool for MySQL. FlexCDC follows a server’s binary log and usually writes “changelogs” that track the changes to tables in the database. I say usually because the latest version of Swanhart-Tools (only in github for now) supports FlexCDC plugins, which allow you to send the updates to a remote data source, or to any other place of your liking.  You can find out more about FlexCDC basics in a previous blog post.

Please note that FlexCDC still needs to have source and destination instances defined in the configuration, even when using plugins.  This is because the FlexCDC state (how much into which binary log has FlexCDC progressed, and what tables are being captured) is stored in the “dest”.  Normally when using a plugin, the source and destination instances will be the same. FlexCDC will create a ‘flexviews’ database with a number of state tables in the destination instance.  This also means you still have to use the create_mvlog.php add_table.php or Flexview’s create_mvlog(…) procedure to mark which tables to capture!  See the previous blog post about FlexCDC.

When you create the mvlog, there will still be a changelog table created in the dest, just like when not using a plugin. This is because the INFORMATION_SCHEMA is used to get column datatypes and additional information (such as if an int is signed or unsigned) and this lookup is done against the table in the dest. The reason this is needed, is because mysqlbinlog, the utility used to scrape the binlog, produces strange output for large signed integers (it provides the signed and unsigned version), thus FlexCDC must figure out the right one to choose from the actual DDL of the changelog table. FlexCDC can’t look at the DDL of the source table though, because the consumer may be behind, and the current structure may not match the structure of the rows in the log.

The new plugin system allows you to do a lot of nifty things like:

  • Replicate to external databases
  • Publish changes to a message queue (this is like Facebook’s Wormhole)
  • Keep a remote cache server in sync
  • and more…

The latest version of Swanhart-Tools includes an Example plugin (in flexviews/consumer/include/example_plugin.php) that simply prints the events that come through it, not logging them into the changelog table at all. There is an example of the output at the end of the post.

The example plugin looks like this:

<?php class FlexCDC_Plugin { static function begin_trx($uow_id, $gsn) { echo "START TRANSACTION: trx_id: $uow_id, Prev GSN: $gsn"; } static function commit_trx($uow_id, $gsn) { echo "COMMIT: trx_id: $uow_id, Last GSN: $gsn"; } static function rollback_trx($uow_id) { echo "ROLLBACK: trx_id: $uow_id"; } static function insert($row, $db, $table, $trx_id, $gsn) { echo "TRX_ID: $trx_id, Schema:$db, Table: $table, DML: INSERT, AT: $gsn"; print_r($row); } static function delete($row, $db, $table, $trx_id, $gsn) { echo "TRX_ID: $trx_id, Schema:$db, Table: $table, DML: DELETE, AT: $gsn"; print_r($row); } static function update_before($row, $db, $table, $trx_id, $gsn) { echo "TRX_ID: $trx_id, Schema:$db, Table: $table, DML: UPDATE (OLD), AT: $gsn"; print_r($row); } static function update_after($row, $db, $table, $trx_id, $gsn) { echo "TRX_ID: $trx_id, Schema:$db, Table: $table, DML: UPDATE (NEW), AT: $gsn"; print_r($row); } }

Important Note: You must define all seven of these functions in your plugin, even if you do not plan to have actions for each of the callbacks – just leave the function body empty to do no action (the call is simply a noop that case.) Note that the plugin functions must be declared STATIC.  This is due to the way that FlexCDC calls the functions.

Transaction state callbacks
There are three callback functions which notify the plugin of changes in transaction state. Before I go into what they do, I want to note the $trx_id and $gsn parameters which are present in every callback. Each transaction is assigned a monotonically increasing transaction identifier. This sequence uniquely identifies each transaction that FlexCDC processes. In addition, each row change is assigned a unique sequence number which FlexCDC calls the Generic Sequence Number (or GSN).

As you can see, the start_trx(…) callback (called when a transaction starts) is passed both the new transaction number and also the highest GSN used in the previous transaction. This is called the GSN high water mark (GSNHWM). At transaction commit, the commit_trx(…) callback is called and the transaction id and the last GSN assigned in the transaction are passed into the callback. This same value will appear as the GSNHWM in the next start_trx(…) callback. Finally, at rollback any sequence numbers assigned in that transaction will be re-used, so no GSN is passed to the rollback callback, but a transaction id is, which lets you determine exactly which transaction is rolling back.

Row change callbacks

Each of the four row change callback functions capture a particular change to the data set. Each of the functions take five parameters. The first ($row) is an array which contains the row being acted upon. The second ($db) is the schema which contains the row. The third ($table) is the table that contains the row. Each callback also receives the transaction identifier, and of course, each row change is assigned a unique GSN.

For example:
An update will fire both update_before(…) and update_after(…) callbacks with the row images before and after the change, respectively. There is an example of this at the end of the post.

Configuring FlexCDC to use a plugin
FlexCDC uses a configuration file called consumer.ini by default.  To the [flexcdc] section add:
plugin=plugin_file.php

The plugin must be in the FlexCDC include/ directory.  You will find example_plugin.php in this directory, to serve as an example.

How it works
Flexviews uses mysqlbinlog to decode the binary log from the source server. It uses the –decode-rows=ROWS option to decode RBR into a format which can be parsed by an external utility. FlexCDC collects information about each transaction and the row changes that happen in the database (which means it requires ROW based binary logging to be used.)  When a plugin is defined the normal actions used by FlexCDC are overridden with the callback functions.

Here is the output from the example plugin, for an update that affected 3 rows (update test.t3 set c1 = c1 – 1):

START TRANSACTION: trx_id: 44, Prev GSN: 107 TRX_ID: 44, Schema:test, Table: t3, DML: UPDATE (OLD), AT: 108 Array ( [c1] => -3 [c2] => 1 ) TRX_ID: 44, Schema:test, Table: t3, DML: UPDATE (NEW), AT: 109 Array ( [c1] => -4 [c2] => 1 ) TRX_ID: 44, Schema:test, Table: t3, DML: UPDATE (OLD), AT: 110 Array ( [c1] => -5 [c2] => 2 ) TRX_ID: 44, Schema:test, Table: t3, DML: UPDATE (NEW), AT: 111 Array ( [c1] => -6 [c2] => 2 ) TRX_ID: 44, Schema:test, Table: t3, DML: UPDATE (OLD), AT: 112 Array ( [c1] => -5 [c2] => 2 ) TRX_ID: 44, Schema:test, Table: t3, DML: UPDATE (NEW), AT: 113 Array ( [c1] => -6 [c2] => 2 ) COMMIT: trx_id: 44, Last GSN: 113

One thing you should notice, is that FlexCDC provides column names for the data coming from the binary log. This is because the log table exists in the dest instance and FlexCDC can get the list of column names from there. When you use other CDC tools, like the C binlog API, you don’t get column names.

The post Trawling the binlog with FlexCDC and new FlexCDC plugins for MySQL appeared first on MySQL Performance Blog.

Trawling the binlog with FlexCDC and new FlexCDC plugins for MySQL

Thu, 28/08/2014 - 01:15

Swanhart-Tools includes FlexCDC, a change data capture tool for MySQL. FlexCDC follows a server’s binary log and usually writes “changelogs” that track the changes to tables in the database. I say usually because the latest version of Swanhart-Tools (only in github for now) supports FlexCDC plugins, which allow you to send the updates to a remote data source, or to any other place of your liking.  You can find out more about FlexCDC basics in a previous blog post.

Please note that FlexCDC still needs to have source and destination instances defined in the configuration, even when using plugins.  This is because the FlexCDC state (how much into which binary log has FlexCDC progressed, and what tables are being captured) is stored in the “dest”.  Normally when using a plugin, the source and destination instances will be the same. FlexCDC will create a ‘flexviews’ database with a number of state tables in the destination instance.  This also means you still have to use the create_mvlog.php add_table.php or Flexview’s create_mvlog(…) procedure to mark which tables to capture!  See the previous blog post about FlexCDC.

When you create the mvlog, there will still be a changelog table created in the dest, just like when not using a plugin. This is because the INFORMATION_SCHEMA is used to get column datatypes and additional information (such as if an int is signed or unsigned) and this lookup is done against the table in the dest. The reason this is needed, is because mysqlbinlog, the utility used to scrape the binlog, produces strange output for large signed integers (it provides the signed and unsigned version), thus FlexCDC must figure out the right one to choose from the actual DDL of the changelog table. FlexCDC can’t look at the DDL of the source table though, because the consumer may be behind, and the current structure may not match the structure of the rows in the log.

The new plugin system allows you to do a lot of nifty things like:

  • Replicate to external databases
  • Publish changes to a message queue (this is like Facebook’s Wormhole)
  • Keep a remote cache server in sync
  • and more…

The latest version of Swanhart-Tools includes an Example plugin (in flexviews/consumer/include/example_plugin.php) that simply prints the events that come through it, not logging them into the changelog table at all. There is an example of the output at the end of the post.

The example plugin looks like this:

<?php class FlexCDC_Plugin { static function begin_trx($uow_id, $gsn) { echo "START TRANSACTION: trx_id: $uow_id, Prev GSN: $gsn"; } static function commit_trx($uow_id, $gsn) { echo "COMMIT: trx_id: $uow_id, Last GSN: $gsn"; } static function rollback_trx($uow_id) { echo "ROLLBACK: trx_id: $uow_id"; } static function insert($row, $db, $table, $trx_id, $gsn) { echo "TRX_ID: $trx_id, Schema:$db, Table: $table, DML: INSERT, AT: $gsn"; print_r($row); } static function delete($row, $db, $table, $trx_id, $gsn) { echo "TRX_ID: $trx_id, Schema:$db, Table: $table, DML: DELETE, AT: $gsn"; print_r($row); } static function update_before($row, $db, $table, $trx_id, $gsn) { echo "TRX_ID: $trx_id, Schema:$db, Table: $table, DML: UPDATE (OLD), AT: $gsn"; print_r($row); } static function update_after($row, $db, $table, $trx_id, $gsn) { echo "TRX_ID: $trx_id, Schema:$db, Table: $table, DML: UPDATE (NEW), AT: $gsn"; print_r($row); } }

Important Note: You must define all seven of these functions in your plugin, even if you do not plan to have actions for each of the callbacks – just leave the function body empty to do no action (the call is simply a noop that case.) Note that the plugin functions must be declared STATIC.  This is due to the way that FlexCDC calls the functions.

Transaction state callbacks
There are three callback functions which notify the plugin of changes in transaction state. Before I go into what they do, I want to note the $trx_id and $gsn parameters which are present in every callback. Each transaction is assigned a monotonically increasing transaction identifier. This sequence uniquely identifies each transaction that FlexCDC processes. In addition, each row change is assigned a unique sequence number which FlexCDC calls the Generic Sequence Number (or GSN).

As you can see, the start_trx(…) callback (called when a transaction starts) is passed both the new transaction number and also the highest GSN used in the previous transaction. This is called the GSN high water mark (GSNHWM). At transaction commit, the commit_trx(…) callback is called and the transaction id and the last GSN assigned in the transaction are passed into the callback. This same value will appear as the GSNHWM in the next start_trx(…) callback. Finally, at rollback any sequence numbers assigned in that transaction will be re-used, so no GSN is passed to the rollback callback, but a transaction id is, which lets you determine exactly which transaction is rolling back.

Row change callbacks

Each of the four row change callback functions capture a particular change to the data set. Each of the functions take five parameters. The first ($row) is an array which contains the row being acted upon. The second ($db) is the schema which contains the row. The third ($table) is the table that contains the row. Each callback also receives the transaction identifier, and of course, each row change is assigned a unique GSN.

For example:
An update will fire both update_before(…) and update_after(…) callbacks with the row images before and after the change, respectively. There is an example of this at the end of the post.

Configuring FlexCDC to use a plugin
FlexCDC uses a configuration file called consumer.ini by default.  To the [flexcdc] section add:
plugin=plugin_file.php

The plugin must be in the FlexCDC include/ directory.  You will find example_plugin.php in this directory, to serve as an example.

How it works
Flexviews uses mysqlbinlog to decode the binary log from the source server. It uses the –decode-rows=ROWS option to decode RBR into a format which can be parsed by an external utility. FlexCDC collects information about each transaction and the row changes that happen in the database (which means it requires ROW based binary logging to be used.)  When a plugin is defined the normal actions used by FlexCDC are overridden with the callback functions.

Here is the output from the example plugin, for an update that affected 3 rows (update test.t3 set c1 = c1 – 1):

START TRANSACTION: trx_id: 44, Prev GSN: 107 TRX_ID: 44, Schema:test, Table: t3, DML: UPDATE (OLD), AT: 108 Array ( [c1] => -3 [c2] => 1 ) TRX_ID: 44, Schema:test, Table: t3, DML: UPDATE (NEW), AT: 109 Array ( [c1] => -4 [c2] => 1 ) TRX_ID: 44, Schema:test, Table: t3, DML: UPDATE (OLD), AT: 110 Array ( [c1] => -5 [c2] => 2 ) TRX_ID: 44, Schema:test, Table: t3, DML: UPDATE (NEW), AT: 111 Array ( [c1] => -6 [c2] => 2 ) TRX_ID: 44, Schema:test, Table: t3, DML: UPDATE (OLD), AT: 112 Array ( [c1] => -5 [c2] => 2 ) TRX_ID: 44, Schema:test, Table: t3, DML: UPDATE (NEW), AT: 113 Array ( [c1] => -6 [c2] => 2 ) COMMIT: trx_id: 44, Last GSN: 113

One thing you should notice, is that FlexCDC provides column names for the data coming from the binary log. This is because the log table exists in the dest instance and FlexCDC can get the list of column names from there. When you use other CDC tools, like the C binlog API, you don’t get column names.

The post Trawling the binlog with FlexCDC and new FlexCDC plugins for MySQL appeared first on MySQL Performance Blog.

mysqld_multi: How to run multiple instances of MySQL

Wed, 27/08/2014 - 00:42

The need to have multiple instances of MySQL (the well-known mysqld process) running in the same server concurrently in a transparent way, instead of having them executed in separate containers/virtual machines, is not very common. Yet from time to time the Percona Support team receives a request from a customer to assist in the configuration of such an environment. MySQL provides a tool to facilitate the execution of multiple instances called mysqld_multi:

“mysqld_multi is designed to manage several mysqld processes that listen for connections on different Unix socket files and TCP/IP ports. It can start or stop servers, or report their current status.”

For tests and development purposes, MySQL Sandbox might be more practical and I personally prefer to use it for my own tests. Both tools work around launching and managing multiple mysqld processes but Sandbox has, as the name suggests, a “sandbox” approach, making it easy to both create and dispose a new instance (including all data inside it). It is more usual to see mysqld_multi being used in production servers: It’s provided with the server package and uses the same single configuration file that people are used to look for when setting up MySQL. So, how does it work? How do we configure and manage the instances? And as importantly, how do we backup all the instances we create?

Understanding the concept of groups in my.cnf

You may have noticed already that MySQL’s main configuration file (or “option file“), my.cnf, is arranged under what is called group structures: Sections defining configuration options specific to a given program or purpose. Usually, the program itself gives name to the group, which appears enclosed by brackets. Here’s a basic my.cnf showing three such groups:

[client] port = 3306 socket = /var/run/mysqld/mysqld.sock user = john password = p455w0rd [mysqld] user = mysql pid-file = /var/run/mysqld/mysqld.pid socket = /var/run/mysqld/mysqld.sock port = 3306 datadir = /var/lib/mysql [xtrabackup] target_dir = /backups/mysql/

The options defined in the group [client] above are used by the mysql command-line tool. As such, if you don’t specify any other option when executing mysql it will attempt to connect to the local MySQL server through the socket in /var/run/mysqld/mysqld.sock and using the credentials stated in that group. Similarly, mysqld will look for the options defined under its section at startup, and the same happens with Percona XtraBackup when you run a backup with that tool. However, the operating parameters defined by the above groups may also be stated as command-line options during the execution of the program, in which case they they replace the ones defined in my.cnf.

Getting started with multiple instances

To have multiple instances of MySQL running we must replace the [mysqld] group in the my.cnf configuration file by as many [mysqlN] groups as we want instances running, with “N” being a positive integer, also called option group number. This number is used by mysqld_multi to identify each instance, so it must be unique across the server. Apart from the distinct group name, the same options that are valid for [mysqld] applies on [mysqldN] groups, the difference being that while stating them is optional for [mysqld] (it’s possible to start MySQL with an empty my.cnf as default values are used if not explicitly provided) some of them (like socket, port, pid-file, and datadir) are mandatory when defining multiple instances – so they don’t step on each other’s feet. Here’s a simple modified my.cnf showing the original [mysqld] group plus two other instances:

[mysqld] user = mysql pid-file = /var/run/mysqld/mysqld.pid socket = /var/run/mysqld/mysqld.sock port = 3306 datadir = /var/lib/mysql [mysqld1] user = mysql pid-file = /var/run/mysqld/mysqld1.pid socket = /var/run/mysqld/mysqld1.sock port = 3307 datadir = /data/mysql/mysql1 [mysqld7] user = mysql pid-file = /var/run/mysqld/mysqld7.pid socket = /var/run/mysqld/mysqld7.sock port = 3308 datadir = /data/mysql/mysql7

Besides using different pid files, ports and sockets for the new instances I’ve also defined a different datadir for each – it’s very important that the instances do not share the same datadir. Chances are you’re importing the data from a backup but if that’s not the case you can simply use mysql_install_db to create each additional datadir (but make sure the parent directory exists and that the mysql user has write access on it):

mysql_install_db --user=mysql --datadir=/data/mysql/mysql7

Note that if /data/mysql/mysql7 doesn’t exist and you start this instance anyway then myqld_multi will call mysqld_install_db itself to have the datadir created and the system tables installed inside it. Alternatively from restoring a backup or having a new datadir created you can make a physical copy of the existing one from the main instance – just make sure to stop it first with a clean shutdown, so any pending changes are flushed to disk first.

Now, you may have noted I wrote above that you need to replace your original MySQL instance group ([mysqld]) by one with an option group number ([mysqlN]). That’s not entirely true, as they can co-exist in harmony. However, the usual start/stop script used to manage MySQL won’t work with the additional instances, nor mysqld_multi really manages [mysqld]. The simple solution here is to have the group [mysqld] renamed with a suffix integer, say [mysqld0] (you don’t need to make any changes to it’s current options though), and let mysqld_multi manage all instances.

Two commands you might find useful when configuring multiple instances are:

$ mysqld_multi --example

…which provides an example of a my.cnf file configured with multiple instances and showing the use of different options, and:

$ my_print_defaults --defaults-file=/etc/my.cnf mysqld7

…which shows how a given group (“mysqld7″ in the example above) was defined within my.cnf.

Managing multiple instances

mysqld_multi allows you to start, stop, reload (which is effectively a restart) and report the current status of a given instance, all instances or a subset of them. The most important observation here is that the “stop” action is managed through mysqladmin – and internally that happens on an individual basis, with one “mysqladmin … stop” call per instance, even if you have mysqld_multi stop all of them. For this to work properly you need to setup a MySQL account with the SHUTDOWN privilege and defined with the same user name and password in all instances. Yes, it will work out of the box if you run mysqld_multi as root in a freshly installed server where the root user can access MySQL passwordless in all instances. But as the manual suggests, it’s better to have an specific account created for this purpose:

mysql> GRANT SHUTDOWN ON *.* TO 'multi_admin'@'localhost' IDENTIFIED BY 'multipass'; mysql> FLUSH PRIVILEGES;

If you plan on replicating the datadir of the main server across your other instances you can have that account created before you make copies of it, otherwise you just need to connect to each instance and create a similar account (remember, the privileged account is only needed by mysqld_multi to stop the instances, not to start them). There’s a special group that can be used on my.cnf to define options for mysqld_multi, which should be used to store these credentials. You might also indicate in there the path for the mysqladmin and mysqld (or mysqld_safe) binaries to use, though you might have a specific mysqld binary defined for each instance inside it’s respective group. Here’s one example:

[mysqld_multi] mysqld = /usr/bin/mysqld_safe mysqladmin = /usr/bin/mysqladmin user = multi_admin password = multipass

You can use mysqld_multi to start, stop, restart or report the status of a particular instance, all instances or a subset of them. Here’s a few examples that speak for themselves:

$ mysqld_multi report Reporting MySQL (Percona Server) servers MySQL (Percona Server) from group: mysqld0 is not running MySQL (Percona Server) from group: mysqld1 is not running MySQL (Percona Server) from group: mysqld7 is not running $ mysqld_multi start $ mysqld_multi report Reporting MySQL (Percona Server) servers MySQL (Percona Server) from group: mysqld0 is running MySQL (Percona Server) from group: mysqld1 is running MySQL (Percona Server) from group: mysqld7 is running $ mysqld_multi stop 7,0 $ mysqld_multi report 7 Reporting MySQL (Percona Server) servers MySQL (Percona Server) from group: mysqld7 is not running $ mysqld_multi report Reporting MySQL (Percona Server) servers MySQL (Percona Server) from group: mysqld0 is not running MySQL (Percona Server) from group: mysqld1 is running MySQL (Percona Server) from group: mysqld7 is not running

Managing the MySQL daemon

What is missing here is an init script to automate the start/stop of all instances upon server initialization/shutdown; now that we use mysqld_multi to control the instances, the usual /etc/init.d/mysql won’t work anymore. But a similar startup script (though much simpler and less robust) relying on mysqld_multi is provided alongside MySQL/Percona Server, which can be found in /usr/share/<mysql|percona-server>/mysqld_multi.server. You can simply copy it over as /etc/init.d/mysql, effectively replacing the original script while maintaining it’s name. Please note: You may need to edit it first and modify the first two lines defining “basedir” and “bindir” as this script was not designed to find out the good working values for these variables itself, which the original single-instance /etc/init.d/mysql does. Considering you probably have mysqld_multi installed in /usr/bin, setting these variables as follows is enough:

basedir=/usr bindir=/usr/bin

Configuring an instance with a different version of MySQL

If you’re planning to have multiple instances of MySQL running concurrently chances are you want to use a mix of different versions for each of them, such as during a development cycle to test an application compatibility. This is a common use for mysqld_multi, and simple enough to achieve. To showcase its use I downloaded the latest version of MySQL 5.6 available and extracted the TAR file in /opt:

$ tar -zxvf mysql-5.6.20-linux-glibc2.5-x86_64.tar.gz -C /opt

Then I made a cold copy of the datadir from one of the existing instances to /data/mysql/mysqld574:

$ mysqld_multi stop 0 $ cp -r /data/mysql/mysql1 /data/mysql/mysql5620 $ chown mysql:mysql -R /data/mysql/mysql5620

and added a new group to my.cnf as follows:

[mysqld5620] user = mysql pid-file = /var/run/mysqld/mysqld5620.pid socket = /var/run/mysqld/mysqld5620.sock port = 3309 datadir = /data/mysql/mysql5620 basedir = /opt/mysql-5.6.20-linux-glibc2.5-x86_64 mysqld = /opt/mysql-5.6.20-linux-glibc2.5-x86_64/bin/mysqld_safe

Note the use of basedir, pointing to the path were the binaries for MySQL 5.6.20 were extracted, as well as an specific mysqld to be used with this instance. If you have made a copy of the datadir from an instance running a previous version of MySQL/Percona Server you will need to consider the same approach use when upgrading and run mysql_upgrade.

* I did try to use the latest experimental release of MySQL 5.7 (mysql-5.7.4-m14-linux-glibc2.5-x86_64.tar.gz) but it crashed with:

*** glibc detected *** bin/mysqld: double free or corruption (!prev): 0x0000000003627650 ***

Using the conventional tools to start and stop an instance

Even though mysqld_multi makes things easier to control in general let’s not forget it is a wrapper; you can still rely (though not always, as shown below) on the conventional tools directly to start and stop an instance: mysqld* and mysqladmin. Just make sure to use the parameter –defaults-group-suffix to identify which instance you want to start:

mysqld --defaults-group-suffix=5620

and –socket to indicate the one you want to stop:

$mysqladmin -S /var/run/mysqld/mysqld5620.sock shutdown

* However, mysqld won’t work to start an instance if you have redefined the option ‘mysqld’ on the configuration group, as I did for [mysqld5620] above, stating:

[ERROR] mysqld: unknown variable 'mysqld=/opt/mysql-5.6.20-linux-glibc2.5-x86_64/bin/mysqld_safe'

I’ve tested using “ledir” to indicate the path to the directory containing the binaries for MySQL 5.6.20 instead of “mysqld” but it also failed with a similar error. If nothing else, that shows you need to stick with mysqld_multi when starting instances in a mixed-version environment.

Backups

The backup of multiple instances must be done in an individual basis, like you would if each instance was located in a different server. You just need to provide the appropriate parameters to identify the instance you’re targeting. For example, we can simply use socket with mysqldump when running it locally:

$ mysqldump --socket=/var/run/mysqld/mysqld7.sock --all-databases > mysqld7.sql

In Percona XtraBackup there’s an option named  –defaults-group that should be used in environments running multiple instances to indicate which one you want to backup :

$ innobackupex --defaults-file=/etc/my.cnf --defaults-group=mysqld7 --socket=/var/run/mysqld/mysqld7.sock /root/Backup/

Yes, you also need to provide a path to the socket (when running the command locally), even though that information is already available in “–defaults-group=mysqld7″; as it turns out, only the Percona XtraBackup tool (which is called by innobackupex during the backup process) makes use of the information available in the group option. You may need to provide credentials as well (“–user” & “–password”), and don’t forget you’ll need to prepare the backup afterwards. The option “defaults-group” is not available in all versions of Percona XtraBackup so make sure to use the latest one.

Summary

Running multiple instances of MySQL concurrently in the same server transparently and without any contextualization or a virtualization layer is possible with both mysqld_multi and MySQL Sandbox. We have been using the later at Percona Support to quickly spin on new disposable instances (though you might as easily keep them running indefinitely). In this post though I’ve looked at mysqld_multi, which is provided with MySQL server and remains the official solution for providing an environment with multiple instances.

The key aspect when configuring multiple instances in my.cnf is the notion of group name option, as you replace a single [mysqld] section by as many [mysqldN] sections as you want instances running. It’s important though to pay attention to certain details when defining the options for each one of these groups, specially when mixing instances from different MySQL/Percona Server versions. Differently from MySQL Sandbox, where each instance relies on it’s own configuration file, you should be careful each time you edit the shared my.cnf file as a syntax error when configuring a single group option will prevent all instances from starting upon the server’s (re)initialization.

I hope to have covered the major points about mysqld_multi here but feel free to leave us a note below if you have something else to add or any comment to contribute.

The post mysqld_multi: How to run multiple instances of MySQL appeared first on MySQL Performance Blog.

mysqld_multi: How to run multiple instances of MySQL

Wed, 27/08/2014 - 00:42

The need to have multiple instances of MySQL (the well-known mysqld process) running in the same server concurrently in a transparent way, instead of having them executed in separate containers/virtual machines, is not very common. Yet from time to time the Percona Support team receives a request from a customer to assist in the configuration of such an environment. MySQL provides a tool to facilitate the execution of multiple instances called mysqld_multi:

“mysqld_multi is designed to manage several mysqld processes that listen for connections on different Unix socket files and TCP/IP ports. It can start or stop servers, or report their current status.”

For tests and development purposes, MySQL Sandbox might be more practical and I personally prefer to use it for my own tests. Both tools work around launching and managing multiple mysqld processes but Sandbox has, as the name suggests, a “sandbox” approach, making it easy to both create and dispose a new instance (including all data inside it). It is more usual to see mysqld_multi being used in production servers: It’s provided with the server package and uses the same single configuration file that people are used to look for when setting up MySQL. So, how does it work? How do we configure and manage the instances? And as importantly, how do we backup all the instances we create?

Understanding the concept of groups in my.cnf

You may have noticed already that MySQL’s main configuration file (or “option file“), my.cnf, is arranged under what is called group structures: Sections defining configuration options specific to a given program or purpose. Usually, the program itself gives name to the group, which appears enclosed by brackets. Here’s a basic my.cnf showing three such groups:

[client] port = 3306 socket = /var/run/mysqld/mysqld.sock user = john password = p455w0rd [mysqld] user = mysql pid-file = /var/run/mysqld/mysqld.pid socket = /var/run/mysqld/mysqld.sock port = 3306 datadir = /var/lib/mysql [xtrabackup] target_dir = /backups/mysql/

The options defined in the group [client] above are used by the mysql command-line tool. As such, if you don’t specify any other option when executing mysql it will attempt to connect to the local MySQL server through the socket in /var/run/mysqld/mysqld.sock and using the credentials stated in that group. Similarly, mysqld will look for the options defined under its section at startup, and the same happens with Percona XtraBackup when you run a backup with that tool. However, the operating parameters defined by the above groups may also be stated as command-line options during the execution of the program, in which case they they replace the ones defined in my.cnf.

Getting started with multiple instances

To have multiple instances of MySQL running we must replace the [mysqld] group in the my.cnf configuration file by as many [mysqlN] groups as we want instances running, with “N” being a positive integer, also called option group number. This number is used by mysqld_multi to identify each instance, so it must be unique across the server. Apart from the distinct group name, the same options that are valid for [mysqld] applies on [mysqldN] groups, the difference being that while stating them is optional for [mysqld] (it’s possible to start MySQL with an empty my.cnf as default values are used if not explicitly provided) some of them (like socket, port, pid-file, and datadir) are mandatory when defining multiple instances – so they don’t step on each other’s feet. Here’s a simple modified my.cnf showing the original [mysqld] group plus two other instances:

[mysqld] user = mysql pid-file = /var/run/mysqld/mysqld.pid socket = /var/run/mysqld/mysqld.sock port = 3306 datadir = /var/lib/mysql [mysqld1] user = mysql pid-file = /var/run/mysqld/mysqld1.pid socket = /var/run/mysqld/mysqld1.sock port = 3307 datadir = /data/mysql/mysql1 [mysqld7] user = mysql pid-file = /var/run/mysqld/mysqld7.pid socket = /var/run/mysqld/mysqld7.sock port = 3308 datadir = /data/mysql/mysql7

Besides using different pid files, ports and sockets for the new instances I’ve also defined a different datadir for each – it’s very important that the instances do not share the same datadir. Chances are you’re importing the data from a backup but if that’s not the case you can simply use mysql_install_db to create each additional datadir (but make sure the parent directory exists and that the mysql user has write access on it):

mysql_install_db --user=mysql --datadir=/data/mysql/mysql7

Note that if /data/mysql/mysql7 doesn’t exist and you start this instance anyway then myqld_multi will call mysqld_install_db itself to have the datadir created and the system tables installed inside it. Alternatively from restoring a backup or having a new datadir created you can make a physical copy of the existing one from the main instance – just make sure to stop it first with a clean shutdown, so any pending changes are flushed to disk first.

Now, you may have noted I wrote above that you need to replace your original MySQL instance group ([mysqld]) by one with an option group number ([mysqlN]). That’s not entirely true, as they can co-exist in harmony. However, the usual start/stop script used to manage MySQL won’t work with the additional instances, nor mysqld_multi really manages [mysqld]. The simple solution here is to have the group [mysqld] renamed with a suffix integer, say [mysqld0] (you don’t need to make any changes to it’s current options though), and let mysqld_multi manage all instances.

Two commands you might find useful when configuring multiple instances are:

$ mysqld_multi --example

…which provides an example of a my.cnf file configured with multiple instances and showing the use of different options, and:

$ my_print_defaults --defaults-file=/etc/my.cnf mysqld7

…which shows how a given group (“mysqld7″ in the example above) was defined within my.cnf.

Managing multiple instances

mysqld_multi allows you to start, stop, reload (which is effectively a restart) and report the current status of a given instance, all instances or a subset of them. The most important observation here is that the “stop” action is managed through mysqladmin – and internally that happens on an individual basis, with one “mysqladmin … stop” call per instance, even if you have mysqld_multi stop all of them. For this to work properly you need to setup a MySQL account with the SHUTDOWN privilege and defined with the same user name and password in all instances. Yes, it will work out of the box if you run mysqld_multi as root in a freshly installed server where the root user can access MySQL passwordless in all instances. But as the manual suggests, it’s better to have an specific account created for this purpose:

mysql> GRANT SHUTDOWN ON *.* TO 'multi_admin'@'localhost' IDENTIFIED BY 'multipass'; mysql> FLUSH PRIVILEGES;

If you plan on replicating the datadir of the main server across your other instances you can have that account created before you make copies of it, otherwise you just need to connect to each instance and create a similar account (remember, the privileged account is only needed by mysqld_multi to stop the instances, not to start them). There’s a special group that can be used on my.cnf to define options for mysqld_multi, which should be used to store these credentials. You might also indicate in there the path for the mysqladmin and mysqld (or mysqld_safe) binaries to use, though you might have a specific mysqld binary defined for each instance inside it’s respective group. Here’s one example:

[mysqld_multi] mysqld = /usr/bin/mysqld_safe mysqladmin = /usr/bin/mysqladmin user = multi_admin password = multipass

You can use mysqld_multi to start, stop, restart or report the status of a particular instance, all instances or a subset of them. Here’s a few examples that speak for themselves:

$ mysqld_multi report Reporting MySQL (Percona Server) servers MySQL (Percona Server) from group: mysqld0 is not running MySQL (Percona Server) from group: mysqld1 is not running MySQL (Percona Server) from group: mysqld7 is not running $ mysqld_multi start $ mysqld_multi report Reporting MySQL (Percona Server) servers MySQL (Percona Server) from group: mysqld0 is running MySQL (Percona Server) from group: mysqld1 is running MySQL (Percona Server) from group: mysqld7 is running $ mysqld_multi stop 7,0 $ mysqld_multi report 7 Reporting MySQL (Percona Server) servers MySQL (Percona Server) from group: mysqld7 is not running $ mysqld_multi report Reporting MySQL (Percona Server) servers MySQL (Percona Server) from group: mysqld0 is not running MySQL (Percona Server) from group: mysqld1 is running MySQL (Percona Server) from group: mysqld7 is not running

Managing the MySQL daemon

What is missing here is an init script to automate the start/stop of all instances upon server initialization/shutdown; now that we use mysqld_multi to control the instances, the usual /etc/init.d/mysql won’t work anymore. But a similar startup script (though much simpler and less robust) relying on mysqld_multi is provided alongside MySQL/Percona Server, which can be found in /usr/share/<mysql|percona-server>/mysqld_multi.server. You can simply copy it over as /etc/init.d/mysql, effectively replacing the original script while maintaining it’s name. Please note: You may need to edit it first and modify the first two lines defining “basedir” and “bindir” as this script was not designed to find out the good working values for these variables itself, which the original single-instance /etc/init.d/mysql does. Considering you probably have mysqld_multi installed in /usr/bin, setting these variables as follows is enough:

basedir=/usr bindir=/usr/bin

Configuring an instance with a different version of MySQL

If you’re planning to have multiple instances of MySQL running concurrently chances are you want to use a mix of different versions for each of them, such as during a development cycle to test an application compatibility. This is a common use for mysqld_multi, and simple enough to achieve. To showcase its use I downloaded the latest version of MySQL 5.6 available and extracted the TAR file in /opt:

$ tar -zxvf mysql-5.6.20-linux-glibc2.5-x86_64.tar.gz -C /opt

Then I made a cold copy of the datadir from one of the existing instances to /data/mysql/mysqld574:

$ mysqld_multi stop 0 $ cp -r /data/mysql/mysql1 /data/mysql/mysql5620 $ chown mysql:mysql -R /data/mysql/mysql5620

and added a new group to my.cnf as follows:

[mysqld5620] user = mysql pid-file = /var/run/mysqld/mysqld5620.pid socket = /var/run/mysqld/mysqld5620.sock port = 3309 datadir = /data/mysql/mysql5620 basedir = /opt/mysql-5.6.20-linux-glibc2.5-x86_64 mysqld = /opt/mysql-5.6.20-linux-glibc2.5-x86_64/bin/mysqld_safe

Note the use of basedir, pointing to the path were the binaries for MySQL 5.6.20 were extracted, as well as an specific mysqld to be used with this instance. If you have made a copy of the datadir from an instance running a previous version of MySQL/Percona Server you will need to consider the same approach use when upgrading and run mysql_upgrade.

* I did try to use the latest experimental release of MySQL 5.7 (mysql-5.7.4-m14-linux-glibc2.5-x86_64.tar.gz) but it crashed with:

*** glibc detected *** bin/mysqld: double free or corruption (!prev): 0x0000000003627650 ***

Using the conventional tools to start and stop an instance

Even though mysqld_multi makes things easier to control in general let’s not forget it is a wrapper; you can still rely (though not always, as shown below) on the conventional tools directly to start and stop an instance: mysqld* and mysqladmin. Just make sure to use the parameter –defaults-group-suffix to identify which instance you want to start:

mysqld --defaults-group-suffix=5620

and –socket to indicate the one you want to stop:

$mysqladmin -S /var/run/mysqld/mysqld5620.sock shutdown

* However, mysqld won’t work to start an instance if you have redefined the option ‘mysqld’ on the configuration group, as I did for [mysqld5620] above, stating:

[ERROR] mysqld: unknown variable 'mysqld=/opt/mysql-5.6.20-linux-glibc2.5-x86_64/bin/mysqld_safe'

I’ve tested using “ledir” to indicate the path to the directory containing the binaries for MySQL 5.6.20 instead of “mysqld” but it also failed with a similar error. If nothing else, that shows you need to stick with mysqld_multi when starting instances in a mixed-version environment.

Backups

The backup of multiple instances must be done in an individual basis, like you would if each instance was located in a different server. You just need to provide the appropriate parameters to identify the instance you’re targeting. For example, we can simply use socket with mysqldump when running it locally:

$ mysqldump --socket=/var/run/mysqld/mysqld7.sock --all-databases > mysqld7.sql

In Percona XtraBackup there’s an option named  –defaults-group that should be used in environments running multiple instances to indicate which one you want to backup :

$ innobackupex --defaults-file=/etc/my.cnf --defaults-group=mysqld7 --socket=/var/run/mysqld/mysqld7.sock /root/Backup/

Yes, you also need to provide a path to the socket (when running the command locally), even though that information is already available in “–defaults-group=mysqld7″; as it turns out, only the Percona XtraBackup tool (which is called by innobackupex during the backup process) makes use of the information available in the group option. You may need to provide credentials as well (“–user” & “–password”), and don’t forget you’ll need to prepare the backup afterwards. The option “defaults-group” is not available in all versions of Percona XtraBackup so make sure to use the latest one.

Summary

Running multiple instances of MySQL concurrently in the same server transparently and without any contextualization or a virtualization layer is possible with both mysqld_multi and MySQL Sandbox. We have been using the later at Percona Support to quickly spin on new disposable instances (though you might as easily keep them running indefinitely). In this post though I’ve looked at mysqld_multi, which is provided with MySQL server and remains the official solution for providing an environment with multiple instances.

The key aspect when configuring multiple instances in my.cnf is the notion of group name option, as you replace a single [mysqld] section by as many [mysqldN] sections as you want instances running. It’s important though to pay attention to certain details when defining the options for each one of these groups, specially when mixing instances from different MySQL/Percona Server versions. Differently from MySQL Sandbox, where each instance relies on it’s own configuration file, you should be careful each time you edit the shared my.cnf file as a syntax error when configuring a single group option will prevent all instances from starting upon the server’s (re)initialization.

I hope to have covered the major points about mysqld_multi here but feel free to leave us a note below if you have something else to add or any comment to contribute.

The post mysqld_multi: How to run multiple instances of MySQL appeared first on MySQL Performance Blog.

OpenStack’s Trove: The benefits of this database as a service (DBaaS)

Mon, 25/08/2014 - 22:00

In a previous post, my colleague Dimitri Vanoverbeke discussed at a high level the concepts of database as a service (DBaaS), OpenStack and OpenStack’s implementation of a DBaaS, Trove. Today I’d like to delve a bit further into Trove and discuss where it fits in, and who benefits.

Just to recap, Trove is OpenStack’s implementation of a database as a service for its cloud infrastructure as a service (IaaS). And as the mission statement declares, the Trove project seeks to provide a scalable and reliable cloud database service providing functionality for both relational and non-relational database engines. With the current release of Icehouse, the technology has begun to show maturity providing both stability and a rich feature set.

In my opinion, there are two primary markets that will benefit from Trove: the first being service providers such as RackSpace who provide cloud-based services similar to Amazon’s AWS. These are companies that wish to expand beyond the basic cloud services of storage and networking and provide their customer base with a richer cloud experience by providing higher level services such as DBaaS functionality. The other players are those companies that wish to “cloudify” their own internal systems. The reasons for this decision are varied, ranging from the desire to maintain complete control over all the architecture and the cloud components to legal constraints limiting the use of public cloud infrastructures.

With Trove, much of the management of your database system is taken care of by automating a significant portion of the configuration and initial setup steps necessitated when launching a new server. This includes deployment, configuration, patching, backups, restores, and monitoring that can be administered from either a CLI interface, RESTful API’s or OpenStack’s Horizon dashboard. At this point, what Trove doesn’t provide is failover, replication and clustering. This functionality is slated to be implemented in the Kilo release of OpenStack due out in April/2015.

The process flow is relatively simple. The OpenStack Administrator first configures the basic infrastructure by installing the database service. He or she would then create an image for each type of database they wish to support such as MySQL or MongoDB. They would then import the images and offer them to their tenants. From the end users perspective only a few commandes are necessary to get up and running. First issuing the <trove create> command to create a database service instance, followed by <trove list> command to get the ID of the instance and finally trove show command to get the IP address of it.

For example to create a database, you first start off by creating a database instance. This is an isolated database environment with compute and storage resources in a single tenant environment on a shared physical host machine. You can run a database instance with a variety of database engines such as MySQL or MongoDB.

From the Trove client I can issue the following command to create a database instance called PS_troveinstance, with a volume size of 2 GB, a user called PS_user, a password PS_password and the MySQL datastore (or database engine):

$ trove create –size 2 –users PS_user:PS_password –datastore MySQL PS_troveinstance

Next I issue the following command to get the ID of the database instance:

$ trove list PS_troveinstance

And finally, to create a database called PS_trovedb, I execute:

$ trove database-create PS_troveinstance PS_trovedb

Alternatively, I could have just combined the above commands as:

$ trove create –size 2 —-database PS_trovedb users PS_user:PS_password –datastore MySQL PS_troveinstance

And thus we now have a MySQL database server containing a database called PS_trovedb.

In our next post on OpenStack/Trove, we’ll dig even further and discuss the software and hardware requirements, and how to actually set up Trove.

On a related note, Percona has several experts attending this week’s OpenStack Operations Summit in San Antonio, Texas. One of them is Matt Griffin, director of product management, who pointed out in a recent post that many OpenStack operators use Percona open source software including the MySQL drop-in compatible Percona Server and Galera-based Percona XtraDB Cluster as well as tools such as Percona XtraBackup and Percona Toolkit. “We see a need in the community to understand how to improve MySQL performance in OpenStack. As a result, Percona, submitted 16 presentations for November’s Paris OpenStack Summit,” Matt said. So stay tuned for related news from him, too, on that front.

The post OpenStack’s Trove: The benefits of this database as a service (DBaaS) appeared first on MySQL Performance Blog.

OpenStack’s Trove: The benefits of this database as a service (DBaaS)

Mon, 25/08/2014 - 22:00

In a previous post, my colleague Dimitri Vanoverbeke discussed at a high level the concepts of database as a service (DBaaS), OpenStack and OpenStack’s implementation of a DBaaS, Trove. Today I’d like to delve a bit further into Trove and discuss where it fits in, and who benefits.

Just to recap, Trove is OpenStack’s implementation of a database as a service for its cloud infrastructure as a service (IaaS). And as the mission statement declares, the Trove project seeks to provide a scalable and reliable cloud database service providing functionality for both relational and non-relational database engines. With the current release of Icehouse, the technology has begun to show maturity providing both stability and a rich feature set.

In my opinion, there are two primary markets that will benefit from Trove: the first being service providers such as RackSpace who provide cloud-based services similar to Amazon’s AWS. These are companies that wish to expand beyond the basic cloud services of storage and networking and provide their customer base with a richer cloud experience by providing higher level services such as DBaaS functionality. The other players are those companies that wish to “cloudify” their own internal systems. The reasons for this decision are varied, ranging from the desire to maintain complete control over all the architecture and the cloud components to legal constraints limiting the use of public cloud infrastructures.

With Trove, much of the management of your database system is taken care of by automating a significant portion of the configuration and initial setup steps necessitated when launching a new server. This includes deployment, configuration, patching, backups, restores, and monitoring that can be administered from either a CLI interface, RESTful API’s or OpenStack’s Horizon dashboard. At this point, what Trove doesn’t provide is failover, replication and clustering. This functionality is slated to be implemented in the Kilo release of OpenStack due out in April/2015.

The process flow is relatively simple. The OpenStack Administrator first configures the basic infrastructure by installing the database service. He or she would then create an image for each type of database they wish to support such as MySQL or MongoDB. They would then import the images and offer them to their tenants. From the end users perspective only a few commandes are necessary to get up and running. First issuing the <trove create> command to create a database service instance, followed by <trove list> command to get the ID of the instance and finally trove show command to get the IP address of it.

For example to create a database, you first start off by creating a database instance. This is an isolated database environment with compute and storage resources in a single tenant environment on a shared physical host machine. You can run a database instance with a variety of database engines such as MySQL or MongoDB.

From the Trove client I can issue the following command to create a database instance called PS_troveinstance, with a volume size of 2 GB, a user called PS_user, a password PS_password and the MySQL datastore (or database engine):

$ trove create –size 2 –users PS_user:PS_password –datastore MySQL PS_troveinstance

Next I issue the following command to get the ID of the database instance:

$ trove list PS_troveinstance

And finally, to create a database called PS_trovedb, I execute:

$ trove database-create PS_troveinstance PS_trovedb

Alternatively, I could have just combined the above commands as:

$ trove create –size 2 —-database PS_trovedb users PS_user:PS_password –datastore MySQL PS_troveinstance

And thus we now have a MySQL database server containing a database called PS_trovedb.

In our next post on OpenStack/Trove, we’ll dig even further and discuss the software and hardware requirements, and how to actually set up Trove.

On a related note, Percona has several experts attending this week’s OpenStack Operations Summit in San Antonio, Texas. One of them is Matt Griffin, director of product management, who pointed out in a recent post that many OpenStack operators use Percona open source software including the MySQL drop-in compatible Percona Server and Galera-based Percona XtraDB Cluster as well as tools such as Percona XtraBackup and Percona Toolkit. “We see a need in the community to understand how to improve MySQL performance in OpenStack. As a result, Percona, submitted 16 presentations for November’s Paris OpenStack Summit,” Matt said. So stay tuned for related news from him, too, on that front.

The post OpenStack’s Trove: The benefits of this database as a service (DBaaS) appeared first on MySQL Performance Blog.

When (and how) to move an InnoDB table outside the shared tablespace

Sat, 23/08/2014 - 00:29

In my last post, “A closer look at the MySQL ibdata1 disk space issue and big tables,” I looked at the growing ibdata1 problem under the perspective of having big tables residing inside the so-called shared tablespace. In the particular case that motivated that post, we had a customer running out of disk space in his server who was looking for a way to make the ibdata1 file shrink. As you may know, that file (or, as explained there, the set of ibdata files composing the shared tablespace) stores all InnoDB tables created when innodb_file_per_table is disabled, but also other InnoDB structures, such as undo logs and data dictionary.

For example, when you run a transaction involving InnoDB tables, MySQL will first write all the changes it triggers in an undo log, for the case you later decide to “roll them back”. Long standing, uncommited transactions are one of the causes for a growing ibdata file. Of course, if you have innodb_file_per_table disabled then all your InnoDB tables live inside it. That was what happened in that case.

So, how do you move a table outside the shared tablespace and change the storage engine it relies on? As importantly, how does that affects disk space use? I’ve explored some of the options presented in the previous post and now share my findings with you below.

The experiment

I created a very simple InnoDB table inside the shared tablespace of a fresh installed Percona Server 5.5.37-rel35.1 with support for TokuDB and configured it with a 1GB buffer pool. I’ve used a 50G partition to host the ‘datadir’ and another one for ‘tmpdir’:

Filesystem Size Used Avail Use% Mounted on /dev/mapper/vg0-lvFernando1 50G 110M 47G 1% /media/lvFernando1 # datadir /dev/mapper/vg0-lvFernando2 50G 52M 47G 1% /media/lvFernando2 # tmpdir

Here’s the table structure:

CREATE TABLE `joinit` ( `i` int(11) NOT NULL AUTO_INCREMENT, `s` varchar(64) DEFAULT NULL, `t` time NOT NULL, `g` int(11) NOT NULL, PRIMARY KEY (`i`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1;

I populated it with 134 million rows using the following routine:

INSERT INTO joinit VALUES (NULL, uuid(), time(now()), (FLOOR( 1 + RAND( ) *60 ))); INSERT INTO joinit SELECT NULL, uuid(), time(now()), (FLOOR( 1 + RAND( ) *60 )) FROM joinit; # repeat until you reach your target number of rows

which resulted in the table below:

mysql> show table status from test like 'joinit'G *************************** 1. row *************************** Name: joinit Engine: InnoDB Version: 10 Row_format: Compact Rows: 134217909 Avg_row_length: 72 Data_length: 9783214080 Max_data_length: 0 Index_length: 0 Data_free: 1013972992 Auto_increment: 134872552 Create_time: 2014-07-30 20:42:42 Update_time: NULL Check_time: NULL Collation: latin1_swedish_ci Checksum: NULL Create_options: Comment: 1 row in set (0.00 sec)

The resulting ibdata1 file was showing to have 11G, which accounted in practice for 100% of the datadir partition use then. What follows next is a few experiences I did by converting that table to use a different storage engine, moving it outside the shared tablespace, compressing it, and dumping and restoring the database back to see the effects in disk space use. I haven’t timed how long running each command took and focused mostly on the generated files size. As a bonus, I’ve also looked at how to extend the shared table space by adding an extra ibdata file.

#1) Converting to MyISAM

Technical characteristics and features apart, MyISAM tables are know to occupy less disk space than InnoDB’s ones. How much less depends on the actual table structure. Here I made the conversion in the most simplest way:

mysql> ALTER TABLE test.joinit ENGINE=MYISAM;

which created the following files (the .frm file already existed):

$ ls -lh /media/lvFernando1/data/test/ total 8.3G -rw-rw---- 1 fernando.laudares admin 8.5K Jul 31 16:21 joinit.frm -rw-rw---- 1 fernando.laudares admin 7.0G Jul 31 16:27 joinit.MYD -rw-rw---- 1 fernando.laudares admin 1.3G Jul 31 16:27 joinit.MYI

The resulting MyISAM files amounted for an additional 8.3G of disk space use:

/dev/mapper/vg0-lvFernando1 50G 19G 29G 40% /media/lvFernando1

I was expecting smaller files but, of course, the result depends largely on the data types of the columns composing the table. The problem (or the consequence) is that we end up with close to the double of the initial disk space being used:

As it happens with the other solutions presented in this section that migrate the target table outside the shared tablespace, the common/safest way to reclaim the freed (unused) space inside the ibdata1 file back to the operating system is by doing a dump & restore of the full database.

There’s an alternative approach with MyISAM though, which doesn’t involve dump & restore and only requires a MySQL restart. However, you need to convert all InnoDB tables to MyISAM, stop MySQL, delete all ib* files (there should be no remaining .ibd files after you’ve converted all InnoDB tables to MyISAM), and then restart MySQL again. Upon MySQL restart, ibdata1 will be re-created with it’s default initial size (more on this below). You can then convert the MyISAM tables back to InnoDB and if you have innodb_file_per_table enabled this time then the tables will be created with their own private tablespace file.

#2) Exporting the table to a private tablespace

Once you have innodb_file_per_table enabled you can move a table residing inside ibdata1 to it’s private tablespace (it’s own .ibd file) by either running ALTER TABLE or OPTIMIZE TABLE. Both commands create a “temporary” (though InnoDB, not MyISAM) table with it’s own tablespace file inside the database directory (and not in the tmpdir, as I believed it would happen), the rows from the target table being copied over there.

Here’s showing the temporary table (#sql-4f10_1) that was created while the process was still ongoing:

$ ls -lh /media/lvFernando1/data/test total 2.2G -rw-rw---- 1 fernando.laudares admin 8.5K Jul 30 20:42 joinit.frm -rw-rw---- 1 fernando.laudares admin 8.5K Jul 30 23:05 #sql-4f10_1.frm -rw-rw---- 1 fernando.laudares admin 2.2G Jul 30 23:12 #sql-4f10_1.ibd

and the resulting .ibd file from when the process completed:

$ ls -lh /media/lvFernando1/data/test total 9.3G -rw-rw---- 1 fernando.laudares admin 8.5K Jul 30 23:05 joinit.frm -rw-rw---- 1 fernando.laudares admin 9.3G Jul 30 23:35 joinit.ibd

Note that the new joinit.ibd file is about 9.3G. Again, the process resulted in the use of extra disk space:

/dev/mapper/vg0-lvFernando1 50G 20G 28G 42% /media/lvFernando1

#3) Dump and restore: looking at the disk space use

As pointed, the way to reclaim unused disk space inside ibdata1 back to the file system (and consequently making it shrink) is by dumping the full database to a text file and then restoring it back. I’ve started by doing a simple full mysqldump encompassing all databases:

$ mysqldump -S /tmp/mysql_sandbox5537.sock --user=msandbox --password=msandbox --all-databases > dump.sql

which created the following file:

-rw-r--r-- 1 fernando.laudares admin 8.1G Jul 31 00:02 dump.sql

I’ve then stopped MySQL, wiped out the full datadir and used the script mysql_install_db to (re-)create the system tables (though that’s not needed, I did it to level comparisons; once you have all InnoDB tables out of the system tablespace you can simply delete all ib* files along any .ibd and .frm files for related InnoDB tables), started MySQL again, and finally restored the backup:

$ mysql -S /tmp/mysql_sandbox5537.sock --user=msandbox --password=msandbox < dump.sql

This resulted in:

/dev/mapper/vg0-lvFernando1 50G 9.4G 38G 21% /media/lvFernando1

and:

-rw-rw---- 1 fernando.laudares admin 18M Jul 31 05:05 /media/lvFernando1/data/ibdata1

ibdata1 did shrink in the process (returning to the default* initial size) and we recovered back a couple of gigabytes that were not being used by the single InnoDB table that used to live inside it.

* The default value for innodb_data_file_path for the Percona Server release we used on the tests is “ibdata1:10M:autoextend”. However, the manual states that “the default behavior is to create a single auto-extending data file, slightly larger than 10MB“. Given that innodb_autoextend_increment is set to 8M by default what we get in practice is an initialized ibdata1 file of 18M, as seen above.

#4) Compressing the table

Back to the original scenario, with the test table still living in the shared tablespace, I’ve enabled innodb_file_per_table, set Barracuda as the innodb_file_format and then ran:

mysql> ALTER TABLE test.joinit ENGINE=INNODB ROW_FORMAT=Compressed;

This resulted in an .ibd file with half the size of an uncompressed one:

-rw-rw---- 1 fernando.laudares admin 4.7G Jul 31 13:59 joinit.ibd

The process was a lengthy one but resulted in considerable less disk space use than converting the table to MyISAM:

/dev/mapper/vg0-lvFernando1 50G 20G 28G 42% /media/lvFernando1

#5) Converting the table to TokuDB

I’ve used TokuDB version 7.1.6 (not the latest one), which was bundled in Percona Server 5.5.37-rel35.1.

mysql> ALTER TABLE test.joinit ENGINE=TOKUDB;

As was the case with converting the InnoDB table to it’s own tablespace, a temporary table was created to intermediate the process, with the table definition residing in the test database directory:

$ ls -lh /media/lvFernando1/data/test/ total 24K -rw-r----- 1 fernando.laudares admin 8.5K Jul 31 14:22 joinit.frm -rw-rw---- 1 fernando.laudares admin 8.5K Jul 31 14:28 #sql-7f51_1.frm

However, TokuDB created a temporary file in the main datadir to copy the data into, show here at some point during the process:

-rwxrwx--x 1 fernando.laudares admin 32K Jul 31 14:28 _test_sql_7f51_1_main_a_2_19.tokudb -rwxrwx--x 1 fernando.laudares admin 16K Jul 31 14:28 _test_sql_7f51_1_status_a_1_19.tokudb -rw------- 1 fernando.laudares admin 528M Jul 31 14:29 tokuldNk5W4v

Once the process completed the file tokuldNk5W4v disappeared and the data ended up in the _test_sql_7f51_1_main_a_2_19.tokudb file:

-rwxrwx--x 1 fernando.laudares admin 1.1G Jul 31 14:32 _test_sql_7f51_1_main_d_1_19_B_0.tokudb

The new .tokudb file is so small compared to the other files obtained in the previous approaches that it almost goes unnoticed when looking at the disk space use:

/dev/mapper/vg0-lvFernando1 50G 12G 36G 25% /media/lvFernando1

Curiously, that file retained the name/reference of the temporary table it used, even though the temporary table file description (#sql-7f51_1.frm) disappeared as well; only joinit.frm remained under the test database directory.

I decided to do a dump & restore of the whole database to see what that would result in this case:

$ mysqldump -S /tmp/mysql_sandbox5537.sock --user=msandbox --password=msandbox --all-databases > dump2.sql $ ls -lh dump2.sql -rw-r--r-- 1 fernando.laudares admin 8.1G Jul 31 15:07 /home/fernando.laudares/dump2.sql

Once it was restored we found the ibdata1 file back to its initial default size and the disk space used by the datadir was down to a scant 1.1G. Curiously (again), the dump & restore procedure did fixed the name of the TokuDB table file:

-rwxrwx--x 1 fernando.laudares admin 980M Jul 31 15:36 _test_joinit_main_21_1_19_B_0.tokudb

The process of converting the resident InnoDB table to TokuDB was faster than compressing it with ROW_FORMAT=Compressed and resulted in a much smaller file. This is not to say that using TokuDB is the best solution but to point that the process can be accomplished in less time and make use of lesser extra disk space than InnoDB does, which can simply come up handy if you don’t have much space left. Also, remember the test was done with a not very large (and simple in structure) test table; you may need to check if yours can be converted to TokuDB and what changes to indexes you might need to do (if any).

#6) Expanding the shared tablespace

As previously mentioned, I’ve been using the default value of  “ibdata1:10M:autoextend” for innodb_data_file_path for my tests. If you have another partition with unused space available and you look only for a solution to the “running out of disk space” problem and don’t mind keeping your big table inside the shared tablespace, then you can expand it. That can be done by means of adding a second ibdata file to the tablespace definition. Of course, that only works if you create it outside the partition already hosting ibdata1.

To do so, you need to stop MySQL and verify the size of the ibdata1 file you have (list size in bytes as “ls -lh” will round it here):

$ ls -l /media/lvFernando1/data/ibdata1 -rw-r----- 1 fernando.laudares admin 10957619200 Aug 1 16:20 /media/lvFernando1/data/ibdata1

I had available space on the /media/lvFernando2 partition so I decided to have ibdata2 created there. To do so, I’ve added the following 2 lines to my.cnf:

innodb_data_home_dir = innodb_data_file_path=/media/lvFernando1/data/ibdata1:10957619200;/media/lvFernando2/tmp/ibdata2:10M:autoextend

Then I restarted MySQL and confirmed that ibdata2 was created there upon initialization:

$ ls -lh /media/lvFernando2/tmp/ total 10M -rw-rw---- 1 fernando.laudares admin 10M Aug 1 16:30 ibdata2

Three important things to note on this approach:

  1. There can be only one ibdata file listed in innodb_data_file_path configured with “autoextend” - the last one in the list.
  2. You need to redefine ibdata1 in innodb_data_file_path using it’s current size. Any other size won’t work.
  3. From the moment you redefined ibdata1 with it’s current size and added ibdata2 configured with “autoexend”, ibdata1 won’t grow any bigger; the common tablespace will be increased through ibdata2.

What the 3rd point really means in practice is that you need to plan this maneuver accordingly: if you still have space available in the partition that hosts ibdata1 and you want to use it first then you need to delay these changes until you get to that point. From the moment you add ibdata2 to the common tablespace definition new data will start to be added into it.

Conclusion

The little experiment I did helped explain how some of the solutions proposed in my previous post for moving an InnoDB table outside the shared tablespace would work in practice and, most importantly, how much more disk space would be needed along the process. It was interesting to see that none of them made use of the tmpdir when they created temporary tables to intermediate the table conversion process; those were always created on the main datadir. The test table I’ve used had only a bit more than 10G, far away from a 1TB-sized table, so the results I’ve got may differ from what you would find in a larger environment.

As complementary information, MySQL 5.6 allows for the creation of InnoDB tables having their private tablespaces living outside the datadir (but that must be defined at table creation time and can’t be changed by means of an ALTER TABLE) as well as transportable tablespaces (which allow for the copy of private tablespaces from one database server to another).

The post When (and how) to move an InnoDB table outside the shared tablespace appeared first on MySQL Performance Blog.

When (and how) to move an InnoDB table outside the shared tablespace

Sat, 23/08/2014 - 00:29

In my last post, “A closer look at the MySQL ibdata1 disk space issue and big tables,” I looked at the growing ibdata1 problem under the perspective of having big tables residing inside the so-called shared tablespace. In the particular case that motivated that post, we had a customer running out of disk space in his server who was looking for a way to make the ibdata1 file shrink. As you may know, that file (or, as explained there, the set of ibdata files composing the shared tablespace) stores all InnoDB tables created when innodb_file_per_table is disabled, but also other InnoDB structures, such as undo logs and data dictionary.

For example, when you run a transaction involving InnoDB tables, MySQL will first write all the changes it triggers in an undo log, for the case you later decide to “roll them back”. Long standing, uncommited transactions are one of the causes for a growing ibdata file. Of course, if you have innodb_file_per_table disabled then all your InnoDB tables live inside it. That was what happened in that case.

So, how do you move a table outside the shared tablespace and change the storage engine it relies on? As importantly, how does that affects disk space use? I’ve explored some of the options presented in the previous post and now share my findings with you below.

The experiment

I created a very simple InnoDB table inside the shared tablespace of a fresh installed Percona Server 5.5.37-rel35.1 with support for TokuDB and configured it with a 1GB buffer pool. I’ve used a 50G partition to host the ‘datadir’ and another one for ‘tmpdir’:

Filesystem Size Used Avail Use% Mounted on /dev/mapper/vg0-lvFernando1 50G 110M 47G 1% /media/lvFernando1 # datadir /dev/mapper/vg0-lvFernando2 50G 52M 47G 1% /media/lvFernando2 # tmpdir

Here’s the table structure:

CREATE TABLE `joinit` ( `i` int(11) NOT NULL AUTO_INCREMENT, `s` varchar(64) DEFAULT NULL, `t` time NOT NULL, `g` int(11) NOT NULL, PRIMARY KEY (`i`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1;

I populated it with 134 million rows using the following routine:

INSERT INTO joinit VALUES (NULL, uuid(), time(now()), (FLOOR( 1 + RAND( ) *60 ))); INSERT INTO joinit SELECT NULL, uuid(), time(now()), (FLOOR( 1 + RAND( ) *60 )) FROM joinit; # repeat until you reach your target number of rows

which resulted in the table below:

mysql> show table status from test like 'joinit'G *************************** 1. row *************************** Name: joinit Engine: InnoDB Version: 10 Row_format: Compact Rows: 134217909 Avg_row_length: 72 Data_length: 9783214080 Max_data_length: 0 Index_length: 0 Data_free: 1013972992 Auto_increment: 134872552 Create_time: 2014-07-30 20:42:42 Update_time: NULL Check_time: NULL Collation: latin1_swedish_ci Checksum: NULL Create_options: Comment: 1 row in set (0.00 sec)

The resulting ibdata1 file was showing to have 11G, which accounted in practice for 100% of the datadir partition use then. What follows next is a few experiences I did by converting that table to use a different storage engine, moving it outside the shared tablespace, compressing it, and dumping and restoring the database back to see the effects in disk space use. I haven’t timed how long running each command took and focused mostly on the generated files size. As a bonus, I’ve also looked at how to extend the shared table space by adding an extra ibdata file.

#1) Converting to MyISAM

Technical characteristics and features apart, MyISAM tables are know to occupy less disk space than InnoDB’s ones. How much less depends on the actual table structure. Here I made the conversion in the most simplest way:

mysql> ALTER TABLE test.joinit ENGINE=MYISAM;

which created the following files (the .frm file already existed):

$ ls -lh /media/lvFernando1/data/test/ total 8.3G -rw-rw---- 1 fernando.laudares admin 8.5K Jul 31 16:21 joinit.frm -rw-rw---- 1 fernando.laudares admin 7.0G Jul 31 16:27 joinit.MYD -rw-rw---- 1 fernando.laudares admin 1.3G Jul 31 16:27 joinit.MYI

The resulting MyISAM files amounted for an additional 8.3G of disk space use:

/dev/mapper/vg0-lvFernando1 50G 19G 29G 40% /media/lvFernando1

I was expecting smaller files but, of course, the result depends largely on the data types of the columns composing the table. The problem (or the consequence) is that we end up with close to the double of the initial disk space being used:

As it happens with the other solutions presented in this section that migrate the target table outside the shared tablespace, the common/safest way to reclaim the freed (unused) space inside the ibdata1 file back to the operating system is by doing a dump & restore of the full database.

There’s an alternative approach with MyISAM though, which doesn’t involve dump & restore and only requires a MySQL restart. However, you need to convert all InnoDB tables to MyISAM, stop MySQL, delete all ib* files (there should be no remaining .ibd files after you’ve converted all InnoDB tables to MyISAM), and then restart MySQL again. Upon MySQL restart, ibdata1 will be re-created with it’s default initial size (more on this below). You can then convert the MyISAM tables back to InnoDB and if you have innodb_file_per_table enabled this time then the tables will be created with their own private tablespace file.

#2) Exporting the table to a private tablespace

Once you have innodb_file_per_table enabled you can move a table residing inside ibdata1 to it’s private tablespace (it’s own .ibd file) by either running ALTER TABLE or OPTIMIZE TABLE. Both commands create a “temporary” (though InnoDB, not MyISAM) table with it’s own tablespace file inside the database directory (and not in the tmpdir, as I believed it would happen), the rows from the target table being copied over there.

Here’s showing the temporary table (#sql-4f10_1) that was created while the process was still ongoing:

$ ls -lh /media/lvFernando1/data/test total 2.2G -rw-rw---- 1 fernando.laudares admin 8.5K Jul 30 20:42 joinit.frm -rw-rw---- 1 fernando.laudares admin 8.5K Jul 30 23:05 #sql-4f10_1.frm -rw-rw---- 1 fernando.laudares admin 2.2G Jul 30 23:12 #sql-4f10_1.ibd

and the resulting .ibd file from when the process completed:

$ ls -lh /media/lvFernando1/data/test total 9.3G -rw-rw---- 1 fernando.laudares admin 8.5K Jul 30 23:05 joinit.frm -rw-rw---- 1 fernando.laudares admin 9.3G Jul 30 23:35 joinit.ibd

Note that the new joinit.ibd file is about 9.3G. Again, the process resulted in the use of extra disk space:

/dev/mapper/vg0-lvFernando1 50G 20G 28G 42% /media/lvFernando1

#3) Dump and restore: looking at the disk space use

As pointed, the way to reclaim unused disk space inside ibdata1 back to the file system (and consequently making it shrink) is by dumping the full database to a text file and then restoring it back. I’ve started by doing a simple full mysqldump encompassing all databases:

$ mysqldump -S /tmp/mysql_sandbox5537.sock --user=msandbox --password=msandbox --all-databases > dump.sql

which created the following file:

-rw-r--r-- 1 fernando.laudares admin 8.1G Jul 31 00:02 dump.sql

I’ve then stopped MySQL, wiped out the full datadir and used the script mysql_install_db to (re-)create the system tables (though that’s not needed, I did it to level comparisons; once you have all InnoDB tables out of the system tablespace you can simply delete all ib* files along any .ibd and .frm files for related InnoDB tables), started MySQL again, and finally restored the backup:

$ mysql -S /tmp/mysql_sandbox5537.sock --user=msandbox --password=msandbox < dump.sql

This resulted in:

/dev/mapper/vg0-lvFernando1 50G 9.4G 38G 21% /media/lvFernando1

and:

-rw-rw---- 1 fernando.laudares admin 18M Jul 31 05:05 /media/lvFernando1/data/ibdata1

ibdata1 did shrink in the process (returning to the default* initial size) and we recovered back a couple of gigabytes that were not being used by the single InnoDB table that used to live inside it.

* The default value for innodb_data_file_path for the Percona Server release we used on the tests is “ibdata1:10M:autoextend”. However, the manual states that “the default behavior is to create a single auto-extending data file, slightly larger than 10MB“. Given that innodb_autoextend_increment is set to 8M by default what we get in practice is an initialized ibdata1 file of 18M, as seen above.

#4) Compressing the table

Back to the original scenario, with the test table still living in the shared tablespace, I’ve enabled innodb_file_per_table, set Barracuda as the innodb_file_format and then ran:

mysql> ALTER TABLE test.joinit ENGINE=INNODB ROW_FORMAT=Compressed;

This resulted in an .ibd file with half the size of an uncompressed one:

-rw-rw---- 1 fernando.laudares admin 4.7G Jul 31 13:59 joinit.ibd

The process was a lengthy one but resulted in considerable less disk space use than converting the table to MyISAM:

/dev/mapper/vg0-lvFernando1 50G 20G 28G 42% /media/lvFernando1

#5) Converting the table to TokuDB

I’ve used TokuDB version 7.1.6 (not the latest one), which was bundled in Percona Server 5.5.37-rel35.1.

mysql> ALTER TABLE test.joinit ENGINE=TOKUDB;

As was the case with converting the InnoDB table to it’s own tablespace, a temporary table was created to intermediate the process, with the table definition residing in the test database directory:

$ ls -lh /media/lvFernando1/data/test/ total 24K -rw-r----- 1 fernando.laudares admin 8.5K Jul 31 14:22 joinit.frm -rw-rw---- 1 fernando.laudares admin 8.5K Jul 31 14:28 #sql-7f51_1.frm

However, TokuDB created a temporary file in the main datadir to copy the data into, show here at some point during the process:

-rwxrwx--x 1 fernando.laudares admin 32K Jul 31 14:28 _test_sql_7f51_1_main_a_2_19.tokudb -rwxrwx--x 1 fernando.laudares admin 16K Jul 31 14:28 _test_sql_7f51_1_status_a_1_19.tokudb -rw------- 1 fernando.laudares admin 528M Jul 31 14:29 tokuldNk5W4v

Once the process completed the file tokuldNk5W4v disappeared and the data ended up in the _test_sql_7f51_1_main_a_2_19.tokudb file:

-rwxrwx--x 1 fernando.laudares admin 1.1G Jul 31 14:32 _test_sql_7f51_1_main_d_1_19_B_0.tokudb

The new .tokudb file is so small compared to the other files obtained in the previous approaches that it almost goes unnoticed when looking at the disk space use:

/dev/mapper/vg0-lvFernando1 50G 12G 36G 25% /media/lvFernando1

Curiously, that file retained the name/reference of the temporary table it used, even though the temporary table file description (#sql-7f51_1.frm) disappeared as well; only joinit.frm remained under the test database directory.

I decided to do a dump & restore of the whole database to see what that would result in this case:

$ mysqldump -S /tmp/mysql_sandbox5537.sock --user=msandbox --password=msandbox --all-databases > dump2.sql $ ls -lh dump2.sql -rw-r--r-- 1 fernando.laudares admin 8.1G Jul 31 15:07 /home/fernando.laudares/dump2.sql

Once it was restored we found the ibdata1 file back to its initial default size and the disk space used by the datadir was down to a scant 1.1G. Curiously (again), the dump & restore procedure did fixed the name of the TokuDB table file:

-rwxrwx--x 1 fernando.laudares admin 980M Jul 31 15:36 _test_joinit_main_21_1_19_B_0.tokudb

The process of converting the resident InnoDB table to TokuDB was faster than compressing it with ROW_FORMAT=Compressed and resulted in a much smaller file. This is not to say that using TokuDB is the best solution but to point that the process can be accomplished in less time and make use of lesser extra disk space than InnoDB does, which can simply come up handy if you don’t have much space left. Also, remember the test was done with a not very large (and simple in structure) test table; you may need to check if yours can be converted to TokuDB and what changes to indexes you might need to do (if any).

#6) Expanding the shared tablespace

As previously mentioned, I’ve been using the default value of  “ibdata1:10M:autoextend” for innodb_data_file_path for my tests. If you have another partition with unused space available and you look only for a solution to the “running out of disk space” problem and don’t mind keeping your big table inside the shared tablespace, then you can expand it. That can be done by means of adding a second ibdata file to the tablespace definition. Of course, that only works if you create it outside the partition already hosting ibdata1.

To do so, you need to stop MySQL and verify the size of the ibdata1 file you have (list size in bytes as “ls -lh” will round it here):

$ ls -l /media/lvFernando1/data/ibdata1 -rw-r----- 1 fernando.laudares admin 10957619200 Aug 1 16:20 /media/lvFernando1/data/ibdata1

I had available space on the /media/lvFernando2 partition so I decided to have ibdata2 created there. To do so, I’ve added the following 2 lines to my.cnf:

innodb_data_home_dir = innodb_data_file_path=/media/lvFernando1/data/ibdata1:10957619200;/media/lvFernando2/tmp/ibdata2:10M:autoextend

Then I restarted MySQL and confirmed that ibdata2 was created there upon initialization:

$ ls -lh /media/lvFernando2/tmp/ total 10M -rw-rw---- 1 fernando.laudares admin 10M Aug 1 16:30 ibdata2

Three important things to note on this approach:

  1. There can be only one ibdata file listed in innodb_data_file_path configured with “autoextend” - the last one in the list.
  2. You need to redefine ibdata1 in innodb_data_file_path using it’s current size. Any other size won’t work.
  3. From the moment you redefined ibdata1 with it’s current size and added ibdata2 configured with “autoexend”, ibdata1 won’t grow any bigger; the common tablespace will be increased through ibdata2.

What the 3rd point really means in practice is that you need to plan this maneuver accordingly: if you still have space available in the partition that hosts ibdata1 and you want to use it first then you need to delay these changes until you get to that point. From the moment you add ibdata2 to the common tablespace definition new data will start to be added into it.

Conclusion

The little experiment I did helped explain how some of the solutions proposed in my previous post for moving an InnoDB table outside the shared tablespace would work in practice and, most importantly, how much more disk space would be needed along the process. It was interesting to see that none of them made use of the tmpdir when they created temporary tables to intermediate the table conversion process; those were always created on the main datadir. The test table I’ve used had only a bit more than 10G, far away from a 1TB-sized table, so the results I’ve got may differ from what you would find in a larger environment.

As complementary information, MySQL 5.6 allows for the creation of InnoDB tables having their private tablespaces living outside the datadir (but that must be defined at table creation time and can’t be changed by means of an ALTER TABLE) as well as transportable tablespaces (which allow for the copy of private tablespaces from one database server to another).

The post When (and how) to move an InnoDB table outside the shared tablespace appeared first on MySQL Performance Blog.

A closer look at the MySQL ibdata1 disk space issue and big tables

Fri, 22/08/2014 - 01:24

A recurring and very common customer issue seen here at the Percona Support team involves how to make the ibdata1 file “shrink” within MySQL. I can only imagine there’s a degree of regret by some of the InnoDB architects on their design decisions regarding disk-space management by the shared tablespace* because this has been a big frustration for many MySQL users over the years.

There’s a very old bug (“InnoDB ibdata1 never shrinks after data is removed,” Sept. 8 2003) documenting user dissatisfaction. Shortly before that issue celebrated its 10th anniversary, James Day, MySQL senior principal support engineer at Oracle, posted a comment explaining why things haven’t changed and he also offered possible alternative solutions. Maybe we’ll see it fixed in a future release of MySQL. We can only be sure that any new storage engine aiming to warrant the sympathy of MySQL users can’t make that same mistake again.

One general misunderstanding that exacerbates the problem is the belief that if we enable innodb_file_per_table then all InnoDB tables will live in their own tablespace (a “private” .ibd file), even though the manual is clear about the role this variable plays. The truth is that when you enable innodb_file_per_table it will only immediately affect how new InnoDB tables are created – it won’t magically export tables currently living in the shared tablespace into their own separate .ibd files. That can be manually accomplished at any time afterwards by running ALTER TABLE or OPTIMIZE TABLE on the target table. The “gotcha” here is that by doing so you’ll actually be using additional disk space – as much as the table size,  for the newly created .ibd file. ibdata1 won’t automatically shrink: the space previously used by that table will be marked as being “free” (for internal InnoDB use) but won’t be returned to the file system.

Note: Throughout this post I make a common reference between ibdata1 and the shared (also called system) tablespace. In fact, the latter can be composed by a list of file definitions (ibdata1;ibdata2, …). Each file can have a fixed size or be specified with the autoextend parameter and have a cap limiting how big they can grow (well, actually only the last file defined in that list can). The default setting for the variable ruling how the shared tablespace is defined has it living on a file located in the datadir, named ibdata1, and configured with autoextend, hence the popular reference of a “growing ibdata1″.

A big table scenario

The shared tablespace contains more than data and indexes from InnoDB tables created inside it, such as the rollback segment for running transactions (a.k.a. “undo logs”). My colleague Miguel Angel Nieto wrote a blog post last year that explains in detail what is stored inside ibdata files, why they can grow bigger than the sum of data from the tables it hosts and won’t “shrink,” and explains the only real way to reclaim unused disk space to the file system.

But there’s one scenario where the ibdata1 file grows in a “legitimate” way: When it’s storing a big table. If there’s a big table living inside the shared tablespace accounting for most of its size then there’s no “shrinking” it. We can argue this would run counter to best practices (or not) but let’s remember that innodb_file_per_table used to be disabled by default. The first releases of MySQL 5.5 had it enabled by default but then the later ones had it disabled. We find the exact opposite happening with MySQL 5.6 - it is enabled in the later versions.

If a developer who is not a database specialist much less a MySQL expert creates a successful product around a single table and was “unlucky” enough to be using a MySQL release that had innodb_file_per_table disabled by default, he or she might soon find a Terabyte table living inside ibdata1 (BTW, I recommend Bill Karwin’s excellent “How to Avoid Common (but Deadly) MySQL Development Mistakes” recorded webinar and slides for all developers using MySQL).

What to do then ?

We recently helped a customer with such a problem. They found themselves running out of disk space with a steadily growing ibdata1 file containing data from a 1.5 Terabyte table. The replicas weren’t able to keep up either, and binary logs where pilling up, thus contributing to more disk space use. They had access to a network mounted partition in their main master server but the storage was quite slow. We started looking for possible solutions to their case and the team came up with the following list of “solutions”:

1) Add one or more disks to the system. Hopefully they had the datadir lying inside a LVM partition using the XFS filesystem so the perspective of simply increasing it’s size by adding more disk space was a good one to contemplate.

2) Re-arranging files in different partitions. The binary logs were stored inside the datadir and got to use a considerable amount of disk space so moving them to another partition would free a few hundreds gigabytes already. The complicated part here was that the only available alternative storage was the slow network one, and that would contribute to make the replicas delay yet more. Later we contemplated adding a second file (ibdata2) to the shared tablespace to be located in a different partition but, again, the only one with space available was the slow network mount.

3) Archive data. Delete a large amount of old data from the table (possibly using pt-archiver) and then do a mysqldump dump and restore the table with less data; or simply leave it as is – even though the freed space won’t be reclaimed by the filesystem it will be made available internally for InnoDB to re-use it. Be aware that deleting data is a slow process; if you have a lot of old unused data then it might be more efficient to dump the data you want to preserve, truncate the table, and then import the data back to it.

4) Convert the table to MyISAM. This could be seem as a polemic solution: it certainly shouldn’t be taken lightly as a change of storage engine is a big deal. A giving application that works with InnoDB might simply not do with MyISAM. While we do not normally recommend MyISAM, it does take less space to store data than InnoDB. Be aware that MyISAM is not a transactional storage engine, and if you crash, you will have to perform a lengthy repair process on the table. Also keep in mind that after converting the table to MyISAM you would have to either do a dump & restore of the full database to have ibdata1 shrink OR make sure you have all InnoDB tables converted to MyISAM, stop MySQL, remove all ib* files, and start MySQL again to have it recreate ibdata1 with its default size.

5) Try compressing the table. Rebuild the InnoDB table with file per table enabled and ROW_FORMAT=Compressed. If replicas are already lagging behind, replication may not be able to keep up when compressed tables are used. Also, if you have a very high write volume, it probably won’t perform well enough. If that is the case, this solution is not a viable one. However, if you would consider switching to the new TokuDB storage engine you will count with much better compression than InnoDB overall, and also better performance for write intensive workloads.

Of course, from all the outlined solutions the first 2 look to be by far the least invasive ones. The problem is that not always the server has free slots waiting for the addition of an extra disk, plus the arrangement around RAID setups brings extra constraints. Moving the binary logs to a different partition could be handy but it’s not always practical or possible at all, as is the case of making more space available by deleting/archiving data and extending the shared tablespace by adding a new ibdata2 file residing in a different partition. Finally, the other solutions involving converting the table to a different storage engine and compressing were not an option in the case of this particular customer because they required the use of additional disk space in the process, which they didn’t had.

What the customer ended doing was dumping the whole data and importing it into another server, compressing the table in the process. And then re-importing it back into the main server. It took time to complete and it wasn’t an option they were keen to try at first. But sometimes there’s just no easy way out of a problem like this. They made changes on the replicas to allow them to keep up with replication even when using compressed tables and the problem was solved for now, even though they’re aware the disk space they recovered won’t be enough in the long term.

Conclusion

The most important take-away here is to never get yourself into the situation of getting too close to running out of disk space. The best solutions available to minimize disk space being used by InnoDB tables inside the shared tablespace usually require the (temporary, if you intend to do a dump & restore afterwards) use of yet more disk space. And the bigger the table the longer it will take for dump & restore and table conversions so if you’re running out of time it only make things more complicated.

This case made me curious to explore some of the other options mentioned above a little further. In a followup post I’ll share some of my findings.

The post A closer look at the MySQL ibdata1 disk space issue and big tables appeared first on MySQL Performance Blog.

A closer look at the MySQL ibdata1 disk space issue and big tables

Fri, 22/08/2014 - 01:24

A recurring and very common customer issue seen here at the Percona Support team involves how to make the ibdata1 file “shrink” within MySQL. I can only imagine there’s a degree of regret by some of the InnoDB architects on their design decisions regarding disk-space management by the shared tablespace* because this has been a big frustration for many MySQL users over the years.

There’s a very old bug (“InnoDB ibdata1 never shrinks after data is removed,” Sept. 8 2003) documenting user dissatisfaction. Shortly before that issue celebrated its 10th anniversary, James Day, MySQL senior principal support engineer at Oracle, posted a comment explaining why things haven’t changed and he also offered possible alternative solutions. Maybe we’ll see it fixed in a future release of MySQL. We can only be sure that any new storage engine aiming to warrant the sympathy of MySQL users can’t make that same mistake again.

One general misunderstanding that exacerbates the problem is the belief that if we enable innodb_file_per_table then all InnoDB tables will live in their own tablespace (a “private” .ibd file), even though the manual is clear about the role this variable plays. The truth is that when you enable innodb_file_per_table it will only immediately affect how new InnoDB tables are created – it won’t magically export tables currently living in the shared tablespace into their own separate .ibd files. That can be manually accomplished at any time afterwards by running ALTER TABLE or OPTIMIZE TABLE on the target table. The “gotcha” here is that by doing so you’ll actually be using additional disk space – as much as the table size,  for the newly created .ibd file. ibdata1 won’t automatically shrink: the space previously used by that table will be marked as being “free” (for internal InnoDB use) but won’t be returned to the file system.

Note: Throughout this post I make a common reference between ibdata1 and the shared (also called system) tablespace. In fact, the latter can be composed by a list of file definitions (ibdata1;ibdata2, …). Each file can have a fixed size or be specified with the autoextend parameter and have a cap limiting how big they can grow (well, actually only the last file defined in that list can). The default setting for the variable ruling how the shared tablespace is defined has it living on a file located in the datadir, named ibdata1, and configured with autoextend, hence the popular reference of a “growing ibdata1″.

A big table scenario

The shared tablespace contains more than data and indexes from InnoDB tables created inside it, such as the rollback segment for running transactions (a.k.a. “undo logs”). My colleague Miguel Angel Nieto wrote a blog post last year that explains in detail what is stored inside ibdata files, why they can grow bigger than the sum of data from the tables it hosts and won’t “shrink,” and explains the only real way to reclaim unused disk space to the file system.

But there’s one scenario where the ibdata1 file grows in a “legitimate” way: When it’s storing a big table. If there’s a big table living inside the shared tablespace accounting for most of its size then there’s no “shrinking” it. We can argue this would run counter to best practices (or not) but let’s remember that innodb_file_per_table used to be disabled by default. The first releases of MySQL 5.5 had it enabled by default but then the later ones had it disabled. We find the exact opposite happening with MySQL 5.6 - it is enabled in the later versions.

If a developer who is not a database specialist much less a MySQL expert creates a successful product around a single table and was “unlucky” enough to be using a MySQL release that had innodb_file_per_table disabled by default, he or she might soon find a Terabyte table living inside ibdata1 (BTW, I recommend Bill Karwin’s excellent “How to Avoid Common (but Deadly) MySQL Development Mistakes” recorded webinar and slides for all developers using MySQL).

What to do then ?

We recently helped a customer with such a problem. They found themselves running out of disk space with a steadily growing ibdata1 file containing data from a 1.5 Terabyte table. The replicas weren’t able to keep up either, and binary logs where pilling up, thus contributing to more disk space use. They had access to a network mounted partition in their main master server but the storage was quite slow. We started looking for possible solutions to their case and the team came up with the following list of “solutions”:

1) Add one or more disks to the system. Hopefully they had the datadir lying inside a LVM partition using the XFS filesystem so the perspective of simply increasing it’s size by adding more disk space was a good one to contemplate.

2) Re-arranging files in different partitions. The binary logs were stored inside the datadir and got to use a considerable amount of disk space so moving them to another partition would free a few hundreds gigabytes already. The complicated part here was that the only available alternative storage was the slow network one, and that would contribute to make the replicas delay yet more. Later we contemplated adding a second file (ibdata2) to the shared tablespace to be located in a different partition but, again, the only one with space available was the slow network mount.

3) Archive data. Delete a large amount of old data from the table (possibly using pt-archiver) and then do a mysqldump dump and restore the table with less data; or simply leave it as is – even though the freed space won’t be reclaimed by the filesystem it will be made available internally for InnoDB to re-use it. Be aware that deleting data is a slow process; if you have a lot of old unused data then it might be more efficient to dump the data you want to preserve, truncate the table, and then import the data back to it.

4) Convert the table to MyISAM. This could be seem as a polemic solution: it certainly shouldn’t be taken lightly as a change of storage engine is a big deal. A giving application that works with InnoDB might simply not do with MyISAM. While we do not normally recommend MyISAM, it does take less space to store data than InnoDB. Be aware that MyISAM is not a transactional storage engine, and if you crash, you will have to perform a lengthy repair process on the table. Also keep in mind that after converting the table to MyISAM you would have to either do a dump & restore of the full database to have ibdata1 shrink OR make sure you have all InnoDB tables converted to MyISAM, stop MySQL, remove all ib* files, and start MySQL again to have it recreate ibdata1 with its default size.

5) Try compressing the table. Rebuild the InnoDB table with file per table enabled and ROW_FORMAT=Compressed. If replicas are already lagging behind, replication may not be able to keep up when compressed tables are used. Also, if you have a very high write volume, it probably won’t perform well enough. If that is the case, this solution is not a viable one. However, if you would consider switching to the new TokuDB storage engine you will count with much better compression than InnoDB overall, and also better performance for write intensive workloads.

Of course, from all the outlined solutions the first 2 look to be by far the least invasive ones. The problem is that not always the server has free slots waiting for the addition of an extra disk, plus the arrangement around RAID setups brings extra constraints. Moving the binary logs to a different partition could be handy but it’s not always practical or possible at all, as is the case of making more space available by deleting/archiving data and extending the shared tablespace by adding a new ibdata2 file residing in a different partition. Finally, the other solutions involving converting the table to a different storage engine and compressing were not an option in the case of this particular customer because they required the use of additional disk space in the process, which they didn’t had.

What the customer ended doing was dumping the whole data and importing it into another server, compressing the table in the process. And then re-importing it back into the main server. It took time to complete and it wasn’t an option they were keen to try at first. But sometimes there’s just no easy way out of a problem like this. They made changes on the replicas to allow them to keep up with replication even when using compressed tables and the problem was solved for now, even though they’re aware the disk space they recovered won’t be enough in the long term.

Conclusion

The most important take-away here is to never get yourself into the situation of getting too close to running out of disk space. The best solutions available to minimize disk space being used by InnoDB tables inside the shared tablespace usually require the (temporary, if you intend to do a dump & restore afterwards) use of yet more disk space. And the bigger the table the longer it will take for dump & restore and table conversions so if you’re running out of time it only make things more complicated.

This case made me curious to explore some of the other options mentioned above a little further. In a followup post I’ll share some of my findings.

The post A closer look at the MySQL ibdata1 disk space issue and big tables appeared first on MySQL Performance Blog.

How to use MySQL Global Transaction IDs (GTIDs) in production

Wed, 20/08/2014 - 18:00

Reconfiguring replication has always been a challenge with MySQL. Each time the replication topology has to be changed, the process is tedious and error-prone because finding the correct binlog position is not straightforward at all. Global Transaction IDs (GTIDs) introduced in MySQL 5.6 aim at solving this annoying issue.

The idea is quite simple: each transaction is associated with a unique identifier shared by all servers in a given replication topology. Now reconfiguring replication is easy as the correct binlog position can be automatically calculated by the server.

Awesome? Yes it is! However GTIDs are also changing a lot of things in how we can perform operations on replication. For instance, skipping transactions is a bit more difficult. Or you can get bitten by errant transactions, a concept that did not exist before.

This is why I will be presenting a webinar on Aug. 27 at 10 a.m. PDT: Using MySQL Global Transaction IDs in Production.

You will learn what you need to operate a replication cluster using GTIDs: how to monitor replication status or to recover from replication errors, tools that can help you and tools that you should avoid and also the main issues that can occur with GTIDs.

This webinar is free but you can register today to reserve your seat. And a recording will be available afterwards. See you next week!

The post How to use MySQL Global Transaction IDs (GTIDs) in production appeared first on MySQL Performance Blog.