Feed aggregator

MariaDB: Selective binary log events

MySQL Performance Blog - 6 hours 16 min ago

In the first post in a series on MariaDB features we find interesting, we begin with selectively skipping replication of binlog events. This feature is available on MariaDB 5.5 and 10.

By default when using MySQL’s standard replication, all events are logged in the binary log and those binary log events are replicated to all slaves (it’s possible to filter out some schemas). But with this feature, it’s also possible to prevent some events from being replicated to the slave(s) even though they are written in the binary log. Having those events in the binary logs is always useful for point-in-time recovery.

Indeed, usually when we don’t want an event to be replicated, we set sql_log_bin = 0 and the event is bypassed: neither written into the binlog nor replicated to the slave(s).

So with this new feature, it’s possible to just set a session variable to tag events that will be written into the binary log yet skipped on demand on some slaves.

And it’s really easy to use. On the master you do:

set skip_replication=1;

and on the slave(s) that have replicate_events_marked_for_skip set to 'FILTER_ON_MASTER' or 'FILTER_ON_SLAVE', the events tagged on the master won’t be replicated (see the sketch after the list of values below).

The valid values for replicate_events_marked_for_skip are:

  • REPLICATE (default) : skipped events are replicated on the slave
  • FILTER_ON_SLAVE : events so marked will be skipped on the slave and not replicated
  • FILTER_ON_MASTER : the filtering is done on the master, so the slave won’t even receive the event, saving network bandwidth
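
Putting both sides together, a minimal end-to-end sketch could look like this (the schema, table name and DELETE are purely illustrative, and as far as I know the slave variable can only be changed while the slave threads are stopped):

-- on the master: tag the statements you do not want on the filtering slave(s)
SET SESSION skip_replication=1;
DELETE FROM mydb.mytable WHERE created < '2014-01-01';
SET SESSION skip_replication=0;

-- on the slave that should ignore the tagged events
STOP SLAVE;
SET GLOBAL replicate_events_marked_for_skip='FILTER_ON_MASTER';
START SLAVE;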

That’s a cool feature, but when can it be very useful?

Use case:

For archiving this can be very interesting. Indeed, most of the time when people archive data, they use something like pt-archiver, which deletes the data and copies the removed rows to an archive server.

Thanks to this feature, instead of having an archiving server where we copy the deleted data, it’s possible to have a slave where we simply don’t delete the data. This is much faster (smarter?) and gives you an archiving server that is always up to date. Of course in this case sql_log_bin = 0 would have worked too (if we ignore point-in-time recovery).
But what about a Galera Cluster? Yes, that’s where this feature is really cool: if we had used sql_log_bin = 0 on a Galera Cluster node, all the other nodes would have ignored the delete and the result would be inconsistency between the nodes.

So if you use an asynchronous slave as an archiving server for a Galera Cluster, this feature is practically mandatory.

As illustrated below, you can have a MariaDB Galera Cluster node joining a Percona XtraDB Cluster that will be used to delete historical data using pt-archiver:

pt-archiver is started with --set-vars "skip_replication=1"

The post MariaDB: Selective binary log events appeared first on MySQL Performance Blog.

Percona Server 5.1.73-14.12 is now available

MySQL Performance Blog - Thu, 31/07/2014 - 23:52

Percona Server version 5.1.73-14.12

Percona is glad to announce the release of Percona Server 5.1.73-14.12 on July 31st, 2014 (Downloads are available here and from the Percona Software Repositories). Based on MySQL 5.1.73, including all the bug fixes in it, Percona Server 5.1.73-14.12 is now the current stable release in the 5.1 series. All of Percona’s software is open-source and free; all the details of the release can be found in the 5.1.73-14.12 milestone at Launchpad.

NOTE: Packages for Debian 7.0 (wheezy), Ubuntu 13.10 (saucy) and 14.04 (trusty) are not available for this release due to conflict with newer packages available in those releases.

Bugs Fixed:

  • Percona Server couldn’t be built with Bison 3.0. Bug fixed #1262439, upstream #71250.
  • Ignoring Query Cache Comments feature could cause server crash. Bug fixed #705688.
  • Database administrator password could be seen in plain text when debconf-get-selections was executed. Bug fixed #1018291.
  • If XtraDB variable innodb_dict_size was set, the server could attempt to remove a used index from the in-memory InnoDB data dictionary, resulting in a server crash. Bugs fixed #1250018 and #758788.
  • Ported a fix from MySQL 5.5 for upstream bug #71315 that could cause a server crash if a malformed GROUP_CONCAT function call was followed by another GROUP_CONCAT call. Bug fixed #1266980.
  • MTR tests from binary tarball didn’t work out of the box. Bug fixed #1158036.
  • InnoDB did not handle the cases of asynchronous and synchronous I/O requests completing partially or being interrupted. Bug fixed #1262500 (upstream #54430).
  • Percona Server version was reported incorrectly in Debian/Ubuntu packages. Bug fixed #1319670.
  • Percona Server source files were referencing Maatkit instead of Percona Toolkit. Bug fixed #1174779.
  • The XtraDB version number in univ.i was incorrect. Bug fixed #1277383.

Other bug fixes: #1272732, #1167486, and #1314568.

Release notes for Percona Server 5.1.73-14.12 are available in our online documentation. Bugs can be reported on the launchpad bug tracker.

The post Percona Server 5.1.73-14.12 is now available appeared first on MySQL Performance Blog.

Testing that all projects need

Stewart Smith - Thu, 31/07/2014 - 15:50

Today, I was reminded of a Jim Starkey quote on the Random Query Generator:

“The Colonoscopy of Database Software”
- Jim Starkey

If your project does not have something that you can adapt that quote to, odds are your testing is inadequate.

Examining the TokuDB MySQL storage engine file structure

MySQL Performance Blog - Thu, 31/07/2014 - 01:07

As we know different storage engines in MySQL have different file structures. Every table in MySQL 5.6 must have a .frm file in the database directory matching the table name. But where the rest of the data resides depends on the storage engine.

For MyISAM we have .MYI and .MYD files in the database directory (unless special settings are in place); for InnoDB we might have data stored in the single system tablespace (typically ibdata1 in the data directory) or as a file per table (or, better said, a file per partition), producing a single file with the .ibd extension for each table/partition. TokuDB as of this version (7.1.7) has its own innovative approach to storing the table contents.

I have created a table in the database test with the following structure:

CREATE TABLE `mytable` (
  `id` int(10) unsigned NOT NULL,
  `c` varchar(15) NOT NULL,
  `d` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `c` (`c`),
  KEY `d` (`d`,`c`),
  KEY `d_2` (`d`)
) ENGINE=TokuDB DEFAULT CHARSET=latin1

No files appear in the “test” database directory besides mytable.frm; however, a few files are created in the MySQL data directory itself:

-rwxrwx--x 1 mysql mysql 40960 Jul 29 21:01 _test_mytable_key_c_22f19b0_3_19_B_0.tokudb
-rwxrwx--x 1 mysql mysql 16896 Jul 29 21:02 _test_mytable_key_d_2_22f223b_3_19_B_0.tokudb
-rwxrwx--x 1 mysql mysql 16896 Jul 29 21:01 _test_mytable_key_d_22f1c9a_3_19_B_0.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:01 _test_mytable_main_22f1818_2_19.tokudb
-rwxrwx--x 1 mysql mysql 65536 Jul 29 21:02 _test_mytable_status_22f1818_1_19.tokudb

As you can see the table is represented by a series of files – the “status” file, the “main” file which contains the clustered fractal tree index (primary key), plus each secondary index stored in its own file. Note how the files are named – they include the database name, the table name and the key name (the name you give to the key, not the columns involved). This is followed by something like “22f1818_1_19”, which I assume is some kind of internal TokuDB object identifier.
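
If you would rather not parse these file names by hand, TokuDB also exposes (if I recall correctly for the 7.x releases) a file map table in information_schema that ties each dictionary to its file on disk – treat the table and column names below as an assumption and verify them on your version:

-- maps TokuDB dictionaries (tables/indexes) to the .tokudb files on disk
SELECT dictionary_name, internal_file_name
FROM information_schema.TokuDB_file_map;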

Note also that (at least on my system) the files are created with the executable bit set. I see no reason for this, and it is probably just a minor bug.

Another minor bug (or intended design limitation?) seems to be that TokuDB might lose the actual table name in its file names when you alter the table. For example, as I altered the table to drop one of the keys and add another one named “superkey”, I see the “mytable” name replaced with “sql_390c_247”, which looks very much like the temporary table that was used to rebuild the table:

-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:15 _test_sql_390c_247_key_c_22f6f7d_3_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:15 _test_sql_390c_247_key_d_22f6f7d_5_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:15 _test_sql_390c_247_key_superkey_22f6f7d_4_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:14 _test_sql_390c_247_main_22f6f7d_2_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:14 _test_sql_390c_247_status_22f6f7d_1_19.tokudb

I like the approach of storing different indexes in different files, as this makes it much easier to drop an index and potentially allows placing indexes on different storage if that is desired for some reason. However, putting all the tables in the data directory root is a bad idea – having a substantial number of tables, each with a few indexes and thus producing a large number of files, makes it inconvenient to work with the data directory (which often contains other files – log files, binary logs, etc.), plus it might push file systems to their limits, or to their performance limits, when dealing with huge numbers of files in a single directory.

Many also like having files in the database directory as it allows, in basic configurations, the use of simple Unix tools such as du to see how much space a given database physically takes.

Same as InnoDB and MyISAM, TokuDB will create a separate set of files for each partition, placing them in the same directory. Having the same table partitioned by HASH on the primary key with 4 partitions (the statement is sketched after the listing), I observe:

-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p0_key_c_22f9f35_3_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p0_key_d_22f9f35_5_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p0_key_superkey_22f9f35_4_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p0_main_22f9f35_2_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p0_status_22f9f35_1_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p1_key_c_22f9f8e_3_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p1_key_d_22f9f8e_5_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p1_key_superkey_22f9f8e_4_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p1_main_22f9f8e_2_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p1_status_22f9f8e_1_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p2_key_c_22f9fe1_3_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p2_key_d_22f9fe1_5_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p2_key_superkey_22f9fe1_4_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p2_main_22f9fe1_2_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p2_status_22f9fe1_1_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p3_key_c_22f9ffb_3_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p3_key_d_22f9ffb_5_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p3_key_superkey_22f9ffb_4_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p3_main_22f9ffb_2_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p3_status_22f9ffb_1_19.tokudb

As you can see, “P_p0” through “P_p3” suffixes are added to each of the files.
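
For reference, the 4-way HASH partitioning above can be produced with a statement along these lines (a sketch – the post does not quote the exact command, so the syntax below is only illustrative):

-- partition the existing table by hash of the primary key into 4 partitions
ALTER TABLE mytable
    PARTITION BY HASH (id)
    PARTITIONS 4;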

What other files exist beyond those which come from TokuDB tables?

There are a few system files:

-rwxrwx--x 1 mysql mysql   32768 Jul 29 21:16 tokudb.directory
-rwxrwx--x 1 mysql mysql   16384 Jul 17 19:09 tokudb.environment
-rwxrwx--x 1 mysql mysql 1048576 Jul 29 21:22 tokudb.rollback

As their names suggest, tokudb.directory contains the “directory” – metadata about tables and indexes in the system, tokudb.rollback contains data which is needed for transaction rollback, and tokudb.environment contains some kind of information about the environment. I did not dig into this – I just see that the environment file is not changed after I have started the instance, unlike the other two, which are changed when the table structure is changed (tokudb.directory) or when the database is modified (tokudb.rollback).

Next you will see the lock files:

-rw------- 1 mysql mysql 0 Jul 17 19:09 __tokudb_lock_dont_delete_me_data
-rw------- 1 mysql mysql 0 Jul 17 19:09 __tokudb_lock_dont_delete_me_environment
-rw------- 1 mysql mysql 0 Jul 17 19:09 __tokudb_lock_dont_delete_me_logs
-rw------- 1 mysql mysql 0 Jul 17 19:09 __tokudb_lock_dont_delete_me_recovery
-rw------- 1 mysql mysql 0 Jul 17 19:09 __tokudb_lock_dont_delete_me_temp

I see TokuDB really tries to lock the database instance, preventing concurrent access to the directory, with multiple files. I think InnoDB does it in a cleaner way by placing the lock on the system tablespace, which does not require extra files… though the TokuDB team might have some specific reason to do it this way. Maybe Oracle holds a software patent on preventing concurrent database operation by locking a file?

Finally there is the transaction log file:

-rwx------ 1 mysql mysql 37499084 Jul 29 21:34 log000000002593.tokulog25

TokuDB transaction log files are not pre-allocated like InnoDB’s, but they look more similar to MySQL binary logs – they have sequentially increasing file numbers which increment as new files are created, and the file itself grows as new data is written to the log. As I understand it, log rotation happens during checkpoints, and you would typically see only one log file.

There is a lot more for me to learn when it comes to the TokuDB file layout and the purpose of individual files, yet I hope this provides you with a good basic overview of the TokuDB MySQL storage engine.

Further Reading: Rich Prohaska from TokuDB team was kind enough to share some more good reading with me, for those extra curious: TokuDB Files and File Descriptors, Separate Database Directories Design

The post Examining the TokuDB MySQL storage engine file structure appeared first on MySQL Performance Blog.

Prevent MySQL downtime: Set max_user_connections

MySQL Performance Blog - Tue, 29/07/2014 - 22:05

One of the common causes of downtime with MySQL is running out of connections. Have you ever seen this error? “ERROR 1040 (00000): Too many connections.” If you’ve been working with MySQL long enough you surely have. This is quite a nasty error as it might cause complete downtime… or transient errors, with successful transactions mixed with failing ones, or only some processes failing to run properly, causing various kinds of effects if not monitored properly.

There are a number of causes for running out of connections, the most common being the Web/App server creating an unexpectedly large number of connections due to a misconfiguration, or some script/application leaking connections or creating too many connections in error.
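
Before touching any limits, it usually pays to find out where the connections are actually coming from. One quick way to do that from a MySQL prompt (the grouping below is just one possible breakdown):

-- count current connections per account and client host
SELECT user, SUBSTRING_INDEX(host, ':', 1) AS client_host, COUNT(*) AS connections
FROM information_schema.processlist
GROUP BY user, client_host
ORDER BY connections DESC;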

The solution I see some people employ is just to increase max_connections to some very high number so MySQL “never” runs out of connections. This however can cause resource utilization problems – if a large number of connections become truly active it may use a lot of memory and cause the MySQL server to swap or be killed by the OOM killer, or cause very poor performance due to high contention.

There is a better solution: use different user accounts for different scripts and applications and implement resource limiting for them. Specifically set max_user_connections:

mysql> GRANT USAGE ON *.* TO 'batchjob1'@'localhost'
    -> WITH MAX_USER_CONNECTIONS 10;

This approach (available since MySQL 5.0) has multiple benefits:

  • Security – different user accounts with only the required permissions make your system safer from development errors and more secure from intruders.
  • Preventing Running out of Connections – if there is a bug or misconfiguration, the application/script will of course run out of connections, but it will be the only part of the system affected and all other applications will be able to use the database normally.
  • Overload Protection – limiting the number of connections limits how many queries you can run concurrently. Too much concurrency is often a cause of downtime, and limiting it can reduce the impact of unexpected heavy queries run concurrently by the application.

In addition to configuring max_user_connections for given accounts you can set it globally in my.cnf as “max_user_connections=20.” This is too coarse though in my opinion – you’re most likely going to need a different number for different applications/scripts. Where max_user_connections is most helpful is in multi-tenant environments with many equivalent users sharing the system.
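
For example, you might give each application its own account with a limit sized to its real needs (the account names, hosts and numbers below are purely illustrative):

GRANT USAGE ON *.* TO 'webapp'@'10.0.%' WITH MAX_USER_CONNECTIONS 150;
GRANT USAGE ON *.* TO 'reporting'@'10.0.%' WITH MAX_USER_CONNECTIONS 10;
GRANT USAGE ON *.* TO 'batchjob1'@'localhost' WITH MAX_USER_CONNECTIONS 5;

The effective per-account limit is stored in the max_user_connections column of the mysql.user table, so it is easy to audit what is currently configured.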

The post Prevent MySQL downtime: Set max_user_connections appeared first on MySQL Performance Blog.

AWS 2Q14 and why the sky is not falling

CloudPundit: Massive-Scale Computing - Mon, 28/07/2014 - 22:39

Amazon posted weaker 2Q 2014 results for Amazon Web Services, leading some to speculate about competitive pressures despite continued enormous growth in usage.

Investors are asking, probably reasonably, what’s going on here. Is the overall market for cloud computing weakening? Are competitors taking more market share? Or is this largely the result of the price cuts that went into effect at the beginning of 2Q? And if it’s the price cuts, are these price cuts temporary, or are they part of a larger trend that eventually drives the price to zero?

The TL;DR version: No, cloud growth remains tremendous. No, AWS’s market share likely continues to grow despite the fact that they’re already the dominant player. Yes, this is a result of the price cuts. No, the price cuts are permanent, and yes, cuts will eventually likely drive prices down to near-cost, but this is nevertheless not a commodity market.

The deep dive follows. Note that when I use the term “market share”, I do not mean revenue; I mean revenue-generating capacity-in-use, which controls for the fact that prices vary significantly across providers.

What’s with these price drops?

I have said repeatedly in the past that the dynamics of the cloud IaaS market (which is increasingly also convergent with the high-control PaaS market) are fundamentally those of a software market. It is a market in which customers are ultimately buying IT operations management software as a service, and as they go up-stack, middleware as a service. Yes, they are also getting the underlying compute, storage, and networking resources, but the major value is actually in all the software that automates this stuff and delivers it as an easy-to-consume, less-effort-to-manage solution. The automation reduces a customer’s labor costs, which is where the customer sees cost-savings in services over doing it themselves.

You’ll note that in other SaaS markets, the infrastructure is effectively delivered at cost. That’s extremely likely to be true of the IaaS market over time, as well. Furthermore, like other software markets, the largest vendors with the greatest breadth and depth of capabilities are the winners, and they’re made extra-sticky by their ecosystems. (Think Oracle.) Over time, the IaaS providers will make most of their margin off higher-level services rather than the raw resources; think about the difference between what a customer pays for Redshift data warehousing, versus raw EC2 compute plus EBS storage.

The magnitude of the price drops is scaring away many rival providers. So is the pace of innovation. Few competitors have the deep pockets or willpower to pour money into engineering and data centers in order to compete at this level. Most are now scurrying to stake out a niche of the market where they believe they can differentiate.

Lowering the price expands the addressable market, as well. The cheaper it is to do it in the cloud, the more difficult it is to make a business case to do an on-premises solution, especially a private cloud. Many of Gartner’s clients tell us that even if they have a financially viable case to build a private cloud right now, their costs will be essentially static over the amortization period of 3 to 5 years — versus their expectation that the major IaaS providers will drop prices 30% every year. Time-to-value for private cloud is generally 18 to 24 months, and it typically delivers a much more limited set of features, especially where developer enablement is concerned. It’s tough for internal IT to compete, especially when the major IT vendors aren’t delivering software that allows IT to create equivalent capabilities at the speed of an AWS, Microsoft, or Google.

The size of the price cut certainly had a negative impact on AWS’s revenues this past quarter. The price cut is likely larger than AWS would have done without pressure from Google. At the same time, the slight dip in revenue, versus the magnitude of the cuts, makes it clear that AWS is still growing at a staggeringly fast pace.

The Impact of Microsoft Azure

Microsoft has certainly been on an aggressive tear in the market over the last year, especially since September 2013, and Azure has been growing impressively. Microsoft has been extremely generous with both discounts and with free credits, which has helped fuel the growth of Azure usage. But interestingly, Microsoft’s proactive evangelism to their customers has helped expand the market, particularly because Microsoft has been good at convincing mainstream adopters that they’re actually late adopters, spurring companies to action. Microsoft’s comprehensive hybrid story, which spans applications and platforms as well as infrastructure, is highly attractive to many companies, drawing them towards the cloud in general.

Some customers who have been pitched Azure will actually go to Azure. But for many, this is a trigger to look at several providers. It’s not unusual for customers to then choose AWS over Azure as the primary provider, due to AWS’s better feature set, greater maturity, and better ecosystem; those customers will probably also do things in Azure, although they may be earlier-stage and smaller projects. Of course, many existing AWS customers are also adding Azure as a secondary provider. But it may well be that Microsoft’s serious entry into this market has actually helped AWS, in terms of absolute usage gains, more than it has hurt it — for the time being, of course. Over time, Microsoft’s share gains will come at AWS’s expense.

The Impact of Google

Google has been delivering technology at an impressive pace, but there is an enormous gulf between its technology capabilities and its go-to-market prowess. They also have to overcome the perception problem that people aren’t sure if Google is serious about this market, and are therefore reluctant to commit to the platform.

While everyone is super-interested in what Google is doing, at the moment they do not seem to be winning significant customers from AWS. They’re doing reasonably well in batch-computing scenarios (more as a rival to AWS spot instances), and in scenarios where all of Google Cloud Platform, including App Engine, is of interest. But they do not seem to have really gotten market momentum, yet.

However, in this market and in many other markets, a competitor does not need to win over the market leader in order to hurt them. They merely need to change customer expectations of what the price should be — and AWS has allowed Google to set the price. As long as Google seems like a credible threat, AWS will likely continue to be competitive on price, at least for those things that Google also offers. Microsoft has already publicly pledged to be price-competitive with AWS, so that pretty much guarantees that the three providers will move in lockstep on pricing.

The Impact of Smaller Providers

Digital Ocean, in particular, is making a lot of noise in the market. They’ve clearly established themselves as a VPS provider to watch. (VPS: virtual private server, a form of mass-market hosting.) However, my expectation is that Digital Ocean’s growth comes primarily at the expense of providers like Rackspace and Media Temple (owned by GoDaddy), rather than at the expense of AWS et al. A Digital Ocean droplet (VM) is half the price of an AWS t2.micro (the cheapest thing you can get from AWS), and they’ve been generous with free-service coupons.

VPS providers have friendly control panels, services that are simple and thus easy to use, and super-low prices; they tend to have a huge customer base, but each customer is, on the average, tiny (usually just a single VM). These days, VPS is looking more and more like cloud IaaS, but the orientation of the providers tends to be quite different, both from a technology and a go-to-market perspective. There’s increasing fluidity between the VPS and cloud IaaS market, but the impact on AWS is almost certainly minimal. (I imagine that some not-insignificant percentage of AWS’s free tier would be on VPS instead if not for the free service, though.)

There’s plenty of other noise out there. IBM is aggressively pitching SoftLayer to its customer base, but the deals I’ve seen have generally been bare metal on long-term contracts, usually as a replatform of an IBM data center outsourcing deal. Rackspace’s return to its managed hosting roots is revitalizing. CenturyLink continues to have persuasive sales. VMware customers remain curious about vCHS. And so on. But none of these providers are growing the way that AWS and Microsoft Azure are, and both providers are gaining overall market share at the expense of pretty much everyone else.

The sky is not falling

With Microsoft and Google apparently now serious about this market, AWS finally has credible competitors. Having aggressive, innovative rivals is helping to push this market forward even faster, to the detriment of most other providers in the market, as well as the IT vendors selling do-it-yourself on-premises private cloud. AWS is likely to continue to dominate this market for years, but the market direction is no longer as thoroughly in its control.


What I learned while migrating a customer MySQL installation to Amazon RDS

MySQL Performance Blog - Mon, 28/07/2014 - 21:00

Hi, I recently had the experience of assisting with a migration of a customer MySQL installation to Amazon RDS (Relational Database Service). Amazon RDS is a great platform for hosting your MySQL installation and offers the following list of pros and cons:

  • You can scale your CPU, IOPS, and storage space separately by using Amazon RDS. Otherwise you need to take downtime and upgrade physical components of a rack-mounted server.
  • Backups, software version patching, failure detection, and (some) recovery is automated with Amazon RDS.
  • You lose shell access to your DB instance
  • You lose the SUPER privilege for regular users. Many SUPER-type statements and commands are instead provided as stored procedures.
  • It is easy to set up multiple read replicas (slaves in READ_ONLY=1 mode).
  • You can set up a secondary synchronous instance for failover in the event your primary instance fails.

While this article is written to be Amazon RDS-specific it also has implications for any sort of migration.

  1. The only way to interface with RDS is through a MySQL client connection, which means loading data must be done using SQL. This means you need to use mysqldump or mydumper, which can be a large endeavour should your dataset be north of 500GB — this is a lot of single-threaded activity!  Think not only about how long dumping and loading will take, but also factor in how much time it will take for replication to catch up on the hours/days/weeks your dumping and loading procedure took.  You might need to allocate more disk space and Provisioned IOPS to your RDS node in order to improve disk throughput, along with a change to innodb_flush_log_at_trx_commit and sync_binlog.
  2. RDS is set to UTC (system_time_zone=UTC) and this cannot be changed, as in Parameter Groups you will see that default_time_zone is set as Modifiable=false. This can bite you if you are planning to use RDS as a slave for a short while before failing the application over to Amazon RDS.  If you have configured binlog_format=STATEMENT on your master and you have TIMESTAMP columns, this will lead to differences in the RDS data set for absolute values '2014-07-24 10:15:00' vs NOW(). It is also a concern for the developer who may not be explicitly declaring a time zone when opening MySQL connections. Often the best piece of advice is to leave all database data in UTC no matter where the server is physically located, and deal with localization at the presentation layer.
  3. Amazon RDS by default has max_allowed_packet=1MB. This is pretty low as most other configs are 16MB, so if you’re using extended-insert (by default, you are), the size of each INSERT statement can approach 16MB and thus lead to “packet too big” errors on the Amazon RDS side, failing the import.
  4. Amazon RDS does not support the SUPER privilege for regular users. This becomes quite a challenge as many tools (Percona Toolkit, for example) are written assuming you have SUPER-level access on all nodes — simple tasks become vastly more complicated as you need to think of clever workarounds (I’m looking at you, pt-table-sync!).
  5. Triggers and views thus cannot be applied using the default mysqldump syntax, which includes SQL DEFINER entries — these lines are there so that a user with SUPER can “grant” another user the ability to execute the trigger/view. Your load will fail if you forget this.
  6. Consider running your load with --force to the mysql client, and log stderr/stdout to disk so you can review errors later. It is painful to spend 4 days loading a 500GB database only to have it fail partway through because you forgot about the SQL DEFINER issue.
  7. Consider splitting the mysqldump into two phases: --no-data so you dump the schema only, and then --no-create-info so you get just the rows. This way you can isolate faults and solve them along the way.
  8. Skipping replication events is SLOW. You don’t have the ability to use sql_slave_skip_counter (since this requires SUPER); instead you need to use an Amazon RDS function, mysql.rds_skip_repl_error (sketched after this list). Sadly this stored procedure takes no argument and thus only skips one event at a time. It takes about 2-3 seconds for each execution, so if you have a lot of events to skip, that’s a problem. Having to skip ANYTHING is an indication that something went wrong in the process, so if you find yourself in the unenviable position of skipping events, know that pt-table-checksum should be able to give you an idea of how widespread the data divergence issue is.
  9. pt-table-sync doesn’t work against Amazon RDS as it is written to expect SUPER because it wants to set binlog_format=STATEMENT in the session, but that’s not allowed. Kenny Gryp hacked me a version to just skip this check, and Kenny also reported it for inclusion in a future release of Percona Toolkit, but in the meantime you need to work around the lack of the SUPER privilege.
  10. pt-table-sync is SLOW against RDS. As pt-table-sync doesn’t log a lot of detail about where time is spent, I haven’t completely isolated the source of the latency, but I suspect this is more about network round trip than anything else.
  11. innodb_log_file_size is hardcoded to 128MB in Amazon RDS and you can’t change this.  innodb_log_files_in_group does not even show up in the Parameter Groups view, but SHOW GLOBAL VARIABLES reports it as 2. So you’re cookin’ on 256MB; if your writes are heavy, this may become a bottleneck with little workaround available in MySQL.
  12. CHANGE MASTER isn’t available in RDS. You define RDS as a slave by calling a stored procedure where you pass the appropriate options, such as CALL mysql.rds_set_external_master (see the sketch after this list).
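
For reference, here is a rough sketch of the slave-side calls on Amazon RDS. The host, credentials and binlog coordinates are placeholders, and the exact argument list of these procedures may differ between RDS versions, so check the current AWS documentation before relying on it:

-- point the RDS instance at the external master (placeholder values)
CALL mysql.rds_set_external_master('master.example.com', 3306,
    'repl_user', 'repl_password', 'mysql-bin.000123', 4, 0);
CALL mysql.rds_start_replication;

-- if replication breaks and you decide to skip the offending event
-- (one event per call, as noted above)
CALL mysql.rds_skip_repl_error;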

For those of you wondering about the SUPER privilege, I was fortunate that Bill Karwin from Percona’s Support team took the time to review my post and suggested I dig into this deeper. It turns out that Amazon didn’t hack MySQL to remove the SUPER privilege, but instead runs the stored procedures with a security_type of DEFINER:

mysql> select db,name,type,language,security_type,definer from proc where name = 'rds_external_master'\G
*************************** 1. row ***************************
           db: mysql
         name: rds_external_master
         type: PROCEDURE
     language: SQL
security_type: DEFINER
      definer: rdsadmin@localhost
1 row in set (0.08 sec)

mysql> show grants for 'rdsadmin'@'localhost';
+------------------------------------------------------------------------------------------------------+
| Grants for rdsadmin@localhost                                                                          |
+------------------------------------------------------------------------------------------------------+
| GRANT ALL PRIVILEGES ON *.* TO 'rdsadmin'@'localhost' IDENTIFIED BY PASSWORD 'XXX' WITH GRANT OPTION  |
+------------------------------------------------------------------------------------------------------+
1 row in set (0.07 sec)

So for those of you working with Amazon RDS, I hope that this list saves you some time and helps out with your migration!  If you get stuck you can always contact Percona Consulting for assistance.

The post What I learned while migrating a customer MySQL installation to Amazon RDS appeared first on MySQL Performance Blog.

Monitoring MySQL flow control in Percona XtraDB Cluster 5.6

MySQL Performance Blog - Sat, 26/07/2014 - 00:41

Monitoring flow control in a Galera cluster is very important. If you do not, you will not understand why writes may sometimes be stalled. Percona XtraDB Cluster 5.6 provides 2 status variables for such monitoring: wsrep_flow_control_paused and wsrep_flow_control_paused_ns. Which one should you use?

What is flow control?

Flow control does not exist with regular MySQL replication, only with Galera replication. It is simply the mechanism nodes use when they are not able to keep up with the write load: to keep replication synchronous, the node that is starting to lag instructs the other nodes that writes should be paused for some time so it does not fall too far behind.

If you are not familiar with this notion, you should read this blogpost.

Triggering flow control and graphing it

For this test, we’ll use a 3-node Percona XtraDB Cluster 5.6 cluster. On node 3, we will adjust gcs.fc_limit so that flow control is triggered very quickly and then we will lock the node:

pxc3> set global wsrep_provider_options="gcs.fc_limit=1";
pxc3> flush tables with read lock;

Now we will use sysbench to insert rows on node 1:

$ sysbench --test=oltp --oltp-table-size=50000 --mysql-user=root --mysql-socket=/tmp/pxc1.sock prepare

Because of flow control, writes will be stalled and sysbench will hang. So after some time, we will release the lock on node 3:

pxc3> unlock tables;

During the whole process, wsrep_flow_control_paused and wsrep_flow_control_paused_ns are recorded every second with mysqladmin ext -i1. We can then build a graph of the evolution of both variables:

While we can clearly see when flow control was triggered on both graphs, it is much easier to tell when flow control stopped with wsrep_flow_control_paused_ns. It would be even more obvious if we had had several time frames when flow control was in effect.
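
If you just want to sample the counters from a SQL session instead of mysqladmin, a simple statement such as the following returns both variables at once:

SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_paused%';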

Conclusion

Monitoring a server is obviously necessary if you want to be able to catch issues. But you need to look at the right metrics. So don’t be scared if you are seeing that wsrep_flow_control_paused is not 0: it simply means that flow control has been triggered at some point since the server started up. If you want to know what is happening right now, prefer wsrep_flow_control_paused_ns.

The post Monitoring MySQL flow control in Percona XtraDB Cluster 5.6 appeared first on MySQL Performance Blog.

Putting MySQL Fabric to Use: July 30 webinar

MySQL Performance Blog - Fri, 25/07/2014 - 08:13

Martin and I have recently been blogging together about MySQL Fabric (in case you’ve missed this, you can find the first post of the series here), and on July 30th, we’re going to be presenting a webinar on this topic titled “Putting MySQL Fabric to Use.”

The focus of the webinar is to help you get started quickly on this technology, so we’ll include very few slides (mostly just a diagram or two) and then jump straight into shared screen mode, with lots of live console and source code examples.

In order to make the best use of time, we won’t show you how to install and configure MySQL Fabric. However, we can point you to a few resources to help you get ready and even follow our examples as we go:

  • The official manual is an obvious starting point
  • Our second post in the series includes configuration instructions
  • This git repo contains the test environment we’ll use to run our demos. Specifically, we’ll use the sharding branch, so if you intend to follow our examples as we go, we recommend checking that one out.

If you’re interested, you can register for this webinar here, and if there’s something specific you’d like to see (we had a request for PHP examples in the comments to our last post) feel free to post that as a comment. We can’t promise we’ll be able to cover all requests during the webinar, but we’ll incorporate examples to the repo as time allows.

Hope to see you then!

The post Putting MySQL Fabric to Use: July 30 webinar appeared first on MySQL Performance Blog.

DBaaS, OpenStack and Trove 101: Introduction to the basics

MySQL Performance Blog - Thu, 24/07/2014 - 17:00

We’ll be publishing a series of posts on OpenStack and Trove over the next few weeks, diving into their usage and purpose. For readers who are already familiar with these technologies, there should be no doubt as to why we are incredibly excited about them, but for those who aren’t, consider this a small introduction to the basics and concepts.

What is Database as a Service (DBaaS)?
In a nutshell, DBaaS – as it is frequently referred to – is a loose moniker for the concept of providing a managed cloud-based database environment accessible by users, applications or developers. Its aim is to provide a full-fledged database environment, while minimizing the administrative turmoil and pains of managing the surrounding infrastructure.

Real life example: Imagine you are working on a new application that has to be accessible from multiple regions. Building and maintaining a large multiregion setup can be very expensive. Furthermore, it introduces additional complexity and strain on your system engineers once timezones start to come into play. The challenge of having to manage machines in multiple datacenters won’t simplify your release cycle, nor increase your engineers’ happiness.

Let’s take a look at some of the questions DBaaS could answer in a situation like this:

- How do I need to size my machines, and where should I locate them?
Small environments require less computing power and can be a good starting point, although this also means they may not be as well-prepared for future growth. Buying larger-scale and more expensive hardware and hosting can be very expensive and can be a big stumbling block for a brand new development project. Hosting machines in multiple DCs could also introduce administrative difficulties, like having different SLAs and potential issues setting up WAN or VPN communications. DBaaS introduces an abstraction layer, so these considerations aren’t yours, but those of the company offering it, while you get to reap all the rewards.

- Who will manage my environment from an operational standpoint?
Staffing considerations and taking on the knowledge required to properly maintain a production database are often either temporarily swept under the rug or, when the situation turns out badly, a cause for the untimely demise of quite a few young projects. Rather than think about how long ago you should have applied that security patch, wouldn’t it be nice to just focus on managing the data itself, and be otherwise confident that the layers beneath it are managed responsibly?

- Have a sudden need to scale out?
Once you’re up and running, enjoying the success of a growing user base, your environment will need to scale accordingly. Rather than think long and hard about the many options available, as well as the logistics attached to those changes, your DBaaS provider could handle this transparently.

Popular public options: Here are a few names of public services you may have come across already that fall under the DBaaS moniker:

- Amazon RDS
- Rackspace cloud databases
- Microsoft SQLAzure
- Heroku
- Clustrix DBaaS

What differentiates these services from a standard remote database is the abstraction layer that fully automates their backend, while still offering an environment that is familiar to what your development team is used to (be it MySQL, MongoDB, Microsoft SQLServer, or otherwise). A big tradeoff to using these services is that you are effectively trusting an external company with all of your data, which might make your legal team a bit nervous.

Private cloud options?
What if you could offer your team the best of both worlds? Or even provide a similar type of service to your own customers? Over the years, a lot of platforms have been popping up to allow effective management and automation of virtual environments such as these, allowing you to effectively “roll your own” DBaaS. To get there, there are two important layers to consider:

  • Infrastructure Management, also referred to as Infrastructure-as-a-Service (IaaS), focusing on the logistics of spinning up virtual machines and keeping their required software packages running.
  • Database Management, previously referred to as DBaaS, transparently coordinating multiple database instances to work together and present themselves as a single, coherent data repository.

Examples of IaaS products:
- OpenStack
- OpenQRM

Example of DBaaS:
- Trove

Main Advantages of DBaaS
For reference, the main reasons why you might want to consider using an existing DBaaS are as follows:

- Reduced Database management costs

DBaaS reduces the amount of maintenance you need to perform on isolated DB instances. You offload the system administration of hardware, OS and database to either a dedicated service provider or, in the case where you are rolling your own, allow your database team to more efficiently manage and scale the platform (public vs private DBaaS).

- Simplifies certain security aspects

If you are opting to use a DBaaS platform, the responsibility of worrying about this or that patch being applied falls to your service provider, and you can generally assume that they’ll keep your platform secure from the software perspective.

- Centralized management

One system to rule them all. A guarantee of no nasty surprises concerning that one ancient server that should have been replaced years ago, but you never got around to it. As a user of DBaaS, all you need to worry about is how you interface with the database itself.

- Easy provisioning

Scaling of the environment happens transparently, with minimal additional management.

- Choice of backends

Typically, DBaaS providers offer you the choice of a multitude of database flavors, so you can mix and match according to your needs.

Main Disadvantages
- Reduced visibility of the backend

Releasing control of the backend requires a good amount of trust in your DBaaS provider. There is limited or no visibility into how backups are run and maintained, which configuration modifications are applied, or even when and which updates will be implemented. Just as you offload your responsibilities, you in turn need to rely on an SLA contract.

- Potentially harder to recover from catastrophic failures

Similarly to the above, unless your service providers have maintained thorough backups on your behalf, the lack of direct access to the host machines means that it could be much harder to recover from database failure.

- Reduced performance for specific applications

There’s a good chance that you are working on a shared environment. This means the amount of workload-specific performance tuning options is limited.

- Privacy and Security concerns

Although it is much easier to maintain and patch your environment, having a centralized system also means you’re more prone to potential attacks targeting your dataset. Whichever provider you go with, make sure you are intimately aware of the measures they take to protect you from that, and what is expected from your side to help keep it safe.

Conclusion: While DBaaS is an interesting concept that introduces a completely new way of approaching an application’s database infrastructure, and can bring enterprises easily scalable and financially flexible platforms, it should not be considered a silver bullet. Some big tradeoffs need to be considered carefully from the business perspective, and any move there should be accompanied by careful planning and investigation of options.

Embracing the immense flexibility these platforms offer, though, opens up a lot of interesting perspectives too. More and more companies are looking at ways to roll their own “as-a-Service”, provisioning completely automated hosted platforms for customers on-demand, and abstracting their management layers to allow them to be serviced by smaller, highly focused technical teams.

Stay tuned: Over the next few weeks we’ll be publishing a series of posts focusing on the combination of two technologies that allow for this type of flexibility: OpenStack and Trove.

The post DBaaS, OpenStack and Trove 101: Introduction to the basics appeared first on MySQL Performance Blog.

Why TokuDB hates Transparent HugePages

MySQL Performance Blog - Wed, 23/07/2014 - 20:00

If you try to install the TokuDB storage engine on a modern Linux distribution it might fail with the following error message:

2014-07-17 19:02:55 13865 [ERROR] TokuDB will not run with transparent huge pages enabled.
2014-07-17 19:02:55 13865 [ERROR] Please disable them to continue.
2014-07-17 19:02:55 13865 [ERROR] (echo never > /sys/kernel/mm/transparent_hugepage/enabled)

You might be curious why TokuDB refuses to start with Transparent HugePages. Are they not a good thing… allowing smaller kernel page tables and fewer TLB misses when accessing data in the buffer pool? I was curious, so I asked Tim Callaghan this very question.

This problem originates with TokuDB using the jemalloc memory allocator, which uses a particular trick to deal with memory fragmentation. The classical problem with memory allocators is fragmentation – if you allocate, say, a 2MB chunk from the operating system (typically using mmap), it is likely that as the process runs some of that 2MB memory block will become free, but not all of it, hence it can’t be given back to the operating system completely. jemalloc uses a clever trick to be able to give back portions of memory allocated in such a way, through the madvise(…, MADV_DONTNEED) call.

Now what happens when you use transparent huge pages? In this case the operating system (and CPU, really) works with pages of a much larger size, which can only be unmapped from the address space in their entirety – which does not work when smaller objects are freed, producing smaller free “holes.”

As a result, without being able to free memory efficiently, the amount of allocated memory may grow unbounded until the process starts to swap out – and in the end is killed by the “out of memory” killer, at least under some workloads. This is not the behavior you want to see from a database server. As such, requiring huge pages to be disabled is the better choice.

Having said that, this is a pretty crude requirement/solution – disabling huge pages for the complete operating system image to make one application work, while others might be negatively impacted. I hope that with future jemalloc versions/kernel releases there will be a solution where jemalloc simply prevents huge page usage for its allocations.

Using jemalloc and its approach of removing pages from the resident space also makes TokuDB look a lot different from typical MySQL instances running InnoDB. With InnoDB, VSZ and RSS are often close. In fact we often monitor VSZ to ensure it is not excessively large, to avoid the danger of the process starting to swap actively or being killed by the OOM killer. TokuDB, however, can often look like this:

[root@smt1 mysql]# ps aux | grep mysqld
mysql 14604 21.8 50.6 12922416 4083016 pts/0 Sl Jul17 1453:27 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --log-error=/var/lib/mysql/smt1.pz.percona.com.err --pid-file=/var/lib/mysql/smt1.pz.percona.com.pid
root 28937 0.0 0.0 103244 852 pts/2 S+ 10:38 0:00 grep mysqld

In this case TokuDB is run with defaults on an 8GB system – it takes approximately 50% of memory in terms of RSS size, however the VSZ of the process is over 12GB – a lot more than the memory available.

This is completely fine for TokuDB. If I did not have Transparent HugePages disabled, though, my RSS would be a lot closer to VSZ, causing intense swapping or even the process being killed by the OOM killer.

In addition to explaining this to me, Tim Callaghan was also kind enough to share some links on this issue from other companies such as Oracle, NuoDB, Splunk, SAP, SAP(2), which provide more background information on this topic.

The post Why TokuDB hates Transparent HugePages appeared first on MySQL Performance Blog.

Reference architecture for a write-intensive MySQL deployment

MySQL Performance Blog - Wed, 23/07/2014 - 00:31

We designed Percona Cloud Tools (both hardware and software setup) to handle a very high-intensive MySQL write workload. For example, we already observe inserts of 1bln+ datapoints per day. So I wanted to share what kind of hardware we use to achieve this result.

Let me describe what we use, and later I will explain why.

Server:

  • Chassis: Supermicro SC825TQ-R740LPB 2U Rackmount Chassis
  • Motherboard: Supermicro X9DRI-F dual socket
  • CPU: Dual Intel Xeon Ivy Bridge E5-2643v2 (6x 3.5Ghz cores, 12x HT cores, 25M L3)
  • Memory: 256GB (16x 16GB 256-bit quad-channel) ECC registered DDR3-1600
  • Raid: LSI MegaRAID 9260-4i 4-port 6G/s hardware RAID controller, 512M buffer
  • MainStorage: PCIe SSD HGST FlashMAX II 4.8TB
  • Secondary Storage (OS, logs): RAID 1 over 2x 3TB hard drives

Software:

When selecting hardware for your application, you need to look at many aspects – typically you’re looking for a solution which you already have experience working with and which has also proved to be the most efficient option. For us it has been as follows:

Cloud vs Bare Metal
We have experience having hardware hosted at the data center as well as cash for upfront investments in hardware so we decided to go for physical self-hosted hardware instead of the cloud. Going this route also gave us maximum flexibility in choosing a hardware setup that was the most optimal for our application rather than selecting one of the stock options.

Scale Up vs Scale Out
We have designed a system from scratch to be able to utilize multiple servers through sharding – so our main concern is choosing the most optimal configuration for the server and provisioning servers as needed. In addition to raw performance we also need to consider power usage and overhead of managing many servers which typically makes having slightly more high-end hardware worth it.

Resource Usage
Every application uses resources in different ways so an optimal configuration will be different depending on your application. Yet all applications use the same resources you need to consider. Typically you want to plan for all of your resources to be substantially used – providing some margin for spikes and maintenance.

CPU

  • Our application processes a lot of data and uses the TokuDB storage engine which uses a lot of CPU for compression, so we needed powerful CPUs.
  • Many MySQL functions are not parallel – think of executing a single query or an ALTER TABLE – so we’re going for a CPU with faster cores rather than a larger number of cores. The resulting configuration, with 2 sockets giving 12 cores and 24 threads, is good enough for our workloads.
  • Lower end CPUs such as Xeon E3 have very attractive price/performance but only support 32GB of memory which was not enough for our application.

Memory

  • For database boxes memory is mainly used as a cache, so depending on your application you may be better off investing in memory or storage for optimal performance. Check out this blog post for more details.
  • Accessing data in memory is much faster than even on the fastest flash storage so it is still important.
    For our workload having recent data in memory is very important, so we get as much “cheap” memory as we can, populating all 16 slots with 16GB DIMMs, which have an attractive cost per GB at this point.

Storage
There are multiple uses for the storage so there are many variables to consider

  • Bandwidth
    • We need to be able to access data on the storage device quickly and with a stable response time. HGST FlashMax II has been able to meet these very demanding needs.
  • Endurance
    • When using flash storage you need to worry about endurance – how much beating with writes the flash storage can handle before it wears out. Some low cost MLC SSDs would wear out in a time frame of weeks if written at maximum speed. HGST FlashMax II has an endurance rating of 10 Petabytes written (for a random workload) to 30 Petabytes written (for a sequential workload).
    • We also use the TokuDB storage engine, which significantly reduces the amount of writes compared to InnoDB.
  • Durability
    • Does the storage provide true durability, with data guaranteed to be persisted when a write is acknowledged at the operating system level, even when power goes down – or is loss possible?
      We do not want to risk database corruption in case of power failure, so we were looking for a storage solution which guarantees durability.
      HGST FlashMax II guarantees durability, which has been confirmed by our stress tests.
  • Size
    • To scale with application storage demands you need to scale both the number of IO operations the storage can handle and the storage size. For flash storage it is often the size which becomes the limiting factor.
      HGST FlashMax II 4.8 TB capacity is the best available on the market, which allows us to go “All Flash” and achieve very quick access to our entire data set.
  • Secondary Storage
    • Not every application need requires the properties of flash storage.
    • We have secondary storage with conventional drives for operating system and logs.
      A sequential read/write pattern works well with low cost conventional drives, and also allows us to increase flash lifetime by having it handle fewer writes.
    • We’re using RAID with BBU for the secondary storage to be able to have fully durable binary logs without paying a high performance penalty.

Why PCIe SSD over SATA SSD?
There are arguments that SATA SSD provides just a good enough performance for MySQL and there is no need for PCIe. While these arguments are valid in one dimension, there are several more to consider.

First, like I said, PCIe SSD still provides the best absolute response time, and that is an important factor for the end user experience in SaaS systems like Percona Cloud Tools.
Second, consider maintenance operations like backups, ALTER TABLEs or slave setups. While these operations are boring and do not get as much attention as response time or throughput in benchmarks, they are still operations that DBAs perform basically daily, and it is very important to finish a backup or ALTER TABLE in a predictable time, especially in the 3-4TB data size range. And this is where PCIe SSD performs much better than SATA SSDs. For SATA SSDs, especially of bigger sizes, write endurance is another point of concern.

Why TokuDB engine?
The TokuDB engine is the best when it comes to insert operations into a huge dataset, and a few more factors make it a no-brainer:

  • TokuDB compression is a huge win. I estimate that into this storage (FlashMAX II 4.8TB) we will fit about 20-30TB of raw data.
  • TokuDB is SSD friendly, as it performs far fewer data writes per INSERT operation than InnoDB, which greatly extends SSD (which is, well, expensive to say the least) lifetime.

The post Reference architecture for a write-intensive MySQL deployment appeared first on MySQL Performance Blog.

Percona XtraDB Cluster 5.6.19-25.6 is now available

MySQL Performance Blog - Tue, 22/07/2014 - 05:24

Percona is glad to announce the new release of Percona XtraDB Cluster 5.6 on July 21st 2014. Binaries are available from downloads area or from our software repositories. We’re also happy to announce that Ubuntu 14.04 LTS users can now download, install, and upgrade Percona XtraDB Cluster 5.6 from Percona’s software repositories.

Based on Percona Server 5.6.19-67.0 including all the bug fixes in it, Galera Replicator 3.6, and on Codership wsrep API 25.6, Percona XtraDB Cluster 5.6.19-25.6 is now the current General Availability release. All of Percona‘s software is open-source and free, and all the details of the release can be found in the 5.6.19-25.6 milestone at Launchpad.

New Features:

  • Percona XtraDB Cluster now supports storing the Primary Component state to disk by setting the pc.recovery variable to true. The Primary Component can then recover automatically when all nodes that were part of the last saved state reestablish communications with each other. This feature can be used for automatic recovery from full cluster crashes, such as a data center power outage, and for graceful full cluster restarts without the need to explicitly bootstrap a new Primary Component (a minimal configuration sketch follows this list).
  • When joining the cluster, the state message exchange provides us with gcache seqno limits. That information is now used to choose a donor through IST first, and only if this is not possible is SST attempted. The wsrep_sst_donor setting is honored, though, and it is also segment aware.
  • An asynchronous replication slave thread was stopped when the node tried to apply the next replication event while the node was in a non-primary state, but it would then remain stopped after the node successfully re-joined the cluster. A new variable, wsrep_restart_slave, has been implemented which controls whether the MySQL slave should be restarted automatically when the node re-joins the cluster.
  • Handling install message and install state message processing has been improved to make group forming a more stable process in cases when many nodes are joining the cluster.
  • A new wsrep_evs_repl_latency status variable has been implemented which provides the group communication replication latency information.
  • Node consistency issues with foreign key grammar have been fixed. This fix introduces two new variables: wsrep_slave_FK_checks and wsrep_slave_UK_checks. These variables are set to TRUE and FALSE respectively by default. They control whether Foreign Key and Unique Key checking is done for applier threads.
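
As a rough illustration of the first and third items above, a my.cnf sketch enabling both features might look like this (wsrep_provider_options is a single string, so keep any other provider options you already set in the same line):

[mysqld]
# persist the Primary Component state so the cluster can recover from a full outage
wsrep_provider_options = "pc.recovery=true"
# restart the asynchronous replication slave thread when the node re-joins the cluster
wsrep_restart_slave    = ON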

Bugs Fixed:

  • Fixed the race condition in Foreign Key processing that could cause assertion. Bug fixed #1342959.
  • The restart sequence in scripts/mysql.server would fail to capture and return if the start call failed to start the server. As a result, a restart could occur that failed upon start-up, and the script would still return 0 as if it worked without any issues. Bug fixed #1339894.
  • Updating a unique key value could cause the server to hang if a slave node had enabled parallel slaves. Bug fixed #1280896.
  • Percona XtraDB Cluster has implemented threadpool scheduling fixes. Bug fixed #1333348.
  • garbd was returning an incorrect return code: when garbd was already started, the return code was 0. Bug fixed #1308103.
  • rsync SST would silently fail on joiner when the rsync server port was already taken. Bug fixed #1099783.
  • When gmcast.listen_addr was configured to a certain address, the local connection point for outgoing connections was not bound to the listen address. This would happen if the OS has multiple interfaces with IP addresses in the same subnet. The OS would pick the wrong IP for a local connection point and other nodes would see connections originating from an IP address which was not listened to. Bug fixed #1240964.
  • An issue with re-setting galera provider (in wsrep_provider_options) has been fixed. Bug fixed #1260283.
  • Variable wsrep_provider_options couldn’t be set in runtime if no provider was loaded. Bug fixed #1260290.
  • Percona XtraDB Cluster couldn’t be built with Bison 3.0. Bug fixed #1262439.
  • MySQL wasn’t correctly handling the wsrep error raised when the maximum writeset size was exceeded. Bug fixed #1270920.
  • Fixed the issue which caused a node to hang/fail when SELECTs/SHOW STATUS was run after FLUSH TABLES WITH READ LOCK was used on a node with wsrep_causal_reads set to 1 while there was a DML on other nodes. Bug fixed #1271177.
  • The lowest group communication layer (evs) would fail to handle the situation properly when a large number of nodes suddenly started recognizing each other. Bugs fixed #1271918 and #1249805.
  • Percona XtraBackup SST would fail if the progress option was used with a large number of files. Bug fixed #1294431.

NOTE: When performing an upgrade from an older 5.6 version on Debian/Ubuntu systems, in order to upgrade the Galera package correctly, you’ll need to pin the Percona repository and run: apt-get install percona-xtradb-cluster-56 (a pinning sketch is shown below). This is required because older Galera deb packages have an incorrect version number. The correct wsrep_provider_version after upgrade should be 3.6(r3a949e6).
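
For reference, apt pinning can be done with a preferences file along these lines; this is a sketch only, and the exact origin string for the Percona repository should be verified with apt-cache policy:

# /etc/apt/preferences.d/percona.pref -- the origin string below is an assumption
Package: *
Pin: release o=Percona
Pin-Priority: 1001

sudo apt-get update
sudo apt-get install percona-xtradb-cluster-56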

This release contains 50 fixed bugs. The complete list of fixed bugs can be found in our release notes.

Release notes for Percona XtraDB Cluster 5.6.19-25.6 are available in our online documentation along with the installation and upgrade instructions.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

Percona XtraDB Cluster Errata can be found in our documentation.

[UPDATE 2014-07-24]: Package Percona-XtraDB-Cluster-client-56-5.6.19-25.6.824.el6.x86_64.rpm has been updated to resolve the conflict with Percona-XtraDB-Cluster-devel package.

The post Percona XtraDB Cluster 5.6.19-25.6 is now available appeared first on MySQL Performance Blog.

Percona Monitoring Plugins 1.1.4 release

MySQL Performance Blog - Tue, 22/07/2014 - 01:42

Percona is glad to announce the release of Percona Monitoring Plugins 1.1.4.

Changelog:

* Added login-path support to Nagios plugins with MySQL client 5.6 (bug 1338549)
* Added a new threshold option for delayed slaves to pmp-check-mysql-replication-delay (bug 1318280)
* Added delayed slave support to pmp-check-mysql-replication-running (bug 1332082)
* Updated Nagios plugins and Cacti script to leverage lock-free SHOW SLAVE STATUS in Percona Server (bug 1297442)
* Fixed pmp-check-mysql-replication-delay integer-float issue with MariaDB and MySQL 5.6 (bugs 1245934, 1295795)
* ss_get_rds_stats.py was not installed with 755 permissions from the package (bug 1316943)
* Cacti MySQL template item “handler_savepoint_rollback” was GAUGE type instead of DERIVE (bug 1334173)
* Fixed Zabbix running-slave check issue on some Debian systems (bug 1310723)
* Fixed paths in percona-cacti-templates package (bug 1349564)

A new tarball is available from the downloads area, and RPM and DEB packages from our software repositories. The plugins are fully supported for customers with a Percona Support contract, and free installation services are provided as part of some contracts. In addition, installation and setup of these tools are included as part of Percona’s Remote DBA service. You can find links to the documentation, forums and more at the project homepage.

About Percona Monitoring Plugins
Percona Monitoring Plugins are high-quality components to add enterprise-grade MySQL monitoring and graphing capabilities to your existing in-house, on-premises monitoring solutions. The components are designed to integrate seamlessly with widely deployed solutions such as Nagios, Cacti and Zabbix and are delivered in the form of templates, plugins, and scripts which make it easy to monitor MySQL performance.

The post Percona Monitoring Plugins 1.1.4 release appeared first on MySQL Performance Blog.

A schema change inconsistency with Galera Cluster for MySQL

MySQL Performance Blog - Mon, 21/07/2014 - 17:00

I recently worked on a case where one node of a Galera cluster had its schema desynchronized from the other nodes, even though the Total Order Isolation method was in effect to perform the schema changes. Let’s see what happened.

Background

For those of you who are not familiar with how Galera can perform schema changes, here is a short recap:

  • Two methods are available depending on the value of the wsrep_OSU_method setting. Both have benefits and drawbacks, but that is not the main topic of this post.
  • With TOI (Total Order Isolation), a DDL statement is performed at the same point in the replication flow on all nodes, giving strong guarantees that the schema is always identical on all nodes.
  • With RSU (Rolling Schema Upgrade), a DDL statement is not replicated to the other nodes. Until the DDL statement has been executed on all nodes, the schema is not consistent everywhere (so you should be careful not to break replication).

You can look at the official document here.

If you read carefully the section on TOI, you will see that “[...] TOI transactions will never fail certification and are guaranteed to be executed.” But also that “The system replicates the TOI query before execution and there is no way to know whether it succeeds or fails. Thus, error checking on TOI queries is switched off.”

Confusing? Not really. It simply means that with TOI, a DDL statement will always pass certification. But if for some reason the DDL statement fails on one of the nodes, it will not be rolled back on the other nodes. This opens the door to schema inconsistencies between nodes.

A test case

Let’s create a table on a 3-node Percona XtraDB Cluster 5.6 cluster and insert a few rows:

pxc1> create table t (id int not null auto_increment primary key, c varchar(10));
pxc1> insert into t (c) values ('aaaa'),('bbbb');

Then on node 3, let’s introduce a schema change on t that can make other schema changes fail:

pxc3> set global wsrep_OSU_method=RSU;
pxc3> alter table t add d int;
pxc3> set global wsrep_OSU_method=TOI;

As the schema change was done on node 3 with RSU, it is not replicated to the other nodes.

Now let’s try another schema change on node 1:

pxc1> alter table t add d varchar(10);
Query OK, 0 rows affected (0,14 sec)
Records: 0  Duplicates: 0  Warnings: 0

Apparently everything went well and indeed on node 1 and 2, we have the correct schema:

pxc2> show create table t\G
*************************** 1. row ***************************
       Table: t
Create Table: CREATE TABLE t (
  id int(11) NOT NULL AUTO_INCREMENT,
  c varchar(10) DEFAULT NULL,
  d varchar(10) DEFAULT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB AUTO_INCREMENT=8 DEFAULT CHARSET=latin1

But on node 3, the statement failed so the schema has not been changed:

pxc3> show create table t\G
*************************** 1. row ***************************
       Table: t
Create Table: CREATE TABLE t (
  id int(11) NOT NULL AUTO_INCREMENT,
  c varchar(10) DEFAULT NULL,
  d int(11) DEFAULT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB AUTO_INCREMENT=8 DEFAULT CHARSET=latin1

The error is visible in the error log of node 3:

2014-07-18 10:37:14 9649 [ERROR] Slave SQL: Error 'Duplicate column name 'd'' on query. Default database: 'repl_test'. Query: 'alter table t add d varchar(10)', Error_code: 1060
2014-07-18 10:37:14 9649 [Warning] WSREP: RBR event 1 Query apply warning: 1, 200
2014-07-18 10:37:14 9649 [Warning] WSREP: Ignoring error for TO isolated action: source: 577ffd51-0e52-11e4-a30e-4bde3a7ad3f2 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 3 trx_id: -1 seqnos (l: 17, g: 200, s: 199, d: 199, ts: 7722177758966)

But of course it is easy to miss. And then a simple INSERT can trigger a shutdown on node3:

pxc2> insert into t (c,d) values ('cccc','dddd');
Query OK, 1 row affected (0,00 sec)

will trigger this on node 3:

2014-07-18 10:42:27 9649 [ERROR] Slave SQL: Column 2 of table 'repl_test.t' cannot be converted from type 'varchar(10)' to type 'int(11)', Error_code: 1677
2014-07-18 10:42:27 9649 [Warning] WSREP: RBR event 3 Write_rows apply warning: 3, 201
2014-07-18 10:42:27 9649 [Warning] WSREP: Failed to apply app buffer: seqno: 201, status: 1
        at galera/src/trx_handle.cpp:apply():340
[...]
2014-07-18 10:42:27 9649 [Note] WSREP: Received NON-PRIMARY.
2014-07-18 10:42:27 9649 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 201)
2014-07-18 10:42:27 9649 [Note] WSREP: Received self-leave message.
2014-07-18 10:42:27 9649 [Note] WSREP: Flow-control interval: [0, 0]
2014-07-18 10:42:27 9649 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2014-07-18 10:42:27 9649 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 201)
2014-07-18 10:42:27 9649 [Note] WSREP: RECV thread exiting 0: Success
2014-07-18 10:42:27 9649 [Note] WSREP: recv_thread() joined.
2014-07-18 10:42:27 9649 [Note] WSREP: Closing replication queue.
2014-07-18 10:42:27 9649 [Note] WSREP: Closing slave action queue.
2014-07-18 10:42:27 9649 [Note] WSREP: bin/mysqld: Terminated.

Conclusion

As with regular MySQL, schema changes are challenging with Galera. Some subtleties can create a lot of trouble if you are not aware of them. So before running DDL statements, make sure you fully understand how the TOI and RSU methods work. A quick way to compare schemas across nodes is sketched below.
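
One simple way to spot this kind of drift is to checksum the table definition on every node; the hostnames and credentials below are placeholders, and note that the AUTO_INCREMENT counter shown by SHOW CREATE TABLE can legitimately differ between nodes:

# sketch: identical schemas produce identical checksums
for h in pxc1 pxc2 pxc3; do
  echo -n "$h: "
  mysql -h "$h" -u root -p -N -e "SHOW CREATE TABLE repl_test.t" | md5sum
done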

The post A schema change inconsistency with Galera Cluster for MySQL appeared first on MySQL Performance Blog.

Systemtap solves phantom MySQLd SIGTERM / SIGKILL issue

MySQL Performance Blog - Sat, 19/07/2014 - 00:38

The Percona Managed Services team recently faced a somewhat peculiar client issue. We’d receive pages about their MySQL service being unreachable. However, studying the logs showed nothing out of the ordinary: for the most part it appeared to be a normal shutdown, and there was nothing suspicious in anyone’s command history nor in any cron task.

This is one of those obscure and peculiar (read: unique) issues that triggered an old memory; I’d seen this behavior before and I had just the tool to catch the culprit in the act.

Systemtap made diagnosing this issue possible, and I can’t state enough what a powerful and often under-utilized tool set systemtap really is.

cat > signals.stp << EOF
probe signal.send {
if (sig_name == "SIGKILL" || sig_name == "SIGTERM")
printf("[%s] %s was sent to %s (pid:%d) by %s uid:%d\n",
ctime(gettimeofday_s()), sig_name, pid_name, sig_pid, execname(), uid())
}
EOF

sudo stap ./signals.stp > signals.log 2>signals.err

grep mysqld signals.log
[Wed Jun 11 19:03:23 2014] SIGKILL was sent to mysqld (pid:8707) by cfagent uid:0
[Fri Jun 13 21:37:27 2014] SIGKILL was sent to mysqld (pid:6583) by cfagent uid:0
[Sun Jun 15 05:05:34 2014] SIGKILL was sent to mysqld (pid:19818) by cfagent uid:0
[Wed Jul 9 07:03:47 2014] SIGKILL was sent to mysqld (pid:4802) by cfagent uid:0

Addendum: It had been so long since I had used this tooling that I could not remember the original source from which I derived the module above; some cursory searching to rectify this issue for this blog post found this original source by Eugene Teo of Red Hat made available under GPLv2.

From this we were able to show that cfagent was killing the mysqld process, presumably via a misconfigured job; this information was returned to the client, and the probe has continued to run in production for two months now at the client’s request with no issues to speak of.

This is by no means the limit of what systemtap can achieve; you can also hook into functions, though you may need to install the debug packages to find what functions are available. To list them, run for example:

sudo stap -L 'process("/usr/sbin/mysqld").function("*")' > /tmp/mysql_stapfunc
...
head /tmp/mysql_stapfunc
process("/usr/sbin/mysqld").function("TIME_from_longlong_date_packed")
process("/usr/sbin/mysqld").function("TIME_from_longlong_datetime_packed")
process("/usr/sbin/mysqld").function("TIME_from_longlong_time_packed")
process("/usr/sbin/mysqld").function("TIME_set_hhmmss")
process("/usr/sbin/mysqld").function("TIME_set_yymmdd")
process("/usr/sbin/mysqld").function("TIME_to_longlong_date_packed")
process("/usr/sbin/mysqld").function("TIME_to_longlong_datetime_packed")
process("/usr/sbin/mysqld").function("TIME_to_longlong_packed")
process("/usr/sbin/mysqld").function("TIME_to_longlong_time_packed")
process("/usr/sbin/mysqld").function("TIME_to_ulonglong")
...

This is also true of the kernel, using sudo stap -L 'kernel.function("*")' > /tmp/kernel_stapfunc; however, you must be booted into a debug kernel for this to work.

Systemtap is more than a worthy tool to have at your disposal with plenty of examples available.
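
As a further illustration (not from the original investigation), a probe on a mysqld function looks much like the signal probe above; the symbol name here is an assumption that varies between MySQL versions, and the server’s debuginfo must be installed:

# sketch: print a line every time a client statement is dispatched
probe process("/usr/sbin/mysqld").function("*dispatch_command*") {
  printf("[%s] %s hit in pid %d\n", ctime(gettimeofday_s()), probefunc(), pid())
}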

Finally I invite you to join me July 23 at 10 a.m. Pacific time for my webinar, “What Every DBA Needs to Know About MySQL Security.” This detailed technical webinar provides insight into best security practices for either setting up a new MySQL environment or upgrading the security of an existing one. I hope to see you there!

The post Systemtap solves phantom MySQLd SIGTERM / SIGKILL issue appeared first on MySQL Performance Blog.

Q&A: Even More Deadly Mistakes of MySQL Development

MySQL Performance Blog - Fri, 18/07/2014 - 02:57

On Wednesday I gave a presentation on “How to Avoid Even More Common (but Deadly) MySQL Development Mistakes” for Percona MySQL Webinars.  If you missed it, you can still register to view the recording and my slides.

Thanks to everyone who attended, and especially to folks who asked the great questions.  I answered as many as we had time for  during the session, but here are all the questions with my complete answers:

Q: Disk bandwidth also not infinite

Indeed, you’re right!

We discussed in the webinar the impact on network bandwidth from using column wildcards in queries like SELECT *, but it’s also possible that using SELECT * can impact disk operations. Varchar, Blob, or Text columns can be stored on extra pages in the database, and if you include those columns in your query needlessly, it can cause the storage engine to do a lot of seeks and page reads unnecessarily.

For more details on string storage in InnoDB, see Peter Zaitsev’s blog on Blob Storage in Innodb.

Q: How many tables can be joined in a single query? What is the optimal number of joins?

MySQL has a limit of 63 table references in a given query. This limits how many JOIN operations you can do, and also limits the number of UNIONs. Actually you can go over this limit if your JOIN or UNION don’t reference any tables, that is, create a derived table of one row of expressions.

If you do join a lot of tables (or even self-join the same table many times), you’re likely to hit a practical scaling limit long before you reach 63 table references. The practical limit in your case depends on many factors, including the length of the tables, the data types, the type of join expressions in your queries, and your physical server’s capabilities. It’s not a fixed limit I can cite for you.

If you think you need dozens of table references in a single query, you should probably step back and reconsider your database design or your query design.

I often see this type of question (“what is the limit on the number of joins?”) when people try to use key/value tables, also called Entity-Attribute-Value, and they’re trying to pivot attributes from rows into columns, as if the table were stored in a conventional way with one column per attribute. This is a broken design for many reasons, and the scalability of many-way joins is just one problem with it.

Q: How many indexes can be created in a single table? Any limitation? What is the optimal number of indexes?

All MySQL storage engines support at least 16 indexes per table.

As far as the optimal number of indexes, I don’t pay attention to the number of indexes (as long as it remains lower than the max of 16). I try to make sure I have the right indexes for my queries. If you put an arbitrary cap of, for example, 8 or 10 indexes on a given table, then you might be running queries that lack a needed index, and the unnecessary extra cost of running those queries is probably greater than the cost of maintaining the one extra index they need.

That said, there are cases where you have such variation in query types that there’s no way to have optimal indexes to cover every possible case. Given that you can have multi-column indexes, and multi-column indexes with columns in different orders, there are n-factorial possible indexes on a table with n columns.

Q: There is a table with 3 columns: id(int), user_id(int), day(date). There is a high chance the same user_id will ‘exist’ for every day. I read data by “where user_id = some_id” (very high throughput) and delete all entries once a day by cron using “where sent_date = ’2014-01-01′”. We have approx 6M rows per day, so deletion is pretty painful. Will partitioning by column ‘day’ help me delete those bulks faster? If yes – how much faster? How much will it slow down SELECTs? Not all entries are deleted, but only entries for some specific old day, e.g. ‘WHERE day = 1 week ago’

Range partitioning by date would give you the opportunity to ALTER TABLE…DROP PARTITION, so you could remove all data for a given date very quickly, much faster than deleting millions of rows. The performance of DROP PARTITION is like that of DROP TABLE, because each partition is physically stored like a separate table.

Searching for “where user_id = ?” would not be able to take advantage of partition pruning, but it would still be able to use an index on user_id. And if you drop old partitions, the benefit of searching a smaller table could be a good tradeoff.
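
A hedged sketch of what that could look like (the table and partition names are illustrative; note that the partitioning column must be part of every unique key, including the primary key):

-- sketch: daily range partitions so old days can be dropped cheaply
CREATE TABLE user_events (
  id      INT NOT NULL AUTO_INCREMENT,
  user_id INT NOT NULL,
  day     DATE NOT NULL,
  PRIMARY KEY (id, day),
  KEY (user_id)
)
PARTITION BY RANGE (TO_DAYS(day)) (
  PARTITION p20140101 VALUES LESS THAN (TO_DAYS('2014-01-02')),
  PARTITION p20140102 VALUES LESS THAN (TO_DAYS('2014-01-03')),
  PARTITION pmax      VALUES LESS THAN MAXVALUE
);

-- dropping a whole day is a quick metadata operation, unlike DELETE
ALTER TABLE user_events DROP PARTITION p20140101;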

Q: Regarding 20% selectivity as a threshold for the optimizer preferring a table-scan to an index lookup – is that a tunable?

No, it’s not tunable, it’s a fixed behavior of the query optimizer. If you search for a value and the optimizer estimates that > 20% of rows contain the value you search for, it will bypass the index and just do a table-scan.

It’s for the same reason that the index of a book doesn’t contain very common words: the list of pages that the word appears on would be too long, and flipping back and forth from the back of the book to each listed page would actually be more work than just reading the book.

Also keep in mind my figure of 20% is approximate. Your results may vary. This is not a magic threshold in the source code, it’s just a tendency I have observed.

Q: Regarding generating synthetic test data, it sounds like a pretty easy perl script to write.

Yes, it might be easy to do that for one given table. But every table is different, and you might have hundreds of tables in dozens of applications to generate test data for. You might also want to vary the distribution of data values from one test to another.

Writing a test-data generator for one particular case is easy, so you might reasonably do it as a one-off task. Writing a general-purpose test-data generator that you can use for many cases is more work.

Q: Would love to have the set of URLs cited in the presentation without having to go back and mine them out of the presentation.

Open source message queues:

MySQL Performance Blog articles:

Open source test-data generator:

Load-testing tools for web applications:

Load-testing tools to replay query logs:

Further reading for implementing business rules:

Q: How to best use mysql query cache?

Any cache is best used if you read from it many times for each time you write to it. So we’d like to estimate the average ratio of query cache reads to writes, to estimate how much leverage it’s giving us.

mysql> SHOW GLOBAL STATUS LIKE 'Qcache%';

Check the value of Qcache_hits (cases when a query result was read from the query cache) against Qcache_inserts (cases when the desired query result was not in the cache, and the query had to be run and its result stored in the cache). I like to see a ratio of 1000% or more (i.e. 10:1 hits to inserts).

If you have a poor ratio, for example less than 1:1 or less than 100%, then consider disabling the query cache, because it may be costing more to maintain it than the performance benefit it’s giving you.

Keep in mind that this is only a guideline, because the calculation I described is only an average. It could be that the queries served by the query cache are very expensive, so using the cached result is a great benefit even if it accounts for a small number of hits. The only way to be certain is to load-test your application under your load, and compare overall performance results with the query cache enabled or disabled, and at different sizes.
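
As a sketch, the same hits-to-inserts ratio can be computed in one statement from information_schema on MySQL 5.5/5.6 (a NULL result means there have been no inserts yet):

-- illustrative only: ratio of query cache reads to writes since server start
SELECT
  SUM(IF(VARIABLE_NAME = 'QCACHE_HITS',    VARIABLE_VALUE + 0, 0)) /
  NULLIF(SUM(IF(VARIABLE_NAME = 'QCACHE_INSERTS', VARIABLE_VALUE + 0, 0)), 0)
    AS hits_per_insert
FROM information_schema.GLOBAL_STATUS
WHERE VARIABLE_NAME IN ('QCACHE_HITS', 'QCACHE_INSERTS');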

Q: How to detect when too much indexes start to affect performance?

Some people are reluctant to create indexes because they have been warned that indexes require synchronous updates when you INSERT, UPDATE, or DELETE rows. Some people also make the generalization that indexes harm writes but benefit reads. Both of these beliefs are inaccurate.

Your DML operations aren’t really updating indexes in real time. InnoDB includes a feature called change buffering, which defers index updates. The change buffer is gradually merged into the index over time. That way, InnoDB can handle a big spike in traffic without it hurting throughput as much. You can monitor how much content in the change buffer remains to be merged:
mysql> SHOW GLOBAL STATUS LIKE 'Innodb_ibuf_size';

It’s also not accurate that indexes hurt writes. UPDATE and DELETE statements usually have a WHERE clause, to apply the changes to particular rows. These conditions use indexes to reduce the examined rows, just like in SELECT statements. But in UPDATE and DELETE statements, it’s even more important to use indexes, because otherwise the statement has to lock a lot of rows to ensure it locks the rows you’re changing.

So I generally say, don’t avoid indexes based only on the number of indexes you have; just make sure your indexes are being employed by the queries you run, and drop indexes that aren’t used (one way to spot candidates is sketched below). Here are a couple of past blog posts that show how to do this:
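
Separately from those posts, here is one illustrative way to list candidate unused indexes on MySQL 5.6, assuming performance_schema is enabled (the counters reset at every server restart, so judge over a representative period):

-- sketch: indexes that have seen no I/O since the last restart
SELECT object_schema, object_name, index_name
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE index_name IS NOT NULL
  AND index_name <> 'PRIMARY'
  AND count_star = 0
  AND object_schema NOT IN ('mysql', 'performance_schema')
ORDER BY object_schema, object_name;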

Thanks again for attending my webinar!  Here are some more tips:

  • Check out upcoming Percona Training classes in North America and Europe.
  • Join Percona and the MySQL community at our Percona Live conferences.
  • Watch more webinars from Percona in the future!

The post Q&A: Even More Deadly Mistakes of MySQL Development appeared first on MySQL Performance Blog.

OpenPower firmware up on github!

Stewart Smith - Thu, 17/07/2014 - 16:07

With the whole OpenPower thing, a lot of low level firmware is being open sourced, which is really exciting for the platform – the less proprietary code sitting in memory the better in my books.

If you go to https://github.com/open-power you’ll see code for a bunch of the low level firmware for OpenPower and POWER8.

Hostboot is the bit of code that brings up the CPU and skiboot both sets up hardware and provides runtime services to Linux (such as talking to the service processor, if one is present).

Patches to https://github.com/open-power/skiboot/blob/master/doc/overview.txt are (of course) really quite welcome. It shouldn’t be too hard to get your head around the basics.

To see the Linux side of the OPAL interface, go check out linux/arch/powerpc/platforms/powernv – there you can see how we ask OPAL to do things for us.

If you buy a POWER8 system from IBM running PowerKVM you’re running this code.

Update on MySQL on POWER8

Stewart Smith - Thu, 17/07/2014 - 15:46

About 1.5 months ago I blogged on MySQL 5.6 on POWER and talked about what I had to poke at to make modern MySQL versions run and run well on shiny POWER8 systems.

One of those bugs, MySQL bug 47213 (InnoDB mutex/rw_lock should be conscious of memory ordering other than Intel) was recently marked as CLOSED by the Oracle MySQL team and the upcoming 5.6.20 and 5.7.5 releases should have the fix!

This is excellent news for those wanting to run MySQL on SMP systems that don’t have an Intel-like memory model (e.g. POWER and MIPS64).

This was the most major and invasive patch in the patchset for MySQL on POWER. It’s absolutely fantastic that this has made it into 5.6.20 and 5.7.5 and may mean that these new versions will work out-of-the-box on POWER (I haven’t checked… but from glancing back at my patchset there was only one other patch that could be related to correctness rather than performance).

High Availability with mysqlnd_ms on Percona XtraDB Cluster

MySQL Performance Blog - Thu, 17/07/2014 - 00:11

This is the second part of my series on High Availability with mysqlnd_ms. In my first post, “Simple MySQL Master HA with mysqlnd_ms,” I showed a simple HA solution using asynchronous MySQL replication. This time we will see how to leverage an all-primary cluster where you can write to all nodes. In this post I used Percona XtraDB Cluster, but you should also be able to do the same with MySQL NDB Cluster or Tungsten Replicator.

To start with, here is the mysqlnd_ms configuration I used: mysqlnd_ms_mm.ini. All of these files are available from my Github repository. Below, I have three Percona XtraDB Cluster nodes, all defined as masters and no slaves. I’ve configured a roundrobin filter where all connections will go to the first node, in this case 192.168.56.44. In case the first node fails, the second node will be used, and so forth until no more nodes are available. Another interesting configuration option here is the loop_before_master strategy, whereby if a connection or a statement to the current server fails, it will be retried silently on the remaining nodes before returning an error to the user; more on this below.

{ "primary": { "master": { "master_1": { "host": "192.168.56.44", "port": "3306" }, "master_2": { "host": "192.168.56.43", "port": "3306" }, "master_3": { "host": "192.168.56.42", "port": "3306" } }, "slave": { }, "filters": { "roundrobin": [ ] }, "failover": { "strategy": "loop_before_master", "remember_failed": true } } }

Similar to my previous post, I also used a custom INI file for PHP to use, this time aptly named master-master.ini:

mysqlnd_ms.enable = 1
mysqlnd_ms.disable_rw_split = 1
mysqlnd_ms.multi_master = 1
mysqlnd_ms.force_config_usage = 1
mysqlnd_ms.config_file = /home/revin/git/demo-me/phpugph201407/mysqlnd_ms_mm.ini

A new addition to this configuration is mysqlnd_ms.multi_master; when enabled it allows you to use all nodes, or just one while treating the others as passive. The PHP script I used this time is called master-master.php; it is largely similar to master-slave-ng.php with a few differences (a minimal connection sketch follows the list below):

  1. There is no need for the /tmp/PRIMARY_HAS_FAILED sentinel as all nodes are writable.
  2. There is no need for the /*ms=master*/ SQL hint when validating a connection from the connect_mysql function, since all nodes act as masters.
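
For readers not familiar with mysqlnd_ms, the key idea is that the connection is opened against the config section name rather than a real host; this sketch uses placeholder credentials and database name:

<?php
// sketch: with mysqlnd_ms enabled, the host "primary" refers to the config
// section in mysqlnd_ms_mm.ini, and the plugin picks an actual node for us
$link = new mysqli("primary", "appuser", "apppass", "test");
if ($link->connect_errno) {
    die("connect failed: " . $link->connect_error . "\n");
}
$res = $link->query("SELECT @@hostname AS h");
$row = $res->fetch_assoc();
echo "connected to " . $row['h'] . "\n";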

So here is a quick test, first with the roundrobin filter. After 4 INSERTs, I shut down 192.168.56.44, which sends my connection to the next server in the configuration, 192.168.56.43. When I brought 192.168.56.44 back up, the script resumed connections there. Pretty cool, right?

[revin@forge phpugph201407]$ php -c master-master.ini master-master.php
Last value 3564 from host 192.168.56.44 via TCP/IP and thread id 19
Last value 3565 from host 192.168.56.44 via TCP/IP and thread id 20
Last value 3566 from host 192.168.56.44 via TCP/IP and thread id 21
Last value 3567 from host 192.168.56.44 via TCP/IP and thread id 22
Warning: connect_mysql(): MySQL server has gone away in /home/revin/git/demo-me/phpugph201407/master-master.php on line 63
Warning: connect_mysql(): Error while reading greeting packet. PID=23464 in /home/revin/git/demo-me/phpugph201407/master-master.php on line 63
ERRROR: 192.168.56.43 via TCP/IP [2006] MySQL server has gone away on line 30
Last value 0 from host and thread id 0
Last value 3568 from host 192.168.56.43 via TCP/IP and thread id 1552
Last value 3569 from host 192.168.56.43 via TCP/IP and thread id 1553
[...]
Last value 3584 from host 192.168.56.43 via TCP/IP and thread id 1568
Last value 3585 from host 192.168.56.44 via TCP/IP and thread id 18

Here’s another test using the random filter, which allows you to write to all nodes; in my mysqlnd_ms_mm.ini above, I just changed roundrobin to random. As you can see, all three nodes were being used, of course at random. You can also see the connect_mysql errors around the point where I shut down 192.168.56.44, and then near the bottom that the server was used again after I started it back up. Still pretty cool, right?

[revin@forge phpugph201407]$ php -c master-master.ini master-master.php
Last value 3590 from host 192.168.56.42 via TCP/IP and thread id 2060
Last value 3591 from host 192.168.56.43 via TCP/IP and thread id 1569
Last value 3592 from host 192.168.56.43 via TCP/IP and thread id 1570
Warning: connect_mysql(): MySQL server has gone away in /home/revin/git/demo-me/phpugph201407/master-master.php on line 63
Warning: connect_mysql(): Error while reading greeting packet. PID=23919 in /home/revin/git/demo-me/phpugph201407/master-master.php on line 63
ERRROR: 192.168.56.43 via TCP/IP [2006] MySQL server has gone away on line 30
Last value 0 from host and thread id 0
Last value 3593 from host 192.168.56.42 via TCP/IP and thread id 2061
Last value 3594 from host 192.168.56.42 via TCP/IP and thread id 2062
Last value 3595 from host 192.168.56.42 via TCP/IP and thread id 2063
Last value 3596 from host 192.168.56.42 via TCP/IP and thread id 2064
Last value 3597 from host 192.168.56.43 via TCP/IP and thread id 1576
Last value 3598 from host 192.168.56.43 via TCP/IP and thread id 1577
Last value 3599 from host 192.168.56.43 via TCP/IP and thread id 1578
Last value 3600 from host 192.168.56.43 via TCP/IP and thread id 1579
Last value 3601 from host 192.168.56.42 via TCP/IP and thread id 2065
Last value 3602 from host 192.168.56.43 via TCP/IP and thread id 1581
Last value 3603 from host 192.168.56.43 via TCP/IP and thread id 1582
Last value 3604 from host 192.168.56.42 via TCP/IP and thread id 2066
Last value 3605 from host 192.168.56.44 via TCP/IP and thread id 19
Last value 3606 from host 192.168.56.43 via TCP/IP and thread id 1583
Last value 3607 from host 192.168.56.44 via TCP/IP and thread id 21

So here are some issues I’ve observed during these tests:

  1. remember_failed during failover does not work as advertised. Supposedly, a failed node should not be used again for every connection request, but in my test this is not the case. See more in this bug. This means that if you have 2 out of 3 nodes failed in this scenario, the overhead of probing both connections would be too big. Perhaps some sort of in-memory shared TTL could be used to overcome this? I’m not sure.
  2. If you look closely around line 7 of my last output above, the error displayed is kind of misleading. In particular it says ERRROR: 192.168.56.43 via TCP/IP, whereas it was not 192.168.56.43 that failed, it was 192.168.56.44. This is because under the hood, immediately after a failure the next node is cycled to, especially since we have loop_before_master configured. I do have a bug in the script that should capture the host_info properly, but this is something to always keep in mind so you don’t keep scratching your head.

So we’ve seen these two forms of possibilities and they definitely have use cases and advantages. On the other hand, because of the issues found so far (I’ve reported 4 bugs in the PHP bugs database during the course of these tests, including one crash), I recommend making sure you test thoroughly before putting this into production.

The post High Availability with mysqlnd_ms on Percona XtraDB Cluster appeared first on MySQL Performance Blog.
