Feed aggregator

8 common (but deadly) MySQL operations mistakes and how to avoid them

MySQL Performance Blog - Wed, 12/02/2014 - 21:00

On January 22 I gave a presentation on “How to Avoid Common (but Deadly) MySQL Operations Mistakes” for Percona MySQL Webinars. If you missed it, you can still register to view the recording and my slides.

Thanks to everyone who attended, and especially to folks who asked the great questions. I answered as many as we had time for during the session, but here are all the questions with my complete answers:

Q: What if I use rsync to sync the mysql dir to another server as a backup?

You can do this only if you shut down the mysqld daemon first. Otherwise, you have a high risk that your copy will be incomplete and impossible to restore.

If you need to create a physical backup on a running instance of MySQL, use Percona XtraBackup. This is safe because the tool carefully synchronizes copying the tablespaces with the transaction log, which guards against getting a partial copy.
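As a rough sketch, a physical backup of a running server with the innobackupex wrapper that ships with Percona XtraBackup might look like this (the user, password, and directory names are placeholders; adjust them for your environment):

```shell
# Take a consistent copy while mysqld is running; XtraBackup copies the
# tablespaces together with the transaction log, into a timestamped subdirectory.
innobackupex --user=backupuser --password=secret /data/backups

# "Prepare" the copy (apply the transaction log) before it can be restored.
innobackupex --apply-log /data/backups/2014-02-12_21-00-00/
```

Unlike a plain rsync of the datadir, a prepared XtraBackup copy is guaranteed to be consistent.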

Q: Is MIXED binlog-format useful as well as ROW?

The MIXED format defaults to STATEMENT, and switches to ROW for individual events that it detects are non-deterministic and therefore unsafe for statement-based replication. In theory, this should give you the best of both worlds. But there can still be cases where MySQL replicates an event in statement format because it fails to detect a non-deterministic case.

Q: Percona Server version 5.1 — is this version compatible with pt-mext tool?

Yes, pt-mext works fine with Percona Server 5.1.

Q: We have multiple slaves and slaves to slaves and one slave sends changes to the master. Checksum is breaking the replication.

I am not sure I follow your description of your topology, or how the slave would send changes to the master.

I suggest you contact Percona Oncall and they can help you solve any issue with running checksums.

Q: To verify a restore, can the CHECK TABLE EXTENDED command be used? This supposedly makes sure the table is consistent. I was wondering if this is still useful to verify a restore in the latest versions of MySQL.

CHECK TABLE analyzes a table for errors, but it can’t tell if the data is different from the data in the same table on a different instance (e.g. a slave). This can check for physical corruption after a restore, but it can’t verify that the data is correct.
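To verify that the data actually matches another instance, such as a replication master, pt-table-checksum is the usual tool. A minimal invocation might look like this (the hostname and credentials are placeholders):

```shell
# Computes chunked checksums on the master and replays the checksum statements
# through replication; any differences show up in the percona.checksums table
# on each slave, where they can be inspected or fed to pt-table-sync.
pt-table-checksum --replicate=percona.checksums h=master.example.com,u=checksum_user,p=secret
```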

Q: Query optimize really helps? And for large databases like 100GB +, how that will be affected?

By choosing query optimization techniques carefully, some queries can be made to run thousands of times faster.

The larger the table, the more important it is to make sure your queries run efficiently.

Q: Is pt-online-schema-change available in 5.1? All my servers are Percona 5.1.

Yes, pt-online-schema-change works fine with Percona Server 5.1.

Q: What is the best way to perform backup?

Choosing the right tool, schedule, and method for performing backups is a complex problem. I suggest you read a white paper Percona has published on “MySQL Backup and Recovery Best Practices.”

Q: Is there a list of measurable indicators of performance somewhere?

A good tool to help you monitor many performance indicators is the Percona Monitoring Plugins package, which works with popular open-source monitoring frameworks like Cacti and Zabbix.

Q: How does most of this apply to Amazon’s RDS? Not having direct root access seems like a problem.

You’re right, Amazon RDS is a convenient way to deploy a preconfigured MySQL appliance, but you don’t get to log into the server, and many MySQL tuning variables are not accessible. We can still use some of the tools we are accustomed to using with MySQL, because the tools can access a remote MySQL server. But other tools require local access to the data directory.

Amazon RDS has a management console that allows you to do backups and restores, but if you want to choose specific tools, you may need to migrate to another environment such as Amazon EC2.

Q: A sales person told me that Percona XtraDB Cluster was not fully baked yet about half a year ago, is it ready to go for production now? (we have cluster instances in Amazon for low latency)

PXC is fully baked, has a beautiful golden crust, and smells delicious.

But seriously, we’ve helped many customers deploy PXC over the past year, and it’s working in many production environments.

Q: What buffer size and max_heap_size would you recommend for small 512 Mb RAM server (runs php5-fpm+nginx)?

I suggest you try the Percona Configuration Wizard for MySQL to get you started. It will suggest configuration values appropriate for your server hardware. This won’t be optimized specifically for your site’s workload, but it will be a good start.

Q: Is there any harm in running pt-table-sync without running pt-table-checksum?

No harm. You can optionally use pt-table-sync to calculate its own checksums to find out which rows need to be synchronized. You can even synchronize MySQL instances that aren’t a replication master and slave.

Q: Is Percona XtraDB Cluster a viable option when MySQL servers are located in different data centers and connected via shared Internet connections?

Yes, this is a great use case for PXC. Refer to an early proof of concept test we ran to prove that multi-datacenter clusters work, and our white paper on High Level Multi-Datacenter MySQL High Availability.

Q: Can Percona XtraBackup be used to take a backup of a single table?

Yes, you can use partial backup options to make innobackupex back up only specific databases or specific tables.
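A sketch of such a partial backup (the database and table names are placeholders):

```shell
# Back up just one table using innobackupex's regex filter against db.table...
innobackupex --include='^mydb[.]mytable$' /data/backups

# ...or list the tables explicitly in a file, one db.table per line.
echo "mydb.mytable" > /tmp/tables.txt
innobackupex --tables-file=/tmp/tables.txt /data/backups
```

Note that restoring a partial backup into a running server requires the transportable/import tablespace steps, so it is more involved than restoring a full backup.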

Q: What methods do you recommend to replicate the binlogs outside of replication? We are working with DRBD any other recommendations?

MySQL 5.6 adds an option to the mysqlbinlog tool to back up binary log files continuously. So you can effectively keep your binlogs backed up on a separate server for safety.
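With the MySQL 5.6 mysqlbinlog, streaming binlogs from a master to a backup host looks roughly like this (the hostname, credentials, and starting log file are placeholders):

```shell
# --raw writes the binlog files verbatim to the local directory; --stop-never
# keeps the connection open and follows new binlogs as the master rotates them.
mysqlbinlog --read-from-remote-server --host=master.example.com \
    --user=repl --password=secret \
    --raw --stop-never mysql-bin.000001
```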

Q: How will pt-table-checksum tolerate binlog-format=MIXED with GTID replication?

pt-table-checksum must use statement-based binlog events for the checksums to work, so it overrides any default binlog row format you have defined on your system.

Q: What are your thoughts on SymmetricDS for db replication over standard MySQL replication?

I have not evaluated SymmetricDS, so I can’t offer a specific opinion about it.

Most alternative solutions fit a specific type of project, and no single solution works for everyone.
So if this one works for your needs, it’s worth taking a look at it.

You should compare it with Tungsten Replicator, which is designed for a similar use case, as a highly-available solution for multi-master and multi-site replication.

Q: A question about indexes: in a table with persons, should I add an index on the column gender?

The best indexes depend on what queries you need to run, and the selectivity of data. If you never run a query that uses the gender column for searching or sorting, there would be no benefit to adding an index on it.

Furthermore, MySQL may still not use an index even if you do search on that column, if the value you search for occurs in a large portion (typically 20%+) of the rows of the table.

Q: I have tried Percona XtraBackup but I’m not sure about the best way to backup full server and restore only a single database from that full backup. I’m using mysqldump to backup and restore now.

Percona XtraBackup does support a method to restore individual tablespaces, but the steps are laborious and must be done one table at a time. Restoring all tables from a given database this way is possible, but it involves more work, and so far the process is mostly manual.

To be honest, using mysqldump is probably still the easier solution for this.

Q: Does Percona Xtradb Cluster have any replication drift? How can one minimize it?

PXC uses a different method of replication, not the built-in replication of standard MySQL. For purposes of replication drift and lag, you can think of it as similar to ROW based replication with semi-synchronous slaves. PXC should therefore have minimal chance of replication drift all by itself.

Q: How reliable are Percona XtraBackup incremental backups, in combination with binary logs for point in time recovery?

The incremental backups in Percona XtraBackup work very reliably; the most common problem is when you make a mistake and apply the incremental backups in an incorrect order.

Likewise, binary logs are reliable, but you must apply all the binary logs after the last incremental backup has been applied, and you must have a contiguous set of binary logs. For maximum safety, use sync_binlog=1 to assure the last events in the binlog are written to disk.
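The ordering rules above can be sketched as follows: the full backup is prepared with --redo-only, each incremental is applied in the sequence it was taken (the last one without --redo-only), and only then are the binlogs recorded after the last incremental replayed. Directory and binlog names here are placeholders:

```shell
# Prepare the base backup, without rolling back uncommitted transactions yet.
innobackupex --apply-log --redo-only /backups/full

# Apply incrementals strictly in the order they were taken;
# only the last one is applied without --redo-only.
innobackupex --apply-log --redo-only /backups/full --incremental-dir=/backups/inc1
innobackupex --apply-log             /backups/full --incremental-dir=/backups/inc2

# Final prepare, then replay the binlogs written after the last incremental.
innobackupex --apply-log /backups/full
mysqlbinlog /backups/binlogs/mysql-bin.000017 /backups/binlogs/mysql-bin.000018 | mysql -u root -p
```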

. . .

Thanks again for attending my webinar!

The post 8 common (but deadly) MySQL operations mistakes and how to avoid them appeared first on MySQL Performance Blog.

Amber Alert: Worse Than Nothing?

Baron Schwartz - Wed, 12/02/2014 - 10:00

In the last few years, there’s been a lot of discussion about alerts in the circles I move in. There’s general agreement that a lot of tools don’t provide good alerting mechanisms, including problems such as unclear alerts, alerts that can’t be acted upon, and alerts that lack context.

Yesterday and today at the Strata conference, my phone and lots of phones around me started blaring klaxon sounds. When I looked at my phone, I saw something like this (the screenshot is from a later update, but otherwise similar):

I’ve seen alerts like this before, but they were alerts about severe weather events, such as tornado watches. This one, frankly, looked like someone hacked into the Verizon network and sent out spam alarms. Seriously — what the hell, a license plate? What?

Besides, it says AMBER, which is a cautionary color. It’s not a red alert, after all. It can’t be anything serious, right?

The second time it happened I looked at the details:

This is even less informative. It’s an amber alert (not an urgent color like red). But it’s a significant threat to my life or property? I’m supposed to respond to it immediately? Oh wait, my response is to “monitor” and “attend to information sources.” Almost everything on this whole screen is conflicting. What a cluster-fudge of useless non-information!

Later I looked up some information online and found that an amber alert is a child abduction alert. This one turned out to be a false alarm.

All of this raises an obvious question: why on earth would someone think that making a bunch of people’s cellphones quack with a cryptic message would convey useful information? For something as critical as a child abduction, they should get to the point and state it directly. Judging by reactions around me, and people I spoke to, almost nobody knows what an amber alert is. I certainly didn’t. When I tweeted about it, only one person in my network seemed to be aware of it.

How can anyone take something like this seriously? All this does is make people like me find the preferences for alerts and disable them.

In my opinion, this is an example of complete failure in alert design. I don’t think I can overstate how badly done this is. I want to say only a politician could have dreamed up something so stupid…

But then I remember: oh, yeah. Pingdom alerts (we’ll email you that your site is down, but we won’t tell you an HTTP status code or anything else remotely useful.) Nagios alerts (we’ll tell you DISK CRITICAL and follow that with (44% inode=97%) – anyone know what that means?). And so on.

Perhaps the amber alert system was designed by a system administrator, not a politician.

PERFORMANCE_SCHEMA vs Slow Query Log

MySQL Performance Blog - Wed, 12/02/2014 - 03:03

A couple of weeks ago, shortly after Vadim wrote about Percona Cloud Tools and using the Slow Query Log to capture the data, Mark Leith asked why we don’t just use Performance Schema instead. This is an interesting question, and I think it deserves its own blog post.

First, I would say the main reason for using the Slow Query Log is compatibility. A basic slow query log with microsecond query time precision has been available since MySQL 5.1, while the events_statements_summary_by_digest table was only added in MySQL 5.6, which has been out for about a year now but is still far from complete market domination. This is especially interesting if you look at the low-end market – users who just run some web application on whatever MySQL version their hosting provider installed for them. If you look at WordPress users, for example, you will see MySQL 5.6 at just 1.3% as of today. As time passes and MySQL 5.6 takes a larger share of the market, we surely should add support for Performance Schema-based query sampling to Percona Cloud Tools.

The second reason is the amount of data available. There is a fair amount of data that the Performance Schema digest table provides, including some that is not available in Percona Server logs:

mysql> select * from events_statements_summary_by_digest where digest_text like "%sbtest%" \G
*************************** 1. row ***************************
                SCHEMA_NAME: sbtest
                     DIGEST: 2062ac01bc1798df1eebd3e111a22b59
                DIGEST_TEXT: SELECT c FROM sbtest WHERE id = ?
                 COUNT_STAR: 882262
             SUM_TIMER_WAIT: 933683089690000
             MIN_TIMER_WAIT: 106418000
             AVG_TIMER_WAIT: 1058283000
             MAX_TIMER_WAIT: 1031299058000
              SUM_LOCK_TIME: 60853469000000
                 SUM_ERRORS: 0
               SUM_WARNINGS: 0
          SUM_ROWS_AFFECTED: 0
              SUM_ROWS_SENT: 889205
          SUM_ROWS_EXAMINED: 890279
SUM_CREATED_TMP_DISK_TABLES: 0
     SUM_CREATED_TMP_TABLES: 0
       SUM_SELECT_FULL_JOIN: 0
 SUM_SELECT_FULL_RANGE_JOIN: 0
           SUM_SELECT_RANGE: 0
     SUM_SELECT_RANGE_CHECK: 0
            SUM_SELECT_SCAN: 0
      SUM_SORT_MERGE_PASSES: 0
             SUM_SORT_RANGE: 0
              SUM_SORT_ROWS: 0
              SUM_SORT_SCAN: 0
          SUM_NO_INDEX_USED: 0
     SUM_NO_GOOD_INDEX_USED: 0
                 FIRST_SEEN: 2014-02-10 17:39:39
                  LAST_SEEN: 2014-02-10 17:40:39
1 row in set (0.00 sec)

BTW – note that Rows Sent here is not equal to Rows Examined, while in reality they should be exactly the same for this benchmark. This is the approximate accounting of Performance Schema in action.

Now compare it to the sample for the same query in the slow query log in Percona Server:

# Time: 140210 17:33:34.837517
# User@Host: root[root] @ localhost []  Id: 95
# Schema: sbtest  Last_errno: 0  Killed: 0
# Query_time: 0.001000  Lock_time: 0.000054  Rows_sent: 1  Rows_examined: 1  Rows_affected: 0
# Bytes_sent: 74  Tmp_tables: 0  Tmp_disk_tables: 0  Tmp_table_sizes: 0
# InnoDB_trx_id: 90E34CF
# QC_Hit: No  Full_scan: No  Full_join: No  Tmp_table: No  Tmp_table_on_disk: No
# Filesort: No  Filesort_on_disk: No  Merge_passes: 0
# InnoDB_IO_r_ops: 0  InnoDB_IO_r_bytes: 0  InnoDB_IO_r_wait: 0.000000
# InnoDB_rec_lock_wait: 0.000000  InnoDB_queue_wait: 0.000000
# InnoDB_pages_distinct: 1
# Log_slow_rate_type: query  Log_slow_rate_limit: 100
SET timestamp=1392071614;
SELECT c from sbtest where id=387872;

What I believe is most valuable here is the information about InnoDB IO, which instantly allows us to isolate the query patterns that are disk-IO bound, as well as the information about Bytes_sent, which allows us to see which queries are responsible for generating high volumes of network traffic.

I wish Performance Schema would be enhanced to return data in something like JSON, where for each digest the top waits are accounted for, as in reality they can differ by query. Some queries might be waiting on IO, others on locks, and yet others could be bound by specific mutexes. Having exact information about what limits the performance of each query type would be a gem.

The third reason for using the Slow Query Log is placeholders. Note that the digest above has “SELECT c FROM sbtest WHERE id = ?”, which is not very convenient – I can’t even run EXPLAIN on such a query to see what could be the reason for its slowness. The log contains exact queries, so we are able to show exact queries in reports (pt-query-digest and Percona Cloud Tools), or you can opt to see only query digests if you do not want to see the values for privacy/security reasons. Picking the constant for a query with the worst plan usually works very well to check out the worst-case scenario.

This might look like a very simple problem – why can’t you just come up with an ID and reconstruct the query? – but for more complicated queries with multiple conditions it is virtually impossible to reconstruct a realistic query.

Now, in theory you can look up the actual query from events_statements_history_long and join the data together; however, this does not really work at high query rates, as it is very likely that rare queries will not have a sample available in the history table.

The fourth reason is support for prepared statements. Enable prepared statements and you will not see the actual query in the digest. This may or may not be an issue for your application, but it further limits the usability of this feature. I can’t count on simply looking at events_statements_summary_by_digest to always find which queries are responsible for the majority of the load.

The fifth reason is performance – or actually, not much of a reason. I really think Performance Schema overhead is reasonable for most workloads. In a simple-query benchmark which I ran:

sysbench --test=oltp --oltp-test-mode=simple --num-threads=16 --max-requests=0 --max-time=60 --mysql-user=root --oltp-table-size=1000000 run

On my old server I got some 20.2K QPS with Performance Schema disabled and 19.4K QPS with Performance Schema enabled, which is an overhead of less than 5%.

For most workloads paying 5% to have insight about what is happening with the system is a very fair trade.

The slow query log overhead can actually be much larger. A moderate level of detail (“microtime”) resulted in 15.1K QPS, and log_slow_verbosity=”full” takes this value down to 11.6K – over 40% overhead. Note that I designed this test as a worst-case scenario, and for more complicated queries the overhead is likely to be less (while overhead with Performance Schema can stay the same or even increase, depending on what the queries are doing).

Some people set long_query_time to a non-zero value to reduce the number of queries logged. This is a bad idea, because the logged workload will be very different from your real one – chances are the majority of your load comes from simple quick queries, which will be heavily underrepresented when only outliers are logged.

A much better idea is to enable sampling, which is available in the latest version of Percona Server – this way only one out of every so many queries will be logged:

mysql> set global log_slow_rate_limit=100;
Query OK, 0 rows affected (0.00 sec)

mysql> set global log_slow_rate_type="query";
Query OK, 0 rows affected (0.00 sec)

This will get one out of every 100 queries randomly logged, which should give you a good idea of your workload without such skew. It works well unless you have some rare and very complicated queries which impact your workload disproportionately and which you can’t ignore for performance analysis. To deal with this situation we added the slow_query_log_always_write_time option to Percona Server, which allows you to always log such queries even if they would not otherwise be selected because of sampling.

Enabling sampling of 1/100 queries for this workload with the full level of detail, I get 19.8K QPS, giving an overhead of less than 2% – even less than Performance Schema – and selecting 1/1000 queries to be logged brings the overhead down to about 1%. So with the Slow Query Log I can make a choice between accuracy and overhead.

I wish Performance Schema offered something similar – instead of figuring out which probes and statistics tables I need (not all of them are enabled by default), I could just choose to get sampled data and trade accuracy against overhead, instead of missing the data altogether.

Summary: there is a great amount of data in Performance Schema in MySQL 5.6 and Percona Server 5.6, though there are a number of reasons you might not want to discard the old and proven tools based on the slow query log just yet.

P.S. If you’re interested in seeing query performance analysis in action, please join Vadim’s webinar on Wednesday, Feb. 12. If you miss it, you can check out the recording at that same link.

The post PERFORMANCE_SCHEMA vs Slow Query Log appeared first on MySQL Performance Blog.

WITHer Recursive Queries?

MySQL Performance Blog - Tue, 11/02/2014 - 21:00

Over the past few years, we’ve seen MySQL technology advance in leaps and bounds, especially when it comes to scalability. But by focusing on the internals of the storage engine for so long, MySQL has fallen behind regarding support for advanced SQL features.

SQLite, another popular open-source SQL database, just released version 3.8.3, including support for recursive SQL queries using the WITH RECURSIVE syntax, in compliance with SQL:1999.

Why is this significant? It means that MySQL is now the only widely-used SQL implementation that does not support recursive queries. Fifteen years after the feature was defined in the SQL standard, almost every other SQL database of note supports it.

Only Informix among common RDBMS brands lacks support for WITH RECURSIVE, though Informix still supports recursive queries with the non-standard CONNECT BY syntax.

MySQL users have requested support for common table expressions using the WITH syntax for a long time.

CTE-style queries would allow us to share more advanced SQL queries with those used by other brands of database, and do it with standard SQL instead of proprietary functions or tricks.

The most common example of a query solved with a recursive CTE is to query a tree of unknown depth. But there are quite a few other useful applications of this form of query, all the way up to fancy stunts like a query that generates a fractal design like the Mandelbrot Set. Recursion is powerful.

Is it time for the MySQL community to raise the priority of the CTE feature requests for MySQL? Visit the feature requests at bugs.mysql.com, and add your voice by clicking the Affects Me button.

The post WITHer Recursive Queries? appeared first on MySQL Performance Blog.

Bloom Filters Made Easy

Baron Schwartz - Tue, 11/02/2014 - 10:00

I mentioned Bloom Filters in my talk today at Strata. Afterwards, someone told me it was the first time he’d heard of Bloom Filters, so I thought I’d write a little explanation of what they are, what they do, and how they work.

But then I found that Jason Davies already wrote a great article about it. Play with his live demo. I was able to get a false positive through luck in a few keystrokes: add alice, bob, and carol to the filter, then test the filter for candiceaklda.

Why would you use a Bloom filter instead of, say…

  • Searching the data for the value? Searching the data directly is too slow, especially if there’s a lot of data.
  • An index? Indexes are more efficient than searching the whole dataset, but still too costly. Indexes are designed to minimize the number of times some data needs to be fetched into memory, but in high-performance applications, especially over huge datasets, that’s still bad. It typically represents random-access to disk, which is catastrophically slow and doesn’t scale.
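The mechanics are simple: a Bloom filter is a fixed-size bit array plus k hash functions; adding an item sets its k bit positions, and a lookup answers “definitely absent” (some bit unset) or “maybe present” (all bits set, possibly a false positive). This toy bash sketch – 64 bits, two ad-hoc hashes derived from cksum, purely illustrative and nothing like a production implementation – shows the idea:

```shell
declare -a bits   # the 64-bit array, stored as a sparse bash array

# Derive two "hash" bit positions for a string: cksum of the string itself,
# and of a salted copy of it, each reduced modulo the array size.
hashes() {
  echo $(( $(printf '%s' "$1"  | cksum | cut -d' ' -f1) % 64 ))
  echo $(( $(printf '%s' "x$1" | cksum | cut -d' ' -f1) % 64 ))
}

bloom_add()  { for h in $(hashes "$1"); do bits[$h]=1; done; }
bloom_test() { for h in $(hashes "$1"); do [ "${bits[$h]}" = 1 ] || return 1; done; }

bloom_add alice; bloom_add bob; bloom_add carol

bloom_test alice && echo "alice: maybe present"   # added items always test positive
bloom_test dave  || echo "dave: definitely absent"  # unless dave's bits happen to collide (false positive)
```

Note the asymmetry: a negative answer is certain, a positive one is only probable – exactly the behavior Jason Davies’ demo exposes with the candiceaklda example.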

Generating test data for MySQL tables

MySQL Performance Blog - Mon, 10/02/2014 - 18:00

One of the common tasks requested by our support customers is to optimize slow queries. We normally ask for the table structure(s), the problematic query and sample data to be able to reproduce the problem and resolve it by modifying the query, table structure, or global/session variables. Sometimes, we are given access to the server to test the queries on their live or test environment. But, more often than not, customers will not be able to provide us access to their servers or sample data due to security and data privacy reasons. Hence, we need to generate the test data ourselves.

A convenient way of generating test data is to visit http://generatedata.com, which provides a web form where you can define the columns and their corresponding data types and turn them into test data. The website is capable of generating data in various formats such as Excel, HTML and JSON, but for MySQL you can choose either the CSV or SQL format.

The website limits you to a maximum of 100 rows. Fortunately, the web application is available for download, so you can install it on your test server and generate up to 100,000 rows instead.

Below are instructions on installing generatedata on CentOS 6:

1. Install Apache, PHP, MySQL and wget

# yum -y install mysql-server mysql httpd php php-mysql wget

2. Ensure Apache and MySQL runs on startup

# chkconfig httpd on
# chkconfig mysqld on

3. Start Apache and MySQL

# service httpd start
# service mysqld start

4. Create a MySQL user and database for the application

# mysql
mysql> GRANT ALL PRIVILEGES ON generatedata.* TO generatedata@localhost IDENTIFIED BY 'my-weak-password-please-do-not-use-this-password-in-a-production-environment';
mysql> FLUSH PRIVILEGES;
mysql> CREATE DATABASE generatedata;
mysql> quit;

5. Download generatedata source and place it in the web directory

# wget https://github.com/benkeen/generatedata/archive/3.0.8.tar.gz -O 3.0.8.tar.gz
# tar xzvf 3.0.8.tar.gz
# mv generatedata-3.0.8 /var/www/html/generatedata

6. Make the application readable and writable by Apache

# chown -R apache:apache /var/www/html/generatedata

7. Open your browser to access the web application. In my case, it’s http://192.168.1.91/generatedata. It will prompt you for MySQL credentials; supply them and click the “Continue” button to proceed.

8. The next screen will prompt you to create a settings.php file which will be used to store the custom settings for the application which include access credentials to MySQL. Click the “Create File” button to proceed.

9. You can setup an anonymous account, single account with login or multiple accounts. For testing purposes, select the default option and click the “Continue” button.

10. The next step is to install the plugins used for generating data and exporting them in a particular format. Click the “Install Plugins” button to continue.

11. Once the plugins are generated, click the “Continue” button.

12. Finally, generatedata is installed. Click “Go to script” button to continue.

Using generatedata

Below are samples of what generatedata can produce:

  • Names: "Karleigh K. Valencia", "Claire W. Woodard", "Yvonne Wyatt"
  • Date/Time: "2013-11-29 09:19:38", "2013-11-29", "11:05:53"
  • Phone/Fax: "(191) 919-9508", "(847) 807-6360", "(366) 902-0912"
  • Street Address: "P.O. Box 345, 8566 Mi St.", "587-8731 Ultrices. Avenue", "4612 Eu St."
  • Latitude/Longitude: "37.5863, -20.25932", "-48.29183, -69.31125", "-78.67594"
  • Text: "Fusce aliquet magna a neque.", "Nullam ut nisi a odio", "mollis nec, cursus a, enim."
  • Currency: "$3364.88", "$7849.22", "$1217.18"
  • Alphanumeric: "RU384", "GL941", "HI144"
  • AutoIncrement: 1, 2, 3, 4, 5, 6, 7
  • Number Range: 1, 3, 8, 2, 14, 7
  • Custom List: "Dr.", "Mr.", "Mrs.", "Ms."

For our example, let’s use the employees table structure from the Employees sample database.

1. Create the employees table under the test database

# mysql test
mysql> CREATE TABLE employees (
    ->   emp_no INT NOT NULL,
    ->   birth_date DATE NOT NULL,
    ->   first_name VARCHAR(14) NOT NULL,
    ->   last_name VARCHAR(16) NOT NULL,
    ->   gender ENUM ('M','F') NOT NULL,
    ->   hire_date DATE NOT NULL,
    ->   PRIMARY KEY (emp_no)
    -> );

2. Enter the columns and corresponding datatypes in the web form. In this example, using the appropriate data type for each column is straightforward:

  • emp_no: AutoIncrement, Start at 10000, Increment 1
  • birth_date: Date, From 01/01/1960, To 12/31/1990, Format Y-m-d
  • first_name: Names, Alex(any gender)
  • last_name: Names, Smith(surname)
  • gender: Custom List, Exactly 1, Values M|F
  • hire_date: Date, From 01/01/2000, To 12/11/2014, Format code Y-m-d

3. Export the data in MySQL Insert format. Under the EXPORTS TYPES section, select SQL tab and enter the following information:

  • Database table: employees
  • Database Type: MySQL
  • Remove the tick on: Include CREATE TABLE query and Include DROP TABLE query

4. Click the “Generate” button to generate the desired data. You can now copy the generated output and paste this in the MySQL console. On the other hand, you can also choose to download the generated rows as a file.

CSV output
You can also opt to generate CSV output and use LOAD DATA INFILE to import the CSV data to your table. Under the EXPORT TYPES section, select the CSV tab. Inside the Generate frame, select “Prompt to download” and click the “Generate” button.

Upload the CSV file under the /tmp directory of your MySQL server and import it via the MySQL console:

# mysql test
mysql> LOAD DATA INFILE '/tmp/dataDec-11-2013.csv' INTO TABLE employees FIELDS TERMINATED BY '|' IGNORE 1 LINES;

Generating more than 100,000 rows at a time
Suppose you want to generate more than 100,000 rows. You can modify /var/www/html/generatedata/settings.php and append $maxGeneratedRows = <maxGeneratedRows>; to the configuration file. The example below increases the maximum number of generated rows to 150,000.

<?php
$dbHostname = 'localhost';
$dbName = 'generatedata';
$dbUsername = 'generatedata';
$dbPassword = 'my-weak-password-please-do-not-use-this-password-in-a-production-environment';
$dbTablePrefix = 'gd_';
$encryptionSalt = 'ZPd';
$maxGeneratedRows = 150000;

Final Notes
There is just one major caveat to generating test data: nothing compares to the actual data for reproducing query performance problems, because the real data’s distribution and index cardinality will be entirely different from the generated test data. So even if you’re able to fix the problem in your test environment, the solution may or may not apply to your production environment.

The post Generating test data for MySQL tables appeared first on MySQL Performance Blog.

Legacy isn't a dirty word

Mark Callaghan - Mon, 10/02/2014 - 13:52
Pardon the rant but the winter storm has kept me in a hotel away from home for a few days. I exchanged email this week with someone pitching a solution to a problem I don't have (MySQL failover). But by "I" I really mean the awesome operations team with whom I share an office. The pitch got off to a bad start. It is probably better to compliment the supposed expertise of the person to whom you are pitching than to suggest they are no different than COBOL hackers working on Y2K problems.

Unfortunately legacy is a bad word in my world. Going off topic, so is web scale. I hope we can change this. The suggestion that MySQL was a legacy technology was conveyed to me via email, x86, Linux and a laptop. Most of those have been around long enough to be considered legacy technology. DNA and the wheel are also legacy technology. Age isn't the issue. Relevance is determined by utility and efficiency.

Remember that utility is measured from a distance. It is easy to show that one algorithm can do one narrow operation much faster than as implemented in existing products. But an algorithm shouldn’t be confused with a solution. A solution requires user education, documentation, skilled operations, trust, client libraries, backup, monitoring and more.

Followup questions to ‘What’s new in Percona XtraDB Cluster 5.6′ webinar

MySQL Performance Blog - Sat, 08/02/2014 - 00:00

Thanks to all who attended my webinar yesterday.  The slides and recording are available on the webinar’s page.  I was a bit overwhelmed by the number of questions that came in, and I’ll try to answer them the best I can here.

Q: Does Percona XtraDB Cluster support writing to multiple master?

Yes, it does.  However, locks are not held on all nodes while a transaction is in progress, so there is a possibility for conflicts between simultaneous writes on different nodes.  Certification is the process that Galera uses to protect your data integrity, and the problem is pushed back to the application to handle in the form of deadlock errors.   As long as you understand this and the conditions where you may see more deadlocks (and your application can properly handle them), multi-node writing is fine.  See more detail on this in a blog post I did a while ago.

Q: Is there any limitation to scale writes?

Yes. There is no sharding in Percona XtraDB Cluster; all data goes to all nodes.  You are limited by replication throughput, or more specifically transaction apply throughput, just like with standard MySQL replication.  However, PXC offers true parallel apply, so many transactions can be applied in parallel, which should generally give better throughput than conventional master/slave.

Q: Are the WAN segments feature only used to reduce network bandwidth? Would DB write performance be the same before Galera 3.x since each commit still has to be ack by all the servers across the WAN?

If you listened closely to something Alexey said during the webinar, it’s actually possible that WAN segments will improve commit times because the master node won’t have to spend extra time sending the transaction to each remote node.  I would expect overall write performance to vary somewhat compared to not using segments, but there are probably factors that may influence it either way, including the number of remote nodes, bandwidth, latency, etc.

With or without segments, all nodes must acknowledge the transaction; this is a vital part of Galera’s implementation and cannot be relaxed.

Q: What is the max number of cluster servers across a WAN recommended before you start seeing a diminishing return in performance b/c of sync replication?

Good question, I don’t know.  This would be fun to test and would make a good blog post.  I’ll put it on my list.  I’d bet that segments may increase such a limit, but it probably depends on a lot of factors.

Practically I haven’t seen a cluster with more than a half-dozen nodes, but that doesn’t mean that’s the limit.  I’d expect a big cluster would be around 10-12 nodes, but in reality that’s just a gut feeling more than any hard evidence.

Q: Should I be worried about the auto_increment bug you mentioned? I wasn’t planning to upgrade our cluster to Percona XtraDB Cluster 5.6 soon.

Not unless you regularly add AUTO_INCREMENT columns to existing tables using ALTER TABLE.  Note that the bug was also fixed in 5.5.34.

Q: Does Percona XtraDB Cluster support SphinxSE Storage Engine?

Nope.  Currently the Codership-mysql patches are for InnoDB only.  Any other transactional storage engine could theoretically be supported, provided you can implement prioritized transactions in it.

However, SphinxSE is not transactional, so its support would be similar to MyISAM at best (which is very rudimentary and not likely to change).  It would be easy to add such support, and it’s possible that the MariaDB Galera Cluster guys are already considering it since MariaDB ships with that storage engine.

UPDATE: It was pointed out to me that SphinxSE doesn’t store any actual data, but instead just acts as a client to a distinct Sphinx index (which I just wasn’t thinking of).  In that sense, it doesn’t need Galera support:  each node can just have a SphinxSE table, to which any DML will update the Sphinx index directly, no replication required by the cluster.  SphinxSE should work fine with PXC.

Q: To convert from mysql to Percona XtraDB Cluster 5.6, do you now recommend first upgrading to mysql 5.6 and then converting to PXC?

It depends, but I’d consider migrating to PXC at least equivalent in risk to upgrading a major MySQL version, and it should be tested thoroughly.  To limit your risk, breaking the upgrade into smaller pieces may be prudent, but it is not strictly necessary.  The further away you are from 5.6 (say you are on 5.1 or 5.0), the more likely I’d recommend separate steps.

Q: Do “WAN segments” affect the quorum in any way?

No, group communication continues as before, it’s just that the routing of the actual writeset content is different.

Q: Since each galera node is identical, they all have the same storage footprint. What are best practices for expanding storage on galera nodes when we are low on free space?

On a cluster in steady state, it should be easy to do rolling changes.  That is, take one node out of the cluster, add some extra storage, and put it back in, repeat.  Ideally such rolling changes are done quickly enough and are non-destructive to the datadir so you can IST on restart.

Q: Is there any change to wsrep_retry_autocommit behavior in Percona XtraDB Cluster 5.6, or any plans to apply this setting to explicit transactions in order to avoid cluster “deadlocks”?

wsrep_retry_autocommit has not changed to my knowledge.  The reasoning behind not applying this methodology to explicit transactions was that explicit transactions may have application logic running between the statements, so we cannot assume it is safe to simply retry the same transaction if the data changed underneath.  For example:

BEGIN;
SELECT * FROM USERS WHERE ID=100 FOR UPDATE;
# Application applies some business logic based on the row contents here???
UPDATE USERS SET AUTHORIZED=1 WHERE ID=100;
COMMIT;

If the UPDATE USERS was an autocommit, then wsrep_retry_autocommit would simply re-broadcast the same writeset (unmodified) with the new RBR row if there was a conflict.  It does not re-run the statement.  Would it be safe to do this if the explicit transaction got a cluster deadlock?  We don’t know.  If the user record was modified (which is what a cluster deadlock would indicate), should they still be authorized?

Q: I heard about xtrabackup-v2. What is the difference from the previous one?

This was actually released in 5.5.  The big changes are that the new SST method allows for encrypted SST transfers, compression, rate limiting, progress metering, and other goodness like that.  It is not backwards compatible with the old xtrabackup method, and both the donor and joiner must have the new method available (i.e., be running a PXC version that has both) for it to work.

Q: Can a cluster that uses rsync be switched to xtrabackup in a rolling fashion?

Yes, no problem.  The SST method is dictated by the JOINER, meaning whatever the joiner node’s wsrep_sst_method is, that’s what is used (the donor obeys this even if its wsrep_sst_method is different).  You can just go change the wsrep_sst_method in all the config files and it will be used next time an SST happens (since SST would only happen on a restart anyway).  Just be careful that you test xtrabackup first, since it requires proper mysql credentials to be set in wsrep_sst_auth.
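As a sketch, the rolling change described above is just a config edit on each node. The credentials below are illustrative placeholders; wsrep_sst_auth must match a real MySQL user with the privileges xtrabackup needs:

```ini
# my.cnf on each node (values are placeholders for illustration)
[mysqld]
wsrep_sst_method = xtrabackup-v2
# xtrabackup-based SSTs authenticate against the donor's MySQL:
wsrep_sst_auth   = sstuser:sstpassword
```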

Q: Have you seen, installed, or recommended Percona XtraDB Cluster 5.6 in production yet, or would you wait for a while?

Now that 5.6 is GA, we’ve just passed a milestone where I’d start to recommend 5.6 as a starter for new Percona XtraDB Cluster deployments going forward.  Certainly 5.5 is more battle-tested and understood and there still may be good reasons to use it, but from now forward, I’m expecting that the need for 5.5 will only diminish, not increase.

For existing PXC 5.5 deployments, I’d probably recommend waiting a bit unless there is some great need for the new release, but I don’t have any specific bugs or issues I’m thinking of, just the general newness of 5.6.

Q: Are there any specific warnings or cautions to be aware of with PXC 5.6 when the db uses triggers and/or procedures, beyond the cautions in MySQL itself?

Nothing specific; these should work fine with Percona XtraDB Cluster.  In RBR these only run on the master node (the node taking the write) and the changes they make are incorporated into the transaction’s writeset.

Q: So all the certifications come directly from the applying nodes back to the node that sent the data?  Or does it relay back through the relay node?

Actually, certification results are not shared on the cluster.  Certification is deterministic (or should be) and all nodes are expected to either pass or fail a transaction without any mixed results, hence there is no communication about pass/failure, and only the master node (the node that originated the transaction) actually does anything about certification failure (i.e., increments a local certification failure counter and returns a deadlock error to the client).  Past bugs in PXC have resulted in non-deterministic certification in some edge cases, and this can then obviously lead to node inconsistencies.

What is sent back is an acknowledgement of receipt of the transaction (which is much smaller) at the replication stage (pre-certification) and my understanding is that all nodes will reply back to the originating node more or less directly.  I tend to think of this communication as “out-and-back”, but in reality it’s more nuanced than this; for example an acknowledgement may be piggy-backed with a new transaction from that node.

The act of replication delivers the transaction payload to all nodes, and all nodes acknowledge the transaction to each other, AND within this process a consistent GTID for the transaction is established efficiently.  HOW precisely this happens is, as Alexey would say, an implementation detail.  Segments simply modify the method of transaction delivery, but I believe most of the other details are more or less the same.
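The claim that certification needs no pass/fail communication can be illustrated with a toy model. This is not Galera's actual algorithm, only a sketch of the idea: every node sees the same totally ordered stream of writesets, runs the same deterministic function over it, and therefore reaches the same verdicts without exchanging any extra messages.

```python
def certify(writesets):
    """Toy first-committer-wins certification over an ordered stream.

    Each writeset is (seqno, depends_on, keys): it fails certification
    if its keys overlap those of any writeset that certified after the
    snapshot point (depends_on) it was issued against.
    """
    committed = []   # (seqno, keys) of writesets that passed
    verdicts = []
    for seqno, depends_on, keys in writesets:
        conflict = any(s > depends_on and k & keys for s, k in committed)
        if not conflict:
            committed.append((seqno, keys))
        verdicts.append((seqno, not conflict))
    return verdicts
```

Run the same stream through this function on any number of "nodes" and the verdicts come out identical, which is exactly why only the originating node needs to act on a failure.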

Q: we are on Percona XtraDB Cluster 5.5 and have bursts of a large number of simultaneous updates to the same table which often triggers lock wait timeouts. Could binlog_row_image=minimal help reduce the frequency of these lock waits?

Lock wait timeouts in the cluster happen to transactions on a single node while they wait for other transactions on that same node to commit.  Any way that you can reduce the commit time should, in theory, reduce the lock waits.

Since part of that commit wait is the synchronous replication, it stands to reason that this is perhaps the bulk of the wait time.  I haven’t measured actual commit rates comparing full vs minimal row images, so I can’t tell you if this would decrease replication time or not.  I can imagine a scenario where a very large transaction would benefit from minimal row images (i.e., by being much smaller and thereby taking less time to transmit over the network), but I’d expect that when the transactions are already small and single-row to begin with, it would make less of an impact.

Are you guys using innodb_flush_log_at_trx_commit=1?  Relaxing that (and relying on the cluster replication for durability) may improve your commit times a lot.
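For reference, that relaxation is a one-line my.cnf change. This is a sketch; weigh the per-node durability trade-off for your own workload before copying it:

```ini
[mysqld]
# 1 (the default) flushes and syncs the InnoDB log on every commit.
# 2 flushes on commit but syncs only about once a second, so a crashed
# node can lose up to a second of transactions locally; that may be
# acceptable here because the other cluster nodes provide redundancy.
innodb_flush_log_at_trx_commit = 2
```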

Q: Any Percona XtraDB Cluster 5.6 or Galera 3.x settings to increase write performance across the cluster as well as clusters in different geographic locations?

I’m not aware of anything that is necessarily Galera 3 specific that is tunable, though there are things baked into Galera 3 that may help, such as the certification improvements.   As I mentioned above, minimal row images may help by simply reducing the volume of data that needs transmitting.

The post Followup questions to ‘What’s new in Percona XtraDB Cluster 5.6’ webinar appeared first on MySQL Performance Blog.

Calvin Sun on MySQL at Twitter, Percona Live 2014 and Facebook

MySQL Performance Blog - Thu, 06/02/2014 - 18:00

Twitter’s Calvin Sun

Twitter’s Calvin Sun (@Calvinsun2012) is looking forward to the fast-approaching Percona Live MySQL Conference and Expo this April 1-4 in Santa Clara, Calif. He’ll be speaking, yes, but he’s also looking forward to learning from his peers – particularly those at Facebook. Both companies, he explained, are in the rather unique positions of unprecedented rapid growth and ever-expanding performance demands.

Calvin is a senior engineering manager at Twitter, where he manages MySQL development. Prior to that, he managed the InnoDB team at Oracle. Calvin also worked for MySQL Inc. from 2006 to 2008, managing MySQL storage engines development. He has more than 15 years of database development experience.

He said MySQL is the data storage technology behind most Twitter data: the social graph, timelines, users, direct messages, as well as tweets themselves. With more than 230 million monthly active users and 500 million tweets per day, Twitter pushes MySQL and InnoDB a lot further than most companies. I had an opportunity to chat with Calvin via Skype the other day. Here’s that interview.

Tom: Aside from your talk, “MySQL at Twitter,” which we’ll get to soon, what are you looking forward to the most at this year’s Percona Live MySQL Conference and Expo?

Calvin: This year I’m particularly eager to attend the Facebook sessions because Twitter faces some of the same challenges as they do. Learning from each other within the MySQL community is very valuable and I look forward to this event every April.  For me, and my team, we always get the most excited about the opportunities to learn from the experiences of other vendors and other MySQL users.

We are also very pleased to see Oracle join this conference for the second year in a row. Their attendance is very beneficial to the MySQL community because we want to better understand what’s going on in MySQL development at Oracle. And hopefully we’ll also have some visibility and perhaps even some influence into their roadmap in terms of upcoming releases.

Multi-datacenter support and fast failover are also areas we are interested in. And of course since I’m from the InnoDB team within MySQL, I’d like to see what’s new there, too.

And finally, I’m looking forward to participating in a panel Q&A with our DBA team. The session is called “MySQL at Twitter” and in it we’ll field questions on how MySQL is used within Twitter, along with the challenges we have now and have faced in the past. It will be led by my colleague Will Gunty, a senior MySQL DBA.

Tom: Let’s take a look at some of those Facebook sessions you said you’re interested in – they have several spread out over each day of the conference:

Wow, that is a fantastic collection of sessions just from Facebook alone!

Calvin: Facebook is probably the largest MySQL installation in the world and they are in a lot of ways facing some of the same challenges we are facing. Facebook has a bigger MySQL team than we do and are also a little bit ahead of us, especially on MySQL 5.6 and InnoDB compression, so we can learn new things from those sessions – in addition to having direct conversations with members of the Facebook team outside of those sessions, as we always do. They have created a lot of innovative solutions to address their problems.

Tom: You’ll be leading a session titled “Scaling Twitter with MySQL” along with a couple members of your team.  With Twitter’s explosive growth since its founding in 2006, and as that activity continues to grow with Twitter being used in new ways, how has your team kept pace? And how are you preparing for the future?

Calvin: It is a huge scale. During my session I’ll talk about how MySQL evolved within Twitter over the years. Many small companies just starting out, including Twitter in the early days, use master-slave replication. But as Twitter started to grow, that sort of setup could no longer meet our needs, so over the years we starting developing our own sharding and replication framework. We still use MySQL as our basic data storage, though, and working creatively like this enabled us to scale fast.

For us, usage is in real-time and is event driven.  For example, the World Cup or the Olympics or the Super Bowl – those types of very popular events send peaks up several times higher than average. For that reason we have to always be prepared for whatever sort of usage levels we might encounter.

Tom: Who should make it a point to attend your session?

Calvin: I would say people working at both large companies and startups, especially Internet companies that want to use MySQL and need to learn how to prepare for growth. Also anyone who needs to ensure that their company’s architecture and infrastructure are in fact prepared for rapid growth. They’ll learn how Twitter prepared for and addressed specific issues.

Tom: What will they walk away with?

Calvin: They’ll walk away with 1) how to build a scalable infrastructure using MySQL and 2) how to plan for growth.

Tom: What are some of the bigger projects or challenges you and your team face in 2014?

Calvin: One of the bigger things we are in the process of doing is MySQL 5.6 migration. But for Twitter overall, I think that as it grows, ensuring reliability and scalability has always been a priority for us. We are also looking to do a better job at failover – above and beyond failover between our data centers. So we’ll be looking to improve that kind of operational efficiency this year.

Another area is optimization. We’re looking into the structure of our MySQL installation within Twitter to see which areas we can improve and make more efficient. One area is data compression, which we currently don’t use in production. We want to see what we can achieve in reducing data sizes and saving disk space.

Tom: Before we started the interview you mentioned that you’ve been attending these MySQL conferences since 2006. Why do you feel it is important to be there personally?

Calvin: Because I strongly support the MySQL community in general and try to do my best to participate and serve the community. For example last year I gave presentations on MySQL/InnoDB in Beijing and Seoul, plus OSCON and MySQL Connect. Those in addition to Percona Live and related events give me a chance to share my own experiences and lessons learned with the community. As I’ve already mentioned it also gives me the chance to learn.

Tom: Thanks very much for speaking with me today, Calvin. I look forward to seeing you in April! And as a small token of my appreciation, here’s a discount code you can share with your friends and colleagues (everyone reading this can take advantage of it, too). Just have them enter SeeMeSpeak on the Percona Live 2014 registration page and they’ll automatically save 10 percent.

Calvin: Thanks, Tom. We really appreciate the hard work Percona and the Conference Committee does in terms of organizing these MySQL conferences with all of the excellent sessions. See you in April!

The post Calvin Sun on MySQL at Twitter, Percona Live 2014 and Facebook appeared first on MySQL Performance Blog.

Ghosts of MySQL Past: Part 4, A million features for Enterprise

Stewart Smith - Thu, 06/02/2014 - 10:31

Continuing on from Part 3….

SAP is all about Enterprises and, as such, used all the Enterprise features of databases. This was a very different application from every user of MySQL so far (which were often web applications, increasingly being used at scale). This was “MySQL focuses on Enterprise”.

This is 2003: before VIEWS, before stored procedures, before triggers, before cursors, before prepared statements, before precision math, before SQL access to logs, before any real deadlock info, before you could use a / in a table name, before any performance statistics, before full unicode and before partitioning, row level binary logs, fractional seconds in temporal types and before EXPLAIN for INSERT/UPDATE/DELETE (which we still don’t have). There was a long way to go before MySQL would fit with SAP. A long, long way.

But how were we going to run our cell phone networks on MySQL? There was an answer to that as MySQL AB acquired an Ericsson spinoff, Alzato, for its clustered database server, NDB, which would become MySQL Cluster. This is separate from replication and was originally designed for GSM HLR databases. Early versions had two types of availability: five nines (99.999%) or zero nines (0.000%) – it either worked or exploded. Taking a database engine that wasn’t fully mature and integrating it with MySQL is not a task for the faint of heart. It wouldn’t be the last time this was attempted, and it would prove to be the most successful.

In October of 2004, MySQL 4.1 went GA – the first version with MySQL Cluster and incidentally, just before I joined MySQL AB on the MySQL Cluster team.

By this time, MySQL AB had hired just about all outside contributors, as, well, what better job interview than a patch? The struggle to have a good relationship with outside contributors started a long time ago, it’s certainly nothing new.

2005 was the year of MySQL 5.0 instability. At the time, 5.0 was the development version and it was plagued by instability. It was really hard for development teams (such as the SAP team) to work on deliverables when every time they’d rebase their work on a new 5.0 trunk version, the server would explode in new and interesting ways.

In fact, the situation was so bad that Continuous Integration was re-invented. Basically, it was feature++; stability--. The fact that at the same time even less stable development trees existed did not bode well.

Analyze MySQL Query Performance with Percona Cloud Tools: Feb. 12 webinar

MySQL Performance Blog - Thu, 06/02/2014 - 00:00

Next week (Wednesday, February 12 at 10 a.m. Pacific) I will host a webinar titled “Analyze MySQL Query Performance with Percona Cloud Tools.” Percona Cloud Tools, currently in beta, reveals new insights about MySQL performance enabling you to improve your database queries and applications. (You can request access to the free beta here).

For webinar attendees, Percona will raffle five (5) 60-minute MySQL query consulting sessions with me to analyze your Percona Cloud Tools query data and provide feedback and performance suggestions. Read below for how to win.

In the webinar I will cover:

  • Setting up Percona Cloud Tools Query Analytics
  • Deciphering advanced database query metrics
  • Identifying the best opportunities to achieve significant query performance improvements

Set up Percona Cloud Tools

Getting started with Percona Cloud Tools takes only a few steps (after creating your account). Users of Percona Toolkit can even skip step 1.

  1. Install Percona Toolkit (2.2.6 or newer) on each MySQL database server
  2. Run “pt-agent --install” on each database server and paste your API key when asked
  3. Log in to https://cloud.percona.com and enable Query Analytics

Win a query consulting session

As I mentioned earlier, Percona will raffle five (5) 60-minute MySQL query consulting sessions with me to analyze your Percona Cloud Tools / Query Analytics data and provide feedback and performance suggestions. To qualify for the raffle, you must complete the following by 12 a.m. UTC on Thursday, February 13:

Good luck and see you on February 12th!

In the meantime, here are some related posts I’ve written about Percona Cloud Tools:

 

About Percona Cloud Tools
Percona Cloud Tools is a hosted service providing access to query performance insights for all MySQL users. After a brief setup, unlock new information about your database and how to improve your applications.

The post Analyze MySQL Query Performance with Percona Cloud Tools: Feb. 12 webinar appeared first on MySQL Performance Blog.

16000 active connections – Percona Server continues to work when others die

MySQL Performance Blog - Wed, 05/02/2014 - 10:59

We just published results with improvements in Thread Pool in Percona Server:
Percona Server: Thread Pool Improvements for Transactional Workloads
Percona Server: Improve Scalability with Thread Pool

What I am happy to see is that Percona Server is able to handle a tremendous amount of user connections. From our charts you can see it can go to 16000 active connections without a decline in throughput.

In fact, in our experiments we used 25000 connections and even more, so I think now we face OS limitations in handling connections as opposed to Percona Server’s limitations. From the posts above, both MySQL and MariaDB drop to the floor somewhere along the way.

You may ask a valid question: are 16,000 connections really necessary? I would say that regular workloads rarely need this many, but I propose to think about the Thread Pool as an electrical fuse, one that provides overcurrent protection. See my post SimCity outages, traffic control and Thread Pool for MySQL.

Even if you do not have many connections on regular days, one mention on Yahoo’s homepage or on Hacker News will change that in a moment. The Thread Pool will act as protection from overloading the database.

There is also another application for the Thread Pool. I often see the application layer create its own connection pool (think Java), and with ~20-30 app servers it can reach ~3,000 connections, which is already problematic… but not for Percona Server!
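For completeness, enabling the thread pool in Percona Server is a small my.cnf change; the size shown below is an illustrative placeholder, not a recommendation:

```ini
[mysqld]
# Replace one-thread-per-connection scheduling with the thread pool:
thread_handling  = pool-of-threads
# Workload-dependent; a common starting point is near the CPU core count:
thread_pool_size = 16
```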

The post 16000 active connections – Percona Server continues to work when others die appeared first on MySQL Performance Blog.

Ghosts of MySQL Past: Part 3

Stewart Smith - Wed, 05/02/2014 - 10:24

See Part 1 and Part 2.

We rejoin our story with a lawsuit. While MySQL suing Progress NuSphere is not perhaps the first GPL lawsuit that comes to mind, it was the first time that the GPL was tested in court. Basically, the GEMINI storage engine was a proprietary storage engine bundled with a copy of MySQL. Guess what? The GPL was found to be valid and GEMINI was eventually GPLed, and it didn’t really go anywhere after that. Why? Probably some business reasons and also, InnoDB was actually rather good and there wasn’t a lawsuit to enforce the GPL there, making business relationships remarkably easier.

In 2003 there was a second round of VC funding. The development team increased in size. One thing that MySQL AB did was invest heavily in technology. I think this is what gave the company a lot of value, you need to spend money developing technology if you wish to be seen as a giant and if you wish to be able to provide a high level of quality service to customers.

MySQL 4.0 went GA in March 2003 while at the same time there were 4.1 and 5.0 development trees. Three concurrent development trees may seem too many – and of course, it was. But these were heady days of working on features that MySQL was missing and ever wanting to gain users and market share. Would all these extra features be able to be added to MySQL? Time would tell…

The big news of 2003 for MySQL? A partnership with SAP. There was this idea: “run SAP on MySQL” which would push the MySQL Server in a bit of an odd direction. For a start, the bootstrap SQL script for SAP created something like 10,000 tables and loaded gigabytes of data – before you even started setting it up. In 2003, on MySQL 4.0, this didn’t go so well. Why was SAP interested? Well, then you’d be able to run SAP without paying Oracle licenses!

MySQL, SQL, NoSQL, Open Source And Beyond: a Google Tech Talk

Baron Schwartz - Wed, 05/02/2014 - 10:00

I’ve been invited to give a Tech Talk at Google next Thursday, February 13th, from 11:00 to 12:30 Pacific time. Unfortunately the talk won’t be streamed live, nor is it open to the general public, but it will be recorded and hosted on YouTube afterwards. I’ve also been told that a small number of individuals might be allowed to attend from outside Google. If you would like me to try to get a guest pass for you, please tweet that to @xaprb.

The topic is, roughly, databases. Officially,

MySQL, SQL, NoSQL, and Open Source in 2014 and Beyond

Predictions are hard to get right, especially when they involve the future. Rather than predict the future, I’ll explain how I view the relational and NoSQL database worlds today, especially the MySQL product and community, but including open-source and proprietary data management technologies about which I know enough to get in trouble. I’ll explain how my self-image as a practitioner and contributor has changed, especially as I’ve moved from consulting (where I tell people what they should do) into being a company founder (where I sometimes wish someone would tell me what to do). As for the future, I’ll express my preferences for specific outcomes, and try to be careful what I wish for.

I am excited and a bit nervous. A Google Tech Talk! Wow! Thanks for inviting me, Google!

ClusterControl for Percona XtraDB Cluster Improves Management and Monitoring

MySQL Performance Blog - Wed, 05/02/2014 - 00:03

ClusterControl for Percona XtraDB Cluster is now available in three different versions thanks to our partnership with Severalnines. ClusterControl will make it simpler to manage and monitor Percona XtraDB Cluster, MySQL Cluster, MySQL Replication, or MySQL Galera.

I am very excited about our GA release of Percona XtraDB Cluster 5.6 last week. As Vadim described in his blog post announcing the release, we have brought together the benefits of Percona Server 5.6, Percona XtraBackup and Galera 3 to create a drop-in compatible, open source, state-of-the-art high availability MySQL clustering solution. We could not have done it without the close cooperation and help of our partners at Codership. To learn more, join us on February 5th for a webinar entitled “What’s New in Percona XtraDB Cluster 5.6” presented by Jay Janssen.

While these new features in Percona XtraDB Cluster are highly valuable, we regularly receive feedback from users that they would like more tools to make it easier to manage and monitor their MySQL clusters. In response to that feedback, I’m pleased to announce our partnership with Severalnines to make ClusterControl more available to Percona XtraDB Cluster users and, in fact, all of our Support customers who use MySQL clustering solutions.

Percona XtraDB Cluster users now have three choices:

On the last option, Percona ClusterControl is a privately branded version of ClusterControl supplied to us by Severalnines. With a Percona Support contract covering Percona XtraDB Cluster, MySQL Cluster, or MySQL Galera, we provide support for your deployment and use of Percona ClusterControl. Percona has worked directly with Severalnines to ensure full compatibility of ClusterControl Community Edition with Percona XtraDB Cluster. The Percona Support team will provide assistance with the configuration and administration of Percona ClusterControl. We will help you get setup and then we will help you make the most of your new management and monitoring tools for the high availability solution of your choice.

We will also provide support for Severalnines ClusterControl Enterprise Edition when it is purchased from Severalnines or Percona and used with a cluster deployment covered by a Percona Support subscription that includes coverage of Percona XtraDB Cluster, MySQL Galera, or MySQL Cluster. ClusterControl Enterprise Edition is the de-facto management solution for Galera-based clusters and it provides the highest levels of monitoring and management capabilities for Percona XtraDB Cluster users.

Percona ClusterControl will provide our Support customers broad flexibility to deploy, monitor, and manage Percona XtraDB Clusters, supporting on-premise and Amazon Web Services deployments, as well as deployments spread across multiple cloud vendors. Percona ClusterControl can be used to automate the deployment of Percona XtraDB Cluster in cross-region, multi-datacenter setups. Percona ClusterControl offers a full range of management functionality for clusters, including:

  • Node and cluster recovery
  • Configuration management
  • Performance health checks
  • Online backups
  • Upgrades
  • Scale out
  • Cloning of clusters
  • Deploying HAProxy load balancers

Percona ClusterControl can be used with multiple clustering solutions including MySQL Cluster, MySQL Replication, MySQL Galera, MariaDB Galera Cluster, and MongoDB. Percona Support subscribers that elect coverage for these clustering solutions receive Percona ClusterControl with their subscriptions.

Just as we made Percona XtraDB Cluster a much better solution by partnering with Codership, we now expect it will be a much easier to manage and monitor solution thanks to our partnership with Severalnines. Whether you need Percona Support or not, I hope you will take advantage of these powerful management and monitoring solutions which are now available for Percona XtraDB Cluster. To learn more about using ClusterControl to manage Percona XtraDB Cluster, please join us on February 19th for a webinar entitled “Performance Monitoring and Troubleshooting of Percona XtraDB Cluster” presented by Peter Boros.

The post ClusterControl for Percona XtraDB Cluster Improves Management and Monitoring appeared first on MySQL Performance Blog.

Ghosts of MySQL Past: Part 2

Stewart Smith - Tue, 04/02/2014 - 10:39

This continues on from my post yesterday and also contains content from my linux.conf.au 2014 talk (view video here).

Way back in May in the year 2000, a feature was added to MySQL that would keep many people employed for many years – replication. In 3.23.15 you could replicate from one MySQL instance to another. This is commonly cited as the result of two weeks of work by one developer. The idea is simple: create a log of all the SQL queries that modify the database and then replay them on a slave. Remember, this is before there was concurrency and everything was ISAM or MyISAM, so this worked (for certain definitions of worked).

The key things to remember about MySQL replication are: it was easy to use, it was easy to set up and it was built into the MySQL Server. This is why it won. You have to fast forward to September in 2010 before PostgreSQL caught up! It was only with PostgreSQL 9.0 that you could have queryable read-only slaves with stock standard PostgreSQL.

If you want to know why MySQL was so much bigger than PostgreSQL, this built-in and easy-to-use replication was a huge reason. There is the age of a decent scotch between read-only slaves for MySQL and PostgreSQL (although I don’t think I’ve ever pointed that out to my PostgreSQL friends when having scotch with them… I shall have to!)

In 2001, when space was an odyssey, the first GA (General Availability) release of MySQL 3.23 hit the streets (quite literally, this was back in the day of software that came in actual physical boxes, so it quite probably was literally hitting the streets).

For a good piece of trivia, it’s 3.23.22-beta that is the first release in the current bzr tree, which means that it was around this time that BitKeeper first came into use for MySQL source code.

We also saw the integration of InnoDB in 2001. What was supremely interesting is that the transactional storage engine was not from MySQL AB, it was from Innobase Oy. The internals of the MySQL server were certainly not set up for transactions, and for many years (in fact, to this day) we talk about how a transactional engine was shoehorned in there. Every transactional engine since has had to do the same odd things to, say, find out when a transaction was being started. The exception here is in Drizzle, where we finally cleaned up a bunch of this mess.

Having a major component of the MySQL server owned and controlled by another company was an interesting situation, and one that would prove interesting in a few years time.

We also saw Mårten Mickos become CEO in 2001, a role he would have through the Sun acquisition – an acquisition that definitively proved that you can build an open source company and sell it for a *lot* of money. It was also the year that saw MySQL AB accept its first round of VC funding, and this would (of course) have interesting implications: some good, some less ideal.

(We’ll continue tomorrow with Part 3!)

Quick installation guide for Percona Cloud Tools for MySQL

MySQL Performance Blog - Tue, 04/02/2014 - 04:54

Here in Percona Support, we’re receiving several requests per day for help with Percona Cloud Tools installation steps.

So I decided to prepare a step-by-step example of the installation process with some comments based on experience. Percona Cloud Tools is a hosted service providing access to query performance insights for all MySQL users. After a brief setup, you’ll unlock new information about your database and how to improve your applications. You can sign up here to request access to the free beta, currently under way.

Some notes
  • It’s recommended to do the installation as root.
  • If you’re installing pt-agent as root, then .pt-agent.conf should be placed in root’s $HOME.
  • You can become root with the “sudo -s” command, but in that case your home directory remains unchanged and is not the root user’s home directory.
  • So I would strongly recommend logging in as root, or using “sudo -i” to become root, and then checking that your HOME and PWD match: env | egrep -i 'home=|pwd='
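The HOME check matters because pt-agent writes .pt-agent.conf into $HOME. Here is a minimal pre-flight sketch you could run before installing (the function name is my own invention, not part of pt-agent):

```shell
# Pre-flight check before running pt-agent --install: confirm we are
# root and that $HOME points at root's home directory ("sudo -s" keeps
# the invoking user's HOME, while "sudo -i" resets it to /root).
check_root_env() {
  if [ "$(id -u)" -ne 0 ]; then
    echo "not-root"
  elif [ "$HOME" != "/root" ]; then
    echo "wrong-home"
  else
    echo "ok"
  fi
}
check_root_env
```

Only proceed with the installation once this prints “ok”.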
Sign Up

Go to the URL and sign up (or log in): https://cloud.percona.com/

Prepare

Copy your API key:
On this URL: https://cloud.percona.com/api-key
Or by menu: Agents -> API Key

Percona-Toolkit Download

Download Percona-Toolkit:
http://www.percona.com/downloads/percona-toolkit/LATEST/
For example:

  • DEB: for Debian/Ubuntu like systems
  • RPM: for RedHat, CentOS
Percona-Toolkit Installation

CentOS:
yum install http://www.percona.com/redir/downloads/percona-toolkit/LATEST/RPM/percona-toolkit-2.2.6-1.noarch.rpm

Ubuntu:
wget http://www.percona.com/redir/downloads/percona-toolkit/LATEST/deb/percona-toolkit_2.2.6_all.deb
dpkg -i percona-toolkit_2.2.6_all.deb
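Whichever package you installed, it is worth confirming the tools actually landed on your PATH before moving on. A small sketch (the function name and the particular tool list are my own choices for illustration):

```shell
# Check that the Percona Toolkit binaries we care about are on PATH.
# "command -v" exits non-zero when the program is not found.
check_tools() {
  for tool in pt-agent pt-visual-explain pt-query-digest; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: installed"
    else
      echo "$tool: missing"
    fi
  done
}
check_tools
```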

pt-agent installation

Run this command:
pt-agent --install --user={mysql username} --password={password} --api-key={API Key copied from web site}

Note: substitute your MySQL username, password and the API key you copied from the web site

pt-agent installation output

You should see this:

=============================================================================
pt-agent --install --user={user} --password={password} --api-key={api-key}
Step 1 of 11: Verify the user is root: OK
Step 2 of 11: Check Perl module dependencies: OK
Step 3 of 11: Check for crontab: OK
Step 4 of 11: Verify the API key: OK
Step 5 of 11: Connect to MySQL: OK
Step 6 of 11: Check if MySQL is a slave: NO
Step 7 of 11: Create a MySQL user for the agent: OK
Step 8 of 11: Initialize /etc/percona/agent/my.cnf: OK
Step 9 of 11: Initialize /root/.pt-agent.conf: OK
Step 10 of 11: Create the agent: OK
Step 11 of 11: Run the agent: 2014-01-29T20:12:17 INFO Starting agent
pt-agent has daemonized and is running as PID 13506:
  --lib /var/lib/pt-agent
  --log /var/log/pt-agent.log
  --pid /var/run/pt-agent.pid
These values can change if a different configuration is received.
OK
INSTALLATION COMPLETE
The agent has been installed and started, but it is not running any services yet.
Go to https://cloud.percona.com/agents#node1 to enable services for the agent.
=============================================================================

Agent is installed

pt-agent configuration

Go to https://cloud.percona.com/agents and select your newly created agent

Then enable Service:

  • Agents -> Services -> Query Analytics -> On (push the Off button and configure the values)
  • Save

Done!

Now wait a few minutes, then check the Status Log: Agents -> select agent -> Status Log
You should see there: “Agent OK”

Now wait (~3 min) until pt-agent adds jobs to crontab.
You should see there: “Services OK”

Then run some slow queries and wait (~3-5 min).
If everything is OK then you should see there “Exit: 200”

Now check Query Analytics.

There is a “Help Me” button at the bottom of the page so you can ask for Support if you have any questions and our Support team will gladly help you.

Enjoy!

The post Quick installation guide for Percona Cloud Tools for MySQL appeared first on MySQL Performance Blog.

Percona Toolkit collection: pt-visual-explain

MySQL Performance Blog - Mon, 03/02/2014 - 18:00

This is the first in a series of posts highlighting a few of the seldom-used but still handy Percona Toolkit tools.

Have you ever had a problem understanding the EXPLAIN statement output? And are you the type of person who would rather use the command line than a GUI application? Then I would recommend Percona Toolkit’s pt-visual-explain. This is one of many Percona Toolkit tools, and it is useful for anyone who wants a different, easier-to-understand view of the EXPLAIN output than the usual table and vertical formats.

As described in the documentation – http://www.percona.com/doc/percona-toolkit/2.2/pt-visual-explain.html#description

“pt-visual-explain reverse-engineers MySQL’s EXPLAIN output into a query execution plan, which it then formats as a left-deep tree – the same way the plan is represented inside MySQL.”

Our sample query:

SELECT City.Name, Country.Name FROM City JOIN Country on (City.CountryCode=Country.Code)

In Table view the EXPLAIN output would look like:

$ mysql -Dworld -e "explain SELECT City.Name, Country.Name FROM City JOIN Country on (City.CountryCode=Country.Code)"
+----+-------------+---------+------+---------------+-------------+---------+--------------------+------+-------+
| id | select_type | table   | type | possible_keys | key         | key_len | ref                | rows | Extra |
+----+-------------+---------+------+---------------+-------------+---------+--------------------+------+-------+
|  1 | SIMPLE      | Country | ALL  | PRIMARY       | NULL        | NULL    | NULL               |  183 |       |
|  1 | SIMPLE      | City    | ref  | CountryCode   | CountryCode | 3       | world.Country.Code |    1 |       |
+----+-------------+---------+------+---------------+-------------+---------+--------------------+------+-------+

While on vertical view, the EXPLAIN output would be:

$ mysql -Dworld -e "explain SELECT City.Name, Country.Name FROM City JOIN Country on (City.CountryCode=Country.Code)\G"
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: Country
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 183
        Extra:
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: City
         type: ref
possible_keys: CountryCode
          key: CountryCode
      key_len: 3
          ref: world.Country.Code
         rows: 1
        Extra:

Using the tool…

To get the EXPLAIN output in pt-visual-explain’s tree format, I will show you three different ways to use this handy tool.

Piping input into pt-visual-explain, like the example shown below:

$ mysql -Dworld -e "explain SELECT City.Name, Country.Name FROM City JOIN Country on (City.CountryCode=Country.Code)" | pt-visual-explain

Saving the EXPLAIN output to a file, which pt-visual-explain then reverse-engineers into a query execution plan and formats as a tree, as shown below:

$ mysql -Dworld -e "explain SELECT City.Name, Country.Name FROM City JOIN Country on (City.CountryCode=Country.Code)" > explain.out
$ cat explain.out
id  select_type  table    type  possible_keys  key          key_len  ref                 rows  Extra
1   SIMPLE       Country  ALL   PRIMARY        NULL         NULL     NULL                183
1   SIMPLE       City     ref   CountryCode    CountryCode  3        world.Country.Code  1
$ pt-visual-explain explain.out

Creating a file containing the query and then passing that file to pt-visual-explain, which connects to the MySQL instance itself, like the example shown below:

$ echo "SELECT City.Name, Country.Name FROM City JOIN Country on (City.CountryCode=Country.Code)" > select.out
$ pt-visual-explain -Dworld --connect select.out

The tree structure below would be the result if you run any of the three methods shown above.

JOIN
+- Bookmark lookup
|  +- Table
|  |  table          City
|  |  possible_keys  CountryCode
|  +- Index lookup
|     key            City->CountryCode
|     possible_keys  CountryCode
|     key_len        3
|     ref            world.Country.Code
|     rows           1
+- Table scan
   rows           183
   +- Table
      table          Country
      possible_keys  PRIMARY

So how do we interpret the EXPLAIN plan? Simple. The query will do a table scan on Country table for 183 rows. For each row, it will do an index lookup into the City.CountryCode index with the value from Country.Code, then do a bookmark lookup into the City table.
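Conceptually, the plan above is a nested-loop join. As a toy sketch of the same scan-then-lookup pattern in shell (with invented data standing in for the world database, and function names that are my own, not anything MySQL provides):

```shell
# Stand-in for the City.CountryCode index: given a country code,
# return the cities filed under it.
city_lookup() {
  case "$1" in
    AUS) echo "Sydney Melbourne" ;;
    NLD) echo "Amsterdam" ;;
  esac
}

# Nested-loop join mirroring the plan: a full scan of "Country", and
# for each row a keyed lookup into "City" via the CountryCode index.
join_demo() {
  for row in AUS:Australia NLD:Netherlands; do   # table scan of Country
    code=${row%%:*}; cname=${row#*:}
    for city in $(city_lookup "$code"); do       # index + bookmark lookup
      echo "$city $cname"
    done
  done
}
join_demo
```

With 183 rows in the real Country table, MySQL performs this inner lookup 183 times, which is exactly what the “rows 183” table scan in the tree implies.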

Please read related posts about limits and errors, and Extended EXPLAIN here and here for more in-depth understanding of MySQL EXPLAIN plans.

And here’s another example using an EXPLAIN DELETE statement (works on MySQL version 5.6.3 and above):

$ mysql -Dworld -e "EXPLAIN DELETE from Cities where CountryCode IN (select Code from Country where LifeExpectancy < 70.0)" | pt-visual-explain
DEPENDENT SUBQUERY
+- Filter with WHERE
|  +- Bookmark lookup
|     +- Table
|     |  table          Country
|     |  possible_keys  PRIMARY
|     +- Unique subquery
|        key            Country->PRIMARY
|        possible_keys  PRIMARY
|        key_len        3
|        ref            func
|        rows           1
+- Filter with WHERE
   +- Table scan
      rows           581
      +- Table
         table          Cities

Alternatively, one can use the --format=dump option when parsing input into pt-visual-explain and it will create a Data::Dumper output, which I think may be harder to understand but may be usable in some other cases.

Conclusion

The next time you want to check your execution plan information you can use pt-visual-explain to compare the result with the output of a regular EXPLAIN statement. You can also use MySQL Workbench, a GUI application that can produce a beautified representation of the visual explain with colors, numbers on lines and other information.

For further reading please consult the sections under Modules and Algorithm on the pt-visual-explain documentation.

Other helpful information: pt-visual-explain is derived from Maatkit, specifically mk-visual-explain. You can download Percona Toolkit from our download page – http://www.percona.com/downloads/ – or get pt-visual-explain alone by using $ wget percona.com/get/pt-visual-explain.

The post Percona Toolkit collection: pt-visual-explain appeared first on MySQL Performance Blog.

Past, Present and future of MySQL and variants Part 1: Ghosts of MySQL Past

Stewart Smith - Mon, 03/02/2014 - 10:39

You can watch the video of my linux.conf.au 2014 talk here: http://mirror.linux.org.au/linux.conf.au/2014/Wednesday/28-Past_Present_and_future_of_MySQL_and_variants_-_Stewart_Smith.mp4

But let’s talk about things in blog form rather than video form :)

Back in 1979, there was UNIREG. A text UI to records (rows) in a database (err, table). The reason I mention UNIREG is that it had FoRMs which as you may have guessed by my capitalization there is where the FRM file comes from.

In 1986, UNIREG came to UNIX. That’s right kids, the 80×24 VT100 interface to ISAM (Index Sequential Access Method – basically rows are written in insert order and indexes point to them) came to UNIX. There was no generic query language, just FoRMs and reports. In fact, to this day, that 80×24 text interface is stored in the FRM file by MySQL and never ever used (I’ve written about this before).

Then there was this mSQL thing around the 1990s, which was a small SQL server (with source) but not FOSS. Originally, Monty W plugged in his ISAM engine but it wasn’t quite the right fit… so in 1995, we had MySQL 1.0 and MySQL AB was founded.

Fast forward a bit and in 1996 we had MySQL 3.19 and development continued. It managed to gain features, performance, ports to different operating systems and CPU architectures and, of course, stability.

It wasn’t until the year 2000 that MySQL adopted the GPL. This turned out to be a huge step in the right direction for increased adoption. At the time, this was a huge risk for the company, essentially risking all the revenue of the company on making the software more free.

This was the birth of the dual licensing business model. You see, the client library (libmysql) was also GPL, which meant it was easy to use if your application was also GPL, but if you were going to distribute your application and it wasn’t under a GPL compatible license (there was also a FOSS exception so that things like PHP could use it) then you needed a license.

Revenue from licensing was to be significant throughout the entire history of MySQL AB.

(We’ll continue this in part 2 tomorrow)

The accurately named “best tofu dish ever”: Mapo Tofu

Stewart Smith - Sun, 02/02/2014 - 21:08

I have been having a craving for Mapo Tofu for a few days now and I finally could be bothered attempting to make it for the first time. It’s odd that for a dish I really do quite like, this was my very first time attempting to make it.

Well… I was googling around and found this recipe: http://www.kitchenriffs.com/2011/03/best-darn-tofu-dish-ever-mapo-tofu.html. I didn’t have fermented black beans, so I left that out. I used the broth from the dried shiitake mushrooms (with a tiny bit of Massel stock added) and did take the option of adding pepper (chilli) flakes.

It was absolutely delicious and really quite simple to make.

The simmering of the tofu (I just used the Woolworths Macro Classic Tofu that you can pick up at any Woolworths store… because I’m often just plain lazy when it comes to acquiring tofu) was something I haven’t done before. It turns out that makes the outside of it more likely to keep its shape, which it rather did. Neat trick!

Did I mention this was delicious? I wish there was more!
