Good practice / Bad practice

Posted on August 31, 2018 by Arjen Lentz

My oldest still-open MySQL bug

a bug This morning I received an update on a MySQL bug, someone added a comment on an issue I filed in November 2003, originally for MySQL 4.0.12 and 4.1.0. It’s MySQL bug#1956 (note the very low number!), “Parser allows naming of PRIMARY KEY”:

[25 Nov 2003 16:23] Arjen Lentz
Description:
When specifying a PRIMARY KEY, the parser still accepts a name for the index even though the MySQL server will always name it PRIMARY.
So, the parser should NOT accept this, otherwise a user can get into a confusing situation as his input is ignored/changed.

How to repeat:
CREATE TABLE pk (i INT NOT NULL);
ALTER TABLE pk ADD PRIMARY KEY bla (i);

'bla' after PRIMARY KEY should not be accepted.

Suggested fix:
Fix grammar in parser.

Most likely we found it during a MySQL training session, training days have always been a great bug catching source as students would be encouraged to try lots of quirky things and explore.

It’s not a critical issue, but one from the “era of sloppiness” as I think we may now call it, with the benefit of hindsight. At the time, it was just regarded as lenient: the parser would silently ignore things it couldn’t process in a particular context. Like creating a foreign key on a table using an engine that didn’t support foreign keys would see the foreign keys silently disappear, rather than reporting an error. Many if not most of those quirks have been cleaned up over the years. Some are very difficult to get rid of, as people use them in code and thus essentially we’re stuck with old bad behaviour as otherwise we’d break to many applications. I think that one neat example of that is auto-casting:

SELECT "123 apples" + 1;
+------------------+
| "123 apples" + 1 |
+------------------+
|              124 |
+------------------+
1 row in set, 1 warning (0.000 sec)

SHOW WARNINGS;
+---------+------+-----------------------------------------------+
| Level   | Code | Message                                       |
+---------+------+-----------------------------------------------+
| Warning | 1292 | Truncated incorrect DOUBLE value: '123 hello' |
+---------+------+-----------------------------------------------+

At least that one chucks a warning now, which is good as it allows developers to catch mistakes and thus prevent trouble. A lenient parser (or grammar) can be convenient, but it tends to also enable application developers to be sloppy, which is not really a good thing. Something that creates a warning may be harmless, or an indication of a nasty mistake: the parser can’t tell. So it’s best to ensure that code is clean, free even of warnings.

Going back to the named PRIMARY KEY issue… effectively, a PRIMARY KEY has the name ‘PRIMARY’ internally. So it can’t really have another name, and the parser should not accept an attempt to name it. The name silently disappears so when you check back in SHOW CREATE TABLE or SHOW INDEXES, you won’t ever see it.
CREATE TABLE pkx (i INT PRIMARY KEY x) reports a syntax error. So far so good.
But, CREATE TABLE pkx (i INT, PRIMARY KEY x (i)) is accepted, as is ALTER TABLE pkx ADD PRIMARY KEY x (i).

MySQL 8.0 still allows these constructs, as does MariaDB 10.3.9 !

The bug is somewhere in the parser, so that’s sql/sql_yacc.yy. I’ve tweaked and added things in there before, but it’s been a while and particularly in parser grammar you have to be very careful that changes don’t have other side-effects. I do think it’s worthwhile fixing even these minor issues.

Posted on August 31, 2018 by Arjen Lentz

Posted on July 18, 2018July 18, 2018 by Arjen Lentz

Australian government encryption folly

In IDG’s CIO magazine (17 July 2018): Wickr, Linux Australia, Twilio sign open letter against govt’s encryption crackdown ‘mistake’. Not just those few, though, 76 companies and organisations signed that letter.

Learning Lessons

Encryption is critical to the whole “online thing” working, both for individuals as well as companies. Let’s look back at history:

In most countries’ postal legislation, there’s a law against the Post Office opening letters and packages.
Similarly, a bit later, telephone lines couldn’t get tapped unless there’s a very specific court order.

This was not just a nice gesture or convenience. It was critical for

trusting these new communication systems and their facilitating companies, and
enabling people to communicate at greater distances and do business that way, or arrange other matters.

Those things don’t work if there are third parties looking or listening in, and it really doesn’t matter who or what the third party is. Today’s online environment is really not that different.

Various governments have tried to nobble encryption at various points over the past few decades: from trying to ban encryption outright, to requiring super keys in escrow, to exploiting (and possibly creating) weaknesses in encryption mechanisms. The latter in particular is very damaging, because it’s so arrogant: the whole premise becomes that “your people” are the only bright ones that can figure out and exploit the weakness. Such presumptions are always wrong, even if there is no public proof. A business or criminal organisation who figures it out can make good use of it for their own purposes, provided they keep quiet about it. And companies are, ironically, must better than governments at keeping secrets.

Best Practice

Apps and web sites, and facilitating infrastructures, should live by the following basic rules:

First and foremost, get people on staff or externally who really understand encryption and privacy. Yes, geeks.
Use proper and current encryption and signature mechanisms, without shortcuts. A good algorithm incorrectly utilised is not secure.
Only ask for and store data you really need, nothing else.
Be selective with collecting metadata (including logging, third party site plugins, advertising, etc). It can easily be a privacy intrusion and also have security consequences.
Only retain data as per your actual needs (and legal requirements), no longer.
When acting as an intermediary, design for end-to-end encryption with servers merely passing on the encrypted data.

Most of these aspects interact. For example: if you just pass on encrypted data, but meanwhile collect and store an excess of metadata, you’re not actually delivering a secure environment. On the other hand, by not having data, or keys, or metadata, you’ll be neither a target for criminals, nor have anything to hand over to a government.

See also our earlier post on Keeping Data Secure.

But what about criminals?

Would you, when following these guidelines, enable criminals? Hardly. The proper technology is available anyway, and criminal elements who are not technically capable can hire that knowledge and skill to assist them. Fact is, some smart people “go to the dark side” (for reasons of money, ego, or whatever else). You don’t have to presume that your service or app is the one thing that enables these people. It’s just not. Which is another reason why these government “initiatives” are folly: they’ll fail at their intended objective, while at the same time impairing and limiting general use of essential security mechanisms. Governments themselves could do much better by hiring and listening to people who understand these matters.

Posted on July 18, 2018July 18, 2018 by Arjen Lentz

Posted on July 4, 2018July 6, 2018 by Arjen Lentz

A reminder for Symantec certificate users

icon security low Using Chrome or Chromium? Browse to PayPal, right-click while on the page, select Inspect, and click on the Console tab. Bit of an exercise, but it’ll let you see the following notice:

The SSL certificate used to load resources from https://www.paypal.com will be distrusted in M70.
Once distrusted, users will be prevented from loading these resources.
See https://g.co/chrome/symantecpkicerts for more information.

Google hasn’t really been hiding this, they have been publicly talking about it since last year.

“Symantec’s PKI business, which operates a series of Certificate Authorities under various brand names, including Thawte, VeriSign, Equifax, GeoTrust, and RapidSSL, had issued numerous certificates that did not comply with the industry-developed CA/Browser Forum Baseline Requirements.”

The M70 release of Chrome will be in beta from September, and GA in October 2018. Yes, that’s quite soon!

All of this would be fine, if sites and companies would get a move on and deploy new certificates that don’t originate from this problematic source. One would expect PayPal to have done this months ago, but apparently not. It’s important to be aware of this, because come October it’ll be much more than a little inconvenience particularly for e-commerce sites: online payments via PayPal will start failing – unless and until PayPal and others update their certificates.

Posted on July 4, 2018July 6, 2018 by Arjen Lentz

Posted on June 29, 2018 by Arjen Lentz

EFF STARTTLS Everywhere project: safer hops for email

Safe and secure online infrastructure is a broad topic, covering databases, privacy, web applications, and much more, and over the years we’ve specifically addressed many of these issues with information and recommendations.

The Electronic Frontier Foundation (EFF) announced the launch of STARTTLS Everywhere, their initiative to improve the security of the email ecosystem. Thanks to previous EFF efforts like Let’s Encrypt (that we’ve written about earlier on the Open Query blog), and the Certbot tool, as well as help from the major web browsers, there have been significant wins in encrypting the web. Now EFF wants to do for email what they’ve done for web browsing: make it simple and easy for everyone to help ensure their communications aren’t vulnerable to mass surveillance.

STARTTLS is an addition to SMTP, which allows one email server to say to the other, “I want to deliver this email to you over an encrypted communications channel.” The recipient email server can then say “Sure! Let’s negotiate an encrypted communications channel.” The two servers then set up the channel and the email is delivered securely, so that anybody listening in on their traffic only sees encrypted data. In other words, network observers gobbling up worldwide information from Internet backbone access points (like the NSA or other governments) won’t be able to see the contents of messages while they’re in transit, and will need to use more targeted, low-volume methods.

STARTTLS Everywhere provides software that a sysadmin can run on an email server to automatically get a valid certificate from Let’s Encrypt. This software can also configure their email server software so that it uses STARTTLS, and presents the valid certificate to other email servers. Finally, STARTTLS Everywhere includes a “preload list” of email servers that have promised to support STARTTLS, which can help detect downgrade attacks.

The net result: more secure email, and less mass surveillance.

This article is based on the announcement in EFFector, EFF’s newsletter.

Posted on June 29, 2018 by Arjen Lentz

Posted on June 25, 2018June 25, 2018 by Arjen Lentz

Using FastCGI to separate web frontend from application space

server rack FastCGI has many advantages, and it’s our preferred interface when it’s available for the required script language (such as PHP). However, we generally see environments where the php-fpm processes (for instance) are run on the same system as the web server, even though that’s not necessary. In FastCGI space, the web server (say nginx) passes a request through a socket or TCP/IP address:port, and the application delivers back an response (web page, web service JSON result, etc). Obviously sockets only work locally, but the port can be on another machine.

So the question is, how do you arrange your virtual server rack? While splitting things out like that tends to add a few ms of latency, the advantages tend to justify this approach:

Not running PHP on your web-server, and only having connectivity web-server -> application, is good for security:
- should the web server get compromised, it can still only make application requests, not reconfigure the application server in any way;
- should the application get compromised, it will not be able to gain control over the web-server as there is no connectivity path in that direction.
nginx is exceedingly good at “smart proxy”-style tasks, and that’s essentially already what it’s doing with FastCGI anyway;
- you could have multiple application servers, rather than just one. Optionally scaled dynamically as traffic needs change;
- this effectively turns nginx into a kind of load-balancer for the application server back-ends; the way you specify this is through
  1. a single name that resolves to multiple address which are then used in a round-robin fashion (not necessarily optimal as some requests may take longer than others), or
  2. a server group as provided by the ngx_http_upstream_module module.
- nginx can detect when a back-end is unresponsive, and connect to an alternative;
- thinking back to the fact that some requests take longer than others, that’s usually URL-specific. That is, it’s relevant for queries in certain areas (such as reporting, or drill-down type searches) of a web interface. Now, you can tell nginx to use a different FastCGI destination for those, and thus separate (insulate, really) the handling of that application traffic from the rest of the application.
This all fits very neatly with containers, should your infrastructure use those or if you’re considering moving towards containers and micro-services.

The possibilities are quite extensive, but naturally the details and available options will depend on your needs and what you already have in place. We regularly help our clients with such questions. Solutions Architecture, both for evolving existing environments as well as for greenfield projects.

Posted on June 25, 2018June 25, 2018 by Arjen Lentz

Posted on June 4, 2018 by Arjen Lentz

GitHub acquired by Microsoft

Microsoft has just acquired GitHub for $7.5bn. Good or bad?

Injected VC capital was $350m, so ROI for the VCs = 21.4x = very happy VCs.

Microsoft has done excellent work on OSS software in recent years, including on the Linux kernel, PHP, and many others. Just like Oracle continues to put very good effort into MySQL after the Sun Microsystems acquisition many years ago.

But Microsoft is not an Open Source software company. The open source development model is not something they have built into their business “DNA” – processes (actually many companies that only do OSS haven’t got that either). So why GitHub? Combine it with LinkedIn (acquired by Microsoft earlier), and you have developers’ resumes. That’s valuable. It’s a strategically smart move, for Microsoft.

Will GitHub users benefit, and if so, how?

Well, I expect there’ll be more hoovering of “useful” (meta)data by a corporation, which some LinkedIn users will find handy, but I think it’s mainly beneficial to Microsoft rather than users, and this type of gathering and combining data is fundamentally incompatible with basic privacy. It will bite, at some point down the line. It always does.

Fundamentally, GitHub and its use is self-contradictory. Git explicitly enables distributed source code control and truly distributed development, whereas GitHub is very much centralised. Don’t just walk away to something else now, that won’t address the actual problem. Solving it properly will include having bug tracking as part of a repository, and by design not relying on a single central location, or company. The developer community (and companies) must resolve this external dependency.

Posted on June 4, 2018 by Arjen Lentz

Posted on May 22, 2018May 22, 2018 by Arjen Lentz

How not to respect your users’ privacy

Privacy You just run the usual online frameworks, with their extensive plugin range, CDN, Google Analytics, NewRelic, Twitter, Facebook and LinkedIn widgets, and the rest. Then, you display a notice to your users that your site uses cookies and passes some data to third parties (such as Google Analytics and NewRelic) “to enhance the user experience”.

There. Easy, right? You probably didn’t need to change anything at all. Most companies, sites and applications do this. Now tell me: given that you probably agree with at least some of the above, how come you display a notice to your users explaining how you respect their privacy? It can’t both be true.

So yes, this was a test. And most of us fail, including us. Why is this?

Are you asking for and storing more data than you actually require for delivering the product or service that you provide? You can probably only test this by working out the minimum data requirements, questioning each item, and then comparing that list with what you currently actually collect. There’s likely to be a (large) discrepancy.
Are you using multiple analytics and trackers? Why? It does in fact affect the user experience of your site, both in terms of speed as well as privacy. And you probably don’t actually use all that data. So think about what you actually use, and get rid of the rest. That’s a good exercise and an excellent step.
Does your site deliver pixel images for Facebook and others? If so, why?
Does your site show a “site seal” advertising your SSL certificate’s vendor? If so, why?
Does your site set one or more cookies for every user, rather than only logged-in users? If so, why?
Most CMS and frameworks actually make it difficult to not flood users with cookies and third-party tracking. They have become the new bloat. Example: you use a component that includes a piece of javascript or css off a vendor-provided CDN. Very convenient, but you’ve just provided site-usage data to that vendor as well as your users’ IP address.
Respecting privacy is not “business as usual” + a notice. It’s just not.

So, privacy is actually really hard, and for a large part because our tools make it so. They make it so not for your users’ convenience, or even your convenience, but for the vendors of said tools/components. You get some benefit, which in turn could benefit your users, but I think it’s worthwhile to really review what’s actually necessary and what’s not.

A marketing or sales person might easily say “more data is better”, but is it, really? It affects site speed and user experience. And unless you’ve got your analytics tools really well organised, you’re actually going to find that all that extra data is overhead you don’t need in your company. If you just collect and use what you really need, you’ll do well. Additionally, it’ll enable you to tell your users/clients honestly about what you do and why, rather than deliver a generic fudge-text as described in the first paragraph of this post.

A few quick hints to check your users’ privacy experience, without relying on third-party sites.

Install EFF’s Privacy Badger plugin. It uses heuristics (rather than a fixed list) to identify suspected trackers and deal with them appropriately (allow, block cookies, block completely). Privacy Badger provides you with an icon on the right of your location bar, showing a number indicating how many trackers the current page has. If you click on the icon, you can see details and adjust. And as a site-owner, you’ll want to adjust the site it rather than badger!
If you click on the left hand side of your location bar, on the secure icon (because you are already offering https, right?), you can also see details on cookies: both how many and to which domains. If you see any domains which are not yours, they’re caused by components (images, javascript, css) on your page that retrieve bits from elsewhere. Prepare to be shocked.
To see in more detail what bits an individual page uses, you can right-click on a page and select “Inspect” then go to the “Sources” tab. Again, prepare to be shocked.

Use that shock well, to genuinely improve privacy – and thereby respect your users.

Aside from the ethics, I expect that these indicates (cookies, third-party resource requests, trackers, etc) will get used to rank sites and identify bad players. So there’ll be a business benefit in being ahead of this predictable trend. And again, doing a clean-up will also make your site faster, as well as easier to use.

Posted on May 22, 2018May 22, 2018 by Arjen Lentz

Posted on May 7, 2018May 7, 2018 by Arjen Lentz

SSL certificates – not optional

We made a stuff-up over the weekend. Historically we have different SSL certificates for different services in our realm, and last Saturday the certificate for the main website expired. Of course we noticed at that point, but we should have had an internal notification earlier and somehow that had failed. Fixed, but it would have been much better if the front-end hadn’t been temporarily inaccessible. It was, because of HTTPS Strict Transport Security (HSTS). Any browser that had previously talked with our website (rightfully) refuses to talk to it if it doesn’t see a valid certificate. Going back to non-HTTPS is not an option either, for this reason as well as others mentioned below. However, we do have different certificates for different services, so it was only our frontend website that was affected (bad enough), the various other services for our clients fortunately were unaffected.

Going forward though, keeping up-to-date with the certificates and automatically renewing them is much easier now than it used to be. Let’s Encrypt® has been around for a while, but a few months ago they started supporting wildcard certificates. With non-wildcard, one of the ways Let’s Encrypt can verify that you own the site is by doing a challenge/response on port 443 of the website address; Certbot will temporarily listen there and give the appropriate answers. For a wildcard, that doesn’t work, because you can have an infinite number of subdomains and Let’s Encrypt needs to be certain that you actually control these. So in v2 of the API there’s support for DNS based validation. Through special TXT records for which Let’s Encrypt provides the token on every domain request, you can prove that you are in control of the DNS servers for the domain. That’s pretty slick!

There are integrations for many hosting providers as well as Cloudflare, which through a secure mechanism allow Let’s Encrypt to update those records in your DNS, and then validate. As Let’s Encrypt certificates are only valid for 3 months, this is important for the automation. If you run your own DNS servers, you can still automate the DNS based verification process by setting up RFC-2136 remote updates in your DNS server (Bind9 can do it, it’s been around for many years – that said, being an older system, it can be rather finicky to set up).

Let’s Encrypt’s Certbot can take care of the entire updating process, including reloading your webserver’s or reverse proxy’s config. Do make sure you use a recent Certbot, as all the appropriate support is quite recent. We had to grab Certbot from Github the first time as the Debian release hadn’t updated quite far enough yet – it has now.

We think that the EFF has done brilliantly with setting up Let’s Encrypt, and helping everyone move towards a fully encrypted web. Particularly with the cost-factor removed, there’s no reason to not offer HTTPS to users – whether for a website, or an API. Respecting one’s users and their online privacy is really a must-do. Companies that don’t, increasingly look bad. See it this way: going fully HTTPS is an opportunity to make a good first impression. And did you know it also affects your ranking in search engines? Now there’s a good incentive for the PHB…

Do you need an EV certificate? Probably not, as they actually have very little meaning – and even less so with various CAs having distinctly flawed verification processes.

Do you need a site seal from your CA (Certificate Authority)? Really not. It just advertises the CA, and actually enables them to track your users – if you get the seal served from the CA’s URL, that’s every single user. Not cool. So just don’t.

Final hint: if you do get a wildcard certificate from Let’s Encrypt, make sure you include both the wildcard and non-wildcard in the certificate domain names, otherwise it won’t work. So, *.example.com as well as example.com. You may not have noticed that your wildcard certificate always contains these, as many CAs automatically include the appropriate extra item. Certbot just does exactly what you tell it to, so it’s something to be aware of.

Posted on May 7, 2018May 7, 2018 by Arjen Lentz

Posted on April 24, 2018 by Arjen Lentz

PURGE BINARY LOGS with a relative time

Sometimes you want to reduce disk usage on certain servers by adjusting the time that binary logs are kept. Also, some installations of MySQL and MariaDB have suffered from a very-hard-to-catch bug where the binary logs end up not getting automatically expired (basically, the expire_logs_days option doesn’t always work effectively).

A workaround can be scripted, but typically the script would specify the exact datetime to which the logs need to be kept. The reference manual and examples all do this too, quite explicitly, noting:

The datetime expression is in the format ‘YYYY-MM-DD hh:mm:ss’.

However, the actual command syntax is phrased as follows:

PURGE { BINARY | MASTER } LOGS { TO ‘log_name’ | BEFORE datetime_expr }

and that indicates much more flexibility in the parser: “datetime_expr” means that you can put in an arbitrary temporal expression!

So let’s test that, with a functional equivalent of expire_logs_days=14:

FLUSH BINARY LOGS;
PURGE BINARY LOGS BEFORE (NOW() – INTERVAL 14 DAY);

And yep, that works (and the extra parenthesis around the expression are not required, I just did that to show clearly what the expression is).

Now, I’m not the first person to use this construct, there are several posts online from recent years that use an expression with PURGE BINARY LOGS. I’m not sure whether allowing datetime_expr is a modification that was made in the parser at some point, or whether it was always possible. Fact is, the reference manual text (MariaDB as well as MySQL) only provide examples with an absolute ISO datetime: ‘YYYY-MM-DD HH:MM:SS’.

Posted on April 24, 2018 by Arjen Lentz

Posted on February 27, 2018 by Arjen Lentz

Keeping Data Secure

a safe We often get asked about data security (how to keep things safe) and local regulations and certifications regarding same. Our general thoughts on this are as follows

Government regulations tend to end up becoming part of the risk/cost/benefit equations in a business, which is not particularly comforting for customers.
- Example: some years ago an Australian bank had a mail server mis-configured to allow relaying (i.e., people could send phishing emails pretending to legitimately originate from that bank). A caring tech citizen reported the issue to the bank. Somehow, it ended up with the legal department rather than a system/network administrator. The legal eagles decided that the risk to the organisation was fairly low, and didn’t forward it for action at that time. Mind that the network admin would’ve been able to fix up the configuration within minutes.
Appreciate that certifications tend to mainly give you a label to wave in front of a business partner requiring it, they do not make your business more secure.
- Data leaves footprints. For instance, some people use a separate email address for each website they interact with. Thus, when a list of email addresses leaks, saying “it didn’t come from us” won’t hold. That’s only a simple example, but it illustrates the point. Blatant denial was never a good policy, but these days it’ll backfire even faster.
Recent legislation around mandatory data retention only makes things worse, as
- companies tend to already store much more detail about their clients and web visitors than is warranted, and
- storing more activity data for longer just increases the already enlarged footprint.

business advice person So what do we recommend?

Working within the current legal requirements, we still advise to keeping as little data as possible.
- More data does not intrinsically mean more value – while it’s cheap and easy to gather and store more data, if you’re actually actually more strategic about what you collect and store, you’ll find there’s much more value in that.
Fundamentally, data that you don’t have can’t be leaked/stolen/accessed through you. That’s obvious, but still worth noting.
- Our most critical example of this is credit card details. You do not want to store credit card details, ever. Not for any perceived reason. There are sensible alternatives using tokens provided by your credit card gateway, so that clients’ credit cards never touch your system. We wrote about this (again) in our post “Your Ecommerce Site and Credit Cards” last year.
  Why? It’s fairly easy to work out from a site’s frontend behaviour whether it stores credit cards locally, and if it does, you’re much more of a target. Credit card details provide instant anonymous access to financial resources. Respect your clients.
More secure online architecture.
- We’ll do a separate post on this.
If you have a data breach, be sensible and honest about it.
- If your organisation operates in Australia and “with an annual turnover of $3 million or more, credit reporting bodies, health service providers, and TFN recipients, among others.“, the Notifiable Data Breaches (part of the Australian Privacy Act) scheme applies, which came in to force this February 2018, applies to you.

We’re happy to advise and assist. Ideally, before trouble occurs. For any online system, that’s a matter of when, not if.
(And, of course, we’re not lawyers. We’re techies. You may need both, but never confuse the two!)

Posted on February 27, 2018 by Arjen Lentz

Share this:

Learning Lessons

Best Practice

But what about criminals?

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this: