Tag Archives: database

Server Ownership Legalities

As I reported via Twitter late last week, we encountered an issue that got some of our mail delivery delayed by about a day and a half. I’ll explain more about what happened as I believe in openness on these matters, and also the experience has educational content for others.

Our mail server doesn’t have direct external interaction, it’s shielded by two relays that handle both the inbound MX and the outbound queue. This setup works remarkably well in terms of exposure to spam and other malicious activity. As previously discussed, it appears that it’s more difficult to make mail server infra more resilient without expending lots more time/effort and infrastructure expenditure. Just because of the way the common tools for mail delivery and imap are built, having two or more of each in a semi-active setup gets quite complex. Complexity is in itself a risk so it has to be considered in relation to the costs and risks of the alternatives.

When our mail server becomes unavailable, incoming mail is queued, and we have backups so no mail is actually lost. The cost is the time and effort involved in getting a full replacement server up and running from a backup. That can be optimised/prepared to a point, but mail is still a lot more data than most other web infrastructure so shuffling that data around just takes a while. Some outbound queues from our online services (for instance our client services system Redmine) goes straight to the relays so there is less impact there. Apart from backups elsewhere, have redundancy for the mailserver: an identical instance on a server in the same DC (those servers are our own).

So what happened last week? Our servers resided in a rack which was leased from the DC by another company through which we “sublet” the rack space, connection and bandwidth. This is a common scenario, as small businesses don’t generally need a full rack and datacentres prefer dealing with fewer/bigger clients and set their pricing accordingly. The intermediate company became unavailable which put our servers in a temporary legal limbo. The DC only gives access to the primary lessor of the rack, so us asking for access to move our servers wasn’t straightforward. Of course we had documentation to back up our assertion as to which equipment was ours, but as you can imagine that legal avenue takes longer to resolve – fortunately the owner of the intermediate company communicated well with the operations manager at the DC and that’s how we were able to retrieve our gear relatively quickly.

We’re still in the same DC, but are now a direct client of the DC in a shared rack. That may appear odd in the context of what I wrote before, but since we first moved there several years ago the DC has improved their infrastructure management to the point where servicing smaller clients is not a resource drain and thus they have sensible plans available. That’s brilliant given the market, but it’s actually quite unusual – commonly companies aim for bigger clients rather than recognising an opportunity to server small clients.

While this was going on we were of course working on a separate replacement mailserver, built from the backups. Since normally we’d have a replacement server already set up, the “build from scratch using backups” is a slower path. As it turned out, we got our servers back online around the same time we had our replacement ready, and for various reasons it was easier to just use the original servers at that point.

From this story you can work out several useful lessons, remembering that it’s always a trade-off. At some point the cost of being able to mitigate a particular scenario is so high that it’s not worthwhile. You just have to plan for several most common possibilities, with a slower recovery from backup as the last resort.

There’s also another piece of information which is highly relevant for Australian businesses, and that’s the Australian Personal Property Securities Register. Legislation for this system was enacted in 2009, the scheme is only since January 2012 and there’s a two-year transitional period. Remember how “posession is 9/10ths of the law” ? Well, if you ignore PPS it’s now 10/10ths. It is the primary and only register and reference for ownership of items (and data!) that are in care of another legal entity. So we own some servers, that reside in a rack of another company in a DC. We register ourselves, and then our servers (short description and serial numbers and such) and associated data content with PPS, against both the intermediate company (which had legal charge over the rack they reside in) and the hosting company (where the items physically reside). This way, we have a claim that indeed the stuff is ours, but also since the PPS is the only register we have ensure that noone else (inadvertently¬†or even maliciously) claims to own something that’s actually ours. If you have a similar situation (and remember that data is as important as physical items!) you want to register it with PPS. The registration process is somewhat convoluted, but it is free – searches cost. Remember IANAL (I am not a lawyer) so do your research and get appropriate legal advice. If you’re not in Australia, other similar legislation may apply and you’ll want to check to make sure you’re safe.

 

Open Query training at Drupal DownUnder 2012

DrupalDownUnder 2012 will be held in Melbourne Australia 13-15 January. A great event, I’ve been to several of its predecessors. People there don’t care an awful lot for databases, but they do realise that sometimes it’s important to either learn more about it or talk to someone specialised in that field. And when discussing general infrastructure, resilience is quite relevant. Clients want a site to remain up, but keep costs low.

I will teach pre-conference training sessions on the Friday at DDU:

The material is made specific to Drupal developers and users. The query design skills, for instance, will help you with module development and designing Drupal Views. The two half-days can also be booked as a MySQL Training Pack for $395.

On Saturday afternoon in the main conference, I have a session Scaling out your Drupal and Database Infrastructure, Affordably covering the topics of resilience, ease of maintenance, and scaling.

I’m honoured to have been selected to do these sessions, I know there were plenty of submissions from excellent speakers. As with all Drupal conferences, attendees also vote on which submissions they would like to see.

After DDU I’m travelling on to Ballarat for LinuxConfAU 2012, where I’m not speaking in the main program this year, but will have sessions in the “High Availability and Storage” and “Business of Open Source” miniconfs. I’ll do another post on the former – the latter is not related to Open Query.

SQL Locking and Transactions – OSDC 2011 video

This recent session at OSDC 2011 Canberra is based on part of an Open Query training day, and (due to time constraints) without much of the usual interactivity, exercises and further MySQL specific detail. People liked it anyway, which is nice! The info as presented is not MySQL specific, it provides general insight in how databases implement concurrency and what trade-offs they make.

See http://2011.osdc.com.au/SQLL for the talk abstract.

LexisNexis open sources code for Hadoop alternative

On Password Strength

XKCD (as usual) makes a very good point – this time about password strength, and I reckon it’s something app developers need to consider urgently. Geeks can debate the exact amount of entropy, but that’s not really the issue: insisting on mixed upper/lower and/or non-alpha and/or numerical components to a user password does not really improve security, and definitely makes life more difficult for users.

So basically, the functions that do a “is this a strong password” should seriously reconsider their approach, particularly if they’re used to have the app decide whether to accept the password as “good enough” at all.

Update: Jeff Preshing has written an xkcd password generator. Users probably should choose their own four words, but it’s a nice example and a similar method could be used by an app to give “password suggestions” that are still safe.

The actual range and storage size of an INT

What’s the difference between INT(2) and INT(20) ? Not a lot. It’s about output formatting, which you’ll never encounter when talking with the server through an API (like you do from most app languages).

The confusion stems from the fact that with CHAR(n) and VARCHAR(n), the (n) signifies the length or maximum length of that field. But for INT, the range and storage size is specified using different data types: TINYINT, SMALLINT, MEDIUMINT, INT (aka INTEGER), BIGINT.

At Open Query we tend to pick on things like INT(2) when reviewing a client’s schema, because chances are that the developers/DBAs are working under a mistaken assumption and this could cause trouble somewhere – even if not in the exact spot where we pick on it. So it’s a case of pattern recognition.

A very practical example of this comes from a client I worked with last week. I first spotted some harmless ones, we talked about it, and then we hit the jackpot: INT(22) or something, which in fact was storing a unix timestamp converted to int by the application, for the purpose of, wait for this, user’s birth date. There’s a number of things wrong with this, and the result is something that doesn’t work properly.

Currently, the unix epoc/timestamp when stored in binary is a 32 bit unsigned integer, with a range from 1970-01-01 to somewhere in 2037. Note the unsigned qualifier, otherwise it already wraps around 2004.

  • if using signed, you’d currently only find out with users younger than 7 or so. You may be “lucky” to not have any, but kids are tech savvy so websites and systems in general may well have entries with kids younger than that.
  • using a timestamp for date-of-birth tells me that the developers are young ;-) well that’s relative, but in this: younger than 40. I was born in 1969, so I am very aware that it’s impossible to represent my birthdate in a unix timestamp! What dates do you test with? Your own, and people around you. ‘nuf said.
  • finally, INT(22) is still an INT, which for MySQL means 32 bits (4 bytes) and it happened to be signed also.

So, all in all, this wasn’t going to work. Exactly what would fail where would be highly app code (and date) dependent, but you can tell it needs a quick redesign anyway.

I actually suggested checking the requirements whether having just a year would suffice for the intended use (can be stored in a YEAR(4) field), this reduces the amount of personal data stored and thus removes privacy concerns. Otherwise, a DATE field which can optionally be allowed to not have a day-of-month (i.e. only ask for year/month) as that again can be sufficient for the intended purpose.

Storage Miniconf Deadline Extended!

The linux.conf.au organisers have given all miniconfs an additional few weeks to spruik for more proposal submissions, huzzah!

So if you didn’t submit a proposal because you weren’t sure whether you’d be able to attend LCA2010, you now have until October 23 to convince your boss to send you and get your proposal in.

Storage miniconf at linux.conf.au 2010

Since you were going to linux.conf.au 2010 in Wellington, NZ anyway in January of next year, you should submit a proposal to speak at the data storage and retrieval miniconf.

If you have something to say about storage hardware, file systems, raid, lvm, databases or anything else linux or open source and storage related, please submit early and submit often!

The call for proposals is open until September 28.