Your E-Commerce Site and Credit Cards

Sites that deal with credit cards can have some sloppy practices.  Not through malicious intent, but sloppy nevertheless, and it should be addressed.  Fraud and identity theft are at stake, and any self-respecting site will want to be seen to respect its clients!

First, a real-world story: read “Using expired credit cards”.

The key lesson there is that simply abiding by what payment gateways, banks and other credit card providers require does not make your payment system good.  While we can hope those organisations also clean up their processes a bit, you can meanwhile make sure that you do the right thing by your clients regardless.

First of all, ensure that all pages and all page items (CSS, images, scripts, form submit destinations, etc.) as well as payment gateway communications go over HTTPS.  Having some parts of payment/checkout/profile pages served over plain HTTP triggers mixed-content warnings in browsers, and it looks very sloppy indeed. Overall, you are encouraged to just run your entire site over HTTPS.  And if you use any external sources for scripts, images or other content, those need to be checked too, as they can undermine your site’s security on the browser end.
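
A rough way to spot stragglers: fetch a page over HTTPS and flag embedded resources still referenced via plain http://.  A minimal sketch in Python (illustrative only – the URL is a placeholder, and a proper audit should also cover stylesheet links, frames and CSS url() references):

```python
import re
import urllib.request

def find_insecure_resources(url: str) -> list[str]:
    """Flag src/action attributes that still point at plain http://.
    Note: <a href> links are not mixed content, so they're ignored here."""
    html = urllib.request.urlopen(url).read().decode("utf-8", "replace")
    return re.findall(r'(?:src|action)\s*=\s*["\'](http://[^"\']+)', html)

for resource in find_insecure_resources("https://shop.example.com/checkout"):
    print("insecure reference:", resource)
```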

For the credit card processing, here are a few tips for what you can do from your end:

  • DO NOT store credit card details.  Good payment gateways work with a token system, so you can handle recurring payments and clients can choose to have their card kept on file, but you don’t hold the card data yourself.  After all, data you don’t have cannot be leaked or stolen.
  • DO NOT check credit card number validity before submitting to the payment gateway, i.e. don’t apply the Luhn check (see the sketch after this list).  We wrote about this over a decade ago, but it’s still relevant: Luhn algorithm (credit card number check).  In a nutshell, if you do pre-checks, the payment gateway sees less data and might miss fraud attempts.
  • Check that your payment gateway requires the CVV field, and checks it.  If it doesn’t do this, the gateway will be bad at fraud prevention: have them fix it, or move to another provider.
  • Check that your payment gateway does not allow use of expired cards, not even for recurring payments using cards-on-file.  This is a bit more difficult to check (since you don’t want to be storing credit card details locally) and you may only find out over time, but do make the effort.  It is again an issue that can otherwise harm your clients.
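
For reference, here is what the Luhn check from the second point looks like – a minimal Python sketch, shown to illustrate the algorithm, not to encourage wiring it into your checkout:

```python
def luhn_valid(number: str) -> bool:
    """Luhn checksum: from the rightmost digit, double every second
    digit (subtracting 9 if the result exceeds 9) and require the
    total to be divisible by 10."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d = d * 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

# The classic Visa test number passes; so do fraudsters' generated
# numbers - which is exactly the raw signal a gateway wants to see.
print(luhn_valid("4111111111111111"))  # True
```

A number passing this check says nothing about the card being real, funded or unexpired – which is why filtering on it before submission only starves the gateway of fraud signals.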

If you have positive confirmation that your payment gateway does the right thing, please let us know!  It will help others.  Thanks.

On “The Github threat” and distributed development

“The Github threat” is a great write-up by Carl Chenet, reviewing the problems created by this centralised system.

GitHub is very convenient, but that’s not really the point…

The greatest irony in the setup is that key advantages of using distributed revision control are undermined by using a centralised repository for bugs and other key aspects of the development process.

It’s most unfortunate, but indeed ubiquity comes with lots of side effects. People join without considering the implications, and many of those joining will not have the background or information to even be able to consider them.

For an example of a distributed version control system that has bug tracking (and other aspects) built in, see Fossil, by the author of SQLite, Richard Hipp.
The approach has specific merits that we should consider, and it could “easily” be applied to Git as well.

Many of the GitHub alternatives are themselves centralised – yes, you can run your own instance, but they still split the code from the bugs and other info. Why?

There are documented cases of GitHub projects (ref. Gadgetbridge) being blocked due to DMCA take-down notices.  Imagine your company relying on a centralised service and that service being (even temporarily) unavailable to your employees.  How well will your company cope?  Yes, with Git you can share changes in other ways, but your business processes will need to adjust, and that can be quite hard.  How will the equivalent of pull requests be managed, and where is your bug tracking?

Finally, it should not be necessary to have a centralised user-base at all. It would be good to have/use a distributed notification system (Mastodon might qualify) for distributed repos, using signed messages. That way even “politically endangered” projects would be able to exist effectively without an intrinsic risk of being taken out. Secondary hosts can automatically clone and broadcast availability.

As part-fixes, also see options like this idea for Gitlab (and others): Implement cross-server (federated) merge requests.

SSL and trust

We can all agree on this: security is important, as is trust.

Does a pretty seal from an SSL certificate provider create trust? Doubtful. The provider’s own claims aside, it’s marketing fluff.
Oh, it used to provide them with some extra Google juice (one more link to them), but Google’s algorithms don’t care for that any more. Good!

What Google (and others) do care about is security: all sites should use SSL. For everything.
Expensive? Not really. Let’s Encrypt is free, and renewals can be fully automated (scripted). Quite shiny really.
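
Automated renewal does occasionally fail silently, so monitor expiry as well. A minimal sketch using only Python’s standard library (hostname and threshold are placeholders):

```python
import socket
import ssl
import time

def days_until_expiry(host: str, port: int = 443) -> float:
    """Connect, fetch the server certificate, return days until 'notAfter'."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    return (ssl.cert_time_to_seconds(cert["notAfter"]) - time.time()) / 86400

# Let's Encrypt certificates live ~90 days; warn well before that runs out.
if days_until_expiry("example.com") < 14:
    print("certificate expires in under two weeks - check renewal automation!")
```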

Let’s Encrypt only does domain validation, so a user sees the green lock and a “Secure” indicator. If you want company validation, you need to use another provider and pay their fees. Do you need that? That’s up to you; we reckon that in many (if not most) cases, you don’t.

It depends on whether your clients are informed enough to care about SSL at all, and then whether they know (and care) enough to discern which indicators actually have real security meaning and which are just fluff. Tech geeks aside, few people do. We’re not saying that’s brilliant, but it is reality. Do people care for pretty seals, and do we want to feed that realm of misinformation and false security? We hope you don’t go down that path: if we really care about security, this just distracts without solving the real issues.

Doing things you don’t technically believe in won’t create real trust, as it’s not genuine. And whatever marketing/sales types tell you, you can’t fake genuine; increasingly, people see right through it. Which is awesome! If your users know enough and care to verify that your site is really owned by your company, then yes, a certificate with company validation makes sense.

Actionable task

If your publicly facing web or API servers aren’t using SSL for everything yet, you’ll want to spend some time fixing this. Real security aside, it affects your search engine ranking. If web pages pull in logos, JavaScript or even stylesheets from third parties, make sure those also use HTTPS, as otherwise browsers produce “mixed content” warnings.

TEXT and VARCHAR inefficiencies in your db schema

The TEXT and VARCHAR definitions in many db schemas are based on old information – that is, they appear to presume restrictions and behaviour from MySQL versions of long ago.  This has consequences for performance.  To us, use of, for instance, VARCHAR(255) is a key indicator of this.  Yep, an anti-pattern.

VARCHAR

In MySQL 4.0, a VARCHAR was restricted to a maximum length of 255.  MySQL 4.1 introduced character sets such as UTF-8, and since MySQL 5.0 a VARCHAR can be up to 65,535 (64K-1) bytes in length.  Thus, any occurrence of VARCHAR(255) indicates some old-style logic that needs to be reviewed.

Why not just set the maximum length possible? Well…

A VARCHAR is subject to the character set it’s stored in; for UTF-8 this means up to 3 bytes per character, or 4 with utf8mb4.  So if one specifies VARCHAR(50) CHARSET utf8mb4, the actual byte length of the stored string can be up to 200 bytes.  In the stored row format, MySQL uses 1 byte for the VARCHAR length when the column’s maximum byte length fits within 255, and 2 bytes otherwise.  So specifying VARCHAR(255) with a multi-byte character set unnecessarily forces the server to use a 2-byte length in the stored row.
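
A quick illustration of the multi-byte point (Python here, but any language will do):

```python
# One character is not one byte: UTF-8 uses 1 to 4 bytes per character.
# This is why VARCHAR(50) CHARSET utf8mb4 may need up to 200 bytes,
# pushing the length indicator from 1 byte to 2.
for ch in ("a", "é", "€", "🐬"):
    print(ch, "->", len(ch.encode("utf-8")), "byte(s)")
# a -> 1, é -> 2, € -> 3, 🐬 -> 4
```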

This may be viewed as nitpicking; however, storage efficiency affects the number of rows that fit on a data page, and thus the amount of I/O required to manage a certain number of rows.  It all adds up, so these little unnecessary inefficiencies will cost you – particularly for larger sites.

VARCHAR best practice

Best practice is to set VARCHAR to the maximum necessary, not the maximum possible – otherwise, as per the above, the maximum possible would be about 16,000 characters for utf8mb4, not 255 – and nobody would propose setting it to 16000, would they?  Yet it’s not much different: in stored row space, a VARCHAR(255) requires a 2-byte length indicator just as a VARCHAR(16000) would.

So please review VARCHAR columns and set their definition to the maximum actually necessary; this is very unlikely to come out as 255.  If 255, why not 300?  Or rather 200?  Or 60?  Setting a specific number indicates that thought and data analysis have gone into the design.  255 looks sloppy.
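
Finding the candidates is easy to script.  A minimal sketch (assuming the PyMySQL driver and placeholder credentials) that lists every VARCHAR(255) column for review:

```python
import pymysql  # assumption: PyMySQL driver, `pip install pymysql`

QUERY = """
    SELECT table_schema, table_name, column_name
      FROM information_schema.columns
     WHERE data_type = 'varchar'
       AND character_maximum_length = 255
       AND table_schema NOT IN
           ('mysql', 'information_schema', 'performance_schema', 'sys')
"""

# Placeholder credentials - substitute your own.
conn = pymysql.connect(host="localhost", user="review", password="secret")
try:
    with conn.cursor() as cur:
        cur.execute(QUERY)
        for schema, table, column in cur.fetchall():
            print(f"{schema}.{table}.{column}")
finally:
    conn.close()
```

The same query with data_type IN ('text', 'mediumtext', 'longtext') yields the review list for the TEXT discussion below.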

TEXT

TEXT (and LONGTEXT) columns are handled differently in MySQL/MariaDB.  First, a recap of some facts related to TEXT columns.

The db server often needs to create a temporary table while processing a query.  MEMORY tables cannot contain TEXT-type columns, so the temporary table created will be a disk-based one.  Admittedly it will likely remain in the disk cache and never actually touch a disk, but it still goes through the file I/O functions and thus causes overhead – unnecessarily.  Queries will be slower.

InnoDB can store a TEXT column on a separate page, and only retrieve it when necessary.  (This also means that using SELECT * is needlessly inefficient – it’s almost always better to specify only the columns that are required.  As a bonus, this makes code maintenance easier: you can scan the source code for referenced column names and actually find all relevant code and queries.)

TEXT best practice

A TEXT column can contain up to 64K-1 bytes (4GB-1 for LONGTEXT).  So essentially a TEXT column can store the same amount of data as a VARCHAR column (since MySQL 5.0), and we know that VARCHAR offers us benefits in terms of server behaviour.  Thus, any instance of TEXT should be carefully reviewed, and in general the outcome is to change it to an appropriate VARCHAR.

Using LONGTEXT is ok, if necessary.  But if the data is never going to exceed roughly 16K characters (the utf8mb4 VARCHAR limit), LONGTEXT is not warranted; again, VARCHAR (not TEXT) is the most suitable column type.

Summary

Particularly when combined with the best practice of not using SELECT *, using appropriately defined VARCHAR columns (rather than VARCHAR(255) or TEXT) can have a measurable and even significant performance impact on application environments.

Applications don’t need to care, so the db definition can be altered without any application impact.

It is a worthwhile effort.

Tom Eastman on File Uploads

The awesome Tom Eastman presented a session at PyCon Australia (Melbourne) 2016 entitled

“The dangerous, exquisite art of safely handling user-uploaded files”.

Every web application has an attack surface — the exposed points of interaction where a malicious or mischievous user can commit malice, or mischief (respectively). Possibly nowhere, however, is more vulnerable than places a user is allowed to upload arbitrary files.
The scope for abuse is eye-widening: The contents of the file, the type of the file, the size and encoding of the file, even the *name* of the file can be a potent vector for attacking your system.
The scariest part? Even the best and most secure web-frameworks can’t protect you from all of it.

In this talk, Tom shows you every scary thing he knows about that can be done with a file upload, and how to protect yourself from — hopefully — most of them.

Do watch it and pick up any hints you can.  This is important stuff.
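
As one concrete illustration of the filename point – our sketch, not taken from the talk: never let the client-supplied name near your filesystem; store under a server-generated name and keep only a sanitised label as metadata.  Paths and limits below are placeholders.

```python
import os
import re
import secrets

UPLOAD_DIR = "/var/uploads"      # placeholder: a directory outside the web root
MAX_BYTES = 10 * 1024 * 1024     # placeholder: 10 MB size cap

def store_upload(stream, client_filename: str) -> tuple[str, str]:
    """Save an uploaded stream under a random server-generated name."""
    data = stream.read(MAX_BYTES + 1)
    if len(data) > MAX_BYTES:
        raise ValueError("upload too large")
    # Keep a sanitised copy of the original name for display purposes only.
    safe_label = re.sub(r"[^A-Za-z0-9._-]", "_",
                        os.path.basename(client_filename))[:100]
    # The on-disk name is entirely server-generated: no traversal, no tricks.
    stored_name = secrets.token_hex(16)
    with open(os.path.join(UPLOAD_DIR, stored_name), "wb") as f:
        f.write(data)
    return stored_name, safe_label  # persist both in your database
```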

How do your web applications handle file uploads?

On Open Source and Business Choices

Open Source is a whole-of-process approach to development that can produce high-quality products better tailored to users’ real-world needs.  A key reason for this is the early feedback cycle built into that complete process.

Simply publishing something under an Open Source license (while not applying Open Source development processes) does not yield the same quality and other benefits.  So, not all Open Source is the same.

Publishing the source of a product “later” (for instance, when the monetary benefit has diminished for the company) is meaningless.  In that scenario, there is no “Open Source benefit” to users whatsoever; it’s simply a proprietary product.  There is no opportunity for the client to make custom modifications or improvements, or to ask a third party to work on such matters – nor is there any opportunity for a third party to verify and validate code quality or security.

Open Source is not a marketing gimmick.  Labels such as “Open Source” or “Enterprise”, on their own, have no more positive an outcome than a greasy hamburger labelled “healthy”.  If a company “believes” in Open Source software, it will use the Open Source development model for its software development.

And now we see things like this: “Uproar: MariaDB Corp. veers away from open source” (by Simon Phipps, InfoWorld, August 2016).

So what does it mean when a company publishes some of its software under an open source license, and releases some related products under a proprietary license?  To me, it’s generally a strong indication that the company either doesn’t believe in that model, or doesn’t understand it.  And we’ve seen it before.

It also reminds me of an interaction I had many years ago.  A Marketing VP asked me “How can we leverage our [Open Source] community?”  I answered the only possible way: “One does not ‘leverage’ the community, that’s not how it works.”  Of course that wasn’t the answer the VP wanted to hear, but that doesn’t make it less true.  They saw the community as an asset to use, rather than work with.  People don’t like getting used, and in the Open Source space that’s even more true.

Companies that have turned their back on their earlier Open Source work, and devised some other model to (arguably) make more money, have all discovered that this fundamentally changes their market.  They lose some of their users, customers and supporters, and gain some new, different clients.  It’s a different market.  Whether and how that pans out in terms of commercial success is never certain.  Given that the Open Source development process yields benefits in terms of quality and the features users want, non-OSS products lack (some of) those benefits; to put it bluntly, it’ll be a different product, possibly of lesser quality, and the feature set is likely to differ as well.

Naturally we cannot ascertain code quality directly: we can’t review closed code, bug systems of proprietary software tend to be closed, and changelogs are condensed for marketing purposes.  But as far back as a decade and a half ago, independent studies worked out “lines of code per software flaw”, and the results came out significantly in favour of Open Source software, which has proportionally far fewer bugs.  Bugs also tend to get fixed more quickly in Open Source software.  None of this is new(s); see for instance “Open-source vs. proprietary software bugs: Which get squashed fastest?” (CNET, 2007).

For complete products (libraries are a slightly different beast) with a relatively large market scope, source code being available does not in any way diminish a company’s ability to make money.  Having the core developers, tech writers and support people gives it a significant edge in the open market, and that’s a business asset you can leverage.  You do that by focusing on those aspects in your communications – that’s basic marketing: you draw attention to the positive aspects that make your company/product stand out from the rest.  Clearly, this objective cannot be achieved by force, as you don’t make a (potential) client like or trust you by denying them choice or transparency.

There is one other known option aside from not believing or not understanding, and that’s fear.  But fear is an awkward business driver; it makes for very bad decisions.

MariaDB Corp in part uses the Open Source development model; in part it is an Open Source publisher (in-house work that’s only made available at a later stage of the development process); and now a proprietary product has been added to the mix (actually, new versions of an existing product).  Looking at this, I am rather unclear about what they believe in.  Of course companies can make business choices as they see fit – but they never operate in a vacuum.  In the end it doesn’t matter much what I believe personally; the market will do what it will – historically, it responds in the various ways described above.  We’ll see how it pans out.

Open Query does not recommend (or resell) proprietary tools, as it just doesn’t make sense for us or our clients.  We often do bug fixes and improvements which we contribute upstream – with proprietary tools we can’t do that, and thus they become a hindrance for us and our clients.  On a practical level, we’ve actually never used MaxScale (the product that MariaDB Corp will now sell under different conditions for future versions), and this stems from our experience with its effective predecessor, MySQL Proxy.  Having a complex set of scripted logic in a proxy slows down applications and introduces a rather large extra (single) point of potential failure into the infrastructure.  So, while Simon refers to MaxScale as an essential tool for scalable environments, we know from experience that there are other ways of achieving that objective, and without the downsides.

Rather than promoting a single tool for many wildly different jobs, we utilise a few different tools depending on the needs of the particular client infrastructure.  We still have a couple of (now legacy) MySQL-MMM deployments, but also quite a few Galera clusters, and other setups to suit our clients’ needs.  The key is to not only make the infrastructure convenient for applications to use, but also to not introduce any more single points of failure.  We build resilience into the client’s server infrastructure, without adding significant overhead in either performance or maintenance requirements.

We believe that that’s what clients want, and since potential clients come to us asking for exactly that (and note our approach with relief), we think we’re doing the right thing by our clients.  We’ve used this approach for over 9 years, and we’ll just keep on doing so – our basic approach doesn’t change even when our tools do.  If you’d like to talk with us about helping you with your infra, using our approach and way of working, contact us today!

The Australian Online Census 2016: an Example of How-Not-To

One of the key problems with the 2016 online census was the architecture, but also how the whole thing was organised and who was contracted for the job.

IBM, for the $9.6 million it got paid for the job, built something very clunky. They used Java, which is not bad per se, but the system also required Java on the client (browser) side, which is just daft. The number of systems that either don’t have or can’t run client-side Java is huge, and for the rest you get into version-conflict mayhem. And it’s clunky: a lot of code and heaviness to shuffle around, which is not a great approach to building a scalable site.

If you think of the census form, the total amount of data gathered is not actually that big. It doesn’t require any particularly complicated database or storage setup.
Serving forms to clients is very light work for web servers – and if you then use JavaScript logic to control the flow through the forms, you can run most of the work on the client side, including intermediate local saving for the “just in case”. Then you produce a single submit with confirmation, and a transaction with a number of inserts into the database. The language used on the server end is not that important, as its job is minimal. Most of the content served can be static, and might even be handled through a CDN.
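
To make that concrete: a minimal sketch of the server-side submit, with a hypothetical two-table schema and the stdlib sqlite3 driver standing in for the real database:

```python
import sqlite3

def save_submission(db_path: str, household: dict, persons: list[dict]) -> None:
    """Store one confirmed census submission as a single transaction."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # one transaction: commit on success, rollback on error
            cur = conn.execute(
                "INSERT INTO household (address) VALUES (?)",
                (household["address"],),
            )
            household_id = cur.lastrowid
            conn.executemany(
                "INSERT INTO person (household_id, age, occupation)"
                " VALUES (?, ?, ?)",
                [(household_id, p["age"], p["occupation"]) for p in persons],
            )
    finally:
        conn.close()
```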

The scale of the online census task is quite small relative to many websites. Not only Twitter/Facebook/etc., but many e-commerce sites have a vastly more complicated situation: they have to serve many different pages, many of them dynamic, with lots of writes, shopping carts that get updated in chunks, then the whole checkout process… and all that can work fine too. So the census is not a big or complicated problem, really. It just needs to be done right.

The fact that IBM, for $9.6 million, completely stuffed it is a very serious indicator of where the relevant skills and innovation capability lie. For this type of job: not with IBM. Going with a big company does not guarantee good results. If you reckon this is a one-off, ask Queensland Health about their payroll debacle (SAP, implemented by… IBM). Similarly, very expensive is not necessarily better. It can be just very costly, in so many respects.

ABS/IBM also declined NextDC’s offer of datacentre-level firewalling and DoS protection. Another serious mistake. But application architecture affects security too. When I googled for Census 2016 on census night, the first link that came up was a Census staff login. That’s just beyond astonishing. It should not be public at all: it doesn’t need to be on a public domain, and should probably only be accessible via a VPN.

The company that did the online Census 2016 load testing for another half a million dollars, and bragged before census night about how well their team worked together with the ABS and IBM people, should also be seriously embarrassed about the shoddy job they delivered. From their own site:

“Revolution IT worked in a highly collaborative manner, and their subject knowledge, expertise and advice were key to achieve our project goals and objectives. We were impressed with how well they engaged with our e-Census solution provider (another private company). [IBM]”

Success is not defined by how well your team worked; it’s very simply proven by how well the system deals with the real world. In this case, it didn’t. At all. So: total process fail. It would have been very wise to hold off on the bragging until after census night. If it holds up well, you can brag. Otherwise, you hush, and at least there’s no public embarrassment on that front. PR fail.

Their public statement (after census night) is at http://revolutionit.com.au/revolution-it-q-a-australian-bureau-of-statistics-abs-2016-census-website/ where they explain that the Census site was taken offline due to security concerns, and that since security was not part of their brief, their performance was all ok and successful.  But come on now: how is security not part of any practical testing?  It is by nature an integral part of how things work online!  The implementation of security may impact performance, and security incidents obviously impact availability – and without availability you have no performance at all.

All in all, Census 2016 is a brilliant example of “how not to” in modern online architecture.

And to prove all this again, two students at QUT in Brisbane built the same thing in just a few days and for about $500, which I understand was mostly pizza costs.

Read that story at http://eftm.com.au/2016/08/how-two-uni-students-built-a-better-census-site-in-just-54-hours-for-500-30752 (the write-up is rather populist and simplistic, but the fact that a few students can very well design a site like this, and properly, is absolutely correct).

Choosing Whether to Migrate to Another Database: Uber

Found data leak of a company while giving a college lecture | Sijmen Ruwhof

Sijmen writes:

A few weeks ago I gave a guest lecture at the Windesheim University of Applied Sciences in The Netherlands. I graduated there and over the years I kept in contact with some of my teachers since then. One of the teachers told me recently that a lot of students wanted to learn more about IT security and hacking and asked me to give a lecture about it. Of course! And to keep it a bit juicy, I built in a hacking demonstration in my lecture.

Read the full story at http://sijmen.ruwhof.net/weblog/937-how-i-found-a-huge-data-leak-of-a-company-during-a-college-lecture

For any server that’s connected to the Internet (and these days, that’s most servers), security is very important.

Mind that, as a fundamental, you have to regard any web server as compromised. Not that they necessarily are, but it’s a very useful baseline: these are the most visible servers and thus potentially the easiest targets. Consider what information is present on the web server itself, and what information is on there that can be used to access other systems (and to what extent). Scary? Perhaps. But that’s no reason not to review it all and put sensible practices in place.

If you’d like to discuss ways to secure your online environment, or would like to see how your current setup holds up to the various security benchmarks, have a chat with us: Open Query offers a security review (ad-hoc consulting) package, and we also offer regular security check-ups for our subscription clients.

Web Security: SHA1 SSL Deprecated

You may not be aware that the mechanism used to sign the SSL certificates that keep your access to websites encrypted and secure is changing. The old method, known as SHA1, is being deprecated – meaning it will no longer be supported. As of January 2016, various vendors no longer issue certificates signed with SHA1, and browsers show warnings when they encounter an old SHA1 certificate. From January 2017, browsers will reject old certificates outright.

The new signing method, known as SHA2, has been available for some time. Users have had a choice of signing methods up until now, but there are still many sites out there using old certificates. You may want to check the security of any SSL websites you own or run!

To ensure your users’ security and privacy, force https across your entire website, not just e-commerce or other sections. You may have noticed this move on major websites over the last few years.

For more information on the change from SHA1 to SHA2 you can read:

To test if your website is using a SHA1 or SHA2 certificate you can use one of the following tools:
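
If the online tools aren’t convenient, the check can also be scripted. A minimal sketch (assumes the third-party cryptography package; the hostname is a placeholder):

```python
import ssl
from cryptography import x509  # assumption: `pip install cryptography`

def signature_hash(host: str, port: int = 443) -> str:
    """Fetch a site's certificate and report its signature hash algorithm."""
    pem = ssl.get_server_certificate((host, port))
    cert = x509.load_pem_x509_certificate(pem.encode("ascii"))
    return cert.signature_hash_algorithm.name  # e.g. "sha1" or "sha256"

print(signature_hash("example.com"))
```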

Open Query also offers a Security Review package, in which we check on a broad range of issues in your system’s front-end and back-end and provide you with an assessment and recommendations. This is most useful if you are looking at a form of security certification.
