Call us today (AU) 07 3103 0809 (NZ +61-7-3103 0809)

We are an Australian based company with specific expertise in MySQL and MariaDB. Our services include consulting, scalable architecture, proactive remote DBA, system administration, vendor neutral managed services with optional emergency support, mentoring, training, and security and code reviews.

GRAPH Computation Engine - Documentation

visual representation of a graph structure (PD via wikipedia)OQGRAPH Docs Copyright © 2009-2013 by Arjen Lentz for Open Query

Documentation license: please use this online documentation - naturally it's fine to keep a copy locally (don't forget to update it!) or print it for your personal use.

 

Contacts and Links

 

Introduction

The Open Query GRAPH computation engine, or OQGRAPH as the engine itself is called, allows you to handle hierarchies (tree structures) but also complex graphs (nodes having many connections in several directions). Have you wrestled with the adjacency model, nested sets or materialised paths, or wondered how the blazes to do friend-of-a-friend style searches in your SQL relational database, result required today please? That's right, you and everybody else!

I encountered the same problem years ago when building web applications. An RDBMS "thinks" in sets, and sets just don't lend themselves well to dealing with these structures, and then SQL is no help either. But understanding how MySQL works, plus a bit of background in electronics, enabled me to come up with a solution.

Other database servers have syntax like CONNECT BY PRIOR (Oracle) and the SQL-99 standard introduces something called a "recursive union". Look, these features are functional and do what they do, but a) they're limited b) not that easy to use c) not fast and d) not available in MySQL. I realise I'm setting the bar very high by making statements on ease of use and speed, but we now have a functional "competitor" to back all of this, and more: OQGRAPH is accessed through simple single queries in standard SQL, you can use it in joins or any other supported construct, and it can handle any graph structure without breaking a sweat.

So in a nutshell, you can make your existing tree operations more convenient, but it's also enabling technology in the sense that you can now do things that simply weren't possible (inside  an SQL RDBMS) before.

 

But it's a Storage Engine, right?

Yes and no. Don't you hate answers like that? But it's the truth.

In the MySQL server infrastructure, OQGRAPH is a storage engine, just like MyISAM, InnoDB and MEMORY, to name a few. But even though it superficially looks like an engine and it's the server API we use, it's actually an altogether different product: it's a computation engine. On the inside it's a native graph, there are no rows, and some of the columns you see on the outside don't exist. And there is definitely no table.

Confused yet? No worries, OQGRAPH is really easy to use (and that's not just geek talk), but it's essential that you don't think of it as a regular storage engine. For instance, it's downright impossible to store regular data in it.

 

Table Structure

Every OQGRAPH table has the same fixed structure:

CREATE TABLE db.tblname (
    latch   SMALLINT  UNSIGNED NULL,
    origid  BIGINT    UNSIGNED NULL,
    destid  BIGINT    UNSIGNED NULL,
    weight  DOUBLE    NULL,
    seq     BIGINT    UNSIGNED NULL,
    linkid  BIGINT    UNSIGNED NULL,
    KEY (latch, origid, destid) USING HASH,
    KEY (latch, destid, origid) USING HASH
  ) ENGINE=OQGRAPH;

Looking at this structure you may already have some questions and doubts (based on your own knowledge of MySQL) but bear with us, we'll explain it all. You can't use any other structure for the columns or indexes, the engine checks this and will throw an error if it's not correct (this is not crippling but on purpose - remember, internally there is no table! We're just establishing an appropriate interface).

 

Nodes, Edges, Weight

Of the columns in the fixed table structure, the ones you can manipulate (with INSERT/UPDATE/DELETE) directly are origid, destid and weight. If you don't care about weight, you can simply do INSERT INTO tblname (origid,destid) VALUES (1,2); and this will create a graph entry connecting node 1 to node 2. You can call them node, item or vertex (plural: vertices).

Connections or links can also be called edges. Edges are directional (like a one-way street) and will only be traversed in the order specified, so (1,2) is not the same as (2,1). Right now, if you want bidirectional traversal, you either insert the two complementary rows yourself, or you set up INSERT, UPDATE and DELETE triggers (we may change this and allow specifying bidirectionality through other means).

 

Optimiser

Here we correct your cognitive dissonance, since we appreciate that some aspects of the structure look "wrong". You may think that some operations will be slow.

  • Nodes are all presented as BIGINT UNSIGNED. However you use any integer type in another table to join on it, and that join will happily use an index on your joined table. OQGRAPH informs the optimiser that tablescans are extremely "expensive" (MySQL has a cost-based optimiser) and thus we convincingly entice it to use an index.
  • The HASH index definitions are there to support the picture we want the optimiser to have of the OQGRAPH engine, we want to avoid tablescans at all cost. There is no table, and the system traversing through all nodes in the graph would not actually yield the correct result, depending on the desired operation. So, we make the optimiser do what we want, but we play within its rules. The engine API provides sufficient hooks for this.

If there are other aspects you're concerned about, please let us know and we'll add them here! There may also be bugs, and only the real world will reveal. If something appears slow or incorrect, use EXPLAIN to see what the optimiser says. And tell us.

 

Persistence and Locking

The base v2 implementation acts similar to the MEMORY engine, it uses table-level locking, does not support transactions, and has no persistence. This means that if you restart the server, the table structure will be there but all data is gone. Since you generally actually have the required data in another table also, this is not a problem. You can use the existing --init-file=<SQLfile> option in my.cnf to have mysqld execute some commands on server startup, for instance copying the data using INSERT tblname (oridgid,destid) SELECT ...

Also just like with the MEMORY engine (but perhaps you didn't know this), for temporary tables, no locking is used since they are local to the one thread anyway.

Contrary to MEMORY, there is no support for AUTO_INCREMENT. It would serve no purpose. There is also no PRIMARY KEY, and that doesn't matter.

The v3 prototype implementation acts as a shim on top of an existing storage engine (such as InnoDB), thus providing capabilities such as persistence as well as allowing larger graphs.

 

Installation

From MariaDB 5.2, OQGRAPH is built as a clean plugin, and distributed as an integral part of the MariaDB packages. However, it is not loaded by default, to reduce memory consumption for people who don't use it. So you need to install the plugin once, after that it will also be loaded automatically when the server restarts.

MariaDB [(none)]> INSTALL PLUGIN oqgraph SONAME 'ha_oqgraph.so';
Query OK, 0 rows affected
Now OQGRAPH will appear in the list of installed storage engines:
MariaDB [(none)]> SHOW ENGINES;
+------------+---------+--------------------------------------------------------------------------------------------------+--------------+------+------------+
| Engine     | Support | Comment                                                                                          | Transactions | XA   | Savepoints |
+------------+---------+--------------------------------------------------------------------------------------------------+--------------+------+------------+
...
| OQGRAPH    | YES     | Open Query Graph Computation Engine, stored in memory (http://openquery.com/graph)               | NO           | NO   | NO         |
+------------+---------+--------------------------------------------------------------------------------------------------+--------------+------+------------+

Now you are ready to create and use OQGRAPH tables in MariaDB.

 

Data Definition Language

  • CREATE TABLE - as above
  • DROP TABLE - supported
  • ALTER TABLE - you could convert a table to another engine, or vice versa. For anything you convert to OQGRAPH, you must use the correct and complete table definition or you'll get an error.

Data Manipulation Language

  • INSERT - only the origid, destid and optionally weight columns should be inserted.
    Example: INSERT INTO foo (origid,destid) VALUES (1,2),(2,3),(1,3);
    If weight is not specified, it defaults to 1.
  • UPDATE - you can update origid, destid and weight. You could update weight while traversing a graph, but don't mess with origid/destid in this manner. Imagine changing the way roads are connected while you're riding on them!

    This is ok: UPDATE foo SET weight = 2 WHERE origid=2 AND destid=4;
    As is this: UPDATE foo SET destid = 7 WHERE origid=3 AND destid=5;
    But the following would produce incorrect results: UPDATE foo SET destid = destid + 10;
    Modifying only the weight is safe: UPDATE foo SET weight = weight * 10;
  • DELETE - supported
  • SELECT - supported
  • TRUNCATE - seems to have an bug, but DELETE FROM foo (without WHERE clause) will work fine.

 

Operations Latch

latch is the magic column: through it, we tell the engine what to do, essentially passing it commands. You would normally use latch in a SELECT, but it also works in an UPDATE or DELETE (with the abovementioned caveats).

  • Query edges stored in graph engine (latch=NULL)
    SELECT * FROM foo;
    Results:
        vertex id for origin of edge in origid column.
        vertex id for destination of edge in destid column.
        weight of edge in weight column.

    Essentially this returns the values (origid,destid pairs with optional weight) you put in, in this mode OQGRAPH looks very close to a real table. But it also does nothing special, it's just store/retrieve for those columns. The other columns will be returned as NULL.
     
  • Query vertices stored in graph engine (latch=0)
    SELECT * FROM foo WHERE latch = 0;
    Results:
        vertex id in linkid column
     
  • Query out-edges for vertex (latch=0 AND origid=N)
    SELECT * FROM foo WHERE latch = 0 AND origid = 2;
    Results:
        vertex id in linkid column
        edge weight in weight column
     
  • Query in-edges for vertex (latch=0 AND destid=N)
    SELECT * FROM foo WHERE latch = 0 AND destid = 6;
    Results:
        vertex id in linkid column
        edge weight in weight column
     
  • Dijkstra's shortest path algorithm (latch=1)
    Find shortest path:
    SELECT * FROM foo WHERE latch = 1 AND origid = 1 AND destid = 6;
      Results:
        latch, origid, destid are same as input.
        vertex id of the current step in linkid column.
        weight of traversed edge in weight column.
        step counter in seq column, so you can sort and use the result (starting at step 0).
        Example: SELECT GROUP_CONCAT(linkid ORDER BY seq) ...

    Find reachable vertices:
    SELECT * FROM foo WHERE latch = 1 AND origid = 1;
      Results:
        latch, origid, destid are same as input.
        vertex id in linkid column.
        aggregate of weights in weight column.
     
    Find originating vertices:
    SELECT * FROM foo WHERE latch = 1 AND destid = 6;
      Results:
        latch, origid, destid are same as input.
        vertex id in linkid column.
        aggregate of weights in weight column.
     
  • Breadth-first search (latch=2, assumes that each vertex is weight 1)
    Find shortest path:
    SELECT * FROM foo WHERE latch = 2 AND origid = 1 AND destid = 6;
    Results:
        vertex id in linkid column.
        weight column = 1 for each hop.

    Find reachable vertices:
    SELECT * FROM foo WHERE latch = 2 AND origid = 1;
      Results:
        vertex id in linkid column.
        computed number of hops in weight column.
     
    Find originating vertices:
    SELECT * FROM foo WHERE latch = 2 AND destid = 6;
    Results:
        vertex id in linkid column.
        computed number of hops in weight column.

 

Performance and Efficiency

For graph traversal operations, there's nothing else that speaks SQL that's in the same league. But that said, it's fast in its own right anyway. No worries. You will find that the v2 implementation sees INSERTs become slower on a larger dataset (few million edges), but most other operations should remain fast. Again, all is relative.

If you have a need for persistence, more speed on larger datasets, and increased storage efficiency, please contact Open Query. You'll probably want the backend storage to be on SSD, but HD will technically work. Licensing remains GPLv2+ unless you have other needs.

 

Time Complexity

This can give you an indication of the relative time required to resolve a specific operation, dependent on the search algorithm.

  • Dijkstra's with coloring and priority queue is O ( ( V + E ) . log V )
  • Breadth-first: O (V + E)

O is the order function. O ( 1 ) is constant time, irrespective of amount of data. O ( N) is linear time, e.g. time increases with amount of data, etc. V is number of vertices (nodes), E is number of edges (links).

 

The Real World

This is new/different stuff, so will take a little getting used to. But once you wrap your head around the concept, or actually don't let yourself be hindered by what you normally expect a table to look like on the inside, you'll quickly find that it's very easy to use, versatile, and actually extremely fast - particularly considering the operations it has to perform to get you answers like "find me the shortest path" for a graph of for instance a few hundred thousands items.

 

Download and Installation


We're offering you the base implementation of the GRAPH Engine v2 as a gift. It's not crippleware, as our early demos have already shown. The license is GPLv2+. In return for our gift to you, we ask you to test (yes, please try to break it!), experiment, explore, build GRAPH enabled apps, write about OQGRAPH and otherwise spread the word, bugreport, and just simply help others with questions on the OQGRAPH forum and elsewhere. Each of you will have different opportunities and skills to contribute; by accepting our gift, you agree to do your part. Great!


 

Packages/Binaries

Packages for Debian, Ubuntu, RHEL/CentOS and generic Linux binary tarballs of MySQL 5.0.87-d10 are still available from the OurDelta project. With the packages you set up once and then you can simply apt-get or yum!  You'll want the ourdelta-sail edition.

From MariaDB 5.2 it is available as a plugin, distributed as an integral part of the MariaDB packages. 

Sources

Of course it's easiest to just grab a binary package, but since the code is GPLv2+ licensed you're most welcome to delve into the sources. Sources for the above mentioned builds are in their respective package formats in the repositories. The original OQGRAPH sources are maintained in the oqgraph project on Launchpad, but note that the current v2 implementation is maintained inside the MariaDB main code tree, and thus resides in the mariadb project on Launchpad. 

The oqgraph v2 repo at Launchpad also contains an /examples directory.

 

Help, I found a bug!

Look, it can happen. Anyone claiming their software is bug-free is lying (or just arrogant, you decide which is worse).

Please report problems on the OQGRAPH bug tracker in Launchpad, and we'll look into it!

 

Support

Naturally, Open Query offers support for OQGRAPH. Please contact us to discuss your needs. The pricing of our standard support offerings is published on the Open Query site. We value transparency and charge fair rates.

 

Spreading the Word

As you play with OQGRAPH, you will surely have an opinion on it, create cool examples, find interesting datasets, and do things we haven't even though of yet. That's great, scribble and talk away! The link to the OQGRAPH Forum is at the top of this document, if you write a blog entry or article, post a link to it (with the title and a short snippet so readers know what it's about) on the forum.

 

Code Maturity

Unlike general purpose engines such as MyISAM and InnoDB, OQGRAPH is highly specialised. It has a very limited scope, and generally only holds a copy of a subset of your data. Therefore its maturity curve can be much steeper. But in the end it's up to you (personally and collectively) to decide.

 

Examples & Links


Copyright © 2009-2013 by Arjen Lentz for Open Query

Feedback welcome, if you log in you can leave comments!