graph | Open Query

Posted on November 13, 2009 by Arjen Lentz

OQGRAPH at OpenSQL Camp 2009, Portland

Antony is travelling up to Portland for this great event that’s about to start Fri evening and going over the weekend. He’ll be showing other devs and people more about the OQGRAPH engine, and gathering useful feedback.

Open Query is, together with many others (I see Giuseppe, Facebook, Gear6, Google, Infobright, Jeremy Cole, PrimeBase Technologies, Percona, Monty Program, and lots more), sponsoring the event so that it’s accessible for everybody – reducing the key factor to getting there rather than having to worry about high conf fees.

Having acquired the world’s biggest jetlag flying to Charlottesville VA for last year’s OpenSQL Camp, I can confirm from personal experience that it’s a great event. While I can’t be there this time, I’m looking forward to hearing all about it!

Posted on November 13, 2009 by Arjen Lentz

Posted on November 12, 2009 by Arjen Lentz

OQGRAPH update: speed, maze example, 5.0 packages

Antony has done a bit of magic, considerably speeding up inserts. Since the base implementation does not have persistence, insert speed is particularly important. Copying the 2×89,051 edges for the Tree-of-Life example is now near-instant.

The delete bug has been fixed.

There’s a new Maze example in the OQGRAPH trunk on Launchpad, first introduced in my MySQL University session. I created/inserted a maze of 1 million rooms (that comes to about 3 million edges), and OQGRAPH found the shortest path (122330 steps for this particular maze) in abound one second. That’s pretty good, I think!

Last but not least, the OurDelta builds of MySQL 5.0.87-d10 have been published (for all Debian, Ubuntu, CentOS/RHEL, generic) and the -Sail edition of the packages have OQGRAPH built-in. So if you use 5.0 or just want to play, it’s now very easy to get started!

Earlier in the week we received a message from an early OQGRAPH adopter, telling how he’s using it to manage paths in his IP network: calculations that would previously require many minutes are now completed in a fraction of a second and a single query. He admitted to be pretty much an all Oracle shop, with this OQGRAPH app being his first exploration of MySQL space. He loves OQGRAPH, and I suppose that by proxy implies he likes MySQL too 😉

Posted on November 12, 2009 by Arjen Lentz

Posted on November 3, 2009November 3, 2009 by Arjen Lentz

OQGRAPH engine on MySQL University – 5 Nov 2009 10:00 UTC

MySQL University logo Only a few weeks after Walter’s session on Multi-Master Replication with MMM and thanks to the great gang at MySQL Docs (my colleagues from long ago!) I’ll be doing a MySQL University session in a few days, about the GRAPH computation engine. From talks/demos I’ve done about it so far, I’ve learnt that people love it but there are lots of interesting questions. After all, it’s a pretty new and in a way exotic thing.

MySQL University uses DimDim, an online presentation service. You’ll see slides, and hear my voice. You can also type questions in a live chat room. We actually even got desktop sharing working so a live demo is possible, we’ll see how that goes on the day (I’ll make sure to have static slides for the same also 😉

For session details and the exact link to DimDim, see the MySQL uni page for the OQGRAPH session.

To attend, please calculate the starting time for your local timezone! It’ll be very early in the morning for US people, however for Europe it will be late morning, and Asia/Pacific will be evening. If you miss the live session, there’ll be a recording online soon afterwards and of course you can contact me for questions anyway. Still, it would be be cool if lots of people attended live, that’s always extra useful. Hope to meet you there!

Posted on November 3, 2009November 3, 2009 by Arjen Lentz

Posted on October 24, 2009October 24, 2009 by Arjen Lentz — 1 Comment

OQGRAPH on Launchpad, graph examples

The MySQL 5.0 and MySQL/MariaDB 5.1 source code is now also available through Launchpad. If you were waiting for a version for 5.1 and are ok with building the plugin from source, now you can!

The repo contains a subdir for examples, we’re hoping many people will contribute little snippets and scripts to import and use interesting datasets. To give you a hint, with graph capabilities you are able to deal with RDF data sources. You just need to transform the XML to say CSV, import into a suitable structure, and copy the edge information across to an OQGRAPH table.

Roland Bouman’s tree-of-life (which uses xslt stylesheets) are a good example of that approach, and was the first entry in the examples tree, including an SQL dump of the base dataset (it was CC-NC licensed) so you don’t necessarily have to fuss with the RDF/xslt foo.

Enjoy! We want to have examples/demos, a proper testsuite (there’s a bug/wishlist for that), and more. If you can help, please do: mucking around with graphs is great fun. If you implement OQGRAPH in a “proper” app, we’d also like to hear from you. The examples are intended to get people used to what OQGRAPH can do, and thus trigger ideas for practical uses. It’s not just fun. With OQGRAPH’s capabilities and speed, you can profit.

Posted on October 24, 2009October 24, 2009 by Arjen Lentz — 1 Comment

Posted on October 21, 2009October 21, 2009 by Arjen Lentz — 3 Comments

Walking the Tree of Life in simple SQL

Antony and I are busy getting the Open Query GRAPH Engine code ready so you all can play with it, but we needed to test with a larger dataset to make sure all was fundamentally well with the system.

We have some intersting suitable dataset sources, but the first we tried in ernest because it was easy to get in (thanks to Roland Bouman for both the idea and providing xslt stylesheets to transform the set), was the Tree of Life which is a hierarchy of 89052 entries showing how biological species on earth are related to eachother.

GRAPH engine operates in a directed fashion, so I inserted the connections both ways resulting in 178102 entries. So, I inserted A->B as well as B->A for each connection. So we now have a real graph, not just a simple tree.

Just like with my previous post, we have a separate table that contains the name of the species. For query simplicity, I looked up the id the start/end name separately. By the way, latch=1 indicates we use Dijkstra’s shortest-path algorithm for our search.

# with all that explained, let’s find ourselves in the tree of life!
SELECT GROUP_CONCAT(name ORDER BY seq SEPARATOR ‘ -> ‘) AS path FROM tol_graph JOIN tol ON (linkid=id) WHERE latch=1 AND origid=1 AND destid=16421 \G
*************************** 1. row ***************************
path: Life on Earth -> Eukaryotes -> Unikonts -> Opisthokonts -> Animals -> Bilateria -> Deuterostomia -> Chordata -> Craniata -> Vertebrata -> Gnathostomata -> Teleostomi -> Osteichthyes -> Sarcopterygii -> Terrestrial Vertebrates -> Tetrapoda -> Reptiliomorpha -> Amniota -> Synapsida -> Eupelycosauria -> Sphenacodontia -> Sphenacodontoidea -> Therapsida -> Theriodontia -> Cynodontia -> Mammalia -> Eutheria -> Primates -> Catarrhini -> Hominidae -> Homo -> Homo sapiens
1 row in set (0.13 sec)

# how are we related to the family of plants containing the banana
SELECT GROUP_CONCAT(name ORDER BY seq SEPARATOR ‘ -> ‘) AS path FROM tol_graph JOIN tol ON (linkid=id) WHERE latch=1 AND origid=16421 AND destid=21506 \G
*************************** 1. row ***************************
path: Homo sapiens -> Homo -> Hominidae -> Catarrhini -> Primates -> Eutheria -> Mammalia -> Cynodontia -> Theriodontia -> Therapsida -> Sphenacodontoidea -> Sphenacodontia -> Eupelycosauria -> Synapsida -> Amniota -> Reptiliomorpha -> Tetrapoda -> Terrestrial Vertebrates -> Sarcopterygii -> Osteichthyes -> Teleostomi -> Gnathostomata -> Vertebrata -> Craniata -> Chordata -> Deuterostomia -> Bilateria -> Animals -> Opisthokonts -> Unikonts -> Eukaryotes -> Archaeplastida (Plantae) -> Green plants -> Streptophyta -> Embryophytes -> Spermatopsida -> Angiosperms -> Monocotyledons -> Zingiberanae -> Musaceae
1 row in set (0.06 sec)

Obviously, this search needs to find its way up the tree then find the appropriate other branch.

# finally, our connection retro-viruses
SELECT GROUP_CONCAT(name ORDER BY seq SEPARATOR ‘ -> ‘) AS path FROM tol_graph JOIN tol ON (linkid=id) WHERE latch=1 AND origid=16421 AND destid=57380 \G
*************************** 1. row ***************************
path: Homo sapiens -> Homo -> Hominidae -> Catarrhini -> Primates -> Eutheria -> Mammalia -> Cynodontia -> Theriodontia -> Therapsida -> Sphenacodontoidea -> Sphenacodontia -> Eupelycosauria -> Synapsida -> Amniota -> Reptiliomorpha -> Tetrapoda -> Terrestrial Vertebrates -> Sarcopterygii -> Osteichthyes -> Teleostomi -> Gnathostomata -> Vertebrata -> Craniata -> Chordata -> Deuterostomia -> Bilateria -> Animals -> Opisthokonts -> Unikonts -> Eukaryotes -> Life on Earth -> Viruses -> DNA-RNA Reverse Transcribing Viruses -> Retroviridae
1 row in set (0.06 sec)

As you can see this one has to walk all the way back to “life on earth”, we’re really not related at all.

I left in the lines that show the amount of time taken. In earlier queries it took a few seconds, and I thought that was just some slowness in the graph engine, until I found out that the join was un-indexed so MySQL was table-scanning the tol table for each item found. Quickly corrected, the numbers are as you see.

I was still curious though, and since the SELECT returns a single item (a string in this case) it was really easy to use the BENCHMARK(N,func) function. That standard MySQL function runs func N times. Simple.

# so, we do
SELECT benchmark(1000000,(SELECT GROUP_CONCAT(name ORDER BY seq SEPARATOR ‘ -> ‘) AS path FROM tol_tree JOIN tol ON (linkid=id) WHERE latch=1 AND origid=16421 AND destid=57380));
…
1 row in set (1.86 sec)

As it turns out, we were really just measuring latency before, as this shows we can do a million of these path searches through a graph in less than 2 seconds. To me, that’s not just “not bad” (the usual opinion a Dutch person would express 😉 but freaking awesome. And that is just what I wanted to tell.

Posted on October 21, 2009October 21, 2009 by Arjen Lentz — 3 Comments

Posted on October 16, 2009 by Arjen Lentz — 1 Comment

GRAPH engine – Mk.II

The GRAPH engine allows you to deal with hierarchies and graphs in a purely relational way. So, we can find all children of an item, path from an item to a root node, shortest path between two items, and so on, each with a simple basic query structure using standard SQL grammar.

The engine is implemented as a MySQL/MariaDB 5.1 plugin (we’re working on a 5.0 backport for some clients) and thus runs with an unmodified server.

Demo time! I’ll simplify/strip a little bit here for space reasons, but what’s here is plain cut/paste from a running server, no edits

-- insert a few entries with connections (and multiple paths)
insert into foo (origid, destid) values (1,2), (2,3), (2,4), (4,5), (3,6), (5,6);
-- a regular table to join on to
insert into people values (1,"pearce"),(2,"hunnicut"),(3,"potter"),
                          (4,"hoolihan"),(5,"winchester"),(6,"mulcahy");

-- find us the shortest path from pearce (1) to mulcahy (6) please
select group_concat(people.name order by seq) as path
  from foo join people on (foo.linkid=people.id)
  where latch=1 and origid=1 and destid=6;
+--------+--------+--------------------------------+
| origid | destid | path                           |
+--------+--------+--------------------------------+
|      1 |      6 | pearce,hunnicut,potter,mulcahy |
+--------+--------+--------------------------------+

-- find us all people we can get to from potter (3)
select origid,group_concat(people.name order by seq) as destinations
  from foo join people on (foo.linkid=people.id)
  where latch=1 and origid=3;
+--------+----------------+
| origid | destinations   |
+--------+----------------+
|      3 | mulcahy,potter |
+--------+----------------+

-- find us all people from where we can get to hoolihan (4)
select origid,group_concat(people.name order by seq) as origins
  from foo join people on (foo.linkid=people.id)
  where latch=1 and destid=4;
+--------+--------------------------+
| origid | origins                  |
+--------+--------------------------+
|      4 | hoolihan,hunnicut,pearce |
+--------+--------------------------+

So, there you have it. A graph (in this case a simple unidirectional tree, aka hierarchy) that looks like a table to us, as do the resultsets that have been computed.

This is still a early implementation, we’re still enhancing the storage efficiency (in memory) and speed, and adding persistence. We’re also looking for a suitable large dataset that would allow us to seriously test the system, find bugs and assess speed. If you happen to have a large hierarchical structure, but especially a social graph you could obfuscate and give to us, that would be great!

Also, if you’re interested in deploying the GRAPH engine or have questions or additional needs, we’d be happy to talk with you.

select origid,group_concat(people.name order by seq) as destinations from foo join people on (foo.linkid=people.id) where latch=1 and origid=4;
+——–+—————————–+
| origid | destinations |
+——–+—————————–+
| 4 | mulcahy,winchester,hoolihan |
+——–+—————————–+

Posted on October 16, 2009 by Arjen Lentz — 1 Comment

Share this:

Share this:

Share this:

Share this:

Share this:

Share this: