Posted on 6 Comments

Solution for Calculating Ranking

Getting back to the ranking calculation quiz I posted last week….

I was a bit surprised about all the complicated constructs with counters, stored procedures and even views. The solution actually needs to work on a 4.1 server so I should perhaps have banned using stored procs, but I didn’t expect them to be needed – however I understand that people are very excited about MySQL 5 and all its new capabilities 😉

My own early solution (which I worked out after posting the quiz) was:

SELECT COUNT(DISTINCT score) AS ranking
  FROM points
 WHERE score >= (SELECT score FROM points WHERE id=#)

Which is, provided we index the (score) column as well, very fast (the subselect actually gets optimized away into a constant). The server will only have to use the index. It handles ex aequo positions but without skipping rankings the way it’s normally done. Supposing there are two people in 2nd place, you’d want to skip 3rd place so that the person in 4th place stays there. So, my solution was in fact incorrect 😉

Giuseppe Maxia was the first to come up with this (I’ve simplified for readability):

SELECT COUNT(*)+1 AS ranking
  FROM points
 WHERE score > (SELECT score FROM points WHERE id=#)

which is of course just as efficient as the above, and by using the count+1 and > instead of the exact count and >= it also solves the ex aequo problem very neatly.

Giuseppe’s solution works fine on its own, although he also showed a fancy version using a VIEW and another using a Stored Function. Anyway, I reckon Giuseppe earned the prize (a MySQL 10th anniversary mug + MySQL sticker). Thanks everybody for participating!

I’ve received lots of feedback that this kind of quiz idea is much liked. So we’re setting up a forum (easier to handle than blog comments) and a new corner in the MySQL Developer Zone for this. The first quiz will be posted shortly, and of course I’ll blog about it when it’s online. Stay tuned!

Posted on 6 Comments

6 thoughts on “Solution for Calculating Ranking

  1. Congratulations, Giuseppe 😉 – good job!

  2. On my readings about this topic I stumbled over your quiz from quite a while ago now.

    In the same light… I have a similar problem.

    Say your points table has approximately 1 million rows. Instead a query that returns a specific indivudual ranking , what would be the most efficient way to populate (UPDATE) a new column/field (say scoreranking) with that rows rank for each row in the table?

    I have come up against this very problem … and my current solution for a 1 million row table unfortunately takes about 20 minutes on my machine… way too slow… and i need a better more elegant solution.

    Zain

  3. I’d suggest putting the rankings in a separate table, then you can use INSERT (or REPLACE) and that should simplify the code significantly. You can probably apply one of the solutions form this thread.

  4. The solution is not optimal, rank will be calculated in O(n) which basically means you have to read every row in the table. With a lot of hackery and stored procedures it is possible to achieve O(log n), however it requires O(log n) inserts / updates … ill just describe the idea without actual code.
    Have a second table that counts rows with score between A and B. The ranges subdivide the score range into 2, 4, 8, 16 … regions (0 to 0.5, 0.5 to 1, 0.0 to 0.25, 0.25 to 0.5 …). To find the rank you sum the counts from nearest above subdivision for every level skipping ranges already covered. For 0.3 score it would be 0.5 to 1, 0.5 to 0.75 (skip because already included in 0.5 to 1), 0.375 to 0.5, …

  5. This is how we add the “place” (=rank) column to the users table according to their reputation.
    place is calculated every hour:
    BEGIN
    DECLARE i INT;
    DECLARE uid INT;
    DECLARE end_of_table BOOL DEFAULT 0;
    DECLARE users_cursor CURSOR FOR
    SELECT id FROM users WHERE is_active = 1 and house_id > 0 and reputation > 0 ORDER BY reputation DESC, cash DESC;
    DECLARE CONTINUE HANDLER FOR SQLSTATE ‘02000’ SET end_of_table = 1;
    DROP TABLE IF EXISTS `temp_users`;
    CREATE TABLE IF NOT EXISTS temp_users SELECT * FROM users WHERE is_active = 1 and reputation > 0;
    UPDATE temp_users SET place = 0;
    OPEN users_cursor;
    set i =0;
    set uid = 0;
    REPEAT
    FETCH users_cursor INTO uid;
    SET i = i +1;
    UPDATE temp_users set place = i WHERE temp_users.id = uid;
    UNTIL end_of_table END REPEAT;
    CLOSE users_cursor;
    update users set place = 0;
    update users, temp_users set users.place = temp_users.place where users.id = temp_users.id;
    DROP TABLE IF EXISTS `temp_users`;
    END

  6. Very nice idea, but why is the above not optimal? If I were writing a MySQL engine, and this table was stored as a BTREE, then indeed I could know how many rows were before me and after me in log(n) time. I don’t know if this optimization is done in MySQL though.

Comments are closed.