Here’s some homework for you: come up with an efficient way to calculate ranking.
Say you have a user table and each user has a score. Now, for any one specified user, you want to find out their overall ranking. And, in the most efficient way! There are, of course, a variety of rather crude and/or gross methods to do this, but I don’t want to hear about those 😉
Simply post your ideas/solutions in comments to this entry… *before*, let’s say, 15 Dec.
I’ll even offer a prize for the one most *efficient* entry: a MySQL 10th anniversary mug + MySQL sticker.
Good luck! And have a nice weekend.
Easiest that comes to mind is:
SELECT (SELECT count(*) FROM users WHERE score > (SELECT score FROM users WHERE user=?))/(SELECT count(*) FROM users) AS ranking
Now that should give a floating point result 0 being top, 1 being last. If you want x/y you could just run those select queries individually.
given this table:
you can find the absolute position with this query:
The number you get means the zero-based position in the scores array, where zero is the first, and count(*) is the last. This formula returns ‘0’ for user ‘Monty’ and 7 for user ‘Pete’.
If you want the rank to start with ‘1’, then change the first ‘count(*) as rank’ to ‘count(*) + 1 as rank’
ciao
Giuseppe
hi arjen
i think, this is the cleanest way:
set @rank = 0;
select *, @rank := @rank + 1 as rank from users order by score desc;
this would give the users back with a rank from 1 upwards.
greets
marcel tschopp
… it’s very similar to gmaxia’s solution with some tiny differencies:
One important thing is to create an index on the score column (if there are many rows, it will speed up calculating the rank immensely).
Then I’ve put it into a user defined function, which makes it very easy to select the position for the user (it’s a little more effort to create it, but much simpler and faster to get the results). One more difference to gmaxia’s solution … I used the >= oparator for comparison, as gmaxia’s query returns a value that’s one less than the real result.
Cheers,
Markus
Elaborating further on my previous solution, we need to take into account
cases when two users have the same score.
Neither mine nor Markus’s solution solve this problem easily. We need one more subquery to give a decent answer.
Let’s add a few more scores:
Now, if we want to know how user ‘Arjen’ ranks, we can say 2nd, but we need to know that he’s level with the 3rd and 4th.
To make things easier, we could create a view and then the final query could give us the details:
The user variable @USER_SCORE will save one subquery in the overall calculation.
To make things even easier, we can create a function:
This function will return the rank as a string. That’s customizable, but the way I see it, since I don’t know how Arjen wants to use this, is how I use to see ranks in chess tournaments, when players with the same rank are shown as ‘1/2’ meaning “sharing first and second rank”.
Let’s see some examples:
Let’s assume a simple table. The userid column could be dropped.
Solution 1, using row counter.
Solution 2, using an inner join.
You find the shipping address for the mug in the PD 🙂
Great work, Giuseppe!
The user defined function that I’ve used in the example above even helped to find a bug in the server. I first accidently created the function without a name (didn’t even realize it immediately), which was accepted by the server, but then caused the server to crash, if I (or MySQL QueryBrowser, where I first found out that there was a problem) issued a ‘show function status’ command. Here’s the full bug report: http://bugs.mysql.com/bug.php?id=15658
Even little accidents can help to find bugs and improve MySQL ;-).
Markus
Just an idea that was turning in my head. This one will work well only if there are no score duplicates. In such case, it will return the rank 500 times faster than my previous solution!
There are three components: two functions and one view.
Here’s a little change in my UDF to make it capable to work with ex aequo positions. It’s a very little change – I query the positions with < (less then) as Giuseppe did before and iterate the value by one. Then I get the number of people with a higher score plus one, which results in the correct ranking, also if there are more users with the same score:
By the way, such exercises are nice (regardless of whether there’s a prize), because you can see how other people solve a problem which gives a chance to learn from them ;-).
Markus
But I didn’t ask for a “top N” list… that’s indeed easy. I asked for the ranking of one specific person.
Great – but I think we do have to assume that by definition, some users are likely to have the same score. So any solution will have to take this into account – at least at some basic level.
By the way, I’m not sure BENCHMARK() is the best way to measure these things… but it’s clear that it’s fast, yes 😉
Have to post this in 2 parts because of some stupid length limitation.
Well, there are a few options that are good for different scenarios.
But before I begin, a helper function to render the ranking nice and readable:
scenario 1: you want the count once only for only one user
In this case you’ll probably want to just count the people having score less than the user we’re looking for:
scenario 2: you want that count many times and/or for many users (but for one user at a time)
In this case one would probably want to keep track of each users rank. If accuracy isn’t a critical need you would just order by score and enumerate the users and write out the result every once in a while (once per a couple of minutes would be fine in most usecases). That can be done by using the already proposed methods. But that wouldn’t be interesting, now would it? 🙂 So what do we do? We keep track of rankings using triggers:
Lets suppose we have users table set up like this:
Now lets create a table to hold the rankings for our users:
When a new user is inserted it needs to get a row in the rankings table:
When a users score is updated the ranking table needs to be updated too. This is a bit tricky, because we want to do it efficiently, only updating the rankings that are between old and new rankings for the updated row.
part 2:
When a user is deleted the rankings must reflect that too:
Now we can get a users ranking REALLY efficiently:
The downside with the trigger approach is that inserts and deletes get real slow pretty quickly as the table grows. If we presume random distribution of scores then on average each modification must update half of the rankings table. However if we take a bit more realistic usecase for the score field (e.g. number of posts on a forum) then newly inserted users have a low score and don’t need to update a lot of rankings, updates change the ranking only by a couple of places and deletes are probably rare and mostly on users on the low side of the rankings.
I did a couple of benches using a random database of 3000 users with a rather high amount of duplicates (frequencies of duplicates in growing order are 150,242,217,164,111,47,17,7,3,2 where 150 are scores with no duplicates). Benchmark itself was a simple couple of lines of PHP quering all users one by one in random order using a prepared statement – should have pretty much minimal overhead and good indication of query performance. Indexes were on primary keys and on the score field.
And the results are:
scenario 1: 3.669s
scenario 2: 0.341s
I tested the “Probably cheating, but it’s FAST” solution too. I even threw in correct rankings for duplicate scores as it didn’t affect the result noticeably:
Result: 3m13.978s
The thing is that for every invocation the table needs to be regenerated. The moral is: never ever base your performance data on ridiculously small amount of test data.
Some ranking systems say that if two (or more) users tie, then the upper ranks are skipped and they all get the same lower rank. So in our example, Arjen, Ringo, and Zak all tie for 5th, and there is no 2nd, 3rd or 4th place.
SELECT COUNT(*) AS rank
FROM test t1
LEFT JOIN test t2 ON (t1.score <= t2.score) GROUP BY t1.user HAVING user = 'Arjen'; You can get a ranking list by replacing the HAVING clause with ORDER BY COUNT(*). Scott Noyes
SET @rank :=0;
SELECT * FROM (SELECT @rank := @rank +1 AS rank, uid, score FROM Ranking ORDER BY score) as ranksub WHERE uid =20;
haven’t checked out the other comments, so this may be a repeat.
Sheeri Kritzer
That gets the rank of the user whose id is 20. I tested this in the following way:
CREATE TABLE `Ranking` (`uid` int(10) unsigned NOT NULL auto_increment, `score` int(10) unsigned NOT NULL default ‘0’, PRIMARY KEY (`uid`));
insert into Ranking VALUES(”,FLOOR(RAND() * (1000000+ 1)));
(repeat a bunch of times)
mysql> select * from Ranking;
+—–+——–+
| uid | score |
+—–+——–+
| 1 | 805372 |
| 2 | 721577 |
| 3 | 191770 |
| 4 | 794122 |
| 5 | 395298 |
| 6 | 594126 |
| 7 | 784734 |
| 8 | 141290 |
| 9 | 352253 |
| 10 | 337393 |
| 11 | 630206 |
| 12 | 138850 |
| 13 | 803633 |
| 14 | 601614 |
| 15 | 597169 |
| 16 | 181005 |
| 17 | 113520 |
| 18 | 24583 |
| 19 | 782357 |
| 20 | 838032 |
| 21 | 843091 |
| 22 | 701357 |
| 23 | 977515 |
| 24 | 783501 |
| 25 | 984961 |
| 26 | 574302 |
| 27 | 916627 |
| 28 | 860227 |
| 29 | 551256 |
| 30 | 175598 |
| 31 | 224224 |
| 32 | 594327 |
| 33 | 298962 |
| 34 | 711831 |
| 35 | 662265 |
+—–+——–+
35 rows in set (0.00 sec)
mysql> set @rank=0; select @rank:=@rank+1,uid,score from Ranking order by score;
Query OK, 0 rows affected (0.00 sec)
+—————-+—–+——–+
| @rank:=@rank+1 | uid | score |
+—————-+—–+——–+
| 1 | 18 | 24583 |
| 2 | 17 | 113520 |
| 3 | 12 | 138850 |
| 4 | 8 | 141290 |
| 5 | 30 | 175598 |
| 6 | 16 | 181005 |
| 7 | 3 | 191770 |
| 8 | 31 | 224224 |
| 9 | 33 | 298962 |
| 10 | 10 | 337393 |
| 11 | 9 | 352253 |
| 12 | 5 | 395298 |
| 13 | 29 | 551256 |
| 14 | 26 | 574302 |
| 15 | 6 | 594126 |
| 16 | 32 | 594327 |
| 17 | 15 | 597169 |
| 18 | 14 | 601614 |
| 19 | 11 | 630206 |
| 20 | 35 | 662265 |
| 21 | 22 | 701357 |
| 22 | 34 | 711831 |
| 23 | 2 | 721577 |
| 24 | 19 | 782357 |
| 25 | 24 | 783501 |
| 26 | 7 | 784734 |
| 27 | 4 | 794122 |
| 28 | 13 | 803633 |
| 29 | 1 | 805372 |
| 30 | 20 | 838032 |
| 31 | 21 | 843091 |
| 32 | 28 | 860227 |
| 33 | 27 | 916627 |
| 34 | 23 | 977515 |
| 35 | 25 | 984961 |
+—————-+—–+——–+
mysql> set @rank:=0; SELECT * FROM (SELECT @rank := @rank +1 AS rank, uid, score FROM Ranking ORDER BY score) as ranksub WHERE uid =20;
Query OK, 0 rows affected (0.00 sec)
+——+—–+——–+
| rank | uid | score |
+——+—–+——–+
| 30 | 20 | 838032 |
+——+—–+——–+
1 row in set (0.00 sec)
which confirms that uid 20 is indeed, rank 30. I used a MyISAM table, and of course you’d want to put an index on score. I assumed numerical scores, which work, but you can also use anything that MySQL can sort in an ‘order’. This way has the advantage of ordering by descending value, in case you want to see how far from the end someone is.
haven’t checked out the other comments, so this may be a repeat.
and indeed, the anonymous user from the Training Dept. got there before I did.
Well, given the same table format as the above post, you could also do:
I’m not sure if that’s going to be faster/slower than the other methods though, since I haven’t had a chance to write a script to populate my table with more data yet.
Darien, that query doesn’t deal with equal rankings.
I reckon your original idea was neater, and also nearly there in terms of dealing with equal rankings.
My first function would have the problem to set the lower end equal, if there are ex aequo positions, so if e.g. positions 3 and 4 would be the same, it would have ended up with 1, 2, 4, 4, 5, … instead of 1, 2, 3, 3, 5, …, as the second function does.
Markus
The above query is efficient albeit the above one gives the wrong result. A better version (with some extra goodies) would be something like:
Assuming that the table looks like:
The query deals well with equal scores and it is very efficient (check with EXPLAIN)
Darien’s answer to the first training department solutions does look quite good. The problem is that it gives the wrong results when you have equal scores. A better version (with some extra goodies) would be something like:
Assuming that the table looks like:
The query deals well with equal scores and it is very efficient (the efficiency does assume MyISAM tables though...)
My attempt is here.
—
felix
I have been triyng to run the script that was shown above and it keeps giving me an error. This is the code i have pasted in mysql
SELECT CONCAT(COUNT(*)+1,’/’,(SELECT COUNT(*) FROM score)) Ranking FROM `score` WHERE `score` > (SELECT `score` FROM `score` WHERE `username`=’max’)
It sounds like others got it to work so I don’t know what I’m doing wrong. Can someone help a mysql newb out?
You’re probably running a MySQL version older than 4.1 which doesn’t support subqueries. Like 3.23 or 4.0. You may wish to upgrade.
This one takes care of duplicate scores. You do not have to provide a userid or name, since you set the current user’s score as an alias ‘thisscore’, which allows you to do the ranking for all rows at once.
Not sure if it works for score being a non-integer.
SELECT *,
score as thisscore,
(select count(distinct(score))+1 from user where score > thisscore) as rank
from user
Regards,
acdhirr http://www.trilobiet.nl
How do you handle sub-ranking? For example:
If you have added the field: Department. And you want to have ranks within the department. How do you do this?
For example:
Here are the results I would like to get from a SELECT:
This is similar in MSSQL where you can use the rank by partition function. How do you do this in MYSQL?
Please email me as I dont get to check this forum very often.
Marc
lefebvre@iwavesolutions.com
Hi guys!
I often see that stuff:
SET @rank = 0;
and then:
SELECT @rank := @rank +1 FROM table
so my question is when I try to do this with mysql query browser or when I try to put this statement in mySQL-executer querie build in perl.
No variable @rank is then set in my SELECT value.
So I try to explain it:
SET @rank = 0;
SELECT @rank;
OUTPUT in query browser is:
NULL
What do I wrong here?
thx
greets
andré
QueryBrowser apparently does not do those queries in the same connection. It should be able to do that though, if you put them all in the query window below eachother, and execute the lot in one hit.
i’m eric. joining a couple boards and looking
forward to participating. hehe unless i get
too distracted!
eric
I know this is an old thread, but what about finding the ranking based on a table sorted by other columns? For example, say the table looks like this when sorted:
…ORDER BY warehouse_id, section, bin, tray DESC;
Now, how would I go about getting the ranking of id 6? None of the previous methods work, as all they rely on for the ranking is the current value of ‘score’, not sorted in any way.
Is there any way I could do it in SQL? Thanks for pondering this odd question.
So….. who’s the winner??
Goodness this is ages ago. The solution post was at http://arjen-lentz.livejournal.com/56292.html and the winner was Giuseppe Maxia (who was not MySQL-employed at that time, he was “merely” a wise/experienced community user 😉
select (select count(*) from users where score < (select score from users where user='username')) as rank;