Posted on 4 Comments

Complete set (an alternative solution)

Scott Noyes wrote about a question frequently asked in for instance the freenode #mysql channel: given a table with students and answers, which students have answered all of a certain set of questions.

While his solutions are interesting, they do indulge a bit. And I’m not even referring to the fact that he uses obscure string and bit functions… he can do that if he wants to 😉 But it’s a relatively simple problem that just does not need such (potentially inefficient) constructs.

For starters, it’d be good to have a WHERE clause, because even if there are many more questions in the table, you only need the As and Bs. You never want to use the HAVING as a replacement for WHERE, because having chucks away already retrieved rows from the result set, while WHERE limits what is retrieved in the first place.

Also, while it’s very good to know about the various string and bit functions, I generally suggest that as soon as you think you need to mess with bit functions, this indicates something is (relationally) wrong in either the table design or the query design.

Here is the original table and question:

SELECT * FROM quizAnswers;
+-------------+----------+
| studentName | question |
+-------------+----------+
| seekwill    | A        |
| seekwill    | B        |
| seekwill    | C        |
| roxlu       | A        |
| fury        | B        |
| fury        | B        |
+-------------+----------+

Find all the students who have answered both questions ‘A’ and ‘B’.

I reckon Scott’s original query was actually on the right track, just incomplete. So I came up with a solution of my own:

SELECT student,COUNT(DISTINCT question) AS num
 FROM quizAnswers
 WHERE question IN ('A','B')
 GROUP BY student
 HAVING num = 2

Easy huh?

  • it’s clean (short, easy to read)
  • fast (can use an index and does not retrieve more data than necessary)
  • extensible (when you have a larger set of questions to check, or if you’d have a properly normalised schema rather than this simplified example!)
  • everything is only mentioned once (make assembling the query easier)

Finally, the above is pretty much standard SQL, although not all RDBMS implement the DISTINCT option for all their aggregate functions (neither does MySQL, actually, but it does support it for the most important ones).

(Scott, I tried to reply to your blog post, but your antispam system didn’t show at all on either of my browsers – Safari and Firefox on OSX – and so wordpress didn’t want to accept).

Posted on 4 Comments

4 thoughts on “Complete set (an alternative solution)

  1. Looks good to me. I’ll save the original for the next obfuscated code competition.

    – Scott Noyes

  2. 😉

  3. That was a very helpful post, thanks.

    I am trying to implement something similar, however there is a little catch to it: Sticking to your example, I would want to have only those students who answered A and B or a subset thereof, but nothing else. I.e:

    Possible Answers: [1,2,3,4,5]
    Query Answers: [1,2,3]

    Student A: [1,3]
    Student B: [1,2,3]
    Student C: [4,5]
    Student D: [1,2,3,4]

    In such a situation, I would want students A and B returned, however not student D (as he answered an additional question 4).

    Ideally, it would be possible to search for student D with some parameter of disconnectedness (i.e. student D answered one question outside our set = -1, student C answered two: = -2).

    I managed to implement this in a rather nasty and PHP heavy way, but am not quite sure if this could be done more elegantly along the lines of your example above?

    Well, any advice would be much appreciated 🙂

    kmh

  4. That can be solved with a subquery in the HAVING clause, I suppose – and it would be fairly fast as it only needs to check for the (non-)existence of certain items.

    Alternatively, you could join the table onto itself using a left-join, with the ON clause containing (studentname AND NOT question IN (….)), then the WHERE clause would say t2.student IS NULL

    The question is similar to one like “which customers did buy product A but not product B”

Comments are closed.