A crosstab query is a specific query used to create aggregate reports on two or more fields, it’s a handy way to display summary information. At Open Query we have customers using that trick to display production schedules.
The summary table is generated from the database to extract the manufacturing date (mand), unit number (unitn), product area (pro_area), and ranking (rnk). Then we can start using that summary table to process our crosstab. The summary table looks like this:
mand | unitn | pro_area | rnk |
---|---|---|---|
2009-12-15 | 587-MWI2-PP49 | 1 | 11 |
2009-12-15 | 587-MWI2-PP50 | 1 | 10 |
2009-12-16 | 670-W1101 | 1 | 11 |
2009-12-17 | 670-W1102 | 1 | 7 |
2009-12-16 | 670-W1103 | 1 | 10 |
2009-12-16 | 683-BR107-S2-01 | 12 | 9 |
2009-12-18 | 683-BR107-S4-05a | 12 | 11 |
2009-12-18 | 683-BR107-S5-06 | 12 | 10 |
2009-12-15 | 691-02-105 | 1 | 9 |
2009-12-16 | 691-02-107 | 1 | 8 |
2009-12-17 | 691-03-101 | 1 | 6 |
2009-12-17 | 691-03-105 | 1 | 5 |
Starting from that table we are going to generate the final report, who will display the production schedule by factory and by production order (rank). Our crosstab query will look like this:
SELECT pro_area as ‘factory’,
max( case when mand = ‘2009-12-15′ then unitn else ” end ) as ’15Dec09’,
max( case when mand = ‘2009-12-16′ then unitn else ” end ) as ’16Dec09’,
max( case when mand = ‘2009-12-17′ then unitn else ” end ) as ’17Dec09’,
max( case when mand = ‘2009-12-18′ then unitn else ” end ) as ’18Dec09’
FROM pro_sched_sorted
GROUP BY pro_area, rnk
ORDER BY pro_area, rnk desc
And the result of our query will give the following report:
factory | 15Dec09 | 16Dec09 | 17Dec09 | 18Dec09 |
---|---|---|---|---|
1 | 587-MWI2-PP49 | 670-W1101 | ||
1 | 587-MWI2-PP50 | 670-W1103 | ||
1 | 691-02-105 | 691-03-106 | ||
1 | 691-02-107 | 691-04-103 | ||
1 | 670-W1102 | |||
1 | 691-03-101 | |||
1 | 691-03-105 | |||
5 | 698-078 | |||
5 | 698-079 | |||
5 | 698-080 | |||
5 | 698-081 | |||
5 | 698-082 | 698-087 | ||
5 | 698-083 | 698-088 | ||
5 | 698-084 | 698-089 | ||
5 | 698-085 | 698-090 | ||
5 | 698-086 | 698-091 | ||
9 | 692-1930-073 | 692-1930-081 | ||
9 | 692-1930-074 | 692-1930-082 | ||
9 | 692-1930-075 | 692-1930-083 | ||
9 | 692-1930-076 | 692-1930-084 | ||
10 | 692-1930-077 | 692-1930-085 | ||
10 | 692-1930-078 | 692-1930-086 | ||
10 | 692-1930-079 | 692-1930-087 | ||
10 | 692-1930-080 | 692-1930-088 | ||
12 | 683-BR107-S4-05a | |||
12 | 683-BR107-S5-06 | |||
12 | 683-BR107-S2-01 |
Now the result of our crosstab query is statistically correct, but the result layout is not satisfying because of the empty cells that are generated in the case a given factory does not have a production order for a specific date.
I have been looking into a solution for removing those empty cells on the server side. That would involve either filtering, or doing cell-shifting in SQL.
The filtering solution is not really an option since the date columns are aggregated. You cannot just add a WHERE … != ”clause because of the GROUP BY process.
For the second solution, I have thought of writing a special stored procedure as a cursor on the result set that would reorder the report by replacing empty cells by the value in the next row. Now probably MySQL stored procedure language would be somewhat limited for doing previous row/next row operations.
Any thoughts on doing cell-shifting in SQL?
Hi Guillaume,
personally, I don’t think SQL is the right tool for this job. There
are excellent reporting tools that can fire off a single efficient
query and then organize the data on the client side to show all kinds
of crosstabs. Personally, I favor Pentaho for this type of job but
there are many more.
Anyway, lets assume you still want to solve this in SQL.
You say: ” the result layout is not satisfying because of the empty
cells that are generated in the case a given factory does not have a
production order for a specific date.” Well, in the result you show
here, there are many empty cells, but there is never one entire row or
one entire column with empty cells. In other words, I don’t see how
you can remove anything without losing information.
Let’s consider what it would take to get either an empty row or an empty column:
emtpy row: you group by on pro_area, rnk, so any pro_area, rnk
combination that does not have one of the dates you test for each
column heading would result in an empty row. Since the column headings
(i.e. the dates) are known in advance (you couldn’t have written your
query otherwise) you can and in fact should restrict the results to
those dates. And you should do so in the WHERE clause:
WHERE mand IN (‘2009-12-15′,‘2009-12-16′,‘2009-12-17′,‘2009-12-18′)
empty column: in this case it doesn’t matter much if you can detect it
– the real problem is that you have the expression that generates the
column already in the query. Fortunately,the fix is easy – don’t
select that column 🙂
kind regards,
Roland
I don’t understand, what’s the desired result set?
If the answer appears to be “use WHERE, but after the GROUP BY”, that’s what HAVING is for.