Introduction
A pivot table or cross tabulation is a reporting feature that BAs love to use to summarise transaction data, such as server logs and sales figures. Spreadsheet programs such as Microsoft Excel or OpenOffice.org Calc have nifty wizards to help you create a pivot table. You can also create pivot tables in databases. For example, Microsoft Access has a TRANSFORM … PIVOT
SQL statement for generating a crosstab or pivot table.
What if you're using a database program that doesn't directly support pivot tables? For example, Sqlite 3 and MySQL don't seem to have any SQL statements for pivot tables.
All is not lost; another way to express a pivot table is to use aggregate functions, condition clauses and GROUP BY
clause in this template:
SELECT col1, col2, … <aggregate>(<condition>) … FROM table1 GROUP BY col1, col2, ….
For Sqlite 3, the aggregate functions and GROUP BY
is similar to SQL in other database programs. The condition clause we can use has this syntax: case when <expression> then <expression> end
.
In the next section, we'll demonstration how to create pivot tables in Sqlite 3 using this template. All examples will be shown using Sqlite 3's command line interface, sqlite3.exe.
Sqlite 3 Pivot Table Demonstration
First, you have to download some sample transaction data. I used the NumberGo Pivot Table Tutorial AcmeShirtsCompany.xls spreadsheet as the raw data for this demonstration.
We start sqlite3.exe and use the -column -header arguments make the output of queries more readable.
sqlite3 -column -header test.db SQLite version 3.5.9 Enter ".help" for instructions
Now we create a shirt table based on the headings in that spreadsheet:
create table shirt (Region varchar(8), Category varchar(8), Shirt_Style varchar(8), ShipDate date, Units integer, Price double, Cost double);
Next we load some transaction data into the shirt table:
insert into shirt values ('East','Boys','Tee',date('2005-01-01'),11,5.25,4.66); insert into shirt values ('East','Boys','Golf',date('2005-01-01'),12,5.26,4.57); insert into shirt values ('East','Boys','Polo',date('2005-01-01'),13,5.27,5.01); insert into shirt values ('East','Girls','Tee',date('2005-01-01'),14,5.28,5.01); insert into shirt values ('East','Girls','Golf',date('2005-01-01'),15,5.29,5.10); insert into shirt values ('East','Girls','Polo',date('2005-01-01'),16,5.30,4.67); insert into shirt values ('West','Boys','Tee',date('2005-01-01'),33,6.25,5.36); insert into shirt values ('West','Boys','Golf',date('2005-01-01'),35,6.26,6.24); insert into shirt values ('West','Boys','Polo',date('2005-01-01'),36,6.27,6.03); …
Let's begin our analysis with a simple question: How many shirts were sold in each region?
select Region, sum(Units) from shirt group by Region; Region sum(Units) ---------- ---------- East 21841 North 27275 South 29994 West 23984
Next: in each region, how many Boys' and Girls' shirts were sold? Here's where a pivot table is useful:
select Region , sum(case when Category = 'Boys' then Units end) as Boys , sum(case when Category = 'Girls' then Units end) as Girls , sum(Units) as SubTotal from shirt group by Region; Region Boys Girls SubTotal ---------- ---------- ---------- ---------- East 10586 11255 21841 North 14049 13226 27275 South 14312 15682 29994 West 10763 13221 23984
We can drill further into the data: How many of each shirt style were sold?
select Region , Category , sum(case when Shirt_Style = 'Tee' then Units end) as Tee , sum(case when Shirt_Style = 'Golf' then Units end) as Golf , sum(case when Shirt_Style = 'Polo' then Units end) as Polo , sum(Units) as SubTotal from shirt group by Region, Category; Region Category Tee Golf Polo SubTotal ---------- ---------- ---------- ---------- ---------- ---------- East Boys 3458 3096 4032 10586 East Girls 3688 3481 4086 11255 North Boys 4597 4702 4750 14049 North Girls 4196 4598 4432 13226 South Boys 5192 4670 4450 14312 South Girls 5113 5377 5192 15682 West Boys 3722 3791 3250 10763 West Girls 4472 4235 4514 13221
The pattern becomes obvious, if rather tedious, when you want to use a specific values as a new virtual column.
Discussion
In this article, I've presented a SQL template for generating pivot tables for database programs, such as Sqlite 3, that do not have explicit support for this feature. While this template is extensible, it relies on the developer knowing beforehand the possible values (e.g. Category has 'Boys' and 'Girls', or Shirt Styles has 'Tee', 'Golf' and 'Polo') to use in the condition clause of the template. If there are many possible values, then it becomes very tedious to enumerate each of them in the SQL case when … then … end
clause.
2008-06-01: I had a play with MySQL and found that I can use the same SQL statements to create the pivot tables.
I had saved the data in AcmeShirtsCompany.sql, so to set up my MySQL database, I created the shirt table using mysql.exe, exit the interpreter, then loaded the data into the database using this cmd.exe command: mysql -u root -p -D test < AcmeShirtsCompany.sql
.
2008-06-06: See also SQL Cookbook by Anthony Molinaro, O'Reilly Media.
Thank you very much - this was well written, and immediately useful.
ReplyDeleteIt's worth mentioning that PostgreSQL does this in a rather useful way.
ReplyDeletehttp://www.postgresonline.com/journal/archives/14-CrossTab-Queries-in-PostgreSQL-using-tablefunc-contrib.html
http://www.postgresql.org/docs/8.3/interactive/tablefunc.html
Thank you very much.The webs consist of many sources many pages but you summarized the subject and made unique article.
ReplyDeleteI can't access the xls file, do you still have the AcmeShirtsCompany.sql that you created?
ReplyDelete