Background
Jim & Reto developed this style guide at LoyaltyMatrix and have carried it with us ever since.
With large development teams and more folks writing code it is really important that we can all easily read & understand each other’s code. And by “code” I don’t just mean long involved load sequences. Simple queries, after all, should be simple to understand and should be written with the same care, style and pride as more complex tasks. This page essentially documents the SQL coding style I have been using for the last few years. I adapted the ideas of Steven Feuerstein from his book Oracle PL/SQL Programming, 2nd Edition from O’Reilly. (Unfortunately, the chapter is not included in the 3rd edition.)
Why we are doing this
Reading each other’s code should be a no brainer – focus should be on how it works (or not works)
All code can, and should, be re-used
A large part of the value of any organization’s value is its IP. A large part of our IP is SQL code.
Here are the fundamentals, to be expanded below
- Looks good on the screen/page – open, lots of white space, no clutter…
- If you use a generated statement, take the time to eliminate the copious clutter
- Uppercase reserved words, lowercase application words
- Vertical layout
- Right align reserved words
- Single things on single lines
- Break long statements (eg CASE) to emphasize continuation
- Maintain vertical alignment in sub-blocks
- It’s OK to be a bit verbose
- AS is required (a new rule for me!)
- Assume SQL-92 functionality
- Use JOIN/ON not FROM/WHERE
- Use consistent sequence in conditions
- Comment for clarity & explanation, not to paraphrase syntax
This standard can evolve, but only very slowly! If there is anything here you strongly object to, please hash it out with the BI team now.
Physical Layout Style
Looking Good
Jim’s creative writing teacher in college first judged a poem by looking at the page from afar to see the form on the page. Good SQL looks good on the page. It is inviting to read.
- Use plenty of white space.
- Keep line lengths to 80, or 100, characters max!
- Always uses spaces never tabs, to indent. That way you are in control, not some subsequent editor.
UPPER RESERVED WORDS. lower application words
This means all reserved words including SQL functions should be upper case.
User functions should be lower case. Table names, column names are lower case
-- Good: SELECT stuff FROM table
-- Bad:
select STUFF from TABLE
Vertical Layout
The eye should easily find the part of a statement you are looking for. Complex SQL is hard enough to understand without needing to track down the phrase of interest.
One key principle is to right align key words as in these examples:
SELECT block
SELECT <stuff>
FROM <main table>
JOIN <next table>
ON <join condition(s)>
AND ...
OR ...
WHERE <select condition(s)>
AND ...
OR ...
GROUP BY <columns>
HAVING <aggregate conditions>
AND ...
OR ...
ORDER BY <columns>
INSERT blocks
INSERT INTO <table, columns>
VALUES <stuff>
or
INSERT INTO <table, columns>
SELECT <stuff>
FROM <etc...>
UPDATE blocks
UPDATE <table>
SET <set expression(s)>
WHERE <update condition>
DELETE blocks
DELETE
FROM <table>
WHERE <delete condition(s)>
Single things on single lines
SELECT columns, UPDATE sets, etc. should each be on a single line
-- Good:
SELECT customer_id,
first_name,
last_name,
...etc...
FROM ...etc...
-- Bad:
SELECT customer_id, first_name, last_name, ...etc. FROM ...etc...
and
UPDATE customer_dim
SET first_name = 'Joe',
last_name = 'Doe'
WHERE ...etc...
This goes with the vertical alignment philosophy and, more importantly, encourages annotation with inline comments (see below)
Break long expressions
And wrap with trailing operators to emphasize continuation. Use 2-character indents on continuation lines.
SELECT 123.456 * very_long_column_name +
another_long_column_name
AS my_calc_field,
...etc...
Break CASE clauses
at each WHEN and, perhaps, at each THEN. ELSE aligns with THEN. Right align END on new line:
SELECT CASE state WHEN 'CA' THEN 'Great'
WHEN 'TX' THEN 'Bad'
ELSE 'Midwest'
END AS state_class,
state_name,
...etc...
FROM ...etc...
or, for longer WHEN terms:
SELECT CASE WHEN state IN ('CA', 'OR', 'WA', 'HI')
THEN 'Great'
WHEN state IN {'TX', 'OK', 'ND', 'SD', 'IN')
THEN 'Bad'
ELSE 'Midwest'
END AS state_class,
state_name,
...etc...
FROM ...etc...
Maintain vertical alignment within sub-blocks
For sub-blocks in your code maintain the alignment. For example, a sub-SELECT has it’s own alignment column
SELECT <stuff>
FROM (SELECT <more stuff>
FROM <table>
WHERE <sub-select condition>
) s
WHERE <condition>
Verbosity is OK in moderation
In general, less is more – as in “more better.” Some optional reserved words add to clarity.
AS when aliasing a column or expression
SELECT qty_sole * unit_price AS ext_price
FROM ...etc...
Alias the FROM & JOIN table names
Even the simplest single table queries have a way of growing more complex. Using table aliases, always!
SELECT cus.customer_key
FROM customer_dim AS cus
Special case – joining to the same table more than once (aka self-join)
In sub-selects use the standard alias with an integer suffix
SELECT <stuff>
FROM sales_fact AS sf
WHERE sf.sales_fact_identity = ( SELECT MAX(sf1.sales_fact_identity )
FROM sales_fact AS sf1
WHERE ...etc...
)
Use descriptive suffix when a table takes different roles
SELECT <stuff>
FROM sales_fact sf
JOIN time_by_day AS date_sale
ON date_sale.date_id = sf.sale_date_id
JOIN time_by_day AS date_order
ON date_order.date_id = sf.order_date_id
WHERE <select condition(s)>
Assume SQL-92 is supported
Use JOIN/ON rather than FROM/WHERE
The “new” JOIN/ON syntax clearly states just what is going on & keeps the join conditions where they belong – just after the join declaration.
SELECT <stuff>
FROM sales_fact sf
JOIN customers c
ON c.customer_id = sf.customer_id
JOIN product p
ON p.product_id = sf.product_id
…
PLEASE — no more FROM/WHERE joins!
The above example shows the basics. Here are the rules for more realistic JOINs:
- Pick the FROM table as table most important to the query. Typically a fact table
- JOIN to a table, specifying an alias using AS
- The join ON condition is of the form <new table column/expression> = <prior table column condition>
- The WHERE clause should be reserved for conditions on the entire select, not additional join conditions.
For example,
SELECT <stuff>
FROM sales_fct AS sf
JOIN customer_dim AS cus
ON cus.customer_identity = sf.customer_identity
AND cus.is_cur_flag = 1 -- just current SCD rows in join
WHERE cus.last_name LIKE 'Sm_th%'
AND sf.sales_amount > 1000
-- (very) bad:
select <stuff> from sales_fct, customer_dim where sales_amount > 1000 and last_name like 'Sm_th%' and is_cur_flag = 1 and sales_fact.customer_identity = customer_dim.customer_identity
Never, Never, Ever
SELECT * FROM …
Never in production code! Since this is a golden rule, better make sure you don’t use it in Development or QA as things have a tendency to slip into our source control system.
Assume formats or type casting
When exporting
Don’t assume the DBMS is your friend! Always use
- CAST(), or
- CONVERT()
PostgreSQL Specifics
… to be added as we go along …
WITH … SELECT instead of JOIN (SELECT …)
Defining stuff before you use it; what a concept!
:: Casting
Really handy when your result set is to be read by real people!
Window Functions
Always explicitly define and name the window
Leave a Reply