Generic SQL notes
Query
BETWEEN
- Both values are inclusive.
COUNT
COUNT(*)counts rows — it includes rows where columns areNULL.COUNT(column)counts non-NULLvalues in that column —NULLs are ignored.COUNT(DISTINCT column)counts distinct non-NULLvalues.
ORDER BY
NULLvalues will always come first.
JOIN
Here’s the SQL JOIN guide formatted in Markdown.
Key idea: joins combine rows from two tables (A and B) using a matching condition (usually A.key = B.key).
INNER JOIN
- Returns only rows where the join condition matches in both A and B.
- Use when you want rows that have related records in both tables.
- SQL example:
SELECT *
FROM A
INNER JOIN B ON A.id = B.a_id;LEFT (OUTER) JOIN
- Returns all rows from A plus matching rows from B; where B has no match, B columns are
NULL. - Use when you need every A row regardless of whether it has a match.
- SQL example:
SELECT *
FROM A
LEFT JOIN B ON A.id = B.a_id;RIGHT (OUTER) JOIN
- Mirror of LEFT JOIN: returns all rows from B plus matching rows from A; unmatched A columns are
NULL. - Less common; supported by many DBs.
- SQL example:
SELECT *
FROM A
RIGHT JOIN B ON A.id = B.a_id;FULL (OUTER) JOIN
- Returns rows that match plus non-matching rows from both A and B; unmatched side columns are
NULL. - Use when you need union of both tables with preserved unmatched rows.
- SQL example:
SELECT *
FROM A
FULL OUTER JOIN B ON A.id = B.a_id;CROSS JOIN
- Cartesian product: every row of A combined with every row of B; no join condition.
- Use rarely (e.g., combinatorial generation).
- SQL example:
SELECT *
FROM A
CROSS JOIN B;SELF JOIN
- A table joined to itself (using aliases) to compare rows within the same table.
- SQL example:
SELECT a.*, b.*
FROM employees a
JOIN employees b ON a.manager_id = b.id;ANTI-JOIN (NOT EXISTS / LEFT JOIN ... WHERE NULL)
- Returns rows in A that have no matching row in B.
- Implementations:
-- NOT EXISTS version (preferred for clarity & performance)
SELECT *
FROM A
WHERE NOT EXISTS (SELECT 1 FROM B WHERE B.a_id = A.id);
-- LEFT JOIN + IS NULL alternative
SELECT A.*
FROM A
LEFT JOIN B ON A.id = B.a_id
WHERE B.a_id IS NULL;SEMI-JOIN (EXISTS / IN)
- Returns rows from A that have at least one match in B, but does not duplicate A rows per matching B row.
- Implementations:
-- EXISTS (preferred)
SELECT *
FROM A
WHERE EXISTS (SELECT 1 FROM B WHERE B.a_id = A.id);
-- IN alternative
SELECT *
FROM A
WHERE A.id IN (SELECT a_id FROM B);JOIN - Performance & tips
- Use
INNER JOINwhen you only want matching pairs (fastest in many cases). - Prefer
EXISTS/NOT EXISTSfor semi/anti-joins;LEFT JOIN ... IS NULLcan give incorrect results ifB.a_idis nullable. - Use appropriate indexes on join keys.
- Avoid
CROSS JOINs unless intentional. FULL OUTER JOINmay be unsupported or slower; considerUNIONofLEFTandRIGHTif needed.
Transaction
Atomicity
Consistency
Isolation (opens in a new tab)
Read phenomenon
In ascending order of seriousness:
-
Dirty readRead uncommitted changes made by another transaction
-
Non-repeatable readA non-repeatable read occurs when, during the course of a transaction, a row is retrieved twice and the values within the row differ between reads.
-
Phantom readA phantom read occurs when, in the course of a transaction, new rows are added or removed by another transaction to the records being read.
-
Serialization anomalyThe result of successfully committing a group of transactions is inconsistent with all possible orderings of running those transactions one at a time.
Isolation levels
In ascending order of strictness:
Serializable(no phenomenon)Repeatable read(phantom read)Read committed(non-repeatable read, phantom read)Read uncommitted(dirty read, non-repeatable read, phantom read)
| Isolation level | Dirty read | Non-repeatable read | Phantom read | Serialization anomaly |
|---|---|---|---|---|
Read uncommitted | Allowed, but not in PG | Possible | Possible | Possible |
Read committed | Not possible | Possible | Possible | Possible |
Repeatable read | Not possible | Not possible | Allowed, but not in PG | Possible |
Serializable | Not possible | Not possible | Not possible | Not possible |
Durability
Database Design
ID Generation
Auto Increment
UUID
-
128-bit
-
Versions
-
Version 1 & 2 (date-time & MAC address)
- The uniqueness of v1 and v2 UUIDs based on network-card MAC addresses also depends on network-card manufacturers properly assigning unique MAC addresses to their cards, which like other manufacturing processes is subject to error. Additionally some operating system permit the end user to customise the MAC address, notably OpenWRT.[
- Usage of the node's network card MAC address for the node ID means that a version-1 UUID can be tracked back to the computer that created it.
-
Version 3 & 5 (namespace name-based)
- v3: MD5
- v5: SHA-1
-
Version 4 (random number)
-
-
Pros
- Sharding
- Merge/Replicate
- Scaling
- Flexibility
- Custom format
- Does not need DB to generate a key
-
Cons
- Performance
-
Resources
Reference
RDBMS
Index
Clustered vs Non-clustered Index
| Clustered Index | Non-clustered Index | |
|---|---|---|
| Storage | The actual data rows are physically sorted and stored in the order of the clustered index key. | The index contains a copy of the indexed columns and a pointer to the actual data row. |
| Sorting | Only one clustered index can be created on a table. | Multiple non-clustered indexes can be created on a table. |
| Key Uniqueness | A clustered index key must be unique. | A non-clustered index key can have duplicate values. |
| Impact on Data Modification | Data modification operations (inserts, updates, deletes) can be slower since the physical order of the rows may need to be rearranged. | Data modification operations are generally faster as they only update the index , not the actual data rows. |
| Index Seek vs Scan | Clustered indexes are efficient for range queries and large data retrieval since the data is physically sorted. | Non-clustered indexes are efficient for specific lookups or single-row retrievals, as they provide a quick lookup path to the data. |
| Included Columns | Clustered indexes include all columns of the table by default | Non-clustered indexes can include additional columns to cover specific query needs. |
| Primary Key Usage | By default, a primary key constraint creates a clustered index. | A primary key can create either a clustered or non-clustered index, depending on the configuration. |
View
Materialized View
-
PostgreSQL: Documentation: 16: 41.3. Materialized Views (opens in a new tab)
Update materialized view periodically or in an event-driven manner.
CTE
| Technique | Advantages | Disadvantages | Suitable Use Case |
|---|---|---|---|
| CTEs | Temporary scope within a single query No storage or maintenance required Improves readability by modularizing code | Limited to the query in which they are defined | Organizing complex queries, temporary transformations, and breaking down multi-step operations |
| Derived Tables | Simplifies nested subqueries No need for permanent storage | Harder to read/maintain for complex queries List cannot be reused multiple times within a query | Quick, single-use transformations and aggregations within a query |
| Views | Reusable across queries Can enhance security by restricting data access | Requires maintenance and can affect multiple queries Complex views can impact performance | Long-term reusable logic and data access control |
Performance
Query Optimization
-
Resources
Databases.Illuminated 4th Edition-Chapter 10 Relational Query Optimization
-
Heuristics
-
Do selection as early as possible. If appropriate, use cascading of selects, commutativity of selects with projects, joins, and products; and distributivity of select over set union, set intersection, and set difference to move the selection as far down the query tree as possible.
-
Use
associativityof join, product, union, and intersection to rearrange relations in the query tree so that the selection operation that will produce the smallest table will be executed first. -
If a product appears as an argument for a selection, where the selection involves attributes of the tables in the product, transform the product into a join. If the selection involves attributes of only one of the tables in the product, apply the selection to that table first.
-
Do projection early. Insertion of projections can be used to reduce the size of intermediate results. If appropriate, use cascading of projects; distributivity of projection over product and set union, intersection, and difference; and commutativity of selection and projection to move the projection as far down the query tree as possible. Examine all projections to see if some are unnecessary.
-
If there is a sequence of selections and/or projections with the same argument, use commutativity or cascading to combine them into one selection, one projection, or a selection followed by a projection.
-
If a subexpression appears more than once in the query tree, and the result it produces is not too large, compute it once and save it (
materialized CTE). This technique is especially useful when querying on views, because the same subexpression must be used to construct the view each time.
-
General Tips
- Reduce the size of the intermediate tables, (use
LIMITandOFFSET)- Perform
selectionearly, only retrieve the needed rows. - Perform
projectionearly, only retrieve the needed columns.
- Perform
- Use
EXPLAINto understand how the query is executed EXPLAINwill only generate a plan for the given statement.EXPLAIN ANALYZEwill return the plan, run the query, and show the actual runtime and row count for each step.- When using
EXPLAIN ANALYZE, because the query will be executed actually, be careful withINSERT,UPDATE, andDELETE. - Minimize locks to tables
- Use materialized views to store the results of a query
- Convert a big query of multiple tables into a series of smaller queries of single tables. Cache and join results on the application level to enhance scalability.
- Replace reusable subqueries with
CTE
Query benchmark
-
PostgreSQL
-
Bruce Momjian: Postgres Blog - Timing a Query (opens in a new tab)
time PGOPTIONS="-c log_min_duration_statement=0 \ -c client_min_messages=log" psql <<END \timing SELECT 1; END
-