Generic SQL notes
Query
BETWEEN
- Both values are inclusive.
COUNT
COUNT
will count the number of times a field is notNULL
.
ORDER BY
NULL
values will always come first.
Transaction
Atomicity
Consistency
Isolation (opens in a new tab)
Read phenomenon
In ascending order of seriousness:
-
Dirty read
Read uncommitted changes made by another transaction
-
Non-repeatable read
A non-repeatable read occurs when, during the course of a transaction, a row is retrieved twice and the values within the row differ between reads.
-
Phantom read
A phantom read occurs when, in the course of a transaction, new rows are added or removed by another transaction to the records being read.
-
Serialization anomaly
The result of successfully committing a group of transactions is inconsistent with all possible orderings of running those transactions one at a time.
Isolation levels
In ascending order of strictness:
Serializable
(no phenomenon)Repeatable read
(phantom read)Read committed
(non-repeatable read, phantom read)Read uncommitted
(dirty read, non-repeatable read, phantom read)
Isolation level | Dirty read | Non-repeatable read | Phantom read | Serialization anomaly |
---|---|---|---|---|
Read uncommitted | Allowed, but not in PG | Possible | Possible | Possible |
Read committed | Not possible | Possible | Possible | Possible |
Repeatable read | Not possible | Not possible | Allowed, but not in PG | Possible |
Serializable | Not possible | Not possible | Not possible | Not possible |
Durability
Database Design
ID Generation
Auto Increment
UUID
-
128-bit
-
Versions
-
Version 1 & 2 (date-time & MAC address)
- The uniqueness of v1 and v2 UUIDs based on network-card MAC addresses also depends on network-card manufacturers properly assigning unique MAC addresses to their cards, which like other manufacturing processes is subject to error. Additionally some operating system permit the end user to customise the MAC address, notably OpenWRT.[
- Usage of the node's network card MAC address for the node ID means that a version-1 UUID can be tracked back to the computer that created it.
-
Version 3 & 5 (namespace name-based)
- v3: MD5
- v5: SHA-1
-
Version 4 (random number)
-
-
Pros
- Sharding
- Merge/Replicate
- Scaling
- Flexibility
- Custom format
- Does not need DB to generate a key
-
Cons
- Performance
-
Resources
Reference
RDBMS
Index
Clustered vs Non-clustered Index
Clustered Index | Non-clustered Index | |
---|---|---|
Storage | The actual data rows are physically sorted and stored in the order of the clustered index key. | The index contains a copy of the indexed columns and a pointer to the actual data row. |
Sorting | Only one clustered index can be created on a table. | Multiple non-clustered indexes can be created on a table. |
Key Uniqueness | A clustered index key must be unique. | A non-clustered index key can have duplicate values. |
Impact on Data Modification | Data modification operations (inserts, updates, deletes) can be slower since the physical order of the rows may need to be rearranged. | Data modification operations are generally faster as they only update the index , not the actual data rows. |
Index Seek vs Scan | Clustered indexes are efficient for range queries and large data retrieval since the data is physically sorted. | Non-clustered indexes are efficient for specific lookups or single-row retrievals, as they provide a quick lookup path to the data. |
Included Columns | Clustered indexes include all columns of the table by default | Non-clustered indexes can include additional columns to cover specific query needs. |
Primary Key Usage | By default, a primary key constraint creates a clustered index. | A primary key can create either a clustered or non-clustered index, depending on the configuration. |
View
Materialized View
-
PostgreSQL: Documentation: 16: 41.3. Materialized Views (opens in a new tab)
Update materialized view periodically or in an event-driven manner.
Performance
General Tips
- Only retrieve the columns you need
- Only retrieve the rows you need (use
LIMIT
andOFFSET
) - Use
EXPLAIN
to understand how the query is executed - Use
EXPLAIN ANALYZE
to understand how the query is executed and how long it takes - Reduce table scans
- Minimize locks to tables
- Use materialized views to store the results of a query
- Convert a big query of multiple tables into a series of smaller queries of single tables. Cache and join results on the application level to enhance scalability.