SQL

Generic SQL notes

Query

BETWEEN

  • Both values are inclusive.

COUNT

  • COUNT will count the number of times a field is not NULL.

ORDER BY

  • NULL values will always come first.

Transaction

Atomicity

Consistency

Isolation (opens in a new tab)

Read phenomenon

In ascending order of seriousness:

  1. Dirty read

    Read uncommitted changes made by another transaction

  2. Non-repeatable read

    A non-repeatable read occurs when, during the course of a transaction, a row is retrieved twice and the values within the row differ between reads.

  3. Phantom read

    A phantom read occurs when, in the course of a transaction, new rows are added or removed by another transaction to the records being read.

  4. Serialization anomaly

    The result of successfully committing a group of transactions is inconsistent with all possible orderings of running those transactions one at a time.

Isolation levels

In ascending order of strictness:

  1. Serializable (no phenomenon)
  2. Repeatable read (phantom read)
  3. Read committed (non-repeatable read, phantom read)
  4. Read uncommitted (dirty read, non-repeatable read, phantom read)
Isolation levelDirty readNon-repeatable readPhantom readSerialization anomaly
Read uncommittedAllowed, but not in PGPossiblePossiblePossible
Read committedNot possiblePossiblePossiblePossible
Repeatable readNot possibleNot possibleAllowed, but not in PGPossible
SerializableNot possibleNot possibleNot possibleNot possible

Durability

Database Design

ID Generation

Auto Increment

UUID

  • 128-bit

  • Versions

    • Version 1 & 2 (date-time & MAC address)

      • The uniqueness of v1 and v2 UUIDs based on network-card MAC addresses also depends on network-card manufacturers properly assigning unique MAC addresses to their cards, which like other manufacturing processes is subject to error. Additionally some operating system permit the end user to customise the MAC address, notably OpenWRT.[
      • Usage of the node's network card MAC address for the node ID means that a version-1 UUID can be tracked back to the computer that created it.
    • Version 3 & 5 (namespace name-based)

      • v3: MD5
      • v5: SHA-1
    • Version 4 (random number)

  • Pros

    • Sharding
    • Merge/Replicate
    • Scaling
    • Flexibility
    • Custom format
    • Does not need DB to generate a key
  • Cons

    • Performance
  • Resources

Reference

RDBMS

Index

Clustered vs Non-clustered Index

Clustered IndexNon-clustered Index
StorageThe actual data rows are physically sorted and stored in the order of the clustered index key.The index contains a copy of the indexed columns and a pointer to the actual data row.
SortingOnly one clustered index can be created on a table.Multiple non-clustered indexes can be created on a table.
Key UniquenessA clustered index key must be unique.A non-clustered index key can have duplicate values.
Impact on Data ModificationData modification operations (inserts, updates, deletes) can be slower since the physical order of the rows may need to be rearranged.Data modification operations are generally faster as they only update the index , not the actual data rows.
Index Seek vs ScanClustered indexes are efficient for range queries and large data retrieval since the data is physically sorted.Non-clustered indexes are efficient for specific lookups or single-row retrievals, as they provide a quick lookup path to the data.
Included ColumnsClustered indexes include all columns of the table by defaultNon-clustered indexes can include additional columns to cover specific query needs.
Primary Key UsageBy default, a primary key constraint creates a clustered index.A primary key can create either a clustered or non-clustered index, depending on the configuration.

View

Materialized View

Performance

General Tips

  • Only retrieve the columns you need
  • Only retrieve the rows you need (use LIMIT and OFFSET)
  • Use EXPLAIN to understand how the query is executed
  • Use EXPLAIN ANALYZE to understand how the query is executed and how long it takes
  • Reduce table scans
  • Minimize locks to tables
  • Use materialized views to store the results of a query
  • Convert a big query of multiple tables into a series of smaller queries of single tables. Cache and join results on the application level to enhance scalability.

Query Optimization