File Formats
gzip
gzip - pros and cons
- Pros
  - Fast compression and decompression; ideal when speed matters
  - Widely supported
- Cons
  - Lower compression ratio than bzip2
  - Not splittable
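To make the speed-versus-ratio trade-off concrete, here is a minimal sketch using Python's standard gzip module. The sample data is a made-up log line, not taken from any real workload, and the sizes you see will depend on your input.

```python
import gzip

# Hypothetical sample data: a repetitive log line, which compresses well.
data = b"2024-01-01 12:00:00 INFO request served in 12 ms\n" * 5000

# compresslevel runs from 0 (no compression) to 9 (slowest, smallest output).
fast = gzip.compress(data, compresslevel=1)
small = gzip.compress(data, compresslevel=9)

# Decompression restores the original bytes whichever level was used.
restored = gzip.decompress(small)

print(f"original: {len(data)} bytes")
print(f"level 1:  {len(fast)} bytes")
print(f"level 9:  {len(small)} bytes")
```

A lower level is usually the better fit when throughput matters more than the last few percent of size.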
bzip2
bzip2 - pros and cons
- Pros
  - Higher compression ratio than gzip, particularly with large files; ideal when space matters
  - Splittable
- Cons
  - Slower than gzip, especially on decompression
  - Consumes more CPU and memory
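The ratio and speed claims above are easy to check on your own data. The sketch below times gzip and bzip2 from the Python standard library side by side. The helper name measure and the generated sample text are illustrative assumptions, and the numbers will vary with input and hardware.

```python
import bz2
import gzip
import time


def measure(name, compress, decompress, data):
    """Return (name, compressed size, compress seconds, decompress seconds)."""
    t0 = time.perf_counter()
    packed = compress(data)
    t1 = time.perf_counter()
    unpacked = decompress(packed)
    t2 = time.perf_counter()
    if unpacked != data:
        raise ValueError(f"{name} round trip failed")
    return name, len(packed), t1 - t0, t2 - t1


# Hypothetical sample: semi-structured text, roughly like an access log.
lines = [f"user={i % 97} path=/items/{i * 31 % 1009} status=200\n" for i in range(20000)]
data = "".join(lines).encode("utf-8")

results = [
    measure("gzip", gzip.compress, gzip.decompress, data),
    measure("bzip2", bz2.compress, bz2.decompress, data),
]
for name, size, c_secs, d_secs in results:
    print(f"{name:>5}: {size:>7} bytes, compress {c_secs:.4f}s, decompress {d_secs:.4f}s")
```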
lz4
Wikipedia - LZ4 (compression algorithm)
lz4 - pros and cons
- Pros
  - Very fast compression and decompression: compression speed is similar to lzo, and decompression is significantly faster than lzo
  - Splittable
- Cons
  - Lower compression ratio than gzip and bzip2
lzo
Wikipedia - Lempel–Ziv–Oberhumer (LZO)
lzo - pros and cons
- Pros
  - Higher compression speed than DEFLATE compression
  - Very fast decompression
  - Lets the user adjust the balance between compression ratio and compression speed without affecting decompression speed
  - Produces files slightly larger than gzip while requiring only a tenth of the CPU use and only slightly higher memory utilization
  - Splittable
- Cons
  - Lower compression ratio than gzip and bzip2
Snappy
Wikipedia - Snappy (compression)
Snappy - pros and cons
- Pros
  - Very fast compression and decompression
  - Widely used in Big Data
  - Default compression format for Parquet files
- Cons
  - Compression ratio is 20–100% lower than gzip
  - Not splittable
xz
Wikipedia - XZ Utils
xz - pros and cons
- Pros
  - Higher compression ratios than alternatives like gzip and bzip2, particularly for very large files
  - Higher decompression speed than bzip2
  - Splittable
- Cons
  - Slowest of the formats listed here
  - Most resource-intensive
  - Lower decompression speed than gzip
  - Compression can be much slower than gzip, and is slower than bzip2 at high compression levels
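As a rough illustration of how the xz compression level affects output size, the sketch below uses Python's standard lzma module, which writes the .xz container by default. The sample data is invented, and only presets 0 and 6 are shown because higher presets need noticeably more memory and time.

```python
import lzma

# Hypothetical sample: semi-structured text, roughly like an access log.
lines = [f"user={i % 97} path=/items/{i * 31 % 1009} status=200\n" for i in range(20000)]
data = "".join(lines).encode("utf-8")

sizes = {}
for preset in (0, 6):  # presets run from 0 (fastest) to 9 (smallest, most memory)
    packed = lzma.compress(data, preset=preset)
    if lzma.decompress(packed) != data:
        raise ValueError(f"round trip failed at preset {preset}")
    sizes[preset] = len(packed)

print(f"original: {len(data)} bytes")
for preset, size in sizes.items():
    print(f"preset {preset}: {size} bytes")
```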
Use cases
- gzip: Use when speed is crucial and moderate compression is acceptable. Ideal for log files and scripts.
- bzip2: Suited to compressing large text files, or when a balance between speed and compression is needed.
- xz: Best for archiving large datasets or software distributions, where compression ratio matters most.
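One way to turn these use cases into code is a small helper that picks a compressor from a stated priority. This is only a sketch: the OPENERS mapping, the priority names, and compress_file are hypothetical names introduced for illustration, and the file written in the demo is made up.

```python
import bz2
import gzip
import lzma
import shutil
import tempfile
from pathlib import Path

# Hypothetical mapping from a priority to (file opener, file suffix),
# following the use cases listed above.
OPENERS = {
    "speed": (gzip.open, ".gz"),
    "balance": (bz2.open, ".bz2"),
    "ratio": (lzma.open, ".xz"),
}


def compress_file(src: Path, priority: str) -> Path:
    """Compress src next to itself and return the path of the new file."""
    opener, suffix = OPENERS[priority]
    dst = src.with_name(src.name + suffix)
    # Stream in chunks so large files are never loaded fully into memory.
    with open(src, "rb") as fin, opener(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


# Demo on a throwaway file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "app.log"
    log.write_bytes(b"2024-01-01 12:00:00 INFO request served in 12 ms\n" * 5000)
    for priority in OPENERS:
        out = compress_file(log, priority)
        print(f"{priority:>7}: {out.name} is {out.stat().st_size} bytes")
```

Keeping the choice in a single mapping means that switching formats later only requires changing one entry.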