File Formats
gzip
gzip - pros and cons
- Pros
  - Fast compression and decompression speeds, ideal when speed matters
  - Widely supported
- Cons
  - Lower compression ratio than bzip2
  - Not splittable
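As a minimal sketch of the speed-versus-size trade-off described above, the snippet below uses Python's standard `gzip` module to compress the same payload at the fastest and the smallest setting. The helper name `gzip_roundtrip` and the log-like sample payload are assumptions made for illustration, not part of any referenced tool.

```python
import gzip


def gzip_roundtrip(data: bytes, level: int = 6):
    """Compress with gzip at the given level (1 = fastest, 9 = smallest),
    then decompress again so the caller can verify the round trip."""
    compressed = gzip.compress(data, compresslevel=level)
    restored = gzip.decompress(compressed)
    return compressed, restored


# Hypothetical sample: a repetitive, log-like payload.
sample = b"2024-01-01 12:00:00 INFO request served\n" * 2000

fast, restored_fast = gzip_roundtrip(sample, level=1)
small, restored_small = gzip_roundtrip(sample, level=9)

# Print original size, then the size at level 1 and at level 9.
print(len(sample), len(fast), len(small))
```

Lower levels favour speed and higher levels favour size; the actual difference depends heavily on the input, so it is worth measuring on representative data.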
bzip2
bzip2 - pros and cons
- Pros
  - Higher compression ratio than gzip, particularly with large files; ideal when space matters
  - Splittable
- Cons
  - Slower than gzip, especially on decompression
  - Consumes more CPU and memory
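Because bzip2 is most attractive for large files, it can help to compress in chunks rather than loading everything into memory. The following is a sketch using the standard `bz2.BZ2Compressor` class; the function name `bz2_compress_stream`, the chunk size, and the in-memory sample standing in for a large file are all illustrative assumptions.

```python
import bz2
import io


def bz2_compress_stream(source, chunk_size: int = 64 * 1024) -> bytes:
    """Read `source` (any binary file-like object) chunk by chunk and
    return the bzip2-compressed bytes."""
    compressor = bz2.BZ2Compressor(9)  # 9 = best compression, most memory
    parts = []
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        parts.append(compressor.compress(chunk))
    # flush() emits whatever the compressor is still buffering.
    parts.append(compressor.flush())
    return b"".join(parts)


# Hypothetical payload; in practice `source` would be an open file.
payload = b"the quick brown fox jumps over the lazy dog\n" * 10000
compressed = bz2_compress_stream(io.BytesIO(payload))
print(len(payload), len(compressed))
```

In a real pipeline the compressed chunks would usually be written straight to an output file instead of being joined in memory.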
lz4
Wikipedia - LZ4 (compression algorithm)
lz4 - pros and cons
- Pros
  - Very fast compression and decompression: compression speed is similar to lzo, and decompression speed is significantly faster than lzo
  - Splittable
- Cons
  - Less compression than gzip and bzip2
lzo
Wikipedia - Lempel–Ziv–Oberhumer (LZO)
lzo - pros and cons
- Pros
  - Higher compression speed compared to DEFLATE compression
  - Very fast decompression
  - Allows the user to adjust the balance between compression ratio and compression speed, without affecting the speed of decompression
  - Produces files slightly larger than gzip while only requiring a tenth of the CPU use and only slightly higher memory utilization
  - Splittable
- Cons
  - Lower compression ratio than gzip and bzip2
Snappy
Wikipedia - Snappy (compression)
Snappy - pros and cons
- Pros
  - Very fast compression and decompression speeds
  - Widely used in Big Data
  - Default compression format for Parquet files
- Cons
  - Compression ratio is 20–100% lower than gzip
  - Not splittable
xz
Wikipedia - XZ Utils
xz - pros and cons
- Pros
  - Higher compression rates than alternatives like gzip and bzip2, particularly for very large files
  - Higher decompression speed than bzip2
  - Splittable
- Cons
  - Slowest of the formats listed here
  - Most resource-intensive
  - Lower decompression speed than gzip
  - Compression can be much slower than gzip, and is slower than bzip2 for high levels of compression
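The size and speed claims for gzip, bzip2, and xz can be checked on your own data with the Python standard library. The sketch below is an assumption-laden micro-benchmark, not a rigorous one: the `measure` helper, the `CODECS` table, the chosen levels, and the pseudo-random word sample are all hypothetical, and results will vary with input and hardware.

```python
import bz2
import gzip
import lzma
import random
import time


def measure(name, compress, decompress, data):
    """Time one compress/decompress round trip and report the output size."""
    start = time.perf_counter()
    packed = compress(data)
    compress_s = time.perf_counter() - start

    start = time.perf_counter()
    restored = decompress(packed)
    decompress_s = time.perf_counter() - start

    if restored != data:
        raise RuntimeError(f"{name} round trip failed")
    return {
        "codec": name,
        "size": len(packed),
        "compress_s": compress_s,
        "decompress_s": decompress_s,
    }


# Levels are illustrative: gzip and bzip2 at their maximum, xz at its default preset.
CODECS = {
    "gzip": (lambda d: gzip.compress(d, compresslevel=9), gzip.decompress),
    "bzip2": (lambda d: bz2.compress(d, 9), bz2.decompress),
    "xz": (lambda d: lzma.compress(d, preset=6), lzma.decompress),
}

# Hypothetical sample: pseudo-random text built from a small vocabulary.
rng = random.Random(0)
words = [b"alpha", b"beta", b"gamma", b"delta", b"epsilon", b"zeta", b"eta"]
sample = b" ".join(rng.choice(words) for _ in range(50000))

results = [measure(name, c, d, sample) for name, (c, d) in CODECS.items()]
for row in results:
    print(row)
```

A single run on a small sample is noisy; for a real decision, repeat the measurement several times on files that resemble the production workload.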
Use cases
- gzip
  Use when speed is crucial and moderate compression is acceptable. Ideal for log files and scripts.
- bzip2
  Suited for compressing large text files or when a balance between speed and compression is needed.
- xz
  Best for archiving large datasets or software distributions where compression ratio matters the most.
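The three use cases above can be encoded as a small lookup, sketched here with Python's standard `gzip`, `bz2`, and `lzma` modules. The priority labels (`"speed"`, `"balance"`, `"ratio"`), the `CODEC_BY_PRIORITY` table, and the `compress_for` helper are hypothetical names chosen for this illustration.

```python
import bz2
import gzip
import lzma

# Hypothetical mapping that mirrors the use cases listed above.
CODEC_BY_PRIORITY = {
    "speed": ("gzip", gzip.compress),
    "balance": ("bzip2", bz2.compress),
    "ratio": ("xz", lzma.compress),
}


def compress_for(priority: str, data: bytes):
    """Pick a codec by priority and return (codec_name, compressed_bytes)."""
    if priority not in CODEC_BY_PRIORITY:
        raise ValueError(f"unknown priority: {priority!r}")
    name, compress = CODEC_BY_PRIORITY[priority]
    return name, compress(data)


codec_name, blob = compress_for("ratio", b"archive me " * 1000)
print(codec_name, len(blob))
```

Keeping the choice in one table makes it easy to adjust later, for example if measurements on real data suggest a different codec for a given priority.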