gzip
Gzip is a DEFLATE implementation for use with individual files. It is popular on Unix-like systems such as Linux & macOS, and is often seen paired with tar to create .tar.gz archives. Formats like ZIP & PNG also use DEFLATE, to different effect.
Format Breakdown
ZIP is a multi-file archive format: it can bundle and compress many files into a single archive. Gzip is a single-file format: it compresses one file into one compressed file. Both use DEFLATE for compression. ZIP supports encryption, while gzip does not. ZIP also stores more extensive header information.
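The single-file versus multi-file distinction can be seen with Python's standard gzip and zipfile modules. This is only a rough sketch: the file names and contents are made up for illustration, and everything is kept in memory rather than written to disk.

```python
import gzip
import io
import zipfile

# Hypothetical sample files, used only for illustration.
files = {
    "notes.txt": b"gzip wraps one DEFLATE stream.\n" * 20,
    "todo.txt": b"ZIP keeps a directory of members.\n" * 20,
}

# gzip: one input, one compressed stream. There is no member list.
gz_blob = gzip.compress(files["notes.txt"])
assert gzip.decompress(gz_blob) == files["notes.txt"]
gz_method = gz_blob[2]  # compression-method byte of the gzip header; 8 means DEFLATE

# ZIP: many inputs in one container, each stored as its own named member.
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    for name, data in files.items():
        archive.writestr(name, data)

with zipfile.ZipFile(zip_buffer) as archive:
    zip_names = archive.namelist()
    zip_methods = {info.compress_type for info in archive.infolist()}

print("gzip method byte:", gz_method)
print("zip members:", zip_names)
print("zip methods:", zip_methods)
```

Both containers report DEFLATE as the compression method, but only the ZIP archive has a list of named members to enumerate.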
History
In order to properly understand the gzip format, we must first talk about ZIP. A lot of similar or identical information is covered in our ZIP entry.
The ZIP format was developed by Phil Katz as an open format with an open specification, while his own implementation, PKZIP, was shareware.
A restricted ZIP format exists and is used in other file types such as Java .jar archives, a slew of Microsoft Office file formats, OpenDocument Format files (.odt, .ods, .odp), and EPUB files for e-readers.
Around 1990, Info-ZIP came onto the scene. "Info-ZIP's purpose is to provide free, portable, high-quality versions of the Zip and UnZip compressor-archiver utilities that are compatible with the DOS-based PKZIP by PKWARE, Inc." (https://infozip.sourceforge.net/). They did this successfully, leading to increased adoption of the ZIP format.
In the early 1990s the gzip format was developed, derived from the DEFLATE code in the Info-ZIP utilities. It was designed to replace the Unix compress utility, which used the LZW compression algorithm. LZW was patented at the time, which threatened the free use of compress. Though some specific implementations of DEFLATE were patented by Phil Katz, the format itself was not, so a DEFLATE implementation that did not infringe on any patents was written.
As a compress replacement, the Unix gzip utility can decompress data that was compressed using compress. Gzip compresses quite a bit better than Unix compress due to its use of DEFLATE, and it has very fast decompression. It also adds a CRC-32 checksum as an integrity check for the archived data. The header format permits the storage of more information than the compress format allowed, such as the original file name & the file modification time.
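The header and checksum fields mentioned above can be inspected directly. The following is a minimal sketch using Python's standard library, assuming a single-member gzip stream. The helper name parse_gzip, the file name "hello.txt", and the timestamp are hypothetical and chosen only for illustration.

```python
import gzip
import io
import struct
import zlib

payload = b"hello, gzip\n" * 100

# Build a gzip stream in memory. "hello.txt" and the timestamp are made-up values.
buffer = io.BytesIO()
with gzip.GzipFile(filename="hello.txt", mode="wb", fileobj=buffer,
                   mtime=1_000_000_000) as gz:
    gz.write(payload)
blob = buffer.getvalue()


def parse_gzip(blob):
    """Read the fixed header, the optional file name, and the trailer.

    Illustrative helper, not a full parser: it assumes one gzip member.
    """
    magic, method, flags, mtime = struct.unpack("<2sBBI", blob[:8])
    if magic != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    pos = 10  # the fixed header is 10 bytes (the last two are XFL and OS)
    if flags & 0x04:  # FEXTRA: skip the length-prefixed extra field
        (xlen,) = struct.unpack("<H", blob[pos:pos + 2])
        pos += 2 + xlen
    name = None
    if flags & 0x08:  # FNAME: zero-terminated original file name
        end = blob.index(b"\x00", pos)
        name = blob[pos:end].decode("latin-1")
    # Trailer: CRC-32 of the uncompressed data, then its size modulo 2**32.
    crc32, isize = struct.unpack("<II", blob[-8:])
    return {"method": method, "mtime": mtime, "name": name,
            "crc32": crc32, "isize": isize}


info = parse_gzip(blob)
print(info)
print("checksum matches:", info["crc32"] == zlib.crc32(payload))
```

The stored CRC-32 is computed over the uncompressed data, which is why it can be compared against zlib.crc32 of the original payload.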
The popular tar utility, which creates an archive of files, has an option to compress directly to the .tar.gz format, and this is a very popular use case for gzip. Because a .tar is compressed as a single stream, gzip can take advantage of redundancy across files, whereas ZIP compresses each file separately. As a result, ZIP often compresses less effectively than the marriage of tar & gzip. .tar.gz is the most common archive format in use on Unix due to its very high portability, but there are better compression methods available. Some of these include XZ, bzip2, brotli, 7-zip, & Zstandard.
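The effect of cross-file redundancy can be illustrated with a small, deliberately contrived experiment in Python. The file names and contents below are synthetic assumptions: fifty small files that share the same block of text and differ only in a short footer. Real-world results depend heavily on the data.

```python
import io
import random
import string
import tarfile
import zipfile

# Synthetic data: every file shares one block of boilerplate (an assumption
# made for illustration), followed by a short unique footer.
rng = random.Random(0)
shared = "".join(rng.choice(string.ascii_lowercase) for _ in range(2000)).encode()
files = {
    f"page_{i:02d}.txt": shared + f"\nunique footer {i}\n".encode()
    for i in range(50)
}


def zip_size(files):
    """Size of a ZIP archive in which each member is deflated on its own."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return len(buffer.getvalue())


def tar_gz_size(files):
    """Size of a tar archive compressed as one gzip stream."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, data in files.items():
            member = tarfile.TarInfo(name)
            member.size = len(data)
            archive.addfile(member, io.BytesIO(data))
    return len(buffer.getvalue())


zip_bytes = zip_size(files)
tar_gz_bytes = tar_gz_size(files)
print("zip:   ", zip_bytes, "bytes")
print("tar.gz:", tar_gz_bytes, "bytes")
```

On this input the .tar.gz comes out smaller, because gzip can refer back to the shared block seen in earlier files while ZIP has to compress that block again for every member. With files that share little content, the gap narrows.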