Glossary Term
ZIP (file format)
Introduction, History, and Standardization of ZIP file format
- ZIP is an archive file format that supports lossless data compression.
- ZIP files may contain compressed files or directories.
- The format was created in 1989 as a replacement for the ARC compression format.
- The ZIP file format was designed by Phil Katz and Gary Conway.
- The format proliferated widely on the internet during the 1990s.
- ISO/IEC 21320-1 Document Container File — Part 1: Core was published in 2015, stating that document container files are conforming Zip files.
- The standard restricts certain features and prohibits encryption, digital signatures, and patched data features.
Version history and features of ZIP file format
- The ZIP File Format Specification has its own version number.
- Key advances in various versions include support for Deflate64 compression, AES encryption, Unicode filename storage, and expanded compression algorithms.
- The latest version is 6.3.10, released in 2022, which added z/OS attribute values and additional 3rd party Extra Field mappings.
ZIP file structure and internal layout
- ZIP files allow for random-access processing without compressing or decompressing the entire archive.
- A directory is placed at the end of a ZIP file, identifying the files and their locations.
- ZIP archives can include extra data unrelated to the archive itself, allowing for self-extracting archives.
- The order of file entries in the central directory does not have to coincide with the order in the archive.
- ZIP files are identified by the presence of an end of central directory record.
- Local file headers provide redundancy by including the same information as the central directory.
- ZIP files can be updated by appending new files and an updated central directory.
ZIP file signatures, headers, and timestamps
- Each entry in a ZIP archive is introduced by a local file header with file information and optional extra data fields.
- Extra data fields are used to support ZIP64 format, encryption, file attributes, and higher-resolution timestamps.
- The ZIP format uses specific 4-byte signatures to denote various structures in the file.
- ZIP archives can be spread across multiple file-system files for storage or transmission purposes.
- The timestamp resolution of files in a ZIP archive is limited to two seconds, matching the FAT filesystem of DOS.
Compression methods, encryption, ZIP64 support, and compatibility
- The ZIP format supports various compression methods, with DEFLATE being the most common.
- ZIP supports a password-based symmetric encryption system known as ZipCrypto.
- ZIP64 format extensions were introduced to overcome the 4GB limit of the original ZIP format.
- Support for ZIP64 varies across different software and operating systems.
- Some extension libraries and tools support ZIP64, while others may not.