Glossary Term
tar (computing)
History and Evolution of Tar
- Tar was first introduced in Version 7 Unix in January 1979, replacing the tp program.
- The file structure for tar was standardized in POSIX.1-1988 and later POSIX.1-2001.
- The tar command was dropped from POSIX.1-2001 in favor of the pax command.
- Unix-like operating systems usually include tools to support tar files.
- Tar has been ported to the IBM i operating system.
Rationale and File Format of Tar
- Tar writes data in records, each made up of many 512-byte blocks, to reduce writing time.
- The user can specify a blocking factor, which is the number of blocks per record; the default is 20, giving 10 KiB records.
- On tape drives that read and write variable-length data blocks, larger records waste less tape on the gaps between blocks.
- Writing one large block takes less time than many small blocks.
- Tar formats are supported by most modern file archiving systems.
- There are multiple tar file formats, including ustar, pax, and GNU tar.
- A tar archive is a series of file objects, each preceded by a 512-byte header record.
- File data is written unaltered, but its length is rounded up to a multiple of 512 bytes.
- Modern tar implementations fill extra space with zeros.
- The end of an archive is marked by at least two consecutive zero-filled 512-byte blocks.
- The file header record contains metadata about a file.
- The header record is encoded in ASCII for portability across different architectures.
- The original Unix tar format defines fields for the file name, file mode, owner's numeric user ID, group's numeric group ID, file size, last modification time, header checksum, link indicator, and name of the linked file.
- Numeric values are encoded in octal numbers using ASCII digits.
- The checksum is calculated by taking the sum of the unsigned byte values of the header record, with the eight bytes of the checksum field itself counted as ASCII spaces.
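The header layout, octal encoding, padding, and checksum rule described above can be checked with a short Python sketch. It builds a one-file archive in memory with the standard-library tarfile module and then decodes the first header by hand. The field offsets (name at byte 0, size at 124, checksum at 148) follow the ustar header layout and are assumptions here, since the list above does not give them; the file name hello.txt and the helper names are purely illustrative.

```python
import io
import tarfile

BLOCK = 512  # tar block size in bytes


def build_archive(name, payload):
    """Return the bytes of an in-memory ustar archive holding one regular file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tf:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def octal_field(raw):
    """Decode an ASCII octal field that is terminated by NUL and/or space."""
    text = raw.rstrip(b"\0 ").decode("ascii")
    return int(text or "0", 8)


def header_checksum(header):
    """Sum the unsigned header bytes, counting bytes 148..155 as spaces."""
    return sum(header[:148]) + 8 * ord(" ") + sum(header[156:BLOCK])


payload = b"hello tar\n"
archive = build_archive("hello.txt", payload)

header = archive[:BLOCK]
name = header[0:100].rstrip(b"\0").decode("ascii")
size = octal_field(header[124:136])
stored_checksum = octal_field(header[148:156])

# File data follows the header and is padded up to a multiple of 512 bytes.
padded_size = -(-size // BLOCK) * BLOCK
data = archive[BLOCK:BLOCK + padded_size]

# Everything after the single file object is end-of-archive zero fill.
trailer = archive[BLOCK + padded_size:]

print(name, size, stored_checksum == header_checksum(header))
print(len(archive) // BLOCK, "blocks in total")
```

With tarfile's defaults, the archive is also padded with zeros to a whole record of 20 blocks, which is why the total length is a multiple of 10240 bytes even for a ten-byte file.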
UStar Format and POSIX.1-2001/pax
- UStar is the Unix Standard TAR format introduced by the POSIX IEEE P1003.1 standard in 1988.
- UStar format allows for longer file names and stores additional information about each file.
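- Older tar programs ignore the extra information, while newer programs test for the presence of the 'ustar' string to determine whether the new format is in use.
- The maximum path length in UStar format is 256 characters: a path prefix of up to 155 bytes and a file name of up to 100 bytes, joined by an implied slash.
- The UStar type flag field identifies the kind of entry; POSIX.1-2001 later added flags for global extended headers and for extended headers that apply to the next file, and a range of flags is reserved for vendor-specific extensions.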
- Sun proposed a method for adding extensions to the tar format in 1997.
- The method was later accepted for the POSIX.1-2001 standard.
- The format is known as extended tar format or pax format.
- The pax format allows vendors to add their own enhancements under vendor-specific tags.
- The POSIX standard defines tags such as atime, mtime, path, linkpath, uname, gname, size, uid, gid, and character set definitions.
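Two of the mechanisms above lend themselves to a small Python sketch: splitting a long path between the UStar prefix and name fields, and encoding one pax extended header record. The sketch assumes the 155-byte and 100-byte field sizes of the ustar header and the pax record form "length keyword=value" followed by a newline, where the decimal length counts the whole record including its own digits. The function names are hypothetical and not taken from any tar implementation.

```python
NAME_MAX = 100    # bytes available in the ustar name field
PREFIX_MAX = 155  # bytes available in the ustar prefix field


def split_ustar_path(path):
    """Split a path into (prefix, name) so that each part fits its field.

    The slash at the split point is implied and not stored in either field.
    """
    raw = path.encode("utf-8")
    if len(raw) <= NAME_MAX:
        return "", path
    for i in range(len(raw)):
        if raw[i:i + 1] == b"/":
            prefix, name = raw[:i], raw[i + 1:]
            if 0 < len(prefix) <= PREFIX_MAX and 0 < len(name) <= NAME_MAX:
                return prefix.decode("utf-8"), name.decode("utf-8")
    raise ValueError("path does not fit in a ustar header")


def pax_record(keyword, value):
    """Encode one pax record; the length prefix counts its own digits."""
    body = (" %s=%s\n" % (keyword, value)).encode("utf-8")
    length = len(body)
    while True:
        total = len(body) + len(str(length))
        if total == length:
            break
        length = total
    return str(length).encode("ascii") + body


print(split_ustar_path("src/main.c"))
print(pax_record("path", "foo"))
```

A path that cannot be split this way is the case the pax path keyword addresses: the full name goes into an extended header record instead of the fixed-size fields.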
Uses and Command Syntax of Tar
- The tar format is extensively used for open-source software distribution.
- Unix-like distributions use it in various source and binary package distribution mechanisms.
- Most software source code is made available in compressed tar archives.
- Tar is commonly used for backup and archiving purposes.
- It is also used for creating and extracting archives in command line operations.
- Basic options for tar include create, auto-compress, append, extract, file, list, and verbose.
- The -c or --create option is used to create a new archive.
- The -a or --auto-compress option, available in GNU tar and bsdtar, chooses the compression program from the archive's file name extension when creating an archive.
- The -r or --append option appends files to an existing archive.
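- The -x or --extract option extracts files from an archive.
- The -f or --file option names the archive file to operate on.
- The -t or --list option lists the contents of an archive.
- The -v or --verbose option prints the name of each file as it is processed.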
- To create an archive file named archive.tar from README.txt and the src directory: tar -cvf archive.tar README.txt src
- To extract the contents of archive.tar into the current directory: tar -xvf archive.tar
- To create an archive file named archive.tar.gz from README.txt and the src directory and compress it with gzip: tar -cavf archive.tar.gz README.txt src
- To extract the contents of archive.tar.gz into the current directory: tar -xvf archive.tar.gz
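The same create, list, and extract operations can be sketched with Python's standard-library tarfile module. This is an illustrative equivalent under stated assumptions, not a description of how the tar command works internally: the file contents, the temporary directory, and the out directory are made up for the example.

```python
import tarfile
import tempfile
from pathlib import Path

# Newer Python versions support extraction filters; use one when available.
EXTRACT_KWARGS = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    (root / "README.txt").write_text("read me\n")
    (root / "src" / "main.c").write_text("int main(void) { return 0; }\n")

    archive = root / "archive.tar.gz"

    # Roughly: tar -czf archive.tar.gz README.txt src
    with tarfile.open(archive, "w:gz") as tf:
        tf.add(root / "README.txt", arcname="README.txt")
        tf.add(root / "src", arcname="src")  # directories are added recursively

    # Roughly: tar -tf archive.tar.gz ("r:*" detects the compression itself)
    with tarfile.open(archive, "r:*") as tf:
        listing = sorted(tf.getnames())

    # Roughly: tar -xf archive.tar.gz -C out
    out = root / "out"
    out.mkdir()
    with tarfile.open(archive, "r:*") as tf:
        tf.extractall(out, **EXTRACT_KWARGS)
    restored = (out / "src" / "main.c").read_text()

print(listing)
```

Unlike the -a option, the write mode here names the compressor explicitly ("w:gz"); only reading with "r:*" detects it automatically.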
Key Implementations and Limitations of Tar
- Many older tar implementations do not record or restore extended attributes or access-control lists.
- Star introduced support for ACLs and extended attributes through its own tags for POSIX.1-2001 pax.
- More recent versions of GNU tar support Linux extended attributes.
- Other formats have been created to address the limitations of tar.
- The original tar format has design features that are considered dated.
- Solaris tar is based on Unix V7 tar and is the default on Solaris OS.
- GNU tar is the default on most Linux distributions and supports various formats.
- FreeBSD tar (also known as BSD tar or bsdtar) is the default on most BSD-based operating systems and can extract from multiple formats.
- Schily tar, better known as star, is historically significant because some of its extensions became popular; its development began in 1982.
- The Python tarfile module supports multiple tar formats and has been available since 2003.
- Tar archive files have the suffix '.tar' and can be compressed using gzip, bzip2, xz, and others.
- A compressed archive is named by appending the compressor's suffix to the archive file name, for example archive.tar.gz for gzip.
- BSD tar detects a wide range of compressors using the data within the file, not the filename.
- Formats that tar does not recognize must be compressed or decompressed manually by piping the data through the appropriate program.
- MS-DOS's 8.3 filename limitations resulted in additional conventions for naming compressed tar archives, such as .tgz in place of .tar.gz.
- Tar has a history of incompatibilities, known as the tar wars.
- Most tar implementations can also read and create cpio and pax formats.
- GNU tar is based on the public domain implementation pdtar.
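Detecting a compressor from the data rather than the file name, as BSD tar is described as doing above, usually comes down to checking the leading "magic" bytes of the stream. The following Python sketch shows the idea for gzip, bzip2, and xz only. It is an assumption-laden illustration and not BSD tar's actual detection logic; the function name sniff_compression is hypothetical.

```python
import bz2
import gzip
import lzma

# Leading bytes that identify each compressed stream format.
MAGIC = {
    b"\x1f\x8b": "gzip",
    b"BZh": "bzip2",
    b"\xfd7zXZ\x00": "xz",
}


def sniff_compression(head):
    """Guess the compressor from the first bytes of a file, ignoring its name."""
    for magic, label in MAGIC.items():
        if head.startswith(magic):
            return label
    return "none"


payload = b"tar data would go here"
samples = {
    "gzip": gzip.compress(payload),
    "bzip2": bz2.compress(payload),
    "xz": lzma.compress(payload),
    "none": payload,
}
for expected, blob in samples.items():
    print(expected, "->", sniff_compression(blob[:8]))
```

Because only the first few bytes are inspected, the check works the same way on a file, a pipe, or a tape, which is what makes it independent of naming conventions such as .tar.gz or .tgz.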