History and Evolution of Tar
– Tar was first introduced in Version 7 Unix in January 1979, replacing the tp program.
– The file structure for tar was standardized in POSIX.1-1988 and later POSIX.1-2001.
– The tar command was abandoned in POSIX.1-2001 in favor of the pax command.
– Unix-like operating systems usually include tools to support tar files.
– Tar has been ported to the IBM i operating system.
Rationale and File Format of Tar
– Tar writes data in records made up of many 512-byte blocks to reduce writing time.
– The user can specify a blocking factor, which is the number of blocks per record.
– Tape drives that read and write variable-length data blocks leave a gap between blocks, so grouping blocks into larger records wastes less tape.
– Writing one large block takes less time than writing many small blocks.
– Tar formats are supported by most modern file archiving systems.
– There are multiple tar file formats, including ustar, pax, and GNU tar.
– A tar archive consists of a series of file objects, each preceded by a 512-byte header record.
– File data is written unaltered, but its length is rounded up to a multiple of 512 bytes.
– Modern tar implementations fill the padding with zeros.
– The end of an archive is marked by at least two consecutive zero-filled records.
– The file header record contains metadata about a file.
– The header record is encoded in ASCII for portability across different architectures.
– The original Unix tar format defines fields for the file name, file mode, owner's numeric user ID, group's numeric group ID, and file size, among others.
– Numeric values are encoded as octal numbers using ASCII digits.
– The checksum is the sum of the unsigned byte values of the header record, with the eight checksum bytes themselves counted as ASCII spaces.
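The header layout and padding rules above can be checked with a short Python sketch. It builds a one-file archive in memory with the standard tarfile module, then reads the first 512-byte header by hand. The field offsets used here (name at 0, mode at 100, uid at 108, gid at 116, size at 124, checksum at 148) are the conventional tar header layout and are not spelled out in the text above; the payload and the helper names are made up for illustration.

```python
import io
import tarfile

BLOCK = 512  # tar block size in bytes


def padded_size(size):
    """Round a file's length up to the next multiple of the 512-byte block."""
    return -(-size // BLOCK) * BLOCK


def parse_octal(field):
    """Decode an ASCII octal field that is padded with NULs and/or spaces."""
    text = field.rstrip(b"\0 ").decode("ascii")
    return int(text, 8) if text else 0


def header_checksum(header):
    """Sum the header bytes as unsigned values, counting the checksum
    field (offsets 148-155) as eight ASCII spaces."""
    return sum(header[:148]) + 8 * ord(" ") + sum(header[156:BLOCK])


# Build a one-member archive in memory so the example needs no files on disk.
payload = b"hello, tar\n"
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tf:
    info = tarfile.TarInfo(name="README.txt")
    info.size = len(payload)
    info.mode = 0o644
    tf.addfile(info, io.BytesIO(payload))

archive = buf.getvalue()
header = archive[:BLOCK]

# Header fields: ASCII text, with numbers stored as octal digits.
name = header[0:100].rstrip(b"\0").decode("ascii")
mode = parse_octal(header[100:108])
uid = parse_octal(header[108:116])
gid = parse_octal(header[116:124])
size = parse_octal(header[124:136])
stored_checksum = parse_octal(header[148:156])

# File data follows the header unaltered, padded with zeros to a full block.
data = archive[BLOCK:BLOCK + size]
after_data = BLOCK + padded_size(size)
end_marker = archive[after_data:after_data + 2 * BLOCK]

print(name, oct(mode), uid, gid, size)
print("checksum ok:", stored_checksum == header_checksum(header))
print("two zero blocks at end:", end_marker == bytes(2 * BLOCK))
print("archive length:", len(archive))
```

The archive length printed at the end is a multiple of 10,240 bytes because Python's tarfile pads its output to records of 20 blocks; other implementations may use a different blocking factor.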
UStar Format and POSIX.1-2001/pax
– UStar is the Unix Standard TAR format introduced by the POSIX IEEE P1003.1 standard in 1988.
– UStar format allows for longer file names and stores additional information about each file.
– Older tar programs ignore the extra information, while newer programs test for the presence of the ‘ustar’ string.
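– The maximum file name length in UStar format is 256 characters, split between a path prefix field (155 bytes) and the file name field (100 bytes), which are joined by a slash.
– The header's type flag can mark global extended headers and extended headers for the next file (both defined by POSIX.1-2001), as well as vendor-specific extensions.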
– Sun proposed a method for adding extensions to the tar format in 1997.
– The method was later accepted for the POSIX.1-2001 standard.
– The format is known as extended tar format or pax format.
– The new format lets vendors add their own enhancements under vendor-specific tags.
– The POSIX standard defines tags such as atime, mtime, path, linkpath, uname, gname, size, uid, gid, and character set definitions.
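To make the difference between the two formats concrete, the following Python sketch writes the same over-long file name once as ustar and once as pax, using the standard tarfile module. It is an illustration under stated assumptions, not a reference parser: the byte offsets (magic at 257, version at 263, prefix at 345, type flag at 156) come from the conventional ustar header layout, the record syntax "length keyword=value" followed by a newline is the pax extended header form, and the tag EXAMPLE.note is a made-up vendor tag.

```python
import io
import tarfile

BLOCK = 512


def build(fmt, name, pax_headers=None):
    """Write one empty member to an in-memory archive in the given format."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tf:
        info = tarfile.TarInfo(name=name)
        info.pax_headers = dict(pax_headers or {})
        tf.addfile(info)  # size is 0, so no data stream is needed
    return buf.getvalue()


def parse_pax_records(data):
    """Split pax records; each starts with its own total length in decimal."""
    records, pos = {}, 0
    while pos < len(data):
        space = data.index(b" ", pos)
        length = int(data[pos:space])  # counts digits, space and newline too
        keyword, _, value = data[space + 1:pos + length - 1].partition(b"=")
        records[keyword.decode("utf-8")] = value.decode("utf-8")
        pos += length
    return records


# 171 characters: too long for the 100-byte name field alone.
long_name = "d" * 120 + "/" + "f" * 50

# ustar: the name is split between the prefix and name fields.
ustar = build(tarfile.USTAR_FORMAT, long_name)
magic, version = ustar[257:263], ustar[263:265]
short_name = ustar[0:100].rstrip(b"\0").decode("ascii")
prefix = ustar[345:500].rstrip(b"\0").decode("ascii")

# pax: an extended header (type flag "x") precedes the file's own header
# and carries the full path plus the hypothetical vendor tag.
pax = build(tarfile.PAX_FORMAT, long_name, {"EXAMPLE.note": "demo"})
typeflag = pax[156:157]
ext_size = int(pax[124:136].rstrip(b"\0 "), 8)
records = parse_pax_records(pax[BLOCK:BLOCK + ext_size])

# Reading the archive back merges the extended header into the member.
with tarfile.open(fileobj=io.BytesIO(pax)) as tf:
    member = tf.getmembers()[0]

print(magic, version, len(prefix), len(short_name))
print(typeflag, sorted(records))
print(member.name == long_name, member.pax_headers["EXAMPLE.note"])
```

A reader that does not understand the extended header still sees an ordinary ustar header for the file, which is the compatibility argument made in the list above.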
Uses and Command Syntax of Tar
– The tar format is extensively used for open-source software distribution.
– Unix-like distributions use it in various source and binary package distribution mechanisms.
– Most software source code is made available in compressed tar archives.
– Tar is commonly used for backup and archiving purposes.
– On the command line, the same tar program both creates and extracts archives.
– Basic options for tar include create, auto-compress, append, extract, file, list, and verbose.
– The -c or --create option is used to create a new archive.
– The -a or --auto-compress option automatically compresses the archive based on the file name extension.
– The -r or --append option appends files to an existing archive.
– The -x or --extract option extracts files from an archive.
– To create an archive file named archive.tar from README.txt and the src directory: tar -cvf archive.tar README.txt src
– To extract the contents of archive.tar into the current directory: tar -xvf archive.tar
– To create an archive file named archive.tar.gz from README.txt and the src directory and compress it with gzip: tar -cavf archive.tar.gz README.txt src
– To extract the contents of archive.tar.gz into the current directory: tar -xvf archive.tar.gz
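For scripts that cannot shell out to tar, the four commands above can be approximated with Python's standard tarfile module. This is a rough sketch rather than an exact equivalent of the command-line tool: write_mode, create and extract are hypothetical helper names, the sample file contents are invented, and the example works inside a temporary directory instead of the current directory.

```python
import os
import tarfile
import tempfile


def write_mode(archive_name):
    """Hypothetical helper: pick a tarfile mode from the file name suffix,
    loosely mirroring what `tar -a` does."""
    modes = {".gz": "w:gz", ".bz2": "w:bz2", ".xz": "w:xz"}
    return modes.get(os.path.splitext(archive_name)[1], "w")


def create(archive_path, root, members):
    """Rough equivalent of `tar -caf ARCHIVE MEMBERS...` run from root."""
    with tarfile.open(archive_path, write_mode(archive_path)) as tf:
        for member in members:
            # Directories are added recursively, as with the tar command.
            tf.add(os.path.join(root, member), arcname=member)


def extract(archive_path, dest):
    """Rough equivalent of `tar -xf ARCHIVE -C DEST`; returns member names."""
    # The "data" extraction filter only exists in newer Python releases.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive_path, "r:*") as tf:  # "r:*" detects compression
        tf.extractall(dest, **kwargs)
        return tf.getnames()


results = {}
with tempfile.TemporaryDirectory() as root:
    # Sample input tree; the file contents are made up for the example.
    os.mkdir(os.path.join(root, "src"))
    with open(os.path.join(root, "README.txt"), "w") as f:
        f.write("example project\n")
    with open(os.path.join(root, "src", "main.c"), "w") as f:
        f.write("int main(void) { return 0; }\n")

    for archive_name in ("archive.tar", "archive.tar.gz"):
        archive_path = os.path.join(root, archive_name)
        dest = os.path.join(root, "out-" + archive_name)
        os.mkdir(dest)
        create(archive_path, root, ["README.txt", "src"])
        names = extract(archive_path, dest)
        with open(os.path.join(dest, "src", "main.c")) as f:
            results[archive_name] = (names, f.read())

for archive_name, (names, _) in results.items():
    print(archive_name, names)
```

Passing the "data" filter where it is available is a deliberate choice: it makes extraction refuse members that would land outside the destination directory, which matters when the archive comes from an untrusted source.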
Key Implementations and Limitations of Tar
– Many older tar implementations do not record or restore extended attributes or access-control lists.
– Star introduced support for ACLs and extended attributes through its own tags for POSIX.1-2001 pax.
– More recent versions of GNU tar support Linux extended attributes.
– Other formats have been created to address the limitations of tar.
– The original tar format has design features that are considered dated.
– Solaris tar is based on Unix V7 tar and is the default on Solaris OS.
– GNU tar is the default on most Linux distributions and supports various formats.
– FreeBSD tar is the default on BSD-based operating systems and can extract from multiple formats.
– Schily tar, better known as star, introduced several extensions that became popular; its development began in 1982.
– The Python tarfile module supports multiple formats and has been available since 2003.
– Tar archive files have the suffix ‘.tar’ and can be compressed using gzip, bzip2, xz, and others.
– A compressed archive is named by appending the compressor's suffix to the archive file name, as in archive.tar.gz.
– BSD tar detects a wide range of compressors using the data within the file, not the filename.
– Formats that tar does not recognize must be compressed or decompressed manually by piping the data through the compressor.
– MS-DOS’s 8.3 filename limitations resulted in additional conventions for naming compressed tar archives.
– Tar has a history of incompatibilities, known as the tar wars.
– Most tar implementations can also read and create cpio and pax formats.
– GNU tar is based on the public domain implementation pdtar.
– FreeBSD tar has become the default on most BSD-based operating systems.
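The two ways of handling compressed archives mentioned in this list, detection from the file's content and manual piping, can both be imitated in Python. The sketch below is only an analogy to what BSD tar or a shell pipeline does: it uses the standard gzip and tarfile modules, keeps everything in memory, and the member name and payload are invented for the example.

```python
import gzip
import io
import tarfile


def make_tar(payload, name):
    """Return the bytes of an uncompressed archive holding one member."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


plain = make_tar(b"int main(void) { return 0; }\n", "src/main.c")
compressed = gzip.compress(plain)

# The two forms are told apart by their leading bytes, not by any file name:
# gzip data starts with 0x1f 0x8b, a POSIX tar header has "ustar" at offset 257.
print(compressed[:2], plain[257:262])

# 1. Detection from content: mode "r:*" tries the known compressors in turn.
with tarfile.open(fileobj=io.BytesIO(compressed), mode="r:*") as tf:
    detected_names = tf.getnames()

# 2. Manual "piping": decompress first, then hand tarfile a plain,
#    non-seekable stream (mode "r|"), as `gzip -dc file | tar -xf -` would.
with gzip.open(io.BytesIO(compressed)) as stream:
    with tarfile.open(fileobj=stream, mode="r|") as tf:
        piped_names = [member.name for member in tf]

print(detected_names, piped_names)
```

The second form is the one to reach for when the compressor is not among those the tar implementation knows about, since any decompressor that can produce a byte stream will do.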
In computing, tar is a computer software utility for collecting many files into one archive file, often referred to as a tarball, for distribution or backup purposes. The name is derived from "tape archive", as it was originally developed to write data to sequential I/O devices with no file system of their own, such as devices that use magnetic tape. The archive data sets created by tar contain various file system parameters, such as name, timestamps, ownership, file-access permissions, and directory organization. POSIX abandoned tar in favor of pax, yet tar sees continued widespread use.
Original author: AT&T Bell Laboratories
Developer: Various open-source and commercial developers
Written in: C (pdtar, star, Plan 9, GNU)
Operating system: Unix, Unix-like, Plan 9, Microsoft Windows, IBM i
License: BSD tar: BSD-2-Clause; GNU tar: GPL-3.0-or-later; pdtar: Public domain; Plan 9: MIT
Magic number: "ustar", a NUL byte, then "00" at byte offset 257 (POSIX versions); absent in pre-POSIX versions
Standard: POSIX since POSIX.1, presently in the definition of pax