Glossary Term

tar (computing)

History and Evolution of Tar
- Tar was first introduced in Version 7 Unix in January 1979, replacing the tp program.
- The file structure for tar was standardized in POSIX.1-1988 and later in POSIX.1-2001.
- The tar command was abandoned in POSIX.1-2001 in favor of the pax command.
- Unix-like operating systems usually include tools to support tar files.
- Tar has been ported to the IBM i operating system.

Rationale and File Format of Tar
- Tar writes data in records of many 512-byte blocks to optimize writing time.
- The user can specify a blocking factor, which is the number of blocks per record.
- On tape drives that read and write variable-length data blocks, larger records waste less tape, because space is lost in the gaps between blocks.
- Writing one large block takes less time than writing many small blocks.
- Tar formats are supported by most modern file archiving systems.
- There are multiple tar file formats, including ustar, pax, and GNU tar.
- Tar archives consist of file objects, each preceded by a 512-byte header record.
- File data is written unaltered, but its length is rounded up to a multiple of 512 bytes.
- Modern tar implementations fill the extra space with zeros.
- The end of an archive is marked by at least two consecutive zero-filled records.
- The file header record contains metadata about a file.
- The header record is encoded in ASCII for portability across different architectures.
- The original Unix tar format defines fields including the file name, file mode, owner's numeric user ID, group's numeric group ID, and file size.
- Numeric values are encoded as octal numbers using ASCII digits.
- The checksum is calculated by taking the sum of the unsigned byte values of the header record, with the eight checksum bytes themselves counted as ASCII spaces.

UStar Format and POSIX.1-2001/pax
- UStar is the Unix Standard TAR format introduced by the POSIX IEEE P1003.1 standard in 1988.
- UStar format allows for longer file names and stores additional information about each file.
- Older tar programs ignore the extra information, while newer programs test for the presence of the 'ustar' string to determine whether the newer format is in use.
- The maximum filename size in UStar format is 256 characters, split between a path prefix (up to 155 bytes) and the filename itself (up to 100 bytes), so the usable length can be much less.
- The type flag field also has values for global extended headers and extended headers for the next file (both added in POSIX.1-2001), plus a range reserved for vendor-specific extensions.
- Sun proposed a method for adding extensions to the tar format in 1997.
- The method was later accepted for the POSIX.1-2001 standard.
- The format is known as extended tar format or pax format.
- The new tar format allows users to add vendor-tagged, vendor-specific enhancements.
- The POSIX standard defines tags such as atime, mtime, path, linkpath, uname, gname, size, uid, gid, and character set definitions.

Uses and Command Syntax of Tar
- The tar format is extensively used for open-source software distribution.
- *NIX distributions use it for various source and binary package distribution mechanisms.
- Most software source code is made available in compressed tar archives.
- Tar is commonly used for backup and archiving purposes.
- It is also used for creating and extracting archives on the command line.
- Basic options for tar include create, auto-compress, append, extract, file, list, and verbose.
- The -c or --create option creates a new archive.
- The -a or --auto-compress option chooses the compression program from the archive's file name extension.
- The -r or --append option appends files to an existing archive.
- The -x or --extract option extracts files from an archive.
- The -f or --file option names the archive file to read or write.
- The -t or --list option lists the contents of an archive.
- The -v or --verbose option prints the name of each file as it is processed.
- To create an archive file named archive.tar from README.txt and the src directory: tar -cvf archive.tar README.txt src
- To extract the contents of archive.tar into the current directory: tar -xvf archive.tar
- To create an archive file named archive.tar.gz from README.txt and the src directory and compress it with gzip: tar -cavf archive.tar.gz README.txt src
- To extract the contents of archive.tar.gz into the current directory: tar -xvf archive.tar.gz

Key Implementations and Limitations of Tar
- Many older tar implementations do not record or restore extended attributes or access-control lists (ACLs).
- Star introduced support for ACLs and extended attributes through its own tags for POSIX.1-2001 pax.
- More recent versions of GNU tar support Linux extended attributes.
- Other formats have been created to address the limitations of tar.
- The original tar format has design features that are considered dated.
- Solaris tar is based on the original Unix V7 tar and is the default on Solaris.
- GNU tar is the default on most Linux distributions and supports various formats.
- FreeBSD tar is the default on most BSD-based operating systems and can extract from multiple formats.
- Schily tar, better known as star, introduced extensions that became popular; its developer dates the start of development to 1982.
- The Python tarfile module supports multiple formats and has been available since 2003.
- Tar archive files have the suffix '.tar' and can be compressed using gzip, bzip2, xz, and others.
- The compressed form of the archive is named by appending the compressor's format-specific suffix, as in archive.tar.gz.
- BSD tar detects a wide range of compressors using the data within the file, not the filename.
- Unrecognized formats need manual compression or decompression by piping.
- MS-DOS's 8.3 filename limitations resulted in additional conventions for naming compressed tar archives.
- Tar has a history of incompatibilities, known as the tar wars.
- Most tar implementations can also read and create cpio and pax formats.
- GNU tar is based on the public domain implementation pdtar.
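
The header layout and checksum rule summarized under Rationale and File Format of Tar can be checked with a short Python sketch. It is only an illustration: the field offsets used here (size at bytes 124-135, checksum at 148-155, 'ustar' magic at 257) are the conventional ustar offsets rather than anything stated in this entry, and the helper names are hypothetical. The sample archive is produced with the Python tarfile module mentioned above.

```python
import io
import tarfile

BLOCK = 512  # tar block size in bytes


def header_checksum(header: bytes) -> int:
    """Sum the unsigned bytes of a 512-byte header, counting the
    8-byte checksum field (assumed at offsets 148-155) as ASCII spaces."""
    if len(header) != BLOCK:
        raise ValueError("a tar header is exactly 512 bytes")
    return sum(header[:148]) + 8 * ord(" ") + sum(header[156:])


def parse_octal(field: bytes) -> int:
    """Decode a NUL- or space-terminated ASCII octal field."""
    text = field.rstrip(b"\0 ").decode("ascii")
    return int(text, 8) if text else 0


def padded_size(size: int) -> int:
    """Round a data length up to the next multiple of 512 bytes."""
    return (size + BLOCK - 1) // BLOCK * BLOCK


def build_sample_archive(data: bytes) -> bytes:
    """Create a small in-memory ustar archive holding one file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        info = tarfile.TarInfo(name="README.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


data = b"hello, tar\n"
archive = build_sample_archive(data)
header = archive[:BLOCK]

stored_checksum = parse_octal(header[148:156])  # checksum as written by tarfile
stored_size = parse_octal(header[124:136])      # file size in octal ASCII digits
data_end = BLOCK + padded_size(stored_size)     # first byte after the padded data

print("checksum matches:", header_checksum(header) == stored_checksum)
print("file size:", stored_size, "padded to", padded_size(stored_size))
```

In this sketch the file data starts right after the header, is padded with zeros up to `data_end`, and is followed by zero-filled blocks that mark the end of the archive.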
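
The command-line examples under Uses and Command Syntax of Tar have rough equivalents in the Python tarfile module. The following is a minimal sketch, not a replacement for tar itself: the function names and the temporary file layout are hypothetical, and the extraction filter is applied only on Python versions that provide it.

```python
import tarfile
import tempfile
from pathlib import Path


def create_archive(archive: Path, base: Path, members: list[str]) -> None:
    """Roughly `tar -czf archive -C base members...` (gzip-compressed)."""
    with tarfile.open(archive, "w:gz") as tar:
        for name in members:
            # Directories are added recursively; arcname keeps paths relative.
            tar.add(base / name, arcname=name)


def list_archive(archive: Path) -> list[str]:
    """Roughly `tar -tf archive`; "r:*" detects compression from the data."""
    with tarfile.open(archive, "r:*") as tar:
        return tar.getnames()


def extract_archive(archive: Path, dest: Path) -> None:
    """Roughly `tar -xf archive -C dest`."""
    with tarfile.open(archive, "r:*") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python releases can reject unsafe members on extraction.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)


with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp)
    (work / "src").mkdir()
    (work / "README.txt").write_text("read me\n")
    (work / "src" / "main.c").write_text("int main(void) { return 0; }\n")

    archive = work / "archive.tar.gz"
    create_archive(archive, work, ["README.txt", "src"])
    names = list_archive(archive)

    out = work / "out"
    out.mkdir()
    extract_archive(archive, out)
    restored = (out / "src" / "main.c").read_text()

print(names)
```

Opening with mode "r:*" lets the module pick the decompressor from the file contents rather than the file name, similar in spirit to the detection behavior described for BSD tar.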