Lossless Compression
– Lossless compression is the process of encoding information using fewer bits without losing any information.
– Most real-world data exhibits statistical redundancy, which allows for lossless compression.
– Run-length encoding is a basic example of lossless compression, where repeated strings of data are substituted with table entries.
– The Lempel-Ziv (LZ) compression methods, such as LZW, are popular algorithms for lossless storage.
– Probabilistic models, like prediction by partial matching and the Burrows-Wheeler transform, are used in modern lossless compressors.
Lossy Compression
– Lossy compression methods accept some loss of information to save storage space.
– Perceptual differences in how humans perceive data are exploited in lossy compression schemes.
– Transform coding, particularly the discrete cosine transform (DCT), is widely used in lossy compression.
– Lossy compression is extensively used in multimedia formats for images, video, and audio.
– Psychoacoustics and psychovisuals are used in lossy compression for sound, images, and video.
Compression and Decompression Processes
– Data compression is performed by an encoder, while decompression is performed by a decoder.
– Compression reduces the resources required to store and transmit data.
– The design of compression schemes involves trade-offs between compression degree, distortion introduced, and computational resources required.
– Lossy compression methods use specialized techniques, such as psychoacoustics and speech coding, for specific types of data.
– Lossy compression can cause generation loss.
Theoretical Basis for Compression
– Information theory and Shannon’s source coding theorem provide the theoretical basis for compression.
– Algorithmic information theory is used for lossless compression, while rate-distortion theory is used for lossy compression.
– Claude Shannon is credited with creating the foundations of compression through his fundamental papers in the late 1940s and early 1950s.
– Coding theory and statistics are also associated with compression.
– Compression is based on the principles of reducing redundancy and exploiting statistical patterns in data.
Compression Formats and Applications
– DEFLATE is a popular compression format optimized for decompression speed and compression ratio.
– LZW algorithm is widely used in compression systems, including GIF images and programs like PKZIP.
– Arithmetic coding, a more modern technique, can achieve superior compression compared to other methods like Huffman coding.
– Compression formats like HEVC, MPEG, and JPEG are used in multimedia applications for video and image compression.
– Lossy compression is used in digital cameras, DVDs, Blu-ray, streaming video, and internet telephony.
In information theory, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy. No information is lost in lossless compression. Lossy compression reduces bits by removing unnecessary or less important information. Typically, a device that performs data compression is referred to as an encoder, and one that performs the reversal of the process (decompression) as a decoder.
The process of reducing the size of a data file is often referred to as data compression. In the context of data transmission, it is called source coding: encoding is done at the source of the data before it is stored or transmitted. Source coding should not be confused with channel coding, for error detection and correction or line coding, the means for mapping data onto a signal.
Compression is useful because it reduces the resources required to store and transmit data. Computational resources are consumed in the compression and decompression processes. Data compression is subject to a space-time complexity trade-off. For instance, a compression scheme for video may require expensive hardware for the video to be decompressed fast enough to be viewed as it is being decompressed, and the option to decompress the video in full before watching it may be inconvenient or require additional storage. The design of data compression schemes involves trade-offs among various factors, including the degree of compression, the amount of distortion introduced (when using lossy data compression), and the computational resources required to compress and decompress the data.