
bzip2


Introduction and Overview
– bzip2 is a free and open-source file compression program
– It uses the Burrows-Wheeler algorithm
– It compresses single files only and is not a file archiver
– Relies on external utilities for tasks like handling multiple files, encryption, and archive-splitting
– Initial release by Julian Seward in 1996
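Because bzip2 only ever compresses one stream, bundling several files is left to an archiver such as tar. The minimal sketch below assumes Python's standard `tarfile` module as that external utility (the page names no specific tool) and builds a `.tar.bz2` archive in memory. The helper names `bundle_and_compress` and `list_members` are hypothetical.

```python
import io
import tarfile


def bundle_and_compress(files: dict[str, bytes]) -> bytes:
    """Pack several named byte strings into a tar archive, then bzip2 it.

    tar handles names and multiple members; bzip2 only sees the single
    byte stream that tar produces.
    """
    buf = io.BytesIO()
    # Mode "w:bz2" makes tarfile pass the archive through bzip2 compression.
    with tarfile.open(fileobj=buf, mode="w:bz2") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def list_members(archive: bytes) -> list[str]:
    """Return the member names stored in a .tar.bz2 archive."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:bz2") as tar:
        return tar.getnames()


if __name__ == "__main__":
    blob = bundle_and_compress({"a.txt": b"alpha\n", "b.txt": b"beta\n"})
    print(blob[:3])            # b'BZh': the outer layer is a plain bzip2 stream
    print(list_members(blob))  # ['a.txt', 'b.txt']
```

The split of responsibilities is the point: the file list lives in the tar layer, so bzip2 itself never needs a notion of "multiple files".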

Compression Techniques
– Uses several layers of compression techniques including run-length encoding (RLE), Burrows-Wheeler transform (BWT), move-to-front transform (MTF), and Huffman coding
– Compresses data in blocks between 100 and 900 kB
– The BWT converts frequently recurring character sequences into runs of identical letters
– Compression performance is asymmetric, with decompression being faster than compression
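The block size is chosen at compression time. As a sketch, assuming Python's standard `bz2` module, the `compresslevel` argument 1 to 9 corresponds to the 100 kB to 900 kB block sizes listed above, and the chosen digit is recorded as the fourth byte of the stream header. The helper `compress_at_level` is hypothetical, and the timings it prints vary by machine and input, so they only illustrate how one might observe the compression/decompression asymmetry.

```python
import bz2
import time


def compress_at_level(data: bytes, level: int) -> bytes:
    """Compress with bzip2; level 1..9 selects a 100..900 kB block size."""
    return bz2.compress(data, compresslevel=level)


if __name__ == "__main__":
    data = b"the quick brown fox jumps over the lazy dog\n" * 50_000
    for level in (1, 9):
        t0 = time.perf_counter()
        packed = compress_at_level(data, level)
        t1 = time.perf_counter()
        restored = bz2.decompress(packed)
        t2 = time.perf_counter()
        assert restored == data  # lossless round trip
        # Byte 3 of the header is the ASCII block-size digit (b'1' ... b'9').
        print(f"level {level}: header={packed[:4]!r} size={len(packed)} "
              f"compress={t1 - t0:.3f}s decompress={t2 - t1:.3f}s")
```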

Maintainers and Modifications
– Has had multiple maintainers since the initial release
– Micah Snyder has been the maintainer since June 2021
– Derivative implementations such as pbzip2 use multi-threading to improve compression speed
– Suitable for big data applications with cluster computing frameworks such as Hadoop and Apache Spark, because compressed blocks can be decompressed independently
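Independence is what makes parallel compression possible. The sketch below illustrates the general idea behind tools like pbzip2 without reproducing pbzip2's actual code: it splits the input into chunks, compresses each one as a complete bzip2 stream, and concatenates the results. The chunk size and the name `parallel_compress` are assumptions. CPython's `bz2` module releases the GIL while compressing, so a thread pool can keep several cores busy (a `ProcessPoolExecutor` would also work). `bz2.decompress` accepts concatenated streams, as does the `bzip2` command-line tool.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor

CHUNK = 900_000  # hypothetical chunk size, matching bzip2's largest block


def parallel_compress(data: bytes, workers: int = 4) -> bytes:
    """Compress chunks concurrently and concatenate the resulting streams."""
    # An empty input still yields one (empty) valid stream.
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)] or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk becomes a self-contained bzip2 stream; map() keeps
        # results in input order, so concatenation preserves the data order.
        streams = list(pool.map(bz2.compress, chunks))
    return b"".join(streams)


if __name__ == "__main__":
    data = bytes(range(256)) * 10_000  # about 2.5 MB, so three chunks
    packed = parallel_compress(data)
    # decompress() walks through every concatenated stream in turn.
    assert bz2.decompress(packed) == data
    print(len(data), "->", len(packed), "bytes")
```

Splitting into separate streams costs a little ratio (one header and CRC per chunk) in exchange for near-linear speed-up on multi-core machines.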

History and Implementation
– First public release by Julian Seward in July 1996
– Version 1.0 released in late 2000
– Federico Mena accepted maintainership in June 2019, after a nine-year hiatus in updates to the project
– Micah Snyder became the maintainer in June 2021
– Ongoing expansion and development of the project
– Applies its compression stages in a fixed order during compression and reverses that order during decompression
– Techniques include RLE, BWT, MTF, and Huffman coding
– The initial RLE stage replaces runs of 4 to 255 consecutive duplicate symbols with the first four symbols followed by a repeat length (0 to 251)
– Burrows-Wheeler transform is at the core of bzip2
– Move-to-front transform and RLE steps optimize compression for natural data patterns
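The first three stages can be sketched as toy functions. This is a simplified illustration, not libbzip2's implementation: the BWT sorts all rotations naively (real bzip2 uses far faster sorting), the MTF list starts with all 256 byte values (real bzip2 starts with only the byte values used in the block), and the later run-length step and Huffman coding are omitted. The function names `rle1`, `bwt` and `mtf` are hypothetical.

```python
def rle1(data: bytes) -> bytes:
    """Initial RLE: a run of 4..255 identical bytes becomes 4 bytes + count."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        run = 1
        while i + run < len(data) and data[i + run] == b and run < 255:
            run += 1
        if run >= 4:
            out += bytes([b]) * 4
            out.append(run - 4)  # repeat length 0..251
        else:
            out += bytes([b]) * run  # short runs are copied unchanged
        i += run
    return bytes(out)


def bwt(data: bytes) -> tuple[bytes, int]:
    """Naive BWT: last column of the sorted rotations, plus the original's row."""
    if not data:
        return b"", 0
    n = len(data)
    rotations = sorted(range(n), key=lambda k: data[k:] + data[:k])
    # The rotation starting at k ends with the byte just before k.
    last = bytes(data[(k - 1) % n] for k in rotations)
    return last, rotations.index(0)


def mtf(data: bytes) -> list[int]:
    """Move-to-front: emit each byte's list position, then move it to the front."""
    table = list(range(256))
    out = []
    for b in data:
        idx = table.index(b)
        out.append(idx)
        table.pop(idx)
        table.insert(0, b)
    return out


if __name__ == "__main__":
    print(rle1(b"aaaaaaa"))     # b'aaaa\x03'
    print(bwt(b"banana"))       # (b'nnbaaa', 3)
    print(mtf(b"nnbaaa"))       # [110, 0, 99, 99, 0, 0]
```

The output shows the intended effect: the BWT groups identical letters together, and MTF turns those groups into runs of small numbers (mostly zeros) that the later stages encode cheaply.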

File Format, Efficiency, and Limitations
– No formal specification for bzip2 exists
– A .bz2 stream consists of a 4-byte header, compressed blocks, and an end-of-stream marker with a 32-bit CRC
– Compressed blocks are bit-aligned and no padding occurs
– bzip2 compresses most files more effectively than the older LZW and Deflate compression algorithms, but more slowly
– LZMA is generally more space-efficient than bzip2, at the cost of even slower compression, although LZMA decompresses much faster
– Huffman coding uses several tables (2 to 6), with a table selected for each group of 50 symbols
– A bitmap records which byte values are used inside the block
– Limitations include a maximum amount of plaintext that a single 900 kB block can hold and the inability to store multiple files in a single compressed file
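The stream layout above can be checked on real output. The sketch below, assuming Python's standard `bz2` module and a hypothetical helper `inspect_header`, validates the 4-byte header (`BZh` plus the block-size digit) and identifies the 48-bit marker that follows: `0x314159265359` (the BCD digits of pi) starts a block, and `0x177245385090` (the BCD digits of the square root of pi) starts the end-of-stream marker. Because the header is exactly 4 bytes, the first marker begins on a byte boundary; later markers are generally not byte-aligned.

```python
import bz2

BLOCK_MAGIC = bytes.fromhex("314159265359")  # BCD digits of pi
EOS_MAGIC = bytes.fromhex("177245385090")    # BCD digits of sqrt(pi)


def inspect_header(stream: bytes) -> dict:
    """Parse the 4-byte header and classify the first 48-bit marker."""
    if stream[:3] != b"BZh" or not (b"1" <= stream[3:4] <= b"9"):
        raise ValueError("not a bzip2 stream")
    level = int(stream[3:4])  # ASCII digit 1..9
    marker = stream[4:10]     # first marker starts right after the header
    if marker == BLOCK_MAGIC:
        kind = "block"
    elif marker == EOS_MAGIC:
        kind = "end-of-stream"  # an empty input produces no blocks
    else:
        raise ValueError("unexpected marker after header")
    return {"block_size_kb": level * 100, "first_marker": kind}


if __name__ == "__main__":
    print(inspect_header(bz2.compress(b"hello", 9)))
    print(inspect_header(bz2.compress(b"", 1)))
```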

bzip2 (Wikipedia)

bzip2 is a free and open-source file compression program that uses the Burrows–Wheeler algorithm. It only compresses single files and is not a file archiver. It relies on separate external utilities for tasks such as handling multiple files, encryption, and archive-splitting.

bzip2
Original author(s): Julian Seward
Developer(s): Mark Wielaard, Federico Mena, Micah Snyder
Initial release: 18 July 1996
Stable release: 1.0.8 (13 July 2019)
Repository: https://gitlab.com/bzip2/bzip2/
Operating system: Cross-platform
Type: Data compression
License: Modified zlib license
Website: sourceware.org/bzip2/

bzip2 (file format)
Filename extension: .bz2
Internet media type: application/x-bzip2
Type code: Bzp2
Uniform Type Identifier (UTI): public.bzip2-archive
Magic number: BZh
Developed by: Julian Seward
Type of format: Data compression
Open format?: Yes

bzip2 was initially released in 1996 by Julian Seward. It compresses most files more effectively than older LZW and Deflate compression algorithms but is slower. bzip2 is particularly efficient for text data, and decompression is relatively fast. The algorithm uses several layers of compression techniques, such as run-length encoding (RLE), Burrows–Wheeler transform (BWT), move-to-front transform (MTF), and Huffman coding. bzip2 compresses data in blocks between 100 and 900 kB and uses the Burrows–Wheeler transform to convert frequently recurring character sequences into strings of identical letters. The move-to-front transform and Huffman coding are then applied. The compression performance is asymmetric, with decompression being faster than compression.
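To compare bzip2 against other algorithms on your own data, a small harness helps. This sketch assumes Python's standard `zlib` (Deflate), `bz2` and `lzma` modules; LZW has no standard-library codec, so it is left out. The function name `compare` and the chosen levels are assumptions, and the results depend heavily on the input, so no particular ranking is implied.

```python
import bz2
import lzma
import time
import zlib


def compare(data: bytes) -> list[tuple[str, int, float]]:
    """Return (codec, compressed size, compression seconds) for each codec."""
    codecs = {
        "deflate (zlib, level 9)": (lambda d: zlib.compress(d, 9), zlib.decompress),
        "bzip2 (level 9)": (lambda d: bz2.compress(d, 9), bz2.decompress),
        "lzma/xz (preset 6)": (lambda d: lzma.compress(d, preset=6), lzma.decompress),
    }
    results = []
    for name, (comp, decomp) in codecs.items():
        t0 = time.perf_counter()
        packed = comp(data)
        elapsed = time.perf_counter() - t0
        assert decomp(packed) == data  # sanity check: lossless round trip
        results.append((name, len(packed), elapsed))
    return results


if __name__ == "__main__":
    sample = b"".join(f"line {i}: value={i * i}\n".encode() for i in range(20_000))
    for name, size, secs in compare(sample):
        print(f"{name:24} {size:8d} bytes  {secs:.3f}s")
```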

The project has had multiple maintainers since its initial release, with Micah Snyder serving as maintainer since June 2021. Derivative implementations such as pbzip2 use multi-threading to improve compression speed on multi-CPU and multi-core computers.

bzip2 is suitable for use in big data applications with cluster computing frameworks like Hadoop and Apache Spark, as the compressed blocks can be independently decompressed.
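Splitting a .bz2 file for parallel processing relies on finding where blocks start. Since blocks are bit-aligned, a splitter has to search for the 48-bit block marker (`0x314159265359`) at arbitrary bit offsets. The sketch below does this on a bit string, assuming Python's standard `bz2` module to produce test data; the function name `find_block_starts` is hypothetical, and a production splitter would treat each hit only as a candidate, since the pattern can in principle occur by chance inside compressed data.

```python
import bz2

# 48-bit block-start marker (BCD digits of pi) as a string of '0'/'1'
BLOCK_MAGIC_BITS = format(0x314159265359, "048b")


def find_block_starts(stream: bytes) -> list[int]:
    """Return every bit offset at which the block-start marker appears."""
    if not stream:
        return []
    # Render the stream as a big-endian bit string, restoring the
    # leading zero bits that bin() drops.
    bits = bin(int.from_bytes(stream, "big"))[2:].zfill(len(stream) * 8)
    offsets = []
    pos = bits.find(BLOCK_MAGIC_BITS)
    while pos != -1:
        offsets.append(pos)
        pos = bits.find(BLOCK_MAGIC_BITS, pos + 1)
    return offsets


if __name__ == "__main__":
    # About 489 kB of digits; level 1 means blocks of roughly 100 kB.
    data = b"".join(str(i).encode() for i in range(100_000))
    packed = bz2.compress(data, 1)
    offsets = find_block_starts(packed)
    print(len(offsets), offsets[:3])  # first offset is 32, right after the header
```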
