LZMA

LZMA, short for Lempel-Ziv-Markov chain algorithm, is a data compression algorithm used in the 7z format of the 7-Zip archiver. It uses a dictionary compression scheme somewhat similar to LZ77 and features a high compression ratio (generally higher than bzip2) and a variable dictionary size (up to 4 GB).
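
As a rough illustration of the variable dictionary size, the following sketch uses Python's standard lzma module (a binding to liblzma, not the 7-Zip reference code) to compress the same data with two dictionary sizes. The test data, the helper name compress_raw and the chosen sizes are assumptions made for this example.

```python
import lzma
import random

def compress_raw(data, dict_size):
    """Compress data as a raw LZMA1 stream with the given dictionary size."""
    filters = [{"id": lzma.FILTER_LZMA1, "dict_size": dict_size}]
    return lzma.compress(data, format=lzma.FORMAT_RAW, filters=filters), filters

# 100,000 pseudo-random bytes repeated once: the only redundancy is a
# repetition at a distance of 100,000 bytes.
rng = random.Random(0)
block = bytes(rng.getrandbits(8) for _ in range(100_000))
data = block + block

small, small_filters = compress_raw(data, 1 << 16)   # 64 KiB dictionary
large, large_filters = compress_raw(data, 1 << 20)   # 1 MiB dictionary

print(len(data), len(small), len(large))
```

The repetition lies further back than the 64 KiB dictionary can reach, so only the larger dictionary can exploit it. A raw stream carries no header, which is why the same filter list must be passed again when decompressing.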

Overview

The open-source LZMA compression library, written in C++, uses
an improved LZ77 compression algorithm as well as specific
preprocessing routines for binaries.

Instead of Huffman coding, range coding driven by adaptive,
context-dependent probability models is used for entropy coding.
The M in LZMA stands for Markov chain.
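
To illustrate how an adaptive, context-based bit model differs from a fixed Huffman table, the sketch below keeps one probability estimate per context and sums the ideal code length of a bit sequence. The 11-bit probability scale and the shift of 5 are the constants commonly cited for the reference coder; the function names and the choice of "previous bit" as context are assumptions for this example, and no actual range coder is implemented.

```python
import math

PROB_BITS = 11                      # probabilities are scaled to 0..2048
PROB_INIT = 1 << (PROB_BITS - 1)    # start with P(bit == 0) = 0.5
MOVE_BITS = 5                       # adaptation speed

def update(prob, bit):
    """Return the new estimate of P(bit == 0) after observing `bit`."""
    if bit == 0:
        return prob + (((1 << PROB_BITS) - prob) >> MOVE_BITS)
    return prob - (prob >> MOVE_BITS)

def ideal_cost(bits):
    """Ideal code length in bits, using the previous bit as context."""
    probs = [PROB_INIT, PROB_INIT]  # one estimate per context (prev bit 0 or 1)
    prev = 0
    total = 0.0
    for bit in bits:
        p0 = probs[prev] / (1 << PROB_BITS)
        total += -math.log2(p0 if bit == 0 else 1.0 - p0)
        probs[prev] = update(probs[prev], bit)
        prev = bit
    return total

print(ideal_cost([0] * 1000))       # far fewer than 1000 bits
```

Because each estimate moves toward the bits actually seen in its context, predictable input costs much less than one bit per symbol, something a Huffman code cannot achieve for a two-symbol alphabet.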

Literals, repeated-sequence lengths and repeated-sequence
locations (distances) appear to be modeled and coded separately.

Other concepts used include hash chains, binary trees and
Patricia tries, which serve as match finders for locating
repeated sequences.
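
A hash chain can be sketched as follows: a table maps the first three bytes at a position to the most recent earlier position with the same bytes, and a second array links each position to the previous one with that key. This is a minimal greedy LZ77-style tokenizer written for illustration only; the names, the 3-byte key and the chain limit are assumptions and do not reflect the 7-Zip source.

```python
MIN_MATCH = 3

def tokenize(data, max_chain=32):
    """Split data into ('literal', byte) and ('match', distance, length) tokens."""
    head = {}                   # 3-byte key -> most recent position
    prev = [-1] * len(data)     # position -> previous position with the same key
    tokens = []

    def insert(i):
        if i + MIN_MATCH <= len(data):
            key = data[i:i + MIN_MATCH]
            prev[i] = head.get(key, -1)
            head[key] = i

    pos = 0
    while pos < len(data):
        best_dist, best_len = 0, 0
        cand = head.get(data[pos:pos + MIN_MATCH], -1) if pos + MIN_MATCH <= len(data) else -1
        steps = 0
        while cand >= 0 and steps < max_chain:   # walk the chain, newest first
            length = 0
            while pos + length < len(data) and data[cand + length] == data[pos + length]:
                length += 1
            if length > best_len:
                best_dist, best_len = pos - cand, length
            cand = prev[cand]
            steps += 1
        if best_len >= MIN_MATCH:
            tokens.append(("match", best_dist, best_len))
            step = best_len
        else:
            tokens.append(("literal", data[pos]))
            step = 1
        for i in range(pos, pos + step):         # index every covered position
            insert(i)
        pos += step
    return tokens

def detokenize(tokens):
    """Rebuild the original bytes from the token list."""
    out = bytearray()
    for tok in tokens:
        if tok[0] == "literal":
            out.append(tok[1])
        else:
            _, dist, length = tok
            for _ in range(length):
                out.append(out[-dist])           # may overlap the bytes being written
    return bytes(out)

sample = b"abcabcabcabcxyz"
assert detokenize(tokenize(sample)) == sample
```

Binary trees and Patricia tries fill the same role with different trade-offs between search speed, memory use and match quality.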

BCJ / BCJ2 Binary file compression

The LZMA SDK comes with the BCJ / BCJ2 preprocessors included.
For x86, ARM, PowerPC (PPC), IA64 and ARM Thumb code,
jump targets are normalized before compression. For x86, this
means that near jumps, near calls and near conditional jumps
(but not short jumps and short conditional jumps) are converted
from the machine-language "jump 1655 bytes backward" style
notation to a normalized "jump to address 5554" style notation.
Several calls to the same routine then contain identical address
bytes, which gives the dictionary coder more repeated sequences.
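
The idea can be sketched for x86 CALL (opcode E8) and JMP (opcode E9) instructions with 32-bit relative operands. This is a simplified illustration, not the SDK's filter: it treats every E8 or E9 byte as an instruction start and omits the heuristics a real filter uses to avoid converting data bytes. The function name and the base parameter are assumptions.

```python
import struct

def x86_filter(code, encode=True, base=0):
    """Convert rel32 operands of E8/E9 to absolute addresses, or back."""
    buf = bytearray(code)
    i = 0
    while i + 5 <= len(buf):
        if buf[i] in (0xE8, 0xE9):              # CALL rel32 / JMP rel32
            (value,) = struct.unpack_from("<I", buf, i + 1)
            next_ip = base + i + 5              # displacements count from the next instruction
            if encode:
                value = (value + next_ip) & 0xFFFFFFFF
            else:
                value = (value - next_ip) & 0xFFFFFFFF
            struct.pack_into("<I", buf, i + 1, value)
            i += 5                              # skip the operand just converted
        else:
            i += 1
    return bytes(buf)

# Two calls to address 0x1000, located at offsets 0x00 and 0x10.
code = (b"\xE8" + struct.pack("<I", 0x1000 - 0x05) +
        b"\x90" * 11 +
        b"\xE8" + struct.pack("<I", 0x1000 - 0x15))
encoded = x86_filter(code, encode=True)
assert encoded[1:5] == encoded[17:21]           # both operands are now 0x1000
assert x86_filter(encoded, encode=False) == code
```

Before filtering, the two operands differ (0x0FFB and 0x0FEB); afterwards both instructions are byte-for-byte identical. The transform is reversible because the decoder scans the same opcode positions as the encoder.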

While 7-Zip BCJ2 assumes 32-bit displacements (addresses), the
UPX executable file compressor, for example, can also use 16-bit
values when it detects 16-bit DOS binary file formats.
The RAR compressor uses displacement compression for 32-bit
x86 executables and IA64 (Itanium) executables.

The difference between BCJ and BCJ2 is that BCJ only
translates near jump / call targets to their normalized form,
whereas BCJ2 (x86 only) compresses near jump, near call and
conditional near jump targets separately.

Implementation (7-Zip) specific comments

The reference implementation, which is available under the GNU LGPL, has the following properties:

The decompression code for LZMA is around 5 KB, and the dynamic memory needed during decompression is modest (it depends on the dictionary size). These features make the decompression phase of the algorithm well suited to embedded applications.

Unfortunately, the use of Microsoft Windows specific features is deeply buried in the source code, which makes it very difficult to create a Unix-compatible version. However, there are two working ports to Unix-like platforms. p7zip is a more-or-less complete port of the 7z and 7za command-line versions of 7-Zip for POSIX systems (Linux, Solaris, OpenBSD, FreeBSD, Cygwin, ...), Mac OS X and BeOS. LZMA Unix Port is a port of only the LZMA code, used to create a stream-based compression utility similar to gzip. This tool is not an archiving utility, so its format is a plain one (and not equivalent to a raw LZMA stream from 7-Zip, due to a missing UInt64 specifying the uncompressed file size at the end of the header). 7-Zip uses a more flexible archive format, 7z, and thus neither tool can use the files the other creates, at least for now.
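
The header difference mentioned above can be made concrete with a small parser. The layout assumed here is the one commonly documented for standalone LZMA streams: one properties byte encoding lc, lp and pb, a 32-bit little-endian dictionary size, and optionally a 64-bit little-endian uncompressed size. The function name and the has_size switch are hypothetical and only serve this sketch.

```python
import struct

def parse_lzma_header(data, has_size=True):
    """Decode the properties byte, dictionary size and optional size field."""
    props, dict_size = struct.unpack_from("<BI", data, 0)
    lc = props % 9              # literal context bits
    lp = (props // 9) % 5       # literal position bits
    pb = props // 45            # position bits
    size = None
    if has_size:
        (size,) = struct.unpack_from("<Q", data, 5)   # UInt64 uncompressed size
    return {"lc": lc, "lp": lp, "pb": pb,
            "dict_size": dict_size, "uncompressed_size": size}

# A hand-built header: props 0x5D, 8 MiB dictionary, 1234 bytes uncompressed.
header = bytes([0x5D]) + struct.pack("<I", 1 << 23) + struct.pack("<Q", 1234)
print(parse_lzma_header(header))
print(parse_lzma_header(header[:5], has_size=False))
```

A reader expecting the 13-byte form would misinterpret the first eight bytes of compressed data as a size when given the shorter form, which is one reason the two kinds of file are not interchangeable.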

There is a Mac OS X port of 7-Zip called Compress (not related to the old archiving format of the same name), but it is buggy at best.

The PyLZMA Python Wrapper supports compression and decompression on the Windows and Linux platforms.

Some embedded DSL/wireless router devices (like the US Robotics 9105 and 9106) run a modified version of Linux (source code is available on the USR website; it apparently comes from Broadcom) that boots from a filesystem which is basically cramfs, modified to use LZMA compression instead of zlib. Cramfs is a read-only filesystem, like ISO 9660, the standard compact disc filesystem. These devices seem to use a thick layer of glue code around the reference decompression code. Modified cramfs tools are available to deal with such LZMA cramfs filesystem images.


This article is from Wikipedia. All text is available
under the terms of the GNU Free Documentation License.
© 2008 Chamas Enterprises Inc.