Python 2 pyliblzma based decompressing wrapper for lzma/xz streams

The lzma.LZMAFile class from pyliblzma for Python 2 can only be constructed from a file path, not from a file-like object that supports read() or write(). If you have a file-like object representing an lzma/xz stream in Python 2 and want to decompress it in a streaming fashion, here is an example implementation that uses the lzma.LZMADecompressor class, also from pyliblzma.

    import lzma  # provided by pyliblzma on Python 2


    class LZMADecompressing(object):  # pylint: disable=R0903
        """A decompressing wrapper for an lzma/xz stream."""

        def __init__(self, lzma_stream, buffer_size=10240):
            """lzma_stream is not closed by this wrapper."""
            self.lzma_stream = lzma_stream
            self.buffer_size = buffer_size
            self.decompressor = lzma.LZMADecompressor()
            self.finished = False

        def read(self, size):
            """Conventional read() for a stream/file object.

            size must be a positive integer. Fewer than size bytes are
            returned only when the underlying stream is exhausted.
            """
            if self.finished:
                return ''
            decompressor = self.decompressor
            # First drain compressed input left over from the previous call.
            chunk = decompressor.decompress(decompressor.unconsumed_tail, size)
            _len = len(chunk)
            if _len == size:  # leftover already meets the request, return it.
                return chunk
            # Otherwise we need to feed in more from lzma_stream.
            chunks = [chunk]
            while True:
                lzma_chunk = self.lzma_stream.read(self.buffer_size)
                if not lzma_chunk:  # nothing left, what we have is all.
                    if decompressor.flush():  # should not yield any data.
                        raise AssertionError('unexpected data from flush()')
                    self.finished = True
                    break
                # Feed in new data, asking only for the bytes still missing.
                chunk = decompressor.decompress(
                    decompressor.unconsumed_tail + lzma_chunk,
                    size - _len)
                chunks.append(chunk)
                _len += len(chunk)
                if _len == size:
                    break
            return ''.join(chunks)

The decompress(data, max_length) method of an lzma.LZMADecompressor object consumes compressed data, which can be just a chunk read from an lzma/xz stream, returns at most max_length bytes of decompressed data, and saves any compressed data it has not processed yet in an attribute named unconsumed_tail on the object. On every call, we concatenate unconsumed_tail with new data from the stream and pass the result as the data parameter, so that the stream is consumed continuously and no input is dropped. There is another attribute named unused_data. It holds any extra data found in the stream after the end of the lzma/xz data, and it can simply be ignored if the stream is supposed to contain nothing but lzma/xz data.
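The unconsumed_tail pattern can be tried out without pyliblzma. The sketch below uses zlib.decompressobj() from the standard library as a stand-in, because it exposes the same decompress(data, max_length), unconsumed_tail and unused_data names. That pyliblzma behaves the same way is an assumption here, and the helper name read_in_pieces and the sizes are made up for illustration.

```python
import io
import zlib


def read_in_pieces(stream, max_length=16, buffer_size=8):
    """Decompress a zlib stream piecewise, carrying unconsumed_tail forward.

    zlib stands in for pyliblzma here (assumption: same semantics).
    """
    decompressor = zlib.decompressobj()
    pieces = []
    while True:
        # Leftover compressed input from the previous call goes first,
        # followed by fresh input read from the stream.
        data = decompressor.unconsumed_tail + stream.read(buffer_size)
        if not data:
            break
        # Each call returns at most max_length bytes of decompressed data.
        pieces.append(decompressor.decompress(data, max_length))
    # Collect any output that is still buffered inside the decompressor.
    pieces.append(decompressor.flush())
    return pieces


payload = b"streamed decompression " * 20
pieces = read_in_pieces(io.BytesIO(zlib.compress(payload)))

# Bytes that follow the end of the compressed data land in unused_data.
one_shot = zlib.decompressobj()
one_shot.decompress(zlib.compress(b"abc") + b"JUNK")
trailing = one_shot.unused_data
```

Joining the pieces gives back the original payload, and every piece returned by decompress() respects the max_length cap. Only the final flush() result is not bounded by it.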
