How to store sequence of bits as a bit stream and use the least possible memory ?

16 visualizaciones (últimos 30 días)
I am implementing a huffman encoding procedure which for example encodes the decimal number -5 as '100010'. Until now I have implemented some code which correctly produces the encoded sequence. The encoded sequence is stored as a string. However, I want the bit sequence to be stored as a bit stream and use only the neccesary amount of space for storage. For example, I want the encoded sequence for -5, which is '100010', to take 6 bits of memory. Is that possible ?
  1 comentario
Guillaume
Guillaume el 10 de En. de 2020
Note that in all modern processors and certainly all that run matlab (x86/x64) the minimum unit of memory is 8 bits and everything is in multiple of 8.

Iniciar sesión para comentar.

Respuestas (3)

Matt J
Matt J el 10 de En. de 2020
Editada: Matt J el 10 de En. de 2020
You could chain all your strings together as a logical column vector and use bwpack, see
Example:
>> A='10001010001010001011010100011000100010100010';
>> B=bwpack((A-'0').');
>> whos A B
Name Size Bytes Class Attributes
A 1x44 88 char
B 2x1 8 uint32

Stephane Dauvillier
Stephane Dauvillier el 10 de En. de 2020
You can use the functions:
  1. dec2bin to tranform decimal number into binary,
  2. string to transform the output of the dec2bin from char to string
  3. regexprep to replace the first 0 by empty
  4. strjoin to join everything
For instance:
strjoin(regexprep(string(dec2bin([66,8,19,7])),"^0+","","once"),"")

Guillaume
Guillaume el 10 de En. de 2020
A bit similar to Matt's suggestion, but may be slightly more memory efficient since bwpack uses multiples of 32-bit, is to:
  • split your stream into block of 8-bits
  • pad the last block to 8 bits if shorter than 8.
  • convert each block to uint8, the smallest unit of storage on x86/x64
eg.
bitstream = char(randi(double('01'), 1, 19)) %random stream of 19 0-1 characters
encoded = bitstream - '0'; %convert to binary vector. Uses 4 times as much memory as bitstream
encoded = [encoded, zeros(1, mod(-numel(bitstream), 8))]; %pad with zeros at the to multiple of 8
encoded = uint8(sum(reshape(encoded, 8, []) .* 2.^(7:-1:0)')) %convert each sequence of 8 bits into 8-bit integer. 1st bit is MSB
You can easily convert back to the (padded) bitstream:
padded_origstream = reshape(dec2bin(encoded, 8)', 1, [])

Categorías

Más información sobre Large Files and Big Data en Help Center y File Exchange.

Etiquetas

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by