LZ4 Coding

Sep 12, 2013 at 5:16 PM
Edited Sep 12, 2013 at 7:24 PM
Hey,

I am trying to reduce the size of a string to below 150 bytes, and I am trying to use LZ4 encoding/stream for this.

Can you please guide me on using LZ4 encoding?

The strings I am trying to shrink are around 200 - 600 bytes long.

If LZ4 is not the right tool for this problem, can you suggest any other techniques that perform well?
Coordinator
Sep 13, 2013 at 4:32 PM
Edited Sep 16, 2013 at 10:03 AM
I don't think LZ4 would be effective on strings this short. LZ4 is dictionary compression (LZ family), so it only removes repetition. What you need is some entropy coding, arithmetic or Huffman, because strings usually use a reduced alphabet. DeflateStream uses Huffman coding (on top of LZ77), so you can use it.
Important: the DeflateStream implementation in .NET 4 performs poorly on small samples of data. I would make sure to use DeflateStream from DotNetZip, or target .NET 4.5 (they fixed it in 4.5).
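// Required namespaces:
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

const string lorem =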
    "Lorem ipsum dolor sit amet, consectetur adipisicing elit, " +
    "sed do eiusmod tempor incididunt ut labore et dolore magna " +
    "aliqua. Ut enim ad minim veniam, quis nostrud exercitation " +
    "ullamco laboris nisi ut aliquip ex ea commodo consequat. " +
    "Duis aute irure dolor in reprehenderit in voluptate velit " +
    "esse cillum dolore eu fugiat nulla pariatur. Excepteur sint " +
    "occaecat cupidatat non proident, sunt in culpa qui officia " +
    "deserunt mollit anim id est laborum.";

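string Compress(string text)
{
    using (var mstream = new MemoryStream())
    {
        using (var zstream = new DeflateStream(mstream, CompressionMode.Compress))
        // UTF8Encoding(false) skips the 3-byte BOM that Encoding.UTF8 would make StreamWriter emit.
        using (var twriter = new StreamWriter(zstream, new UTF8Encoding(false)))
        {
            twriter.Write(text);
        }
        // The deflate stream must be closed (flushed) before the buffer is read.
        return Convert.ToBase64String(mstream.ToArray());
    }
}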

string Decompress(string base64)
{
    using (var mstream = new MemoryStream(Convert.FromBase64String(base64)))
    using (var zstream = new DeflateStream(mstream, CompressionMode.Decompress))
    using (var treader = new StreamReader(zstream))
    {
        return treader.ReadToEnd();
    }
}

void Main()
{
    var original = lorem + lorem + lorem + lorem + lorem;
    var compressed = Compress(original);
    Console.WriteLine("{0}:{1}", compressed.Length, compressed);
    var decompressed = Decompress(compressed);
    Console.WriteLine("{0}:{1}", decompressed.Length, decompressed);
}
If you don't need a string as output (byte[] is sufficient), remove the Base64 encoding.
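For the byte[] case, a minimal sketch could look like the one below. The class name RawDeflate and the field Utf8NoBom are made up for illustration; only DeflateStream, MemoryStream and UTF8Encoding come from the framework. It assumes the text is encoded as UTF-8 without a byte order mark, so no bytes are wasted on a preamble.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper: same idea as Compress/Decompress above, minus Base64.
static class RawDeflate
{
    // Assumption: UTF-8 without a BOM, so short inputs do not pay a 3-byte preamble.
    static readonly Encoding Utf8NoBom = new UTF8Encoding(false);

    public static byte[] Compress(string text)
    {
        var raw = Utf8NoBom.GetBytes(text);
        using (var mstream = new MemoryStream())
        {
            using (var zstream = new DeflateStream(mstream, CompressionMode.Compress))
            {
                zstream.Write(raw, 0, raw.Length);
            }
            // The deflate stream is closed here, so all output has been flushed.
            // MemoryStream.ToArray still works after the stream is closed.
            return mstream.ToArray();
        }
    }

    public static string Decompress(byte[] data)
    {
        using (var mstream = new MemoryStream(data))
        using (var zstream = new DeflateStream(mstream, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            zstream.CopyTo(output);
            return Utf8NoBom.GetString(output.ToArray());
        }
    }
}
```

Calling RawDeflate.Compress(original).Length and comparing it with the length of the Base64 string from the earlier Compress shows how much the text encoding step costs, since Base64 turns every 3 bytes into 4 characters.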
If you do need a string and want an even smaller result, search for Ascii85 encoding, which expands the data by about 25% instead of the 33% of Base64.
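As far as I know there is no built-in Ascii85 encoder in the .NET base class library, so here is a rough sketch of one. The class name Ascii85 and its two methods are hypothetical. It implements only the core scheme (4 bytes become 5 characters in the range '!' to 'u') and deliberately leaves out the optional 'z' shortcut for all-zero groups, the <~ ~> delimiters, whitespace handling and input validation.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of the .NET base class library.
static class Ascii85
{
    public static string Encode(byte[] data)
    {
        var sb = new StringBuilder((data.Length + 3) / 4 * 5);
        var chunk = new char[5];
        for (int i = 0; i < data.Length; i += 4)
        {
            int count = Math.Min(4, data.Length - i);
            // Pack up to 4 bytes big-endian; a short final group is padded with zeros.
            uint value = 0;
            for (int j = 0; j < 4; j++)
            {
                uint b = j < count ? data[i + j] : 0u;
                value = (value << 8) | b;
            }
            // Write the value as 5 base-85 digits, most significant first.
            for (int j = 4; j >= 0; j--)
            {
                chunk[j] = (char)('!' + value % 85);
                value /= 85;
            }
            // A final group of n bytes emits only n + 1 characters.
            sb.Append(chunk, 0, count + 1);
        }
        return sb.ToString();
    }

    public static byte[] Decode(string text)
    {
        // Assumes well-formed input produced by Encode above (no validation).
        var output = new List<byte>(text.Length / 5 * 4 + 4);
        for (int i = 0; i < text.Length; i += 5)
        {
            int count = Math.Min(5, text.Length - i);
            uint value = 0;
            for (int j = 0; j < 5; j++)
            {
                // A short final group is padded with the highest digit ('u' = 84).
                uint digit = j < count ? (uint)(text[i + j] - '!') : 84u;
                value = value * 85 + digit;
            }
            // A final group of n characters yields n - 1 bytes.
            for (int j = 0; j < count - 1; j++)
            {
                output.Add((byte)((value >> (24 - 8 * j)) & 0xFF));
            }
        }
        return output.ToArray();
    }
}
```

With this in place, Ascii85.Encode(compressedBytes) would take the place of Convert.ToBase64String in Compress, and Ascii85.Decode the place of Convert.FromBase64String in Decompress. Keep in mind the output contains characters such as quotes and backslashes, so it is less convenient than Base64 if the string has to be embedded in JSON, XML or URLs.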