LZ4Codec.Encode64(pageBytes, 0, pageBytes.Length); with small byte arrays

Jun 10, 2015 at 1:47 AM
Edited Jun 10, 2015 at 11:39 PM
I see encoding failures when the byte array is shorter than a certain size (roughly 100 bytes). Is this expected, and if so, why? (I'm using the pure safe version.)

I worked around it by skipping compression for small byte arrays, with a condition like:
  if (pageBytes.Length > s_lz4MinSize)
  {
    pageBytes = LZ4Codec.Encode64(pageBytes, 0, pageBytes.Length);
    Debug.Assert(pageBytes != null);
  }
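One consequence of the workaround above is that the reader of the data has to know whether a given page was compressed. A minimal sketch of one way to record that, assuming a one-byte flag prefix: the class name, the flag layout, and the 100-byte threshold are all hypothetical, and the `compress` and `decompress` delegates stand in for whatever codec calls are used (for example `LZ4Codec.Encode64`), so nothing here is taken from lz4net itself.

```csharp
using System;

static class SmallPageCodec
{
    // Hypothetical threshold; the thread only says "around 100".
    const int MinCompressSize = 100;

    // 'compress' is a stand-in for the real codec call.
    public static byte[] Pack(byte[] input, Func<byte[], byte[]> compress)
    {
        byte[] body = input;
        byte flag = 0; // 0 = stored raw, 1 = compressed

        if (input.Length > MinCompressSize)
        {
            byte[] packed = compress(input);
            // Keep the compressed form only if it is actually smaller.
            if (packed != null && packed.Length < input.Length)
            {
                body = packed;
                flag = 1;
            }
        }

        // Prefix the payload with the flag byte.
        byte[] result = new byte[body.Length + 1];
        result[0] = flag;
        Buffer.BlockCopy(body, 0, result, 1, body.Length);
        return result;
    }

    // 'decompress' is a stand-in for the matching decode call.
    public static byte[] Unpack(byte[] stored, Func<byte[], byte[]> decompress)
    {
        byte[] body = new byte[stored.Length - 1];
        Buffer.BlockCopy(stored, 1, body, 0, body.Length);
        return stored[0] == 1 ? decompress(body) : body;
    }
}
```

The extra byte costs a little space on every page, but it also covers larger inputs that happen not to compress. A real decoder would additionally need the original length, which this sketch leaves to the `decompress` delegate.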
Coordinator
Jul 28, 2015 at 2:44 PM
I'm not sure what you mean by failure. Do you mean that it needs more bytes than your input data and complains about it? Yes, that is possible. Not all data can be compressed, especially small chunks.
There is a function that calculates how much space the output may take:
public static int MaximumOutputLength(int inputLength)
{
  return inputLength + (inputLength / 255) + 16;
}
If, on the other hand, you are saying that the safe version fails where the unsafe one works, that is something to look at. Can you provide sample data?
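To make the worst-case formula above concrete, here is a small sketch that reuses the same arithmetic and reports the overhead it allows for. The wrapper class and the `WorstCaseOverhead` helper are hypothetical names added for illustration; only the formula itself comes from the post.

```csharp
static class BoundDemo
{
    // Same formula as quoted in the thread.
    public static int MaximumOutputLength(int inputLength)
    {
        return inputLength + (inputLength / 255) + 16;
    }

    // Hypothetical helper: extra bytes the bound allows beyond the input size.
    public static int WorstCaseOverhead(int inputLength)
    {
        return MaximumOutputLength(inputLength) - inputLength;
    }
}
```

For a 10-byte input the bound is 26 bytes and for 100 bytes it is 116, so the fixed 16-byte term dominates for small inputs. That fits the observation that small arrays are rarely worth compressing.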
Jul 29, 2015 at 2:30 AM
I switched to unsafe mode a while back. I tried to reproduce this issue tonight, but so far I can't. The function I used from your lib internally uses MaximumOutputLength, so it should work, but as I recall the output could end up even larger than MaximumOutputLength would give you for certain small byte arrays. I put in code to always skip compression if the byte array is < 100 bytes, since compression normally would not reduce the size in such cases anyway.

All of this is working fine now. I am very happy with this lib; LZ4 compression is really great. Almost the same speed as uncompressed data, but with space reduction!

See it at work yourself in VelocityDB (www.VelocityDB.com).

I saw that the original C code was updated to be even faster in the last year. Any plans to update the C# version to be compatible with this improved LZ4?
Coordinator
Jul 29, 2015 at 9:02 PM
There was a problem with Encode where it could throw an IndexOutOfRangeException.
I fixed it, and the fix is in the new repo on GitHub: https://github.com/MiloszKrajewski/lz4net
I'm not sure if it was your problem, so I don't know if it fixes it, but you can try.
It has not been released as a NuGet package yet, as I have not done sanity testing.

The new C version has some things which make porting complicated. Yann Collet (the author) changed the original code to use many small methods. It is, of course, the right thing to do, but in .NET it kills performance unless you use aggressive inlining, which is not available in .NET 4. Requiring a newer framework would mean no more XP compatibility.

That's kind of a big change, and I'm not convinced I should do it yet.
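For readers unfamiliar with the aggressive inlining mentioned above: in .NET Framework 4.5 and later it is requested per method with `MethodImplOptions.AggressiveInlining`. A minimal sketch follows; the `Read32` helper is a hypothetical example of the kind of small method involved and is not taken from lz4net or from the C source.

```csharp
using System.Runtime.CompilerServices;

static class InliningSketch
{
    // Hypothetical small helper: reads a little-endian 32-bit value.
    // The attribute asks the JIT to inline the call where possible;
    // it is a hint, not a guarantee.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static uint Read32(byte[] buffer, int offset)
    {
        return unchecked((uint)(buffer[offset]
            | (buffer[offset + 1] << 8)
            | (buffer[offset + 2] << 16)
            | (buffer[offset + 3] << 24)));
    }
}
```

Without the hint the JIT decides on its own whether to inline, and a hot loop built from many small calls may pay call overhead on each one.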