Friday, June 17, 2011

Return of The Multiple Length String of "a"s, Part II

Ok, much improved.  The full battery of tests (all 966 of them) now takes only 20 minutes instead of 60, and the throughput is now:

256-bit: 778 MB takes 44-45 seconds
512-bit: 778 MB takes 44-45 seconds
1024-bit: 778 MB takes 51-52 seconds

This is a DRASTIC improvement, but I still feel like I can do better.  Basically what I did was go back to good ol' C++ routines for the cores of the ThreeFish functions, and unroll EVERYTHING!  Right now only MIX and INVMIX are actual functions; wherever there would normally be a loop, I wrote out the individual steps by hand and removed as many array/pointer look-ups as possible.
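To give a feel for what "unrolling" means here, the following is a minimal sketch and not the code from my library.  It runs two rounds over a 4-word state, once with loops and table look-ups and once written out by hand.  The names (Mix, TwoRoundsLooped, TwoRoundsUnrolled) are hypothetical, key injection is left out entirely, and the rotation amounts and word permutation are illustrative values that should be checked against the Skein/ThreeFish specification before being relied on.

```cpp
#include <cstdint>

// MIX step: add, rotate left, xor.  r must be in 1..63 so that neither
// shift below is a shift by 64 (which is undefined in C++).
static inline void Mix(std::uint64_t& x0, std::uint64_t& x1, unsigned r) {
    x0 += x1;
    x1 = (x1 << r) | (x1 >> (64 - r));
    x1 ^= x0;
}

// Looped form: rotation amounts and the word permutation come from tables.
void TwoRoundsLooped(std::uint64_t v[4]) {
    static const unsigned kRot[2][2] = {{14, 16}, {52, 57}};  // illustrative
    static const unsigned kPerm[4] = {0, 3, 2, 1};            // illustrative
    for (int round = 0; round < 2; ++round) {
        Mix(v[0], v[1], kRot[round][0]);
        Mix(v[2], v[3], kRot[round][1]);
        std::uint64_t t[4];
        for (int i = 0; i < 4; ++i) t[i] = v[kPerm[i]];
        for (int i = 0; i < 4; ++i) v[i] = t[i];
    }
}

// Unrolled form: the state lives in locals, the constants are literals,
// and the permutation is folded into the order of the Mix arguments.
void TwoRoundsUnrolled(std::uint64_t v[4]) {
    std::uint64_t a = v[0], b = v[1], c = v[2], d = v[3];
    Mix(a, b, 14);  Mix(c, d, 16);   // round 0 on (a,b) and (c,d)
    Mix(a, d, 52);  Mix(c, b, 57);   // round 1: words 1 and 3 have swapped
    v[0] = a; v[1] = b; v[2] = c; v[3] = d;  // second swap restores the order
}
```

Both functions compute the same result for the same input; the unrolled one just has no loop counters, no table reads, and no temporary array for the compiler to optimize away.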

With a few compiler tweaks, I might be able to squeeze more performance out.  Also, as I've discovered in testing, my memory usage needs to improve, and I need to make sure that my C++ routines zero out memory properly when they are done with their local buffers.  I've already done some of that in this version (which I will push up to CodePlex later today).
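On the zeroing point, here is a rough sketch of the kind of helper I mean.  WipeBuffer is a hypothetical name, not something from my library.  The idea is to write through a volatile pointer so the compiler is less likely to drop the stores as "dead" when the buffer is about to go out of scope.  On Windows, SecureZeroMemory is meant for the same job.

```cpp
#include <cstddef>

// Overwrite n bytes at p with zeros.  The volatile-qualified pointer is
// there so the writes are not optimized away the way a plain memset on a
// soon-to-be-dead local buffer can be.
void WipeBuffer(void* p, std::size_t n) {
    volatile unsigned char* b = static_cast<volatile unsigned char*>(p);
    while (n--) {
        *b++ = 0;
    }
}
```

A routine with a local working array would call something like WipeBuffer(block, sizeof block) as the last thing it does before returning.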

NOTE (landmine): For whatever reason, my C++ routines do NOT use the intrinsic _rotl64 and _rotr64 functions; they don't even seem to work for me under Visual Studio 2010 SP1 with .NET 4.  I had to write out each rotate as two shifts:

*Y = (UInt64)(*Y << N) + (UInt64)(*Y >> (64 - N)); //ROTATE LEFT
*Y = (UInt64)(*Y >> N) + (UInt64)(*Y << (64 - N)); //ROTATE RIGHT
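For anyone who wants this as a reusable helper, here is a small sketch of the same two-shift rotate written as standalone functions.  RotL64 and RotR64 are hypothetical names.  Two differences from the lines above: it combines the halves with | instead of + (the two halves never share a set bit when N is between 1 and 63, so the result is the same), and it masks the count so that a rotate by 0 does not turn into a shift by 64, which is undefined in C++.

```cpp
#include <cstdint>

// Rotate x left by n bits.  The count is reduced mod 64, and the second
// shift count is masked so n == 0 never produces a shift by 64.
inline std::uint64_t RotL64(std::uint64_t x, unsigned n) {
    n &= 63;
    return (x << n) | (x >> ((64 - n) & 63));
}

// Rotate x right by n bits, same masking as above.
inline std::uint64_t RotR64(std::uint64_t x, unsigned n) {
    n &= 63;
    return (x >> n) | (x << ((64 - n) & 63));
}
```

With these, the rotate in MIX would read *Y = RotL64(*Y, N); and the one in INVMIX would read *Y = RotR64(*Y, N);.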

Whether or not the intrinsic functions would give me any kind of speed boost I have no idea.  If I "_forceinline" these functions I might even get a tad more speed, but at 20 minutes for a full battery of tests, I'm going to just push up this version for now and call it a small victory.

[UPDATE:  Ok, _forceinline is a BAD idea.  It takes 10 times as long to build and actually reduces performance compared to not using it, so don't use _forceinline on the mix functions.]

I am consistently getting the same hash results for all my tests, and all my code still passes the NIST Known Answer Tests (much faster, might I add), so the outputs that I posted previously should still be valid for checking any implementation, if you are so inclined to use them.
