Friday, May 20, 2011

Skein as a Crypto Hash (part two)


The SHA-3 competition that NIST is running is coming to a close next year (2012).  Only 5 candidates remain, and the algorithms that have come forth, both winners and losers, have really changed the conventional thinking on what a secure hash algorithm should look like.  I say should as the field is still pretty young all things considered.  In many instances, like Skein, Grøstl, and some others, the hash algorithm is actually a block cipher that has been modified to be irreversible.

Now, there are many other things that these new algorithms bring to the table, but... to be honest... I'm rather smitten with one of them.  Skein.

Still, I encourage you to look at all of them because these last 5 do have some really interesting things going on.  Just bear in mind all white papers are designed to make your head explode.  I'm still heavily medicated from my brush with this particular white paper...

So, the punch line.  The reason I think this algorithm has a lot of traction is its flexibility.  The designers are all security and data experts and all have been around the things that make these algorithms work, and what makes them break, and the ways people use them incorrectly.

One of the big things about hash algorithms is their use in signatures and certificates, and that they have to be immune to spoofing and forgery.  SHA1 and MD5 are very prone to collisions and length-extension attacks, which make forgery possible.  So the designers of Skein came up with a system that extends on research out there already and the known problems with the older algorithms.  I haven't read the source papers, but they are referenced in the white paper.

Here's how I understand it:  Basically, every block of data should be treated uniquely, and the final output should be processed an extra time, just to be sure the function can't be length-extended.  The algorithm uses flags for each type of input, which can be stacked in one fluid process, rather than having to run through whole process for each piece individually and then set up all over again, like in traditional HMAC mode.  Each block is also processed with a counter, guaranteeing them to be unique to prevent against possible loops, and in cases where the input may contain a lot of repetitive stuff.

The heart of the algorithm lies in the ThreeFish algorithm, also defined in the white paper.  This is a function that uses very simple XOR, non-carry addition, and bit rotation, but it does these things through a permutation of the 64-bit state words (did I mention that Skein was native 64-bit throughout?), a LOT of times (72 times for the 256- and 512-bit flavors, and 80 for the 1024-bit flavor to be exact).  The simplicity and use of basic CPU functions allows for fast throughput, and the large number of rounds provides depth enough to overcome rebound and differential attacks.  The ThreeFish algorithm can actually be used for straight encryption, but it has a feed-forward flag that collapses the data onto itself through an XOR when set for use as a hash-primitive.  

It also has a 128-bit tweak that is processed alongside and independent of the key.  It's this tweak that gives Skein the ability to flag each block and provide the counters.  There's a lot of different input types to the ThreeFish cipher that are passed from the main Skein transformation flow.  I won't go into too much detail here, but there are two in particular that I think that set Skein apart.

One block type is the hash output size embedded in the configuration block.  Yes, Skein can output an arbitrary number of bits, and the length requested is part of the configuration of the transformation UP FRONT because Skein doesn't truncate outputs.  If you have two requests for simple hashes on a block of data, but one request asks for 160 bits (SHA1 replacement for example), and another calls for 161 bits (no idea why, that's a weird number, but follow along), the outputs are COMPLETELY DIFFERENT!  I mean every actual bit. The 160 bit output is not an truncation of the 161 bit output like it would be in just about every other algorithm. 

One of the other types of transform blocks is a Personalization String.  This basically means if you have Skein set up in one place for passwords, and in another place for signatures, and another for simple file hashing, you can distinguish them apart without having to compile separate assemblies.  You could have strings like:


Now even if the same data is passed into each process and the outputs are all the same length, since each is personalized, they are unique.  That way any one system that uses them can't be hijacked to process data for any the of the others.  Security hole, PLUGGED.

Anyways, what does this have to do with me...  I'll tell you tomorrow.


No comments:

Post a Comment