Monday, May 30, 2011

Skein vs. The multiple length string of "a" 's

This series of tests took 59 minutes, 4 seconds.  There are 966 tests in total.  My laptop is pretty powerful as far as laptops go, as it is meant to be a portable business desktop (Dell Precision M6400, 64-bit, stats listed below).  It doesn't do so well as a gaming computer, but it does have some graphics power.  But this isn't about games or graphics.  This is about data throughput.  These tests were all pure memory transformations.


Screen cap straight from Computer > Properties
Format: 
Bit Length State
Bit Length Output
String Length (number of 'a's)
Time (h:mm:ss.fffff)
The Resulting Hash 


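Results are Base64 encoded for accuracy and brevity.  The input strings are not declared as ASCII strings, but they are built as such.  There is the possibility that .NET is doing something to them in the background that makes them Unicode (.NET strings are UTF-16 internally, so the bytes that actually get hashed depend on the encoding used to convert them).  No tree hashing was used.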
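
For anyone comparing results from another language, here is a minimal sketch of why the encoding question matters.  It is Python rather than .NET, and it uses SHA-256 from the standard library purely as a stand-in (Skein isn't in there); the helper name hash_b64 is made up for illustration.  The point is only that the same string of 'a's gives different bytes, and therefore a different hash, depending on whether it is encoded as ASCII or UTF-16.

```python
import base64
import hashlib


def hash_b64(text: str, encoding: str) -> str:
    """Encode text to bytes explicitly, hash the bytes, return the digest as Base64."""
    data = text.encode(encoding)
    # SHA-256 is only a stand-in here; Skein is not in the standard library.
    digest = hashlib.sha256(data).digest()
    return base64.b64encode(digest).decode("ascii")


message = "a" * 8

# The same 8 characters become 8 bytes as ASCII but 16 bytes as UTF-16.
ascii_len = len(message.encode("ascii"))
utf16_len = len(message.encode("utf-16-le"))

ascii_hash = hash_b64(message, "ascii")
utf16_hash = hash_b64(message, "utf-16-le")

print(ascii_len, ascii_hash)
print(utf16_len, utf16_hash)
```

If your numbers don't match mine, the byte encoding of the input is the first thing I would check.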


The takeaway from this is:  with a 778 MB string in memory, hashing takes roughly 2 minutes flat for the 256-bit state width regardless of output size, and 2 minutes 30 seconds for 512 or 1024 bits regardless of output size.
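
To put those times in throughput terms, here is a quick back-of-the-envelope calculation.  It only uses the rounded figures above (778 MB, 120 seconds, 150 seconds), so treat the results as rough; the function name is just for illustration.

```python
def throughput_mb_per_s(size_mb: float, seconds: float) -> float:
    """Average throughput in MB per second for a single hash run."""
    return size_mb / seconds


SIZE_MB = 778  # largest test string, as quoted above

rate_256 = throughput_mb_per_s(SIZE_MB, 120)        # roughly 2 minutes flat
rate_512_1024 = throughput_mb_per_s(SIZE_MB, 150)   # roughly 2 minutes 30 seconds

print(f"256-bit state:      {rate_256:.1f} MB/s")       # 6.5 MB/s
print(f"512/1024-bit state: {rate_512_1024:.1f} MB/s")  # 5.2 MB/s
```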


Complete results after the cut.  Sorry for the small font size but this blog has a narrow text window.  I encourage all developers also working on this code or the same algorithm from any other language to compare all these results from your own implementations and to comment where there are differences, or if you have timing results for similar tests.

Sunday, May 29, 2011

Hackers strike again, Lockheed Martin latest victim

Image from Wikipedia article linked below.

Ok, so over at RSA back in March, they had a data breach (the original source linked to by HAD had gone missing shortly after HAD reported on it, and it was from an RSA press release, here is the NY Times article).  They weren't sure what was taken but it was related to their SecurID products.

Well the breachers DID apparently take something important because now over at Lockheed Martin, who use the very RSA secure tokens that "may" have been breached, there was another hack.  This comes from BoingBoing, so I'm waiting to hear from other sources to see what the full extent is, since other government contractors could also be affected, but this does not bode well for other organizations.  I know some large hospitals across the country and the World Bank use similar devices for their remote workers and some partners.

HEY BIG COMPANIES:  When you get breached, even if you think nothing was taken or the stuff was "encrypted", DO SOMETHING ABOUT IT, SOONER RATHER THAN LATER!!!   This is why California enacted laws that say you HAVE to tell people, quickly and in writing, when stuff like this happens with customer or consumer data.

I'm looking at you Sony!

Saturday, May 21, 2011

Extreme Couponing

<rant>Ok, so now that TLC is no longer the "Learning" channel, and has become the "Looney" channel, they have this new show about Food Hoarders Extreme Couponing.

So these ladies clip just about every coupon they can get their hands on, they also have membership cards for their local supermarket, and they plan this out like a military raid almost.  One even has an Excel spreadsheet to make sure every coupon counts ahead of time.  Every product, every sale, every available quantity, they meticulously count out the savings.

It doesn't matter if they already have a billion of a particular product at home already.  If it gives them 6 more for a penny, they will do it. 

This one lady bought $600+ worth of shit personal care products and groceries for $2.41!  The whole of her basement is a stash worth over the cost of a new luxury car (read $30,000+)!

I can understand getting the best deals, and I know groceries are getting expensive with fuel prices going up.  But for fuck's sake, do you really need 140 LARGE bottles of laundry detergent for a family of 4?!?!  Who does that much laundry?!?!   And to buy 6 more because you can get it for 25 cents?  That's clinical obsession, no buts about it. 

If they were donating a lot of this stuff to a food kitchen or a homeless shelter or for displaced individuals from floods or fires, I would line up to pay their 6 dollar (actual retail price $1000+) grocery hauls.  But they don't, they displace all their personal belongings and stuff every crevice of their houses for more shit groceries they already have and don't need.</rant>

Friday, May 20, 2011

SHA-3: Competition, Misnomer

So as I mentioned in my last series of posts, the National Institute of Standards and Technology (NIST) is having a competition to replace the aging Secure Hash Standard (SHA-1/SHA-2).  They are calling it, thus far, SHA-3. 

This would be the third hash standard, as indicated by the name.  At the same time, this version is going to be COMPLETELY different from the previous versions, top to bottom.  I think the name is a little misleading...

When they replaced the Data Encryption Standard with the Advanced Encryption Standard a number of years ago, they changed the name to reflect the fact that the new standard was completely different, and much much better.  I'm of the opinion that whatever they pick (*COUGH*SKEIN*COUGH*) should be called the Advanced Hashing Standard (AHS) to reflect the same.

Makes sense, right?

Skein as a Crypto Hash (part three)

(...CONT)

Ok, so I've been gushing about this new wonderful crypto hash function called Skein.  What the hell does this have to do with me?

<rant>I've been working in .NET (mostly C# lately) for several years now, and I was always frustrated by the limited scope of the built-in crypto functions.  Even though there is ample evidence that MD5 and SHA1 are horribly crippled, the folks at Microsoft refuse to give them up.

Now I know full well that something as standardized as a government mandated suite of tools takes a LONG time to roll out, and adoption is slow and drawn out.  Especially if there is no standardized replacement (which is NOT the case here, SHA2 was supposed to fix that), people are even more reluctant to invest in new hardware/software.  I get that.  But to limit the options to the old and busted stuff and adopt only a very small set of the new hotness, that's just narrow-minded.</rant>

Anyways...

I tried a number of years ago to implement many of these algorithms in C++ just to learn how they worked, and to play with them in a sandbox that I controlled.  I learned the hard way that memory management and threat mitigation are not for the faint of heart.

The .NET framework changed the game, and with VB.NET, I was no longer limited to slow, clunky, rapid application development with no meat.  I could incorporate new ideas and found the aforementioned built-in functions allowed me to do more (to a point).  I continued to develop and explore and such, and slowly worked my way into C#. 

I also came across the CryptoGram newsletter during this time, and it kept me up to date on some of the security issues of the day.

That's when I learned about the new SHA-3 competition, and Skein.  And I jumped at the opportunity to work with a new algorithm and really get my hands into it.  I read the white-paper, narrowly avoided a cranium explosion, and dug into the reference code.  Wow...  You haven't lived until you've taken a dive into advanced C++ code written by 7 industry experts by committee. 

Holy crap...

The white paper didn't explain a few things clearly but the code shed some light on most of those (some days I can read machine language better than English).

After a long battle, and comparing my results with the specified samples and a few random samples on the net that people had written in Perl, Java, Python, and others, I finally have something that I think people can actually USE.  And the best part is just about every algorithm submitted, including Skein, is in the public domain, which means they are free for any use you want (assuming of course you don't live in Libya, North Korea, or Iran, Uncle Sam's orders!).  So... yeah.  I'm releasing it into the wild.  :) 

And in case you haven't read between the lines yet, I'm plugging my CodePlex project.  :D  I figured I should at least include the back story, and I needed blog material to get me started.  Sue me.

The other thing is, I think there should be some more test data out there for others to use.  I found it hard to get any samples outside of the obvious tests of the basic functionality.  For instance: The white paper describes how to use Skein as a PRNG and as a Key Derivation Function (KDF), and it includes a few notes on how to sign keys and how to sign messages incorporating the public key used, so that the signature and the document cannot be separated.  These are some important uses and functions, but there are no samples.  So I'm releasing my version, and I encourage others to look into expanding the functionality, both for comparison and so that this particular algorithm gets more analysis in a variety of languages and situations.

Basically the designers focused on the functionality and parameters spelled out by NIST, and NIST only wanted a HASH function; that's it.  As a result, even though they addressed a number of security issues in the design, issues that REALLY need attention in the industry, they over-designed it for the venue.  I think they knew that, and thus left out any samples that detracted from the matter at hand:  hashing.  Unfortunate, but you also have to figure these guys have day-jobs too. 

There's a LOT of functionality in Skein as a whole, and thus there would be a glut of data to produce (and reproduce with every tweak through the submission process).  So, I figure the community can provide the missing pieces.

I have a nagging suspicion that if NIST does pick Skein, they might knee-cap it just to make it fit the box they wanted, not expand it or let it spill into other boxes.  What I would like to see is Skein make its way into the hashing standard AND the digital signature standard.

Anyways, that's my take.  Do what you will with it, take it or leave it.  Would also love to hear your comments.

Skein as a Crypto Hash (part two)

(...CONT)

The SHA-3 competition that NIST is running is coming to a close next year (2012).  Only 5 candidates remain, and the algorithms that have come forth, both winners and losers, have really changed the conventional thinking on what a secure hash algorithm should look like.  I say should as the field is still pretty young all things considered.  In many instances, like Skein, Grøstl, and some others, the hash algorithm is actually a block cipher that has been modified to be irreversible.

Now, there are many other things that these new algorithms bring to the table, but... to be honest... I'm rather smitten with one of them.  Skein.

Still, I encourage you to look at all of them because these last 5 do have some really interesting things going on.  Just bear in mind all white papers are designed to make your head explode.  I'm still heavily medicated from my brush with this particular white paper...

So, the punch line.  The reason I think this algorithm has a lot of traction is its flexibility.  The designers are all security and data experts and all have been around the things that make these algorithms work, and what makes them break, and the ways people use them incorrectly.

One of the big things about hash algorithms is their use in signatures and certificates, and that they have to be immune to spoofing and forgery.  SHA1 and MD5 are very prone to collisions and length-extension attacks, which make forgery possible.  So the designers of Skein came up with a system that extends on research out there already and the known problems with the older algorithms.  I haven't read the source papers, but they are referenced in the white paper.

Here's how I understand it:  Basically, every block of data should be treated uniquely, and the final output should be processed an extra time, just to be sure the function can't be length-extended.  The algorithm uses flags for each type of input, which can be stacked in one fluid process, rather than having to run through the whole process for each piece individually and then set up all over again, like in traditional HMAC mode.  Each block is also processed with a counter, which guarantees that every block is unique and protects against possible loops and against inputs that contain a lot of repetitive stuff.

The heart of the algorithm lies in the ThreeFish algorithm, also defined in the white paper.  This is a function that uses very simple XOR, addition modulo 2^64 (any carry out of the 64-bit word is simply dropped), and bit rotation, combined with a permutation of the 64-bit state words (did I mention that Skein was native 64-bit throughout?), and it repeats these a LOT of times (72 rounds for the 256- and 512-bit flavors, and 80 for the 1024-bit flavor to be exact).  The simplicity and use of basic CPU functions allows for fast throughput, and the large number of rounds provides depth enough to overcome rebound and differential attacks.  The ThreeFish algorithm can actually be used for straight encryption, but it has a feed-forward flag that collapses the data onto itself through an XOR when set for use as a hash-primitive.
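
To give a feel for how little is going on in each step, here is a rough sketch of the basic mixing operation on a pair of 64-bit words: add, rotate, XOR.  This is not my C# code and not a full ThreeFish round (no key schedule, no word permutation), and the rotation amounts passed in below are arbitrary values for illustration, not the constants from the white paper.

```python
MASK64 = (1 << 64) - 1  # keep every value inside a 64-bit word


def rotl64(x: int, r: int) -> int:
    """Rotate a 64-bit word left by r bits."""
    r %= 64
    return ((x << r) | (x >> (64 - r))) & MASK64


def rotr64(x: int, r: int) -> int:
    """Rotate a 64-bit word right by r bits."""
    return rotl64(x, 64 - (r % 64))


def mix(x0: int, x1: int, r: int) -> tuple[int, int]:
    """One add-rotate-XOR step on a pair of 64-bit words."""
    y0 = (x0 + x1) & MASK64   # addition modulo 2^64, carry out is dropped
    y1 = rotl64(x1, r) ^ y0   # rotate, then XOR with the sum
    return y0, y1


def unmix(y0: int, y1: int, r: int) -> tuple[int, int]:
    """Inverse of mix, which is what makes the cipher usable for decryption."""
    x1 = rotr64(y1 ^ y0, r)
    x0 = (y0 - x1) & MASK64
    return x0, x1


# Round trip with an arbitrary (hypothetical) rotation amount.
a, b = 0x0123456789ABCDEF, 0xFEDCBA9876543210
assert unmix(*mix(a, b, 14), 14) == (a, b)
```

Every step is reversible on its own, which is why the same primitive works as a cipher; the one-way behavior needed for hashing comes from the feed-forward XOR described above.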

It also has a 128-bit tweak that is processed alongside and independent of the key.  It's this tweak that gives Skein the ability to flag each block and provide the counters.  There's a lot of different input types to the ThreeFish cipher that are passed from the main Skein transformation flow.  I won't go into too much detail here, but there are two in particular that I think that set Skein apart.
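
Here is a rough sketch of how that 128-bit tweak can be packed.  It assumes the field layout as I read it in the Skein paper (a 96-bit position counter in the low bits, a 6-bit block type at bits 120 to 125, and "first" and "final" flags at bits 126 and 127) and skips the tree-level and bit-pad fields entirely.  The names make_tweak and BlockType are mine, so check the layout against the paper before relying on it.

```python
from enum import IntEnum

MASK64 = (1 << 64) - 1


class BlockType(IntEnum):
    """Block type codes (assumed from the Skein paper; only a subset shown)."""
    KEY = 0
    CONFIG = 4
    PERSONALIZATION = 8
    MESSAGE = 48
    OUTPUT = 63


def make_tweak(position: int, block_type: BlockType,
               first: bool = False, final: bool = False) -> int:
    """Pack the byte counter, block type, and first/final flags into 128 bits."""
    if not 0 <= position < (1 << 96):
        raise ValueError("position must fit in 96 bits")
    tweak = position                  # bits 0-95: bytes processed so far
    tweak |= int(block_type) << 120   # bits 120-125: what kind of block this is
    if first:
        tweak |= 1 << 126             # bit 126: first block of this input type
    if final:
        tweak |= 1 << 127             # bit 127: last block of this input type
    return tweak


def tweak_words(tweak: int) -> tuple[int, int]:
    """Split the 128-bit tweak into the two 64-bit words the cipher consumes."""
    return tweak & MASK64, tweak >> 64


# Two message blocks with identical contents still get different tweaks,
# because the position counter has moved on.
t1 = make_tweak(64, BlockType.MESSAGE, first=True)
t2 = make_tweak(128, BlockType.MESSAGE)
print(hex(t1), hex(t2))
```

This is where the per-block counter and the input-type flags actually live: the key and the data can be identical from one block to the next, and the tweak still makes each call to the cipher different.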

One block type is the hash output size embedded in the configuration block.  Yes, Skein can output an arbitrary number of bits, and the length requested is part of the configuration of the transformation UP FRONT because Skein doesn't truncate outputs.  If you have two requests for simple hashes on a block of data, but one request asks for 160 bits (SHA1 replacement for example), and another calls for 161 bits (no idea why, that's a weird number, but follow along), the outputs are COMPLETELY DIFFERENT!  I mean every actual bit.  The 160 bit output is not a truncation of the 161 bit output like it would be in just about every other algorithm.
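
You can see the same idea with a hash that ships in Python's standard library.  This sketch uses BLAKE2b, NOT Skein, as a stand-in: BLAKE2b also mixes the requested digest size into its setup, although it only works in whole bytes, so the example compares 20 bytes (160 bits) against 21 bytes instead of 160 against 161 bits.

```python
import hashlib

data = b"the same block of data"

# BLAKE2b stands in for Skein here; digest_size is given in bytes.
h160 = hashlib.blake2b(data, digest_size=20).digest()   # 160-bit output
h168 = hashlib.blake2b(data, digest_size=21).digest()   # 168-bit output

# The shorter output is not simply the front of the longer one.
print(h160.hex())
print(h168[:20].hex())
print(h168[:20] == h160)
```

Compare that with cutting a SHA-2 output down to size by hand, where the short version really is just the front of the long one.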

One of the other types of transform blocks is a Personalization String.  This basically means if you have Skein set up in one place for passwords, and in another place for signatures, and another for simple file hashing, you can distinguish them apart without having to compile separate assemblies.  You could have strings like:

PROJECTA/512-512/20110511/spark.dust.joe/signatures
PROJECTA/512-512/20110511/spark.dust.joe/hashes
PROJECTA/512-512/20110511/spark.dust.joe/passwords

Now even if the same data is passed into each process and the outputs are all the same length, since each is personalized, they are unique.  That way any one system that uses them can't be hijacked to process data for any of the others.  Security hole, PLUGGED.
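
Here is a small sketch of the effect, again using BLAKE2b from Python's standard library as a stand-in rather than Skein itself.  BLAKE2b has a person parameter that plays a similar role, but it is limited to 16 bytes, so the long strings above are shortened to made-up labels for this example.

```python
import hashlib

data = b"identical input for every subsystem"

# Short, hypothetical labels; BLAKE2b caps the personalization at 16 bytes.
labels = [b"signatures", b"hashes", b"passwords"]

digests = {
    label: hashlib.blake2b(data, digest_size=64, person=label).hexdigest()
    for label in labels
}

for label, digest in digests.items():
    print(label.decode("ascii"), digest[:32], "...")

# Same data, same output length, three unrelated results.
assert len(set(digests.values())) == len(labels)
```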

Anyways, what does this have to do with me...  I'll tell you tomorrow.

(...CONT)

Thursday, May 19, 2011

Skein as a Crypto Hash (part one)

So those in the know in the cryptography world realize that MD5 is dead for anything other than simple file hashes, and SHA1 is not far behind it.  MD5 is broken, and SHA1 is garnering new attacks on a regular basis making it a poorer and poorer choice.  They also perform pretty poorly by today's standards.

NIST realized this and opened a competition a while back (which as of this writing is still in Round 3) and the front runners are looking really good, both in performance and security.  I'm a long-time recipient of the Crypto-Gram Newsletter of one Mr. Bruce Schneier (considered the Chuck Norris of the security world), so when he announced that he was part of the team that submitted Skein, which is a flexible hash algorithm with a tweakable-block-cipher at its core, my curiosity was piqued.

First, a little primer for those not in the know or who haven't had to work with such matters (consider yourself lucky, this field can make even the most paranoid feel unprepared):  what does a Cryptographic Hash Algorithm do and why is it important?  *COUGH*   Ok, now that we've covered that, why do we need a new one?

Simply put, computers are becoming faster (if not through base mathematical power or speed, then in the ability to do multiple things at once and to hold more things in memory and do more complex things to that memory).  The developers are getting smarter.  Cryptanalysts are also getting smarter.  The ways in which the older algorithms scramble data are becoming less secure, and not because the algorithms changed; quite the opposite, once they become standardized they remain perfectly static save for maybe a patch or two.  What has changed is the way people look at the data coming out of the algorithms.

The data and algorithms have been picked over and scrutinized and churned under close watch by mathematicians, cryptographers, and statisticians for many years now.  The original SHA, for example, has been around since 1993, and SHA1, its quick revision, since 1995.  It's 2011 (16 years later).  MD5 has been around even longer, and was the basis for SHA1, which improved on MD5 but still suffers from some of the same internal flaws.

With that kind of scrutiny, the cracks and flaws in any crypto system only get wider, not smaller, that's just how it works.  So now that the first real government-standard algorithm has aged to its breaking point, it needs a successor.

They did try with the SHA2 family 10 years back (and there's talk of even more variants of the SHA2 family to make them direct drop-ins for SHA1 to speed up the adoption rate).  These use more data and change the structure of the algorithm, but that only goes so far, and they still perform pretty slowly for today's needs.  That's where SHA-3 comes in.

(CONT...)

The Obligatory Welcome Post

Hi.   Ummm... I'm Dustin, aaaand... this is my blog.  Or at least it will be, although by the time you read this it could already be... or be abandoned (gosh that would be awful...)

Anyways, I will ramble at length or at brief about various technology topics at random and at will.  There will be NO regular schedule and few filters.  If you don't like the words [REDACTED] or [REDACTED] or [THIS ONE WAS PARTICULARLY BAD] or [NOT SURE ON THIS ONE BUT BETTER TO BE SAFE], then I implore you to grow up and read it anyways.

If you don't understand the reference in the title... then you've obviously never heard of Wikipedia, go read it now and when you come back we'll both have a laugh, together.

You can also follow my madness on Twitter.