Wednesday, December 28, 2011

SharePoint 2010 BCS Field - Setting the field/column

I found where SharePoint keeps the bodies!

...well at least where it keeps the tombstones.

[In SharePoint and other programming worlds, the term "tombstone" or "headstone" often refers to a link, ID, or other mechanism or value that references some other value, much like a pointer in C/C++ terms.]

For BCS columns, I couldn't find a good article on how to automatically do the look-up and set the related fields of an External Content Type (ECT) field/column in SharePoint 2010 (pure 2010, mind you).  The code I COULD find was written for SharePoint 2007 and ported to 2010, but the objects and DLLs referenced in all of the examples had been marked OBSOLETE by Microsoft, so I couldn't compile against them.  This was a total bummer and a set-back, because those methods used to actually work (according to the comments people posted on those articles and blogs).

What I WAS able to find, was that, for MS SQL tables at least, there was a "__b" string that was stored in each item of a purely external list that referenced the external row.  You could see it as you hovered over the item in the list, the URL would contain a string at the end ..."&ID=__b"... blah blah blah.  That blah portion could be a complex number, or a dressed-up GUID, whatever the primary key identifier field was in SQL, but it ALWAYS started "__b" (again, this is for MS SQL, your mileage may vary).  I tested this value by creating a new item in a different list that had that ECT column, and when I did the look up against the value, it pulled back the record!!! EUREKA! 

So if I set the value for that field in code (à la "theItem["Field"] = stuff.ToString();") then it should work, right?  WRONG!  You'll see that the value might be set, or it might not be (random), and the value won't actually look itself up.  :(  Also, if you run a workflow against the field and attempt to get anything like "[field: OtherTableRelatedField]", it will error out, telling you that the value isn't in the expected format, which will halt the workflow at "Error Occurred".  The look-up works in the UI in the Edit or New form with no problems, but not from code, be it client object model, server object model, or PowerShell.

HOWEVER!  Afterwards, you can do a refresh using the little icon next to the ECT column title in a view through the GUI to force it to refresh against the external content, and it will parse the entire list and update any rows that are out of date, or that contain a tombstone ("__b") value.  AND THEN IT WILL DO THE LOOK-UP!  Afterwards, your workflows and other code will operate just fine; the related fields will have the appropriate data in them.

I have found NO WAY to automatically or programmatically kick off that refresh so that the field will be set correctly at the time that I set it.  :(  So... this is basically a hack, and an 80% solution. 

Also, I should point out that the tombstone value that points to the DB row (that "__b" value) is stored in a RELATED field to the ECT column, in a Multiple Lines of Text (Notes) type field!  This is not easy to find when debugging, so it took me the better part of a day!  I also wasn't aware that the "__b" value even existed until a coworker of mine pointed it out to me (who isn't even a programmer, but works with workflows all the time, Thanks Jason!).

Here's how you get that value in code:

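// The GetField argument was lost from the original listing; it is assumed here
// to be the related field name, matching the item lookup two lines down.
SPFieldMultiLineText notes = theItem.Fields.GetField(
    theItem.Fields["ECTField"].RelatedField)
    as SPFieldMultiLineText; // notice the nesting?
string tombstone = notes.GetFieldValueAsText(
    theItem[theItem.Fields["ECTField"].RelatedField]);  // more nesting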

This will provide the value.  Then to stuff it back in to a new item that I had created moments before (in this case, a document set), I had to do this, with AllowUnsafeUpdates set to true around the block of code:

SPBusinessDataField dataField = theItem.Fields["ECTFieldName"] as SPBusinessDataField; // external data column

theWeb.AllowUnsafeUpdates = true;
dataField.ParseAndSetValue(theItem, tombstone);
theItem[dataField.RelatedField] = theItem["ECTFieldName"];
// if you do theItem.SystemUpdate() instead, 
//   workflows won't kick off
theItem.Update();  // save the item (call implied by the comment above)
theList.Update(true);  // not sure if this is needed?
theWeb.AllowUnsafeUpdates = false;

This will literally set the field value to that "__b" string, at which point, you must do a refresh on the column through the UI (which does EVERY ITEM IN THE LIST!) in order for SharePoint to replace the tombstone with actual data from the external system.

If someone can get me the rest of the way with this, I would GREATLY appreciate it, but at least this "hack" might be useful to some other developer.

Happy Hacking, and Merry Coding!

Monday, September 12, 2011

Resurrecting the Dead (TV Edition)

So after hurricane Irene came through there were a series of bad thunderstorms that ripped through Maryland.  My mother-in-law's neighborhood was the unfortunate victim of a "well placed" lightning strike that killed her Vizio 32" plasma TV.  <rant>This is of course MY fault for not telling her that she needed to buy the EXPENSIVE surge protector</rant>  It wouldn't turn on, and the Vizio logo wouldn't light up (it normally lights up orange when plugged in but off, and white while actually powered up #powerleech).

Fearing the worst (and hoping for a free TV out of the deal), I took it home, painstakingly disassembled the outer case, CAREFULLY separated and set aside the outer glass (there's a tinted pane in front of the actual display that literally just sits inside the front frame, no screws or tape or glue, just little tabs molded into the plastic).  I guess that pane is to keep the heat away from the viewers?  Anyways, upon inspection of the circuit boards, I was surprised to see the LG logo everywhere, and also not surprised to see my old nemesis...


Those 5 with the crowned-tops are the culprits.  (They're also strangely arranged in a straight line...)

I carefully wrote down the positions, part numbers, values, and then de-soldered them one-by-one.  I give the designers credit for silk-screening the component numbers ON BOTH SIDES! (thumbs up) That is a HUGE help in trying to find the right sonnavabitch component to de-solder.

I had 2 replacements on-hand, and a RadioShack nearby.  Less than 24 hours later...


<anecdote> Incidentally, a number of months back after a similar storm, I stumbled upon an LG 22" LCD widescreen monitor in the dumpster, which only had a minor scratch on it.  That particular model isn't in production anymore, but it sold for $230+ when new.  It would only show a white screen when I attempted to power it on, so it was in "better" shape than the Vizio but otherwise unresponsive (no menu, no text, just pure white).  After I opened it up, I saw the exact same problem, and oddly enough they used the exact same manufacturer of caps (although looking back that shouldn't surprise me if in the end they both had LG boards inside them #duh).  The problem I ran into with the monitor, though, is that I got it working, but it showed the white screen again after I put it all back together.  It confused me at first, but then I realized the connectors had been TAPED in place before I took it completely apart, and I had removed that tape to fix the boards.  As I was reassembling, the connectors were falling out, because they didn't really have anything in the way of retainers.</anecdote>

Lesson learned:  make sure your connectors STAY PUT when you reassemble! #moralofthestory

<rant> Manufacturers, in a misguided and mostly cost-minded fashion, tend to pick components that just barely go over the "expected" operating parameters of the circuit being designed.  Using more robust components costs money, so they use just what they have to.  Because of that, a simple spike on the power lines kills many electronics, which end up in the dumpster and the landfill when they could EASILY have been saved by just using a few better quality components, and/or if someone bothered to open them up to fix simple issues like this.  What a shame... #ewaste</rant>

Sunday, September 4, 2011

New iTunes not showing the store after update

So two computers belonging to a friend of my wife's were not showing the iTunes store after the latest iTunes update.  This happened on both 64- and 32-bit platforms, on Vista and Win 7.  The only thing in common between the two machines was McAfee offered by Verizon.

As it turns out you have to do a "netsh winsock reset" from an administrator command prompt, and then reboot.  McAfee seems to complain that something has been removed after you do that, and telling McAfee to fix the problem might screw up iTunes again, but I'm not sure.  Once I solved the problem I moved on to other things so I didn't do further testing.  Just something to be aware of.

Wednesday, July 13, 2011

Lost my phone... now what?

Losing a cell phone that contains large portions of your life is frustrating and nerve destroying... What passwords do I have to change?  How will I get my contacts back?  What stuff did I have on there that someone could use against me (ID theft, etc.)?

Per an earlier blog post you know I use the Google Two-Factor authentication, which in this case made it REALLY easy to cut off my mail from that phone, but other things were still attached that required me to change ALL KINDS of passwords, like my work e-mail, etc.

...this sucks...  At least I won't lose my Angry Birds progress (that was on another phone).

Wednesday, June 29, 2011

Milestone, MCTS for SharePoint 2010 App Dev (70-573)

As of today, I am an MCTS in SharePoint 2010 (Application Development).


It feels GREAT!  The test is harder than it looks!

Saturday, June 25, 2011

Dual 9-Segment LED Display Module with Points

Dual 9-Segment LED Display Module
With Points
By @SparkDustJoe
Open Hardware, No trademark or patents

I’m not great with microcontrollers and surface mount components, so I’m putting out this spec document and hoping someone runs with it.
What I’m looking for is a module with 2 digits, capable of displaying some simple English or Latin alphanumeric characters, and a few special characters (think degree symbol, apostrophe, +/-, etc.).  The module also includes 4 dots next to each digit (to the right) for decimals, colons, am/pm indicators, etc.
This module will have either traditional serial (9600/8N1) or SPI serial inputs with cascade capability, and the possibility of being addressable.  
The diagram at the top is a simplified version of what I’m looking for.  The actual individual digit height should be closer to 3”-5” (read, three to five INCHES) high with individual surface-mount LED’s grouped together to make up the segments.  For easy mounting behind glass or acrylic, the components should be on the opposite side from the LED’s, and with the possible need for a MAX7219 (or equivalent) chip along-side the microprocessor, the board will have to be double-sided.  The segments and points should be close enough to the left and right sides to allow for easy stacking side-by-side for longer displays (think clocks, scoreboards, etc.).  Mounting holes at the top and bottom are acceptable and preferred.  If the usual slant or angle found in most LED/LCD displays pushes the dimensions outside the realm of possible, then the digits and points should simply be vertical.
The actual data input should be easy and intuitive for the end-user to allow for quick development.  The schema for commands should allow for direct segment control or simple alphanumeric input.  The simplest form would be “AcVV”, where A is an address between 0 and F (up to 16 stacked modules allowed),  c is the command type, and VV would be the values of the left and right digits OR the middle and right sets of points respectively.  In the case of the points, they are turned on or off like flags in a 4-bit number, 0 being all off, F being all on.  For direct segment control, the command would look more like “AsXXXYYY”, as that requires a larger input to accommodate all the bits.  Any invalid or malformed command is simply dropped.

Commands (always lower case):
a = ASCII input (not every character can be displayed in 9 segments, upper and lower case are treated as the same)
d = Digit input (0-9, A-F, upper and lower case are treated as the same)
p = Points (binary flags for each of the 8 points).
s = Segment (raw binary control of the 18 segments)
e = Erase (no other input, just blanks all segments and points, and removes them from memory, the module is still considered "awake")
b = Blank (no other input, just blanks all segments and points, but leaves them in memory until a wake command is received.  Essentially puts module to "sleep".  All future commands will still affect memory, but they will not be displayed)
w = Wake (display current memory to the segments and points)
; = separator, more than one command on a line allowed when separated


“FaWY” would set module 15’s digits to the characters W and Y.  

“4e” would blank the digits and points of module 4

“3pF0” would turn all the points in module 3 in the middle ON and on the right side OFF.

“0d 9;0p06;1d45;1p08” would set the digits on module 0 to blank and 9, and set the points in the middle OFF, and the middle two right-most points ON, and the digits on module 1 to 4 and 5, with the middle points OFF, and the top right-most point ON.  This displays [ 9:45’] (for a clock, 9:45 PM).
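The command format above can be sketched as a small parser.  This is only a rough illustration in C++, not firmware for any particular microcontroller: the Command struct, the function names, and the choice to accept a lower-case address digit are my own assumptions, and it only checks the address, the command letter, and the payload length (it does not validate the payload characters themselves).  Anything malformed is simply dropped, as the spec asks.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// One parsed command. Field names are illustrative, not part of the spec.
struct Command {
    int address;          // 0-15, taken from the leading hex digit
    char type;            // one of: a d p s e b w
    std::string payload;  // "VV", "XXXYYY", or empty
};

// Payload length each command type expects, or -1 for an unknown type.
int expectedPayloadLength(char type) {
    switch (type) {
        case 'a': case 'd': case 'p': return 2;  // two digits or two point sets
        case 's': return 6;                      // XXXYYY raw segment bits
        case 'e': case 'b': case 'w': return 0;  // no other input
        default: return -1;
    }
}

// Parse a single command such as "3pF0". Returns false for malformed input.
bool parseCommand(const std::string& text, Command& out) {
    if (text.size() < 2) return false;
    const unsigned char a = static_cast<unsigned char>(text[0]);
    if (!std::isxdigit(a)) return false;
    const int address = std::isdigit(a) ? a - '0' : std::toupper(a) - 'A' + 10;
    const int expected = expectedPayloadLength(text[1]);
    if (expected < 0 || text.size() != static_cast<std::size_t>(2 + expected)) return false;
    out = Command{address, text[1], text.substr(2)};
    return true;
}

// Split a line on ';' and keep only the well-formed commands.
std::vector<Command> parseLine(const std::string& line) {
    std::vector<Command> commands;
    std::size_t start = 0;
    while (start <= line.size()) {
        std::size_t end = line.find(';', start);
        if (end == std::string::npos) end = line.size();
        Command cmd;
        if (parseCommand(line.substr(start, end - start), cmd)) commands.push_back(cmd);
        start = end + 1;
    }
    return commands;
}
```

Feeding the clock example through parseLine should give four commands, the first one addressed to module 0 with a payload of a space and a 9.  Each module would then ignore any command whose address does not match its own.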

For actual segment control, each segment is a bit in a 9-bit number as follows (refer to the diagram at the top):

A=1 (lsb), B=2, C=4, D=8, E=16, F=32, G=64, H=128, I=256 (msb)

Add up the numbers and pass the total as three hexadecimal digits per digit to command “AsXXXYYY” (XXX = left digit, YYY = right digit). 
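As a quick sketch of that arithmetic in C++ (the function names are made up for illustration, and which letter maps to which physical segment comes from the diagram, so none of that is assumed here): each lit segment contributes its bit, the bits are combined, and each 9-bit total is written as three hex digits.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Bit value for one segment letter: A=1 (lsb) ... I=256 (msb); 0 if not A-I.
unsigned segmentBit(char segment) {
    if (segment < 'A' || segment > 'I') return 0;
    return 1u << (segment - 'A');
}

// Combine the bits for a set of lit segments, e.g. "ABC" gives 1 + 2 + 4 = 7.
unsigned segmentMask(const std::string& litSegments) {
    unsigned mask = 0;
    for (char s : litSegments) mask |= segmentBit(s);
    return mask;
}

// Build "AsXXXYYY": module address, 's', then each mask as three hex digits.
std::string segmentCommand(unsigned address, const std::string& left, const std::string& right) {
    char buffer[16];
    std::snprintf(buffer, sizeof buffer, "%Xs%03X%03X",
                  address & 0xFu, segmentMask(left), segmentMask(right));
    return buffer;
}
```

For example, lighting segments A, B and C on the left digit and only segment I on the right digit of module 2 works out to 7 and 256, so the command would be "2s007100".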

This example is open to tweaks for performance in whatever platform it is developed, but any command set used must be fully documented.

To achieve the addressable access, perhaps a 16 position hex encoder or dip switches can be used in a manner that the on-board micro controller reads at boot, but then doesn’t bother to check it again until next boot.
The serial inputs can be copied across the board to the opposite side, so that all modules receive the same data at the same time when connected in a string.  If this turns out to be a problem with the SPI bus, then the micro controller can echo any commands it receives to the next module.  This may introduce unwanted propagation delay.  Perhaps a trace on the board can be soldered or unsoldered to select a slower, standard clock rate (or baud), or a much higher but still standard clock rate (or baud).

If the performance allows, and if the costs of the final board are tolerable, this spec can be expanded to 4 digits, with corresponding points.  The inputs to all the commands would be doubled.

I hope this spec is straight-forward enough to get some creative juices flowing.   I look forward to seeing what anyone builds from this, and I am releasing this design spec as OPEN HARDWARE, NO TRADEMARK, NO PATENTS.

I own one of these little guys, but it's on the small side, not able to be stacked, doesn't allow colons AND decimal points, just doesn't cut it.  Don't get me wrong, it works great, but... it's limited. 

Thursday, June 23, 2011

Demise of The Multiple Length String of "a" 's

Ok, well I've discovered an important thing about .NET String objects.  They suck, for the following two reasons:

-They are immutable, meaning every alteration to a string you make produces a NEW COPY of the string in memory.  So if you append 10 items to a string in a loop, you will have 10 progressively longer copies of the string in memory! (in that specific case, use a StringBuilder object instead, much more efficient)
-They are by default Unicode (UTF-16 in memory), which is a good thing if you're dealing with web-text, or multiple languages.  Not so good with testing crypto code or passing certain types of passwords.  Use SecureStrings instead or just deal with raw byte arrays.  For example, if I build a string of single byte 'a's, in memory that becomes a string of DOUBLE byte 'a's in Unicode.  Boom!  Just that fast I've doubled my memory usage for that one string!
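To put rough numbers on those two points, here is a small C++ analogy (this is NOT .NET code; std::string and std::u16string merely stand in for single-byte text and two-byte text, and the function names are made up).  It counts the bytes for a run of 'a's at each width, and how many characters get copied in total when every append builds a brand new, longer copy.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Bytes used by n 'a' characters stored one byte per character.
std::size_t narrowBytes(std::size_t n) {
    return std::string(n, 'a').size() * sizeof(char);
}

// Bytes used by the same n 'a' characters stored as 16-bit code units.
std::size_t wideBytes(std::size_t n) {
    return std::u16string(n, u'a').size() * sizeof(char16_t);
}

// Total characters copied when each of `appends` appends (of `itemLength`
// characters) creates a new copy of the whole string so far:
// itemLength * (1 + 2 + ... + appends).
std::size_t charactersCopied(std::size_t appends, std::size_t itemLength) {
    return itemLength * appends * (appends + 1) / 2;
}
```

On a typical platform where the 16-bit unit is two bytes, wideBytes(n) comes out at twice narrowBytes(n), and ten one-character appends copy 55 characters in total instead of 10.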

These two reasons drove my testing rig for Skein into using 4GB of memory (that's GIGABYTES, with a G) every time it hit the larger strings.  That was the reason I created a "low memory" test that used shorter and fewer strings for doing full-bodied but not over-burdening line-item tests.

So, what have I learned?  Don't use strings!  At least not for this purpose.

Also, I learned that if my documentation says I do something, I BETTER BE FUCKING WELL DOING THAT THING!  :(  I had listed my encryption functions as using a specific type of padding, when in reality I was doing a completely different type of padding!  Not completely incompatible, but now that I've fixed it, this is a breaking change.

Unfortunately this also means the fix could break my GoogleAuthCLONE: if the old DLL is replaced with the new DLL, any old accounts might get blown away.  Such is development....

Friday, June 17, 2011

Return of The Multiple Length String of "a" 's Part II

Ok, much improved.  The whole process of tests now only takes 20 minutes, not 60 (all 966 tests), and the throughput is now:

256bit 778MB takes 44-45 seconds
512bit 778MB takes 44-45 seconds
1024bit 778MB takes 51-52 seconds

This is a DRASTIC improvement, but I still feel like I can do better.  Basically what I did was go back to good ol' C++ routines for the cores of the ThreeFish functions, and unroll EVERYTHING!  Only the MIX and INVMIX functions right now are actual functions, and normally where there would be loops, I actually wrote out the individual steps and removed as many array/pointer look-ups as possible.

With a few compiler tweaks, I might be able to squeeze more performance out.  Also, as I've discovered in testing, my memory usage needs to improve, and I need to make sure that my C++ routines zero out memory properly when they are done with their local buffers.  I've already done some of that in this version (which I will push up to CodePlex later today).

NOTE: (landmine) For whatever reason, my C++ routines will NOT use the intrinsic _rotl64 and _rotr64 functions.  The functions don't even seem to work under Visual Studio 2010 SP1 with .NET 4.  I had to write out the rotates as two shift functions:

*Y = (UInt64)(*Y << N) + (UInt64)(*Y >> (64 - N)); //ROTATE LEFT
*Y = (UInt64)(*Y >> N) + (UInt64)(*Y << (64 - N)); //ROTATE RIGHT

Whether or not the intrinsic functions would give me any kind of speed boost I have no idea.  If I "_forceinline" these functions I might even get a tad more speed, but at 20 minutes for a full battery of tests, I'm going to just push up this version for now and call it a small victory.

[UPDATE:  Ok, _forceinline is a BAD idea, it takes 10 times as long to build, and it actually reduces the performance versus not using it, so don't do _forceinline on the mix functions]
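For reference, the two-shift rotate above can be wrapped up as a pair of small helpers.  This is just a sketch in standard C++ with made-up function names, not the code from my library.  It uses OR where my lines use +, which gives the same result because the two shifted halves never overlap, and it masks the shift counts so that a rotate by 0 doesn't turn into a shift by 64 (which is undefined in C++).

```cpp
#include <cassert>
#include <cstdint>

// Rotate left by n bits. Masking keeps both shift counts in 0..63, so
// n == 0 (or any multiple of 64) returns the value unchanged.
inline std::uint64_t rotateLeft64(std::uint64_t value, unsigned n) {
    n &= 63u;
    return (value << n) | (value >> ((64u - n) & 63u));
}

// Rotate right by n bits, the exact inverse of rotateLeft64.
inline std::uint64_t rotateRight64(std::uint64_t value, unsigned n) {
    n &= 63u;
    return (value >> n) | (value << ((64u - n) & 63u));
}
```

Rotating left and then right by the same amount should always hand back the original value, which makes for an easy sanity check before dropping helpers like these into a MIX function.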

I am consistently getting the same hash results for all my tests, and all my code still passes the NIST Known Answer Tests (much faster might I add), so the outputs that I posted previously should still work for any implementation if you are so inclined to use them.

Saturday, June 11, 2011

Return of The Multiple Length String of "a" 's

I did some performance tweaking of the C# compiler settings and some unrolling of the core ThreeFish functions (meaning, taking the function calls and actually putting them into the main flow of code, and removing all loops by explicitly writing out each iteration of the algorithm).  The code files are HUGE now, but this removes most stack pushes/pulls, and removes a lot of array look-ups.  Also, the testing program was still compiling as "Any CPU", which I've rectified to x64.  I plan on rerunning the full battery of tests with these improvements.  Stay tuned...

Saturday, June 4, 2011

Google does Two-Factor Authentication, and so do I

So, Google has rolled out their version of Two Factor Authentication, requiring you to use your phone to generate a Time-based One-Time Password (TOTP) code in order to log into your account.  I like it.  It's good they took the initiative even for free e-mail accounts.  I do have several criticisms, mostly with the draft doc of TOTP, but I will post those another time, as they aren't show-stoppers for me.

The idea is: you can't log into your account without an additional pin or code generated from something that isn't on the login page, it has to come from some other device in your possession, which is unique to you.  This follows the mantra "Something you are [user ID], something you know [password or passphrase], and something you have [your phone, smart-card, or other token generating device]."  This makes the account more secure.  Banks use this same approach to keep online accounts secure, World of WarCraft has its own variant, and there are a few others.

Honestly, I wish more providers did this (I'm looking at YOU FaceBook!).

The Google Authenticator app is meant for Android devices, iDevices (iPhone/iPod etc), and Blackberry devices that have cameras, although you can manually enter the info if the device can't read a QR barcode that is generated when you set up Two-Factor Authentication. 

During activation of the Two-Factor process, an 80-bit random number is created at that time and made part of your account.  This is needed to generate the TOTP codes.  It's this 80-bit key that makes your Authenticator unique. 

<side note>
Google also prevents any other device or service from logging in to any Google service!  Now what?

For every device that you want to authorize, you generate a unique password that is completely random, and Base32 encoded so that all you have to enter are lower case letters and numbers.  This becomes your "password".  So your iDevice now has a different password from your Android, from your Blackberry, from your... whatever other thing logs into Google.  Also if that... thing... gets lost, you can revoke the password for JUST THAT ONE THING!  GENIUS!  Now you don't have to change every password you own if one gets compromised!
</side note>

Sometimes, though, you may want to log in to Google and may not have your [insert device here] handy in order to get your TOTP code.  This is one criticism I have, but it's more of an annoyance.

Enter my GoogleAuthCLONE project for Windows!

Now you can securely have your accounts stored in Windows and generate the codes when you need them without having to reach for your [device].  This can also READ barcode image files (like a bmp, jpg, etc, one barcode per file).  That way if you screen cap the setup process with Google, you can come back to this program and just read the barcode to prevent "fat fingering" your information. 

Also, as you can see in the pic, you can generate barcodes that your [device] can read if it has a camera and the Google Authenticator app.  This way if you have all your accounts in this program, and your [device] meets some terrible event (theft, data wipe, bad custom ROM install, etc.), you can re-enter all your accounts without having to reset every single Google account in the process.  In the case of theft, though, reconfiguring your accounts might be a better course of action.  All accounts are stored behind a good password (as enforced by the program) and encrypted on disk using the ThreeFish algorithm, which is part of the Skein hash algorithm.

Complexity is enforced by requiring a length of at least 8 characters, 1 number, 1 special character, AND upper and lower case letters.
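Those rules boil down to a check along these lines.  This is a minimal C++ sketch, not the code from GoogleAuthCLONE itself; the function name is made up, and it assumes "special character" means ASCII punctuation, which may be narrower or wider than what the program actually accepts.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// True when the password is at least 8 characters long and contains an upper
// case letter, a lower case letter, a number, and a special character
// (assumed here to mean ASCII punctuation).
bool meetsComplexityRules(const std::string& password) {
    bool hasUpper = false, hasLower = false, hasDigit = false, hasSpecial = false;
    for (unsigned char c : password) {
        if (std::isupper(c)) hasUpper = true;
        else if (std::islower(c)) hasLower = true;
        else if (std::isdigit(c)) hasDigit = true;
        else if (std::ispunct(c)) hasSpecial = true;
    }
    return password.size() >= 8 && hasUpper && hasLower && hasDigit && hasSpecial;
}
```

Something like "Tr0ub4dor&3" passes, while "password" fails on three counts (no upper case, no number, no special character).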

Get it from the CodePlex page, and if you have problems with it, comment there or below this post.  This is released under the Apache 2.0 License which is spelled out on the CodePlex page.  The original Authenticator program developed by the Google dev team was released under the same license. My work was inspired by their program, but it is not a derived work.

[Disclosure:  I wrote the .NET implementation of Skein/ThreeFish that is being used here but I was not one of the original team members that created Skein and ThreeFish.  I figured this was a good real-world example of its use.  Eventually, I might add Skein to the list of available HMAC algorithms used to generate the TOTP's, which would be, unfortunately, incompatible with the Google version.  I'm also using a 3rd party toolkit for the barcodes.]


Monday, May 30, 2011

Skein vs. The multiple length string of "a" 's

This series of tests took 59 minutes, 4 seconds.  There are 966 tests in total.  My laptop is pretty powerful as far as laptops go, as it is meant to be a portable business desktop (Dell Precision M6400, 64-bit, stats listed below).  Doesn't do so well as a gaming computer, but it does have some graphics power.  But this isn't about games or graphics.  This is about data through-put.  These tests were all pure memory transformations.

Screen cap straight from Computer > Properties
Columns in the results: Bit Length State, Bit Length Output, String Length (number of 'a's), Time (h:mm:ss.fffff), The Resulting Hash

Results are Base64 encoded for accuracy and brevity.  These are not declared as ASCII strings, but they are built as such.  There is the possibility that .NET is doing something to them in the background that makes them Unicode.  No tree hashing was used.

The take away from this is:  778MB string in memory, roughly 2 minutes flat for 256-bit state-width regardless of output size, 2 minutes 30 seconds for 512- or 1024-bits regardless of output size.  

Complete results after the cut.  Sorry for the small font size but this blog has a narrow text window.  I encourage all developers also working on this code or the same algorithm from any other language to compare all these results from your own implementations and to comment where there are differences, or if you have timing results for similar tests.

Sunday, May 29, 2011

Hackers strike again, Lockheed Martin latest victim

Image from WikiPedia article linked below.

Ok, so over at RSA back in March, they had a data breach (the original source linked to by HAD had gone missing shortly after HAD reported on it, and it was from an RSA press release, here is the NY Times article).  They weren't sure what was taken but it was related to their SecurID products.

Well the breachers DID apparently take something important because now over at Lockheed Martin, who use the very RSA secure tokens that "may" have been breached, there was another hack.  This comes from BoingBoing, so I'm waiting to hear from other sources to see what the full extent is, since other government contractors could also be affected, but this does not bode well for other organizations.  I know some large hospitals across the country and the World Bank use similar devices for their remote workers and some partners.

HEY BIG COMPANIES:  When you get breached, even if you think nothing was taken or the stuff was "encrypted", DO SOMETHING ABOUT IT, SOONER RATHER THAN LATER!!!   This is why California enacted laws that say you HAVE to tell people, quickly and in writing, when stuff like this happens with customer or consumer data.

I'm looking at you Sony!

Saturday, May 21, 2011

Extreme Couponing

<rant>Ok, so now that TLC is no longer the "Learning" channel, and has become the "Looney" channel, they have this new show about Extreme Couponing.

So these ladies clip just about every coupon they can get their hands on, they also have membership cards for their local supermarket, and they plan this out like a military raid almost.  One even has an Excel spreadsheet to make sure every coupon counts ahead of time.  Every product, every sale, every available quantity, they meticulously count out the savings.

It doesn't matter if they already have a billion of a particular product at home already.  If it gives them 6 more for a penny, they will do it. 

This one lady bought $600+ worth of personal care products and groceries for $2.41!  The whole of her basement is a stash worth over the cost of a new luxury car (read $30,000+)!

I can understand getting the best deals, and I know groceries are getting expensive with fuel prices going up.  But for fuck's sake, do you really need 140 LARGE bottles of laundry detergent for a family of 4?!?!  Who does that much laundry?!?!   And to buy 6 more because you can get it for 25 cents?  That's clinical obsession, no buts about it. 

If they were donating a lot of this stuff to a food kitchen or a homeless shelter or for displaced individuals from floods or fires, I would line up to pay for their 6 dollar (actual retail price $1000+) grocery hauls.  But they don't; they displace all their personal belongings and stuff every crevice of their houses with more groceries they already have and don't need.</rant>

Friday, May 20, 2011

SHA-3: Competition, Misnomer

So as I mentioned in my last series of posts, the National Institute of Standards and Technology (NIST) is having a competition to replace the aging Secure Hash Standard (SHA-1/SHA-2).  They are calling it, thus far, SHA-3. 

This would be the third hash standard, as indicated by the name.  At the same time, this version is going to be COMPLETELY different from the previous versions, top to bottom.  I think the name is a little misleading...

When they replaced the Data Encryption Standard with the Advanced Encryption Standard a number of years ago, they changed the name to reflect the fact that the new standard was completely different, and much much better.  I'm of the opinion that whatever they pick (*COUGH*SKEIN*COUGH*) should be called the Advanced Hashing Standard (AHS) to reflect the same.

Makes sense, right?

Skein as a Crypto Hash (part three)


Ok, so I've been gushing about this new wonderful crypto hash function called Skein.  What the hell does this have to do with me?

<rant>I've been working in .NET (mostly C# lately) for several years now, and I was always frustrated by the limited scope of the built-in crypto functions.  In particular, despite ample evidence that MD5 and SHA1 are horribly crippled, the folks at Microsoft refuse to give them up.

Now I know full well that something as standardized as a government mandated suite of tools takes a LONG time to roll out, and adoption is slow and drawn out.  Especially if there is no standardized replacement (which is NOT the case here, SHA2 was supposed to fix that), people are even more reluctant to invest in new hardware/software.  I get that.  But to limit the options to the old and busted stuff and only adopting a very small set of the new hotness, that's just narrow minded.</rant>


I tried a number of years ago to implement many of these algorithms in C++ just to learn how they worked, and to play with them in a sandbox that I controlled.  I learned the hard way that memory management and threat mitigation are not for the faint of heart.

The .NET framework changed the game, and with VB.NET, I was no longer limited to slow, clunky, rapid application development with no meat.  I could incorporate new ideas and found the aforementioned built-in functions allowed me to do more (to a point).  I continued to develop and explore and such, and slowly worked my way into C#. 

I also came across the CryptoGram newsletter during this time, and it kept me up to date on some of the security issues of the day.

That's when I learned about the new SHA-3 competition, and Skein.  And I jumped at the opportunity to work with a new algorithm and really get my hands into it.  I read the white-paper, narrowly avoided a cranium explosion, and dug into the reference code.  Wow...  You haven't lived until you've taken a dive into advanced C++ code written by 7 industry experts by committee. 

Holy crap...

The white paper didn't explain a few things clearly but the code shed some light on most of those (some days I can read machine language better than English).

After a long battle, and comparing my results with the specified samples and a few random samples on the net that people had written in Perl, Java, Python, and others, I finally have something that I think people can actually USE.  And the best part is just about every algorithm submitted, including Skein, is in the public domain, which means they are free for any use you want (assuming of course you don't live in Libya, North Korea, or Iran, Uncle Sam's orders!).  So... yeah.  I'm releasing it into the wild.  :) 

And in case you haven't read between the lines yet, I'm plugging my CodePlex project.  :D  I figured I should at least include the back story, and I needed blog material to get me started.  Sue me.

The other thing is, I think there should be some more test data out there for others to use.  I found it hard to get any samples outside of the obvious tests of the basic functionality.  For instance: The white paper describes how to use Skein as a PRNG and as a Key Derivation Function (KDF), and has a few notes on how to sign keys, and on how to sign messages incorporating the public key used so that the signature and the document cannot be separated.  These are some important uses and functions, but there are no samples.  So I'm releasing my version, and I encourage others to look into expanding the functionality, both for comparison and so that this particular algorithm gets more analysis in a variety of languages and situations.

Basically the designers focused on the functionality and parameters spelled out by NIST, and NIST only wanted a HASH function; that's it.  As a result, even though they addressed a number of security issues in the design, issues that REALLY need attention in the industry, they over-designed it for the venue.  I think they knew that, and thus left out any samples that detracted from the matter at hand:  hashing.  Unfortunate, but you also have to figure these guys have day-jobs too. 

There's a LOT of functionality in Skein as a whole, and thus there would be a glut of data to produce (and reproduce with every tweak through the submission process).  So, I figure the community can provide the missing pieces.

I have a nagging suspicion that if NIST does pick Skein, they might knee-cap it just to make it fit the box they wanted, not expand it or let it spill into other boxes.  What I would like to see is Skein make its way into the hashing standard AND the digital signature standard.

Anyways, that's my take.  Do what you will with it, take it or leave it.  Would also love to hear your comments.

Skein as a Crypto Hash (part two)


The SHA-3 competition that NIST is running is coming to a close next year (2012).  Only 5 candidates remain, and the algorithms that have come forth, both winners and losers, have really changed the conventional thinking on what a secure hash algorithm should look like.  I say should as the field is still pretty young all things considered.  In many instances, like Skein, Grøstl, and some others, the hash algorithm is actually a block cipher that has been modified to be irreversible.

Now, there are many other things that these new algorithms bring to the table, but... to be honest... I'm rather smitten with one of them.  Skein.

Still, I encourage you to look at all of them because these last 5 do have some really interesting things going on.  Just bear in mind all white papers are designed to make your head explode.  I'm still heavily medicated from my brush with this particular white paper...

So, the punch line.  The reason I think this algorithm has a lot of traction is its flexibility.  The designers are all security and data experts and all have been around the things that make these algorithms work, and what makes them break, and the ways people use them incorrectly.

One of the big things about hash algorithms is their use in signatures and certificates, and that they have to be immune to spoofing and forgery.  SHA1 and MD5 are very prone to collisions and length-extension attacks, which make forgery possible.  So the designers of Skein came up with a system that builds on research out there already and on the known problems with the older algorithms.  I haven't read the source papers, but they are referenced in the white paper.

Here's how I understand it:  Basically, every block of data should be treated uniquely, and the final output should be processed an extra time, just to be sure the function can't be length-extended.  The algorithm uses flags for each type of input, which can be stacked in one fluid process, rather than having to run through the whole process for each piece individually and then set up all over again, like in traditional HMAC mode.  Each block is also processed with a counter, guaranteeing that every block is unique.  That protects against possible loops, and against cases where the input contains a lot of repetitive stuff.
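
For comparison, here is a rough Python sketch of the "traditional HMAC mode" mentioned above, built by hand on SHA1 so the two full hash passes are visible.  This is plain HMAC as I understand it from RFC 2104, not anything from Skein; the function name is just something I made up for illustration.

```python
import hashlib
import hmac

def hmac_sha1_by_hand(key, message):
    """Textbook HMAC: two complete SHA1 runs, each with its own setup."""
    block_size = 64  # SHA1 works on 64-byte blocks
    if len(key) > block_size:
        key = hashlib.sha1(key).digest()   # over-long keys get hashed down first
    key = key.ljust(block_size, b"\x00")   # then padded with zeros to one block

    ipad = bytes(b ^ 0x36 for b in key)    # inner pad
    opad = bytes(b ^ 0x5C for b in key)    # outer pad

    inner = hashlib.sha1(ipad + message).digest()  # pass 1: key + message
    return hashlib.sha1(opad + inner).digest()     # pass 2: key + inner digest

# Sanity check against the standard library's own HMAC.
tag = hmac_sha1_by_hand(b"secret key", b"attack at dawn")
print(tag == hmac.new(b"secret key", b"attack at dawn", hashlib.sha1).digest())
```

The outer pass is what stops length extension here, and it costs a second set-up of the hash.  Skein's approach, as I read it, folds the key in as just another flagged input in a single chain.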

The heart of the algorithm lies in the Threefish algorithm, also defined in the white paper.  This is a function that uses very simple XOR, addition modulo 2^64 (any carry out of the top bit is simply dropped), and bit rotation, but it does these things through a permutation of the 64-bit state words (did I mention that Skein was native 64-bit throughout?), a LOT of times (72 times for the 256- and 512-bit flavors, and 80 for the 1024-bit flavor to be exact).  The simplicity and use of basic CPU functions allows for fast throughput, and the large number of rounds provides depth enough to overcome rebound and differential attacks.  The Threefish algorithm can actually be used for straight encryption, but it has a feed-forward flag that collapses the data onto itself through an XOR when set for use as a hash-primitive.
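
To make the "XOR, add, rotate" bit concrete, here is a minimal Python sketch of the mixing step on one pair of 64-bit words, as I understand it from the white paper.  The rotation amount is passed in as a parameter because the real constants vary by round and by block size; the value used below is only for illustration, and the function names are mine.

```python
MASK64 = (1 << 64) - 1  # keep everything inside a 64-bit word

def rotl64(x, r):
    """Rotate a 64-bit word left by r bits."""
    r %= 64
    return ((x << r) | (x >> (64 - r))) & MASK64

def mix(x0, x1, r):
    """One mixing step: add, rotate, XOR."""
    y0 = (x0 + x1) & MASK64      # addition modulo 2^64
    y1 = rotl64(x1, r) ^ y0      # rotate the second word, fold in the sum
    return y0, y1

def unmix(y0, y1, r):
    """Exact inverse of mix(), which is why this can work as a cipher."""
    x1 = rotl64(y1 ^ y0, (64 - r) % 64)  # undo the XOR, rotate back right
    x0 = (y0 - x1) & MASK64              # undo the addition
    return x0, x1

print(mix(1, 2, 14))              # (3, 32771)
print(unmix(*mix(1, 2, 14), 14))  # (1, 2)
```

Every step is reversible on its own, so the block cipher can decrypt.  It is only the feed-forward XOR on top that makes the hash a one-way trip.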

It also has a 128-bit tweak that is processed alongside and independent of the key.  It's this tweak that gives Skein the ability to flag each block and provide the counters.  There are a lot of different input types to the Threefish cipher that are passed from the main Skein transformation flow.  I won't go into too much detail here, but there are two in particular that I think set Skein apart.
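
Here is a small Python sketch of how I picture that tweak being put together for each block.  The bit positions and type codes are my reading of the white paper (a 96-bit byte counter at the bottom, a 6-bit type field at bits 120 to 125, then the "first" and "final" flags at bits 126 and 127), so treat them as assumptions and check them against the spec.  The helper names are made up, and the tree-level and bit-pad fields are left out.

```python
# Block type codes as I read them from the white paper (assumption).
T_KEY, T_CFG, T_PRS, T_MSG, T_OUT = 0, 4, 8, 48, 63

def make_tweak(position, block_type, first=False, final=False):
    """Pack a 128-bit tweak into a Python int."""
    if not 0 <= position < (1 << 96):
        raise ValueError("position must fit in 96 bits")
    tweak = position                       # bytes processed so far
    tweak |= (block_type & 0x3F) << 120    # what kind of input this is
    tweak |= int(first) << 126             # first block of this input
    tweak |= int(final) << 127             # last block of this input
    return tweak

def message_tweaks(message_len, block_bytes=32):
    """Yield the tweak for each block of a message of message_len bytes."""
    n_blocks = max(1, -(-message_len // block_bytes))  # ceiling division
    for i in range(n_blocks):
        processed = min((i + 1) * block_bytes, message_len)
        yield make_tweak(processed, T_MSG,
                         first=(i == 0), final=(i == n_blocks - 1))

for t in message_tweaks(70):
    print(f"{t:032x}")
```

Even if all three blocks of that 70-byte message held identical data, each one goes through the cipher under a different tweak, so no two blocks are ever processed the same way.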

One block type is the hash output size embedded in the configuration block.  Yes, Skein can output an arbitrary number of bits, and the length requested is part of the configuration of the transformation UP FRONT because Skein doesn't truncate outputs.  If you have two requests for simple hashes on a block of data, but one request asks for 160 bits (SHA1 replacement for example), and another calls for 161 bits (no idea why, that's a weird number, but follow along), the outputs are COMPLETELY DIFFERENT!  I mean every actual bit. The 160 bit output is not a truncation of the 161 bit output like it would be in just about every other algorithm.
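
A quick Python sketch of why that works: the requested length is baked into the 32-byte configuration string that gets processed before any message data.  The field layout below (a "SHA3" schema identifier, a version number, the output length in bits, three tree parameters, and reserved padding, all little-endian) is my reading of the white paper, so take it as an assumption rather than gospel.  The function name is mine.

```python
import struct

def config_block(output_bits):
    """Build the 32-byte configuration string for plain (non-tree) hashing."""
    return struct.pack(
        "<4sHHQ3B13x",
        b"SHA3",       # schema identifier
        1,             # version number
        0,             # reserved
        output_bits,   # requested output length, in bits
        0, 0, 0,       # tree parameters, all zero means no tree hashing
    )                  # the trailing 13x is reserved zero padding

print(config_block(160).hex())
print(config_block(161).hex())
```

The two strings differ in a single byte, but since that byte is hashed in before the message ever shows up, the starting state for a 160-bit request and a 161-bit request is different, and everything downstream of it diverges.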

One of the other types of transform blocks is a Personalization String.  This basically means if you have Skein set up in one place for passwords, and in another place for signatures, and another for simple file hashing, you can tell them apart without having to compile separate assemblies.  You could have strings like:


Now even if the same data is passed into each process and the outputs are all the same length, since each is personalized, they are unique.  That way any one system that uses them can't be hijacked to process data for any of the others.  Security hole, PLUGGED.
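
Python's standard library doesn't ship Skein, but BLAKE2 (which is in hashlib) has a very similar personalization parameter, so here is a stand-in sketch of the idea using that instead.  To be clear, this is BLAKE2b and not Skein, and the purpose strings are hypothetical ones I picked for the demo.

```python
import hashlib

def personalized_hash(data, purpose):
    """Hash data under a purpose label (BLAKE2b allows up to 16 bytes)."""
    return hashlib.blake2b(data, digest_size=32, person=purpose).hexdigest()

data = b"the exact same input every time"
for purpose in (b"passwords", b"signatures", b"file-hashing"):
    print(purpose.decode(), personalized_hash(data, purpose))
```

Same data, same output length, three unrelated digests.  A hash produced for the file-hashing system is useless to someone trying to pass it off in the signature system.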

Anyways, what does this have to do with me...  I'll tell you tomorrow.


Thursday, May 19, 2011

Skein as a Crypto Hash (part one)

So those in the know in the cryptography world realize that MD5 is dead for anything other than simple file hashes, and SHA1 is not far behind it.  MD5 is broken, and SHA1 is garnering new attacks on a regular basis, making it a poorer and poorer choice.  They also perform pretty poorly by today's standards.

NIST realized this and had an open competition a while back (which as of this writing is still in Round 3) and the front runners are looking really good, both in performance and security.  I'm a long-time recipient of the Crypto-Gram Newsletter of one Mr. Bruce Schneier (considered the Chuck Norris of the security world), so when he announced that he was part of the team that submitted Skein, which is a flexible hash algorithm with a tweakable-block-cipher at its core, my curiosity was piqued.

First, a little primer for those not in the know or who haven't had to work with such matters (consider yourself lucky, this field can make even the most paranoid feel unprepared):  what does a Cryptographic Hash Algorithm do and why is it important?  *COUGH*   Ok, now that we've covered that, why do we need a new one?

Simply put, computers are becoming faster (if not through base mathematical power or speed, then in the ability to do multiple things at once and to hold more things in memory and do more complex things to that memory).  The developers are getting smarter.  Cryptanalysts are also getting smarter.  The ways in which the older algorithms scramble data are becoming less secure, and not because the algorithms changed.  Quite the opposite:  once they become standardized, they remain perfectly static save for maybe a patch or two.  What has changed is the way people look at the data coming out of the algorithms.

The data and algorithms have been picked over and scrutinized and churned under close watch by mathematicians, cryptographers, and statisticians for many years now.  SHA1, for example, has been around since 1995 (its short-lived predecessor, SHA-0, dates to 1993).  It's 2011 (16 years later).  MD5 has been around even longer, and was the basis for SHA1, which improved on MD5 but still suffers from some of the same internal flaws.

With that kind of scrutiny, the cracks and flaws in any crypto system only get wider, not smaller, that's just how it works.  So now that the first real government-standard algorithm has aged to its breaking point, it needs a successor.

They did try with the SHA2 family 10 years back (and there's talk of even more variants of the SHA2 family to make them direct drop-ins for SHA1 to speed up the adoption rate).  These use more data and change the structure of the algorithm, but that only goes so far, and they still perform pretty slowly for today's needs.  That's where SHA-3 comes in.


The Obligatory Welcome Post

Hi.   Ummm... I'm Dustin, aaaand... this is my blog.  Or at least it will be, although by the time you read this it could already be... or be abandoned (gosh that would be awful...)

Anyways, I will ramble at length or at brief about various technology topics at random and at will.  There will be NO regular schedule and few filters.  If you don't like the words [REDACTED] or [REDACTED] or [THIS ONE WAS PARTICULARLY BAD] or [NOT SURE ON THIS ONE BUT BETTER TO BE SAFE], then I implore you to grow up and read it anyways.

If you don't understand the reference in the title... then you've obviously never heard of Wikipedia, go read it now and when you come back we'll both have a laugh, together.

You can also follow my madness on Twitter.