Post-talk note: Press 's', open up the popup, press the largest image and then use the arrow keys to move through the slides
I gave this talk about a month ago. I managed to do 20 slides and 20 lyrics in 5 minutes. I was shooting for 40 slides and 40 song lyrics in 10 minutes, which was a bit extreme. BUT I DID IT -- 43 EACH. So I'll be moving quickly. “this time, I’m telling you…”
Also, side note that unless you are like me and love T-Swift AND checksums, you'll probably be at least 50% confused during this whole thing. But it'll be fun.
Hello.
"Nice to meet you. Where’ve you been? I can show you incredible things."
I'm using Taylor Swift to explain what's the the big deal behind cryptographic hashing functions. Because she's nice.
Don't know about you... but I'm feeling like SHA-2!
Checksumming with Taylor Swift
This talk is "Don’t Know About You… But I’m feeling like Sha-2!"
Hello.
"Um, you don’t know about me? But I bet you’d like to."
And I wasn't kidding about the amount of Taylor Swift you're gonna get in the next ten minutes.
Bro...
CHECKSUMMING! "It's difficult but it's real." So what do people mean when they are like "Bro, are you even salting that hash?"
"Magic, madness, heaven, sin." Cryptography is serious business, but I think we can get to a common baseline of understanding fairly easily. And T-Swift helps. "This is the golden age!"
So what is a checksum? "And I know everything about you. I don't wanna live without you."
IT'S JUST A STRING OF NONSENSE. That's all it is. "It's beautiful, wonderful, don't you ever change." And as long as this string is the same right now as it is when you check up on it later, you're good.
Except, y'know, like, (malicious) hacking. "Haters gonna hate."
"Next chapter." (Thinly veiled lyric reference)
First, let's talk about some concepts.
Security-wise, checksums are used to prevent a man-in-the-middle attack by using this Taylor Swift method of breakup success. I made a diagram! To represent this better. So, you can see here... encrypted passwords or other sensitive information can pass through a network to another location without that information being compromised. “You go talk to your friends talk to my friends talk to me” Cuz we certainly aren't talking, right? WE BROKE UP!
The way this is successful is if information is passed through other channels but never directly, it doesn't matter if hackers are "standing by you, waiting at your back door," your data will still be safe because plaintext information is never transferred directly, and that obfuscation and abstraction happens just like this. Except instead of your friends and my friends, its checksums and salting.
“So many walls, I can't break thru.”
Salting
I mentioned Salting. Salt is a random string, which is linked to existing password hashes and then hashed again for extra layers of protection. Like friends protecting you from bad exes. The salt value and resulting hash is then stored in the database. You are able to encrypt that information, without each other, it cannot be un-encrypted. So you can "shake it off."
Collision Attacks
A collision attack attempts to find two arbitrary outputs which produce the same hash value -- hence, a collision. "A simple complication, miscommunication leads to fallout..." (Two files existing with the same value). There's a vulnerability here, if someone can fake your checksum, they can get to the information they want. Like if Drew looks at you and you "fake a smile so he won't see." Or maybe moreso, someone can inject bad information into your files...
"'Cause when you're fifteen and somebody tells you they love you, you're gonna believe them." And when that SSL certificate reads green, you want to believe it. But you have to watch out for security leaks all the time. Like heartbreakers.
BROKEN
Security-wise, these things don't last "forever & always." So you hear, oh, “_this format_ is broken.” MD5, SHA-1. What does that mean? It's because the technology that keeps these file fingerprints secure can be corrupted if the algorithm can be decrypted by computers. “This slope is treacherous, this path is dangerous”
1. It's the ability to break the code at all, and 2. as well as the amount of time and money necessary to cause that break to happen. If an algorithm would take modern computers years to decrypt, it's still pretty secure. It's "my Achilles heel." It follows a sort of Moore's Law pattern: it gets faster and cheaper every day. Info security is a moving target. "These walls that they put up will fall down."
So when you're shipping these things to production or investigating what might be a good fit for you, maybe you were thinking they "were forever ever and I used to say, "Never say never...""
But these things get busted every day.
"And I'm like... "I just... I mean this is exhausting, you know?""
Algorithms!
About some of the algorithms...
CRC
CRC! This stands for Cycle Redundancy Check. "We were both young when I first saw you." The CRC was invented by W. Wesley Peterson way back in 1961; CRC32 is the work of several researchers and was published in 1975. “32, and still growing up now.” It's a lot older and more basic of a process, so obviously there are lots of limitations here... not good for security, can be faked easily. But it is used in file format verification, Matroska files being an example of this, using a self-check mechanism.
SHA
So we have SHA. SHA stands for Secure Hash Algorithm. It was developed by the National Institute of Standards and Technology (NIST) as a U.S. Federal Information Processing Standard (FIPS).
SHA is very common, but it gets a little complicated because there are different numbers meaning different things. "It's a rollercoaster kind of rush."
The lower, less secure versions (SHA-1) are used to generate small unique identifiers, like git commits. Git uses SHA-1 so you have unique commits.
More complex algorithms are used in elsewhere.
HTTPS used to use SHA-1, but that's been broken for a while. "All this time, how could you not know, baby?"
It now uses SHA-2 (That's the SHA-256 version, not the SHA-512 version -- the algorithm is basically the same but there are different version) to determine if you are legitimately on the website you intend to be on via the SSL certificate.
MD5
Then we have MD5... we see MD5 a lot in archives, at least in my experience. "You look like my next mistake!" BitTorrent files also use MD5 for fixity and file integrity. "I think I know where you belong, I think I know it's with me."
MD5 stands for Message Digest algorithm 5 (y'know, because computer scientists are great at naming things) and was invented by Ronald Rivest in 1991 to replace the old MD4 standard (you can guess what MD4 stood for). It's broken, though. Sorry. Busted.
So I work in the archives field, where, for the most part, usually security isn't an issue and it's okay if checksums are "broken" from a security standpoint as long as they work from a fixity standpoint. "Nothing's gonna change, not for me and you, not before I knew how much I had to lose"
Ummm...file integrity, file location authentication -- are the files there and are they what they used to be?. "Hey, isn't this easy?"
You can use checksums to abstract larger files away for quicker access and validation. “When you said you needed space, what?”
If you're thinking about checksums because of security, you gotta go in "head first, fearless."
The core principal still stands regardless of whether your work in information security or in digital preservation though: you want the assurance that a file has not changed in any way, even ways that seem invisible. "At least, that's what people say."
And that's why this stuff is so important. From a digital preservation stance, you wanna "back up, baby, back up."
But for security, things are more like "its 2am and I'm cursing your name." and you end up spending more time "dreaming, instead of sleeping." Or maybe wary of people that do so.
So we aren't "out of the woods yet." I don't have time to get into all the details and problems with checksum, but you could say it's "trouble trouble trouble". Or maybe "it's like trying to solve a crossword puzzle but there's no right answer".
Sorry, anything related to cryptography is incredibly complex mathematically and should be left to experts. It's complicated stuff! "Is this is my head? I don't know what to think!" But while you're already out there... "The battle's in your hands now".