Allow me to begin with the literal definition of a password. A password is a secret string you use to prove your identity before accessing a given resource (maybe a database or a physical facility). It is at the core of major security infrastructures and is generally used to thwart (block) unauthorized access to particular resources. However, a password only proves that someone knows it, not that they are its rightful owner: passwords obtained illegally, for example through hacking techniques such as SQL injection, let an attacker log in as if they were authorized. Security administrators (a website admin, for instance) can take one major step to prevent the misuse of passwords even when they are stolen. In this article, I will discuss that step: password hashing. By the end of this article, I hope you will appreciate this technology.
How websites store data
When you create an account on a website, say Talwork, the website stores your login credentials (username, email address, and password) in its SQL database. A database is simply an organized collection of data, while SQL (pronounced ‘sequel’ or spelled out letter by letter) stands for Structured Query Language, a language for managing and querying databases. In a naive design, passwords are stored in the database in their plain form, e.g. if your password is “RoxiMaxi56mus”, it is stored as “RoxiMaxi56mus” in the database.
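To make the naive approach concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name `users` and its columns are hypothetical, chosen only for illustration; the point is that whatever is written to the password column can be read straight back by anyone with access to the database.
```python
import sqlite3

# In-memory database standing in for a website's user store (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT PRIMARY KEY, email TEXT, password TEXT)")

# The naive approach: the password is saved exactly as the user typed it.
conn.execute(
    "INSERT INTO users (username, email, password) VALUES (?, ?, ?)",
    ("roxi", "roxi@example.com", "RoxiMaxi56mus"),
)
conn.commit()

# Anyone who can read the table (an admin, or an attacker holding a database dump)
# sees the password in the clear.
row = conn.execute("SELECT password FROM users WHERE username = ?", ("roxi",)).fetchone()
print(row[0])  # RoxiMaxi56mus
```
The query uses `?` placeholders rather than string formatting, which is the standard defence against SQL injection; it protects the query itself, but does nothing for the plaintext password sitting in the table.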
This method of password storage has serious problems, and for the reasons below it deserves mockery.
- Your password, especially one that guards very private or sensitive resources such as a bank account, should be known only to you, not even to the website administrator (who often has access to the database). The truth is that you cannot know what an admin does when nobody is watching. Ethics require them to protect the privacy of customer credentials, but some web admins are themselves hackers who prey on unaware customers simply by reading and using their credentials.
- If the website is hacked and its database is leaked, customer login credentials may be exposed to the public. If the passwords were stored exactly as they were entered, anyone who looks through the database dump can use them to log in to customers’ accounts.
- Many customers use the same password for multiple accounts, e.g. the same password for Facebook, their computers, gambling sites and maybe their bank accounts. This is a very risky practice: if a friend learns your password for one account, say Facebook, they can use it to access your computer, bank account, or favourite betting site, e.g. Sportspesa. If your Facebook account is hacked and your plaintext password leaked, you are at the mercy of the hacker, who can break into your other accounts one after another, stopping only when satisfied.
For the reasons above, and probably others, storing passwords as plaintext is a malpractice that resource and security admins should avoid. Hashing is the technology that helps avoid these problems.
What is hashing?
Consider the following cases:
Mixing two colours is easy, while working out the constituent colours of a mixture is not. Multiplying two large prime numbers is easy too, but given only their huge product, it is hard to find the two prime factors that produced it. This idea, easy in one direction and difficult in the reverse direction, is what hashing is all about. Think about how this can save you in case of password leakages: it is very simple to turn a plaintext password into a hash, but extremely difficult to recover the plaintext from the hash. Hence, if a hacker obtains hashed passwords, they can do very little with them unless they have the resources to crack the hashes. This is simply security.
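The multiplication-versus-factoring analogy can be felt with a short Python sketch. The two primes below are tiny by cryptographic standards, so factoring still finishes quickly, but even here the forward direction is a single multiplication while the reverse direction needs about half a million trial divisions. This only illustrates the one-way idea; hash functions such as MD5 and SHA are not built on factoring.
```python
def factor_by_trial_division(n: int) -> tuple[int, int, int]:
    """Reverse direction: search for a divisor of n one candidate at a time.

    Returns (smallest_factor, cofactor, candidates_tried).
    """
    tried = 1  # the check for divisibility by 2
    if n % 2 == 0:
        return 2, n // 2, tried
    d = 3
    while d * d <= n:
        tried += 1
        if n % d == 0:
            return d, n // d, tried
        d += 2  # only odd candidates after 2
    return 1, n, tried  # no divisor found: n is prime


p, q = 1_000_003, 1_000_033  # two primes, tiny by cryptographic standards

n = p * q  # forward direction: a single multiplication
a, b, tried = factor_by_trial_division(n)  # reverse direction: a long search

print(n)
print(a, b, tried)
```
Real-world keys use primes hundreds of digits long, where this kind of search becomes hopeless; that gap between the two directions is the property the analogy is pointing at.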
So here is how it is done. Hashing is achieved through a hashing algorithm as shown below.
As can be seen above, a hashing algorithm is essentially a hash function that maps (converts) an input plaintext into a fixed-length output known as a hash text, hash sum, or digest. Hashing is generally used to confirm that data (stored or in transit) has not been modified in any way. To state it plainly, a given file, e.g. a Word document or a program such as a virus, produces a hash sum that is, for practical purposes, unique to its exact contents. If the file is modified, for example a full stop is added or removed, it will produce a different hash when run through the same hash function. If no modification is made, the same hash sum is produced, confirming that no tampering took place. Various hashing algorithms exist, but only a few are very popular; the most commonly used are MD5 and the SHA family (e.g. SHA-1, SHA-256, SHA-512), although MD5 and SHA-1 are now considered too weak for security-sensitive uses.
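Here is the same idea in code, a minimal sketch using Python's standard hashlib module. The sample sentence is arbitrary; adding a single full stop is enough to change every digest, while hashing the unchanged text twice gives identical results.
```python
import hashlib


def hex_digest(algorithm: str, text: str) -> str:
    """Hash a string with the named algorithm and return the hex digest."""
    return hashlib.new(algorithm, text.encode("utf-8")).hexdigest()


original = "The quick brown fox jumps over the lazy dog"
modified = original + "."  # the same text with one extra full stop

for algo in ("md5", "sha256", "sha512"):
    first = hex_digest(algo, original)
    again = hex_digest(algo, original)  # unchanged input, hashed a second time
    changed = hex_digest(algo, modified)
    # Columns: algorithm, same input -> same hash?, modified input -> same hash?, digest length
    print(algo, first == again, first == changed, len(first))
# md5 True False 32
# sha256 True False 64
# sha512 True False 128
```
The digest length depends only on the algorithm (32 hex characters for MD5, 64 for SHA-256, 128 for SHA-512), not on the size of the input.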
For you to appreciate what hashing is, create a small text document using any text editor or word processor of your choice. Navigate to this website or download the hash calculator. Upload your document into either program (the online version or the offline hash calculator) and hash it using any of the available algorithms; I prefer MD5 or SHA512. Note the string output beside the algorithm used and copy-paste it somewhere. Now modify the document in any way: add or remove a word, a punctuation mark, or a sentence. Save the changes, upload the modified document again, redo the hashing and copy the new hash sum. Compare it with the previous one. They should be different, which means your document was modified. Finally, rehash the same document without modifying it; it should produce the same hash sum, showing that your document was not modified or tampered with.
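If you would rather not use a website or download a calculator, the experiment can be scripted. This is a minimal sketch using Python's hashlib with a temporary file whose name and contents are made up for the demo; on Linux, the md5sum and sha512sum command-line tools do the same job.
```python
import hashlib
import os
import tempfile


def file_hash(path: str, algorithm: str = "sha512") -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


with tempfile.TemporaryDirectory() as folder:
    doc = os.path.join(folder, "note.txt")  # hypothetical document
    with open(doc, "w", encoding="utf-8") as f:
        f.write("Hashing is easy one way and hard the other.")

    first = file_hash(doc)
    second = file_hash(doc)  # rehash without modifying the file

    with open(doc, "a", encoding="utf-8") as f:
        f.write(" Extra sentence.")  # modify the document
    third = file_hash(doc)

print(first == second)  # True: the document was not modified
print(first == third)   # False: the document was modified
```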
With this in place, you no longer have to store passwords in plaintext; instead, this is what ought to happen (and what serious, security-conscious web admins have implemented). When you create an account, the website generates a hash sum of your password. Only this hash sum is stored in the database as your password, while your plaintext password is discarded by the site. If the database is leaked in a major hack, you are at minimal risk, since it is highly unlikely that the hackers will recover your plaintext password from its hash. Even the admins stand to gain nothing by using or sharing your hash sum.
What do you think will happen if someone knows your hash and tries to log in with it? After all, it is this hash that gets compared against the one stored in the database. Well, whatever you type into the login form is treated as an ordinary password and passes through the same hash function on the website, so typing the hash sum produces a hash of the hash, which does not match the stored value. So how does the website check that you are typing the correct password if the site itself does not know your password? The answer is simple: for every password entered, a hash sum is generated and compared to the hash stored in the database. If they match, the login is validated; otherwise, the login fails.
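The registration and login flow described above can be sketched as follows. The dictionary `password_table` is a hypothetical stand-in for the database table, and the scheme is deliberately the simple one from this article (a single, unsalted SHA-256 hash); production systems add salting and deliberately slow password-hashing algorithms such as bcrypt or Argon2.
```python
import hashlib
import hmac

# Hypothetical stand-in for the users table: username -> stored hash.
password_table: dict[str, str] = {}


def hash_password(password: str) -> str:
    # Unsalted SHA-256, mirroring the simple scheme described above. Illustrative only.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def register(username: str, password: str) -> None:
    # Only the hash is stored; the plaintext is never written anywhere.
    password_table[username] = hash_password(password)


def login(username: str, password: str) -> bool:
    stored = password_table.get(username)
    if stored is None:
        return False
    # Hash whatever was typed and compare it with the stored hash.
    # hmac.compare_digest compares in constant time to avoid leaking timing information.
    return hmac.compare_digest(hash_password(password), stored)


register("roxi", "RoxiMaxi56mus")
stolen_hash = password_table["roxi"]

print(login("roxi", "RoxiMaxi56mus"))  # True
print(login("roxi", "wrong-guess"))    # False
print(login("roxi", stolen_hash))      # False: the typed hash gets hashed again
```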
Ever wondered why almost all websites give you a new password when you forget your old one, instead of just telling you your password? Now you know: they don’t know your password themselves, so they can’t tell you. When they let you change your password, they simply replace the corresponding hash in their tables with the hash of your new password, and your new password works. Cool, isn’t it?
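A password reset, under the same simplified assumptions (unsalted SHA-256 and a hypothetical in-memory `users` table), only ever touches the hash:
```python
import hashlib


def sha256_hex(password: str) -> str:
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Hypothetical users table: username -> hash of the current password.
users = {"roxi": sha256_hex("RoxiMaxi56mus")}


def reset_password(username: str, new_password: str) -> None:
    # The site cannot recover the old password; it just overwrites the stored hash.
    users[username] = sha256_hex(new_password)


def check(username: str, password: str) -> bool:
    return users.get(username) == sha256_hex(password)


reset_password("roxi", "N3w-Passphrase!")
print(check("roxi", "RoxiMaxi56mus"))    # False: the old password no longer works
print(check("roxi", "N3w-Passphrase!"))  # True
```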
Can you crack my hash sum?
Yes, I can, given enough time. What makes a hash hard to crack is not the hash itself, since a given algorithm always produces an output of the same length, but how hard the original password is to guess. The two main techniques for cracking a hash sum are the brute force attack and the dictionary attack.
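A dictionary attack can be sketched in a few lines. Instead of trying every possible string, the attacker hashes a list of likely passwords and looks for a match. The five-word list here is made up for illustration, and the hashes are assumed to be unsalted SHA-256.
```python
import hashlib
from typing import Iterable, Optional


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def dictionary_attack(target_hash: str, wordlist: Iterable[str]) -> Optional[str]:
    """Hash each candidate word and return the one whose hash matches, if any."""
    for word in wordlist:
        if sha256_hex(word) == target_hash:
            return word
    return None


# A tiny made-up wordlist; real attacks typically use millions of leaked passwords.
common_passwords = ["123456", "password", "qwerty", "letmein", "sunshine"]

leaked = sha256_hex("letmein")  # pretend this hash came from a breached database
print(dictionary_attack(leaked, common_passwords))                       # letmein
print(dictionary_attack(sha256_hex("RoxiMaxi56mus"), common_passwords))  # None
```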
Brute force technique
Basically, suppose someone has the password "pass", and the database stores its hash, which for illustration we will write as d@A2qAawqq21109 (a made-up value, not a real hash output). A hacker who only has access to the hashes can hash every possible password in alphabetical order and check which one matches. (Assume the hacker knows that the password is four characters long and contains only lowercase letters.) He tries 'aaaa', 'aaab', 'aaac', ..., 'aaba', 'aabb', 'aabc', ..., 'aazz', 'abaa', ..., 'paaa', 'paab', ..., 'pass'. When he tries 'aaaa', the hash is not d@A2qAawqq21109; it is something else. Every candidate before 'pass' gives a hash that doesn't match d@A2qAawqq21109, but for 'pass' the hash matches, so the hacker now knows your password. It should now be clear to you that the stronger the password, the more candidates the attacker must try and the harder its hash sum is to crack.
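The walk-through above translates directly into Python. Since d@A2qAawqq21109 is not a real hash output, this sketch assumes the stolen value is the actual SHA-256 hash of "pass" (any fast hash behaves the same way). itertools.product generates the candidates in exactly the order described: 'aaaa', 'aaab', and so on.
```python
import hashlib
import itertools
import string
from typing import Optional


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def brute_force(target_hash: str, length: int,
                alphabet: str = string.ascii_lowercase) -> Optional[str]:
    """Try every string of the given length in alphabetical order: 'aaaa', 'aaab', ..."""
    for letters in itertools.product(alphabet, repeat=length):
        candidate = "".join(letters)
        if sha256_hex(candidate) == target_hash:
            return candidate
    return None


stolen = sha256_hex("pass")    # the only thing the attacker has
print(brute_force(stolen, 4))  # pass
```
The search reaches 'pass' on guess 264,127 out of 456,976 possible four-letter strings, which takes a fraction of a second even in plain Python.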
Brute force attacks always succeed eventually, but they may take millions of years, depending on the computing power of the hacker’s machine and the strength of the password: longer passwords drawn from more kinds of characters mean vastly more guesses. I will soon release an article on password strength in which I will discuss the dictionary attack. In that article, I will also discuss various mechanisms for strengthening hashing, such as salting, and give some pointers on implementing hashing on your website. Have a nice read and leave a comment below.
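To see why brute force can take millions of years, note that an attacker must try up to alphabet_size ** length candidates. This small calculation assumes a hypothetical attacker speed of one billion hashes per second; real speeds vary enormously with hardware and hash algorithm.
```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def worst_case_years(alphabet_size: int, length: int, hashes_per_second: float) -> float:
    """Time to try every password of this length: alphabet_size ** length guesses."""
    return alphabet_size ** length / hashes_per_second / SECONDS_PER_YEAR


RATE = 1e9  # assumed attacker speed: one billion hashes per second (hypothetical)

for label, size, length in [
    ("4 lowercase letters", 26, 4),
    ("8 lowercase letters", 26, 8),
    ("12 printable ASCII characters", 94, 12),
]:
    years = worst_case_years(size, length, RATE)
    print(f"{label}: {size ** length:.3g} guesses, {years:.3g} years")
```
Under that assumption, four lowercase letters fall in under a millisecond, eight lowercase letters in about three and a half minutes, and twelve characters drawn from the 94 printable ASCII symbols take roughly 15 million years.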