Home » Is my developer’s home-brew password security right or wrong, and why?

Is my developer’s home-brew password security right or wrong, and why?


/** Dave's Home-brew Hash^H^H^H^H^Hkinda stupid algorithm */

// user data
$password = '';

// timestamp, "random" #
$time = date('mdYHis'); // known to attackers - totally pointless
// ^ also, as jdm pointed out in the comments, this changes daily. looks broken!

// different hashes for different days? huh? or is this stored as a salt?
$rand = mt_rand().'n'; // mt_rand is not secure as a random number generator
// ^ it's even less secure if you only ask for a single 31-bit number. and why the n?

// crypt is good if configured/salted correctly
// ... except you've used crypt on the username? WTF.
$crypt = crypt($user.$time.$rand); 

// hash
function hash_it($string1, $string2) {
    $pass = md5($string1); // why are we MD5'ing the same pass when crypt is available?
    $nt = substr($pass,0,8); // <--- BAD BAD BAD - why shuffle an MD5 hash?!?!?
    $th = substr($pass,8,8);
    $ur = substr($pass,16,8);
    $ps = substr($pass,24,8); // seriously. I have no idea. why?
    // ^ shuffling brings _zero_ extra security. it makes _zero_ sense to do this.
    // also, what's up with the variable names?

    // and now we SHA1 it with other junk too? wtf?
    return $hash

$hash = hash_it($password, $crypt); // ... please stop. it's hurting my head.


Dave, you are not a cryptographer. Stop it.

This home-brew method offers no real resistance against brute force attacks, and gives a false impression of “complicated” security. In reality you’re doing little more than sha1(md5(pass) + salt) with a possibly-broken and overly complicated hash. You seem to be under the illusion that complicated code gives better security, but it doesn’t. A strong cryptosystem is strong regardless of whether the algorithm is known to an attacker – this fact is known as Kerckhoff’s principle. I realise that it’s fun to re-invent the wheel and do it all your own way, but you’re writing code that’s going into a business-critical application, which is going to have to protect customer credentials. You have a responsibility to do it correctly.

Stick to tried and tested key derivation algorithms like PBKDF2 or bcrypt, which have undergone years of in-depth analysis and scrutiny from a wide range of professional and hobbyist cryptographers.

If you’d like a good schooling on proper password storage, check out these links:

  • How to store salt?
  • Do any security experts recommend bcrypt for password storage?
  • How to securely store passwords?

Advantages of a public protocol:

  • Probably written by smarter people than you
  • Tested by a lot more people (probably some of them smarter than you)
  • Reviewed by a lot more people (probably some of them smarter than you), often has mathematical proof
  • Improved by a lot more people (probably some of them smarter than you)
  • At the moment just one of those thousands of people finds a flaw, a lot of people start fixing it

Disadvantages of a public protocol:

  • If indeed a flaw is found,
  • AND made public
  • AND not fixed fast enough

then attackers will start searching for vulnerable sites/applications. BUT:

  • They’ll probably go after more important targets first
  • When the flaw is exposed, you know and can lock-down
  • Not all flaws are “critical” (most are like “collisions are possible under certain conditions” or “the time to bruteforce is not 1012 years, but 106 years”)

Security by obscurity is deemed outright bad. Inventing your own falls under that category.

If Dave is really “your” developer, as in you have the authority to fire him, then you have the authority to direct him to use a more well-documented scheme, and you should.

In cryptography, the fewer secrets that are required to be kept, the better. This applies especially to “hard-coded” secrets, such as the hash function itself, which are not secrets as soon as the code containing the secret leaves your building.

This is why open standards for cryptography are a good thing; if the scheme has been around for years or even decades, with the basic algorithm and various implementations publicly known, and still nobody’s found a way to get in, it’s a good scheme.

Case in point, Polynomial provided a very good breakdown of what’s wrong with the scheme, but here are the major failings of the proposed hash algorithm:

  • MD5 is completely broken; While a preimage attack isn’t made much easier by the primary ways that MD5 is broken (it’s extremely vulnerable to known-plaintext strong collisions; knowing the message and the digest, one can produce a collision in 2^24 time), the hashing algorithm is way too fast to be secure against modern distributed cracking hardware. It should never be used in applications requiring data security.

  • SHA-1 is considered vulnerable; Again, there’s not an elegant vector for a preimage attack, but there is a strong collision attack (2^61 time for a 160-bit hash), and the hashing algorithm is fast enough and the keyspace small enough that current hardware can feasibly brute-force it.

  • More hashes don’t necessarily mean better hashing. Hashes are a deterministic process; they stand in for a “random oracle” in theoretical cryptography, but since they must always produce the same output given the same input, they cannot add any entropy to what is already inherent in the input.

    Stated simply, using a secret combination of multiple hashes as Dave is doing, even if that combination is unknown, doesn’t add a significant amount of work to a brute-force (the relative order of three hashes has 6 possibilities, increasing the complexity of trying all possibilities by less than 3 powers of 2), and that extra complexity disappears as soon as an attacker can decompile your code and discover the relative order of the hash functions.

  • Passwords are inherently low-entropy. The best “user-consumable” (non-gibberish) passwords max out in complexity at about 50 bits of entropy, meaning they can be cracked, if you have some idea of what you’re looking for, in 2^50 or better time. That makes passwords easier to crack than other things you’d normally hash (like certificate digests), requiring additional “proof of work” to increase their security.

  • This scheme is not adding any significant proof of work; three hashes instead of one, in the grand scheme of things, adds the equivalent of about 1.25 bits of entropy to the scheme in extra required work; you could do better by just requiring passwords to be one character longer.

BCrypt, in contrast to all of the above, has as its main advantage an extremely slow key derivation function or KDF. Blowfish, the encryption cipher that forms the basis of BCrypt, uses an “SBox”; a large array of predetermined initial values, which is modified by the key and/or IV to produce the initial state of the cipher through which the plaintext is fed. In BCrypt, this SBox setup is made more complex by taking the smaller password and salt values and running them through the “unkeyed” cipher a predetermined number of times, “warming up” the cipher into a state that theoretically could only be produced by performing the same number of iterations of setup, using the same password and salt. This “warmed-up” cipher is then fed a constant plaintext to produce the hash value. In CS terms, the hash forms a “proof of work”; a computational task that is difficult (time- and/or resource-consuming) to perform, but easy to check for correctness (does the produced hash match the one we already have?).

That “predetermined number” of iterations of SBox setup is configurable; this is BCrypt’s real power. A number of “rounds” of key setup are specified when hashing, and for n rounds, 2^n iterations of key setup are performed. That means that incrementing the number of rounds makes the hash perform twice as much work and take twice as long on the same hardware to produce the requisite proof of work. BCrypt therefore can easily keep up with Moore’s Law, which has been the bane of many hash functions that came before.

BCrypt’s main theoretical weakness is its low memory usage; while computation time can be exponentially increased, the amount of memory required remains effectively constant (the SBox doesn’t get any bigger as the iterations increase and no “intermediate results” are required to be kept around). As such, while BCrypt stymies the pure pipelined computational power of a GPU, it remains vulnerable to cracking by more sophisticated “field-programmable gate arrays” (basically a massive, highly modular set of “soft-wired” logic units, that are the current state-of-the-art in highly distributed computing), because the low memory constraints mean you can just throw more relatively-inexpensive circuit boards at the problem.

A newer algorithm, SCrypt, combats FPGA crackers by exponentially increasing the memory demands as well as the computational expense of calculating the hash, so the limited memory available to each FPGA quickly makes them infeasible as well (distributed cracking would basically require large numbers of CPU/FSB/RAM combinations, essentially full computers, hooked together). The only problem there is that SCrypt is only 2 or 3 years old, so it doesn’t have the cryptanalytic pedigree of BCrypt (13 years old). And really, if you have to worry about an FPGA-armed cracker getting a password in a feasible amount of time (like before you can change all of them anyway), you have pissed off some very powerful people, and have already exposed another serious vulnerability (allowing an attacker to obtain your stored password hashes in the first place, so they can crack one “offline”).

Bottom line, use BCrypt for password hashing. It’s 13 years old, based on a 20-year-old cipher algorithm, an in that time no known vulnerabilities have been found that would aid a password cracker. It’s slow as molasses, and can be configured to always be so, provided you combine it with a requirement for users to change their password every 90 days or so, so new passwords keep getting hashed with more expensive configurations.

Related Solutions

Custom query with Castle ActiveRecord

In this case what you want is HqlBasedQuery. Your query will be a projection, so what you'll get back will be an ArrayList of tuples containing the results (the content of each element of the ArrayList will depend on the query, but for more than one value will...

What is the “You have new mail” message in Linux/UNIX?

Where is this mail? It's likely to be in the spool file: /var/mail/$USER or /var/spool/mail/$USER are the most common locations on Linux and BSD. (Other locations are possible – check if $MAIL is set – but by default, the system only informs you about...

How can I find the implementations of Linux kernel system calls?

System calls aren't handled like regular function calls. It takes special code to make the transition from user space to kernel space, basically a bit of inline assembly code injected into your program at the call site. The kernel side code that "catches" the...

Is a composite index also good for queries on the first field?

It certainly is. We discussed that in great detail under this related question: Working of indexes in PostgreSQL Space is allocated in multiples of MAXALIGN, which is typically 8 bytes on a 64-bit OS or (much less common) 4 bytes on a 32-bit OS. If you are not...

Explaining computational complexity theory

Hoooo, doctoral comp flashback. Okay, here goes. We start with the idea of a decision problem, a problem for which an algorithm can always answer "yes" or "no." We also need the idea of two models of computer (Turing machine, really): deterministic and...

Building a multi-level menu for umbraco

First off, no need pass the a parent parameter around. The context will transport this information. Here is the XSL stylesheet that should solve your problem: <!-- update this variable on how deep your menu should be --> <xsl:variable...

How to generate a random string?

My favorite way to do it is by using /dev/urandom together with tr to delete unwanted characters. For instance, to get only digits and letters: tr -dc A-Za-z0-9 </dev/urandom | head -c 13 ; echo '' Alternatively, to include more characters from the OWASP...

How to copy a file from a remote server to a local machine?

The syntax for scp is: If you are on the computer from which you want to send file to a remote computer: scp /file/to/send username@remote:/where/to/put Here the remote can be a FQDN or an IP address. On the other hand if you are on the computer wanting to...

What is the difference between curl and wget?

The main differences are: wget's major strong side compared to curl is its ability to download recursively. wget is command line only. There's no lib or anything, but curl's features are powered by libcurl. curl supports FTP, FTPS, HTTP, HTTPS, SCP, SFTP, TFTP,...

Using ‘sed’ to find and replace [duplicate]

sed is the stream editor, in that you can use | (pipe) to send standard streams (STDIN and STDOUT specifically) through sed and alter them programmatically on the fly, making it a handy tool in the Unix philosophy tradition; but can edit files directly, too,...

How do I loop through only directories in bash?

You can specify a slash at the end to match only directories: for d in */ ; do echo "$d" done If you want to exclude symlinks, use a test to continue the loop if the current entry is a link. You need to remove the trailing slash from the name in order for -L to...

How to clear journalctl

The self maintenance method is to vacuum the logs by size or time. Retain only the past two days: journalctl --vacuum-time=2d Retain only the past 500 MB: journalctl --vacuum-size=500M man journalctl for more information. You don't typically clear the journal...

How can I run a command which will survive terminal close?

One of the following 2 should work: $ nohup redshift & or $ redshift & $ disown See the following for a bit more information on how this works: man nohup help disown Difference between nohup, disown and & (be sure to read the comments too) If your...

Get exit status of process that’s piped to another

bash and zsh have an array variable that holds the exit status of each element (command) of the last pipeline executed by the shell. If you are using bash, the array is called PIPESTATUS (case matters!) and the array indicies start at zero: $ false | true $...

Execute vs Read bit. How do directory permissions in Linux work?

When applying permissions to directories on Linux, the permission bits have different meanings than on regular files. The read bit (r) allows the affected user to list the files within the directory The write bit (w) allows the affected user to create, rename,...

What are the pros and cons of Vim and Emacs? [closed]

I use both, although if I had to choose one, I know which one I would pick. Still, I'll try to make an objective comparison on a few issues. Available everywhere? If you're a professional system administrator who works with Unix systems, or a power user on...