[148914] in cryptography@c2.net mail archive
[Cryptography] I posted a memory-hard key stretching algorithm
daemon@ATHENA.MIT.EDU (Arnold Reinhold)
Fri Jan 3 17:47:32 2014
X-Original-To: cryptography@metzdowd.com
From: Arnold Reinhold <agr@me.com>
Date: Fri, 03 Jan 2014 15:40:31 -0500
To: Bill Cox <waywardgeek@gmail.com>
Cc: "cryptography@metzdowd.com" <cryptography@metzdowd.com>
Errors-To: cryptography-bounces+crypto.discuss=bloom-picayune.mit.edu@metzdowd.com
--===============2276496252726741399==
Content-type: multipart/alternative;
boundary="Apple-Mail=_077D5E54-E01C-429A-8263-6C8A295F5935"
--Apple-Mail=_077D5E54-E01C-429A-8263-6C8A295F5935
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
charset=utf-8
On 30 Dec 2013 15:05 Bill Cox wrote:
> It's at:
>=20
> https://github.com/waywardgeek/keystretch
>=20
> If this algorithm isn't too lame, I'll enter it in the password =
hashing
> competition in January. There isn't much time for feedback or code
> development, so if you're interested in these algorithms, please let =
me
> know your thoughts on this one. Essentially, I've upped the =
pre-hashing of
> the password to 4096 SHA-256 rounds, and replaced the memory hashing
> function of scrypt, Salsa20/8, with a simple hack that seems to run 8X
> faster while being unpredictable enough.
>=20
> The only other entry I've read about so far is based on Blake2, which =
is a
> nice improvement over Salsa20, I think, but like scrypt, it spends =
most of
> it's time hashing rather than filling the memory bandwidth. I'm not =
sure a
> cryptographically strong hash is called for, so I'm suggesting using a
> simpler hash that seems to work well enough. Any thoughts welcome.
Sounds promising. A couple of suggestions:
1. Consider replacing your SHA-256 rounds with the SIMD-hash from the =
SHA-3 competition. SIMD made it to round 2, but was eliminated from the =
finals because it was the most expensive to implement in hardware. Small =
hardware area was a positive criteria in the SHA-3 competition, but it =
should be considered a negative for KDFs. Why not benefit from all the =
evaluation work done for SHA-3 candidates? See, e.g. "Comparing Hardware =
Performance of Fourteen Round Two SHA-3 Candidates Using FPGAs" Ekawat =
Homsirikamol, Marcin Rogawski, Kris Gaj. =
http://eprint.iacr.org/2010/445.pdf=20
=E2=80=8EIn particular Table 4.15 which compares the 14 Round Two =
candidates throughput-to-area ratio with SHA-512 shows SIMD being the =
lowest of the 14 hashes, which is the best outcome when we are designing =
a KBF. The gain in software vs hardware difficulty is around a factor =
of 11 (3.5 bits), not insignificant for a simple software change. I do =
not know how the 64-bit SIMD calculations play out in GPGPUs, but there =
should be some gain there as well. The ECHO hash was a close second. =
SIMD-256 is a little faster than SHA-256 in software, so your 4096 round =
count could b e increased if desired (perhaps the count should be a =
parameter). I also suggest going to a 512 bit hash since it requires =
more area. Why not? Finally, you might start with one round of SHA-2 or =
SHA-3, just to be able to say that any pre-image attack is blocked by a =
generally accepted standard hash. =20
(I'd been thinking of offering PBKDF2-SIMD-512 in the KDF competition.)
2. I also thought of an intermediate approach to the question of scrypt =
revealing password information via memory side channels. It=E2=80=99s =
not an obvious big win, but it might be useful in some situations:=20
Instead of using P, the password, in step 1 of scrypt, use:
Q =3D Hash (P, Salt) modulo m
(If m=3D2^k, this just means taking the k low order bits of the hash.)
The parameter m limits how much information about P is recovered if some =
of the scrypt memory is compromised. If m is low, and Hash is fast, an =
attacker can devote one processor to each possible value of Q, having =
each processor calculate Q for every trial password and only proceeding =
if the Q is its assigned value. Each processor will only have to fill =
memory once. If Hash is resource consuming, e.g. PBKDF2-4096 as you =
propose, then the processors will have to expend m-times the work of the =
defender to screen trial passwords. The attacker can get around this by =
screening trial passwords centrally and only passing the ones with a =
given Q to the processor handling that Q, but this entails a lot more =
interprocessor communication. As m gets larger, each processor will have =
to handle more than one Q, though they can do so one at a time. If m is =
big enough, the attacker is forced to do a lot of memory exercise. So =
there is a tradeoff here between password information inserted in step 1 =
and resulting difficulty for the brute-force attacker.=20
3. I think it is helpful to distinguish two use cases for KDFs: =
protecting login credentials and protecting cryptographic keys. In the =
case of a cryptographic key, the attacker who can glean KDF memory =
information most likely has easier access to some ciphertext generated =
with the key. Examples include whole disk encryption of a seized laptop =
or attacking WPA2 wireless encryption. The ciphertext itself is usually =
sufficient information to mount an off-line attack. So the gain from =
getting hold of the KDF memory, particularly after the algorithm is =
complete or half complete, is low. On the other hand, login credentials =
can be further protected by limiting the number or rate of failed =
attempts. So any leaked information that allows an off-line attack is a =
big improvement for an attacker. For login credentials, it may be worth =
throttling the amount of information an attacker get from a memory =
compromise as described above, even if it means less KDF gain. Of course =
with common weak passwords, even 20 bits of leakage can be bad, but I =
see no reason to put the full entropy of a strong login password into =
step 1. For cryptographic keys, any reduction in KDF gain may not be not =
worth it.
4. The SIMD-hash needs SIMD CPU instructions for full throughput, but =
these are available on modern x86 and ARM architectures. In the case of =
protecting cryptographic keys, there is less need for interoperability. =
If I am protecting my laptop's hard drive, I may as well use all the =
silicon in my CPU and not worry if the same algorithm will work =
efficiently on another architecture. We needn't design KDFs to the =
lowest common denominator.
Arnold Reinhold
=09
--Apple-Mail=_077D5E54-E01C-429A-8263-6C8A295F5935
Content-Transfer-Encoding: quoted-printable
Content-Type: text/html;
charset=utf-8
<html><head><meta http-equiv=3D"Content-Type" content=3D"text/html =
charset=3Dutf-8"></head><body style=3D"word-wrap: break-word; =
-webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">On 30 =
Dec 2013 15:05 Bill Cox wrote:<br><br><blockquote type=3D"cite">It's =
at:<br><br><a =
href=3D"https://github.com/waywardgeek/keystretch">https://github.com/wayw=
ardgeek/keystretch</a><br><br>If this algorithm isn't too lame, I'll =
enter it in the password hashing<br>competition in January. There =
isn't much time for feedback or code<br>development, so if you're =
interested in these algorithms, please let me<br>know your thoughts on =
this one. Essentially, I've upped the pre-hashing of<br>the =
password to 4096 SHA-256 rounds, and replaced the memory =
hashing<br>function of scrypt, Salsa20/8, with a simple hack that seems =
to run 8X<br>faster while being unpredictable enough.<br><br>The only =
other entry I've read about so far is based on Blake2, which is =
a<br>nice improvement over Salsa20, I think, but like scrypt, it spends =
most of<br>it's time hashing rather than filling the memory bandwidth. =
I'm not sure a<br>cryptographically strong hash is called for, so =
I'm suggesting using a<br>simpler hash that seems to work well enough. =
Any thoughts welcome.</blockquote><div><br></div>Sounds promising. =
A couple of suggestions:<div><br></div><div>1. Consider replacing your =
SHA-256 rounds with the SIMD-hash from the SHA-3 competition. SIMD made =
it to round 2, but was eliminated from the finals because it was the =
most expensive to implement in hardware. Small hardware area was a =
positive criteria in the SHA-3 competition, but it should be considered =
a negative for KDFs. Why not benefit from all the evaluation work done =
for SHA-3 candidates? See, e.g. "Comparing Hardware Performance =
of Fourteen Round Two SHA-3 Candidates Using FPGAs" =
Ekawat Homsirikamol, Marcin Rogawski, Kris Gaj. <a =
href=3D"http://eprint.iacr.org/2010/445.pdf">http://eprint.iacr.org/2010/4=
45.pdf</a> </div><div><br></div><div>=E2=80=8EIn particular Table =
4.15 which compares the 14 Round Two candidates throughput-to-area ratio =
with SHA-512 shows SIMD being the lowest of the 14 hashes, which =
is the best outcome when we are designing a KBF. The gain in =
software vs hardware difficulty is around a factor of 11 (3.5 bits), not =
insignificant for a simple software change. I do not know how the =
64-bit SIMD calculations play out in GPGPUs, but there should be some =
gain there as well. The ECHO hash was a close second. SIMD-256 is a =
little faster than SHA-256 in software, so your 4096 round count could b =
e increased if desired (perhaps the count should be a parameter). I also =
suggest going to a 512 bit hash since it requires more area. Why not? =
Finally, you might start with one round of SHA-2 or SHA-3, just to be =
able to say that any pre-image attack is blocked by a generally accepted =
standard hash. </div><div><br></div><div>(I'd been thinking of =
offering PBKDF2-SIMD-512 in the KDF =
competition.)</div><div><br></div><div><div style=3D"margin: 0px;">2. I =
also thought of an intermediate approach to the question of scrypt =
revealing password information via memory side channels. It=E2=80=99s =
not an obvious big win, but it might be useful in some =
situations: </div><div style=3D"margin: 0px; min-height: =
14px;"><br></div><div style=3D"margin: 0px;">Instead of using P, the =
password, in step 1 of scrypt, use:</div><div style=3D"margin: 0px; =
min-height: 14px;"><br></div><div style=3D"margin: 0px;"> Q =
=3D Hash (P, Salt) modulo m</div><div style=3D"margin: 0px; min-height: =
14px;"><br></div><div style=3D"margin: 0px;">(If m=3D2^k, this just =
means taking the k low order bits of the hash.)</div><div style=3D"margin:=
0px; min-height: 14px;"><br></div><div style=3D"margin: 0px;">The =
parameter m limits how much information about P is recovered if some of =
the scrypt memory is compromised. If m is low, and Hash is fast, an =
attacker can devote one processor to each possible value of Q, having =
each processor calculate Q for every trial password and only proceeding =
if the Q is its assigned value. Each processor will only have to fill =
memory once. If Hash is resource consuming, e.g. PBKDF2-4096 as you =
propose, then the processors will have to expend m-times the work of the =
defender to screen trial passwords. The attacker can get around this by =
screening trial passwords centrally and only passing the ones with a =
given Q to the processor handling that Q, but this entails a lot more =
interprocessor communication. As m gets larger, each processor will have =
to handle more than one Q, though they can do so one at a time. If m is =
big enough, the attacker is forced to do a lot of memory exercise. So =
there is a tradeoff here between password information inserted in step 1 =
and resulting difficulty for the brute-force attacker. </div><div =
style=3D"margin: 0px;"><br></div><div style=3D"margin: 0px;">3. I think =
it is helpful to distinguish two use cases for KDFs: protecting login =
credentials and protecting cryptographic keys. In the case of a =
cryptographic key, the attacker who can glean KDF memory information =
most likely has easier access to some ciphertext generated with the key. =
Examples include whole disk encryption of a seized laptop or attacking =
WPA2 wireless encryption. The ciphertext itself is usually sufficient =
information to mount an off-line attack. So the gain from getting hold =
of the KDF memory, particularly after the algorithm is complete or half =
complete, is low. On the other hand, login credentials can be =
further protected by limiting the number or rate of failed attempts. So =
any leaked information that allows an off-line attack is a big =
improvement for an attacker. For login credentials, it may be worth =
throttling the amount of information an attacker get from a memory =
compromise as described above, even if it means less KDF gain. Of course =
with common weak passwords, even 20 bits of leakage can be bad, but I =
see no reason to put the full entropy of a strong login password =
into step 1. For cryptographic keys, any reduction in KDF gain may not =
be not worth it.</div><div style=3D"margin: 0px;"><br></div><div =
style=3D"margin: 0px;">4. The SIMD-hash needs SIMD CPU instructions for =
full throughput, but these are available on modern x86 and ARM =
architectures. In the case of protecting cryptographic keys, there is =
less need for interoperability. If I am protecting my laptop's hard =
drive, I may as well use all the silicon in my CPU and not worry if the =
same algorithm will work efficiently on another architecture. We needn't =
design KDFs to the lowest common denominator.</div><div style=3D"margin: =
0px;"><br></div><div style=3D"margin: 0px;">Arnold Reinhold</div><div =
style=3D"margin: 0px; min-height: =
14px;"><br></div></div><div><br></div><div><br><span =
class=3D"Apple-tab-span" style=3D"white-space:pre"> =
</span></div><div><br></div><div><br></div></body></html>=
--Apple-Mail=_077D5E54-E01C-429A-8263-6C8A295F5935--
--===============2276496252726741399==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
_______________________________________________
The cryptography mailing list
cryptography@metzdowd.com
http://www.metzdowd.com/mailman/listinfo/cryptography
--===============2276496252726741399==--