[2320] in cryptography@c2.net mail archive
Re: fastest blowfish.asm?
daemon@ATHENA.MIT.EDU (Eric Young)
Mon Mar 23 15:23:58 1998
Date: Mon, 23 Mar 1998 10:05:31 +1000 (EST)
From: Eric Young <eay@cryptsoft.com>
Reply-To: Eric Young <eay@cryptsoft.com>
To: Adam Back <aba@dcs.ex.ac.uk>
cc: cryptography@c2.net
In-Reply-To: <199803210122.BAA00257@server.eternity.org>
On Sat, 21 Mar 1998, Adam Back wrote:
> I have looked at two assembler implementations [1], [2] so far, and MS
> VC++ 5 and GCC beat both of them!
>
> Neither of the sources do anything about instruction scheduling to
> keep both execution units busy on the pentium.
>
> I should note that I am not using an Intel CPU, but an AMD k6 233
> clone, which may or may not influence effectiveness of scheduling
> tricks. Anyone know/done experiments on this?
>
> Adam
>
> [1] Eric Young's blowfish from SSLeay-0.8.1 also available separately
> as libbf-0.7.2m.tar.gz
I've changed things since then. I now have near-optimal code for the pentium
and some different code that is faster on the ppro/pentium II
(libbf-0.8.2b.tar.gz).
I got hold of a version of VTune and use it quite a bit now :-).
The fact that you are using an AMD k6 233 is a rather key point.
From the readme in the current distribution:
There are blowfish assembler generation scripts.
The bf-586.pl version is for the pentium and
bf-686.pl is my original version, which is faster on the pentium pro.
When using bf-586.pl, the pentium pro/II is 8% slower than when using
bf-686.pl. When using bf-686.pl, the pentium is 16% slower
than when using bf-586.pl.
So the default is bf-586.pl
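
One way to read the choice of default is as picking the script with the
smallest worst-case penalty across the two CPU families. The readme does not
say this is the reasoning, so treat the sketch below as an assumption; the
table layout and the name pick_default are illustrative and not part of libbf.
Only the 8% and 16% figures come from the readme.

```python
# Slowdown of each script relative to the best script for that CPU.
# The 0.08 and 0.16 figures are the ones quoted in the readme; the
# 0.00 entries simply mark the script that is the baseline on that CPU.
SLOWDOWN = {
    "bf-586.pl": {"pentium": 0.00, "ppro/pII": 0.08},
    "bf-686.pl": {"pentium": 0.16, "ppro/pII": 0.00},
}


def pick_default(slowdown):
    """Return the script whose worst-case slowdown across CPUs is smallest.

    This minimax rule is an assumed rationale, not something the readme states.
    """
    return min(slowdown, key=lambda script: max(slowdown[script].values()))


print(pick_default(SLOWDOWN))  # prints "bf-586.pl"
```

Under that rule bf-586.pl wins because its worst case (8% on the pentium
pro/II) is smaller than the worst case for bf-686.pl (16% on the pentium).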
Depending on the CPU, the difference in scheduling is rather critical.
The main reason I have blowfish assembler is for the gcc-based unix boxes,
where the code generated is rather crappy.
Visual C 5 is a nice compiler :-)
eric