[2320] in cryptography@c2.net mail archive


Re: fastest blowfish.asm?

daemon@ATHENA.MIT.EDU (Eric Young)
Mon Mar 23 15:23:58 1998

Date: Mon, 23 Mar 1998 10:05:31 +1000 (EST)
From: Eric Young <eay@cryptsoft.com>
Reply-To: Eric Young <eay@cryptsoft.com>
To: Adam Back <aba@dcs.ex.ac.uk>
cc: cryptography@c2.net
In-Reply-To: <199803210122.BAA00257@server.eternity.org>


On Sat, 21 Mar 1998, Adam Back wrote:
> I have looked at two assembler implementations [1], [2] so far, and MS
> VC++ 5 and GCC beat both of them!
> 
> Neither of the sources do anything about instruction scheduling to
> keep both execution units busy on the pentium.
> 
> I should note that I am not using an Intel CPU, but an AMD k6 233
> clone, which may or may not influence effectiveness of scheduling
> tricks.  Anyone know/done experiments on this?
> 
> Adam
> 
> [1] Eric Young's blowfish from SSLeay-0.8.1 also available separately
>     as libbf-0.7.2m.tar.gz

I've changed things since then.  I now have near-optimal code for the
pentium, and some different code that is faster on the ppro/pentium II
(libbf-0.8.2b.tar.gz).  I got hold of a version of VTune and use it quite
a bit now :-).

The fact that you are using an AMD k6 233 is a rather key point.

From the readme in the current distribution:
	There are blowfish assembler generation scripts.
	The bf-586.pl version is for the pentium, and
	bf-686.pl is my original version, which is faster on the pentium pro.
	When using bf-586.pl, the pentium pro/II is 8% slower than when using
	bf-686.pl.  When using bf-686.pl, the pentium is 16% slower
	than when using bf-586.pl.
	So the default is bf-586.pl.

Depending on the CPU, the difference in scheduling is rather critical.
The main reason I have blowfish assembler is for the gcc-based unix boxes,
where the code generated is rather crappy.

Visual C 5 is a nice compiler :-)

eric

