[2224] in cryptography@c2.net mail archive
Re: DES, MMX, and FPGAs
daemon@ATHENA.MIT.EDU (Andreas Bogk)
Mon Mar 2 15:10:13 1998
Date: Mon, 2 Mar 1998 20:56:42 +0100
From: Andreas Bogk <andreas@telekom.artcom.de>
To: cryptography@c2.net
In-Reply-To: <6B5344C210C7D011835C0000F8012766010035E1@exna01.securitydynamics.com>; from Trei, Peter on Thu, Feb 26, 1998 at 04:10:26PM -0500
On Thu, Feb 26, 1998 at 04:10:26PM -0500, Trei, Peter wrote:
> Some highly uninformed blather about FPGAs follows:
>
> Wiener's paper uses custom VLSI chips, and claims a need for
> 26k "gates" or 78257 "sites". I'm not sure how this translates to
> silicon requirements in FPGAs - it seems that "gate" and "site" are
> nominal terms which are not directly equivalent to actual, physical
> gates. I've found one reference to a general DES FPGA implementation
> which requires 25k gates, though it's not clear if this reuses the
> round circuitry, or 'unrolls' it (I suspect the latter, which would
> be good).
This is most likely not an unrolled implementation. The numbers are
a little different for FPGAs. Usual FPGAs (some manufacturers prefer
to call them CPLDs, the theory is the same) consist of interconnected
blocks of programmable logic, called macro cells or logic elements.
These are lookup tables with 4 bit of input and 1 bit of output.
Additionally there are some flip-flops, and input-output pins.
Now a DES encryption consists of expansion, permutation, compression
and S-box lookup. All but the latter are free in hardware, they are
just some interconnections. The S-box consists of 8 boxes with 6 bits
of input and 4 bits of output each. With our 4-to-1 tables we can
build a S-box by generating each output bit separate, and using 2 of
the input bits to control a 4-to-1 multiplexer, which gets it's input
from 4 separate logic elements, who get the remaining four bits as
input, and are wired for each possible combination of the 2 bits.
I have written a perl script which generates the VHDL code to do the
above from a S-box description. It basically works, it's just larger
than expected. In theory, it should be possible to do the above in 7
LEs (logic elements), because the multiplexer should be implementable
in 3 LEs. Unfortunately my logic synthesizer thinks otherwise.
But if we stay at the theory, we'll need 8 * 4 * 7 LEs per stage of DES,
or 3584 for a full-blown unroll. You'll get that much LEs in what
manufacturers would call 70,000 gate FPGAs. Of course you'll need some
control logic around all that, so 100,000 gates would be a good bet.
The FPGA I target with my implementation is the Altera FLEX10k100, which
has 4992 LEs and 5392 flip-flops (you'll need at least 120 per stage for
registers for data and key). That should fit.
Now for the speed: the maximum clock rate depends on the logic you
implement, and is bound by the longest connection a signal has to travel
within one clock interval. I have no exact data yet, but 50 MHz should be
achievable, even more with the new generation of Flex10k, but I don't have
the software for this yet.
Andreas
--
Top ten resons why SGI sucks, No. 2:
"Assertion failed in file "../../c++runtime/throw.cxx", line 841
Abort (core dumped)" -- IRIX 6.2 run time library