[2228] in cryptography@c2.net mail archive
RE: DES, MMX, and FPGAs
daemon@ATHENA.MIT.EDU (Trei, Peter)
Mon Mar 2 16:24:27 1998
From: "Trei, Peter" <ptrei@securitydynamics.com>
To: cryptography@c2.net
Date: Mon, 2 Mar 1998 16:15:58 -0500
Thanks; it's good to hear from someone with a real-world knowledge of
the issues involved.
> -----Original Message-----
> From: Andreas Bogk [SMTP:andreas@telekom.artcom.de]
> Sent: Monday, March 02, 1998 2:57 PM
> To: cryptography@c2.net
> Subject: Re: DES, MMX, and FPGAs
>
> On Thu, Feb 26, 1998 at 04:10:26PM -0500, Trei, Peter wrote:
> > Some highly uninformed blather about FPGAs follows:
> >
> > Wiener's paper uses custom VLSI chips, and claims a need for
> > 26k "gates" or 78257 "sites". I'm not sure how this translates to
> > silicon requirements in FPGAs - it seems that "gate" and "site" are
> > nominal terms which are not directly equivalent to actual, physical
> > gates. I've found one reference to a general DES FPGA implementation
> > which requires 25k gates, though it's not clear if this reuses the
> > round circuitry, or 'unrolls' it (I suspect the latter, which would
> > be good).
>
> This is most likely not an unrolled implementation. The numbers are
> a little different for FPGAs. Usual FPGAs (some manufacturers prefer
> to call them CPLDs, the theory is the same) consist of interconnected
> blocks of programmable logic, called macro cells or logic elements.
> These are lookup tables with 4 bit of input and 1 bit of output.
> Additionally there are some flip-flops, and input-output pins.
[Trei, Peter]
Wiener's paper definitely uses an unrolled design, and he claims
78257 'sites', or an equivalent of 26086 'gates'. This is on a
CMOS
chip, and you point is well taken that this does not directly
translate to
the gate count for FPGAs. The 25k figure I gave is referred to
in passing
in a Scientific American article on FPGAs (June 97) [Trei,
Peter] :
>
With a configurable chip, the software can
calculate the
subkey values once, and the data-processing
circuitry
can be optimized for those specific subkeys.
This
approach allows the subkey-scheduling hardware
to be
completely removed from the system. These
savings
have allowed us to implement the DES algorithm
in a
13,000-gate FPGA, instead of the 25,000-gate
circuit
previously required. When the encryption key
must be
changed, the user can quickly specify a new
circuit,
customized to the new key, and download it to
the
FPGA.
[Trei, Peter]
It's not clear from this whether the loop is unrolled or not,
but it's
difficult to see how they could get this reduction in gate count
if it were not.
[Trei, Peter]
> Now a DES encryption consists of expansion, permutation, compression
> and S-box lookup. All but the latter are free in hardware, they are
> just some interconnections. The S-box consists of 8 boxes with 6 bits
> of input and 4 bits of output each. With our 4-to-1 tables we can
> build a S-box by generating each output bit separate, and using 2 of
> the input bits to control a 4-to-1 multiplexer, which gets it's input
> from 4 separate logic elements, who get the remaining four bits as
> input, and are wired for each possible combination of the 2 bits.
[Trei, Peter]
There may be better ways; a lot of work has been done on s-box
logic for Eli Biham's 'bitslice' DES algorithm, which may enable
us to reduce the number of gates required quite substantially.
> I have written a perl script which generates the VHDL code to do the
> above from a S-box description. It basically works, it's just larger
> than expected. In theory, it should be possible to do the above in 7
> LEs (logic elements), because the multiplexer should be implementable
> in 3 LEs. Unfortunately my logic synthesizer thinks otherwise.
>
> But if we stay at the theory, we'll need 8 * 4 * 7 LEs per stage of
> DES,
> or 3584 for a full-blown unroll. You'll get that much LEs in what
> manufacturers would call 70,000 gate FPGAs. Of course you'll need some
> control logic around all that, so 100,000 gates would be a good bet.
[Trei, Peter]
Wiener's key scheduling logic is very overcomplicated, and can
be greatly
simplified. You really only need a 56 bit latch at each round (I
think I have the
terminology correct)
We can win even more with reconfigurable logic is by 'compiling
in' parts
of the key which change only slowly, and reloading the chip
whenever they
do change. This is similar to the approach in the above cited
paper.
Also, we can skip the last DES round, and use a general purpose
machine to test the rare (1 in 2^32) candidate keys that this
returns.
> The FPGA I target with my implementation is the Altera FLEX10k100,
> which
> has 4992 LEs and 5392 flip-flops (you'll need at least 120 per stage
> for
> registers for data and key). That should fit.
[Trei, Peter]
Wiener claims 2164 flip flops.
> Now for the speed: the maximum clock rate depends on the logic you
> implement, and is bound by the longest connection a signal has to
> travel
> within one clock interval. I have no exact data yet, but 50 MHz should
> be
> achievable, even more with the new generation of Flex10k, but I don't
> have
> the software for this yet.
>
> Andreas
>
> --
> Top ten resons why SGI sucks, No. 2:
> "Assertion failed in file "../../c++runtime/throw.cxx", line 841
> Abort (core dumped)" -- IRIX 6.2 run time library