[1167] in cryptography@c2.net mail archive

home help back first fref pref prev next nref lref last post

Chip Technology (Re: Better DES challenge update)

daemon@ATHENA.MIT.EDU (Bill Stewart)
Fri Jul 4 12:49:16 1997

Date: Thu, 03 Jul 1997 18:28:16 -0700
To: Paul Bradley <paul@fatmans.demon.co.uk>
From: Bill Stewart <stewarts@ix.netcom.com>
Cc: cryptography@c2.net
In-Reply-To: <Pine.LNX.3.91.970702170033.474A-100000@fatmans.demon.co.uk
 >

>> BTW: Given that such a FPGA machine exists, does anyone have a good
>> idea what one could do with it besides cracking DES?
>I have to admit ignorance in terms of the uses of FPGAs in other 
>cryptanalytic tasks: Would an FPGA be significantly faster than general 
>purpose hardware for a factoring attack? How about finding discrete logs?

<H1>Chip Technology Stuff</H1>
I'm not a chip expert, but I know enough to be mostly buzzword-compliant,
and here in Silicon Valley there are enough chip benders that somebody
thinks it's worth putting up billboards that just say "FPGA2ASIC"
and a company name and URL :-)

Gate Arrays are chips with regular grids of logic gates and the glue 
around them to connect to neighboring gates.  It's not the densest
way to build chips, and you wouldn't use it for RAM, but it lets you
do development far more quickly than totally hand-rolled designs, 
and use a lot more automated tools.  Field-Programmable Gate Arrays (FPGAs)
are gate arrays with the logic stored in RAM, so you can just download
a design to the chip and run it.  They're not as dense or fast or cheap as
other types of gate arrays, since programmable glue and gates are
more complex than fixed-logic, but they're extremely flexible, 
and a popular way to do design is to start with a computer simulation, 
then do an FPGA, then go to an FPGA-to-ASIC company to fab it into faster, 
cheaper production chips.

Application-Specific Integrated Circuits (ASICs) can really be just about
anything you want for the widget you're making now, but a common approach
to ASIC design is to use gate arrays and libraries of pre-designed 
components, like UARTs and registers and DSP cores and small general-
purpose processor cores (like 8086s) and busses and multipliers and frobs,
and use a layout program to arrange them efficiently on the chip,
and maybe custom-design a few special pieces if necessary,
all of which are just values for the gate array gates and glue
or memory with well-defined interfaces to the gate array parts.

General-purpose CPUs are often just gate array chips that have
all the memory and bus interfaces, registers, arithmetic units, and glue,
in quantities that are balanced for normal workloads,
and they're often important enough to hand-roll more of the design
rather than letting the layout programs and libraries do the job
(somewhat like writing your own assembly code for a few critical parts of
operating system programs and using GCC to do the rest of the work.)

<H1>Crypto Using Chips</H1>
FPGAs and ASICs are very good at bit-twiddling (while general-purpose CPUs 
are designed to go fast on whole words or bytes), so both are a big win on
algorithms like DES that were designed to run on bit-twiddling hardware.
Most modern symmetric-key algorithms are designed to twiddle bytes or words 
because that's what general-purpose CPUs do well, so there's less payoff.
If you know the data is always going to come in here, go out there, 
and get many small steps of the same kind done to it every time, 
you can build pipelined data-flow processors that always do the same thing, 
but do it fast, but it helps if the crunching you do on the data
is much bigger than the amount of data you need to move in and out,
especially because moving data between chips is generally very slow.
You can sometimes win, especially for algorithms that parallelize well,
since you don't need to waste much of your resources on interrupt-handling,
floating point, six different I/O busses, division logic you won't use, etc.
Some of the Berkeley folks did a study and predicted that RC4-40 probably
won't gain much cost-effectiveness over fast general-purpose CPUs;
check the usual archives.
   
For RSA, Diffie-Hellman, and similar crypto, a major part of the job is 
multiplying and adding bignums, and it's possible to design chips with
bignum registers, adders, and multipliers, that do those better than
conventional processors that only handle 32-bit or 64-bit numbers.  
For instance, Rainbow (www.rnbo.com, the folks who bought Mykotronix) 
makes an RSA accelerator board that gets about 10-20x more performance
than general-purpose processors.  The big win comes from the bignum
acceleration; you don't want to put more of the algorithm logic
into the chip than you have to, since that will change a lot
as you discover what things really go fast and what don't,
and since modmults take a lot more crunching than moving results.
Deciding what to do with your multiplies and modmults is a job
for a general-purpose CPU.

<Henry-Spencer-Mode> For short-lived applications, it's usually faster 
to write a good program for the main CPU than spend your time
hacking chips which will need to interface, usually clumsily, with the CPU,
and for long-lived applications, it's usually cheaper to write a good 
program for the CPU and concentrate on making the CPU faster and faster,
rather than designing a bunch of separate chips for separate functions,
which will not be as well-designed as if you'd spent all the money 
designing the CPU very well, and you'll have to be constantly improving
all the separate chips and remaining compatible with the old interfaces
rather than focusing your energy on making the CPU faster, especially 
if the interfaces involve interrupt-handling. </Henry-Spencer-Mode>  
(Henry's a major fan of doing graphics in frame buffers using the CPU, 
though these days when everybody uses clumsy Intel CPUs and has to be 
compatible with clumsy PC interfaces and appalling excuses for OSs
there's still a major market for graphics boards that give board-makers
a way to make money providing high-performance products,
independently of whatever Intel and Motorola do with the CPUs.
But even at that, the more your board can look like memory, and 
less like something that needs lots of interrupt handholding, the faster.
He also had the last major PDP-11/45 Usenet system, which consistently
outperformed far more expensive newer machines for many years...)



#			Thanks;  Bill
# Bill Stewart, +1-415-442-2215 stewarts@ix.netcom.com
# You can get PGP outside the US at ftp.ox.ac.uk/pub/crypto/pgp
#   (If this is a mailing list or news, please Cc: me on replies.  Thanks.)


home help back first fref pref prev next nref lref last post