[2297] in cryptography@c2.net mail archive

home help back first fref pref prev next nref lref last post

A Cryptanalysis of CipherSaber-1

daemon@ATHENA.MIT.EDU (Arnold G. Reinhold)
Thu Mar 19 12:34:28 1998

Date: Thu, 19 Mar 1998 11:19:55 -0500
To: cryptography@c2.net
From: "Arnold G. Reinhold" <reinhold@world.std.com>

-----BEGIN PGP SIGNED MESSAGE-----

[I am posting a "beta" draft this paper to the cryptography list in
the hope of getting some comments before giving it wider distribution
- -- agr]

CipherSaber-1 is a way of using Ron Rivest{s RC4 cipher as a practical
symmetric key cryptosystem. It also is intended to demonstrate that
strong cryptography can be very simple to implement, and therefore
impossible to suppress without massive infringement on civil
liberties. A CipherSaber-1 program can be written in 16 lines of
QBasic. CipherSaber is explained more fully at
http://ciphersaber.gurus.com.

In CipherSaber-1, the ASCII representation of the user key or
passphrase is used directly as part of the RC4 key. To prevent the
same RC4 key from being used twice, a ten byte initialization vector
(IV) is appended to the user key to form the complete RC4 key. The IV
is stored as the first ten bytes of CipherSaber-1 encoded files.

This paper presents an analysis of several potential weaknesses in
CipherSaber-1 and recommendations for avoiding them in practice. These
recommendations turn out not to be at all onerous.

This paper uses the same notation that Bruce Schneier employed in his
description of RC4 in the second edition of Applied Cryptography,
except that the stream of cipher bytes that RC4 produces are denoted
as C(i).

1. Key Size

In most cryptosystems, the longer the key, the better. However, it was
pointed out by several individuals who looked at CipherSaber-1 when it
was first announced that a very long user key could prevent the IV
from affecting more than a few of the S(i) values. This, in turn,
could cause leakage of plaintext. To insure adequate mixing, we
advised users  to restrict their user key length to 54 characters or
less, as most would anyway. This limit insures that the IV will be
mixed in at least four times during the setup process. Here is a more
detailed justification of the 54 character limit.

We want to estimate the likelihood that initial cipher bytes will be
unaffected by the IV. Assume a user key length of 54 bytes, the
maximum length we recommend. Then all the S(i) after the first 54 will
be affected by the IV during key setup. S(0) through S(53) will only
be affected by the IV if they are swapped in key setup rounds 55
through 256. That means there 202 opportunities. The probability that
a given S(i) will remain unswapped is

     (255/256)**202  = 0.45.

So the expected number of the first 54 S(i) that are unaffected is
54*0.45
= 24.5. Thus the probability that S(i), where i is selected at random
from [0,256], has not been affected by the IV is

    24.5/256 = 0.096

We can calculate the values that the first few S(i) are likely be have
after the key setup is complete if we assume that none of the first
few swaps hit an S(i) that has already been swapped previously. Recall
that at the start of key setup, S(i) = i, for all i.
Then:

     S(0) = K(0)
     S(1) = K(0)+K(1)+1
     S(2) = K(0)+K(1)+K(2)+3
     ...

Call the successive cipher stream bytes C(i). If the first few S(i)
are not affected during the production of the first few cipher bytes,
then:

     C(0) = S(S(1) + S(S(1)))
     C(1) = S(S(2) + S(S(1)+S(2)))
     C(2) = S(S(3) + S(S(1)+S(2)+S(3)))
     ...
     C(i) = S(S(i+1) + S(x)), where x = S(1)+S(2)+...+S(i+1) mod 256

Each  S(i) for i between  0 and 53 have a 0.45 chance of being
unaffected by the IV. Therefore x has a 0.45**(i+1) chance of being
unaffected.

The values S(x) and S(S(i+1)+S(x)) each have a .096 chance of being
unaffected. Therefore C(i) has a chance p(i) of being unaffected by
the IV, where
p(i) =0 .096*0.096*0.45**(i+1) = 0.0092*0.45**(i+1)

     i	1/p(i)

     0	242
     1	539
     2	1198
     3	2662


1.1 C(0)  Analysis

The most vulnerable Cipher byte would appear to be C(0). However
assume
further that the first two characters of the user key are 7-bit ASCII
and do
not contain control characters. Thus they are in the range [20, 7E]
hex,
which is [32, 126] decimal.

Recall that  S(1) = K(0)+K(1)+1 and therefore is in the range
[65-253].

Also remember that C(0) = S(S(1) + S(S(1)))

We see that the S(S(1)) term is guaranteed to select an S value that
was
affected by the IV. So if the first two characters of the user key are
7-bit ASCII, C(0) will always be afffeced by the IV.


1.2 C(1) Analysis

Based on the above analysis, C(1) now appears to be the most
vulnerable cipher byte.

Remember that  C(1) = S(S(2) + S(S(1)+S(2))) and S(1) = K(0)+K(1)+1

If K(0) and K(1) are in the range [27, 100] decimal, then S(1) must be
in the range [55, 201] decimal. This means that S(2) and S(1)+S(2)
cannot both be in the range [0, 53] and C(1) is guaranteed to be
affected by the IV. The condition that K(0) and K(1) are in the range
[27, 100] decimal, which is [1B, 64] hex, is satisfied if the first
two characters in the key are digits or uppercase characters.  So if
the first two characters of the user key are digits or uppercase
characters, C(1) will always be afffeced by the IV. Given that the
chance of C(1) being unaffected by the IV is 1/539 which is less than
the likelihood of any random 8-bit value (1/256), we have not
recommended that users restrict the first two characters of their key
to digits or uppercase.


1.3 Effect  of Varying Key Size

For a 41 byte key (the size that insures 5 mixings), the probability
p(1) that  C(1) is unaffected by the IV is computed as follows. The
probability that one of the first 41 S(i) being unaffected by the IV
is

     (255/256)**(256-41)=.43

Probability of A random S(i) being unaffected is 0.43*41/256 = 0.069

Probability p(1) of C(1) being unaffected is (0.43*0.069)**2 = 1/1134

This is much less than the 1/256 likelihood of a random 8-bit value.


For a 74 byte key (the size that insures 3 mixings), the probability
p(1) that  C(1) is unaffected by the IV is computed as follows.

The probability that one of the first 74 S(i) being unaffected by the
IV is:

     (255/256)**(256-74) = 0.49

Probability of a random S(i) being unaffected is 0.49*74/256 = .141

Probability of C(1) being unaffected is (0.49*0.141)**2 = 1/209

This is greater than the 1/256 likelihood of a random 8-bit value,
justifying our recommendation not to use a key this big.


2. Weak Keys

On September 22, 1995, Andrew Roos posted a paper to
sci.crypt.research that  presented a class of weak keys in RC4.  The
conditions for this class are

     K(0) + K(1) = 0 mod 256.

These weak keys can leak information about the first byte of the key
or cipher stream.

Note that if the K(0) and K(1) are 7-bit ASCII characters at least one
of which is not NUL (00), the above condition is never satisfied. Thus
the use of printable 7-bit ASCII keys avoids this class of weak keys.

Several people have asked why the IV isn't prepended to the user key.
One reason is that doing so could permit weak keys to occur.


3. IV Size and Randomness

RC4 is a stream cipher. That means RC4 produces a stream of bits that
are xor'ed with the bits in the message. In a stream cipher it is
essential that the same key never be used twice. Here's why:

If Sam, the skilled snoop, ever encounters two different messages
encrypted with the same RC4 key, then Sam can xor their cipher text to
eliminate the RC4 cipher bits because xor repeated twice is a null
operation. Sam will then have a stream of bits that represent the xor
of the plaintext of the two messages. If Sam knows the plaintext of
one of the messages, he can easily recover the other. Worse, if the
messages are in a natural language such as English, he can usually
recover both messages -- even if he knew neither.  While revelation of
encoded messages is bad news for any cipher, in neither case is Sam
able to recover the user key.

CipherSaber addresses the requirement that the same key never be used
twice by appending a ten byte quantity, the initialization vector
(IV), to the users key. If the same IV is never used twice then the
same RC4 key can never occur twice.

The IV values can be any unique 8-bit quantities. For example, one
could use the year, month, day, hour minute second and hundredth of a
second. As long as no two messages were sent at exactly the same time,
you would be safe.

Another way to get a high likelihood of uniqueness is to use random
values for the IV bytes.  This also makes encrypted file look like
totally random byte strings, which may be useful in some cases.

Since there are 80 bits in the IV,  if  n messages have been encoded
with the same user key,  the probability q(n) that  two messages will
have the same IV is

     q(n) = n(n-1)/(2**80)

Even if one generated a billion messages, the probability that two
messages have the same IV is about one in a million. There is, however
a big "but" as we see in the next section.

3.1 Use of Built-in Random Number Generators to Create the IV.

The "but" is that we are assuming all the IV bytes are random.
Creating random values on a computer can be tricky, In particular, the
built-in random number generators supplied in most programming
environments are notoriously bad.

Most C programming environments offer a 32 bit random number
generator. If such a generator is used to create an 80-bit IV there is
still at most only 32 bits of randomness. The probability q(n) that
two messages will have the same IV is then

     q(n) = n(n-1)/(2**32)

If you want q(n) to be less than one in a thousand, you can only send
about 2000 messages with the same user key. To achieve even this level
of safety, you need to initialize the random number generator with a
seed that has 32 bits of randomness.


3.2 Problems with QBasic as a Source of IV Randomness

A more extreme example is Microsoft's QBasic 4.5. This Basic
interpreter is shipped ont the CD-ROM that comes with every copy of
Windows '95, making it undoubtedly the most widely distributed
programming environment in the world. QBasic is very well suited for
implementing RC4. It even has a SWAP statement. However, its random
number generator, RND, has only 24 bits of state. This means if you
want q(n) to be less than one in a hundred, you can only send about
400 messages with the same user key.

Even worse, the standard way to initialize QBasic's random number
generator is to put the statement RANDOMIZE TIMER at the beginning of
the program. TIMER is a built in function that returns a floating
point value containing the number of seconds since midnight. TIMER's
resolution appears to be 1/100 of a second. although I am told the
operating system only supports resolution of 1/55 of a second.
Assuming an eight hour variability in time of encryption, TIMER only
supplies about 20 bits of randomness. This means if you want q(n) to
be less than one in a hundred, you can only send about  140 messages
with the same user key.


3.3 Better Ways to Create the IV in QBasic

Here are some methods for generating the CipherSaber  IV in QBasic

The first is to xor the RND values with the characters returned by the
DATE$ function:

     randomize timer
     for i = 1 to 10
          iv$(i)=chr$((asc(mid$(date$,i,1)+rnd) mod 256))
     next  i

This method sharply reduces the likelihood that two messages will have
the same IV. However, while the encrypted file will look random to a
casual observer, an expert could detect  that the IV was not really
random and might even be able to extract the date of the message. In
most cases this is not a concern since the date could be determined by
the message header or file directory block.

Here is a method that comes close to creating a truly random IV. It
captures key stroke times while the user key (k$) is entered.

     get: c$=inkey$
     if c$ = "" then goto get
     if c$ = CR$ then go to done
     i=i+1
     j=(i mod 10) + 1
     k$(i)=c$
     randomize timer
     iv$(j)=chr$((rnd + asc(iv$(j)) mod 256)
     goto get

Note that the effect of randomize in QBasic appears to be cumulative.
Thus

     randomize 100
     randomize 100
     print rnd

produces different output from

     randomize 100
     print rnd

Even with DOS's low timing resolution, one can expect an 8 tick
variation per character entered, producing 3 bits of entropy. With a
15 character user key, this would produce about 45 bits of entropy.
The 18 to 20 bits of entropy represented by the first randomize timer
call would bring the IV close to 65 bits of entropy. Given that users
tend to enter random keys slowly, 4 bit of entropy per character is
more likely, bringing us close to a full 80 bits of randomness.

One disadvantage of the key stroke timing method is that you must
force the user to enter their key afresh for every message.

4. CipherSaber-2

To solve the problems caused by the single round of key setup in RC4,
I was originally going to include a CipherSaber-2 in my proposal.
CipherSaber-2 has a parameter N that specifies the number of times the
RC4 key setup loop is repeated. For N=1, CipherSaber-2 would just be
the same as CipherSaber-1. The value of N could be agreed upon by both
users and kept secret along with the key. Indeed, N can be considered
part of the key.

CipherSaber-2 has several advantages  over CipherSaber-1:

1. For n >= 2 it solves the mixing problem. User keys can be up to 246
bytes without any problems.

2. It would coax users to have longer keys. A big weakness in
cypto-for-the-masses is that people just will not pick long enough
keys. Having a second "bucket" helps from a human factors perspective
(sort of like area code and 7-digit phone number vs. a straight
10-digit phone number)

3. Large values of N makes exhaustive key search much slower. A
thousand-fold increase in the time needed to test a key is equivalent
to adding 10 bits of entropy to the key. A CipherSaber-2 written in C
and running on a Pentium could probably stand a five or six digit N
without an unacceptable delay.

I did not include CipherSaber-2 in the initial announcement because I
wanted to keep the proposal simple and because I wanted to use a
cipher that was straight out of a textbook. Also it is foolhardy to
introduce a new cipher without a time for other people to review it.


5. Recommendations

To use CipherSabe securely, we recommend that users:

o Use a high entropy key or passphrase. Using low entropy keys is the
most common mistake users make  A low entropy key will render any
cryptosystem vulnerable. We recommend using a key consisting of 15 or
more random characters, or 5 more  short dictionary words. See
http://www.hayom.com/diceware.html for a simple yet rigorous way to
make high entropy passphrases.

o Keep keys 54 characters or shorter (most users would do this anyway)

o Use 7-bit printable characters, at least for the first few
characters  (most users would do this anyway as well)

o Use a strong method for generating the random IV, particularly if
you plan to encode more than a few dozen messages with the same user
key. See Section 3.3, above.

o For highest security, use CipherSaber-2 along with a strong source
of random numbers.


- --------

I want to thank Hal Finney for his valuable comments on a much earlier
draft. Additional comments on this paper in general, and on
CipherSaber-2 in particular, are welcome.


Arnold G. Reinhold
reinhold@world.std.com
March 19, 1998  Rev c




-----BEGIN PGP SIGNATURE-----
Version: PGP for Personal Privacy 5.0
Charset: noconv

iQCVAwUBNRFEa2truC2sMYShAQHrrgP+M/lrkPPKmprDHf7Ipkx5+weDwzdThpcY
0waDUKgYRPKbZgEiqoPBba418eyg2szUMPcFdHIeM/v7Cq7h39Jme1YNltSf5RZX
WnUnmhGNzJMOi+TgCOtIusi65N3k3gEO1V9QqVRhtZKmpUCBYqYNjs6t9kxu8as7
g0ZuAAVEWG8=
=j48r
-----END PGP SIGNATURE-----



home help back first fref pref prev next nref lref last post