Kerberos slave data base
jon@ATHENA.MIT.EDU
Sun Aug 9 21:15:46 1987
From miller%erlang.DEC@decwrl.DEC.COM Fri Jul 11 12:28:41 1986
Date: 11-Jul-1986 1059
From: miller%erlang.DEC@decwrl.DEC.COM (Steve Miller)
To: kerberos@athena.mit.edu (Distribution list @SYSOGIN:KERB)
Subject: Kerberos slave data base
Purpose:
--------
This note presents plans for slave copies of the Kerberos database.
Recall that multiple replicas of the Kerberos authentication database
and server may run concurrently in order to provide greater availability
and a degree of load sharing. In order to avoid the difficult problems
of distributed databases, only one copy, the master copy, may be directly
updated. All the other replicas, called slaves, are entirely dependent
on the master copy, and are updated only from it. For Kerberos-type
applications, with their relatively low update frequency and modest
timeliness requirements, it is sufficient to update the slave copies
on the order of once or twice a day.
Master Copy:
-------------
The master copy of the Kerberos authentication database is layered on
top of the RTI Ingres relational database. Alternative technologies
with similar characteristics -- indexed access by multiple keys,
concurrency control, good performance independent of size, ability to
support many ad-hoc queries -- could be easily substituted if RTI Ingres
is not available. A small library
of embedded query language (equel) database access routines provides the
standard interface between the Kerberos database and various Kerberos
utilities. A few new utilities,
such as one to change the master key of the stored database, may need
either additional library routines or specific equel programming.
The master copy is read/write, and once sufficient slave copies are
operational, it should be used only for database-related activities,
not routine Kerberos requests for tickets.
Slave Copies:
-------------
Slave copies of the database are designed to be read-only, and do not
use the RTI Ingres database, but rather only the standard Unix file
system. They are intended to support routine Kerberos ticket requests
as efficiently as possible. Slave copies contain only a subset of the
fields for each record, but should contain all records from the master
database.
A slave copy is produced periodically at the master site as follows:
1) A routine steps through the entire master copy, where for
each record it:
a) computes a hash function of the name and instance, and
b) writes a Unix file entry consisting of a subset of
the fields of the master record, plus the computed hash.
(Note that step (b) requires a Unix file, since RTI
Ingres does not support nested operations.)
2) The Unix file containing all the slave records is sorted according
to the hash values.
3) Two new files are produced from the sorted file, an index file,
and a data file. (Details below). The data file contains the hash
sorted entries in order, without the hash values. The index
file contains an array of offset values, each value representing
the byte offset in the data file where all records with that
hash value begin.
4) Steps (2) and (3) could be combined, if desired.
The slave files are then copied to the slave Kerberos servers. This
copy should be checked for integrity by using Kerberos protocols to
compute a checksum of the file at the master site, and securely transmitting
the checksum to the slaves. The slave sites then recompute the checksum
of the received file, and compare it to the checksum securely received
from the Kerberos protocol. If they match, the file integrity is ok,
and the slave should reinitialize its database, as atomically as
possible, using the new file.
The slave server uses the slave file as follows:
1) On startup, it opens both the index and data files, checks their
sizes, allocates memory for both, and reads both files entirely
into memory. In other words, we are using the virtual memory
system as a cache. This should work well as long as the
size of the files does not greatly exceed the available physical
memory. Most or all lookups can be done entirely in memory.
Let the system do the dirty work, but keep an eye on the
paging/swapping counts. A rough calculation is
10,000 records x 50 bytes/record = 0.5 MB, which should be fine
even on a base system with only 2 MB total memory. But keep
monitoring, just in case the caching strategy needs to be changed.
2) For each request, compute the hash value of the name and instance
using the same hash function as before. Then determine the search
limits of offset[hashvalue] to offset[hashvalue+1] by looking up
the index file values. Within this range of offsets, search
the data file, comparing the actual name and instance to the
request. Either a match will be found, or the upper limit of the
search will be reached. The hash value is chosen to produce a mean
of only 10-15 entries per bucket, which should all fit in a
single page.
3) The slave database functions noted in (1) and (2) are packaged as a
Kerberos database library supporting a subset of the semantics of the
master library. Either the routine names could be identical, with two
differently named libraries, and "ld" specifying the appropriate
one, or slightly different versions of the routine names could be
used.
Details:
--------
hash function -
For a population of ~10,000-20,000 records, use ~1000 buckets, such
that most buckets hold 10-20 records and fit within one page.
(I have some good functions, cheap to compute, that I tested on the
entire athena_reg database of >10,000 entries; they produce a good
mean and standard deviation of bucket counts. I have to find the code
to give you the particulars.)
index file --
If N is the number of buckets in the hash function, numbered 0 to N-1,
then the file need only be structured as:
unsigned long offset[N+1]; /* array of offsets */
where
offset[i] = byte offset from beginning of data file to beginning of
records with hash value 'i'
and
offset[N] = size of data file, in bytes, used to terminate searches
on the last bucket.
data file --
The data file is a sequence of records, sorted by the computed hash.
Each record contains only those fields needed by routine Kerberos
ticket request operations. Any other needs should go to the master.
Name and instance are compressed so they only take up as much space
as needed, not the entire space (currently 40 char) allowed in the
master database. The fields are as follows, in order beginning with
the lowest address. By convention, all database multi-byte fields are
defined in vax byte order.
Field:            Type:
name              ascii null-terminated string
instance          ascii null-terminated string
key_version       unsigned char
kdc_key_version   unsigned char
max_life          unsigned char
key_low           unsigned long
key_high          unsigned long
exp_date          unsigned long
attributes        unsigned ??   /* check definition */
(Note: when searching, read the name and instance, compare, then if
needed skip a fixed distance to the next name/instance. Compare the
new address to the search limit.)
Both the data file and the index file could be prefixed with a timestamp
and address of creation, as long as these were taken into account in
computing the offsets. This may help in managing slave file versions.