[1330] in cryptography@c2.net mail archive


Re: text formatting ..... and coderpunks archive

daemon@ATHENA.MIT.EDU (Antonomasia)
Wed Aug 13 21:56:24 1997

Date: Wed, 13 Aug 1997 21:27:06 +0100
From: Antonomasia <ant@notatla.demon.co.uk>
To: coderpunks@toad.com
Cc: cryptography@c2.net


Ulf,

> >The proof-reading should be as easy as possible without a good copy
> >to compare against!

> I don't think proof-reading without a copy is possible if the result
> is supposed to be identical to the original -- especially for
> low-redundancy text such as C code.

I certainly don't expect all proof-reading to be done, or even attempted,
w/o a good copy.  The best efforts of the publisher to make text that almost
checks itself will not bring everything within reach of the paperless checker.
But it will make a significant difference in the kind of work we were doing
at HIP, and I still think it's worth aiming for.


> >It appears that OCR software applies rules on the likely content of natural
> >language text.  Some familiar errors are:
> >
> >  "*bn" -> "*ten"

> This seems to indicate that the software was poorly configured or
> inadequate for this purpose. But even if shrink-wrapped OCR software
> does not support monospaced fonts and context-independent character
> recognition, it should be possible to reduce the error rate by running
> several different OCR programs on the files -- they are unlikely to
> yield the same errors. The correct lines, correct/probable checksums
> and perhaps even probable characters could be selected automatically.

I'd be interested to see whether context-independent character
recognition made things better or worse.  I know nothing about how likely
different OCR programs, or re-runs of the same one, are to make the same
errors.  Would these be based on the same scan from paper to bitmap?
Maybe somebody has looked at this?
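
A minimal sketch of the line-selection idea, in Python.  It assumes each
OCR run yields the same number of lines in the same order, which real
output may not; the name vote_lines is made up for illustration and is
not part of any existing tool.

```python
from collections import Counter


def vote_lines(readings):
    """Merge several OCR readings of one page by per-line majority vote.

    readings: list of readings, each a list of text lines.
    Returns (merged, doubtful): the most common version of each line,
    and the 0-based indices of lines with no strict majority.
    """
    merged, doubtful = [], []
    for index, candidates in enumerate(zip(*readings)):
        best, count = Counter(candidates).most_common(1)[0]
        merged.append(best)
        # Fewer than half-plus-one readings agree: leave it to a human.
        if count <= len(candidates) // 2:
            doubtful.append(index)
    return merged, doubtful
```

Whether this helps depends on the runs making independent errors.  If
every program makes the same mistake on the same bitmap, the vote simply
confirms it.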

> Proof-reading could be supported by an emacs mode or something similar
> that tests the checksum for syntactical correctness, drives the
> unmunge program and automatically jumps to incorrect lines.

This looks like a good suggestion.
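
A rough sketch of the checking step such a mode could drive, in Python.
The line format here is an assumption made for illustration: eight hex
digits of CRC-32 over the text, a space, then the text.  It is not the
format the real munge/unmunge programs use, and the function names are
hypothetical.

```python
import re
import zlib

# Assumed layout: 8 lowercase hex digits, one space, then the line text.
PREFIX = re.compile(r"^([0-9a-f]{8}) (.*)$")


def line_checksum(text):
    """CRC-32 of the line text as 8 lowercase hex digits."""
    return format(zlib.crc32(text.encode("utf-8")) & 0xFFFFFFFF, "08x")


def munge(lines):
    """Prefix each line with its checksum (the publisher's side)."""
    return [f"{line_checksum(text)} {text}" for text in lines]


def find_bad_lines(munged):
    """Return (line_number, reason) for each line that fails a check.

    Line numbers are 1-based so an editor can jump straight to them.
    """
    bad = []
    for number, line in enumerate(munged, start=1):
        match = PREFIX.match(line)
        if match is None:
            # The checksum field itself was misread, e.g. 'O' for '0'.
            bad.append((number, "malformed checksum"))
        elif line_checksum(match.group(2)) != match.group(1):
            bad.append((number, "checksum mismatch"))
    return bad
```

An editor mode would call something like find_bad_lines after each
correction and move the cursor to the first line number returned.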



BTW, there was sufficient interest in a coderpunks archive
(one vote from David Wagner) that I have started the archive
collection.  Expect the search tool in a few days.


--
##############################################################
# Antonomasia   ant@notatla.demon.co.uk                      #
# See http://www.notatla.demon.co.uk/                        #
#### !!! PGP 5.0 beta available now at ftp.replay.com !!! ####
