[79049] in cryptography@c2.net mail archive


Re: Entropy of other languages

daemon@ATHENA.MIT.EDU (Sandy Harris)
Wed Feb 7 13:40:08 2007

Date: Wed, 7 Feb 2007 05:42:49 -0800
From: "Sandy Harris" <sandyinchina@gmail.com>
To: cryptography@metzdowd.com
In-Reply-To: <45C67061.7050405@sound-by-design.com>

Allen <netsecurity@sound-by-design.com> wrote:

> An idle question. English has a relatively low entropy as a
> language. Don't recall the exact figure, but if you look at words
> that start with "q" it is very low indeed.
>
> What about other languages? Does anyone know the relative entropy
> of other alphabetic languages? What about the entropy of
> ideographic languages? Pictographic? Hieroglyphic?

The most general answer is in a very old paper of Mandelbrot's.
Sorry, I don't recall the exact reference or have it to hand.

He starts from information theory and an assumption that
there needs to be some constant upper bound on the
receiver's per-symbol processing time. From there, with
nothing else, he arrives at a proof that the optimal frequency
distribution of symbols always belongs to a
parameterized family of curves.
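In modern notation that family is usually written as the Zipf-Mandelbrot
distribution (my framing, since I don't have the paper to hand either):
the symbol of rank r gets probability proportional to 1 / (r + q)^s,
with a shift q >= 0 and an exponent s > 0. A minimal Python sketch:

```python
def zipf_mandelbrot(n, s=1.0, q=0.0):
    """Probabilities for ranks 1..n under the Zipf-Mandelbrot law.

    Each rank r is weighted by 1 / (r + q)**s, then the weights are
    normalized so they sum to 1. The parameters s and q select one
    member of the family of curves.
    """
    weights = [1.0 / (r + q) ** s for r in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]
```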

Pick the right parameters and Mandelbrot's equation
simplifies to Zipf's Law, the well-known rule about
word, letter or sound frequencies in linguistics.
I'm not sure whether you can also get Pareto's Law,
which covers income & wealth distributions in economics.
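To tie this back to the entropy question: with q = 0 and s = 1 the
weights reduce to plain Zipf's Law, f(r) proportional to 1/r, and you
can compute the per-symbol Shannon entropy of such a distribution
directly. A sketch (the 26-symbol alphabet is just an illustration,
not a claim about English letter frequencies):

```python
import math

def zipf_probs(n, s=1.0, q=0.0):
    # q = 0, s = 1 gives plain Zipf's Law: probability proportional to 1/rank.
    weights = [1.0 / (r + q) ** s for r in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def entropy_bits(probs):
    # Shannon entropy H = -sum p * log2(p), in bits per symbol.
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A uniform 26-symbol alphabet carries log2(26) ~= 4.70 bits per symbol;
# the same alphabet with Zipf-distributed frequencies carries less.
uniform = [1.0 / 26] * 26
zipf = zipf_probs(26)
print(entropy_bits(uniform))  # ~= 4.70
print(entropy_bits(zipf))     # lower, roughly 3.9
```

The gap between the two numbers is one concrete sense in which a
Zipf-distributed symbol stream has "relatively low entropy".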

-- 
Sandy Harris
Quanzhou, Fujian, China

---------------------------------------------------------------------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to majordomo@metzdowd.com
