2012/12/22

Thinking about Alphabets

Since I've written a bunch about basically throwing away 30 years of work done by engineers significantly smarter than me, it occurs to me that you should really question everything when devising a new computer environment, beyond just making a new character set that eliminates the more pointless glyphs of low-order Unicode.  One aspect of that might be going as far as redefining the alphabet itself.

After all, when complaining about things being outdated and obsoleted by new technology and ideas, the 2,700-year-old glyph system (albeit with glyphs added and removed over time) that forms the base of all language in the western world is a good candidate for reconsideration.  A lot of characters in the set are redundant - C and K, Y and I (and E).  In that sense, I am a fan of the International Phonetic Alphabet, a glyph system representing the pronounceable vocabulary of the human vocal tract.  It includes the pulmonic (lung-based) sounds, it has extensions and sub-groups for clicks and other non-pulmonic sounds, and in the end it represents the goal of written language - to transcode spoken language.  We can read a large spectrum of glyphs, and if we wanted we could encode them in a wide color gamut, but our audible range is much more limited - the IPA has 107 characters, but in practice only around 30 of them are significant, and if you got technical enough to create an alphabet with the discrete independent elements of spoken language, you could probably manage around that number.
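As a rough back-of-the-envelope sketch of what that would mean for a character set (assuming the ~30-sound figure above, which is a loose estimate, not a measured count):

```python
import math

# How many bits per character would a minimal phonetic alphabet need,
# compared to one covering the full IPA base set?
significant_sounds = 30  # the rough "in practice" count claimed above
bits_minimal = math.ceil(math.log2(significant_sounds))
print(bits_minimal)      # 5 bits cover up to 32 distinct glyphs

full_ipa = 107           # IPA base character count
bits_full = math.ceil(math.log2(full_ipa))
print(bits_full)         # 7 bits for the full IPA
```

Either way the whole alphabet fits in a single byte with room to spare, which is part of the appeal for anyone designing a character set from scratch.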

But this isn't a simple problem with a simple solution - the reason students don't hear about the IPA is that, like many things with the word "international" in them, the glyph system it uses is a hodge-podge mix of a dozen languages and dialects, since no single one of them uses the full range of human enunciation.  The result is that a lot of characters in the IPA are absurd multicharacter strings like ʥ, tᶣ, and ŋ̋.  Even though the modern Latin-derived English alphabet leaves much to be desired, its glyphic complexity is pretty much limited to a worst case of m or j.  So one objective of such a universal, enunciation-based alphabet, besides covering the proper human audible range without redundant characters, is to have the simplest possible set of glyphs.
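That multicharacter absurdity is visible at the Unicode level too: some IPA symbols are a single codepoint, while others are a base letter glued to modifier or combining marks.  A small sketch using Python's standard unicodedata module (the two symbols chosen are examples from the list above):

```python
import unicodedata

# ʥ (dz digraph with curl) is one precomposed codepoint.
dz_curl = "\u02a5"
# ŋ̋ is actually two codepoints: eng + a combining double acute accent.
eng_double_acute = "\u014b\u030b"

print(len(dz_curl))           # 1
print(len(eng_double_acute))  # 2

# The accent is a combining character - it has no standalone glyph,
# which unicodedata reports as a nonzero combining class.
print(unicodedata.combining("\u030b") != 0)  # True
```

So a glyph that looks like one letter to a reader can be several characters to the computer, which is exactly the kind of complexity a fresh alphabet could design away.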

A good example of this is I.  A vertical bar is a capital i.  A vertical bar with a half foot is a capital L.  T is also pretty simple, as are N, Z, and V.  These all have 3 or fewer strokes in their structure, and few subtle interruptions in their form.  In the same way humans read words by recognizing the entire glyph structure rather than the individual letters, having the least complex, most distinctive glyphs represent the alphabet makes it the easiest to learn, recognize, and write.

The amount of work required to actually scientifically derive the appropriate subset of the IPA covering all distinct audible tones of human speech, combined with the most minimalist and simple glyphic representation of that tone set, is something beyond the scope of my brain.  But in many ways it is an inevitable evolution for mankind to eventually optimize our speech and writing, in the same way most of the world is currently switching over to English as a common language.  Hopefully once we solve the thousand-different-languages problem, we can evolve up to a much more logical form of communication in both written and verbal form.  It makes software engineers cry less.
