It occurs to me I have never written a reference for my stance on the copyright debacle of the 21st century, so I'll talk about my historical views of copyright, and where we go from here.
A long time ago, in a land far, far away (at least an ocean away), the idea of copyright was primarily motivated by preventing someone from taking your mathematical formulae or written works and claiming them as their own. It was wholly to protect an inventor of new things from having their "ideas" stolen.
In the US (this focuses on the US, because this is sadly the land of copyright enforcement worldwide) those policies meant that authors could publish their books with recourse if someone started making bootleg copies and trying to either sell or freely distribute them. The original copyright term lasted 14 years, renewable once for another 14, which meant an author had an effective monopoly on the distribution and usage of their ideas until that period expired, after which the work entered the public domain and anyone could use it, without even citing a source.
In practice, this just meant that once something entered the public domain, you could scarcely profit off it anymore: the price of a public domain book was limited only by the cost of actually printing and distributing it, since market economics dictated that, with the work itself now free to use, anyone could print it.
This also applied to works of art, where bootleg replicas would violate the copyright of a painter; if you were to trace or duplicate something, you could be taken to court by the original creator.
Disney came around in the 20th century and, on the nascent currents of a budding film industry, started creating animated films. Steamboat Willie, made in 1928, is the historical point of reference for copyright terms today. Since then, Disney has lobbied for extensions of copyright (I am still unsure who the creator is supposed to be in the lifetime + 70 years terminology for a studio film) to keep that film out of the public domain, so that it can still claim ownership of Mickey Mouse.
Now, my opinion is that this is highly toxic to culture and society. By perpetually preventing the creative media of long-dead authors from entering the public domain, it prevents modern artists from openly deriving works and perpetuating culture in a legally unambiguous way. Today, artists and authors create derivative works from still-copyrighted material, even material that has absolutely become a part of culture (Bugs Bunny, Star Wars, James Bond). Even works as recent as the Lord of the Rings films or Harry Potter are, I would argue, absolutely essential culture in a significant portion of western society and media.
Instead of the legal clarity of deriving art and creative endeavor from public domain works, modern artists live in a society where a huge portion of their inspiration is still under the copyright of some corporate entity, even when the authors are long dead, and will remain so until some fundamental societal change deems the perpetual copyright of everything unacceptable. Content creators are perpetually at the mercy of corporate entities that "own" the ideas behind almost everything in modern creative media, and that is tremendously harmful to society.
Likewise, very little culture from the last century is freely available. It is all under copyright, owned by some business that intends to sell and profit off limited distribution and legal monopoly for all time. This leads me to the critical point here, and why this is only becoming a really significant issue in the age of the internet.
Before everyone in the world became connected over electrical pulses across wires in the last 20 years, the act of distributing bootleg copies of creative works was itself a costly act. The physical video tapes, or photo paper, necessary to reproduce something were cost prohibitive enough that people rarely gave away copies of the material they possessed. On those grounds, it becomes very obvious that someone selling copies is making money where the original creator should have been - under the original pretenses of copyright. Likewise, it was never frowned upon (and in many ways this is the folly of the industries backing draconian copyright law) for average joes to make copies of the media they possessed to share with friends and family, at least for a time between the 70s and 90s. Children would share cassette tapes, and parents would replicate a VHS tape to give to a neighbor, or loan it. They would share the media.
This also brings up another important distinction - since the olden times of copyright law, we have shifted the channels through which we impart and distribute the things we place under copyright. It is a violation of intellectual property to take a car produced by Ford and rebuild it from scratch using the original car as a template; that violates Ford's patents on the designs of the vehicle. Patents, unlike copyright, exist as a way to say "I came up with this original invention rather than a work of art, and this is how I made it - nobody else can make this thing for some number of years, because I did it first." However, the line between what was easily understood as patentable - mechanical parts, schemata, building plans - and what is copyrighted - works of art, writing, "ideas" that aren't "inventions" - has blurred, though patents at least are specific enough to require a thorough specification of the patented good to be submitted to the patent office, as opposed to copyright, which is assumed automatically.
In the same realm as why software patents are horribly wrong, a lot of the ideas behind patents are socially destructive - if someone comes up with a medical breakthrough, rather than having market forces drive the cost of their creation down to just the money it takes to make the product, they get an exclusive right to authorize its creation. This is one of the drivers of why the pharmaceutical industry is as large as it is - the cost of making a pill is scant, but you need to recoup millions invested in creating the cure.
Patents, however, are still another whole pie of wrong and disaster to be dealt with later. Back to copyright.
Entering the 21st century, we are now able to distribute the things relegated to copyright - art, music, and IP - for free. We can recreate them, for free. We have intentionally been driven to represent them as information rather than any more physical form, because information is cheap in the computer age. The revolution of the transistor has allowed us to convey knowledge for very little cost. The next revolution will be to convey the physical world at a similar lack of expense. But for now, we have the ability (and take advantage of it) to replicate the numbers we store and transfer through these machines to whoever wants them, at an extremely negligible cost of electricity on our part. We do this freely now. Where the bootlegger in 1990 couldn't fathom shipping copies of Star Wars 6 to everyone in the US, he can stick it in a torrent and let anyone who wants it download it. It might take forever if nobody else participates in the sharing, but it never financially impedes his ability to spread the knowledge he possesses.
And this requires a thorough definition. We think of a VHS tape as a physical object holding moving pictures, the way a film reel holds frames that are magnified and projected in rapid succession to create the appearance of motion. Physical pictures are pigments embedded in some form of tree pulp or some other medium. Music is interesting in that, beyond grooves pressed into a record, we have mostly conveyed audio by way of electrical impulses or some other informational form. Speakers themselves just reproduce waveforms as reverberations to create audible sound. Sound itself is just distortions of air hitting ear drums. The only way to represent that is as a mathematical formula.
However, today, we don't store our videos on VHS tapes. Hell, our TVs never displayed VHS as magnified film - a VHS player encoded the images into a signal sent over coaxial cable, or S-video, or some other electrical medium as a pattern of electrical pulses. That was already information. It is why we can rip the VHS tapes we had. It is how broadcast television works.
The media we use today skip the intermediary physical form to massively increase available space and minimize costs - DVDs and Blu-ray discs are just optical discs storing numbers, and hard drives just magnetized platters storing numbers. The formats we use to "encode" a pixel map (and successive pixel maps to create the appearance of moving pictures) - H.264, or Daala, or VP8 - are all just mathematical formulae (that *&%# MPEG-LA can patent because of software patents). We take numbers, put them through a formula, and then use the resulting number as a pattern to electrically stimulate crystals in a display to generate certain colors in a grid. That produces a picture. We display pictures in rapid succession to create the appearance of motion.
That video is a number - the audio, also a number. Words are also numbers - Unicode characters are just up to 4 bytes denoting a glyph representing some symbol from some language or other utility case. The machine takes the number and displays the corresponding glyph it knows. Still numbers. Still information, still knowledge.
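To make the "words are numbers" point concrete, here is a minimal C++ sketch (the example string is arbitrary): it prints the bytes of a word as a single hexadecimal numeral, which is literally the number a machine stores, copies, and transmits.

```cpp
// A minimal sketch: the bytes of any piece of text (or any file) can be read
// back as one big number. The string here is arbitrary; swap in file bytes
// and the same idea holds for video or audio.
#include <cstdio>
#include <string>

int main() {
    std::string text = "knowledge";  // arbitrary example text
    // Print the UTF-8 bytes as one hexadecimal numeral - literally "the number"
    // that this word is when stored in a machine.
    std::printf("0x");
    for (unsigned char byte : text)
        std::printf("%02x", byte);
    std::printf("\n");  // prints 0x6b6e6f776c65646765
    return 0;
}
```

Run it and you get 0x6b6e6f776c65646765 - pass that number around and you have passed the word around.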
Because in the end, knowing a number is possessing knowledge - to know Pi is 3.1415926535 is to have information. To know the formulae to derive Pi is also knowledge.
Knowledge is cheap. It is easy to convey - we have passed knowledge through ages where physical possessions were much more scarce. History by word of mouth existed long before writing on physical possessions. It is easier to convey now more than ever - with the computer revolution, the distribution of knowledge anywhere on the Earth becomes exceedingly cheap compared to even half a century ago. A satellite can send electromagnetic radiation (radio waves) in a targeted direction to convey information. Interpret the wavelength or period of that waveform as a bit pattern, consider it in base 2, and you have any number. Any number can recreate any of the pictorial, videographic, or audible material ever produced, if such material has been realized as such a number. Scanners, microphones, and cameras all work to capture physical information (though all visual media is just capturing electromagnetic radiation in the visible spectrum through the reflection of light off a pigmented surface), and all of it can be interpreted as such a number.
As a consequence, all we see, all we hear, all we sense has to be numbers, because we interpret them with brains that experience through electrical signals. Just like computers. The mediums through which we experience our environments are analogous to the mediums computers operate in, and thus our "world" is easily digitized.
Today, information is easy to convey. It is so inexpensive to reproduce a number digitally with a computer that it is effectively free. To convey the number, we have laid wiring to send numbers almost anywhere in the world. These wires are cheap, and the power necessary to send a signal over them is negligible. We can effectively freely convey information.
So we possess numbers. We have duplicates of some source number, be it Pi, or a Beatles song, the Iliad, a picture of your grandparents, or Star Trek II: The Wrath of Khan. These numbers are easy to replicate, and easy to distribute. Culture and experience are defined by our senses and how we process the world around us - through the same medium we can send information for free. This collective knowledge, and the culture and information contained therein, is physically able to be shared without cost and without hardship.
We don't do that, though, because we have laws originally meant to prevent bootleggers from undercutting an author selling their book. In my philosophy, even the bootlegger was fine to me - if you possess something, you should be able to do whatever you want with the knowledge you can derive from it, including recreating it, and distributing said copy.
The creation of knowledge is not something to be funded on a profit motive. The cost associated is in creating the knowledge - in forging it - not in distributing it, or replicating the result. A thousand years ago, trying to distribute a book was by far the dominant cost, and it was logical to charge for that - to recoup the costs of creating the content by making money on units. Today, the latter two are completely free, and the former is as expensive as ever, if not more so.
Charge for the expense. If you want to create knowledge, ask people to pay you to create it. Don't abuse archaic law that artificially restricts the propagation of knowledge and information indefinitely as a means to create income. If you make something people want, people will pay you to make it. If those who possess the result never choose to distribute it, that is fine - it is a conscious choice. If someone does decide to freely release the knowledge, you should have no right to demand that it not be given by others.
I am firmly against the ideas behind copyright and patent. I don't believe that true inventors and visionaries care about possessing an indefinite monopoly on the distribution of their creations. They create out of passion, and if they produce things of value, would easily find those who value their work to pay them to create more. Rather than having wealthy investors who are creating knowledge to profit from, knowledge should be funded by those who crave more knowledge and want to see it created.
We are at an extreme end of the spectrum - knowledge is never free, unless the creator goes out of their way to make it so. If they don't actively make it free, it will remain forever restricted, and punishable by law to speak the numbers that reproduce this knowledge to the mind. Hopefully, we come back from the extreme. Maybe one day we will see the error of our ways as a species and realize knowledge is not something to profit from, but to freely share - yet it is something we need to value and put our resources towards seeing made.
As an addendum, I want to argue against the counterpoint to direct funding of knowledge creation - that people won't spend their money to see new content made if they don't have to pay to experience it. The reality is that people will spend the resources they possess, even when they don't need to. Very few people are rational actors that conserve resources - if, suddenly, all the locked-up culture from the last century and the foreseeable future became freely available, people would probably spend the excess funds on physical possessions at first. Then they would realize that, without monetary support, the creators of the knowledge they appreciate evaporate, and they would literally put their money where their mouth is - something they like, like Harry Potter, they would actively put money into to see it happen. I absolutely think the contract negotiations in funding the creation of this content require something beyond the take-the-money-and-run guarantees of something like Kickstarter, but that is a negotiation of funding. It isn't some legal wall against information.
Another issue is attribution - this I do believe in. If you use something created by someone else, I would much prefer that it require acknowledgement of the source. I do think that sufficiently ambient culture becomes pervasive enough that you can't deceive anyone about the original author, but in the limited case of few actors, you want to keep someone who created a painting from having someone else take it and claim it as their own. So while I am firmly against laws restricting the distribution of knowledge, I do think attribution is still important, and should be maintained for the life of the creator, including in the case of a derivative work - though I don't want to see a judicial system bogged down with everyone claiming every other work is derivative, so I would rather limit it to the direct usage of someone else's ideas, provable through direct quotation or reference, requiring attribution. It goes back to the original purpose of copyright - preventing someone else from claiming your work as their own - not preventing the spread of knowledge.
2013/02/09
2013/01/13
Software Rants 9: Sensible Filesystem Layout
I run Arch as my main OS now, and it is pleasing how /lib and /lib64 symlink to /usr/lib. So in this post I'm going to express my ideas behind filesystem layout and a way to make one that makes sense.
First, the aspects of a good VFS:
- The ability to "search" on target directories to find things - libraries, executables, pictures - almost any search-able thing should have some simple directory you can search on and find the thing you are after.
- A layout that facilitates not searching at all, because search is expensive. You want the ability to say, with certainty, that the foo library is in /something/dir/libs/foo. Even if foo isn't on this machine, there should be one path to foo that produces the library you want, in any environment (Windows completely fails on this, and the /usr/local and /usr/share nonsense on Linux does too - good thing almost nobody uses those anymore).
- Directory names that make sense. System32, etc, SysWOW64, and usr all completely screw this up. So does Android: it calls the internal memory sdcard, and any SD card you actually plug into the device is external - and that is just from the non-root view of the filesystem.
- The ability to mount anywhere, and the ability to mount anything, be it actual storage media, a web server, a socket layer, process views, etc.
- The filesystem should naturally be a tree, with special files for links (symlinks should just be "links", because if you can mount anything, you can "link" to a website under /net, like /http/google.com/search?q=Google, or link to the root directory, or to a process id). Links create graphs, but because graphs aren't inherent in the filesystem and are provided by specialized files, you can navigate a filesystem as a tree without risking loops if you ignore links.
- Forced file extensions, or static typing by some metric. Don't leave ambiguity in what a file does: make extensionless files illegal and require definitions of what files do what - e.g. if you have a binary data package, use the extension .bin rather than nothing, because .bin would be registered as application-specific binary data. If you have a utf-8 encoded text file, use .txt - the file would have a forced metadata header containing the encoding. Once you have an extension, you can embed file-specific metadata that can be easily understood and utilized. Without extensions, you read the file from the beginning hoping to find something useful, or rely on metadata in the filesystem, which is often not transferable, especially over network connections.
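As a rough illustration of the forced-metadata-header idea, here is a minimal C++ sketch. The header convention (`#!meta encoding=utf-8` on the first line of a .txt) is entirely hypothetical - just one way a statically typed file could self-describe.

```cpp
// Hypothetical convention: a .txt file starts with "#!meta encoding=utf-8".
// This sketch reads that first line and reports the declared encoding.
#include <cstddef>
#include <fstream>
#include <iostream>
#include <string>

int main(int argc, char** argv) {
    if (argc < 2) { std::cerr << "usage: meta <file.txt>\n"; return 1; }
    std::ifstream file(argv[1]);
    std::string header;
    std::getline(file, header);

    const std::string prefix = "#!meta ";
    if (header.compare(0, prefix.size(), prefix) != 0) {
        std::cerr << "no metadata header - would be rejected in this scheme\n";
        return 1;
    }
    // Everything after the prefix is a key=value pair, e.g. "encoding=utf-8".
    std::string pair = header.substr(prefix.size());
    std::size_t eq = pair.find('=');
    if (eq != std::string::npos && pair.substr(0, eq) == "encoding")
        std::cout << "declared encoding: " << pair.substr(eq + 1) << "\n";
    return 0;
}
```

With those properties in mind, the tree below sketches the layout I have in mind.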
- Root
    - Boot
    - Resources
        - http
            - google.com
                - search?q=bacon
        - ftp
        - smb
        - ssh
        - temp
        - AltimitSocket
            - IPC Here
            - TCP
                - 80
                    - http
    - Users
        - Zanny
            - Programs
            - Videos
            - Pictures
            - Music
            - Documents.dir
            - Config.dir
            - libs
        - root
        - guest (or default)
    - Groups
        - default
        - printers
        - display
        - audio
        - mount
        - admin
        - network
        - devices
        - virtual
    - System
        - power
        - hypervisor
        - firmware
        - proc
        - dev
            - usb
            - mice
            - keyboards
            - touchscreens
            - displays
            - disks
                - by-label
                    - Storage.disk
                - by-uuid
            - printers
            - microphones
            - speakers
            - memory
            - mainboard
            - processors
            - accelerators
Resources is the collection of transport protocols the machine supports, and subaddressing these directories accesses their resources (by address or DNS resolution). This would include any locally mounted media not forming some other component of the filesystem.
The implication is that all resources are treated similarly. The remote ones are mounted by protocol, and local disks can (if you want) be mounted by fs type or the driver used to control them, the same way ftp, http, etc are all served by different daemons.
The socket layer is also in resources, and can provide external socket based network access or local IPC with message passing. Different daemons will process writes or reads from their own directories provided here. Resources is thus very dynamic, because it represents accessing everything provided by device controllers and daemons.
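To make the everything-is-a-path idea concrete, here is a minimal C++ sketch assuming this hypothetical VFS, where fetching a web page is just reading a file under /Resources/http. The path layout is an assumption of this design, not anything a current OS provides.

```cpp
// Hypothetical: with protocols mounted under /Resources, an HTTP GET is just a file read.
#include <fstream>
#include <iostream>
#include <string>

int main() {
    // The http daemon mounted at /Resources/http would resolve the host and
    // fetch the page when this path is opened (an assumption of this VFS design).
    std::ifstream page("/Resources/http/google.com/search?q=bacon");
    if (!page) {
        std::cerr << "no such VFS on this system - this is a design sketch\n";
        return 1;
    }
    std::string line;
    while (std::getline(page, line))
        std::cout << line << "\n";
    return 0;
}
```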
Users are the discretization of anything on a per-user basis. This includes programs, libraries, configurations, etc. Root isn't a special user, it is just a member of every group, and owns the top level directories. Each user has a personal Configuration directory, to hold application configuration.
Groups are an abstraction, like many things in this VFS - they can either be policy controls or file systems to be merged into the homes of the users that are members of them. For example, all users are members of default, and new user creation would inherit all application defaults from the default group. Until a user overrides a default configuration, you could just have a symlink to the default configuration, avoiding some redundancy, and any user who inherits a configuration from default could also have systemwide configuration changes applied to match. You could even create a default user to force usage of applications as default: if you ran something as default, you would always run it in its default configuration, and if a user doesn't have execute privileges on something, they might have to run it as default. That sounds very nice in a corporate or locked down setting. I parenthesize guest with default because in general you want some "base" user everyone else inherits from; if you have a public user, that might be the guest account, or it might be a dedicated default account. Applications installed to this user would thus be accessible to everyone, and users with execute privileges in that user could then have their own configurations and states for those applications stored locally in one place.
Likewise, libraries could be treated the same way. The default or guest user could have their ~/libs as the fallback of all other library searches, through any other user and any groups they are members of (which act as skeleton users). If you don't have a dedicated guest or default user, you could have the default group be its own filesystem containing a libs folder to search, as could any other group. The idea here is that user and group policy gives you a cascading search pattern from the perspective of a user - first the user itself, then the groups it is a member of, in some profile-defined precedence. This has the nice capacity to run applications, like in Linux, in user sandboxes. If the user has no group policy outside itself, and has all the applications it needs local to itself along with any libraries, you could in effect jail and sandbox that session so it can't affect other users. You could even give it no access privileges to other top level directories to prevent it from having any outside interaction.
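A minimal sketch of that cascading lookup, assuming the per-user and per-group libs directories described above (the paths, names, and precedence order are illustrative):

```cpp
// Resolve a library by walking user, then group, then default libs directories.
// The directory names follow the hypothetical VFS above; nothing here exists on a stock system.
#include <filesystem>
#include <iostream>
#include <optional>
#include <string>
#include <vector>

namespace fs = std::filesystem;

std::optional<fs::path> find_lib(const std::string& name,
                                 const std::string& user,
                                 const std::vector<std::string>& groups) {
    std::vector<fs::path> search_order;
    search_order.push_back(fs::path("/Users") / user / "libs");  // the user itself first
    for (const auto& g : groups)                                 // then its groups, in precedence order
        search_order.push_back(fs::path("/Groups") / g / "libs");
    search_order.push_back("/Users/default/libs");               // the base "default" user last

    for (const auto& dir : search_order) {
        fs::path candidate = dir / name;
        if (fs::exists(candidate))
            return candidate;
    }
    return std::nullopt;  // not found anywhere in the cascade
}

int main() {
    auto hit = find_lib("libfoo.so", "Zanny", {"audio", "display"});
    std::cout << (hit ? hit->string() : std::string("libfoo.so not found")) << "\n";
    return 0;
}
```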
This also has the nice effect of providing easy mandatory access control - you can have default MAC in the default group, per-user execution control through the groups a user is a member of, and elevated access control in the root account. I would definitely expect any next-gen OS to have MAC and security at the deepest levels, including the filesystem - that is why this VFS has per-user and per-application views of the environment.
Devices are the hardware components of the system, made accessible in their hardware form to the daemons that control them. Daemons can individually publicize their control directories to other processes, so they can either hide or show themselves. They can make virtual writable or read-only files for block IO directly with them - the idea is that "displays" and "accelerators" would provide the resources for a display manager to provide a graphical environment, by showing a display accelerator (GPU) and any screens it can broadcast to, even networked ones provided by link, which might be found under /Network/miracast/... as well.
System is another abstraction, provided by the kernel or any daemons listening on it. You can expect hardware devices to be shown here, including hardware disks to mount, or network controllers to inherit. Since Resources is an abstraction, the hardware controllers for these devices use the System devices folder to access them and their memory. In practice, a normal user shouldn't need a view of System at all, since application communication should be done over sockets, so /proc exists only for the purpose of terminating other processes. An application can have a view of /proc that shows itself, its children, and anything that makes itself visible. You shouldn't need signals, since with the socket layer you can just write a signal message to an application. The difference is that rather than having dedicated longjmps to handle signals, an application needs some means of processing the messages it receives. I think that is better practice than having an arbitrary collection of signals that may or may not be implemented in program logic, with background kernel magic jumping the program around to execute them.
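As a sketch of the signals-as-messages idea (the endpoint path under the socket layer and the message format are assumptions of this design, not an existing API):

```cpp
// Instead of kill(pid, SIGTERM), write a "terminate" message to the process's
// socket file under the hypothetical VFS; the receiver handles it in its normal
// message loop rather than in an asynchronous signal handler.
#include <fstream>
#include <iostream>

int main() {
    // Hypothetical per-process message endpoint exposed by the socket layer.
    std::ofstream inbox("/Resources/AltimitSocket/ipc/1234/inbox");
    if (!inbox) {
        std::cerr << "no such endpoint - this VFS is a design sketch\n";
        return 1;
    }
    inbox << "signal terminate\n";  // the receiving program decides how and when to act on this
    return 0;
}
```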
I think this is much better than what we have in any OS, even Plan 9. Even if you don't consider a generic user, from a sysadmin standpoint, discretizing the difference between users, groups, and resources is a useful abstraction. I'd almost consider moving System itself into Resources, since it is just the kernel providing utilities itself. You might want to allow applications to generate their own /Resources/ directories, maybe under a user subdirectory, to allow even more generic sharing of and access to other processes' goods.
2013/01/01
2013
One, January 1st 12:00 AM EST DST is dumb. My New Years is now the Winter Solstice at 0:00 GMT.
I have three major objectives this year:
- Get a job in my industry. After 6 months I haven't found anything, but I will press on. I'm really starting to lean towards freelancing because I really don't want to be locked into a 40 hour a week job. That is so much of my time that I would rather spend working on my own projects and learning.
- Get active in some project. Probably KDE, but I want it somewhere in the complete package Linux stack. I'm not going to try to reinvent the wheel by forking what other people have done and letting it atrophy out in obscurity, I'm going to engage with persistent projects to make them better. It is the only way to get the desktop metaphor mature.
- Tear down my mother's old house. She pays it off this year, and the main motivator for me not going 12 hours a day job hunting is that I don't want to move out just to have to come back in a few weeks or months to help move everything out and tear the thing down. But it has to be done, this is the year to do it, I'll see it through to fruition.
I graduated. I think college is overpriced garbage for credentials that become meaningless when everyone and their mother with money can just toss them out lackadaisically. It is an outdated education model in an era when we can engage with knowledge at an infinite level through instantaneous, ubiquitous, unlimited communication over networks. Here is my takeaway from my major, on a per-semester basis:
- Freshman 1: Writing functions, very basic data types, the Monty Hall problem.
- Freshman 2: Object orientation, static types, basic containers, big O.
- Sophomore 1: Stack overflows, basic assembly, more containers, testing.
- Sophomore 2: Design patterns, C, Swing, threading in Java.
- Junior 1: Basic kernel concepts, how shells and pipes work.
- Junior 2: Data visualization algorithms, openMP, source control.
I really feel that a lecture environment stifles creativity and specialization to a fault. It limits the students to the views of the teacher, and they have the time commitment of the total student body split between those professors. And I had small class sizes. My largest CS class was by far CS1, and since then the most students in one course was AI with like ~20 students. The average was 10. I can't fault my professors, they put in the effort. They also were teaching computer science, which is mostly the theory of computation, and it isn't directly applicable to the field.
But that is the problem. The theory is great, but we aren't living in times of leisure and excess, we are losing that at a tremendous rate. We don't have the money to blow on these 4 year degrees that don't teach anything essential to livelihood. I don't feel inherently enlightened by going to college; I was teaching myself astronomy and physics in my last two years of high school. I learned about quarks, neutron stars, the fundamental forces, and such through Wikipedia. I learned C++ from the cppreference that I have been contributing to. Teaching yourself what you want to know is more possible and easier than ever before, and it needs to take hold in culture as the way to learn, because it is the only way to truly learn and enjoy it. At least for me, and anyone like me. Maybe someone likes the lectures, the tangential topics, the boring information digesting. I didn't, and I look forward to the future.
I don't think there will be any dramatic tech shifts in 2013, by the way. I hope I come to eat these words, but it seems like 2013 is the maturity of Android as a gaming platform with the release of Tegra 4, and Google's global conquest really comes to fruition as they take over the computing space with their mobile OS. Windows will flounder, qt5 will be awesome, I hope to see (maybe by my hand) KDE running on Wayland. I don't think we will get consumer grade robotics, 3d printing, or automated vehicles this year. We might see the hinting at something maybe coming in 2014, but this is a year of transition to the next thing. I hope I can get involved in whatever that is, not for profit, but for importance. I want to do big things. I live in fear of doing insignificant things in my time, and it is the biggest factor holding me back.
2012/12/31
Software Rants 8: The "Linux" Desktop
This is mostly a post for the sake of copy-pasta in the future.
For one, I have fallen to the dark side: qt5 and KDE have won me over. After spending a bit of time tweaking a KDE install, I can't get over that the ideology underlying all the eyecandy is what I'm after. I am still hesitant about the stupid reliance on having their own libraries for everything, but in recent years the KDE camp seems to be getting more inclusive, so I guess now is the time to jump on that bandwagon. I know that Plasma Active and Plasma Workspaces are the future, even if one needs maturity and the other desperately needs optimizing.
But there are other realities to the Linux desktop that are coalescing - and I think we are in the home stretch of the final maturity of the platform, at least when Wayland hits the ground running. All the old and ill thought out technologies of the 90s are mostly gone, and as we plateau the efficiency of the classic computing paradigm, the requisite technologies to support that reality are finally emerging. Here I want to talk about what they are, and why I feel that way.
- Linux Kernel: For all its monolithic, toss-everything-in-kernel-space, write-it-in-assembly, most-illegible-mess-of-a-massive-project-ever nature, it does work, and now that pretty much every device and server exists in some elevated kernel-daemon unsolicited love affair, it is fast and stable. Be it the DRM and DRI for video, ALSA for audio, the IP stack and iptables for networking, the file system implementations, or the core CPU scheduling / memory management, hardware is finally coming under the control of the kernel for the first time. The only real great leap I still see on this front is the pervasive implementation of OpenCL throughout the kernel and support infrastructure, to utilize graphics hardware on consumer products appropriately. We still run all this stuff CPU-side, and a lot of it could see some GPGPU optimization. This is still a few years out, and it will require the usage of Gallium throughout the underlying system as an emulation layer on servers and ancient hardware without OpenCL-capable GPUs, but a slight memory overhead and constant instruction cost across the board will absolutely be worth it to take advantage of the most pervasive modern hardware inclusion - a large collection of weak, high latency, high bandwidth parallel compute units that are becoming increasingly generic in purpose.
- Comprehensive user space daemon collection: This is a meta topic. I think the Linux space is finally stabilizing on a collection of portable servers to intermediate the kernel's hardware control that are sufficiently feature dense to support any use case. Their continued active development means they are much more likely to adapt to new technology quickly than having KDE / Gnome / XFCE / LXDE / Openbox / etc try to do everything their own way with no portability.
- Wayland: Once X dies and Wayland takes over, video on Linux becomes "complete". Video drivers then just need to implement the various dialects of OpenGL and OpenCL, and not worry about integrating with a rendering technology from the 80s. The simplification of visuals will be the greatest boon since completely fair scheduling. I absolutely want to get involved on this front - I see a cohesive, beautiful, and simple delegation of responsibility emerging that can and probably will prove revolutionary, especially as gaming comes to the platform in force. I hope Valve motivates the adoption of Wayland quickly as a consequence.
- Pulseaudio: It might be latency heavy, but that can be optimized. Having a central audio manager is essential, and the generic nature of Pulse means it is pretty much an unstoppable force in audio at this point - as long as it continues to let audio nuts stick JACK in (maybe they could even cohabit the space better, letting certain applications get Pulse passthrough to JACK, or supporting JACK as an independent audio sink).
- Systemd: Old init styles are out, and systemd does a lot right by being significantly based in file manipulation. To enable and disable services, you just add or remove file system links (see the sketch after this list). Systemd might be a kitchen sink, but considering it sits on top of a kitchen sink kernel, that seems appropriate. Systemd is rapidly becoming the user space kernel, which isn't necessarily a bad thing. Either way, its configuration and speed are superior to the competition.
- Dbus: The last two use dbus as their IPC channel. KDE has adopted it, Gnome made it, and it is here to stay as the main protocol for IPC. Message passing is the way to go, and dbus can optimize itself enough internally to be perfectly reasonable in 99.999% of use cases. It might not be the generic socket layer I liked in my earlier postings, but a lot about Linux isn't pure, and it still works.
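Here is that sketch of what enabling a service boils down to under systemd's link-based model (a rough C++ illustration; the unit name and target directory are typical examples, and the real systemctl does more bookkeeping):

```cpp
// Enabling a service ~= linking its unit file into a target's .wants directory;
// disabling ~= removing that link. Paths are typical, but treat this as a sketch.
#include <filesystem>
#include <iostream>
#include <system_error>

namespace fs = std::filesystem;

int main() {
    fs::path unit  = "/usr/lib/systemd/system/sshd.service";
    fs::path wants = "/etc/systemd/system/multi-user.target.wants";
    fs::path link  = wants / unit.filename();

    std::error_code ec;
    fs::create_directories(wants, ec);   // make sure the target directory exists
    fs::create_symlink(unit, link, ec);  // "enable": add the link
    if (ec)
        std::cerr << "could not enable (need root?): " << ec.message() << "\n";
    else
        std::cout << "enabled via " << link << "\n";

    // "disable" would simply be: fs::remove(link);
    return 0;
}
```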
I think KDE has the right ideas. It might be slow, the default theme might be ugly as hell, but it is so freaking configurable in a really intuitive way that I can't fault it. XFCE, LXDE, and any other project that aims at "simplicity" I feel is really copping out nowadays. Simple can't mean "doesn't provide a complete package", but it is being used as an excuse. Simple is when you have good defaults and multiple layers of configuration (in XFCE's defense, it nails this - you have the defaults, you can tweak the panels through context menus, you can then go into a system settings gui tree, then you can enter the xfconf settings editor and tweak even deeper; there are 3 layers of configurability, each incrementally more fine grained, larger, and less comprehensive, but it nails the format).
KDE is not golden - I have avoided it for a long time. They depend a lot on eyecandy, but the optimizations just aren't there. But the ideology is in the right place: targeting generic computing and mobile computing as completely different use cases, having configurability as a top priority, and having reasonable defaults. Except for the dumb desktop Cashew - I have no idea why that isn't just a removable panel, and it is concerning that the devs care to keep it so badly when it has strong community backlash. But it is the right direction to go in. I don't think the Ubuntu model can run much longer on the fumes it is running on - the end game just isn't there. Ubuntu TV might get good when it lands, so maybe it can take a niche as a DVR OS that also functions as a powerful desktop, but it won't make it in the mobile space, and it alienates its core users by stuffing ideology down one's throat. Also, Linux's strength is in a collaborative platform like KDE or XFCE, not in some principled elite council delegation platform like Gnome or Unity.
So I'm going to put my OSS contributions towards the KDE stack in the future. I prefer C++, so qt is a natural fit. razorQT is a neat fork, but ultimately it is too similar to the KDE platform to compete. I feel anyone who would migrate over to Razor would be better served by optimizing KDE to make it run better than by trying to start over from scratch again.
If KDE becomes the first comprehensive desktop with Wayland support, that will be the final nail in the Gnome coffin. The future of the Linux desktop is pretty much here, and it isn't a distribution, it is a software suite.
I don't really like how much effort KDE puts into competing in the application space, though. In the end, the desktop is there to frame applications, not permeate them, and while Calibre is a great book management framework, I think the KDE project spreads itself too thin trying to provide both the desktop and the application front. So while I may be running a KDE desktop, I'll be using Thunderbird, Firefox, Deluge, Skype, Clementine (which is a fork of Amarok... heh) and LibreOffice rather than the underdeveloped KDE alternatives, because those other projects focus on one product, which makes it all the better. That is what makes FOSS best - people congregate around projects with clear objectives and goals and make them happen, not nebulous archives of thousands of projects trying to reproduce the work of another million developers in other projects. So if I end up getting a KDE development setup going, it will be working on the core. Though Dolphin seems to be a pretty good file manager.
Also, GTK applications in a kde dark theme look like ass. I'm going to have to blog about fixing that.
2012/12/26
Software Rants 7 : Function Signatures
While thinking about Al, one thing that would be really nice is if all object definitions behaved the same, akin to C++11 uniform initialization syntax. For one, classes and functions should be defined like normal data, so if we use a syntax like int x = 5, you want to have class foo = ?. The first question on this front is: what is the minimum syntax to define any data? The best way to deduce how to go about this is to look at how it is done in other languages, and it isn't that complex.
- In the C syntax languages with static typing, an integer is always int x = 5, or if you want to heap allocate it, you do int *x = malloc(sizeof(int)); *x = 5;
- In Python and Ruby, you use dynamic typing and forgo the int part, so it is x = 5, but in map declarations it is x : 5.
- Perl uses $ name declarations, so it has $x = 5.
- Haskell declares a number as foo :: Int and defines it as foo = 5.
- In Shell, it is x=5.
- In Javascript it is usually var, let, or nothing x = 5, but in maps it is x : 5.
Regardless, the syntax is consistent. In Al, you would have static strict typing, in that int x : 5.3 will error on an unspecified cast from float to int with loss of precision. int x : "s" fails. auto x : 5 resolves to an integer type, and you can drop the auto and just have x : 5, which behaves like auto.
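For comparison, C++11's uniform initialization already enforces a similar strictness against lossy conversions - a small illustrative sketch:

```cpp
// C++11 brace initialization rejects narrowing, much like the proposed
// "int x : 5.3 errors" rule; plain assignment silently truncates instead.
int main() {
    int a{5};        // fine: exact integer
    // int b{5.3};   // error: narrowing conversion from double to int
    // int c{"s"};   // error: no conversion from a string literal to int
    int d = 5.3;     // legal C++, silently truncates to 5 - the behavior Al would forbid
    auto e = 5;      // deduced as int, like Al's "auto x : 5" or bare "x : 5"
    return a + d + e - 15;  // returns 0; just keeps the variables "used"
}
```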
As an aside, bitwise operations like | & and ^ are becoming more and more underutilized next to their logical counterparts, so I'd definitely reserve | & and ^ for or, and, and exponentiation respectively. If I were to have glyphic bitwise operations I'd use || && or ^^ for those, if anything. I'd probably just reserve bitxor, bitand, and bitor as keywords and forgo the glyphs, since they are so situational.
So if we have a syntax of int x : 5, a function would be func foo : ?. We want templated function definitions, so our function should be something like func< int (int) > in C++, but the syntax int (int) isn't conducive to a comma separated value list like a template specification. Once again, we want to minimize the syntax, so the smallest definition of data with all unnecessary information removed would be something like:
int < int, int foo : int < int x, int y { }. If < wasn't less than, it would just be the function signifier to delimit a return value and the function arguments. This syntax leaves something wanting though, so we try the verbose, optimally readable way:
func[int]<int, int> foo : [int](int x, int y) {
}
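For reference, the closest existing C++ construct is a lambda stored in a std::function, where the signature lives in the type and the body in the lambda (a sketch for comparison; the names are arbitrary):

```cpp
// The C++11 analog of "func[int]<int, int> foo : [int](int x, int y) {}":
// the signature lives in the std::function type, the body in the lambda.
#include <functional>
#include <iostream>

int main() {
    std::function<int(int, int)> foo = [](int x, int y) -> int {
        return x + y;
    };
    std::cout << foo(2, 3) << "\n";  // prints 5
    return 0;
}
```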
This looks a *lot* like C++ lambdas. On purpose. The capture group just fills the role of the return type declaration, but if we want a function signature we need that information. Function templates get uglier:
template<Z> func[Z]<int, Z> foo : template<Z>[Z](int x, Z z) {
}
This happens because the rvalue requires a template definition, but so does the lvalue, because it acts as the signature. We end up defining the signature twice. This is absurdly redundant, so what we want is the principle that if we have an rvalue function, we never redefine the signature, because the static typing of the function declaration already established it.
template<Z> func[Z]<int, Z> foo : func(x, z) {
}
So the argument types were defined in the signature, and we named them in the definition. The problem here is that if you were to declare but not define foo for some time (given we even allow that - we would probably want this language to forbid nulls, and all functions are inherently a type of reference into code space), then when you actually do define it, you end up with something like:
foo : func(x, z) {
}
And that definition gives no information to the original signature.
Of course, in practice, you can't do this. You can't have late binding on a function, because it occupies built-up code space, not stack or heap space. Defining it in two places is practically worthless, because the compiled code locks foo to its function at compile time and you can't reassign it - that would mean you have code pages that are no longer named. That means raw function declarations are inherently constant and final. Meanwhile, something like:
ref<func> bar; bar : ref(foo) is valid. You are taking references to defined functions, but the functions themselves must be named at least once; they can't go unnamed.
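The C++ equivalent of that distinction already exists: the function definition itself is fixed at compile time, while references to it can be rebound freely (a small sketch for comparison):

```cpp
// foo itself is constant and final - it names a region of code - but a
// std::function (or function pointer) referring to it can be reassigned.
#include <functional>
#include <iostream>

int foo(int x, int y) { return x + y; }
int baz(int x, int y) { return x * y; }

int main() {
    std::function<int(int, int)> bar = foo;  // like "bar : ref(foo)"
    std::cout << bar(2, 3) << "\n";          // 5
    bar = baz;                               // the reference rebinds; foo itself never changes
    std::cout << bar(2, 3) << "\n";          // 6
    return 0;
}
```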
The same thing happens with classes. It might be why, in traditional language syntaxes, classes and functions don't behave like the data types they represent - their definitions are disjoint from their implementations. While classes and functions are data types, they are data type definitions: you instantiate instances of them. They are inherently constant and final, and bound to their definition names. So if you use a global declarative syntax like foo : func, you introduce ambiguity without making the full definition something really ugly like:
const final template<Z> func[Z]<int, Z> foo : func(x, z) {}.
So let us save some time and call it template<Z> func foo : Z(int x, Z z) {}. Maybe have state defined like func:private,extern,static foo(int x, float y) : Zoo {}. It is kind of backwards, because it implies func : return type is the signature rather than thing : value, but the value of a function is its return value if it is pure, for example.
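As a point of comparison for the template redundancy above, the closest existing C++ shape I can think of is a C++14 generic lambda, which states the deduced parameter only once (a sketch of the existing feature, not the proposed Al syntax):
#include <iostream>

// The "template" parameter is written once, as auto, and deduced at each call site.
auto foo = [](int x, auto z) { return x + z; };

int main() {
    std::cout << foo(1, 2.5) << '\n';  // z deduced as double, prints 3.5
}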
2012/12/23
Hello World in Al and why?
I wrote up a pastebin post that specifies a bit of my little pet language idea. Just to get into the nitty gritty compared to the status quo languages and new contenders:
- C: C is way too old, and has a ton of problems. Glyphic syntaxes like pointers, textual inclusion, no baked-in templates or polymorphism to facilitate dynamic code, no objects, no access modification to help team-based development. Function objects are convoluted and error prone due to their existence as effectively a void*.
- C++: While an improvement on C, a lot of what C++ does also falls flat. It tacks on object orientation rather than being systemic with it, so interacting with C data types like ints is convoluted. Structs are preserved as just default-public versions of classes that are default-private. The smart pointers I feel properly implement memory management, but it is a little too little too late. There is a lot of duplication of purpose in the standard because it is very committee driven. As a result, this language is like a bathtub for a kitchen sink with a tendency to spring leaks.
- Haskell: The adherence to the functional ideal is admirable, but I fundamentally disagree with the treatment of the process of constructing tasks for the execution of physical transistors as if it were only math. Statefulness is an inherent property in computers, and while it introduces overhead, it is necessary for any programming model, and Haskell is no exception in the line of functional languages that try to hide state in a leaky way. Also, being single paradigm, it isn't all encompassing enough.
- Go: The syntax is pretty cumbersome, it leans on type inference rather than explicit type annotations, and its memory model isn't fine-grained enough for what a real systems language needs to guarantee behaviors. It throws all its proverbial utility into parallel programming, but by doing so they make the entire language a pain in the butt to use and full of non-determinism. So it is a good niche language for massively parallel projects, but otherwise insufficient for a generic compiled language.
- OCaml: relies on type inference rather than explicit annotations, and has a really glyphic syntax. I really do think this is a big deal; explicit typing makes determinism so much easier. If you are object oriented, having everything discretized into nice object boxes is a miracle for debugging and maintainability. I do like how you can use OCaml as a script or as a native binary.
- Rust: again, inferred variables. Not allowing null pointers is an interesting proposition I would definitely want to look into. The concurrency model of message passing I feel is the appropriate solution, and it has an emphasis on async functionality for small execution units. I'd rather build a threadpool and queue tasks into that in a program, but to each their own.
- The code should be deterministic. No construct should take more than a sentence to explain how it runs when assembled, and as such the compiler should never be overly complex.
- Garbage collection is appropriate in some use cases. The best way to handle it is to let users opt into a garbage collector, like any other piece of standard library functionality that would otherwise break the previous policy.
- A threading model based on the usage of functions as first-class objects and the availability of a concise, easy-to-use socket interface to send efficient messages across processes. Inter-process communication can also be done over that socket layer, over some internalized message passing, or memory can be shared through references. A thread would not have visibility into another thread's stack, but would see global variables and could be given references to stack or heap objects explicitly.
- Smart pointers! They solve the memory problem nicely.
- Don't fear references / pointers. Rather than having the glyphic * and & syntaxes of the C's, we can use a ref<> template object to refer to a pointer to something. I like the non-nullable ideal, so maybe this object needs to be constructed with a reference, and attempting to set it to null throws an exception (see the sketch after this list).
- Exceptions are good! Goto as well, none of the execution mode switchers are inherently flawed. Code is naturally jumping all over the place, just minimize the hard to trace parts.
- The glyphs should be minimized. You want an easily read language with an optional whitespace significant dialect.
- Module based inclusion - importing a module could mean importing a text file, something already compiled, an entire archive of files or binaries, or even a function in a file or binary. This means you have a unified access view into other namespaces.
- Access modifiers! They make peer programming so much easier.
- Designed with the intent to be both callable from and interfaceable with the other Altimit languages. You can't kitchen-sink all problems. This language is meant to give you the tools to maximize performance without needing overly complex syntax or requiring more work on your part than necessary. But by default, it needs to be deterministic and safe. The bytecode language would be easily sandboxed for application development, and the script language would be easily used as glue, easily live-interpreted, and high on programmer productivity. Using them together could give you a powerful toolkit that nobody else seems to try to build cleanly.
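As a sketch of the non-nullable ref<> idea above (the names and exact behavior are my own guesses, not a spec, and this version prevents null binding at compile time rather than with an exception), a C++ approximation could look like:
// Non-nullable reference wrapper: it can only be constructed from a real
// object, so there is no null state to check for.
template <typename T>
class ref {
public:
    explicit ref(T& target) : ptr_(&target) {}
    ref(T&&) = delete;               // refuse to bind to temporaries
    T& get() const { return *ptr_; }
    T& operator*() const { return *ptr_; }
private:
    T* ptr_;  // invariant: never null
};

int main() {
    int value = 41;
    ref<int> r(value);   // must name an existing object; ref<int> r(nullptr) won't compile
    *r += 1;
    return r.get();      // 42
}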
I wonder if I'm ever going to go anywhere with all this crazy talk. We shall see. I might write more pastebin samples.
2012/12/22
Software Rants 1: Everything is wrong all the time
It occurs to me that this post was a draft for a long while. I wrote a paragraph, tossed it out because it was rubbish (wait, I thought all my musings on this blog were), and figured a series like this probably should have a part 1, though it would be more tongue in cheek to have a part 0.
But to write this entry I have to get behind why I wrote the title, couldn't come up with anything sound for this post, and gave up on it for 2 months. In simplest terms, I am an idiot and I have a hard time understanding complicated things I haven't spent years developing habits and muscle memory to recognize. As a result, when in software-land, there is a trifecta of problems that contributes to this - software complexity is huge, people try to minimize it, and the means by which they minimize it specialize the industry and make trying to be a "master of computers" like I keep trying to pull off a lost cause. It is why I keep going back to wanting to do my own computing environment, because the most fundamental problem is that none of this was ever expected. New things happen every day, and the unexpected contributes to all the complexity in this industry today.
If I wanted to hack on a software project someone else made, I would need to get it out of source control, so I need to understand the syntax and behavior of that properly. I'm putting it on a hard drive, so I need to know how disk caching works, how the IO bus handles requests, how the drive determines spin speed, how a write head magnetizes a platter spinning at thousands of RPM, why they settled on 5v power (even though the damn power connector gives 3.3, 5, and 12 volt power, and nobody uses 3.3 volts, I wonder why - hey power supplies, why you gotta break AC into 3 DC voltages and barely use 2 of them?). How the file system works (b-trees, allocation tables (distributed or contiguous), file distribution on the drive (aka Windows vs Linux style)), how solid state storage works using floating-gate flash cells rather than the transistor logic of processor cache, how the south bridge and north bridge interconnect sends data, how SAS and SATA differ, how an operating system can recognize a SATA bus, determine the version of the connection, schedule writes, and treat a PCI, USB, SATA, or IDE hard drive as basically the same thing even when they are radically different. And we haven't even gotten to the files in a folder yet, just where we are putting them. We didn't even touch on the TCP and UDP stacks, the different sockets, different variations of CAT cable, how the signaling frequency over an ethernet line determines the bandwidth, how to differentiate power from data, analog vs digital signal processing.
The amount of stuff is huge. And a lot of it is hard stuff. Yet somehow a 5 year old can use a tablet computer to play a game, utilizing all of this and more (graphical interconnects, switching video and physical memory, caching, synchronization, SIMD instructions, OpenGL driver implementations, DMI interfacing) and doesn't know a wink of it. And it may be my greatest crime that there is almost nothing I can use without understanding it to a degree that I could put it back together by hand with a soldering iron and a few billion dollars worth of silicon fabrication and imprinting of transistors.
So the necessary time commitments to understand the entire beast are honestly too large. So I prioritize the things I care about most, and often end up iteratively moving through the process - a CPU composed of ALUs, FPUs, caches, registers, a TLB, maybe a power modulator, the northbridge logic, a memory interconnect, physical RAM operating at certain timings, refresh rates, clock rates, channel sizes, how the operating system utilizes page tables to manage memory, how TLBs cache pages in layered cache on the processor.
And just the hardware itself is overwhelming. So you get past that and you arrive in software land, and start at a BIOS that lives in flash ROM on a motherboard, with battery-backed settings (and don't get me started on all that circuitry), that initializes memory and devices, and depending on whether you are running EFI or BIOS you either search partitions for a system partition full of executables to run or execute boot code straight out of the partition table.
And on top of all this hardware complexity, we have dozens of ways of writing an OS, different binary formats, different assembly languages for different processors, and different dialects of C that usually end up at the base of it all (because we are stupidly set in our ways, to be honest. C is so flawed...). We pile on top of that extreme language complexity (C++) or extreme environment complexity (Java / C#) or extreme interpreter complexity (Python, JS), where something is extremely complicated to understand but essential to actually understanding what is going on. And then you have your programming paradigms, and have Perl, Haskell, Clojure, and a dozen other functional languages or other strange syntaxes out there people use just to make your brain explode. Yet people much smarter than myself can read these like real second languages.
It might be the fact that after 6 years of Spanish going in one ear and out the other, I am firmly grounded in English with no hope of speaking anything else. Mainly because my brain is crappy. But in the same sense, I like my programming languages being new vocabularies rather than new languages that break my brain. But I don't think it is even my problem entirely - it is evident from the amount of failure in software space, the extreme desire for "talent", and the general cluelessness of both developers and customers in building and using this stuff that things have, just under the surface, gotten really out of hand. And I really think that the best way to fix it is a clean slate now, so we don't end up with the Latin Alphabet problem (which coincidentally is the other blog post I'm publishing at the same time as this one).
So even though this post comes out 6 entries into this series on software, it does establish the baseline for why I rant about the things I do - everything is broken, and wrong, and nobody wants to take the time to just do it right. Mainly because we are fleshy meatbags that only persist for a blink of time in space, require the flesh of living things to persist, and have an unhealthy obsession with rich people doing stupid crap.
Thinking about Alphabets
Since I've written a bunch about basically throwing away 30 years of work done by engineers significantly smarter than me, it occurs to me that you should really question everything when devising a new computer environment, beyond just making a new character set that eliminates the more pointless glyphs of low-order Unicode. One aspect of that might be going as far as redefining the alphabet used.
After all, when complaining about things being outdated and obsoleted by new technology and ideas, the 2,700 year old glyph system (albeit with added and removed glyphs over time) that forms the base of all language in the western world is a good candidate for reconsideration. A lot of characters in the set are redundant - C and K, Y and I (and E). In that sense, I am a fan of the International Phonetic Alphabet, which is a glyph system representing the pronounceable vocabulary of the human vocal tract. It includes the lung-based sounds, it has extensions and sub-groups for clicks and other non-pulmonary sounds, and in the end it represents the goal of written language - to transcode spoken language. We can read a large spectrum of glyphs, and if we wanted we could encode them in a wide color gamut, but our audible range is much more limited - the IPA has 107 characters, but in practice only around ~30 of them are significant, and if you got technical enough to create an alphabet with the discrete independent elements of spoken language, you could probably manage around that number.
But this isn't a trivial problem with a simple solution - the reason students don't hear about the IPA is that, like many things with the word international in them, the glyph system it uses is a hodge-podge mix of a dozen languages and dialects, since no single one uses the full range of human enunciation. The result is that a lot of characters in the IPA are absurd multicharacter strings like ʥ, tᶣ, and ŋ̋. Even though the modern Latin-derived English alphabet leaves much to be desired, its glyphic complexity is pretty much limited to a worst case of m or j. So one objective of such a universal enunciation-based alphabet, besides representing the proper human audible range without redundant characters, is to have the simplest set of glyphs possible.
A good example of this is I. A vertical bar is a capital i. A vertical bar with a half foot is a capital L. T is also pretty simple, as are N, Z, and V. These all have 3 or fewer strokes in their structure, and have few subtle interruptions in their form. In the same way humans read words by recognizing the entire glyph structure rather than the individual letters, having the least complex, most distinctive glyphs represent the alphabet makes it the easiest to learn, recognize, and write.
The amount of work required to actually scientifically define the appropriate subset of the IPA to define all distinct audible tones of human speech, combined with the most minimalist and simple glyphic representation of that tone set, is something beyond the scope of my brain. But in many ways it is an inevitable evolution for mankind to eventually optimize our speech and writing, in the same way most of the world is currently switching over to English as a common language. Hopefully once we solve the thousand different languages problem, we can evolve up to a much more logical form of communication in both written and verbal form. It makes software engineers cry less.
2012/12/21
Some math about that "digital interface to rule them all"
I remarked in my future OS post that we should standardize on one digital interface format for everything digital. The most pervasive problem in that domain (that the bandwidth needs of devices differ quite a bit) can be solved with dynamic frequency clocking of the interface buses. An easy way to do this is a hot-plug handshake between devices - when a connection is made, each side sends standardized information about itself (what it does, what frequencies it's capable of, etc.) and the interface uses the lowest frequency both support that sufficiently fills the bandwidth requirements of the interconnect. So you could in theory have a multi-gigahertz interconnect where low bandwidth devices (like input devices) could run at only a few hundred hertz.
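A minimal sketch of that negotiation, with illustrative names and the simplifying assumption that every hertz of link clock carries a fixed number of bits across the lanes:
#include <algorithm>
#include <cstdint>
#include <optional>
#include <vector>

struct DeviceCaps {
    std::vector<std::uint64_t> supported_hz;  // sorted ascending
    std::uint64_t required_bps;               // bandwidth this side actually needs
};

// Pick the lowest frequency both sides support that still covers the larger
// of the two bandwidth requirements.
std::optional<std::uint64_t> negotiate(const DeviceCaps& a, const DeviceCaps& b,
                                       std::uint64_t bits_per_cycle) {
    const std::uint64_t need = std::max(a.required_bps, b.required_bps);
    for (std::uint64_t hz : a.supported_hz) {
        const bool shared = std::find(b.supported_hz.begin(), b.supported_hz.end(), hz)
                            != b.supported_hz.end();
        if (shared && hz * bits_per_cycle >= need)
            return hz;  // lowest shared frequency that fills the requirement
    }
    return std::nullopt;  // no shared frequency is fast enough
}

int main() {
    DeviceCaps host{{100000000ULL, 1000000000ULL, 8000000000ULL}, 80000000000ULL};
    DeviceCaps display{{1000000000ULL, 8000000000ULL}, 80000000000ULL};
    auto hz = negotiate(host, display, 10);  // assume 10 bits per cycle across all lanes
    return hz ? 0 : 1;                       // settles on 8 GHz here
}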
Some general numbers:
The most bandwidth intensive activity I can conceptualize for this interconnect, besides just going all out maximizing the bandwidth for usage as a supercomputer interconnect, is about 10 GB/s or 80 Gb/s. I get this number using 16:10 Ultra-HD at 48 bit color depth (12 bits per color channel + a 12 bit alpha channel) at 180 Hz (this comes from the idea of using 3D displays - 90 Hz is a good standard that most human eyes can't differentiate very well, just like 4K is a good resolution where most eyes won't see the difference at 250 PPI from a distance of 2 feet - I wouldn't ever use a 3D display, but the standard should anticipate that). Given that current DisplayPort can reach about 18 Gb/s, increasing that 5 fold in time for this "concept" to actually matter is completely feasible. Worst case scenario, you just include more packet channels. Comparatively, we have pretty much clobbered the limits of audio, and codecs like Opus just make compression even better over time, so I'm not worried about 15 speaker surround sound being the next big thing.
But just as a consideration, if you were to go completely audiophile crazy and used 20 Mbit/s lossless audio on 33 speakers, that still only constitutes 660 Mbit/s, less than a gigabit. In practice, if you are transferring raw video without some kind of encoding compression, it will be the absolute dominator in bandwidth utilization.
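A quick back-of-the-envelope check of those numbers (I am assuming 3840x2400 as the 16:10 Ultra-HD resolution; the other values are the figures above):
#include <cstdint>
#include <iostream>

int main() {
    constexpr std::uint64_t width = 3840, height = 2400;  // assumed 16:10 Ultra-HD
    constexpr std::uint64_t bits_per_pixel = 48;          // 12 bits x RGB + 12 bit alpha
    constexpr std::uint64_t refresh_hz = 180;
    constexpr std::uint64_t video_bps = width * height * bits_per_pixel * refresh_hz;

    constexpr std::uint64_t audio_bps = 20000000ULL * 33;  // 20 Mbit/s lossless x 33 speakers

    std::cout << "video: " << video_bps / 1e9 << " Gb/s\n";  // ~79.6 Gb/s, i.e. ~10 GB/s
    std::cout << "audio: " << audio_bps / 1e6 << " Mb/s\n";  // 660 Mb/s
}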
So a target of 10GB/s sounds good, especially considering that also works as an interface bandwidth target per lane inside the chip architecture. If running at peak throughput, you would be effectively running a cpu-northbridge interconnect over a wire.
If we use fiber channel for this connector standard, it can provide the ultra-low latency needed to be as multi-purpose as it needs to be. However, while it could be ultra-high bandwidth with low latency when needed, you could also disable all but one data channel and run it at a few hundred hertz for a few megabytes per second of bandwidth, which is all a keyboard or mouse should need.
I could also see color-coded cables indicating the peak frequency supported on the line - a 100' cable probably wouldn't be able to run at whatever frequency is needed to supply 10 GB/s without active modulation, whereas a 20' cable should be able to. This also means it makes a good ethernet standard, because realistic network bandwidth won't pass a gigabit for a long while, and getting a gigabit on a multiplexed cable like this will be a cakewalk at really low frequencies.
I really don't even want to consider analog interfaces with this configuration. It should be all digital video, audio, etc. You could always stick an analog converter chip on top of this interface like a traditional PCI device anyway.
I also would only want one standard connector, preferably as small as possible, with an optional lock mechanism. This is just my personal view on the matter, but having separate standards for USB, micro USB, HDMI, mini HDMI, micro HDMI, mini DisplayPort, DisplayPort, etc. is just absurd. If the maximum connectivity can be obtained on a smaller connector, foot the extra few cents to build a better circuit.
Just as a footnote, real-world 100 Gbit Ethernet should eventually hit the market, and that would be the perfect standard for this.
2012/12/20
Software Rants 6: How to Reinvent the Wheel of Servicing
In my post about new OS paradigms, I remarked about how you won't replace IP as *the* network protocol and how we should all just bow down to its glory.
However, one thing that can easily change (and does, all the time, multiple times a week) is how we interact over that protocol. It is why we even have URIs, and we have everything from ftp:// to http:// to steam:// protocols over IP packets. I want to bring up some parallels I see between this behavior, "classic" operating system metaphors, and the relatively modern concept of treating everything as files, circa Plan 9 and my stupid ramblings.
If I were writing an application for a popular computing platform, I would be using a system call interface into the operating system and some kind of message bus service (like dbus) for communicating with most services; I would personally use some kind of temp file as an interchange, but I could also open a Unix socket as a means of IPC. Or maybe I go really crazy and start using OS primitives to share memory pages. Any way you slice it, you are effectively picking and choosing protocols - be it the Unix socket "protocol", the system call "protocol", etc. In Plan 9 / crazy peoples' world, you forgo having protocols in favor of a file system, where you can access sockets, system calls, memory pages, etc. as files. You use a directory tree structure rather than distinct programmatic syntaxes to interface with things, and the generic nature improves the interchangeability and ease of learning, and in some cases it can be a performance gain since you incur significantly less overhead by using a kernel VFS manager to handle the abstractions.
If I took this concept to the net, I wouldn't want to specify a protocol in an address. I would want the protocols to be abstracted by a virtual file system, so in the same way I mentioned /net/Google.com/reader should resolve as the address of Google's reader, you could be more specific and try something like /net/Google.com:80/reader.https (this is a generic example using the classic network protocols) where you can be specific about how resources are opened (in the same way you use static file system typing to declare how to handle files). But this treats Google.com as a file system in and of itself - and if you consider how we navigate most of these protocols, we end up treating them as virtual file servers all the same. The differentiation is in how we treat the server as a whole.
In current usage, interacting with ftp://mozilla.org and http://mozilla.org produces completely different results, because ftp requests are redirected to an ftp server and http ones are directed to an http server. But https doesn't inherently mean using a different server, because it just means sticking a TLS layer on top of the communication; the underlying behavior of either end resolves the same - packets from one are generated, boxed in an encrypted container, shipped, decrypted on the receiving end, and then processed all the same. That is in many ways more elegant than the non-transparent designation of what server process to interact with at an address based solely off the URI designation.
So what I would rather see is, in keeping with that VFS model, a virtual mount of a remote server under syntax like /net/Google.com producing a directory containing https, ftp, mail, jabber, etc, where an application would be able to easily just mount a remote server, and depending on the visible folders derive supported operations just off that.
Likewise, authentication becomes important. /net/zanny@google.com would be expected to produce (with an authentication token, be it a cached key or a password) *my* view of this server, in the same way users and applications would get different views of a virtual file system.
This leads to a much cleaner distinction of tasks, because in the current web paradigm, you usually have a kernel IP stack managing inbound packets on ports, where to send them, and such. You register Apache on ports 80 and 443 and then it decodes the packets received on those ports (which now sounds even more redundant, because you are using both a URL specifier and a port, but the problem becomes that ports are not nearly as clear as protocols).
So in a vfs network filesystem, determining the available protocols on a webserver should be simpler, by just looking at a top level directory of the public user on that server, instead of querying a bunch of protocols for responses. Be it via file extensions or ports, it would still be an improvement.
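As a small sketch of what that could feel like from an application's point of view (this assumes a purely hypothetical /net mount exposed by the kernel or a FUSE-style service; nothing like it exists today):
#include <filesystem>
#include <iostream>

int main() {
    // Hypothetical: each entry under the server's public directory (https, ftp,
    // mail, jabber, ...) advertises a protocol the remote server supports.
    const std::filesystem::path server{"/net/example.com"};
    for (const auto& entry : std::filesystem::directory_iterator(server))
        std::cout << "supported: " << entry.path().filename().string() << '\n';
}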