2013/01/13

Software Rants 9: Sensible Filesystem Layout

I run Arch as my main OS now, and it is pleasing that /lib and /lib64 are just symlinks to /usr/lib. So in this post I'm going to lay out my ideas on filesystem layout, and a way to make one that makes sense.

First, the aspects of a good VFS:
  • The ability to "search" target directories to find things - libraries, executables, pictures - almost anything searchable should have one simple directory you can search to find what you are after.
  • A layout that facilitates not searching at all, because search is expensive. You want the ability to say, with certainty, that the foo library is in /something/dir/libs/foo. Even if foo isn't on this machine, there should be one path to foo that produces the library you want in any environment (Windows completely fails at this, and the /usr/local and /usr/share nonsense on Linux does too - good thing almost nobody uses those anymore).
  • Directory names that make sense. System32, SysWOW64, /etc, and /usr completely screw this up. So does Android: it calls the internal memory "sdcard", while any SD card you actually plug into the device is "external" - and that is just from the non-root view of the filesystem.
  • The ability to mount anywhere, and the ability to mount anything, be it actual storage media, a web server, a socket layer, process views, etc.
  • The filesystem should naturally be a tree, with links as special files (symlinks should just be called "links", because if you can mount anything, you can "link" to a website under /net like /http/google.com/search?q=Google, or to the root directory, or to a process id). Links create graphs, but because graphs aren't inherent in the filesystem and are provided by specialized files, you can navigate the filesystem as a tree, without risking loops, simply by ignoring links.
  • Forced file extensions, or static typing by some metric. Don't leave ambiguity in what a file does: make extensionless files illegal, and require definitions of which files do what. For example, if you have a binary data package, use the extension .bin rather than nothing, because .bin would be registered as application-specific binary data. If you have a UTF-8 encoded text file, use .txt - the file would have a forced metadata header containing the encoding. Once you have an extension, you can embed file-specific metadata that can be easily understood and utilized. Without extensions, you read the file from the beginning hoping to find something useful, or rely on metadata in the filesystem, which is often not transferable, especially over network connections.
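As a tiny sketch of that extension-registry idea (the registry contents and function name here are made up for illustration, not any real standard):

```python
import os.path

# Hypothetical registry mapping mandatory extensions to their registered
# meanings. A real system would ship this as systemwide metadata.
TYPE_REGISTRY = {
    ".bin": "application-specific binary data",
    ".txt": "text with a mandatory encoding header",
}

def classify(path):
    """Return the registered meaning of a file's extension."""
    _, ext = os.path.splitext(path)
    if not ext:
        raise ValueError("extensionless files are illegal in this scheme: " + path)
    return TYPE_REGISTRY.get(ext, "unregistered extension")

print(classify("backup.bin"))  # application-specific binary data
```

The point is that the lookup is a table hit on the name alone - no sniffing the file's first bytes, and nothing lost when the file crosses a network boundary.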
With these considerations, here is a layout I'd like:
  • Root
    • Boot
    • Resources
      • http
        • google.com
          • search?q=bacon
      • ftp
      • smb
      • ssh
      • temp
      • AltimitSocket
        • IPC Here 
      • TCP
        • 80
          • http
    • Users
      • Zanny
        • Programs
        • Videos
        • Pictures
        • Music
        • Documents.dir
        • Config.dir
        • libs
      • root
      • guest (or default)
    • Groups
      • default
      • printers
      • display
      • audio
      • mount
      • admin
      • network
      • devices
      • virtual
    • System
      • power
      • hypervisor
      • firmware
      • proc
      • dev
        • usb
        • mice
        • keyboards
        • touchscreens
        • displays
        • disks
          • by-label
            • Storage.disk
          • by-uuid
        • printers
        • microphones
        • speakers
        • memory
        • mainboard
        • processors
        • accelerators
Boot should be fairly obvious: it contains the kernel payload and anything the kernel needs to immediately set up the running environment, which would probably include the drivers needed to bring up the base filesystem, plus whatever init is.

Resources is the collection of transport protocols the machine supports; subaddressing these directories accesses their resources (by address or DNS resolution). This would also include any locally mounted media not forming some other component of the filesystem.
The implication is that all resources are treated similarly: remote ones are mounted by protocol, and local disks can (if you want) be mounted by filesystem type or by the driver used to control them, the same way ftp, http, etc. are all served by different daemons.
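A minimal sketch of what resolving such a path might look like - the path shapes are hypothetical, and a real implementation would hand the address off to whichever daemon serves that protocol's directory:

```python
# Split a /Resources path into a (protocol, address) pair, mirroring the
# "mounted by protocol" idea. Purely illustrative; no real API implied.
def resolve(path):
    parts = path.strip("/").split("/")
    if parts[0] != "Resources":
        raise ValueError("not a Resources path: " + path)
    protocol, address = parts[1], "/".join(parts[2:])
    return protocol, address

print(resolve("/Resources/http/google.com/search?q=bacon"))
# ('http', 'google.com/search?q=bacon')
```

The same function would work unchanged for an smb share or a local disk mounted by driver, which is the whole appeal of treating them uniformly.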

The socket layer is also in resources, and can provide external socket based network access or local IPC with message passing. Different daemons will process writes or reads from their own directories provided here. Resources is thus very dynamic, because it represents accessing everything provided by device controllers and daemons.
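On Linux today, the nearest analogue to a socket directory under Resources is a Unix domain socket, which already gives an IPC endpoint a filesystem path. A minimal echo exchange between a "daemon" and a client might look like this (the socket path is a throwaway temp file, not a real layout convention):

```python
import os
import socket
import tempfile
import threading

sock_path = os.path.join(tempfile.mkdtemp(), "demo.sock")
ready = threading.Event()

def daemon():
    # The daemon side: bind to the path, then echo one message back.
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as srv:
        srv.bind(sock_path)
        srv.listen(1)
        ready.set()
        conn, _ = srv.accept()
        with conn:
            conn.sendall(conn.recv(1024))

t = threading.Thread(target=daemon)
t.start()
ready.wait()

# The client side: any process that can see the path can talk to the daemon.
with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as cli:
    cli.connect(sock_path)
    cli.sendall(b"ping")
    reply = cli.recv(1024)
t.join()
print(reply)  # b'ping'
```

The VFS described here generalizes exactly this: every daemon's endpoints live at discoverable paths, so message passing is just filesystem reads and writes.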

Users are the discretization of anything on a per-user basis. This includes programs, libraries, configurations, etc. Root isn't a special user, it is just a member of every group, and owns the top level directories. Each user has a personal Configuration directory, to hold application configuration.

Groups are an abstraction, like many things in this VFS - they can be either policy controls or filesystems to be merged into the homes of their member users. For example, all users are members of default, and a newly created user would inherit all application defaults from the default group. Until a user overrides a default configuration, their copy could just be a symlink to the default one, avoiding some redundancy; any user still inheriting a configuration from default would also pick up systemwide changes to it. You could even create a default user to force applications to run in their default configuration: anything run as default always runs with default settings, and a user without execute privileges on something might have to run it as default. That sounds very nice in a corporate or locked-down setting. I parenthesize guest with default because in general you want some "base" user everyone else inherits from - if you have a public user, that might be the guest account, or it might be a dedicated default account. Applications installed to this user would be accessible to everyone, and any user with execute privileges on them could keep their own configuration and state for those applications locally, in one place.
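The "symlink until overridden" mechanism can be sketched concretely - the two temp directories below stand in for hypothetical /Groups/default and /Users/<name> Config directories:

```python
import os
import tempfile

# Stand-ins for the default group's config and a new user's config.
default_cfg = tempfile.mkdtemp()
user_cfg = tempfile.mkdtemp()

with open(os.path.join(default_cfg, "shell.conf"), "w") as f:
    f.write("prompt=%")

# On user creation, link rather than copy the default configuration.
os.symlink(os.path.join(default_cfg, "shell.conf"),
           os.path.join(user_cfg, "shell.conf"))

# Until the user overrides it, reads follow the link to the shared default,
# so a systemwide change to the default is seen by every non-overriding user.
with open(os.path.join(user_cfg, "shell.conf")) as f:
    print(f.read())  # prompt=%
```

Overriding would simply replace the link with a real file, detaching that one user from the group default.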

Likewise, libraries could be treated the same way. The default or guest user's ~/libs could be the fallback for every other user's library searches, after the user's own libs and those of any groups they are members of (which act as skeleton users). If you don't have a dedicated guest or default user, the default group could be its own filesystem containing a libs folder to search, as could any other group. The idea is that user and group policy gives you a cascade search pattern from the perspective of a user: first the user itself, then the groups it is a member of, in some profile-defined precedence. This has the nice capacity to run applications, like on Linux, in user sandboxes: if a user has no group policy outside itself, and has all the applications and libraries it needs locally, you can in effect jail that session so it can't affect other users. You could even deny it access to the other top-level directories to prevent any outside interaction.
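The cascade search above reduces to a short loop - directory names here are hypothetical stand-ins for /Users/<name>/libs and group libs folders:

```python
import os
import tempfile

def find_library(name, user_libs, group_libs_in_precedence):
    """Check the user's own libs first, then each group's libs in
    profile-defined precedence; None means the library isn't in this
    user's view of the system at all."""
    for libdir in [user_libs, *group_libs_in_precedence]:
        candidate = os.path.join(libdir, name)
        if os.path.exists(candidate):
            return candidate
    return None

# Hypothetical setup: the library exists only in the default group's libs.
user_libs = tempfile.mkdtemp()
default_libs = tempfile.mkdtemp()
open(os.path.join(default_libs, "libfoo.so"), "w").close()

found = find_library("libfoo.so", user_libs, [default_libs])
print(found)  # resolved from the default group's libs
```

Dropping the group list to `[]` is exactly the sandboxing case: the search can never leave the user's own directory.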

This also has the nice effect of providing easy mandatory access control: default MAC in the default group, per-user execution control through the groups each user is a member of, and elevated access control in the root account. I would definitely expect any next-gen OS to have MAC and security at the deepest levels, including the filesystem - that is why this VFS has per-user and per-application views of the environment.

Devices are the hardware components of the system, made accessible in their raw hardware form to the daemons that control them. Daemons can individually publicize their control directories to other processes, so they can either hide or show themselves, and they can expose virtual writable or read-only files for direct block IO. The idea is that "displays" and "accelerators" would provide the resources for a display manager to build a graphical environment: a display accelerator (GPU) and any screens it can broadcast to - even networked ones (provided by link), which might be found under /Network/miracast/... as well.

System is another abstraction, provided by the kernel or by daemons listening on it. Hardware devices show up here, including hardware disks to mount and network controllers to inherit. Since Resources is an abstraction, the hardware controllers behind those resources use the System devices folder to access the devices themselves and their memory. In practice, a normal user shouldn't need a view of System at all, since application communication should happen over sockets; /proc exists only for terminating other processes. An application can have a view of /proc that shows itself, its children, and anything that makes itself visible. You shouldn't need signals either, since with the socket layer you can just write a signal message to an application. The difference is that rather than dedicated longjmps into handlers, an application needs some means of processing the messages it receives. I think that is better practice than an arbitrary collection of signals that may or may not be implemented in program logic, with background kernel magic jumping the program around to execute them.
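The signals-as-messages idea can be sketched with a socket pair standing in for the hypothetical Resources socket directory (the TERMINATE message name is made up):

```python
import socket

# Two connected endpoints: "sender" plays the role of another process,
# "receiver" is the application's own message socket.
sender, receiver = socket.socketpair()

sender.sendall(b"TERMINATE\n")  # what `kill -TERM <pid>` would become

# The application reads and dispatches the message in its ordinary event
# loop, on its own terms - no kernel jumping it into a handler mid-flight.
msg = receiver.recv(64).strip()
status = "shutting down cleanly" if msg == b"TERMINATE" else "ignored"
print(status)  # shutting down cleanly
```

Unhandled messages just fall through to "ignored", which is the explicit, in-logic equivalent of a process lacking a signal handler.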

I think this is much better than what we have in any OS, even Plan 9. Even if you don't consider the generic user, from a sysadmin standpoint, discretizing users, groups, and resources is a useful abstraction. I'd almost consider moving System itself into Resources, since it is just the kernel providing utilities of its own. You might even want to allow applications to generate their own /Resources/ directories, maybe under a per-user subdirectory, to allow even more generic sharing and access to other processes' goods.

2013/01/01

2013

One: January 1st at 12:00 AM EST, with all the DST shenanigans, is a dumb anchor for the year. My New Year's is now the Winter Solstice at 0:00 GMT.

I have three major objectives this year:

  1. Get a job in my industry. After 6 months I haven't found anything, but I will press on. I'm really starting to lean towards freelancing because I really don't want to be locked into a 40 hour a week job. That is so much of my time that I would rather spend working on my own projects and learning.
  2. Get active in some project. Probably KDE, but I want it somewhere in the complete package Linux stack. I'm not going to try to reinvent the wheel by forking what other people have done and letting it atrophy out in obscurity, I'm going to engage with persistent projects to make them better. It is the only way to get the desktop metaphor mature.
  3. Tear down my mother's old house. She pays it off this year, and the main motivator for me not going 12 hours a day job hunting is that I don't want to move out just to have to come back in a few weeks or months to help move everything out and tear the thing down. But it has to be done, this is the year to do it, I'll see it through to fruition.
Thinking back to 2012, I learned more in the 6 months after college than I did in 18 years of schooling - except my senior year of high school, when I had plenty of time to read Wikipedia and took a bunch of actually informative AP courses. My knowledge density of practical things increases exponentially when I'm doing my own research. I am, however, dissatisfied that I didn't document my insights better or archive my code appropriately.

I graduated. I think college is overpriced garbage for credentials that become meaningless when everyone and their mother with money can just toss them out lackadaisically. It is an outdated education model in an era when we can engage with knowledge at an infinite level through instantaneous, ubiquitous, unlimited communication over networks. Here is my takeaway from college, per semester, in my major:

  • Freshman 1: Writing functions, very basic data types, the Monty Hall problem.
  • Freshman 2: Object orientation, static types, basic containers, big O.
  • Sophomore 1: Stack overflows, basic assembly, more containers, testing.
  • Sophomore 2: Design patterns, C, Swing, threading in Java.
  • Junior 1: Basic kernel concepts, how shells and pipes work.
  • Junior 2: Data visualization algorithms, OpenMP, source control.
Since I graduated, I have learned C++11, lambdas, move semantics, templates, how compilers behave, shell scripting, Python 3, regular expressions, encryption, a bunch of Mono, JavaScript in its entirety, how to write JSON, how web servers work, a truckload of Linux sysadmin tools from building computers and learning Arch, and basic HTML and CSS.

I really feel that a lecture environment stifles creativity and specialization to a fault. It limits students to the views of the teacher, and each student gets only a fraction of the professors' time, split across the whole student body. And I had small class sizes: my largest CS class was by far CS1, and since then the most students in one course was AI, with maybe ~20 students; the average was 10. I can't fault my professors - they put in the effort. They were also teaching computer science, which is mostly the theory of computation and isn't directly applicable to the field.

But that is the problem. The theory is great, but we aren't living in times of leisure and excess - we are losing those at a tremendous rate. We don't have the money to blow on 4-year degrees that teach nothing essential to livelihood. I don't feel inherently enlightened by having gone to college; I was teaching myself astronomy and physics in my last two years of high school. I learned about quarks, neutron stars, the fundamental forces, and more through Wikipedia. I learned C++ from cppreference, which I have been contributing to. Teaching yourself what you want to know is more possible and easier than ever before, and it needs to take hold in culture as the way to learn, because it is the only way to truly learn and enjoy it. At least for me, and anyone like me. Maybe someone likes the lectures, the tangential topics, the boring digestion of information. I didn't, and I look forward to the future.

I don't think there will be any dramatic tech shifts in 2013, by the way. I hope I come to eat these words, but it seems like 2013 is the year Android matures as a gaming platform with the release of Tegra 4, and Google's global conquest really comes to fruition as they take over the computing space with their mobile OS. Windows will flounder, Qt 5 will be awesome, and I hope to see (maybe by my hand) KDE running on Wayland. I don't think we will get consumer-grade robotics, 3D printing, or automated vehicles this year. We might see hints of something coming in 2014, but this is a year of transition to the next thing. I hope I can get involved in whatever that is - not for profit, but for importance. I want to do big things. I live in fear of doing insignificant things in my time, and it is the biggest factor holding me back.