2012/12/31

Software Rants 8: The "Linux" Desktop

This is mostly a post for the sake of copy-pasta in the future.

For one, I have fallen to the dark side: Qt 5 and KDE have won me over. After spending a bit of time tweaking a KDE install, I can't get over that the ideology underlying all the eye candy is what I'm after. I am still hesitant about the stupid reliance on having their own libraries for everything, but in recent years the KDE camp seems to be getting more inclusive, so I guess now is the time to jump on that bandwagon. I know that Plasma Active and Plasma Workspaces are the future, even if one needs maturity and the other desperately needs optimizing.

But there are other realities of the Linux desktop that are coalescing - I think we are in the home stretch of the final maturity of the platform, at least once Wayland hits the ground running. All the old and ill-thought-out technologies of the 90s are mostly gone, and as the efficiency of the classic computing paradigm plateaus, the requisite technologies to support that reality are finally emerging. Here I want to talk about what they are, and why I feel that way.

  • Linux Kernel: For all its monolithic, toss-everything-in-kernel-space, write-it-in-assembly, most-illegible-mess-of-a-massive-project-ever reputation, it does work, and now that pretty much every device and server exists in some elevated kernel-daemon unsolicited love affair, it is fast and stable. Be it DRM and DRI for video, ALSA for audio, the IP stack and iptables for networking, the file system implementations, or the core CPU scheduling / memory management, hardware is finally coming under the control of the kernel for the first time. The only great leap I still see on this front is the pervasive implementation of OpenCL throughout the kernel and support infrastructure, to utilize the graphics hardware on consumer products appropriately. We still run all this stuff CPU-side, and a lot of it could see some GPGPU optimization. This is still a few years out, and it will require using Gallium throughout the underlying system as an emulation layer on servers and ancient hardware without OpenCL-capable GPUs, but a slight memory overhead and constant instruction cost across the board will absolutely be worth it to take advantage of the most pervasive modern hardware inclusion - a large collection of weak, high latency, high bandwidth parallel compute units that are becoming increasingly generic in purpose.
  • Comprehensive user space daemon collection: This is a meta topic. I think the Linux space is finally stabilizing on a collection of portable servers to intermediate kernel hardware control that are sufficiently feature dense to support any use case. Their continued active development means they are much more likely to adapt to new technology quickly than having KDE / Gnome / XFCE / LXDE / Openbox / etc. each try to do everything their own way with no portability.
  • Wayland: Once X dies and Wayland takes over, video on Linux becomes "complete". Video drivers then just need to implement the various dialects of OpenGL and OpenCL, and not worry about integrating with a rendering technology from the 80s. The simplification of visuals will be the greatest boon since completely fair scheduling. I absolutely want to get involved on this front - I see a cohesive, beautiful, and simple delegation of responsibility emerging that can and probably will prove revolutionary, especially as gaming comes to the platform in force. I hope Valve motivates the adoption of Wayland quickly as a consequence.
  • Pulseaudio: It might be latency heavy, but that can be optimized. Having a central audio manager is essential, and the generic nature of Pulse means it is pretty much an unstoppable force in audio at this point - as long as it continues to let audio nuts stick JACK in (maybe they could even cohabitate the space better, letting certain applications get Pulse passthrough to JACK, or supporting JACK as an independent audio sink).
  • Systemd: Old init styles are out, and systemd does a lot right by being significantly based in file manipulation - to enable and disable services, you just add or break file system links, roughly as sketched after this list. Systemd might be a kitchen sink, but considering it sits on top of a kitchen sink kernel, that seems appropriate. Systemd is rapidly becoming the user space kernel, which isn't necessarily a bad thing, and its configuration and speed are superior to the competition.
  • Dbus: The last two use D-Bus as their IPC channel. KDE has adopted it, Gnome made it - it is here to stay as the main protocol for IPC. Message passing is the way to go, and D-Bus can optimize itself enough internally to make it perfectly reasonable in 99.999% of use cases. It might not be that generic socket layer I liked in my earlier postings, but a lot about Linux isn't pure, and it still works.
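
To make the systemd point concrete, enabling a service mostly amounts to symlink bookkeeping. A minimal Python sketch of the idea - the unit and target paths here are just the common defaults, not something I'm guaranteeing for every distribution:

import os

UNIT_DIR = "/usr/lib/systemd/system"
WANTS_DIR = "/etc/systemd/system/multi-user.target.wants"

def enable(unit):
    # "systemctl enable foo.service" boils down to creating a link like this
    os.symlink(os.path.join(UNIT_DIR, unit), os.path.join(WANTS_DIR, unit))

def disable(unit):
    # and "disable" just removes it
    os.remove(os.path.join(WANTS_DIR, unit))
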
On top of these resources, we put file managers, compositors, desktop managers, and other goodies. I'm siding with KDE - Qt is a much better framework than GTK because C++ is a much more usable language than C, with its superior syntax and support for almost every programming concept under the sun. While I think it is a horrible break with Unix philosophy, the integration of WebKit and V8 into Qt, to support web rendering internally and JavaScript via QML, means it is ready for the integrated web world of the future. GTK is still struggling along on this front, and the entire Gnome project is floundering in the face of trying to jump into a mobile market already saturated with Android.

I think KDE has the right ideas. It might be slow, the default theme might be ugly as hell, but it is so freaking configurable in a really intuitive way that I can't fault it. XFCE, LXDE, and any other project that aims at "simplicity" I feel is really copping out nowadays. Simple can't mean "doesn't provide a complete package", but it is being used as an excuse. Simple is when you have good defaults and multiple layers of configuration (in XFCE's defense, it nails this - you have the defaults, you can tweak the panels by context menus, you can then go into a system settings GUI tree, and then you can enter the xfconf settings editor and tweak even deeper; three layers of configurability, each incrementally more fine-grained and larger. It nails the format).

KDE is not golden - I avoided it for a long time. It depends a lot on eye candy, and the optimizations just aren't there. But the ideology is in the right place: targeting generic computing and mobile computing as completely different use cases, having configurability as a top priority, and having reasonable defaults. Except for the dumb desktop cashew - I have no idea why that isn't just a removable panel, and it is concerning that the devs care to keep it so badly when it has strong community backlash. But it is the right direction to go in. I don't think the Ubuntu model can run much longer on the fumes it has left - the end game just isn't there. Ubuntu TV might get good when it lands, so maybe it can take a niche as a DVR OS that also functions as a powerful desktop, but it won't make it in the mobile space, and it alienates its core users by stuffing ideology down their throats. Also, Linux's strength is in a collaborative platform like KDE or XFCE, not in some principled elite council delegation platform like Gnome or Unity.

So I'm going to put my OSS contributions towards the KDE stack in the future. I prefer C++, so Qt is a natural fit. Razor-qt is a neat fork, but ultimately it is too similar to the KDE platform to compete. I feel anyone who would migrate over to Razor would be better suited optimizing KDE to make it run better than trying to start over from scratch again.

If KDE becomes the first comprehensive desktop with Wayland support, that will be the final nail in the Gnome coffin. The future of the Linux desktop is pretty much here, and it isn't a distribution, it is a software suite.

I don't really like how much effort KDE puts into competing in the application space, though. In the end, the desktop is there to frame applications, not permeate them, and while Calibre is a great book management framework, I think the KDE project spreads itself too thin trying to provide both the desktop and the user experience front. So while I may be running a KDE desktop, I'll be using Thunderbird, Firefox, Deluge, Skype, Clementine (which is a fork of Amarok... heh), and LibreOffice rather than the underdeveloped KDE alternatives, because those other projects have a focus on one product that makes it all the better. That is what makes FOSS best - people congregate around projects with clear objectives and goals, and make them happen, not nebulous archives of thousands of projects trying to reproduce the work of another million developers in other projects. So if I end up getting a KDE development setup going, it will be working on the core. Though Dolphin seems to be a pretty good file manager.

Also, GTK applications in a KDE dark theme look like ass. I'm going to have to blog about fixing that.

2012/12/26

Software Rants 7: Function Signatures

While thinking about Al, one thing that would be really nice is if all object definitions behaved the same, akin to C++11 uniform initialization syntax.  For one, classes and functions should be defined like normal data, so if we use a syntax like int x = 5, you want to have class foo = ?.  The first question on this front is what the minimum syntax to define any data is.  The best way to deduce how to go about this is to look at how it is done in other languages, and it isn't that complex.

  • In the C syntax languages with static typing, an integer is always int x = 5, or if you want to heap allocate it, you do int *x = malloc(sizeof(int)); *x = 5;
  • In Python and Ruby, you use dynamic typing and forgo the int part, so it is x = 5, but in map declarations it is x : 5.
  • Perl uses $ name declarations, so it has $x = 5.
  • Haskell binds names with foo = 5, with an optional type signature like foo :: Int.
  • In Shell, it is x=5.
  • In Javascript it is usually var, let, or nothing x = 5, but in maps it is x : 5.
As a consistent notion, the equals sign is used, except in maps, where colons are used.  As a grammatical token, : defines an "is" relationship, and = defines equality.  I am absolutely considering : as the = of definitions and letting = just be logical equals, like == in most languages.

Regardless, the syntax is consistent.  In Al, you would have static, strict typing, in that int x : 5.3 will error on an unspecified cast from float to int with loss of precision.  int x : "s" fails.  auto x : 5 resolves to an integer type, and you can drop the auto and just have x : 5, which behaves like auto.

As an aside, bitwise operations like | & and ^ are becoming more and more underutilized next to their logical counterparts, so I'd definitely reserve | & and ^ for or, and, and exponentiation respectively.  If I were to have glyphic bitwise operations I'd use || && or ^^ for those, if anything.  I'd probably just reserve bitxor, bitand, and bitor as keywords and forgo the glyphs, since they are so situational.

So if we have a syntax of int x : 5, a function would be func foo : ?.  We want templated function definitions, so our function should be something like func< int (int) > in C++, but the syntax int (int) isn't conducive to a comma separated value list like a template specification.  Once again, we want to minimize the syntax, so the smallest definition of data with all unnecessary information removed would be something like:

int < int, int foo : int < int x, int y { }.  If < weren't less-than, it could just be the function signifier to delimit a return value from the function arguments.  This syntax leaves something wanting though, so we try the verbose, optimally readable way:

func[int]<int, int> foo : [int](int x, int y) {
}

This looks a *lot* like C++ lambdas.  On purpose.  The capture group just fills the role of the return type declaration, but if we want a function signature we need that information.  Function templates get uglier:

template<Z> func[Z]<int, Z> foo : template<Z>[Z](int x, Z z) {
}

This happens because the rvalue requires a template definition, but so does the lvalue, because that acts as the signature.  We define the signature twice.  This is absurdly redundant, so what we want is the notion that if we have an rvalue function, we never redefine the signature, because the static typing of the function declaration already established it.

template<Z> func[Z]<int, Z> foo : func(x, z) {
}

So the argument types were defined in the signature, and we named them in the definition.  The problem here is that if you were to declare but not define foo for some time (given we even allow that - we would probably want this language to forbid nulls, and all functions are inherently a type of reference into code space), then when you actually do define it, you end up with something like:

foo : func(x, z) {
}

And that definition gives no information to the original signature.

Of course, in practice, you can't do this.  You can't have late binding on a function because it occupies built-up code space, not stack or heap space.  Defining it in two places is practically worthless because the compiled code locks foo to be its function at compile time, and you can't reassign it, because that would mean you have code pages no longer named.  That means raw function declarations are inherently constant and final.  Meanwhile, something like:

ref<func> bar; bar : ref(foo) is valid.  You are taking references to defined functions, but you must name them at least once and they can't go unnamed.

The same thing happens with classes.  It might be why, in traditional language syntaxes, classes and functions don't behave like the data types they represent - the definitions are disjoint from implementations.  While classes and functions are data types, they are data type definitions - you instantiate instances of them.  They are inherently constant, final, and bound to their definition names.  So if you use a global declarative syntax like foo : func, you introduce ambiguity without making the full definition something really ugly like:

const final template<Z> func[Z]<int, Z> foo : func(x, z) {}.

So let us save some time and call it template<Z> func foo : Z(int x, Z z) {}.  Maybe have state defined like func:private,extern,static foo(int x, float y) : Zoo {}.  It is kind of backwards, because it implies func : type of return is the signature rather than thing : value, but the value of a function is its return value if it is pure, for example.

2012/12/23

Hello World in Al and why?

I wrote up a pastebin post that specifies a bit of my little pet language idea.  Just to get into the nitty gritty compared to the status quo languages and new contenders:

  • C: C is way too old, and has a ton of problems.  Glyphic syntaxes like pointers, textual inclusion, no baked-in templates or polymorphism to facilitate dynamic code, no objects, no access modification to help team based development.  Function objects are convoluted and error prone due to their existence as effectively a void*.
  • C++: While an improvement on C, a lot of what C++ does also falls flat.  It tacks on object orientation rather than being systemic about it, so interacting with C data types like ints is convoluted.  Structs are preserved as just default-public versions of the default-private classes.  The smart pointers I feel properly implement memory management, but it is a little too little, too late.  There is a lot of duplication of purpose in the standard because it is very committee driven.  As a result, this language is like a bathtub for a kitchen sink, with a tendency to spring leaks.
  • Haskell: The adherence to the functional ideal is admirable, but I fundamentally disagree with the treatment of the process of constructing tasks for the execution of physical transistors as if it were only math.  Statefulness is an inherent property in computers, and while it introduces overhead, it is necessary for any programming model, and Haskell is no exception in the line of functional languages that try to hide state in a leaky way.  Also, being single paradigm, it isn't all encompassing enough.
  • Go: The syntax is pretty cumbersome, it leans on type inference rather than explicit typing, and its memory model isn't fine grained enough for what a real systems language needs to guarantee behaviors.  It throws all its proverbial utility into parallel programming, but by doing so they make the entire language a pain in the butt to use and full of non-determinism.  So it is a good niche language for massively parallel projects, but otherwise insufficient as a generic compiled language.
  • OCaml: types are inferred rather than explicitly declared, and it has a really glyphic syntax.  I really do think this is a big deal - explicit typing makes determinism so much easier.  If you are object oriented, having everything discretized into nice object boxes is a miracle for debugging and maintainability.  I do like how you can use OCaml as a script or as a native binary.
  • Rust: again, inferred variables.  Not allowing null pointers is an interesting proposition I would definitely want to look into.  The concurrency model of message passing I feel is the appropriate solution, and it has an emphasis on async functionality for small execution units.  I'd rather build a thread pool and queue tasks into it in a program, but to each their own.
So just to summarize my goals with this thought experiment:
  • The code should be deterministic.  No construct should take more than a sentence to explain how it runs when assembled, and as such the compiler should never be overly complex.
  • Garbage collection is appropriate in some use cases.  The best way to do it is let users opt into using a garbage collector, like most standard library functionality that would break the previous policy.
  • A threading model based on functions as first class objects and the availability of a concise, easy to use socket interface to send efficient messages across processes.  Inter-process communication can also be done over that socket layer, over some internalized message passing, or memory can be shared through references.  A thread would not have visibility into another thread's stack, but would see global variables and could be given references to stack or heap objects explicitly (see the thread pool sketch after this list).
  • Smart pointers!  They solve the memory problem nicely.
  • Don't fear references / pointers.  Rather than having the glyphic * and & syntaxes of the C's, we can use a ref<> template object to refer to a pointer to something.  I like the non-nullable ideal, so maybe this object needs to be constructed with a reference, and attempting to set it to null throws an exception.
  • Exceptions are good!  Goto as well - none of the execution mode switchers are inherently flawed.  Code is naturally jumping all over the place; just minimize the hard to trace parts.
  • The glyphs should be minimized.  You want an easily read language with an optional whitespace significant dialect.
  • Module based inclusion - importing a module could mean importing a text file, something already compiled, an entire archive of files or binaries, or even a function in a file or binary. This means you have a unified access view into other namespaces.
  • Access modifiers!  They make peer programming so much easier.
  • Designed with the intent to be both callable from and interoperable with the other Altimit languages.  You can't kitchen-sink all problems.  This language is meant to give you the tools to maximize performance without needing overly complex syntax or requiring more work on your part than necessary.  But by default, it needs to be deterministic and safe.  The bytecode language would be easily sandboxed for application development, and the script language would be easily used as glue, easily live interpreted, and high on programmer productivity.  Using them together could give you a powerful toolkit that nobody else seems to try to build cleanly.
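
Since the threading model above is the most concrete of these goals, here is a minimal sketch of the thread pool idea in Python (just an illustration of the queue-of-tasks pattern, not Al itself):

import queue
import threading

def worker(inbox):
    # a worker only sees messages explicitly handed to it,
    # never another thread's stack
    while True:
        task = inbox.get()
        if task is None:  # sentinel: shut down
            break
        func, args = task
        func(*args)

inbox = queue.Queue()
pool = [threading.Thread(target=worker, args=(inbox,)) for _ in range(4)]
for t in pool:
    t.start()

inbox.put((print, ("hello from the pool",)))

for _ in pool:  # one sentinel per worker
    inbox.put(None)
for t in pool:
    t.join()
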
I wonder if I'm ever going to go anywhere with all this crazy talk.  We shall see.   I might write more pastebin samples.

2012/12/22

Software Rants 1: Everything is wrong all the time

It occurs to me that this post was a draft for a long while.  I wrote a paragraph, tossed it out because it was rubbish (wait, I thought all my musings on this blog were), and figured that a part 1 published this late into a series probably should have been a part 0 - that would have been more tongue in cheek.

But to write this entry I have to get at why I wrote the title, couldn't come up with anything sound for this post, and gave up on it for 2 months.  In simplest terms, I am an idiot, and I have a hard time understanding complicated things I haven't spent years developing habits and muscle memory to recognize.  When in software-land, there is a trifecta of problems that contribute to this: software complexity is huge, people try to minimize it, and the means by which they minimize it specialize the industry and make trying to be a "master of computers" like I keep trying to pull off a lost cause.  It is why I keep going back to wanting to do my own computing environment, because the most fundamental problem is that none of this was ever expected.  New things happen every day, and the unexpected contributes to all the complexity in this industry today.

If I wanted to hack on a software project someone else made, I would need to get it out of source control, so I need to understand the syntax and behavior of that properly.  I'm putting it on a hard drive, so I need to know how disk caching works, how the IO bus handles requests, how it determines spin speed, how a head magnetizes a platter spinning at thousands of RPM, why they settled on 5v power (even though the damn power connector gives 3.3, 5, and 12 volt power, and nobody uses 3.3 volts - I wonder why; hey power supplies, why you gotta break AC into 3 DC voltages and barely use 2 of them?).  How the file system works (b-trees, allocation tables (distributed or contiguous), file distribution on the drive (aka Windows vs Linux style)), how solid state storage memory works using capacitors rather than transistors like processor cache, how the south bridge and north bridge interconnect sends data, how SAS and SATA differ, how an operating system can recognize a SATA bus, determine the version of the connection, schedule writes, and treat a PCI, USB, SATA, or IDE hard drive as basically the same thing even when they are radically different.  And we haven't even gotten to the files in a folder yet, just where we are putting them.  We didn't even touch on the TCP and UDP stacks, the different sockets, different variations of CAT cable, how hertz modulation over an ethernet line determines the bandwidth, how to differentiate power from data, analog vs digital signal processing.

The amount of stuff is huge.  And a lot of it is hard stuff.  Yet somehow a 5 year old can use a tablet computer to play a game, utilizing all of this and more (graphical interconnects, switching video and physical memory, caching, synchronization, SIMD instructions, OpenGL driver implementations, DMI interfacing), without knowing a wink of it.  And it may be my greatest crime that there is almost nothing I can use without understanding it to a degree that I could put it back together by hand with a soldering iron and a few billion dollars worth of silicon fabrication and transistor imprinting.

So the necessary time commitments to understand the entire beast are honestly too large.  So I prioritize the things I care about most, and often end up iteratively moving through the process - a CPU composed of ALUs, FPUs, caches, registers, a TLB, maybe a power modulator, the northbridge logic, a memory interconnect, physical RAM operating at certain timings, refresh rates, clock rates, channel sizes, how the operating system utilizes page tables to manage memory, how TLBs cache pages in layered cache on the processor. 

And just the hardware itself is overwhelming.  So you get past that and you arrive in software land, and start at a BIOS that exists in battery-backed ROM on a motherboard (and don't get me started on all that circuitry) that initializes memory and devices, and depending on whether you are running EFI or BIOS you end up either executing code from a partition table or searching partitions for a system partition to find executables to run.

And on top of all this hardware complexity, we have dozens of ways of writing an OS, different binary formats, different assembly languages for different processors, and different dialects of C that usually end up at the base of it all (because we are stupidly set in our ways, to be honest - C is so flawed...).  We pile on top of that extreme language complexity (C++) or extreme environment complexity (Java / C#) or extreme interpreter complexity (Python, JS), where something is extremely complicated to understand but essential to actually understanding what is going on.  And then you have your programming paradigms, and Perl, Haskell, Clojure, and a dozen other functional languages or other strange syntaxes out there people use just to make your brain explode.  Yet people much smarter than myself can read these like real second languages.

It might be my fault that after 6 years of Spanish going in one ear and out the other, I am firmly grounded in English with no hope of speaking anything else.  Mainly because my brain is crappy.  But in the same sense, I like my programming languages being new vocabularies rather than new languages that break my brain.  And I don't think it is even my problem entirely - it is evident from the amount of failure in software space, the extreme desire for "talent", and the general cluelessness of both developers and customers in building and using this stuff that things have, just under the surface, gotten really out of hand.  And I really think the best way to fix it is a clean slate now, so we don't end up with the Latin Alphabet problem (which coincidentally is the other blog I'm posting at the same time as this one).

So even though this post comes out 6 entries into this series on software, it does establish the baseline for why I rant about the things I do - everything is broken, and wrong, and nobody wants to take the time to just do it right.  Mainly because we are fleshy meatbags that only persist for a blink of time in space, require the flesh of living things to persist, and have an unhealthy obsession with rich people doing stupid crap.

Thinking about Alphabets

Since I've written a bunch about basically throwing away 30 years of work done by engineers significantly smarter than me, it occurs to me that you should really question everything when devising a new computer environment, beyond just making a new character set that eliminates the more pointless glyphs of low order Unicode.  One aspect of that might be going as far as redefining the alphabet used.

After all, when complaining about things being outdated and obsoleted by new technology and ideas, the 2,700 year old glyph system (albeit with glyphs added and removed over time) that forms the base of all language in the western world is a good candidate for reconsideration.  A lot of characters in the set are redundant - C and K, Y and I (and E).  In that sense, I am a fan of the International Phonetic Alphabet, a glyph system representing the pronounceable vocabulary of the human vocal tract.  It includes the pulmonic sounds, it has extensions and sub-groups for clicks and other non-pulmonary sounds, and in the end it represents the goal of written language - to transcode spoken language.  We can read a large spectrum of glyphs, and if we wanted we could encode them in a wide color gamut, but our audible range is much more limited - the IPA has 107 characters, but in practice only around ~30 of them are significant, and if you got technical enough to create an alphabet with the discrete independent elements of spoken language, you could probably manage around that number.

But this isn't an infallible problem with a simple solution - the reason students don't hear about the IPA is that, like many things with the word international in them, the glyph system it uses is a hodge-podge mix of a dozen languages and dialects, since no single one uses the full range of human enunciation.  The result is that a lot of characters in the IPA are absurd multicharacter strings like ʥ, tᶣ, and ŋ̋.  Even though the modern Latin-derived English alphabet leaves much to be desired, its glyphic complexity is pretty much limited to a worst case of m or j.  So one objective of such a universal enunciation based alphabet, besides representing the proper human audible range while not having redundant characters, is to have the simplest set of glyphs possible.

A good example of this is I.  A vertical bar is a capital i.  A vertical bar with a half foot is a capital L.  T is also pretty simple, as are N, Z, and V.  These all have 3 or fewer strokes in their structure and little subtle interruption in their form.  In the same way humans read words by recognizing the entire glyph structure rather than the individual letters, having the least complex, most distinctive glyphs represent the alphabet makes it the easiest to learn, recognize, and write.

The amount of work required to actually scientifically define the appropriate subset of the IPA covering all distinct audible tones of human speech, combined with the most minimalist and simple glyphic representation of that tone set, is something beyond the scope of my brain.  But in many ways it is an inevitable evolution for mankind to eventually optimize our speech and writing, in the same way most of the world is currently switching over to English as a common language.  Hopefully once we solve the thousand different languages problem, we can evolve up to a much more logical form of communication in both written and verbal form.  It would make software engineers cry less.

2012/12/21

Some math about that "digital interface to rule them all"

I remarked in my future OS post that we should standardize on one digital interface format for everything digital.  The most pervasive problem in that domain - that the bandwidth use cases of devices differ quite a bit - can be solved with dynamic frequency clocking of the interface buses.  An easy way to do this is a hot-plug handshake between devices: when a connection is made, each side sends standardized information about itself (what it does, what frequencies it is capable of, etc.) and the interface uses the lowest frequency both support that sufficiently fills the bandwidth requirements of the interconnect.  So you could in theory have a multi-gigahertz interconnect where low bandwidth devices (like input devices) could run at only a few hundred hertz.
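
A minimal sketch of that negotiation in Python (the capability fields and the numbers are made up for illustration):

def negotiate(host_freqs_hz, device_freqs_hz, required_bytes_per_sec, bytes_per_cycle):
    # intersect what both ends support, then take the lowest clock
    # that still covers the device's bandwidth requirement
    common = sorted(set(host_freqs_hz) & set(device_freqs_hz))
    for freq in common:
        if freq * bytes_per_cycle >= required_bytes_per_sec:
            return freq
    raise ValueError("no mutually supported frequency is fast enough")

# a mouse settles at the bottom of the range...
print(negotiate([500, 10**6, 10**9], [500, 10**6], 100, 4))       # -> 500
# ...while a display forces the top
print(negotiate([500, 10**6, 10**9], [10**6, 10**9], 10**10, 16)) # -> 1000000000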

Some general numbers:

The most bandwidth intensive activity I can conceptualize for this interconnect, besides just going all out maximizing the bandwidth for usage as a supercomputer interconnect, is about 10 GB/s, or 80 Gb/s.  I get this number using 16:10 Ultra-HD at 48 bit color depth (12 bits per color channel + a 12 bit alpha channel) at 180hz (this comes from the idea of using 3d displays - 90hz is a good standard that most human eyes can't differentiate very well, just like 4k is a good resolution where most eyes won't see the difference at 250 PPI from a distance of 2 feet - I wouldn't ever use a 3d display, but the standard should anticipate it).  Given that current DisplayPort can reach 18 Gb/s, increasing that 5 fold in time for this "concept" to actually matter is completely feasible.  Worst case scenario, you just include more packet channels.  Comparatively, we have pretty much clobbered the limits of audio, and codecs like Opus just make compression even better over time, so I'm not worried about 15 speaker surround sound being the next big thing.
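
The arithmetic behind that figure, assuming "16:10 Ultra-HD" means 3840x2400 (the 16:10 resolution is my assumption; 16:9 UHD is 3840x2160):

pixels = 3840 * 2400                   # 9,216,000 pixels per frame
bytes_per_pixel = 48 // 8              # 12b red + 12b green + 12b blue + 12b alpha
rate = pixels * bytes_per_pixel * 180  # 180 Hz refresh
print(rate / 10**9)                    # ~9.95 GB/s, i.e. ~80 Gb/s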

But just as a consideration, suppose you were to go completely audiophile crazy and used 20 Mbit/s lossless audio on 33 speakers.  That still only constitutes 660 Mbit/s, less than a gigabit.  In practice, if you are transferring raw video without some kind of encoding compression, it will be the absolute dominator in bandwidth utilization.

So a target of 10GB/s sounds good, especially considering that also works as an interface bandwidth target per lane inside the chip architecture.  If running at peak throughput, you would effectively be running a CPU-northbridge interconnect over a wire.

If we use fiber channel for this connector standard, it can have the ultra-low latency needed to be as multi-purpose as it needs to be.  While it could be ultra-high bandwidth with low latency when needed, you could also disable all but one data channel and run it at a few hundred hertz for a few megabytes of bandwidth, which is all a keyboard or mouse should need.

I could also see color coded cables indicating the peak frequency supported on the line - a 100' cable probably wouldn't be able to run at whatever frequency is needed to supply 10GB/s without active modulation, whereas a 20' cable should be able to.  This also means it makes a good ethernet standard, because realistic network bandwidth won't pass a gigabit for a long while, and getting a gigabit on a multiplexed cable like this would be a cakewalk at really low frequencies.

I really don't even want to consider analog interfaces with this configuration.  It should be all digital video, audio, etc.  You could always stick an analog converter chip on top of this interface like a traditional PCI device anyway.

I also would only want one standard connector, preferably as small as possible, with an optional lock mechanism.  Just from my personal view on the matter, having all these standards of USB, micro USB, HDMI, mini HDMI, micro HDMI, mini DisplayPort, DisplayPort, etc. is just absurd.  If the maximum connectivity can be obtained on a smaller connector, foot the extra few cents to build a better circuit.

Just as a footnote, real world 100 gigabit Ethernet should eventually hit market, and that would be the perfect standard for this.

2012/12/20

Software Rants 6: How to Reinvent the Wheel of Servicing

In my post about new OS paradigms, I remarked about how you won't replace IP as *the* network protocol, and how we should all just bow down to its glory.

However, one thing that can easily change (and does, all the time, multiple times a week) is how we interact over that protocol.  It is why we even have URIs, and we have everything from ftp:// to http:// to steam:// protocols over IP packets.  I want to bring up some parallels I see between this behavior, "classic" operating system metaphors, and the relatively modern concept of treating everything as files circa Plan 9 and my stupid ramblings.

If I were writing an application for a popular computing platform, I would be using a system call interface into the operating system and some kind of message bus service (like dbus) for communicating with most services; I would personally use some kind of temp file as an interchange, but I could also open a Unix socket as a means of IPC.  Or maybe I go really crazy and start using OS primitives to share memory pages.  Any way you slice it, you are effectively picking and choosing protocols - be it the Unix socket "protocol", the system call "protocol", etc.  In Plan 9 / crazy people's world, you forgo having protocols in favor of a file system, where you can access sockets, system calls, memory pages, etc. as files.  You use a directory tree structure rather than distinct programmatic syntaxes to interface with things, and the generic nature improves interchangeability and ease of learning; in some cases it can even be a performance gain, since you have significantly less overhead in using a kernel VFS manager to handle the abstractions.

If I took this concept to the net, I wouldn't want to specify a protocol in an address.  I would want the protocols to be abstracted by a virtual file system, so in the same way I mentioned /net/Google.com/reader should resolve as the address of Google's reader, you could be more specific and try something like /net/Google.com:80/reader.https (this is a generic example using the classic network protocols), where you can be specific about how resources are opened (in the same way you use static file system typing to declare how to handle files).  But this treats Google.com as a file system in and of itself - and if you consider how we navigate most of these protocols, we end up treating them as virtual file servers all the same.  The differentiation is in how we treat the server as a whole.

In current usage, interacting with ftp://mozilla.org and http://mozilla.org produces completely different results, because ftp requests are redirected to an FTP server and http ones are directed to an HTTP server.  But https doesn't inherently mean a different server - it just means sticking a TLS layer on top of the communication, while the underlying behavior of either end resolves the same: packets are generated, boxed in an encrypted container, shipped, decrypted on the receiving end, and then processed all the same.  That is in many ways more elegant than the non-transparent designation of which server process to interact with at an address, based solely off the URI scheme.

So what I would rather see, in keeping with that VFS model, is a virtual mount of a remote server under a syntax like /net/Google.com producing a directory containing https, ftp, mail, jabber, etc., where an application could easily mount a remote server and, from the visible folders, derive the supported operations.

Likewise, authentication becomes important.  /net/zanny@google.com would be expected to produce (with an authentication token, be it a cached key or a password) *my* view of this server, in the same way users and applications would get different views of a virtual file system.
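
A sketch of how such paths might decompose, in Python (the path grammar is entirely hypothetical):

def parse_net_path(path):
    # /net/zanny@google.com/mail -> ('zanny', 'google.com', 'mail')
    assert path.startswith("/net/")
    authority, _, service = path[len("/net/"):].partition("/")
    user, _, host = authority.rpartition("@")
    # no user means the public, unauthenticated view of the server
    return (user or None, host, service or None)

print(parse_net_path("/net/zanny@google.com/mail"))  # authenticated view
print(parse_net_path("/net/google.com/https"))       # public view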

This leads to a much cleaner distinction of tasks, because in the current web paradigm, you usually have a kernel IP stack managing inbound packets on ports and where to send them.  You register Apache on ports 80 and 443, and then it decodes the packets received on those ports (which now sounds even more redundant, because you are using both a URL scheme and a port - the problem is that ports are not nearly as clear as protocols).

So in a VFS network filesystem, determining the available protocols on a webserver should be simpler - just look at the top level directory of the public user on that server instead of querying a bunch of protocols for responses.  Be it via file extensions or ports, it would still be an improvement.

2012/12/17

Life's Too Short, or why using anything but C++ is insulting to your users

I was watching a presentation from AltDevBlog about C# for games (I hope it takes the gaming world by storm, C# is beautiful).  Repeatedly, when discussing "hard" concepts (callbacks, manual memory management, threading), the author (Miguel) would qualify the difficulty, the expected debugging nightmares, and the resulting ripping out of hair of doing hard things in software with "life's too short".  It was a really great talk about Mono in the game world - it is a really ingenious platform, especially since it supports everything ever.

I think it is a dangerous sentiment.  It basically says to do things the easy way because the right way is a mountain to climb.  I will agree with the intent - that developers should sometimes trade efficiency and determinism for time and ease of development.  Which is great, especially in games.  Maybe not in services.  And definitely not in kernels.  It is not a black and white case of just using the high level language to save time; it is a debate between the difficulty of building a project and the amount of work it is expected to perform.

This actually applies to games.  A single-player game without multiplayer can reasonably just throw C# at the problem and crank the game out quickly.  If it only lasts 30 - 40 hours, in a computing environment's lifetime, that is peanuts.  But when you build a game meant to last hundreds of hours, you are not just costing cycles on your users' machines - you are wasting their time in exchange for your own, and the resources to power the redundant cycles, and the time of those who produce the power to run the machines.  It isn't just about developer productivity, but about everyone's optimal use of time.


So I look at opportune shortcuts as a question of cost - it is why I would almost always look at C++ when thinking of some mass market product, but I'd look at Python for anything niche.  And I'd look at C# to produce business software, or processes that aren't services running continually.  Everything has a use case and a cost, so just thinking about one's own life being short might be a short sighted justification for paying a performance cost.

Also, I think this was the most flamebait title I have written yet.  Yay.

2012/12/16

Document and Serialization Format Thoughts and Examples

I made a few posts on reddit in a thread about Python in the browser, about why HTML / XML suck and almost anything else is better. Assuming that, I explored some options I would like to describe here as alternative document syntaxes.

1. JSON-like whitespace-insignificant functional orientation:

jsondoc {
  title : "Ninjas Are Awesome",
  p(id:"fact") : {. : "A ninja's natural enemy is a",
                  strong : "PIRATE", "!"}
  ul(id:"ninja_weapons") : { li : "sword", 
                             li : "kung fu", 
                             li : "throwing star"}
}

This I feel has the most chance to catch on - or something like it. JSON-esque maps. The only real difference between standard JSON and this syntax is the introduction of arguments on keys that act as attributes. The map element . denotes the wildcard content of a tag, if you use a map instead of a value for a "tag".
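
One plausible way the example above could deserialize, folding key arguments in as attributes (Python, purely illustrative):

doc = {
    "title": "Ninjas Are Awesome",
    "p": {
        "id": "fact",
        ".": "A ninja's natural enemy is a",
        "strong": "PIRATE",
        # the bare "!" would need its own rule - appended to the
        # wildcard content, perhaps
    },
    "ul": {
        "id": "ninja_weapons",
        "li": ["sword", "kung fu", "throwing star"],  # repeated keys collect into a list
    },
}

Alternately, you could deviate from the JSON standard more, and that leads me to my next example.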

2. YAML-like whitespace significant:

title :: Python
list ::
- article : id=1 :
    author :: Xanny
    date :: 2012-12-15
    title : id=pirates : Ninjas are awesome
    post : id=bacon,class=words : >
        I talked about this in another fork of this 
        comment thread a bit.  You absolutely need to 
        have the capabilities of the language...
- article : id=2 :

This syntax is YAML-derived but introduces attributes between the key and value. It is a little verbose with all the double colons, and in my considerations of this syntax it became apparent you didn't need the : > syntax for multiline descendant text bodies, so I refined it with a few changes: > is a synonym for ::, and a value section that only contains a : is considered an indicator of a descendant multiline element. Also, instead of using minus signs as an array element delimiter, I use commas, since the arguments already use comma separated values. And a keyless argument is assumed to be the id, while unquoted arguments are terminated by , > or :.

title > Python
list >
  , article : 1 :
      author > Xanny
      date > 2012-12-15
      title : pirates : Ninjas are awesome
      post : class = words, bacon >
        I talked about this in another fork of this 
        comment thread a bit.  You absolutely need to 
        have the capabilities of the language...
  , article : 2 :

I like this, but I can't help feeling it wouldn't fly very well, so I also have a whitespace insignificant dialect of this syntax:

doc {
title > Python,
  list > [
    article : 1 : {
      author > Xanny,
      date > 2012-12-15,
      title : pirates : Ninjas are awesome,
      post : class = words, bacon : I talked about this in another fork of this comment thread a bit.  You absolutely need to have the capabilities of the language...
  } ,
  article : 2 : ...
  ]
}

Here, commas are string and element delimiters everywhere, and maps are denoted by curly braces. Because of the concise grammar, you would only need to escape commas, colons, and the two kinds of braces. One thing to note is that this language never utilizes parentheses. I feel something like this might easily become more popular. An alternative might be to keep the argument behavior from the JSON dialect and reintroduce parentheses:

doc {
  title : Python,
  list : [
    article(1) : {
      author : Xanny,
      date : 2012-12-15,
      title(pirates) : Ninjas are awesome,
      post(class=words, bacon) : I talked about this in another fork of this comment thread a bit.  You absolutely need to have the capabilities of the language...
  } ,
  article(2) : ...
  ]
}

The big deal here is that by introducing the overhead of parentheses, the key : value syntax remains succinct, and it easily allows for a data serialization format where you just discard the arguments syntax - or maybe even have the parsing behavior defined such that arguments are just key:value pairs added to the constructed map (here, in Python syntax) like so:

title(pirates) : Ninjas are awesome, 
>>> 
"title" : {"id" : "pirates", "body" : "Ninjas are awesome"}

The same could work for the YAML syntax, where colons past the first are disregarded (except ::\w+\n, which denotes multiline text follows). Or you could use a glyph exclusively for multiline text, like & or *, which go unused in YAML.

It really comes back to that vision of one data format to rule them all (that isn't the current ruler, XML) for documents, serialization, message passing, etc. Both JSON and YAML are significantly better than XML, and in keeping with that unified protocol ideology, a unified textual language is a natural extension.

As a footnote, I had to rewrite this blog post in completely manual HTML, since the editor kept malforming the code -> paragraph transitions and inserting redundant spaces with tons of unneeded tag duplication. So the source of this should be pretty. Come on Google, get your shizzle together.

And as a final note, I'd probably go with a choice between the last two. I'd easily see this better-markup-language (bml) having the extensions .bmlw for the whitespace dependent version (better markup language (with) whitespace-significance) and .bmlb (better markup language (with) brace-significance) for the whitespace agnostic one.

2012/12/15

GPU Fan Controller Thingy Post-Mortem

Forgot to write a blog post about this.  It took about an hour of coding, and around four hours reading Python documentation on proper usage of the subprocess module.

It works - I guess that is the point.  I'm not going to try to integrate it as a service into upstart or systemd, since it runs fine unprivileged as a startup application.  I would try to make a nice GUI for it, but Kepler GPUs don't support fan speed control through Coolbits anymore (go figure), so it seems user fan speed control is now unsupported by Nvidia, which defeats the purpose of the project.

It is easy to use though, if anyone wants to grab it, since most GPU default fan profiles are awful.  It is just an array of temps to percentage speeds that gets built into a temperature map; every recheck time (default 5 seconds) the script will poll for the temperature, index the temperature into the array of fan speeds, and if the speed it gets differs from the current one, update the speed - roughly like the sketch below.
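
The core loop looks roughly like this (a simplified Python sketch, not the actual script; the nvidia-settings attribute names are the circa-2012 ones and may vary by driver version):

import subprocess
import time

CURVE = [(40, 30), (60, 50), (75, 80), (85, 100)]  # (temp C, fan %) boundaries
RECHECK = 5  # seconds between polls

def gpu_temp():
    out = subprocess.check_output(
        ["nvidia-settings", "-q", "[gpu:0]/GPUCoreTemp", "-t"])
    return int(out.decode().strip())

def set_speed(percent):
    # re-assert fan control on every update rather than risk silently
    # losing it (see below)
    subprocess.call(["nvidia-settings",
                     "-a", "[gpu:0]/GPUFanControlState=1",
                     "-a", "[fan:0]/GPUCurrentFanSpeed=%d" % percent])

last = None
while True:
    temp = gpu_temp()
    # highest curve entry whose thermal boundary we have crossed
    speed = next((s for t, s in reversed(CURVE) if temp >= t), CURVE[0][1])
    if speed != last:  # only shell out to nvidia-settings when needed
        set_speed(speed)
        last = speed
    time.sleep(RECHECK)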

I worked to minimize calls into nvidia-settings, because it almost certainly has a lot more overhead to invoke than my little script does.  So I only change the fan speed when necessary.  I also found that it made more sense to turn fan speed control back on with every update than to potentially silently lose the ability to change fan speed (or to implement a way to parse for errors on a speed update and then turn it back on - that might work, but I didn't get conclusive evidence it will actually error out if fan speed controlling gets disabled in some circumstances, so I didn't pursue it further).

In the end, it works for its purpose.  It is to my knowledge the most efficient implementation on the net (it does spend some memory on the temperature array, but when I looked into bucketing by 5c steps, the math just got more complicated than finding the index in an array of temp : speed tuples that crosses a thermal boundary).  It doesn't have a neat GUI, but I lost the drive to pursue that when I found out Nvidia no longer supports fan control.  I will need to investigate more thoroughly how AMD is doing on that front, because to me it seems completely unacceptable to not let the user control fan speeds - the defaults on these GPUs are absolutely awful, and in my case my GTX 285 will crash on the default profile from overheating.  I even get severe graphical errors from it running near 80c, and the fan doesn't even kick in at all until then.

Also, I'm using Gitorious instead of GitHub for now - I like the site design more and prefer open source to closed.  GitHub seems to be getting too big for its own good in my book, and they are starting to fork away from standard git in a lot of ways.

<<Add Link to gitorious repo when I reupload it to my original gitorious account>>