Almost a year ago, I built my grandparents a new PC with an A10-5800K processor, on the premise that AMD is hurting, contributes a FOSS driver, and deserves some support.
A year later, my grandparents are still regularly using Catalyst on their SUSE install, because whenever I plug in their living room TV via HDMI, the open driver fails: I get two black screens, or the TV shows color bars until I mode set the normal desktop monitor, and then we are back to all black.
Looking forward, I am choosing parts for future builds not for features, performance, or reliability, but for whether they avoid binary blobs running in kernel space, doing whatever the hell they want with no way to audit their behavior.
The problem is that nobody is on my side in this. Intel's chipsets are completely proprietary, with no open source support for them at all. Their processors are obfuscated trade secrets, with opcodes that load encrypted firmware blobs to patch the microcode at runtime, and their network controllers run binary firmware that might be broadcasting who knows what.
Sadly, though, Intel is the only vendor with an actually free GPU driver. No firmware blobs, no bullshit, a relatively open specification. They don't support Gallium, but really, that's their call. They waste their own time, and everyone else's, by not using one stack, but at least they are trying.
However, on the other side of the fence, you have AMD - they participate in coreboot, and their CPUs are still a trade-secret mess (though not as bad as Intel's), but their GPUs need proprietary firmware blobs, and thus you can't reimplement their driver without reverse engineering that crap.
When it comes to processors in general, they are all trade-secret messes. There is not one open-spec CPU; even the Loongson chips from China license MIPS. So if you are stuck on proprietary garbage, you might as well use x86, since at least Intel isn't profiteering off IP law with its CPU design (AMD's license with them costs pennies on the chips it makes).
So why can't I buy an AMD chipset with an Intel APU? I want open firmware across the board, because I want an open system where I can look at exactly how everything works and tweak it to my desire. But there is no way to build a system without someone restricting my freedom with my own hardware, and it sucks.
Meanwhile, Nvidia is off in la-la land - their proprietary blob graphics cards can blow me - but Nouveau is more open than AMD's stack, since its reverse-engineered firmware is FOSS! And they have open source drivers for their Tegra APUs, which just license ARM cores.
I hope they consider making a Tegra 5+ based NUC, because I'd be interested in that. Kepler graphics on a FOSS driver and firmware, on a chipset with coreboot, with ARMv8 cores - I could live with that. But right now all our options suck. If I had the connections (I definitely have the desire), I'd start an open hardware company. But you wouldn't want to found it in the US, because patent and trademark trolling would crush you before you got off the ground.
2013/09/17
2013/09/13
The Future of the Computing Platform
I'm in love with NUCs. Even though I usually despise Intel marketing terminology (they always trademark it, and it usually sounds silly, like "ultrabook"), in this case they hit the nail on the head.
Firstly, the desktop. Or more specifically, the class of machines without an internal battery, because size is the topic. ATX and even micro-ATX mainboards are now complete overkill. The only reason they still exist is graphics cards - and only the most extreme users run multi-card configurations. (The difficulty with multi-card configurations probably rests on how hackneyed the entire platform is, but ranting about hardware architecture is something I have already done, and will probably do again in the future.) Regardless, if you discount the graphics card slots, what can you really put in a PCI slot anymore? Let's list them:
- Discrete Audio Cards: Only for audiophiles. And if you care, just get a mainboard with beefier integrated audio - Asus even has a Z87 board with a discrete-class audio solution embedded in the board proper. You don't need a coprocessor, because PCM to onboard analog signaling is dirt cheap, and you often end up using digital audio out anyway, in which case you can often just transport an AAC or PCM stream. Note to the S/PDIF people: get Opus support into digital audio standards yesterday and junk all that other crap.
- TV Tuner Cards: One, TV is dying; two, there are USB tuners. When I built my grandparents' rig I got them an internal PCI tuner on the assumption that it gets better quality. In hindsight, probably not. On the first point, broadcast television is already going the way of the dodo, and I would never build a new system around converting DVB-T or analog signals to MPEG-2.
- RAID Cards: The integrated RAID controllers on most mid-range motherboards are sufficient for 3-5 disk RAID 5, and really, if you are even considering consumer RAID, that is what you are looking at. Servers already have entirely different PCB form factors anyway, so dedicated PCIe RAID cards can stay there just fine. Additionally, RAID 0 with SSDs brings no real performance improvement (maybe SATA Express will change that in a higher per-channel-bandwidth world, but I doubt it), because SSDs already have excellent random reads and writes. So you might as well RAID 1 with SSDs rather than RAID 5, and SSDs aren't like mechanical disks, where one bad sector kills the drive. You just don't need that redundancy class even in a homebrew home server - and again, if you do, the integrated RAID is often good enough.
- PCIe SSDs: Their claim to fame is that SATA 6 Gbps is at its limit, and the only way to get more bandwidth in consumer hardware right now is PCI Express lanes. However, the interconnect isn't really targeting block devices, and the big 16x slots are overkill for 1 GB/s sequential read SSDs.
- Network Switches: This one is harder to replace, because a four-way splitter has peak bandwidth requirements of 4 Gbps. It isn't a block device, so you can't stick it on SATA 6 Gbps, even though that would make sense. Then again, who builds a system with an internal network switch? I intend to, but I'm weird. Niche markets shouldn't dictate consumer standards.
- Port Expansion: i.e., more USB 3.0 ports on a hub card. Like integrated audio, this should be integrated to satisfactory levels. If you need more, you are a niche.
I predict a new world of form factors - for the classical enthusiast class, a small mini-ITX-sized board taking up to two SO-DIMMs and a socket in the 20-30mm range rather than 35-50 (since large dies are overheating anyway), mPCIe and mSATA completely replacing standard PCIe slots, and 2.5" mechanical drives becoming the standard under-the-board connected disks. The only exception is GPUs, which won't easily migrate from high-bandwidth 16x lanes - with space for heatsinks and dedicated fans running on 12V - to PCIe 1x at 3.3V with no fan options unless the device makes room. But APUs are now proving themselves very capable, and unified memory is a huge benefit for simpler implementations. And all three big players have APUs (Nvidia's are just ARM-based).
Even smaller than that, I expect soldered boards combining PCB + CPU + memory to become common, since you rarely upgrade any of those parts, and because all three are mandatory in any build. It enables smaller units when you don't need to worry about standard ports and sockets.
The enthusiast market won't go away, but these behemoth motherboards are dinosaurs. The connectors for smaller form factors exist, and NUCs are paving the way, although their combination of case and PCB isn't great. What you really want is a case and motherboard standard with a simpler power connection architecture (so you could have an external or internal power brick). But they are the future, and it will be small.
2013/08/18
Software Rants 15: The Window Tree
I've had a barrel of fun (sarcasm) with X recently, involving multi-seat, multi-head, multi-GPU - in general, multiples of things you can have multiples of but usually don't, so the implementations of such things in X are lacking at best and utterly broken at worst.
I am also becoming quite frustrated with openSUSE, trying to fix various graphical and interface glitches to get a working multihead system for my grandparents.
But I look towards Wayland, and while I appreciate the slimming, I have to worry when things like minimizing, input, and network transport are hacks on top of a 1.0 core that already shipped. It reeks of a repeat of the X behavior that led to the mess we have now.
So I want to talk about how I, a random user with no experience in writing a display protocol or server, would implement a more modern incarnation.
The first step is to identify the parts of the system involved. This might have been a shortcoming in Wayland - the necessary parts were not carved out in advance, so they had to be tacked on after the fact. You can represent this theoretical system as an hourglass, with two trees on either side of a central management framework. In Linux terms, this would be done through DRI and mode setting, but the principle is that you must map virtual concepts like desktops and windows (and more) onto physical devices, and do so in a fluid, organic, interoperable, hotpluggable fashion. This might be one of Wayland's greatest weaknesses: its construction doesn't lend itself to using an arbitrary protocol as a display pipe.
You would have a collection of display sinks - physical screens, first and foremost, but also projectors, recorders, a remote display server, cameras to record from, etc. They are all presented as screens: you can read a screen with the necessary permissions (through the display server), and to write a screen, you must also go through the display server. You can orient these screens in a myriad of ways - disjoint desktops running in separate sessions, or disparate servers each managing separate displays, with inter-server display connectivity achieved either through a general wide-band network transport (RDP, UDP, etc.) or over a lower latency, lower overhead local interconnect (dbus). Servers claim ownership of the displays they manage, and are thus a lower level implementation of this technology than a userspace server like X or even the partially kernel-implemented Wayland. This supplants the need for redundant display stacks: right now, virtual terminals are managed not by a display server but by the kernel itself, whereas in this implementation virtual terminals would just be another desktop provided by the display server.
Obviously, this server needs EGL and hardware acceleration where possible, falling back to llvmpipe. The system needs to target maximal acceleration when available, account for disparate compute resources, and not assume anything about its execution environment - you could have variable numbers of processors with heterogeneous compute performance, bandwidth, and latency, and an arbitrary number of (hot-pluggable) acceleration devices (accessible through DRI or GL) that may, or may not, be capable of symmetric bulk processing of workloads. Multiple devices can't be assumed to be clones, and while you should correlate and optimize for displays made available through certain acceleration devices (think PCI GPUs with display outs, vs the onboard outs, vs USB converters, vs a laptop where the outs are bridged, vs a server where the outs are on another host), you need to be open to accelerating in one place and outputting in another, as long as that is the most optimal utilization of resources.
So this isn't actually a display server at all - it is the abolition of age-old assumptions about the state of an operating computer system that prevent the advancement of kernel-managed graphics. Tangentially, this relates to my Altimit conceptualizations: in my idealized merged firmware/OS model, where drivers are shared rather than needlessly replicated between firmware and payload, the firmware would initialize the core components of this display server model and use a standardized minimum set of accelerated APIs to present the booting environment on all available output displays (you wouldn't see network ones until that whole stack could be initialized, for example). Once the payload is done, the OS can reallocate displays according to saved preferences. But the same server would be running all the way through - using the same acceleration drivers, the same output protocols, the same memory mapping of the port sinks.
Sadly, we aren't there yet, so we can't get that kind of unified video (or unified everything as the general case). Instead, we look towards what we can do with what we have now - we can accept that the firmware will use its own world of display management and refactor our world (the kernel and beyond world) to use one single stack for everything.
So once you have this server running, you need to correlate virtual desktops and terminals to displays. The traditional TTY model is a good analogy here - when the server starts (as part of the kernel), it would initialize a configured number of VTs allocated to displays in a configured way (i.e. tty1 -> screen0; tty2 -> screen1, which clones to remote screen rscreen3; tty4 -> recorder0, which captures the output; tty5 -> screen2, which clones to recorder1; etc.). tty6 could be unassociated, and on screen0, with its associated keyboard, you could switch terminals like you do now. You could have the same terminal opened on multiple displays, where instead of a display-side clone you have a window-side clone (i.e. not all output to, say, screen0 and screen1 is cloned, but tty15 outputs to both of them, sinking to both displays simultaneously).
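To make that concrete, here is a toy sketch of such a mapping as plain data. Every name here (the sink names, the dict layout) is invented for illustration; the point is just that a terminal binds to a list of sinks, so window-side cloning is nothing special - it is merely a terminal with more than one sink.

vt_config = {
    "tty1":  ["screen0"],
    "tty2":  ["screen1", "rscreen3"],   # display plus its remote clone
    "tty4":  ["recorder0"],             # captured, never shown locally
    "tty5":  ["screen2", "recorder1"],  # shown and recorded at once
    "tty6":  [],                        # unassociated until switched to
    "tty15": ["screen0", "screen1"],    # window-side clone: one VT, two sinks
}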
A window manager would start and register itself with the server (probably over dbus) as a full screen application and request either the default screen or a specific screen - recalling that screens need not be physical, but could also be virtual, as is the case in network transport or multidisplay desktops. It is provided information about the screen it is rendering to, such as the size, the DPI, brightness, contrast, refresh, etc - with some of these optionally configurable over the protocol. This window manager may also request the ability to see or write to other screens beyond its immediate parent, and the server can manage access permissions accordingly per application.
That desktop is, to the server, just a window occupying a screen as a "full screen controlling application", much like most implementations of a full screen application, and whenever it spawns new windowed processes, it allocates them as its own child windows. You get a tree of windows, starting with a root full screen application, which is bound to one or more displays to render to. It could also be bound to a null display, or to no display at all - in the former case, you render to nothing; in the latter, you enter a freeze state where the entire window tree is suspended under the assumption that the application will be rebound later.
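A minimal sketch of that tree, with invented names (nothing here is a real API): every window has a parent and children, only the root binds to displays, and binding to nothing suspends the whole subtree.

NULL_DISPLAY = object()  # render target that discards everything

class Window:
    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.displays = []     # only meaningful on a root window
        self.suspended = False
        self.x = self.y = 0    # position within the parent
        if parent is not None:
            parent.children.append(self)

    def bind(self, *displays):
        # Bind to real sinks, to NULL_DISPLAY (render to nothing), or to
        # nothing at all, which freezes the entire subtree until rebound.
        self.displays = list(displays)
        self._set_suspended(not self.displays)

    def _set_suspended(self, state):
        self.suspended = state
        for child in self.children:
            child._set_suspended(state)  # suspension cascades to children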
In this sense, a program running full screen on a display and a desktop window manager act the same way - they are spawned, register with the central server as a full screen application running on some screen, and assume control. If you run a full screen application from a desktop environment, it might halt the desktop itself, or, more likely, move it to the null screen, where it can recognize internally that it isn't rendering and stop.
Whether you even want an application to be able to unbind from displays at all would require some deeper analysis. Additionally, you often have many applications in a minimized state, but you want to give a full screen application ownership of the display it runs on (or do you?), so you would need to create virtual null screens dynamically for any application entering a hidden state.
Permissions, though, are important. You can introduce good security into this model - peers can't view one another, but parents can control their children. You can request information about your parent (be it the server itself or a window manager) and get information about the screen you are running on only with the necessary permissions. Your average application should just care about the virtual window it is given, and support notification when its window changes (is hidden, resized, closed, or maybe even enters a maximized state, or is obscured but still presented). Any window can spawn its own child windows, up to a depth and density limit (to prevent a windowed application from assaulting the system with forking) set by its parent, which is set by its parent, and so on, up to a display manager limit on how much depth and breadth of windows any full screen application may take.
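Building on the Window sketch above, the cascading spawn budget could look like this - purely illustrative names and numbers, with the key property that a child is only ever handed a budget smaller than its parent's:

class LimitedWindow(Window):
    def __init__(self, parent=None, max_depth=8, max_children=256):
        super().__init__(parent)
        self.max_depth = max_depth
        self.max_children = max_children

    def spawn(self):
        # Refuse once this window's depth or breadth budget is exhausted.
        if self.max_depth <= 0 or len(self.children) >= self.max_children:
            raise PermissionError("window spawn budget exhausted")
        # Children inherit a strictly smaller budget, so a fork bomb of
        # windows bottoms out at the display manager's ceiling.
        return LimitedWindow(self, max_depth=self.max_depth - 1,
                             max_children=self.max_children)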
The full screen application paradigm supports traditional application switching in resource-constrained environments - when you take a screen from some other full screen application, the display server will usually place it in a suspended state until you finish or close, or until a fixed timer on your ownership expires (a lot like preemptive multiprocessing, but with screens) and control is returned. Permissions are server side and cascade through children; while they can be diminished, raising them requires first class windows with privilege escalation.
You can also bind any other output device to an application's context. If you only want sound playing out of one desktop environment, you can control hardware allocation server side accordingly. The same goes for inputs - keyboards, mice, touchscreens, motion tracking, etc. - which can all be treated as input, be it digital keycoding, vector motion, or stream based (like a webcam or UDP input), and assigned to whatever window you want, from the display server itself (which delegates all events to all fullscreens). Or you can bind them to a focus: just as you have default screens, you can have a default window according to focus and delegate events into it (at the application level, this would be managed by the window manager).
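A tiny sketch of that delegation rule, again with invented names (handle() is a stand-in for whatever the protocol's event delivery would be): a device is either pinned to a specific window or follows its seat's focus.

class Seat:
    def __init__(self):
        self.focus = None   # default window for devices not pinned below
        self.bindings = {}  # device id -> window, pinned server side

    def deliver(self, device_id, event):
        # A pinned device ignores focus; everything else follows it.
        target = self.bindings.get(device_id, self.focus)
        if target is not None:
            target.handle(event)  # hand off over the window protocol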
You could also present a lot of this functionality through the filesystem - say, /sys/wm, where screens/ corresponds to the physical and virtual screens in use (much like hard drives, network transports, or audio sinks), and /sys/wm/displays is where the fullscreen parents reside, such as displays/kwin, displays/doomsday, or displays/openbox. These are simultaneously writable as parents and browsable as directories of their own child windows, assuming adequate permissions in the browser. You could write to another window to communicate with it over the protocol, or write to your own window as if writing to your framebuffer object. Since the protocol's initial state is always to commune with one's direct parent, you can request permissions to view, read, or write your peers, corresponding to knowing their existence, viewing their state, and communicating with them over the protocol. As a solution to the subwindow tearing problem, the server understands that movements of a parent are recursive to its children, such that moving one window 5px displaces all its children by 5px, with a corresponding notification to each window that it has moved, and that it moved because of its parent.
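Continuing the Window sketch, that recursive move is just a tree walk that carries the cause along, so a child can tell a self-move from a parent-induced one (notify here is a stand-in for the protocol's event delivery):

def move(window, dx, dy, cause="self"):
    window.x += dx
    window.y += dy
    # Every window is told that it moved and why, so toolkits can skip
    # re-layout when the displacement came from an ancestor.
    notify(window, "moved", dx=dx, dy=dy, cause=cause)
    for child in window.children:
        move(child, dx, dy, cause="parent")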
The means by which you write your framebuffer are not the display server's problem - you could use EGL, or just write pixels (in a format established at window creation) shared between the server and the process. Access to acceleration hardware, while visible to the display server, is a separate permissions model, probably based on the user permissions of the executing process rather than a separate permissions hierarchy per binary.
In practice, the workflow would be as follows: the system boots, and udev resolves all display devices and establishes them in /sys/wm/screens. This would include network transport displays, USB-adapted displays, virtual displays (cloned windows, virtual screens combined from multiple other screens in an established orientation, duplicate outputs to one physical screen as overlays), and devices related to screens and visual output, like cameras, or in the future holographics, or even something as far out as an imaging protocol that transmits scenes to the brain.
Because output screens are abstracted, the display manager starts after udev's initial resolution pass and uses configuration files to create the default virtual screens. It doesn't spawn any windowing applications, though.
After this step, virtual terminals can be spawned and assigned screens (usually the default screen, which in the absence of other configuration is just a virtual screen spanning all physical screens, with the physical properties of the lowest common denominator of feature support among the displays). In modern analogy, you would probably spawn VTs with tty1 running on vscreen0 and 2-XX in suspend states, ready to do a full screen switch when the display server intercepts certain keycodes from any input device.
Then you could spawn a window manager, like kwin, which would communicate with this display server and do a full screen swap with its own display configuration - by default, it would also claim vscreen0 and swap tty1 out to suspend. It would request all available input delegation and run its own windows - probably kdm as a window encompassing its entire display, while it internally manages stopping its render loops. It would spawn windows like plasma-desktop, which occupy space on its window frame, assigned over this standard protocol. plasma-desktop can have elevated permissions to view its peer windows (like kdm) and lax spawn permissions (so you can create thousands of windows on a desktop, each with thousands of their own children, without hitting any limits). If you run a full screen application from plasma-desktop, it can request a switch with the display server on the default screen or its current screen, or find out what screens are available (within its per-app permissions) to claim. If it claims kwin's screen, kwin is swapped into a suspend state, which cascades to all its child windows. Maybe kwin also had permission to spawn an overlay screen on vscreen0, and forked a separate full screen application of the knotify sort, which would continue running after vscreen0 was taken by another full screen application - and since it has overlay precedence (set in configuration), notifications could pop up on vscreen0 without vblank flipping or tearing server side.
Wayland is great, but I feel it might not handle the genericism a next generation display server needs well enough. Its input handling is "what we need", not "what might be necessary", which might prompt its obsolescence one day. I'd rather sacrifice some performance now to cover all our future bases and make a beautiful, sensible solution than optimize for the present case and forsake the future.
Also, I'd like the niche use cases (mirroring a virtual 1920x1080 60Hz display onto a hundred screens as a window, capturing a webcam as a screen to send over Telepathy, or running 3 DEs on 3 screens with 3 independent focuses and input device associations between touchscreens, gamepads, keyboards, mice, trackpads, etc.) to work magically - to fit in as easily as the standard use case (one display that doesn't hotplug, one focus on one window manager, with one keyboard and one tracking device).
2013/08/01
Reddit Rants 2: So I wrote a book for a reddit comment
http://www.reddit.com/r/planbshow/comments/1je0xj/regulation_dooms_bitcoin_plan_b_17_bitcoin_podcast/cbe7mv1
So I really like the Plan B show! I guess I like debating macroeconomics. I can't post the entire conversation here because it's a back-and-forth - 2500 words on the second reply, though!
2013/07/27
Software Rants 14: Queue Disciplines
Rabbit hole time! After getting a TP-Link TL-WDR3600 router to replace a crappy Verizon router/DSL modem combo in preparation for my switch to Cablevision internet (soon, ye days of 150 KB/s down and 45 KB/s up be numbered), I have dived headfirst into the realm of OpenWrt, and the myriad of peculiarities that make up IEEE 802.
My most recent confrontation was over outbound queueing - I found my experience using my (constantly bottlenecked) DSL connection pitiful in terms of how well page loads were performing and how responsive websites were under load, so I investigated.
I found a pitiful amount of documentation besides the tc-X man pages on the queue algorithms the kernel supports. I was actually reading the source pages (here are my picks of interest).
So of course I go right for the shiny new thing, codel. It is part of the kernel 3.3 bufferbloat tuning. It has to be better than the generic FIFO queue, right? The qos package in OpenWrt's LuCI always uses HFSC, for example, so it takes elbow grease and an SSH connection to get fq_codel running.
Well, not really. It is just ssh root@router.lan tc qdisc add dev eth0.2 root fq_codel. But it is the thought that counts.
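If you wanted to script that instead of typing it, a small sketch (assuming the router answers as root@router.lan and eth0.2 is the WAN interface, as on my setup) might look like this - tc's replace verb is idempotent, unlike add, so it is safe to re-run:

import subprocess

ROUTER = "root@router.lan"
WAN_IF = "eth0.2"

def ssh(*cmd):
    # Run a command on the router over ssh and return its output.
    return subprocess.check_output(("ssh", ROUTER) + cmd, text=True)

# Swap whatever root qdisc is on the WAN interface for fq_codel.
ssh("tc", "qdisc", "replace", "dev", WAN_IF, "root", "fq_codel")

# Print per-qdisc statistics to confirm it took, and to watch
# drop counts under load.
print(ssh("tc", "-s", "qdisc", "show", "dev", WAN_IF))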
What did make me happy was having my purchase decision reinforced by the Atheros driver (ath71xx) being one of the few BQL-capable drivers in kernel 3.3. So that was good. It is currently running on my WAN connection; we'll see how it works.
What I found interesting was that the networking stack in Linux is apparently a real clusterfuck. Who would have known? The bufferbloat problem from a few years ago was, and still is, serious business. And according to the documentation, 802.11n drivers are much, much worse off than plain ethernet.
It was an educational process, though. CoDel is a near-stateless, near-bufferless, classless queue discipline that is supposed to handle network variability well and work out of the box, which is exactly what the next generation of network queueing algorithms needs. And if it works well, I hope it takes over the world, because FIFO queues are so 2002.
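For a feel of why it is near-stateless, here is a toy Python sketch of CoDel's control law (not the kernel code, and it omits several state transitions): drop when the sojourn time has stayed above a target for a full interval, then shrink the gap to the next drop by an inverse square root.

from math import sqrt

TARGET = 0.005    # tolerated standing queue delay: 5 ms
INTERVAL = 0.100  # how long delay must persist before dropping: 100 ms

class CoDel:
    def __init__(self):
        self.first_above = None  # when sojourn first crossed TARGET
        self.drop_next = 0.0     # earliest time of the next drop
        self.count = 0           # drops in the current dropping run

    def should_drop(self, sojourn, now):
        if sojourn < TARGET:
            self.first_above = None  # delay recovered; reset the run
            self.count = 0
            return False
        if self.first_above is None:
            self.first_above = now + INTERVAL  # open observation window
            return False
        if now >= self.first_above and now >= self.drop_next:
            self.count += 1
            # Successive drops come faster, pushing delay back to TARGET.
            self.drop_next = now + INTERVAL / sqrt(self.count)
            return True
        return False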
2013/07/15
ffmpeg syntax to extract audio from an mp4
ffmpeg -i <inputfile.mp4> -acodec copy <outfile.aac>
Just keeping this on file. I need it way too often and always forget it. Note that -acodec copy only remuxes the existing track, so the output extension has to match the source codec - .aac for the usual AAC track in an mp4. To actually get an .mp3, you have to re-encode, e.g. with -acodec libmp3lame.
2013/06/29
Software Rants 13: Python Build Systems
So after delving into pcsx2 for a week and taking the wild ride of a mid-sized CMake project, I can officially say that any language that makes conditionals repeat the initial statement is dumb as hell. But CMake demonstrates a more substantial problem - domain languages that leak, a lot.
Building software is a complex task. You want to call external programs, perform a wide variety of repetitious tasks, and do checking and verifying, and on top of that you need to keep track of changes to minimize build times.
Interestingly, that last point leads me to a tangent - there are 3 technologies that are treated pretty much independently of one another but overlap a lot here. Source control, build management, and packaging all involve the manipulation of a code base and its outputs. Source control does a good job managing changes, build systems create conditional products for circumstance, and packagers prepare the software for deployment.
I think it would be interesting if a build system took advantage of the presence of the other two dependencies of a useful large software project - maybe using git staging to track changes in the build repository. Maybe the build system can prepare packages directly, rather than having an independent packaging framework - after all, you need to recompile most of the time anyway.
But that is beside the point. The topic is build systems - in particular, waf. Qmake is too domain specific and has the exact same issues as make, cmake, autotools, etc. - they all start out as domain languages and mutate into borderline Turing-complete languages, because their domain is hugely broad and complex and has only grown more complex over time. This is why I love the idea of Python-based build systems - though at the same time, it occurs to me that most Python features go unused in a build system and just waste processor cycles too.
But I think building is the perfect domain for scripting languages - Python might be slow, but I couldn't care less considering how pretty it is. However, my engagements with waf have made me ask some questions - why does it break traditional pythonic software development wholesale, from bundling the library with the source distribution to expecting fixed-name wscript files that provide functions taking some magic wildcard argument?
What you really want is to write proj.py and use traditional pythonic coding practices with a build system library, probably from PyPI. You download the library and do an import buildsystem, or from buildsystem import builder, or some such - rather than being pigeonholed into a two-decade-old philosophy of extensionless, fixed-name files in every directory.
Here is an example I'd like to write in this theoretical build system covering pretty much every aspect off the top of my head:
# You can play waf and just stick the builder.py file with the project,
# without any of the extensionless fixed-name nonsense.
import builder
from builder import find_packages, gcc, clang
from sys import platform

subdirs = ('sources', 'include', ('subproj', 'subbuilder.py'))
name = 'superproj'
version = '1.0.0'
args = (('install', {'ret': 'inst'}),)
pkg_names = ('sdl', 'qt5', 'cpack')
pkgs = builder.PackageSet()   # mapping-like collection of found packages
utils, libs = [], []

builder.lib_search_path += ('/lib', '/usr/lib', '/usr/local/lib', '~/.lib',
                            '/usr/lib32', '/usr/lib64', './lib')

# Start here: parse the arguments (including the optional specifiers in
# args). A lot of the builder globals can be initialized with this
# function via default arguments.
todo = builder.init('.', opt=args, build_dir='../build')

if todo == 'configure':
    # builder packages are an internal class providing libraries,
    # versioning, descriptions, and headers. When you call your compiler,
    # you can supply packages to compile with.
    pkgs += builder.find_packages(pkg_names)
    pkgs += find_packages('kde4')
    utils += builder.find_progs('gcc', 'ld', 'cpp', 'moc')
    # Find a library by name; it does a case-insensitive search for any
    # library file of system description, like libpulseaudio.so.0.6 or
    # pulseaudio.dll, and caches found libraries so it doesn't repeat
    # itself on subsequent builds.
    libs += builder.find_lib('pulseaudio')
    other_function()
    builder.recurse(subdirs)
elif todo == 'build':
    # You can get environments for various languages from the builder.
    cpp = builder.env.cpp
    py = builder.env.py
    qt = builder.env.qt  # for moc support
    # You can set build dependencies on targets, so if the builder finds
    # these in the project tree, it builds them first.
    builder.depends('subproj', 'libproj')
    # builder would be aware of sys.platform
    if platform == 'linux':     # linux building
        qt.srcs('main.cpp', 'main.moc')
        qt.include('global.hpp')
        qt.pkgs = pkgs['qt5']
        qt.jobs = 8             # or via the .compile syntax below
        qt.cc = gcc             # set the compiler
        qt.args = ('-wstring',)
        # qt.compile would always run the moc
        qt.compile(jobs=8, cc=gcc, args=qt.args + ('-O2', '-pthread'),
                   warn=gcc.warn.all, out='verbose')
        # At this point your .o files are generated and dropped in the
        # builder.build_dir directory.
        builder.recurse(subdirs, 'build')
    elif platform == 'darwin':  # osx building
        pass
    elif platform == 'win32':   # windows building
        pass
elif todo == 'link':
    pass  # do linking
elif todo == 'install':
    pass  # install locally
elif todo == 'pack':
    pass  # package for installation, maybe using cpack
Basically, you have a library to enable building locally, and you use it as a procedural order of operations, rather than defining black box functions you want some builder program to run. There could also be prepared build objects from such a library - say, builder.preprocess(builder.defaults.qt) would supply an object that handles whatever operation is being invoked (so you would use it regardless of the calling function in your script) and does the boilerplate for your chosen platform.
I imagine it could go as far as to include anything from defaults.vsp to defaults.django or defaults.cpp or defaults.android. It would search on configure, include on build, and package on pack all the peripheral libraries complementing the choice development platform in one entry line.
The principal concern with such a schema is performance. You want a dependency build graph in place so you know what you can build in parallel (besides inherently using nproc forked processes to parse each directory independently, where the root script starts the process - hence builder.init() in any script that is meant to start a project build; if you recurse into a subproject that also calls it, the second call does nothing).
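As a sketch of the graph side - using the stdlib topological sorter and a stand-in compile_target(), nothing here is from an actual build tool - the ready targets at each step are exactly those whose dependencies are done, so they can be farmed out in parallel:

from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter  # stdlib in Python 3.9+

def compile_target(name):
    print("building", name)  # stand-in for invoking the real compiler

deps = {
    "superproj": {"subproj", "libproj"},  # superproj needs both built first
    "subproj": set(),
    "libproj": set(),
}

sorter = TopologicalSorter(deps)
sorter.prepare()
with ThreadPoolExecutor() as pool:
    while sorter.is_active():
        ready = sorter.get_ready()             # mutually independent targets
        list(pool.map(compile_target, ready))  # build the batch in parallel
        for target in ready:
            sorter.done(target)                # unblocks dependent targets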
You would want to support many ways to deduce changes: besides hashes, you could use filesystem modification dates, or maybe even git staging and version differences (i.e. a file that doesn't match the current commit's version is assumed changed), caching the results afterwards. You would probably use all available means by default, and let the user turn some off for speedups at the cost of potentially redundant recompilation (e.g. if you move a file, its modification date changes and the old cache entry is invalidated, but if it hashes the same, it is assumed to be the same file moved and isn't recompiled).
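A sketch of that layering - cheap mtime check first, content hash as the tiebreaker - with an invented cache file format:

import hashlib
import json
import os

CACHE = ".buildcache.json"  # invented format: path -> {mtime, sha1}

def load_cache():
    try:
        with open(CACHE) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}

def changed(path, cache):
    mtime = os.stat(path).st_mtime
    entry = cache.get(path)
    if entry is not None and entry["mtime"] == mtime:
        return False  # cheap path: untouched since last build
    with open(path, "rb") as f:
        digest = hashlib.sha1(f.read()).hexdigest()
    cache[path] = {"mtime": mtime, "sha1": digest}
    if entry is not None and entry["sha1"] == digest:
        return False  # moved or touched, but contents identical
    return True       # genuinely new or edited: recompile

def save_cache(cache):
    with open(CACHE, "w") as f:
        json.dump(cache, f)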
If you support build environments, you can support radically different languages. I just think there are some shortcomings in both scons and waf that prevent them from truly taking advantage of their pythonic nature, and using all the paradigms available to python is one of them, I feel.
Building software is a complex task. You want to call external programs, perform a wide variety of repetitious tasks, do checking, verifying, and on top of that you need to be able to keep track of changes to minimize time to build.
Interestingly, that last point leads me to a tangent - there are 3 technologies that are treated pretty much independently of one another but overlap a lot here. Source control, build management, and packaging all involve the manipulation of a code base and its outputs. Source control does a good job managing changes, build systems create conditional products for circumstance, and packagers prepare the software for deployment.
I think it would be interesting if a build system took advantage of the presence of the other two dependencies of a useful large software project - maybe using git staging to track changes in the build repository. Maybe the build system can prepare packages directly, rather than having an independent packaging framework - after all, you need to recompile most of the time anyway.
But that is aside the point. The topic is build systems - in particular, waf. Qmake is too domain specific and has the exact same issues as make, cmake, autotools, etc - they all start out as domain languages that mutate into borderline turing complete languages because their domain is hugely broad and complex, and it has evolved more complex over time. This is why I love the idea of python based build systems - though at the same time, it occurs to me most python features go unused in a build system and just waste processor cycles too.
But I think building is the perfect domain of scripting languages - python might be slow, but I could care less considering how pretty it is. However, my engagements with waf have made me ask some questions - why does it break traditional pythonic software development wholesale (from bundling the library with source distribution, to expecting fixed name files of wscript that provide functions with some wildcard argument that acts really magical).
What you really want is to write proj.py and use traditional pythonic coding practices with a build system library, probably from pypi, You download the library, do an import buildsystem, or from buildsystem import builder or something, rather than pigeonhole a 2 decade old philosophy of files without extensions in every directory with a fixed name.
Here is an example I'd like to write in this theoretical build system covering pretty much every aspect off the top of my head:
# You can play waf and just stick the builder.py file with the project,
# without any of the extensionless fixed-name nonsense.
import builder
from builder import gcc
from sys import platform

subdirs = ('sources', 'include', ('subproj', 'subbuilder.py'))
name = 'superproj'
version = '1.0.0'
args = {'install': 'inst'}  # optional argument specifiers
pkg_names = ('sdl', 'qt5', 'cpack')
builder.lib_search_path += ('/lib', '/usr/lib', '/usr/local/lib', '~/.lib',
                            '/usr/lib32', '/usr/lib64', './lib')

# Start here: parse the arguments (including the optional specifiers in args).
# A lot of the builder global members can be initialized by this function
# via default arguments.
todo = builder.init('.', opt=args, build_dir='../build')
pkgs, utils, libs = {}, [], []

if todo == 'configure':
    # builder packages are an internal class, providing libraries, versioning,
    # descriptions, and headers. When you call your compiler, you can supply
    # packages to compile with.
    pkgs.update(builder.find_packages(pkg_names))
    pkgs.update(builder.find_packages('kde4'))
    utils += builder.find_progs('gcc', 'ld', 'cpp', 'moc')
    # Find a library by name: a case-insensitive search for any library file
    # matching the system's naming convention, like libpulseaudio.so.0.6 or
    # pulseaudio.dll. Found libraries are cached, so the search isn't
    # repeated on subsequent builds.
    libs.append(builder.find_lib('pulseaudio'))
    otherFunction()  # any other configure-time work
    builder.recurse(subdirs)
elif todo == 'build':
    # You can get environments for various languages from the builder.
    cpp = builder.env.cpp
    py = builder.env.py
    qt = builder.env.qt  # for moc support
    # You can set build dependencies on targets, so if the builder can find
    # these in the project tree it builds them first.
    builder.depends('subproj', 'libproj')
    # builder would be aware of sys.platform
    if platform == 'linux':  # linux building
        qt.srcs('main.cpp', 'main.moc')
        qt.include('global.hpp')
        qt.pkgs = pkgs['qt5']
        qt.jobs = 8  # or pass it in the .compile call
        qt.cc = gcc  # set the compiler
        qt.args = ('-wstring',)
        # qt.compile would always run the moc
        qt.compile(jobs=8, cc=gcc, args=qt.args + ('-O2', '-pthread'),
                   warn=gcc.warn.all, out='verbose')
        # At this point, your .o files are generated and dropped in the
        # builder.build_dir directory.
        builder.recurse(subdirs, 'build')
    elif platform == 'darwin':  # osx building
        pass
    elif platform == 'win32':  # windows building
        pass
elif todo == 'link':
    pass  # do linking
elif todo == 'install':
    pass  # install locally
elif todo == 'pack':
    pass  # package for installation, maybe using cpack
Basically, you have a library to enable building locally, and you use it in a procedural order of operations to do so, rather than defining black-box functions you want some builder program to run. There could also be prepared build objects you could get from such a library - say, builder.preprocess(builder.defaults.qt) would supply an object that handles whatever operation is being invoked (so you would use it regardless of the calling function in your script) to do the boilerplate for your chosen platform.
I imagine it could go as far as to include anything from defaults.vsp to defaults.django or defaults.cpp or defaults.android. It would search on configure, include on build, and package on pack all the peripheral libraries complementing the chosen development platform, in one entry line.
The principal concern with such a schema is performance. You want a dependency build graph in place so you know what you can build in parallel (besides inherently using nproc forked processes to parse each directory independently; the root script starts the process, so you need builder.init() in any script that is meant to start a project build, but if you recurse into a subproject that calls that function, the second call does nothing).
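To make the graph half concrete, here is a minimal sketch, assuming Python's stdlib graphlib and a thread pool stand in for the real builder internals - the targets and the compile step are placeholders:

# A minimal sketch of the dependency build graph idea: everything "ready"
# can build in parallel, and each completion may unlock dependents.
from graphlib import TopologicalSorter
from concurrent.futures import ThreadPoolExecutor

graph = {'app': {'libproj', 'subproj'}, 'libproj': set(), 'subproj': set()}

def compile_target(name):
    print('building', name)  # stand-in for invoking the real compiler
    return name

ts = TopologicalSorter(graph)
ts.prepare()
with ThreadPoolExecutor(max_workers=8) as pool:
    while ts.is_active():
        # get_ready() hands back every target whose dependencies are done
        for done in pool.map(compile_target, ts.get_ready()):
            ts.done(done)  # marking a target done unlocks its dependents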
You would want to support a lot of ways to deduce changes: besides hashes, you could use file system modification dates, or maybe even git staging and version differences (i.e., a file that doesn't match the current commit's version is assumed changed). You would cache the results afterwards. By default you would probably use all available means, and the user could turn some off for speedups at the risk of redundant recompilation (e.g., if you move a file, its modification date changes and the old cache entry is invalidated, but if the contents hash the same it is assumed to be the same file moved and isn't recompiled).
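As a minimal sketch of that layering (the cache format here is just an assumption): check the cheap modification date first, and fall back to a content hash before declaring a file dirty.

# Layered change detection: mtime as the fast path, hash as the tiebreaker.
import hashlib
import os

def is_changed(path, cache):
    """cache maps path -> (mtime, digest); returns True if a rebuild is needed."""
    mtime = os.path.getmtime(path)
    old = cache.get(path)
    if old is not None and old[0] == mtime:
        return False  # modification date matches: assume unchanged
    with open(path, 'rb') as f:
        digest = hashlib.sha1(f.read()).hexdigest()
    changed = old is None or old[1] != digest
    cache[path] = (mtime, digest)  # refresh the cache either way
    return changed  # moved/touched but identical content -> not recompiled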
If you support build environments, you can support radically different languages. I just think there are some shortcomings in both scons and waf that prevent them from truly taking advantage of their Pythonic nature, and using all the paradigms available to Python is one of them, I feel.
2013/06/24
Magma Rants 5: Imports, Modules, and Contexts
One of the largest issues in almost any language is the trifecta of imports, packaging, and versioning. For Magma, I want a well-thought-out design that enables portable, compartmentalized code, interoperability between code bases, and the ability to import both precompiled and freshly compiled object code.
First, we inherit the nomenclature of import <parent>:<child>, where internally referencing such a module is done through the defined <parent>:<child> namespacing. Imports are filesystem searched, first locally (with a compiler-limited depth, blacklist, and whitelist available), then on the system's import and library paths. You can never define a full pathname import to a static filesystem object with the import clause, but the internal plumbing in std:module includes the necessary woodwork to do raw module loading.
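As a sketch of that search order (the .magma suffix, the depth limit, and the default paths are all assumptions for illustration):

# Local tree first, bounded depth and blacklist honored, then system paths.
import os

def find_module(name, max_depth=3, blacklist=('build',),
                system_paths=('/usr/lib/magma',)):
    target = name.replace(':', os.sep) + '.magma'  # parent:child -> parent/child
    for root, dirs, files in os.walk('.'):
        if root.count(os.sep) >= max_depth:
            dirs[:] = []  # depth limit reached: stop descending
        dirs[:] = [d for d in dirs if d not in blacklist]
        candidate = os.path.join(root, target)
        if os.path.isfile(candidate):
            return candidate
    for path in system_paths:  # then the system's import and library paths
        candidate = os.path.join(path, target)
        if os.path.isfile(candidate):
            return candidate
    return None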
The traditional process of textual headers and binary libraries still works. You don't want to bloat deployment libraries with development headers, though if possible I'd make including them an option. Magma APIs, with the file suffix .mapi, are the primary way to provide an abstract view of a library implementation.
In general practice, though, we want to avoid the duplication of work in writing headers and source files for every part of a program just to speed up compile times. This is mostly a build system problem, in that you want to verify (via hash) a historic versioning of each module, so if its hash changes you know to recompile it. This means you should only write APIs for libraries or externalized code - which is what a C++ header really should be for.
In addition, an API only describes public member data - you don't need to describe the memory layout of an object in an API so that the compiler can resolve how to allocate address space; you just specify the public accessors. When you compile a shared object, the public accessors are placed in a forward table that a linker just needs to read. Note that since a library can contain multiple API declarations in one binary, the format also has a reference table into the API indexing arrays.
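To picture the layout, here is a toy rendering of those tables in Python - every name and address in it is made up:

# One reference table mapping API names to forward tables, each forward
# table mapping public accessors to addresses.
library_binary = {
    'ref_table': {'audio.mapi': 0, 'video.mapi': 1},
    'forward_tables': [
        {'open_stream': 0x1040, 'close_stream': 0x10a0},  # audio accessors
        {'blit': 0x2000, 'vsync': 0x2080},                # video accessors
    ],
}

def resolve(lib, api, accessor):
    """What the linker does: find the API's forward table, then the address."""
    return lib['forward_tables'][lib['ref_table'][api]][accessor]

assert resolve(library_binary, 'video.mapi', 'blit') == 0x2000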
The workflow becomes one of importing APIs where needed, and using compiler flags and environment variables to search for and import the library implementing that API. One interesting prospect might be to go the other way - to require compiled libraries to be named the same as their APIs, and to have one API point to one binary library with one allocator table. It would mean a lot of smaller libraries, but that actually makes some sense. It also means you don't need a separate linker declaration, because any imported API will have a corresponding (for the linker's sake) compiled binary of the same name in the library search path.
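Concretely, the lookup could be as simple as this sketch (the .mlib suffix and the search path are invented):

# One-API-one-binary: derive the library path straight from the API name.
import os

def library_for(api_name, search_path=('./lib', '/usr/lib/magma')):
    base = api_name.rsplit('.', 1)[0] + '.mlib'  # audio.mapi -> audio.mlib
    for directory in search_path:
        candidate = os.path.join(directory, base)
        if os.path.isfile(candidate):
            return candidate
    raise FileNotFoundError('no compiled library for ' + api_name)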
I really like that approach - it also introduces the possibility of delayed linking, so that a library isn't linked in until it's accessed, akin to how memory pages work in virtual memory. You could also have asynchronous linking, where accessing the library's faculties before it is pulled into memory causes a lock. Maybe an OS feature?
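You can fake the delayed half in userspace today; a minimal sketch with ctypes, where libm is just an arbitrary demo target:

# Delayed linking: nothing is loaded until the first symbol access,
# loosely like a page fault pulling a page in.
import ctypes

class LazyLibrary:
    def __init__(self, path):
        self._path = path
        self._lib = None

    def __getattr__(self, symbol):
        if self._lib is None:  # first touch: "fault" the library in
            self._lib = ctypes.CDLL(self._path)
        return getattr(self._lib, symbol)

libm = LazyLibrary('libm.so.6')  # nothing mapped yet
# libm.cos  <- the first access would dlopen the library and bind the symbol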
As a thought experiment I'm going to document what I think are all the various flaws in modern shared object implementations and how to fix them in Altimit / Magma:
- You need headers to a library to compile with, and a completely foreign binary linkable library or statically included library to link in at build or run time.
- You need to describe the complete layout of accessible objects and functions in a definition of a struct or class, so that the compiler knows the final size of an object type.
- You need to make sure the header inclusions and library search path contain the desired files, even on disparate runtime environments.
- Symbol tables in binaries can be large and cumbersome to link at runtime and can create sizable load times.
2013/06/06
Magma Rants 4: Containers and Glyphs
Containers are the most pervasive core aspect of any language's long-term success. In Magma, since () denotes scope blocks (and can be named), and [] is only used for template declarations, {} and [] are available standalone to act as container literals like in Python. [] is an std:array, the primitive statically sized array of homogeneous elements. If an array has multiple types in it, it uses std:var as the element type, relying on the natural from[object] conversion available in var for user-defined types, or an overridden, more precise conversion function.
{X, X, X} is for unique sets, and {(X,Y),(X,Y)} is for maps. In the same line of thinking, the language tries to find a common conversion type these objects fit in (note: the compiler won't trace the polymorphic inheritance tree to try to find a common ancestor) and casts them, or throws them in vars. The indexing hash functions for sets and maps that determine uniqueness are well defined for std types and you can implement your own as a template override of std:hash[T](T), which needs to return a uint.
Python (since I love Python) also includes the immutable list type () as a tuple, but since Magma's [] is already a static contiguous std:array and not an std:dynArray, there is no performance benefit. Note that, like everything in Magma, [] and {} are implied const and can be declared muta {} or muta [] to construct a mutable version.
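Since the comparison keeps leaning on Python, the analogues look like this, with __hash__ standing in for a std:hash[T] override:

# The Python analogues: [] arrays, {} sets and maps, custom hashing.
arr = [1, 2, 3]              # [] -> std:array of a homogeneous type
mixed = [1, 'two', 3.0]      # heterogeneous: each element boxed (std:var)
uniq = {1, 2, 2, 3}          # {X, X, X} -> unique set, collapses to {1, 2, 3}
table = {'a': 1, 'b': 2}     # Magma would spell the entries as (key, value)

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __hash__(self):      # custom uniqueness, like overriding std:hash
        return hash((self.x, self.y))
    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)

pts = {Point(0, 0), Point(0, 0)}  # collapses to one element on purpose
assert len(uniq) == 3 and len(pts) == 1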
One of the principal goals I had in thinking about Magma is that a majority of languages overload and obfuscate the implication of glyph patterns, which makes compilation time consuming, since in a complex parser the syntax becomes very situational depending on the surrounding text in a source file. Additionally, any time a language uses multiple sequential glyphs to represent some simple concept (equality as ==, scope as ::, // for comments), I feel that is a failure to properly balance glyph allocation and behavior. Admittedly, in the current documentation on Magma, I'm using == for logical equality, because I turned = back into the assignment operator instead of : - solely because += is way too baked into my brain to see +: and not think it strange - and it allowed me to use : for scope and . for property access (which are different things, Java).
In conceptualizing Magma, I drafted out all the available glyphs on a standard keyboard and assigned them to language functions. As a result, glyphs like $ became available substitutes for the traditional named return in other languages, and made function declarations more obvious, because you declare a return type in a function definition (or fn).
2013/06/05
Magma Rants 3: Powerful variance and generics
Magma uses the same compile-time template checking that C++ uses - templates (defined with square brackets [] in class and function definitions). The distinction between polymorphism and templates is, I feel, still valuable, and unlike Go, I don't see anything inherently wrong with natively compiled templates in the C++ vein - if a template usage doesn't support, at compile time, the functions and typing the template uses, it is a compiler error. The implementation will try to coerce types using the generic object functions object.to[T](T) and object.from[T](T); if either direction is defined (because either class could define the conversion to the other type), the cast is done. This avoids the ambiguity of dynamic casting in C++, because there is a well-defined set of potential casts for every object, and the only difference between static_cast and dynamic_cast is whether the casts themselves are implemented as const or not. Const casting still exists but requires the "undefined" context to allow undefined behavior (i.e., mutating a passed-in const object can be very bad). Const cast is found in std:undefined:ConstCast().
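The to/from resolution reduces to something like this sketch, where to_T / from_T are hypothetical Python stand-ins for Magma's to[T] and from[T]:

# Bidirectional cast resolution: either side may define the conversion.
def convert(obj, target_type):
    to_name = 'to_' + target_type.__name__        # source defines obj.to[T]()
    if hasattr(obj, to_name):
        return getattr(obj, to_name)()
    from_name = 'from_' + type(obj).__name__      # target defines T.from[S]()
    if hasattr(target_type, from_name):
        return getattr(target_type, from_name)(obj)
    raise TypeError('no conversion from %s to %s'
                    % (type(obj).__name__, target_type.__name__))

class Celsius:
    def __init__(self, deg): self.deg = deg
    def to_Fahrenheit(self):
        return Fahrenheit(self.deg * 9 / 5 + 32)

class Fahrenheit:
    def __init__(self, deg): self.deg = deg

f = convert(Celsius(100), Fahrenheit)  # resolves via Celsius.to_Fahrenheit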
From the other direction, Magma contains std:var, the stratos autoboxing container type. It is used pervasively as a stand-in for traditional polymorphic [Object] passing, because you can retrieve the object from a var with compile-time guarantees and type safety, and var includes a lot of additional casting functionality for strings and numbers not found in native Magma casts. If you have a heterogeneous collection, you almost always want the contents to be vars, unless you have a shared common restricting ancestor to denote a subset of objects. You can still query the type of a var, and it delegates all interactions besides the ? operator to the contents. If you really need to call query() / ? on the contents, call var:get[T]() first. You can also reach the contents' own get function by getting the object and calling get on it.
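A rough Python analogue of that type-safe retrieval (all names here are hypothetical):

# The box delegates to its contents; get() only succeeds for a type the
# contents actually satisfy, instead of silently mis-casting.
class Var:
    def __init__(self, value):
        self._value = value

    def get(self, T):
        if not isinstance(self._value, T):
            raise TypeError('var holds %s, not %s'
                            % (type(self._value).__name__, T.__name__))
        return self._value

v = Var(42)
n = v.get(int)    # ok: 42
# v.get(str)      # would raise TypeError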
Magma also has the auto keyword as a type specifier that deduces type from an rvalue, the same way C++ does. The compiler statically deduces the type from the rvalue expression and parses accordingly.