2014/05/07

Hardware Rants 1: An Open Future

I keep going back to this (probably ignorant and stupid) notion that we are still wasting our time on ancient technology. We still use tar. But this is not about software, it is about hardware.

x86 is already well past its 30th birthday, and it was never a well-designed beast to begin with. Itanium was meant to replace it, and I wish it had, but today we don't have that luxury. I think in most ways Itanium failed because it didn't innovate enough.

There is this massive disconnect in consumer electronics. Everyone does things their own way - custom silicon from the ground up - mostly because the free software ideology has never translated into a free hardware one. There is literally no modern transistor-based machine, fabricated on a modern process, that is also open source. Which is the whole problem: if being open gimps your ability to function, you lose the benefits.

I have been thinking in recent months about how to tackle this architecturally. We are still building on all these legacy proprietary monstrosities, so if we toss all that out for a moment and tackle the parts on their own, we might come to some reasonable conclusions about where to go from here.

First, firmware. You need to start with something. In hardware terms, you have a ROM payload you load into a CPU, and it goes from there. I think one principle to take to heart in future innovations in the computer space is the ability to package all the tech together on one die. Modern Intel CPUs, for example, have the entire northbridge onboard - circuitry that ten years ago was its own discrete silicon. I don't see any reason a newer architecture could not go further with this: having a real system on a chip is hugely advantageous, because it lets you make assumptions.

What would we want on an SoC? Bandwidth channels to external interfaces (I'll get to that), firmware, something to signal the chip to start and stop, local memory (which may one day simply be enough for most systems) to use before attaching external memory, voltage regulation, and processing "space". I say space because that can include branch predictors, cache, FPUs, SIMD core arrays, or generic pipelined cores. I'd call it core soup, though, because you go beyond AMD's HSA and just treat everything as a first-class processor.

Which I think is really important. We are still developing graphics hardware where each generation has a (usually) proprietary microcode generated by (usually) proprietary drivers on the CPU side. That is so bad it hurts.
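
To make the core soup idea a bit more concrete, here is a minimal sketch of how on-die firmware might enumerate every unit uniformly. All of the type and field names here are hypothetical - this is not any existing interface, just an illustration of the principle.

    #include <stdint.h>

    /* Hypothetical "core soup" enumeration: every on-die unit, whether a
     * pipelined core, SIMD array, FPU, cache, or external link, shows up
     * in the same table as a first-class resource. */
    enum soc_unit_kind {
        UNIT_PIPELINED_CORE,
        UNIT_SIMD_ARRAY,
        UNIT_FPU,
        UNIT_BRANCH_PREDICTOR,
        UNIT_CACHE,
        UNIT_LOCAL_MEMORY,
        UNIT_VOLTAGE_REGULATOR,
        UNIT_EXTERNAL_LINK        /* bandwidth channel off the die */
    };

    struct soc_unit {
        enum soc_unit_kind kind;
        uint32_t id;              /* unique on-die identifier          */
        uint32_t feature_bits;    /* kind-specific capabilities        */
        uint64_t local_mem_bytes; /* 0 if the unit carries no memory   */
    };

    /* The on-die firmware ROM would expose a table like this, so software
     * could schedule onto a SIMD array the same way it schedules onto a
     * general-purpose core - no driver-generated microcode blob between. */
    extern const struct soc_unit soc_unit_table[];
    extern const uint32_t soc_unit_count;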

I really want to talk about interconnects, though. Modern systems are a hodgepodge of disparate technologies - legacy IDE, PCI, serial, and parallel. System-specific inter-processor interconnects like QPI. Memory-addressing interfaces like DMI. PCI Express, which has done well in becoming a fairly standard transport interface - it is carried inside Thunderbolt, for example. SATA and SAS, and don't get me started on how we got into this mess where the enterprise uses a different hard disk interface from the consumer space. Or why SATA still requires dedicated power.

In truth, though, all these interfaces are just trade-offs along a few axes:

  • Bandwidth, where you need wider buses and more synchronization or timing control to regulate them, because you are almost always frequency-limited already and have to deliver extra bandwidth through parallelism.
  • Latency, which is the difference between real-time transports and packet-based ones.
  • Distance, where cabling that runs meters rather than millimeters, and has to stay flexible, cannot carry signaling at the frequencies or with the low jitter you can get on a PCB.
  • Interface, where SATA describes protocol operations as electrical signals, while DMI is just direct processor hardware addressing a numeric node in a memory array. The latter is not portable, but the former is.
The modern problem is that we have all these interfaces centered around the first three, when the only thing we really should be discretizing is the last one. Thirty years ago manual clocking and bus rate timings were a real issue, but today dynamic frequency and timing scaling is routine inside our processors, yet our interfaces never take advantage of that tech. Mainly because they are old.
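
As a sketch of that separation - again, everything here is invented for illustration - the first three axes become negotiable physical parameters of a single link type, and only the protocol spoken over it stays discrete:

    #include <stdint.h>

    /* Hypothetical: the only thing a device declares up front is which
     * protocol it speaks; everything physical is negotiated per link. */
    enum link_protocol {
        PROTO_MEMORY,   /* load/store semantics          */
        PROTO_BLOCK,    /* storage-style block transfers */
        PROTO_STREAM    /* packetized peripheral traffic */
    };

    struct link_params {
        uint32_t lanes;         /* parallel width; 1 for a serial run */
        uint32_t symbol_mhz;    /* negotiated per-lane symbol rate    */
        uint32_t max_jitter_ps; /* what the physical run tolerates    */
        uint32_t reach_mm;      /* trace or cable length being driven */
    };

    /* Toy derating model: halve the symbol rate for every order of
     * magnitude of distance beyond a 100 mm board trace, so a long,
     * flexible cable trades bandwidth for reach automatically. */
    uint32_t derate_for_reach(uint32_t max_mhz, uint32_t reach_mm)
    {
        uint32_t mhz = max_mhz;
        for (uint32_t d = 100; d < reach_mm && mhz > 1; d *= 10)
            mhz /= 2;
        return mhz;
    }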

In principle, what we really need are the following interfaces to almost any computer:
  • A high-bandwidth, low-latency interconnect between central system components (memory, other processors, graphics accelerators, local storage)
  • A low-bandwidth, jitter-tolerant peripheral interconnect for devices that don't have large bandwidth demands, which still keeps latency to a minimum
In theory, you could have one negotiation API for all hardware on any interface, and communicate with each device in its protocol of choice. It is all electrons on copper anyway, right? We can adopt the PCI world's concept of lanes here - you have bandwidth lanes off the CPU nodes, and those lanes can be linked to memory, other processors, graphics accelerators, local storage, etc. You make trade-offs - less local devices have to run at lower frequencies with more spectrum spread, sacrificing bandwidth for transmit distance. All this could be done with a basic negotiation protocol - the board itself carries ROM sectors describing each bus's locality, along with the rated speeds and lane counts for each device.
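
Here is a rough sketch of what those ROM sectors and that negotiation might look like. The descriptor layout, field names, and rates are all made up for the sake of example:

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical per-bus descriptor the board ROM would carry: how many
     * lanes the routing has, whether it is an internal or external run,
     * and what speed the traces are rated for. */
    struct bus_descriptor {
        uint8_t  lanes;      /* 1 is the serial variant                 */
        uint8_t  internal;   /* 1 for on-board, 0 for an external cable */
        uint16_t rated_mhz;  /* maximum the routing is certified for    */
        uint16_t reach_mm;   /* physical length of the run              */
    };

    /* The host walks the ROM table, asks each attached device for its own
     * maximum rate, and settles on the lower of the two figures. */
    static uint16_t negotiate_rate(const struct bus_descriptor *bus,
                                   uint16_t device_max_mhz)
    {
        return bus->rated_mhz < device_max_mhz ? bus->rated_mhz
                                               : device_max_mhz;
    }

    int main(void)
    {
        struct bus_descriptor slot = {
            .lanes = 16, .internal = 1, .rated_mhz = 8000, .reach_mm = 120
        };
        printf("negotiated %u MHz on %u lanes\n",
               (unsigned)negotiate_rate(&slot, 5000), (unsigned)slot.lanes);
        return 0;
    }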

The serial version is just a single lane, with the same semantics. Rather than have addressing protocols at the controller level, you can put them on the devices themselves and have them negotiate communication with the host.

Each lane would have power delivery - probably on the order of 5 V at 1.1 A per lane for a 5.5 W pull each, so a 30-lane GPU could pull 165 W before it needs external power, and an SSD could make do with 5.5 W. I could imagine a justification here for a higher-power variant, or just having power tuning on the chipset (which is what we are going for, after all) running off 12 V.
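
Just to sanity-check those numbers - the per-lane figures are the assumed 5 V / 1.1 A budget from above, not anything standardized:

    #include <stdio.h>

    int main(void)
    {
        const double volts = 5.0, amps = 1.1;       /* assumed per-lane budget */
        const double watts_per_lane = volts * amps; /* 5.5 W                   */

        printf("per lane:        %.1f W\n", watts_per_lane);
        printf("30-lane GPU:     %.1f W\n", 30 * watts_per_lane); /* 165.0 W */
        printf("single-lane SSD: %.1f W\n", watts_per_lane);
        return 0;
    }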

Speaking of voltage, this is an area I am just not involved in, but I'd definitely look into deprecating the 3.3 / 5 / 12 volt rails in favor of one common DC carrier voltage that the board itself can distribute appropriately. And getting rid of 24-pin connectors. Those things are so stupidly huge.

You would want internal and external variants of these interfaces - though I'd imagine you could just use the same connectors for both, really, like eSATA. You would want to include some kind of mounting mechanism for heavy equipment, though.

And the most important policy in all this is that such a system would need to be open. Open standards, consortium-run technology. No patents, open design documentation - at least for the reference implementation.

As an aside, every standards organization that doesn't maintain a reference implementation is lazy. Looking at you, Khronos Group. Adopt Mesa, and if it can't keep up with your feature bulletins, then you might want to slow down pushing the envelope.

As usual, I'm interested in this kind of goodness. If you have money and want to pay me to work on this full-time, contact me via the usual channels. I've got some passion here that I'd need to spend a year draining to the bone to get out of my system.