2012/12/26

Software Rants 7 : Function Signatures

While thinking about Al, one thing that would be really nice is if all object definitions behaved the same way, akin to C++11's uniform initialization syntax.  For one, classes and functions should be defined like normal data, so if we use a syntax like int x = 5, we want to have class foo = ?.  The first question on this front is what the minimum syntax is to define any data.  The best way to work that out is to look at how it is done in other languages, and it isn't that complex.

  • In the C syntax languages with static typing, an integer is always int x = 5, or if you want to heap allocate it, you do int *x = malloc(sizeof(int)); *x = 5;
  • In Python and Ruby, you use dynamic typing and forgo the int part, so it is x = 5, but in map declarations it is x : 5.
  • Perl uses $ sigils on names, so it has $x = 5.
  • Haskell puts the type in a separate signature, foo :: Int, and then defines the value with foo = 5.
  • In Shell, it is x=5.
  • In JavaScript it is usually var, let, or nothing before x = 5, but in maps it is x : 5.
As a consistent notion, the equals sign is used, except in maps, where colons are used.  Grammatically, : defines an "is" relationship, and = defines equality.  I am seriously considering using : as the definition operator and letting = just be logical equality, like == in most languages.

Regardless, the syntax is consistent.  In Al, you would have static strict typing, in that int x : 5.3 will error on an implicit cast from float to int with loss of precision.  int x : "s" fails.  auto x : 5 resolves to an integer type, and you can drop the auto and just have x : 5, which behaves like auto.
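For comparison, here is a rough C++11 sketch of the same three rules. It is only an analogy: Al does not exist yet, and the names deduced, exact, and truncate_like_c are made up for illustration. Brace initialization is the closest thing C++ has to the strict behavior described above, while the plain = form shows the silent truncation Al would reject.

```cpp
#include <type_traits>

// `auto` deduces int from the literal 5, like `auto x : 5` in the post.
auto deduced = 5;
static_assert(std::is_same<decltype(deduced), int>::value,
              "the literal 5 deduces to int");

// Brace initialization rejects narrowing conversions at compile time.
int exact{5};           // fine: no loss of precision
// int lossy{5.3};      // ill-formed: narrowing from double to int
// int wrong{"s"};      // ill-formed: no conversion from a string literal

// The older implicit conversion silently drops the fraction,
// which is the behavior a strict language would turn into an error.
int truncate_like_c(double value) {
    int result = value;  // implicit double -> int conversion
    return result;
}
```

Uncommenting either of the two ill-formed lines should make the compiler reject the file, which is the point of the comparison.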

As an aside, bitwise operations like |, &, and ^ are becoming more and more underutilized next to their logical counterparts, so I'd definitely reserve |, &, and ^ for or, and, and exponentiation respectively.  If I were to have glyphic bitwise operations I'd use ||, &&, and ^^ for those, if anything.  I'd probably just reserve bitxor, bitand, and bitor as keywords and forgo the glyphs since they are so situational.
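C++ already has keyword spellings for these operators (the standard "alternative tokens"), so the idea can be tried today. A small sketch follows; the function names are hypothetical, and note that C++ spells exclusive or as xor rather than bitxor.

```cpp
// Keyword spellings of the bitwise operators (standard alternative tokens).
unsigned mask_and(unsigned a, unsigned b) { return a bitand b; }  // a & b
unsigned mask_or(unsigned a, unsigned b)  { return a bitor b; }   // a | b
unsigned mask_xor(unsigned a, unsigned b) { return a xor b; }     // a ^ b

// The logical operators have keyword forms as well.
bool both(bool a, bool b)   { return a and b; }  // a && b
bool either(bool a, bool b) { return a or b; }   // a || b
```

What C++ does not do is free up the glyphs: |, &, and ^ keep their bitwise meaning, so this only shows the keyword half of the proposal.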

So if we have a syntax of int x : 5, a function would be func foo : ?.  We want templated function definitions, so our function should be something like func< int (int) > in C++, but the syntax int (int) isn't conducive to a comma-separated value list like a template specification.  Once again, we want to minimize the syntax, so the smallest definition of data with all unnecessary information removed would be something like:

int < int, int foo : int < int x, int y { }.  If < weren't less than, it would just be the function signifier that delimits the return value from the function arguments.  This syntax leaves something wanting though, so we try the verbose, optimally readable way:

func[int]<int, int> foo : [int](int x, int y) {
}

This looks a *lot* like C++ lambdas.  On purpose.  The capture group just fills the role of the return type declaration, but if we want a function signature we need that information.  Function templates get uglier:

template<Z> func[Z]<int, Z> foo : template<Z>[Z](int x, Z z) {
}

This happens because the rvalue requires a template definition, but so does the lvalue, because that acts as the signature.  We end up defining the signature twice.  This is absurdly redundant, so what we want is the rule that an rvalue function never restates a signature, because the static typing of the function declaration already supplied it.

template<Z> func[Z]<int, Z> foo : func(x, z) {
}

So the argument types were defined in the signature, and we named them in the definition.  The problem here is that if you were to declare but not define foo for some time (given we even allow that; we would probably want this language to forbid nulls, and all functions are inherently a type of reference into code space), then when you actually do define it, you end up with something like:

foo : func(x, z) {
}

And that definition gives no information about the original signature.
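The same trade-off can be sketched in C++14, with std::function standing in for the func[...]<...> signature on the left. This is an analogy rather than Al itself, it needs C++14 for the generic lambdas, and the names add_explicit, add_named, and rebind are made up for illustration.

```cpp
#include <functional>

// The signature lives on the left, like func[int]<int, int> foo.
// The explicit lambda on the right then states it a second time.
std::function<int(int, int)> add_explicit =
    [](int x, int y) -> int { return x + y; };

// C++14 generic lambda: the right side only names the parameters.
// Their types are deduced from how std::function<int(int, int)> calls it.
std::function<int(int, int)> add_named =
    [](auto x, auto y) { return x + y; };

// Rebinding later: nothing on this line restates the signature,
// which mirrors the bare `foo : func(x, z) { }` definition.
void rebind() {
    add_named = [](auto x, auto y) { return x * y; };
}
```

The rebinding only compiles because std::function is a wrapper object that holds a callable, not the function itself, so a reader of rebind() has to find the declaration elsewhere to learn the types.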

Of course, in practice, you can't do this.  You can't have late binding on a function because it occupies compiled code space, not stack or heap space.  Defining it in two places is practically worthless because the compiled code locks foo to its function at compile time, and you can't reassign it because that would leave code pages that no longer have a name.  That means that raw function declarations are inherently constant and final.  Meanwhile, something like:

ref<func> bar; bar : ref(foo) is valid.  You are taking references to defined functions, but you must name them at least once and they can't go unnamed.
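C++ draws the same line, which makes for a quick illustration. In the sketch below, std::reference_wrapper plays the part of ref<func>; the functions foo and baz and the helper retarget are hypothetical stand-ins.

```cpp
#include <functional>

int foo(int x, int z) { return x + z; }
int baz(int x, int z) { return x - z; }

// foo = baz;   // ill-formed: a function name cannot be reassigned

// A rebindable reference to a named function, like ref<func> bar.
std::reference_wrapper<int(int, int)> bar = std::ref(foo);

void retarget() {
    bar = std::ref(baz);  // bar now refers to baz; foo itself is untouched
}
```

One difference from the two-step form above: std::reference_wrapper has no default constructor, so bar has to be bound when it is declared. That happens to fit a language that forbids nulls.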

The same thing happens with classes.  It might be why in traditional language syntaxes classes and functions don't behave like the data types they represent, because the definitions are disjoint from implementations.  While classes and functions are data types, they are data type definitions: you instantiate instances of them.  They are inherently constant and final and bound to their definition names.  So if you use a global declarative syntax like foo : func, you introduce ambiguity unless you make the full definition something really ugly like:

const final template<Z> func[Z]<int, Z> foo : func(x, z) {}.

So let us save some time and call it template<Z> func foo : Z(int x, Z z) {}.  Maybe have state defined like func:private,extern,static foo(int x, float y) : Zoo {}.  It is kind of backwards, because it implies the signature is func : return type rather than thing : value, but the value of a function is its return type if it is pure, for example.
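The nearest existing C++ shape is probably the C++11 trailing return type, which also puts the name and parameters first and the return type after them. A minimal sketch, with the name combine and its body invented purely for illustration:

```cpp
// C++11 trailing return type: name and parameters first, return type last.
// Roughly the shape of `template<Z> func foo : Z(int x, Z z) {}`.
template <typename Z>
auto combine(int x, Z z) -> Z {
    return static_cast<Z>(x) + z;  // placeholder body
}
```

Calling combine(2, 3) deduces Z as int, and combine(1, 0.5) deduces Z as double, so the signature is written once and the template parameter is filled in at the call site.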
