The next applications language?

I stumbled across one of Douglas Crockford's recent presentations on YouTube a couple of days ago, while I was soaking up Harmony (ECMAScript 6) info. He covers quite a bit, and I recommend watching it, at least if you're ever going to write any code, ever.

What really started me thinking, though, was a seemingly simple question he asked: why do application languages have number types? It is simply embarrassing that I've never asked myself the same question about Java, given that I've probably written something in Java eight or nine out of every ten days, averaged over the last decade. And I know WHY we have short, long, int, etc. I've had to code 16-bit addition and multiplication on an 8-bit microcontroller, and so on.

However, for years I've played the 'int or long?' game. Usually I just defaulted to int, as that's what the language encourages. Now, much of the time we're talking about data structures that eventually touch a database, so there was some (perceived) reason to care about type size. But not always, and, well… SO WHAT! Marshaling a Long down to an integer column in the database isn't any harder than marshaling an Integer – nor is it any more error-prone. Not that there's much reason for using an integer column instead of a long one in the database, either.

Here's the thing about Java land, though (and most of C++ land nowadays): the pointer to your data takes up as much space as your largest numeric datatype (I'm ignoring tight loops and local variables for this discussion). I'm not sure Java compilers will even bother trying to optimize the memory footprint of an object composed of, say, two ints. Technically it's possible to pack those two ints into one 64-bit location… but I would be really surprised if Java did this. Even with a short. Or a boolean. So, what do you win by choosing to store a value in a Short?
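
As a rough illustration (the layout numbers in the comments below are typical of 64-bit HotSpot – an assumption on my part, not anything the language spec promises), smaller types buy very little, and the language itself treats them as second-class:

// A hedged sketch of what choosing a smaller numeric type gets you in Java.
public class SmallTypes {

    // On a typical 64-bit HotSpot JVM (not guaranteed by the spec), every
    // instance carries an object header of roughly 12-16 bytes and is padded
    // to an 8-byte boundary, so whatever a short or boolean field might save
    // tends to vanish into that overhead.
    static class TwoInts {
        int a;
        int b;
    }

    public static void main(String[] args) {
        short x = 1;
        short y = 2;

        // byte/short arithmetic is performed as int, so the cast is
        // mandatory - and it silently truncates if the sum overflows.
        short sum = (short) (x + y);
        System.out.println(sum); // 3
    }
}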

One could argue that it is a way of expressing what the code is supposed to represent. For example, using a Long to store a person's age, in years, seems (to us old folks) like overkill – and in Java, it would imply that we meant millis, at least to most experienced Java developers. But does using a Byte actually buy us anything? Nope. Even if we had an unsigned Byte type in Java, which would represent age pretty well while also providing a nice bit of 'what if' buffer, the type itself is simply not prominent enough to really communicate meaning. Naming the field ageInYears would be much clearer.

However, if we use a byte and people start living to be 300 years old – or we repurpose the code to track giant sea turtles or mythical figures – the byte breaks. And it breaks, as Douglas points out, in the worst of all possible ways: by silently rolling around to literally the most wrong value the field can ever represent. This is GOTO-level insanity. That Java was created with this behavior – even after decades of C providing example after example highlighting why silent overflow is a terrible design flaw – is kind of appalling. At least Swift forces you to opt in to overflow – progress!
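
For concreteness, here is that silent roll-around in a handful of lines of Java:

public class Rollover {
    public static void main(String[] args) {
        byte age = Byte.MAX_VALUE; // 127
        age++;                     // no error, no exception, no warning...
        System.out.println(age);   // prints -128: the most wrong value possible
    }
}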

Douglas Crockford makes a clear argument for why having multiple integer types in an applications language (as opposed to a systems language – 'bare metal') is worse than useless. The gist is that the choice itself is a false one, and the only effect of choosing smaller types is an increased chance of errors. So why the hell would you choose anything but Long? You shouldn't. You aren't deploying your application on 8-bit processors, so stop worrying about it.

Mr. Crockford goes further, though, extending the argument to decimal types. My initial response was "No way! We need two types! Integer and Decimal. The world will end without the choice!" I find that I am coming around, though. As he points out, it's the fact that we're stuck with two's complement arithmetic that drives us to 'need' an integer type. If we could count on addition, subtraction, and multiplication of integer values to always produce integer results, we could abandon the "integer crutch". Interestingly, the more I consider it, the more I'd be willing to bet that hotspot analysis could do a pretty excellent job of determining when the compiler can 'assume' a value is only ever going to hold an integer value – if it even matters, from an efficiency standpoint. Recall, we're talking about high-level applications code: typically not incredibly math-heavy, and ease of parallelization is at *least* an order of magnitude more important than single-threaded performance.

With a few utility methods, hopefully provided via shortcut operators, doing integer math (specifically, division) on a floating-point decimal number would end up consuming far less cognitive overhead than dealing with multiple data types. ESPECIALLY as we'd be able to get rid of 'money' types and the other two's-complement-induced, floating-point-hell-spawned hacks so common in modern software development.
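
To make that concrete, here is a small Java sketch using double as a stand-in for the hypothetical single number type (that substitution is my own; a real applications language would presumably use a decimal representation rather than IEEE binary floating point). Integer-valued arithmetic stays exact far beyond the int range, and 'integer division' is one utility call away:

public class OneNumberType {
    public static void main(String[] args) {
        // Integer-valued doubles are exact up to 2^53 - far beyond
        // anything a 32-bit int can hold.
        double a = 4_000_000_000.0; // already past Integer.MAX_VALUE
        double b = 3.0;

        double product = a * b;                  // exactly 12,000,000,000
        double intQuotient = Math.floor(a / b);  // 'integer division' as a utility
        double remainder = a % b;                // remainder works on doubles too

        System.out.println(product);     // 1.2E10
        System.out.println(intQuotient); // 1.333333333E9
        System.out.println(remainder);   // 1.0
    }
}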

So I'm game. I'm with Mr. Crockford in adding 'Single Numeric Type' to my own list of 'must-haves' for my 'ideal' next major applications language.

What else would I add to that list, today? Well, lambdas, obviously. 

A larger concern to me of late: optional type safety, a la Dart. What's more, I don't think we should limit our definition of 'type' as we do today, at least at the application level. When I declare a variable, I should be able to specify not just that it is supposed to represent a number, but also the entire range of values it can ever hold. This should be absolutely integral to the language, not hacked on via add-ons, no matter how well implemented (e.g. Hibernate Validator, Matchers, etc.).

A good application language should let me plug along, figuring out the shape of my data as I go – one of the more powerful features of JavaScript. Once we've started to push real data into a long-term data store, I should be able to 'freeze' that data structure and enforce its shape. For example:

let personType = { username: isString().minLength(10).maxLength(20).isEmail(), etc: 'etc' };
let aPerson = personType.create({ username: 'sue@home.net' });

As a bonus, a well-designed validation-enforcing language should not have much trouble actually generating valid (and intentionally invalid) fake data from well-defined object types. Which is to say: if you can configure your application's test framework to use your actual type definitions to generate data that is representative of your real-world data, and then use that data to test your code, well, you can be pretty sure that you've defined your datatypes as intended, right?
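
That is more or less property-based testing driven by the type definitions themselves. Here is a minimal hand-rolled sketch in Java (StringSpec, generateValid, and generateInvalid are illustrative names of my own, not an existing framework):

import java.util.Random;

public class SpecDrivenData {

    // A declared constraint on a string field - the 'type' carries the rules.
    record StringSpec(int minLength, int maxLength) {
        boolean isValid(String s) {
            return s.length() >= minLength && s.length() <= maxLength;
        }
    }

    // Produce data that satisfies the declared constraint.
    static String generateValid(StringSpec spec, Random rnd) {
        int len = spec.minLength() + rnd.nextInt(spec.maxLength() - spec.minLength() + 1);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < len; i++) sb.append((char) ('a' + rnd.nextInt(26)));
        return sb.toString();
    }

    // Produce data that deliberately violates it (here: too short).
    static String generateInvalid(StringSpec spec, Random rnd) {
        return generateValid(new StringSpec(0, Math.max(0, spec.minLength() - 1)), rnd);
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        StringSpec username = new StringSpec(10, 20);

        String good = generateValid(username, rnd);
        String bad = generateInvalid(username, rnd);

        System.out.println(good + " -> " + username.isValid(good)); // true
        System.out.println(bad + " -> " + username.isValid(bad));   // false
    }
}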

As an example, consider the above personType: would you expect 'joe@joe' to be a valid value for 'username', if this were JavaScript? Probably not, right? Sucks to be you! The 'email' validator on the HTML5 email input allows it, because it is a valid internal email address. You'd need to provide an additional custom validator (and associated generator) – at least in this example. Hopefully not in our idealized 'real' future application language.

Speaking of validations: if you've dealt much with such frameworks, you'll probably get the motivation for my next desired language feature: no thrown exceptions. I'm a bit on the fence with respect to 'core' language exceptions, but definitely no user-thrown exceptions. Even in Java land, firmly in the middle tier, tracking and handling exceptions is just horribly prone to introducing further errors. In JavaScript, in a web app, it's simply absurd: exceptions have no place in event-driven programming. As parallelism grows (exponentially) in importance, more and more code will either _be_ event-driven, or will behave in a similar fashion. Thrown exceptions are the new GOTO – although not as violently embraced. Actually, it was the Go specification and the discussion around it that made me realize just how broken exception handling is as a concept.

The reason validation plays into this isn't obvious, unless you've been forced to gather the results of multiple validation failures and then throw an exception to notify "the system" that something 'exceptional' happened. The paradigm is a weird mix of "forced early return" and "global event handling": you can only be certain that the exception will bubble back up your current thread – though it could leak into the invoking thread, if forwarded by the framework or some cautious developer. At every step up the stack, the exception becomes less and less relevant, less and less likely to be handled (at least intentionally) – and more and more dangerous.

Furthermore, is a weak password really an 'exceptional' state? Is *any* invalid user input? Would you ever add an event to a global event queue: "Invalid password!" No. At least, I hope not. You might publish a message on a message bus, with the username in the header. But aborting your entire call stack and rewinding it to… where?

The Go solution is essentially multiple return values, with an error object as a secondary return value. I don't think I can improve much on this, possibly because I love the concept so much, and am thus biased towards it :~). Save exceptions for truly exceptional behavior: out-of-memory errors, a corrupt stack or heap, etc. More importantly, implement real exceptions more akin to events than today's exceptions: fire off the 'exception event'. If a listener captures the exception, the listener can decide what to do with it: abort and retry, send the external invoker the equivalent of a '500 Server Error', etc. If nothing handles the event, provide a default that makes sense for the given environment (server side: abort the thread; client side: "It's dead, Jim" – etc.).
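
Here is a rough Java rendering of that Go-style 'errors are values' approach (Result and validatePassword are illustrative names of my own, not an existing API): validation failures are gathered and returned as ordinary data, and nothing rewinds the call stack.

import java.util.ArrayList;
import java.util.List;

public class ErrorsAsValues {

    // A Go-flavored pair: a value plus a list of problems, never a throw.
    record Result<T>(T value, List<String> errors) {
        boolean ok() { return errors.isEmpty(); }
    }

    // Validation failures are ordinary data, gathered and returned.
    static Result<String> validatePassword(String candidate) {
        List<String> errors = new ArrayList<>();
        if (candidate.length() < 12) errors.add("shorter than 12 characters");
        if (candidate.chars().noneMatch(Character::isDigit)) errors.add("contains no digit");
        return new Result<>(candidate, errors);
    }

    public static void main(String[] args) {
        Result<String> result = validatePassword("hunter2");
        if (!result.ok()) {
            // The immediate caller decides what an invalid password means,
            // instead of something far up the stack catching it by accident.
            System.out.println("Rejected: " + result.errors());
        }
    }
}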

There’s so much to hope for, known and unknown, in a future language. What’s on your list?