Before my now-previous employer folded, we were being courted by a buyer (boy, was that time period an emotional roller-coaster!). This would have changed our development direction somewhat, and so at the behest of my boss I spent a couple of weeks exploring language options for our upcoming greenfield projects.

I was asked to explore Clojure and Go, and compare them to our existing language, PHP. I decided to evaluate the languages by porting a fairly stand-alone portion of our existing app into both Go and Clojure, and comparing it to the PHP version.

The Basic Problem

That relatively stand-alone portion of our app was our recipe importer. It would allow users to import recipes from external sites, for use within our system.

There are a number of standards for recipe formats: the hRecipe microformat, schema.org's Recipe format, RDF, data-vocabulary.org's Recipe format, and probably a few others. In addition, most of the sites we explicitly supported either failed to mark up the entire recipe, interpreted the spec they purported to follow in strange ways, or merely referenced a recipe from another site, requiring us to follow the link chain to a parsable source.

In other words: it's a real (non-toy) problem, it's sufficiently complicated to get a reasonable feel for a language, and it's simple enough it can be ported within a relatively short timeframe.

The Method

I spent a week in each of Clojure and Go. This was not enough time for a full feature-parity port with the PHP version, but sufficiently long to present most of the complexity of the problem (inheritance, parser registration, and so forth).

Inheritance Is… Uh… Missing?

Neither Go nor Clojure supports "classical" OO inheritance. It may be that my brain is broken through years of shoehorning things into inheritance hierarchies, but it seems to me this particular problem happens to be a pretty good fit for classic OO. "Site X uses format Y, with these overrides."

So the biggest stumbling block in both languages was simply coming up with a way to model the problem. To be perfectly honest, the solution I ended up with in both languages looks pretty much like classic OO shoehorned into whatever the language does support (embedded structs in Go, multimethods in Clojure), and would almost certainly not be considered idiomatic by people with more experience in each language. (But then, neither would my PHP, and I've been doing that for a very long time.)
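For comparison, the "Site X uses format Y, with these overrides" shape can be sketched in PHP as below. This is a hypothetical illustration rather than the actual importer: the class names, the title() and ingredients() methods, and the plain array standing in for a parsed page are all assumptions.

```php
<?php
// Hypothetical sketch: a generic format parser plus one site-specific override.
// $page is a stand-in for a parsed document (a plain array here).

class SchemaOrgRecipeParser
{
    public function parse(array $page): array
    {
        return [
            'title'       => $this->title($page),
            'ingredients' => $this->ingredients($page),
        ];
    }

    protected function title(array $page): string
    {
        return trim($page['name'] ?? '');
    }

    protected function ingredients(array $page): array
    {
        return $page['recipeIngredient'] ?? [];
    }
}

// "Site X uses schema.org's format, except it appends its own name to titles."
class SiteXParser extends SchemaOrgRecipeParser
{
    protected function title(array $page): string
    {
        // Strip the made-up " | Site X" suffix, then defer to the generic rule.
        return preg_replace('/\s*\|\s*Site X$/', '', parent::title($page));
    }
}

$recipe = (new SiteXParser())->parse([
    'name'             => 'Pancakes | Site X',
    'recipeIngredient' => ['flour', 'eggs', 'milk'],
]);
```

Only the overridden piece lives in the site-specific class; everything else is inherited, which is the property that embedded structs and multimethods had to stand in for.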

Clojure

As a lisper bent on proving that a lisp was a viable option, I started with Clojure.

I spent approximately three days figuring out how to even model the problem without classic OO inheritance, and the following two building three generic parsers and two site-specific ones.

It was one of the most enjoyable weeks of my tenure. Clojure is as flexible as you'd expect a Lisp to be, the code was compact, reasonably fast, and I was reminded of how much I'd been missing interactive development.

When I presented it alongside the PHP it was a port of, reactions were mixed. Some developers remarked that they had a very difficult time understanding it, and that it would likely prevent the developers who worked on other codebases (e.g., our mobile developers) from swooping in to make a quick fix. My boss, who had apparently spent a significant amount of time attempting to do something useful in Clojure without much success, remarked that the comparison made Clojure much easier to understand for him than it had been previously.

Go

Next I tried Go. Having already faced a lack of inheritance, and presented with Go's much smaller field of options to choose from, I was able to jump right in, and hit parity with the Clojure version in short order.

I did not particularly enjoy programming in Go. While Go is strongly opinionated in surprisingly good ways, the edit-compile-debug cycle is obnoxious and slow (goauto may have helped reduce this pain), the code ends up verbose and repetitive, and there are few if any facilities for abstracting out patterns.

The Go code received less objection when presented. The lines were significantly longer than either the PHP or Clojure versions, and vertically it was very slightly less dense than the PHP version due to all the error checks.

Subjective Comparisons

Code Feel

[Three images, captioned PHP, Clojure, and Go]

Is it any wonder I prefer lisps? So much meatier.

Learning Curve / Programmer Effectiveness

Go was the easiest to pick up, but there isn't really anywhere left to go once you do. Clojure is definitely harder to become proficient with, but you can leverage that into increased programmer efficiency later. PHP has so many pitfalls that anyone starting out is likely to end up negatively productive at first, writing a horribly insecure buggy mess.

Developer Emotions

Clojure inspired a mixture of enthusiasm and curiosity, with strong concerns for the practicality of finding and training developers in a language which is so different from mainstream ones. Go had an enthusiastic champion in my boss, but was largely met with ennuitic "meh"s by other developers (golang is basically boring-by-design). PHP was loved by none and hated by some, with the only argument for it being "we've already got a bunch of code written in PHP".

Recommendation

I liked Clojure. It's fun to write in, it feels very productive, and the power lisps provide to build your own abstractions is unparalleled. But as varied as the quality of code was in our PHP, providing room for even more variability is a hard case to argue. While I would have loved to recommend Clojure, asking a team of devs who think in terms of C++ to make the transition was more than I could have sold.

Go, on the other hand, solves a lot of problems that we had with PHP. Stuff the wrong type? Go would have caught that. Inconsistently-formatted code? gofmt would have fixed that. Bugs because PHP did something unexpected unless you thoroughly read and retained every aspect of the documentation? Go's house is not built from bundles of thorns.

So while Go may not be the language for me, it would have been the right call for our team.

Available For Work

As you may have guessed by the mention of my previous employer folding, I am presently available for work. Feel free to contact me if you have something you think I may be interested in.

Apologies for the short notice--I've been rushed just trying to get the darn thing ready--but I will be giving a talk at Iowa Code Camp on Saturday, November 1st.

If you can't attend, or want to review my talk before/after I give it, you can find it here. Feel free to offer any feedback. It's much too late to make major changes at this point, but feedback is welcome regardless.

(Technically the talk is about PHP, but tagged Lisp because I blame Common Lisp for inspiring a bunch of it.)

I was highly amused by the reactions to my previous post on the programming and php subreddits. So I spent a little time isolating the optional-argument issue with typehint conversion.

It’s not me, it’s PHP. And frankly, I don’t understand the point of the backtrace going to the trouble of making things references if doing things with those references isn’t even supported.

That bug description isn’t entirely accurate as to what’s going on, however, as some test cases show.

<?php

function f(int $a, int $b = null, int $c = null) {
  error_log(json_encode([$a, $b, $c]));
  error_log(json_encode(func_get_args()));
}

function changeArg($code, $str) {
  preg_match('/^Argument (\\d+) /', $str, $m);
  $i = $m[1]-1;
  $bt = debug_backtrace(null, 2);
  ++$bt[1]['args'][$i];
  return true;
}

set_error_handler('changeArg', E_ALL);

f(3, 2, 1);

Outputs:

[4,2,1]
[4,3,2]

Note that the required argument changes, but the optional argument does not. Also note that func_get_args() returns the altered argument, in disagreement with the actual argument.

Where it gets interesting is if we alter that $i to be $i+1. That is, if we change the argument after the one we got an error about.

<?php

function f(int $a, int $b = null, int $c = null) {
  error_log(json_encode([$a, $b, $c]));
  error_log(json_encode(func_get_args()));
}

function changeArg($code, $str) {
  preg_match('/^Argument (\\d+) /', $str, $m);
  $i = $m[1]-1;
  $bt = debug_backtrace(null, 2);
  if (count($bt[1]['args']) > $i+1) {
    ++$bt[1]['args'][$i+1];
  }
  return true;
}

set_error_handler('changeArg', E_ALL);

f(3, 2, 1);

Outputs:

[3,3,2]
[3,3,2]

Then the change sticks, and the optional argument is modified.

By switching to $i-1, you’ll notice that even the required argument fails to be modified. Which means in the default case where one tries to modify the argument an error was triggered about, the argument’s reference is being broken slightly earlier for optional arguments than for required arguments. That, to me, strongly suggests a bug, because the behavior is inconsistent.

But my use-case isn’t supported, and presumably it doesn’t affect any other use-cases, so I’ll either have to live with it or write a code pre-processor which fixes up the issue. Ah well.

I'm left confused about why there exists such a thing as continuable errors if fixing the error and continuing on isn't supported, but ... PHP isn't exactly known for its strong sense of feature coherency.

Controller Methods

What we call a “controller method”, at least, is simply a function which is called on the web side via AJAX and returns some value back (generally either JSON or a snippet of HTML).

When I started at my current employer, controller methods looked like this:

class MyController extends Controller {
  function f() {
    if (!$this->required('foo', 'bar', 'baz')) return;
    if (!is_numeric($this->requestvars['foo'])) return;
    // lots of code
    if (isset($this->requestvars['quux'])) {
      // do something with quux
    }
    // more code
    return $someString;
  }
}

Aside from being hideous, this has a number of glaring problems:

  1. The error handling is terrible.
  2. Validation is hard, and thus incredibly easy to screw up or forget entirely.
  3. Even something that should be easy―merely figuring out how to call the method―requires reading and understanding the entire method.

This just won’t do.

Surely it would be much nicer if controller methods could simply be defined like a regular function. Fortunately, PHP offers some manner of reflective capabilities, meaning we can ask it what arguments a function takes. We can then match up GET/POST parameters by name, and send the function the proper arguments.

In other words, we can define the function more like:

class MyController extends Controller {
  function f($foo, $bar, $baz, $quux = null) {
    // lots of code
    if (isset($quux)) {
      // do something with quux
    }
    // more code
    return new Response\HTML($someString);
  }
}

And have it actually work. That’s much nicer. Now, we can call the method from PHP as easily as we call it from JavaScript, and we don’t have to read the entire function to figure out what arguments it takes.

(The astute reader will also notice I’ve moved to returning an object, so the response has a type. This is super-handy, because now it’s easy to ensure we send the appropriate content-type, enabling the JS side to do more intelligent things with it.)
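Matching request parameters to arguments by name can be sketched with PHP's reflection API roughly as follows. This is a minimal sketch and not the actual dispatcher: callWithRequest() and DemoController are hypothetical names, and the code assumes PHP 7.2 or later for the object type declaration.

```php
<?php
// Hypothetical dispatcher: maps request parameters onto a method's
// arguments by name, using reflection to discover what the method takes.
function callWithRequest(object $controller, string $method, array $request)
{
    $reflection = new ReflectionMethod($controller, $method);
    $args = [];
    foreach ($reflection->getParameters() as $param) {
        $name = $param->getName();
        if (array_key_exists($name, $request)) {
            $args[] = $request[$name];            // supplied by the client
        } elseif ($param->isDefaultValueAvailable()) {
            $args[] = $param->getDefaultValue();  // optional and omitted
        } else {
            throw new InvalidArgumentException("Missing required parameter: $name");
        }
    }
    return $reflection->invokeArgs($controller, $args);
}

class DemoController
{
    public function f($foo, $bar, $baz, $quux = null)
    {
        return [$foo, $bar, $baz, $quux];
    }
}

// The request's key order does not matter; the signature decides.
$result = callWithRequest(
    new DemoController(),
    'f',
    ['baz' => 'c', 'foo' => 'a', 'bar' => 'b']
);
```

In a real controller, $request would presumably come from $_GET or $_POST, and the missing-parameter exception would be turned into an error response.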

Of course, this only tells us which arguments it takes, and whether they’re optional or required. We still need easier data validation. PHP provides type hints, but they only work for classes. Or do they?

Type Hints

In a brazen display of potentially ill-advised hackery (our code is a little more involved, but that should give you the general idea), I added an error handler that enables us to define non-class types to validate things.

So now we can do this:

class MyController extends Controller {
  function f(
    int $foo,
    string $bar,
    string $baz,
    int $quux = null
  ) {
    // lots of code
    if (isset($quux)) {
      // do something with quux
    }
    // more code
    return new Response\HTML($someString);
  }
}

And all the machinery ensures that by the time f() is executing, $foo looks like an integer, as does $quux if it was provided.

Now the caller of the code can readily know what the value of the variables should look like, and the programmer of the function doesn’t really have an excuse for not picking a type because it’s so easy.
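The error-handler approach depends on how the PHP of the time reported a failed type hint. As a rough illustration of the same "looks like an integer" check, here is a sketch that uses a different mechanism: reflection plus filter_var(), on PHP 7.1 or later, where scalar type declarations are native. The function name validateAgainstTypes() and the error wording are assumptions.

```php
<?php
// Hypothetical validator: checks raw request strings against the scalar
// types a function declares. A swapped-in technique (reflection), not the
// error-handler hack described in the text.
function validateAgainstTypes(ReflectionFunctionAbstract $fn, array $request): array
{
    $errors = [];
    foreach ($fn->getParameters() as $param) {
        $name = $param->getName();
        $type = $param->getType();
        if (!array_key_exists($name, $request) || !$type instanceof ReflectionNamedType) {
            continue; // absent or untyped: nothing to check here
        }
        if ($type->getName() === 'int'
            && filter_var($request[$name], FILTER_VALIDATE_INT) === false) {
            $errors[] = "$name must look like an integer";
        }
    }
    return $errors;
}

$f = function (int $foo, string $bar, ?int $quux = null) {
    // controller body would go here
};

$errors = validateAgainstTypes(
    new ReflectionFunction($f),
    ['foo' => '12', 'bar' => 'x', 'quux' => 'abc']
);
```

Here '12' passes and 'abc' is rejected, so $errors ends up with a single entry for quux.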

Of course, this isn’t sufficient yet either. For instance, if I’d like to be able to pass a date into the controller, it has to be a string. Then the writer of the controller has to convert it to an appropriate class. Surely it’d be much nicer if the author of the controller method could say “I want a DateTime object”, which would be automagically converted from a specially-formatted string sent by the client.

Type Conversion via Typehints

Because PHP provides references via the backtrace mechanism, we can modify the parameters a function was called with.

class MyController extends Controller {
  function f(
    int $foo,
    string $bar,
    DateTime $baz,
    int $quux = null
  ) {
    // lots of code
    if (isset($quux)) {
      // do something with quux
    }
    // more code
    return new Response\HTML($someString);
  }
}

So while $baz might be POSTed as baz=2014-08-16, what f() gets is a PHP DateTime object representing that date. Due to the implementation mechanism, even something as simple as:

$mycontroller->f(1, "bar", "2014-08-16");

Will result in $baz being a DateTime object inside f().
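The conversion step itself can be illustrated without the backtrace trick. The sketch below converts arguments up front using reflection, which is a different mechanism from the one described above and only covers calls routed through the converter. convertArguments() is a hypothetical name, and only DateTime is handled.

```php
<?php
// Hypothetical converter: builds objects for class-typed parameters from
// the strings a client sent. Uses reflection up front rather than
// modifying arguments through the backtrace.
function convertArguments(ReflectionFunctionAbstract $fn, array $request): array
{
    $args = [];
    foreach ($fn->getParameters() as $param) {
        $name    = $param->getName();
        $type    = $param->getType();
        $default = $param->isDefaultValueAvailable() ? $param->getDefaultValue() : null;
        $value   = $request[$name] ?? $default;
        if (is_string($value)
            && $type instanceof ReflectionNamedType
            && $type->getName() === DateTime::class) {
            $value = new DateTime($value); // e.g. "2014-08-16"
        }
        $args[$name] = $value;
    }
    return $args;
}

$f = function (int $foo, string $bar, DateTime $baz, ?int $quux = null) {
    return $baz->format('Y-m-d');
};

$args = convertArguments(
    new ReflectionFunction($f),
    ['foo' => 1, 'bar' => 'bar', 'baz' => '2014-08-16']
);
$formatted = $f(...array_values($args));
```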

Caveat

There is an unfortunate caveat, and I have yet to figure out if it’s a quirk of the way I implemented things, or a quirk in the way PHP is implemented, but optional arguments do not change. That is, SomeClass $var = null will result in $var still being a string. func_get_args() will contain the altered value, however.

Multiple Inheritance and Method Combinations

PHP is a single inheritance language. Traits add some ability to build mixins, which is super-handy, but they come with some annoying restrictions, particularly around calling methods: you can’t define a method in a trait, override it in a class which uses the trait, and then call the trait method from the class method. At least, not easily and generally.

Plus there’s no concept of method combinations. It’d be really handy to be able to say “hey, add this stuff to the return value” (e.g., by appending to an array) and have it just happen, rather than having to know how to combine your stuff with the parent method’s stuff.

While I’m sad to say I don’t have this working generally across any class, I have managed to get it working for a particular base class where it’s most useful to our codebase. Subclasses and traits can define certain methods, and when called, the class hierarchy will be automatically walked and the results of calling each method in the hierarchy will be combined.

trait BobsJams {
  static function BobsJams_getAdditionalJams() {
    return [ new CranberryJam(), new StrawberryJam() ];
  }
}

trait JimsJams {
  static function JimsJams_getAdditionalJams() {
    return [ new BlackberryJam() ];
  }
}

class Jams {
  function getJams() {
    return (new MethodCombinator([], 'array_merge'))
      ->execute(new ReflectionClass(get_called_class()), "getAdditionalJams");
  }
}

class FewJams extends Jams {
  static function getAdditionalJams() {
    return [ new PineappleJam() ];
  }
}

class LotsOJams extends FewJams {
  use BobsJams;
  use JimsJams;

  static function getAdditionalJams() {
    return [ new OrangeJam() ];
  }
}

(new LotsOJams())->getJams();
// => [ OrangeJam, CranberryJam, StrawberryJam, BlackberryJam, PineappleJam ]

(The somewhat annoying prefix on the traits’ method names is to avoid forcing users of a trait to deal with name collisions.)

Naturally, all the magic of the getJams() method is hidden away in the MethodCombinator class, but it just walks the class hierarchy―traits included―using the C3 Linearization algorithm, calls those methods, and then combines them all using the combinator function (in this case, array_merge).
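To give a feel for what such a combinator does, here is a deliberately simplified sketch. It is not the MethodCombinator class and it does not implement C3 linearization: it just walks the parent chain, visiting each class's own method and then that class's traits. combineMethods() and the demo classes are hypothetical, strings stand in for the jam objects, and static methods are assumed.

```php
<?php
// Hypothetical, simplified combinator: a plain parent-chain walk (not C3).
function combineMethods(string $className, string $method, array $initial, callable $combine): array
{
    $result = $initial;
    for ($class = new ReflectionClass($className); $class !== false; $class = $class->getParentClass()) {
        // The class's own definition, skipping copies inherited from a parent.
        if ($class->hasMethod($method)
            && $class->getMethod($method)->getDeclaringClass()->getName() === $class->getName()) {
            $result = $combine($result, $class->getMethod($method)->invoke(null));
        }
        // Trait contributions follow the "TraitName_method" prefix convention.
        foreach ($class->getTraits() as $trait) {
            $prefixed = $trait->getShortName() . '_' . $method;
            if ($class->hasMethod($prefixed)) {
                $result = $combine($result, $class->getMethod($prefixed)->invoke(null));
            }
        }
    }
    return $result;
}

trait BobsDemo {
    static function BobsDemo_extras() { return ['Cranberry', 'Strawberry']; }
}
trait JimsDemo {
    static function JimsDemo_extras() { return ['Blackberry']; }
}
class DemoBase {}
class DemoFew extends DemoBase {
    static function extras() { return ['Pineapple']; }
}
class DemoLots extends DemoFew {
    use BobsDemo;
    use JimsDemo;
    static function extras() { return ['Orange']; }
}

$all = combineMethods('DemoLots', 'extras', [], 'array_merge');
```

The most-derived class contributes first and its ancestors last; a real C3 walk matters once the hierarchy has diamonds or traits that use other traits, which this sketch ignores.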

This, as you might imagine, greatly simplifies some code.

Oh, but you’re not impressed by shoehorning some level of multiple inheritance into a singly-inherited language? Fine, how about…

Context-Sensitive Object Behavior

Web code tends to be live, while mobile code is harshly asynchronous (as in: still needs to function when you have no signal, and then do something reasonable with data changes when you do have signal again), so what we care about changes between our Mobile API and our Web code, and yet we’d still like to share the basic structure of any given piece of data so we don’t have to write things twice or keep twice as much in our heads.

Heavily inspired by Pascal Costanza’s Context-Oriented-Programming, we define our data structures something like this:

class MyThing extends Struct {
  public $partA;
  public $userID;
  // ...
  function getAdditionalDefaultContextualComponents() {
    return [ new MyThingWebUI(), new MyThingMobileAPI() ];
  }
}

class MyThingWebUI extends Contextual {
  public $isReadOnly;
  // ...
  function getApplicableLayer() { return "WebUI"; }
}

class MyThingMobileAPI extends Contextual {
  public $partB;
  // ...
  function getApplicableLayer() { return "MobileAPI"; }
}

The two Contextual subclasses define things that are only available within particular contexts (layers). Thus, within the context of WebUI, MyThing appears from the outside to look like:

{
  "partA": "foo",
  "userID": 12,
  "isReadOnly": false
}

But within the Mobile API, that same $myThing object looks like:

{
  "partA": "foo",
  "userID": 12,
  "partB": "bar"
}

In addition to adding new properties, each layer can also exclude keys from JSON serialization, add aliases for keys (thus allowing mobiles to send/fetch data using old_key, when we rename something to new_key), and probably a few other things I’m forgetting.
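The core of the layer idea, the same object exposing different properties depending on the active context, can be sketched as follows. This is a guess at the general shape rather than the real Struct and Contextual classes: the Demo-prefixed classes, getContextualComponents(), and serializeFor() are all hypothetical, and key exclusion and aliasing are left out.

```php
<?php
// Hypothetical sketch of layer-dependent serialization.
abstract class DemoContextual
{
    abstract public function getApplicableLayer(): string;
}

abstract class DemoStruct
{
    /** @return DemoContextual[] */
    abstract public function getContextualComponents(): array;
}

class DemoThingWebUI extends DemoContextual
{
    public $isReadOnly = false;
    public function getApplicableLayer(): string { return 'WebUI'; }
}

class DemoThingMobileAPI extends DemoContextual
{
    public $partB = 'bar';
    public function getApplicableLayer(): string { return 'MobileAPI'; }
}

class DemoThing extends DemoStruct
{
    public $partA = 'foo';
    public $userID = 12;

    public function getContextualComponents(): array
    {
        return [new DemoThingWebUI(), new DemoThingMobileAPI()];
    }
}

function serializeFor(DemoStruct $struct, string $layer): array
{
    // Called from outside any class, get_object_vars() sees public properties only.
    $out = get_object_vars($struct);
    foreach ($struct->getContextualComponents() as $component) {
        if ($component->getApplicableLayer() === $layer) {
            $out = array_merge($out, get_object_vars($component));
        }
    }
    return $out;
}

$web    = serializeFor(new DemoThing(), 'WebUI');
$mobile = serializeFor(new DemoThing(), 'MobileAPI');
```

Passing either array to json_encode() would give output of the same shape as the two JSON views above.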

Conclusion

PHP is remarkably malleable. Error handlers can be used as a poor man’s handler-bind (unlike exceptions, they run before the stack is unwound, but you’re stuck dispatching on regular expressions if you want more than one); scalar type hints can be provided as a library; and traits can be abused to provide a level of multiple inheritance well beyond what was intended. While this malleability is certainly handy, I miss writing code in a language that doesn’t require jumping through hoops to provide what feel like basic facilities. But I’m also incredibly glad I can draw from the well of ideas in Common Lisp and bring some of that into the lives of developers with less exposure to the fantastic facilities Lisp provides.
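The handler-bind comparison can be made concrete with a small sketch. PHP allows only one active error handler at a time, so several logical handlers have to share it and dispatch on the message text. The message format, $handlers table, and loadSetting() below are made up for illustration.

```php
<?php
// Hypothetical sketch: one error handler dispatching to several logical
// handlers by regular expression on the error message.
$log = [];
$handlers = [
    '/^Missing setting: (\w+)$/' => function (array $m) use (&$log) {
        $log[] = "using default for {$m[1]}";
    },
];

set_error_handler(function (int $code, string $message) use ($handlers) {
    foreach ($handlers as $pattern => $handler) {
        if (preg_match($pattern, $message, $m)) {
            $handler($m);
            return true;  // handled: execution resumes after trigger_error()
        }
    }
    return false;         // not ours: fall through to PHP's default handling
});

function loadSetting(string $name): string
{
    // The handler runs here, before anything is unwound.
    trigger_error("Missing setting: $name", E_USER_WARNING);
    return 'continued';   // reached because the handler returned true
}

$status = loadSetting('timezone');
restore_error_handler();
```

Unlike a thrown exception, the warning does not abandon loadSetting(): the handler runs while the frame is still live, and the function carries on afterwards.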

Bonus!

My employer is desperate for user feedback, and as such is offering a free eight week trial. So if you want to poke at stuff and mock me when things don’t work very well (my core areas are nutritional analysis for recipes and food-related search results), that’s a thing you can do.

If you're outside the US, I should warn you that we have a number of known bugs and shortcomings you're much more likely to hit (we use a US-based product database; searching for things outside ASCII doesn't work due to MySQL having columns marked as the wrong charset; and there's a lot of weirdness around time because most user times end up stored as unix timestamps). The two bugs will be fixed eventually, but since they're complicated and the US is our target market, they're not exactly at the top of the list.
