In Depth Tutorial on Writing a Slackbot

(This is a repost of an article I wrote on Monsoon’s blog prior to Capital One acquiring us.)

At Monsoon (my employer), we are avid users of Slack. It’s a great collaboration tool in addition to adding a new social dimension to the office. We just crossed 500k messages sent over the platform and we’ve only been on it for a few months!

We recently held a 4-hour slackathon at Monsoon where people were tasked with writing the most useful Slack bot they could think up. The winner was a secret polling script that we use to vote on controversial topics such as what to name our teams or who the coolest person in the office is. We chose to do our event using Hubot, a popular open source bot framework written by GitHub. Half of the participants were mobile developers, so JavaScript, the Hubot scripting language, was foreign to them. We spent a few minutes prior to the event training everybody on how to write scripts to ensure an even playing field.

I’d like to share our Slack tutorial with the rest of the community.

Setting it up takes 5 minutes

To get started with the tutorial, you’ll need to set up your machine by installing Hubot. The most important steps are the first two: clone the repo and then run:

$ npm install

Once you’ve done that, you’re ready to write scripts! In the scripts folder you’ll find a file called slackbot-examples.coffee. This example script is a more feature-rich set of examples than the default that comes with Hubot. We’ll go over these examples in greater detail below.

The first thing to notice is that this is a “coffee” file. CoffeeScript is a language that compiles into JavaScript. It is popular with some communities due to its terseness compared to JavaScript. If you don’t like it, you’re welcome to write scripts in JavaScript by naming your file .js instead of .coffee.

Talking to the Bot – robot.respond

In the next step, we start working with a bot.  All bot behaviors start with a listener. The first one we’ll review listens for messages directed at the bot.
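
The listener from the example script is not reproduced in this repost, so here is a minimal sketch of what a robot.respond listener for “sleep it off” could look like, written in JavaScript rather than CoffeeScript. The function name sleepItOff is made up for illustration; in a real Hubot script the function would be assigned to module.exports.

```javascript
// A minimal sketch of a "respond" listener (not the original example file).
// In a real Hubot script: module.exports = sleepItOff;
function sleepItOff(robot) {
  // robot.respond only fires for messages addressed to the bot, either as
  // "@botname sleep it off" in a room or as a private message to the bot.
  robot.respond(/sleep it off/i, (msg) => {
    // msg.send posts to the room the triggering message came from.
    msg.send('zzz...');
  });
}
```

Hubot hands the robot object to the exported function when it loads the script, so all listeners are registered inside that function.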

When you mention the bot directly in a room via @botname, followed by the command, the bot will execute the above block of code. In this case, it will look for the text “@botname sleep it off.” This behavior will also trigger if you privately message the bot with the text “sleep it off.”

Either of these will trigger the bot to run the command msg.send ‘zzz…’

Making the Bot Say Stuff – msg.send

Now that the bot is listening for messages directed at it, let’s see if we can get it to talk back. The msg.send command tells the bot to send a message to the current chat room (the one that told it to “sleep it off”). In this case, the bot will say, “zzz…” publicly. msg.send always replies in the same channel where the bot detected the original message.

It Sees Everything – robot.hear

We’ve already programmed a bot to respond to messages directed at it, but you can program a bot to listen to conversation anywhere in the office, and respond to a specific word or phrase.  The second type of listener is robot.hear, a blanket listener that reacts to a phrase regardless of who it is directed at.
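
Since the original listener is not shown here, the following is a hedged reconstruction of the “up” example in JavaScript. The function name countUps and the exact regular expression are assumptions; the key name everything_uppity_count comes from the article.

```javascript
// Hypothetical reconstruction of the "up" listener (names are illustrative).
// In a real Hubot script: module.exports = countUps;
function countUps(robot) {
  // \bup\b matches "up" as a whole word, so "update" is ignored.
  // The pattern used in the original example script is an assumption.
  robot.hear(/\bup\b/i, (msg) => {
    // Initialize the counter on first use, then increment it.
    // The bot deliberately says nothing back.
    const count = robot.brain.get('everything_uppity_count') || 0;
    robot.brain.set('everything_uppity_count', count + 1);
  });
}
```

Unlike robot.respond, robot.hear sees every message in every room the bot has joined, which is why both “up” and “@botname up” match.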

In this example, we are using a regular expression that looks for the word “up” (and not just a phrase containing it). If anybody says “up,” this block will trigger. It will also trigger if you direct “up” at the bot; both of these would trigger the behavior:

Michi: up
Michi: @botname up

It Can Remember – robot.brain

Bots can also store information for retrieval later.  In the “up” example above, the robot initializes and/or increments a value which we save as everything_uppity_count. It does nothing more. In this case, a user can say, “up” all they want and nothing will seemingly happen while the counter increases. This is done through the “brain,” which is a simple key-value store.

Note that the “brain” uses Redis to store its contents. This way, if the bot restarts, the data is preserved. If Redis is not running, the bot will still function, but all data is lost next time the bot restarts.

In the second example of robot.hear, the bot retrieves the current value of everything_uppity_count and displays it via msg.send. As a reminder, this means the robot replies in the same chat room where it heard the “are we up?” statement.
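
A sketch of that second listener might look like the following. The function name reportUps and the wording of the reply are made up; robot.brain.get and msg.send are the Hubot calls described above.

```javascript
// Hypothetical sketch of the "are we up?" listener.
// In a real Hubot script: module.exports = reportUps;
function reportUps(robot) {
  robot.hear(/are we up\?/i, (msg) => {
    // The brain returns null for keys that were never set, so fall back to 0.
    const count = robot.brain.get('everything_uppity_count') || 0;
    msg.send(`I have heard "up" ${count} times`);
  });
}
```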

Calling People Out – msg.reply

Bots can add tailored prefixes to their responses. You can use the command msg.reply for this. msg.reply probably does *not* do what you think it does. Rather, it acts like msg.send except that it prefixes the response with the name of whoever authored the message it is replying to.
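
The example this section refers to is not included in the repost. A minimal JavaScript sketch, assuming a listener that echoes “What’s up!” back at the sender (the function name whatsUp is illustrative), could be:

```javascript
// Hypothetical sketch of a msg.reply listener.
// In a real Hubot script: module.exports = whatsUp;
function whatsUp(robot) {
  // Accept both the straight and the curly apostrophe.
  robot.hear(/what['’]s up/i, (msg) => {
    // msg.reply works like msg.send, but the adapter prefixes the
    // response with the name of the user who wrote the message.
    msg.reply("What's up!");
  });
}
```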

In the above example, the script will simply reply to the original sender as illustrated in the following theoretical exchange:

Michi: What’s up!
Bot: @Michi What’s up!

Note that the reply is in the channel where you sent the original message. If this was a private chat room between you and the bot, the reply would have appeared there.

Replying to Private Messages – Advanced msg.send

Handling private messages is a little more tricky. This is because Hubot doesn’t treat private messages differently from any other types of message. Instead, you have to examine the room that the message is sent in.

In order to reply to a private message, you need to check if the room shares the name with the bot. If the channel names are the same, it is implied that the channel is a private channel. We’ve provided the helper methods to accomplish this:
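
The helper methods themselves are missing from this repost. Going only by the description above (compare the room name with the bot’s name), a sketch might look like this. The names isPrivateMessage and secretKeeper are hypothetical, and whether the room name really equals the bot’s name depends on the adapter version, so treat the comparison as an assumption.

```javascript
// Hypothetical helper: true when the message arrived in a private chat.
// Assumes, per the article, that a private room is named after the bot.
function isPrivateMessage(robot, msg) {
  return msg.message.room === robot.name;
}

// In a real Hubot script: module.exports = secretKeeper;
function secretKeeper(robot) {
  robot.hear(/secret/i, (msg) => {
    // Only answer when the conversation is private.
    if (isPrivateMessage(robot, msg)) {
      msg.send('Your secret is safe with me.');
    }
  });
}
```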

Starting New Private Conversations – robot.messageRoom

Sending unsolicited private messages is more straightforward. Just remember that private messages are just another room named after a user. To accomplish this, simply tell the bot to message a room:

Notice that an error can be thrown if the channel is invalid. In that case, it’s a good idea to catch the error so that the bot does not crash.
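
As a rough sketch of that advice, the call to robot.messageRoom can be wrapped in a try/catch. The helper name pokeUser and its return value are made up for illustration.

```javascript
// Hypothetical helper that starts a private conversation with a user.
// Returns true when the message was handed to the adapter, false otherwise.
function pokeUser(robot, username, text) {
  try {
    // A private conversation is addressed like any other room.
    robot.messageRoom(username, text);
    return true;
  } catch (err) {
    // An invalid room should not take the whole bot down.
    robot.logger.error(`Could not message ${username}: ${err.message}`);
    return false;
  }
}
```

Inside a script you would call it as pokeUser(robot, 'michi', 'Time to vote!').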

More Examples!

The example script file also includes an example of how to run a web service in the bot to listen for external data sources (such as a GitHub webhook) and how to trigger/watch custom events. Take a look – and when you’ve finished, hopefully you’ll have as much fun designing bots and expanding your office interactions and conversations as we have had here at Monsoon!
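
For readers without the example file at hand, here is a small sketch of both ideas together: an HTTP endpoint on robot.router that re-publishes its payload as a custom event, and a listener for that event. The URL path, the event name “deploy”, the room name, and the payload shape are all made up for illustration.

```javascript
// Hypothetical sketch: a webhook endpoint plus a custom event.
// In a real Hubot script: module.exports = webhookBridge;
function webhookBridge(robot) {
  // robot.router is the Express app Hubot runs for incoming HTTP requests.
  robot.router.post('/hubot/deploy-hook', (req, res) => {
    // Turn the HTTP request into a custom event that any script can watch.
    robot.emit('deploy', { room: 'general', project: req.body.project });
    res.send('OK');
  });

  // Watch the custom event and announce it in the chosen room.
  robot.on('deploy', (info) => {
    robot.messageRoom(info.room, `${info.project} was just deployed`);
  });
}
```

Keeping the HTTP handler and the chat logic on opposite sides of an event means other scripts can react to the same webhook without touching the route.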

Adding BasicAuth to the Kue Dashboard in an Express App

Do you use Kue and want to put it in its own folder that is password protected using BasicAuth or some other type of mechanism?

I know I’m not alone on this problem, yet nobody seems to have posted the concise answer. Here’s the solution.

Explanation

Modules are their own little world

In Express (Node), there are a few important points to know:

  1. Route declaration order matters
  2. Auth strategies are attached globally or to individual routes
  3. Modules tend to come with their own routes (Kue certainly does)

If you want to attach an auth strategy to Kue, you’d naturally want to attach it to Kue’s entire scope. First, I tried locking down the entire module:

// express.basicAuth ships with Express 3; it was removed in Express 4
var express = require("express");

module.exports = function (app, config, passport) {
  var kue = require("kue");
  var auth = express.basicAuth(function(user, pass, callback) {
    var result = (user === 'username' && pass === 'password');
    callback(null /* error */, result);
  });
  // any kue related settings can go here
  kue.app.set('title', 'Jobs');
  // first attempt: attach auth directly to kue's own app
  kue.app.use(auth);
  // bind kue's app to the desired path
  app.use('/secret_location/kue', kue.app);
};

Global auth can’t hook onto Kue

This fails to prompt for authentication. Why? Honestly, somebody better at Express can explain. I believe it has something to do with the way Kue is written, since app settings are first-come, first-served.

Sub app it

In the gist above, you can see I create a sub app by passing in Express and creating an instance inside the file. I then apply the global auth settings (“subApp.use(auth)”) to this app before wrapping it. See the relevant code here:

  var subApp = express();
  // add authentication to the entire sub app
  subApp.use(auth);
  // re-add kue.app (but don't put it in its own folder)
  subApp.use('', kue.app);
  // bind the subApp to the desired path
  app.use('/secret_location/kue', subApp);

Notice that I add the kue.app using a blank string as a first argument. That tells Express to put this sub-app in the same folder path as the app. Then, I bind that app to “/secret_location/kue.”

This works.
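
One caveat if you are on a newer Express: express.basicAuth was removed in Express 4, so the auth function has to come from somewhere else. Below is a sketch of the same sub-app approach with a hand-rolled BasicAuth middleware. The function names basicAuth and mountKue are made up, express and kue are passed in as arguments so the wiring is easy to follow, and mounting kue.app without a path is assumed to behave like the blank string used above.

```javascript
// Hand-rolled BasicAuth middleware in the usual (req, res, next) shape.
function basicAuth(expectedUser, expectedPass) {
  return function (req, res, next) {
    const header = req.headers.authorization || '';
    const [scheme, encoded] = header.split(' ');
    if (scheme === 'Basic' && encoded) {
      // Credentials arrive as base64("user:pass").
      const decoded = Buffer.from(encoded, 'base64').toString('utf8');
      const sep = decoded.indexOf(':');
      const user = decoded.slice(0, sep);
      const pass = decoded.slice(sep + 1);
      if (sep !== -1 && user === expectedUser && pass === expectedPass) {
        return next();
      }
    }
    // Missing or wrong credentials: ask the browser to prompt.
    res.statusCode = 401;
    res.setHeader('WWW-Authenticate', 'Basic realm="Jobs"');
    res.end('Unauthorized');
  };
}

// Wrap kue's app in a sub app so the auth middleware runs first.
function mountKue(app, express, kue, auth) {
  const subApp = express();
  subApp.use(auth);     // authentication for the entire sub app
  subApp.use(kue.app);  // kue's routes, mounted at the sub app's root
  app.use('/secret_location/kue', subApp);
  return subApp;
}
```

A call such as mountKue(app, require('express'), require('kue'), basicAuth('username', 'password')) would then replace the body of the module above.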

Adding a Centralized Event Dispatcher on Backbone.js

This article contains my solution to adding a simple global event dispatcher to Backbone. It should also help noobies (myself included) understand how Backbone events work (which has one big gotcha).

Sharing Events = Global Dispatcher

Today, I came across a fairly mundane – and probably common – problem: I have two views that need to talk to each other. An example for this would be when you have a status bar in one corner that gets updated when a user completes an action in another area. It turns out that Backbone (as far as I can tell) does not support this use case very well out of the box. Well, it does: it expects you to build that yourself.

Backbone supports a model where you publish and subscribe to things happening in your views/models (commonly known as an “event dispatcher”). For example, you can make your application do stuff when a user clicks on something or if a model attribute changes. The native solution localizes these events (probably a good thing) to the model/view that you’re working with. This works for simple stuff, but when you need it in other parts of your application, things break down. The short answer is that you need to build a global event dispatcher A.K.A. global event “hub.” Everything publishes (“trigger”) and listens (“bind”) to events from this object.

This is a common pattern. In fact, I found two solid articles on this topic. The first explains in detail why you need an event dispatcher. The second shows you how to make the dispatcher natively accessible by all of your Backbone classes.

Event Dispatcher Gotcha: Native Events

However, I felt both solutions weren’t quite there. The first forces you to pass around an Event object everywhere you need it. This is nice from a decoupling standpoint, but highly error prone. The second solution is nice, but makes the caller oblivious to the global event namespace they are triggering/binding to. This is dangerous because Backbone has native generic events that are fired when certain things happen. For example, when you edit a Model, it automatically fires a “change” event into its event dispatcher (which bubbles up to its Collection too). This means if that dispatcher was global, every object would see a “change” firing every time every Model changed. Not good.

Goals

I came up with my own solution that accomplishes three main goals:

  1. Retain native Backbone event triggering/binding
  2. Allow the developer to trigger/bind to global events
  3. Not require the developer to pass things around

Solution Code

(If you can’t read CoffeeScript, just use this converter to convert it back to JavaScript)

The following (very simple) code modifies your Backbone definition to attach the global dispatcher to all objects. The dispatcher is easily accessible from any Model, Router, View, or Collection through the global_dispatcher property. I thought about having it trigger events to the local dispatcher as well, but I decided that keeping them fully separate was in everybody’s best interest (to prevent events from accidentally colliding):
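
The CoffeeScript snippet is not included in this repost, so here is a JavaScript sketch of the idea rather than the original code. It assumes Backbone.Events is a plain object of event methods (which is how Backbone defines it); the function name attachGlobalDispatcher is made up, and Backbone is passed in as an argument.

```javascript
// Sketch: give every Backbone class access to one shared dispatcher.
function attachGlobalDispatcher(Backbone) {
  // A standalone object that carries only Backbone's event methods,
  // so it shares nothing with any Model's or View's own events.
  const dispatcher = Object.assign({}, Backbone.Events);

  // Put the same dispatcher on every prototype.
  [Backbone.Model, Backbone.Collection, Backbone.View, Backbone.Router]
    .forEach((klass) => {
      klass.prototype.global_dispatcher = dispatcher;
    });

  return dispatcher;
}
```

After calling attachGlobalDispatcher(Backbone) once at startup, any instance can use this.global_dispatcher.bind(...) and this.global_dispatcher.trigger(...) while its native bind and trigger keep working as before.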

Example Code

The next section is a simple class I wrote that will demonstrate how these events work. I am defining a simple Collection and a Model. Note that I named the global and local events the SAME. This was to help demonstrate that the namespaces are in fact completely separate (as in, just because they are named the same doesn’t mean a local event will ever trigger a global event with the same name).

The following snippet illustrates triggering the events attached to the Collection. Note that we add a Model into this collection (which has no impact on the output). The trigger_stuff method triggers both a local and global event (in that order). The output shows that the events were picked up in the order fired (local, then global). Note that the Collection also listens to the “model_custom_action” event, which coincides with an event the Model triggers. This is very important.

The order of operations looks like this:

  1. Fire a local event
  2. The Collection picks it up
  3. END OF FIRST EVENT
  4. Fire global event
  5. The Collection’s globally attached event handler picks it up
  6. END OF SECOND EVENT
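
The sequence above can be imitated without Backbone. The sketch below uses Node’s EventEmitter as a stand-in for both the Collection’s local events and the global dispatcher; the event name custom_action is made up, and the point is only that two dispatchers with identically named events never cross over.

```javascript
const EventEmitter = require('events');

// Stand-ins: one emitter for the Collection's local events, one global hub.
const globalDispatcher = new EventEmitter();
const collection = new EventEmitter();
const log = [];

// Same event name on both dispatchers, on purpose.
collection.on('custom_action', () => log.push('collection: local'));
globalDispatcher.on('custom_action', () => log.push('collection: global'));

// Mirrors trigger_stuff: fire the local event, then the global one.
function triggerStuff() {
  collection.emit('custom_action');
  globalDispatcher.emit('custom_action');
}

triggerStuff();
console.log(log); // [ 'collection: local', 'collection: global' ]
```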

The next snippet shows what happens when you fire an event on a Model inside a Collection. It is also why I decided to keep things separate. This is a little hairy, so pay special attention.

The order of operations looks like this:

  1. Fire a local event
  2. The Model picks it up
  3. The Collection picks it up since all events in children bubble up
  4. END OF FIRST EVENT
  5. Fire global event
  6. The Collection’s globally attached event handler picks it up
  7. The Model’s globally attached event handler picks it up
  8. END OF SECOND EVENT

Notice that in step #6, the Collection’s binding fires BEFORE the Model’s. This is key.

Step #3 fires because there is a LOCAL binding to the “model_custom_action” event in the Collection. The local binding is reacting to an event triggered in the child Model’s local event dispatcher. In other words, any event triggered from the local event dispatcher in a Model will bubble up to the Collection’s local event dispatcher and be otherwise indistinguishable from events triggered directly from that Collection.

Step #6, however, is different. That event did NOT originate from the child model’s local event dispatcher. Instead, it is reacting to the global bindings, which happen to share the same name as the event we saw earlier in step #3. Because it didn’t bubble up, the events are being processed in the order they were bound (via the bind() function). The Collection was defined before the Model, thus, the Collection’s global binding fires first.
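
To make the difference between steps #3 and #6 concrete, here is a sketch that again uses Node’s EventEmitter as a stand-in for Backbone’s events. Backbone forwards a Model’s events to its Collection automatically; the sketch emulates that with an explicit forwarding handler, which is an assumption of the example rather than how Backbone implements it.

```javascript
const EventEmitter = require('events');

const globalDispatcher = new EventEmitter();
const collection = new EventEmitter();
const model = new EventEmitter();
const log = [];

// Local bindings. The second model handler imitates bubbling by
// re-emitting the event on the collection.
model.on('model_custom_action', () => log.push('model: local'));
model.on('model_custom_action', () => collection.emit('model_custom_action'));
collection.on('model_custom_action', () => log.push('collection: local (bubbled)'));

// Global bindings: the collection binds first, the model second.
globalDispatcher.on('model_custom_action', () => log.push('collection: global'));
globalDispatcher.on('model_custom_action', () => log.push('model: global'));

model.emit('model_custom_action');            // steps 1 to 3
globalDispatcher.emit('model_custom_action'); // steps 5 to 7

console.log(log);
// [ 'model: local', 'collection: local (bubbled)',
//   'collection: global', 'model: global' ]
```

The global handlers run in the order they were bound, which is why the collection’s handler fires before the model’s.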

In this last snippet, we demonstrate how Models behave when not inside a Collection.

  1. Fire a local event
  2. The Model picks it up
  3. END OF FIRST EVENT
  4. Fire global event
  5. The Model’s globally attached event handler picks it up
  6. END OF SECOND EVENT

This is very straightforward if you managed to follow the last example. To get additional clarity, you may want to try renaming the events in the above class and re-running the examples.

By having the global dispatcher separate, you can consciously decide when an event should be “public,” as well as not clobber any existing Backbone functionality. Backbone is still really young (pre 1.0!), so I wanted to avoid using any solution that might break if they changed the internals. Also, completely preserving the behavior of event bubbling for Model-Collections is important to future proof my hack.

I hope this is useful for all of you Google visitors!

Browser Wars… Wait, That’s Still Going on Right?

Rewind 5 years. Ask any self-proclaimed nerd what the browser market shares were. Market share stats were like the stock market ticker of the Internet Nerds. Everybody knew about it, and everybody cared.

But what are the market shares today? Did you know that IE is below 50% by most measures? Did you know that Chrome ate up Firefox’s market share? And what’s Safari’s market share if all the iPhones and iPads use it?

You probably don’t know.

Because, who cares.

5-10 years ago, it mattered that IE had 70+% of the browser market because it directly influenced what was possible as an application developer. But mobile changed all that.

Mobile browsers ended up being the adoption wedge for HTML5 and alternatives to Flash thanks in large part to the fact that users — and developers — treated mobile as separate from regular browser apps. What a blessing in disguise: it let everybody start over. And once the mobile stuff got popular and apps broke, people blamed the bad mobile browser (“My ghetto Blackberry won’t load Facebook right!”) instead of the website. It was the perfect storm to force everybody to start adopting HTML5. Add in CSS/JavaScript standardizing tools (Modernizr, jQuery, GWT, etc.) and developers didn’t even have to do cross-browser testing for simple stuff.

Maybe this is a bad thing to admit, but I haven’t bothered testing in all browsers for a year or two now. Stuff just breaks less often. IE7 is “good enough,” and the other browsers work 99% of the time. Thus, the only time I bother checking browser compatibility is if I’m doing something super complex or a user complains.

Good job, Internet. Ya, the evil Microsoft IE empire is still around, but we won the war and nobody even noticed.

Ruby: Time Comparisons Seem Backwards Due to Asian Culture

I found a weird quirk in Ruby best explained by the fact it’s written by a Japanese dude. Nerd post below. In Ruby, when you compare two Time objects, you use the operator:

<=>

It compares two Time objects and then returns -1, 0, +1, or nil. So this is how it looks in practice:

comparison_result = today_time <=> yesterday_time

What’s the value of “comparison_result?” The answer is 1. 1? I stared at this for a long time and my inclination was to think it should return whichever side was bigger. There was very little documentation I could find on this topic, but I finally figured out why.

The reason WHY is that time flows down or backwards in Asian culture, and Ruby is written by a Japanese dude. Confirmation in this wiki article.

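So the conclusion is that this operator should be read as, “Which side is smaller?” As in: -1 = the left side is smaller, +1 = the right side is smaller, 0 = both are the same, and nil = the two values can’t be compared. The same rule holds for every comparable type in Ruby; for example, 2 <=> 1 also returns 1.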

PHP Tip: Always Put Constants on the Left in Boolean Comparisons

This was a standard I enforced at my last company:

Whenever you are doing a boolean check (such as an IF or WHILE), always place constants on the left side of the comparison.

The following is BAD:

// BAD
if($user == LOGGED_IN) {

The following is good:

// GOOD
if(LOGGED_IN == $user) {

Why is this such a big deal? Imagine the typo where you forget the second equals sign:

// Oops! This overwrites $user and is true whenever LOGGED_IN is truthy!
if($user = LOGGED_IN) {

This sort of bug is fairly common. C# went as far as requiring that conditions evaluate to a boolean type, which eliminates most accidental assignments. Well, since PHP can’t do that, this is the next best thing. Notice how this convention will save your butt:

// Parse error. Bug caught immediately.
if(LOGGED_IN = $user) {

Think about it. :)

Is Your Blog Not Receiving Pingbacks? I Fixed Mine.

I recently noticed that my blog was no longer registering pingbacks (the automatic in-comment notification that occurs when somebody else blogs about your post). I like these because they help me understand which of my articles are gaining traction.

My symptoms

  • My other blogs hosted on the same server seem to be pinging fine; however, those have far fewer posts and plugins
  • I am able to send pingbacks, apparently
  • But pingbacks TO my content were dropped (even when I am self-pinging)

The fix

I figured the issue was somehow related to my recent upgrades of WordPress. After scouring the web, I found that the issue was due to a poorly designed timeout setting in WordPress.

  1. Open wp-includes/cron.php in your blog folder
  2. Go to the line that starts with: wp_remote_post( in the spawn_cron function
  3. Change ‘timeout’ => 0.01 to ‘timeout’ => 1 (or any other far more reasonable value)

This will fix blogs that are plagued by this bug.

Autocast Variables Whitepaper: What I Want to See in PHP 6

Introduction to Autocasting

This is a white paper on a feature that does not exist in PHP. It is an idea I came up with and hashed out here in this article.

Autocast variables. An autocast variable is like a container for data — everything going into an autocast variable type will always be converted to the current type of that variable. As in, if you assign a string into an integer variable, the variable will become the integer representation of the string (via implicit and immediate typecasting).

The idea is a hybrid of limited type safety where only some variables are type safe and operator overloading of the equals sign on native and complex datatypes. To help explain the idea: it would act almost like somebody following around your cursor and typing (int), (string), etc. all over your code before variable assignments EXCEPT that it can also be done with non-native datatypes like classes.

The goal is to allow a developer to be – when desired – 100% certain they are working with a specific data type.

To declare a variable as an autocast, simply place a colon after the dollar sign in a variable name. Then, everything assigned to that variable is now automatically typecast to the datatype of the variable. For example:

// This variable is now a container for integers
$:orderTotal = 0;
// assign a float value
$:orderTotal = 1.01;
// outputs 1; 1.01 was typecast to an integer
echo $:orderTotal;

NOTE: Why the new syntax? I toyed with the idea of an autocast keyword, but the paradigm broke down when you started assigning objects. The problem is that objects are pass-by-reference. This meant a programmer could change the datatype of an autocast variable by altering its reference. The other problem was that by not having a visual marker, it would make things very confusing since one could never tell if they were working with an autocast until runtime. Lastly, why the dollar-colon? I would have preferred a straight colon, but most of the good single-character syntax would conflict with existing PHP systems (# is a comment, : is used in ternary operators, % is modulus, ^ is a bitwise operator, etc.). A dollar sign is universally understood as a variable, so I thought the next best thing was to alter the variable in a way that today’s PHP would recognize as invalid (and thus introducing the syntax would not conflict with legacy code).

The concept is simple, but gets more complicated as you introduce objects, magic methods, and method signatures into the equation. Don’t worry, I’ve thought about all of those scenarios. Key summary of benefits:

  • New coding paradigms allow for simpler interaction between different data types (see first Practical Example)
  • Refactoring can be done in a way never before possible (see second Practical Example)
  • Code is now more “reliable” because unintended data types aren’t used (such as during boolean checks)
  • Many fatal errors can now be avoided
  • Potential use in the realm of dependency injection
  • Possibilities for true function overloading since expected datatypes are known (although, this is possible today, to be honest)

Read on to learn more!

Autocasting: Defined

You can skip this section and review it later. Note that the rest of this article will review the specifics of autocasting. Here are the basic rules:

  1. Declaration: Autocast variables are set at declaration, but the actual data type is optional and is inferred during the first variable assignment
  2. Declaration: Once an autocast variable is explicitly cast or declared as a certain data type, it can no longer change data types
  3. Usage: The colon is part of the variable name. A variable with the same name without the colon is a different variable.
  4. Null: Null always counts as a different datatype and assigning it will always trigger autocasting behavior
  5. Arrays: For arrays, non-array values are inserted as the 0th index of the array and all other values are truncated
  6. Classes: For uninitialized objects, the constructor is automatically called prior to autocasting behavior (no arguments)
  7. Classes: For initialized objects, if no autocast magic method is defined, the assigned value is dropped and a warning is thrown
  8. Magic Method: __autocast is only called when a different datatype is being assigned
  9. Scope: Autocast behavior is linked to the declared variable, NOT its contents — think of it as a container. Assigning an autocast variable into a non-autocast variable creates a copy. This MUST be so because any other implementation would allow a developer to change the contents of the autocast variable by using a reference.

Forced Native Datatype Conversions

PHP would work exactly the same as before except that certain variables could be declared as autocast. When a variable is declared as a specific type, all data going into it is automatically cast to that type. For example:

$:counter = 0; // OK, this is now an INTEGER
$:counter = 1.01; // attempting to assign a float
echo $:counter; // outputs 1, not 1.01

Another example:

$:orderTotal = 0.00; // OK, this is now a FLOATING POINT NUMBER
$:orderTotal = "0.00"; // THIS IS A STRING BEING ASSIGNED
print_r($:orderTotal); // outputs 0 (a float, not the string "0.00")

Where did I get this idea? Actionscript 2, when I played with it years ago. In AS2, they had just introduced optional compile-time type checking. The goal was to allow developers to optionally set a variable’s type to trigger compile errors. I have been thinking about this solution for literally years. I didn’t like the notion of breaking existing PHP code and introducing strong typing. Besides, since PHP isn’t compiled, doing “compile time type checking” is fruitless. Thus, the solution is to encourage more thoughtful OOP by allowing developers to “declare” variable types.

What happens if you assign an autocast into a regular variable?

$:counter = 0; // OK, this is now an INTEGER
$counter = $:counter; // create a new counter variable
$counter = 1.01; // attempting to assign a float
$:counter = 1.01; // attempting to assign a float
echo $counter; // outputs 1.01
echo $:counter; // outputs 1

Answer: The assignment from an autocast to a regular variable makes a copy of the variable sans the autocast behavior.

Converting Things to Objects with __autocast

The idea does not stop at native types. I want to take this a step further and introduce magic methods that specifically deal with the autocasting behavior!

/**
 * Keeps everything assigned to it as base64-encoded data
 */
Class Borg {
    public $:slaves = "";
    /*
     * This is called during assignments of the wrong types
     *
     * @param $type the class name or native data type of the assigned value
     * @param $assignment the variable being assigned into the autocast
     */
    function __autocast($type, $assignment) {
        switch($type) {
            // note that I want to use namespaces here, but not everybody
            // has seen those yet in 5.3 and I don't want to distract from
            // the example... I made these constants up.
            case PHP_CONSTANTS_DATATYPE_STRING:
            case PHP_CONSTANTS_DATATYPE_INTEGER:
            case PHP_CONSTANTS_DATATYPE_DOUBLE:
            case PHP_CONSTANTS_DATATYPE_BOOLEAN:
                // for these native data types, just encode
                $this->:slaves = base64_encode($assignment);
                break;
            default:
                // non native data type! Borg defenses activate!
                $this->:slaves = base64_encode(serialize($assignment));
                break;
        }
    }
}

The following example assigns a string into the object, yet in the next line, the object remains intact — and taken over by the Borg!

// the variable is currently uninitialized, but is declared as type autocast Borg
Borg $:borg; // autocast variables can be declared differently
$:borg = $_POST['slave_names']; // ASSIGNING a string to the variable
echo $:borg->:slaves; // this works!

The following example shows how objects can also be converted using the magic method:

$:borg = new Borg; // the variable is now Borg
$:borg = new EnterpriseFodder(); // Did the $:borg become EnterpriseFodder?
// $:borg is still of type Borg and this outputs base64(serialize(EnterpriseFodder))
echo $:borg->:slaves;

Before we get too far, I also want to clarify that autocast objects CAN be overwritten so long as the assignment datatype is the same or of a derived child class.

$:borg = new Borg; // the variable is now Borg
$:borg->:slaves = 'I CHANGED IT'; // changes registered correctly
$:borg = new Borg(); // same type, so this assignment replaces the old instance
// empty because we overwrote the old Borg instance
echo $:borg->:slaves;

Because of the dollar-colon marker, autocast variables won’t introduce “invisible bugs.” Callers will be very aware that they are dealing with autocast variables.

Converting Objects to Other Types with __castTo

Another feature that should be possible is for objects to define how they are cast into other datatypes. For example, imagine the following code:

$status = new Status();
Boolean $:isReady;
// $:isReady = true
$:isReady = $status;

The problem with this situation is that I have no control in how the Status class is converted into a Boolean. In this case, it would probably just convert to true. Wouldn’t it be nice if I had control over that?

class Status {
    public $status = 0;
    function __castTo($type) {
        // The below uses namespaces, which is my preferred way PHP
        // would do things. I made these up.
        // We are checking if the attempt is to cast to a boolean
        if(\PHP\CONSTANTS\DATATYPE::BOOLEAN == $type) {
            // return the status (0) when cast to a boolean
            return $this->status;
        }
        else {
            // otherwise just settle with default behavior
            return $this;
        }
    }
}

$status = new Status();
$:isReady = true; // boolean

// $:isReady = 0 = false
$:isReady = $status;

// $check = non-empty object = 1
$check = (int) $status;
echo (bool) $check; // 1 = true

The return value of __castTo becomes the value that is presented to the casting operation. Thus in the example above, $:isReady would only see the value of $this->status (which is 0) because $:isReady wants a Boolean. For any other data type conversion, Status would return $this (itself). This is why the second cast operation behaves totally differently and ends up equaling 1. So in terms of order of operations, __castTo is called before any attempts at casting an object, giving the object a chance to define how it should be converted.

I did want to state that the __castTo concept is 100% possible without autocast. I think it might be a cool feature all on its own that just so happened to work very well with the autocast idea.

Global autocast magic function

Just like the object magic method __autocast, there should also be a global __autocast function. This function would allow a developer to override native autocast behavior. Note that if the __autocast magic function fails to cast a variable, then the native behavior should be triggered. Returning false will suppress the native assignment (so you must make the assignment yourself!):

function __autocast($assignee, $assigneeType, $assigner, $assignerType) {
    // for those of you that HATE autocasting, you can make it throw exceptions
    if($assigneeType != 'Borg') {
        throw new Exception('Autocast behavior was triggered');
        // alternatively returning false here would prevent an assignment
        // from happening
    }
    // return true so that regular autocast behavior is retained
    return true;
}

Function Argument Autocasting to Enhance Type Hinting

Autocasting functionality should be used to augment function method declarations as well:

function sortData(string $:data) {

Now your code that expects a string doesn’t need to check if the data is actually a string (which is just spaghetti code anyway). Why would you want to ensure a string? How many times have you tried to echo a variable and it printed “Array” because an array snuck in and replaced your variable?

Here’s another example of function argument autocasting:

function myXMLParsingFunction(XmlReader $:parser, $data) {

What’s this do? The idea is that if you pass in something I’m not expecting, the regular autocasting behavior is triggered right there. Now my method’s code can worry about how to parse the data rather than whether the parser is actually an instance of XmlReader. Key point: if the caller passes an autocast variable into an autocast argument (and the types match up), all regular pass-by-ref/value logic is used. If there is a mismatch, a copy is made instead.

Dynamic Autocasting

Inline autocasting should also be possible for variables that aren’t necessarily autocast. This functionality is important where you are method chaining (prevents fatal errors). For example:

$culprit = ((autocast Borg) getBorg())->toString();

Behind the scenes, if getBorg() returns something that is not a Borg, an in-memory Borg conversion takes place. The result is then used to make the toString() call. If we took the same example, but took away the chaining, we would see another side effect of autocasting:

$borg = (autocast Borg) getBorg();
$culprit = $borg->toString();

Since autocast behavior is associated with the declared variable and not the contents, autocast functionality would NOT be inherited by the $borg variable. This way, if something crazy happens inside the getBorg() method that we aren’t expecting, we can still be sure that we get back a datatype that we expect. If the goal is to always return Borg types from getBorg(), the author could put the dynamic autocast in front of the returned value:

function getBorg() {
    return (autocast Borg) "Enterprise Fodder";
}

Note that in the event the $borg variable is autocast to another type (e.g., if $borg is declared as autocast to a string), the Borg instance would be converted again to the type $borg wants (a string). Also, each time an autocast value is assigned into a non-autocast variable, a copy is made. Thus the best thing to do in the second example would be to declare $borg as an autocast ($:borg).

Autocast Return Types

The alternate approach to the dynamic autocasting problem on methods is to allow autocast return types in the function declarations. The idea is that in the declaration, the method author can force a dynamic autocast on all return values from the current function. This way, if a function has many exit points, the return type can be guaranteed to be consistent.

function autocast Borg getBorg() {
    return "Enterprise Fodder";
}

In this example, a Borg instance is passed back in an autocast container. If the caller is assigning the return value to an autocast variable, it is then passed-by-reference. If the caller is using a regular variable, a copy is assigned in. This way, the functionality can be introduced without breaking legacy code.

Practical Example: Models

So what’s a practical use for this aside from lessening code and cleaning up mundane “do I have what I’m expecting” code? Here’s a very simple example:

class Model {
    public $:amount = 0.00; // float!
    public $:name = ""; // string!
    public $:id = 0; // integer!
    function __autocast($type, $assignment) {
        // we are checking for if an array was assigned into this class
        if(\PHP\CONSTANTS\DATATYPE::ARRAY == $type) {
            $this->:amount = $assignment['amount'];
            $this->:name = $assignment['name'];
            $this->:id = $assignment['id'];
        }
        else {
            trigger_error('Error! Only autocasts arrays.', E_USER_WARNING);
        }
    }
}

What’s the above accomplish? Check out the sexy things I can do:

$row = array('amount' => '0.00', 'name' => 'Michi', 'id' => '1');
$:model = new Model;
$:model = $row;
echo $:model->:amount; // outputs a FLOAT (not a string) value: 0.00

The following accomplishes the EXACT SAME THING because of the __autocast magic method.

$row = array('amount' => '0.00', 'name' => 'Michi', 'id' => '1');
$:model = $row;
echo $:model->:amount; // outputs a FLOAT (not a string) value: 0.00

Not only that, but we also squashed the unintended non-zero bug on the amount column: the string '0.00' counts as true in PHP, while the float 0.00 does not. It means that future PHP models representing database data will finally have properties that mirror the datatypes of the database, rather than just being string representations.
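The Model example can be roughly approximated in today’s PHP, which also makes the string-versus-float difference visible. This is a sketch under assumptions, not the proposed feature: the engine will not call __autocast, so a hypothetical helper named autocastAssign() stands in for the "$:model = $row" assignment, and explicit casts stand in for the autocast properties.

```php
<?php
// Userland sketch only: the engine does not call __autocast, and
// autocastAssign() is a hypothetical stand-in for "$:model = $row".
class Model
{
    public $amount = 0.00;
    public $name = "";
    public $id = 0;

    public function __autocast($type, $assignment)
    {
        if ($type !== 'array') {
            trigger_error('Error! Only autocasts arrays.', E_USER_WARNING);
            return;
        }
        // Explicit casts stand in for the autocast properties.
        $this->amount = (float) $assignment['amount'];
        $this->name = (string) $assignment['name'];
        $this->id = (int) $assignment['id'];
    }
}

function autocastAssign($value, $class)
{
    if ($value instanceof $class) {
        return $value; // already the right type, nothing to convert
    }
    $object = new $class();
    $object->__autocast(gettype($value), $value);
    return $object;
}

$row = array('amount' => '0.00', 'name' => 'Michi', 'id' => '1');
$model = autocastAssign($row, 'Model');

var_dump((bool) $row['amount']); // bool(true): any string except "" and "0" is truthy
var_dump((bool) $model->amount); // bool(false): the float 0.0 is falsy
var_dump($model->id);            // int(1)
```

The helper has to be called by hand at every assignment, which is exactly the chore the proposal wants the language to take over.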

Practical Example: Refactoring for Code Scaling

PHP’s greatest weakness is its inability to “scale” the code base. As the code gets larger and poor coding practices are used, it becomes very difficult to go back and fix things without completely gutting everything (see my article about this). Autocasting fixes this.

For example, nobody thinks twice when they see code like this:

$someObject->processQuery($db, $query); // drives Michi crazy

How do you know $query is a string, let alone a query? How do you know $db is an object? Do you realize that if $db isn’t an object, PHP quits with a fatal error saying some method can’t be called on a non-object? This is a serious problem! And yet it’s just business as usual in the PHP world. Type hinting is NOT the full solution here, and it is worthless once you factor in refactoring. Type hinting ultimately triggers a fatal error that the developer is powerless to stop during run-time. Yes, type hinting lets you control what your function deals with, but the answer is NOT to take your toys and go home when you get something you didn’t intend. Let’s illustrate; imagine this code:

function processData($data) { // implied string (bad!)

And the author later realizes, “Wait, I want to make $data a class so I can do more to it.” So the author changes it:

function processData(Data $data) {
    $data->process();
}

But the problem is that now if somebody passes in a string/array/integer/etc., they get a FATAL ERROR! So then the function caller ends up doing crazy spaghetti that looks like this (actually 90% of the time, the caller won’t do this until after the bug hits production and a fatal error happens :( ):

if(!($data instanceof Data)) {
    $dataObject = new Data();
    $dataObject->setData($data); // ugh, exposed public setter method needed!
    $data = $dataObject;
}
processData($data);

That’s no good! In virtually every language, this kind of refactor is not possible without causing serious problems to the outside developers. In statically typed languages, the compiler catches these types of things, and then everybody does a mass re-write. But in dynamic languages, you can’t find these issues until you run the code. So how does autocasting solve the problem?

function processData(Data $:data) {
    $:data->process();
}
// leave the complicated stuff to the __autocast magic method!
class Data {
    private $:payload = "";
    function __autocast($type, $assignment) {
        // for now, we only worry about strings, but in the future we could do
        // a check for LegacyData types and convert those too!
        $this->:payload = $assignment;
    }
    function getData() {
        return $this->:payload;
    }
    function process() {
        return "data: " . base64_encode($this->getData());
    }
}

So if a caller passes a string into processData, it gets assigned into :payload, and the code keeps on working. One thing that’s neat is that we don’t need to expose a public setter method just to make things backwards compatible. Additionally, if we want to do any special processing or data conversions, we can do that in the magic method. Lastly, if we upgrade things again later, we can create a new logic fork inside the autocast magic method to help convert the legacy object type to the new one.

// Oh no! Changed the argument again!!
function processData(XmlBlob $:data) {
    $:data->process();
}

class XmlBlob {
    private $:payload = "";
    function __autocast($type, $assignment) {
        // If it's of type Data, convert it over
        // otherwise, roll back to the uber legacy behavior
        // I'd really love it if this sort of comparison is legal
        if(Data == $type) {
            // convert the data to the format we want
            // we could use a magic method here too if
            // $this->:payload was a class instead of a native
            $this->:payload = self::toXML($assignment->getData());
        }
        else {
            // note this is autocast to string
            $this->:payload = $assignment;
        }
    }
    static function toXML(string $:data) {
        // do some XML conversion magic here
        return $:data;
    }
    function process() {
        return "data: " . base64_encode($this->:payload);
    }
}

In short, autocasting allows library writers to hide complexity from implementing developers. And, as a super-added bonus, it makes changing/deprecating method signatures actually possible!
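For comparison, the first refactoring step (string to Data) can be imitated in today’s PHP by doing the conversion at the top of the function. This is a sketch under assumptions: autocastArgument() is a hypothetical helper standing in for the "Data $:data" argument declaration, and processData() returns its result here only so that the output can be shown.

```php
<?php
// Userland sketch only: autocastArgument() is a hypothetical helper that
// imitates what a "Data $:data" argument would do automatically.
class Data
{
    private $payload = "";

    // Called by the helper, not by the engine; needs no public setter.
    public function __autocast($type, $assignment)
    {
        $this->payload = (string) $assignment;
    }

    public function getData()
    {
        return $this->payload;
    }

    public function process()
    {
        return "data: " . base64_encode($this->getData());
    }
}

function autocastArgument($value, $class)
{
    if ($value instanceof $class) {
        return $value; // types match, so pass it straight through
    }
    $object = new $class();
    $object->__autocast(gettype($value), $value);
    return $object;
}

function processData($data)
{
    // Legacy callers may still pass a plain string; convert it here.
    $data = autocastArgument($data, 'Data');
    return $data->process();
}

echo processData("hello"), "\n"; // data: aGVsbG8=
```

Old callers that pass strings and new callers that pass Data instances both keep working, at the cost of one line of boilerplate per argument that the proposed syntax would remove.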

Implications

There are a number of substantial implications with this feature. Summary of points:

  • Might make things messier since autocasting is “automagical”
  • IDEs can do even cooler things since types can be known
  • True function overloading is within reach
  • Native dependency injection is also within reach
  • Possible speed improvements, but possible speed issues

First of all, this could make PHP even messier than before. But that’s the case for any new feature that is poorly used. But I do admit that introducing “magic” and “shortcuts” can eventually lead to code that looks like the nightmare we all know as Perl (zing!). That said, I see overwhelmingly cool things that become possible with this. Most importantly, PHP becomes “type safer.” Think about it: today, I will bet you good money that almost every code base has a function where the method author wrote code checking if the caller passed the right data types in — or vice versa. The reality is that even in a loosely typed language, datatypes are important. So while some portion of the population might use this to write some really crazy Perl-like code, I think the benefits outweigh the costs. These changes make it easier for library authors to maintain and understand their code (which I believe is a more important battle to win). Autocasting allows authors to put up a moat around their libraries/classes where they can absolutely control the types of variables they are dealing with — without forcing fatal errors everywhere.

It also means IDEs can do even more error checking and type hinting. Imagine if the IDE warned you when you set up a situation that, without autocast, would have triggered a fatal error! It means that when a caller tries to use the return value of a function with a declared autocast return type, the IDE can warn the developer.

This feature might also open the door to true polymorphic PHP. For example, a class could have two constructors: one with an autocast string argument and one with a generic argument. At run time, using some simple rules, PHP might use different versions of the same function name depending on the variable types. Voila! If a method signature can state what TYPES of arguments it wants, and we can explicitly state what we are passing in, isn’t that the first step in setting up true function overloading? While this is beyond the scope of my idea, I thought it was an interesting secondary benefit that somebody smarter than I could explore.

You might notice that autocasting behaves a lot like typical dependency injection patterns. Since constructors are automatically called for uninitialized variables, it would be possible to simulate dependency injection in functions very easily (a boon this is for testing!). In earlier examples, I showed you cases where a $parser or $db variable was passed in. Imagine if in those examples, such a variable was passed in as NULL (not provided). Now, PHP would automatically construct them from scratch, leaving the function implementer free from the burden of constructing the object. If you think about it, this puts us within striking distance of some kind of dependency injection in PHP. Then, somebody smarter than I can suggest a static __inject() magic method that gets called during automatic object construction… :)

Finally, while I’m not a C programmer, I wanted to take a moment to say that it’s possible that autocasting could provide memory/speed optimizations for PHP since certain variable memory spaces wouldn’t constantly change. Again, I’m not a C programmer and I don’t know how PHP’s memory allocation is designed, but I thought I’d throw that out there. On the same note, all of the casting logic could prove to be quite taxing. Thus, it could negate any perceived performance gains.

Spread the Idea

This is something I’ve been internalizing for no less than half a decade. I would genuinely love to see it in a future version of PHP, but I’m too busy to evangelize the idea. I’ve also never met somebody else who fully understood my idea. I’m finally releasing the idea into the wild and hoping for the best. It’ll probably sit in the Internet Idea Junk Yard. :) Please feel free to share this idea with your peers and pass it along to other PHP developers.