Autocast Variables Whitepaper: What I Want to See in PHP 6

Introduction to Autocasting

This is a white paper on a feature that does not exist in PHP. It is an idea I came up with and hashed out here in this article.

Autocast variables. An autocast variable is like a container for data — everything going into an autocast variable type will always be converted to the current type of that variable. As in, if you assign a string into an integer variable, the variable will become the integer representation of the string (via implicit and immediate typecasting).

The idea is a hybrid of limited type safety, where only some variables are type safe, and operator overloading of the equals sign on native and complex datatypes. To help explain the idea: it would act almost like somebody following your cursor around and typing (int), (string), etc. all over your code before variable assignments, EXCEPT that it can also be done with non-native datatypes like classes.

The goal is to allow a developer to be – when desired – 100% certain they are working with a specific data type.

To declare a variable as an autocast, simply place a colon after the dollar sign in a variable name. Then, everything assigned to that variable is now automatically typecast to the datatype of the variable. For example

NOTE: Why the new syntax? I toyed with the idea of an autocast keyword, but the paradigm broke down when you started assigning objects. The problem is that objects are pass-by-reference. This meant a programmer could change the datatype of an autocast variable by altering its reference. The other problem was that without a visual marker, things would get very confusing, since one could never tell if they were working with an autocast until runtime. Lastly, why the dollar-colon? I would have preferred a straight colon, but most of the good single-character syntax would conflict with existing PHP systems (# is a comment, : is used in ternary operators, % is modulus, ^ is a bitwise operator, etc.). A dollar sign is universally understood as a variable, so I thought the next best thing was to alter the variable in a way that today’s PHP would recognize as invalid (and thus introducing the syntax would not conflict with legacy code).

The concept is simple, but gets more complicated as you introduce objects, magic methods, and method signatures into the equation. Don’t worry, I’ve thought about all of those scenarios. Key summary of benefits:

  • New coding paradigms allow for simpler interaction between different data types (see first Practical Example)
  • Refactoring can be done in a way never before possible (see second Practical Example)
  • Code is now more “reliable” because unintended data types aren’t used (such as during boolean checks)
  • Many fatal errors can now be avoided
  • Potential use in the realm of dependency injection
  • Possibilities for true function overloading since expected datatypes are known (although, this is possible today, to be honest)

Read on to learn more!

Autocasting: Defined

You can skip this section and review it later. Note that the rest of this article will review the specifics of autocasting. Here are the basic rules:

  1. Declaration: Autocast variables are set at declaration, but the actual data type is optional and is inferred during the first variable assignment
  2. Declaration: Once an autocast variable is explicitly cast or declared as a certain data type, it can no longer change data types
  3. Usage: The colon is part of the variable name. A variable with the same name without the colon is a different variable.
  4. Null: Null always counts as a different datatype and assigning it will always trigger autocasting behavior
  5. Arrays: For arrays, non-array values are inserted as the 0th index of the array and all other values are truncated
  6. Classes: For uninitialized objects, the constructor is automatically called prior to autocasting behavior (no arguments)
  7. Classes: For initialized objects, if no autocast magic method is defined, the assigned value is dropped and a warning is thrown
  8. Magic Method: __autocast is only called when a different datatype is being assigned
  9. Scope: Autocast behavior is linked to the declared variable, NOT its contents — think of it as a container. Assigning an autocast variable into a non-autocast variable creates a copy. This MUST be so because any other implementation would allow a developer to change the contents of the autocast variable by using a reference.

Forced Native Datatype Conversions

PHP would work exactly the same as before except that certain variables could be declared as autocast. When a variable is declared as a specific type, all data going into it is automatically cast to that type. For example:
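Since the dollar-colon syntax does not exist, the closest runnable approximation is a small container class in today’s PHP. The sketch below is only an emulation under that assumption: AutocastBox and its set()/get() methods are hypothetical names, and the proposed syntax appears only in comments.

```php
<?php
// Hypothetical helper, not part of PHP: stands in for a "$:var" container.
final class AutocastBox
{
    private $value;

    public function __construct($initial)
    {
        // The first assignment fixes the container's type.
        $this->value = $initial;
    }

    public function set($incoming)
    {
        // Every later assignment is cast to the type already held.
        settype($incoming, gettype($this->value));
        $this->value = $incoming;
    }

    public function get()
    {
        return $this->value;
    }
}

// Proposed syntax this imitates (not valid PHP):
//   $:count = 0;
//   $:count = "42 apples";
$count = new AutocastBox(0);
$count->set("42 apples");
var_dump($count->get()); // int(42)

// A string container keeps everything as a string.
$label = new AutocastBox("");
$label->set(3.5);
var_dump($label->get()); // string(3) "3.5"
```

The container, not the value, carries the type, which is the same "think of it as a container" rule described in the definitions.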

Another example:

Where did I get this idea? ActionScript 2, when I played with it years ago. In AS2, they had just introduced optional compile-time type checking. The goal was to allow developers to optionally set a variable’s type to trigger compile errors. I have been thinking about this solution for literally years. I didn’t like the notion of breaking existing PHP code by introducing strong typing. Besides, since PHP isn’t compiled, doing “compile time type checking” is fruitless. Thus, the solution is to encourage more thoughtful OOP by allowing developers to “declare” variable types.

What happens if you assign an autocast into a regular variable?

Answer: The assignment from an autocast to a regular variable makes a copy of the variable sans the autocast behavior.

Converting Things to Objects with __autocast

The idea does not stop at native types. I want to take this a step further and introduce magic methods that specifically deal with the autocasting behavior!

The following example assigns a string into the object, yet in the next line, the object remains intact — and taken over by the Borg!

The following example shows how objects can also be converted using the magic method:

Before we get too far, I also want to clarify that autocast objects CAN be overwritten so long as the assignment datatype is the same or of a derived child class.
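The object rules above can be approximated today with an ordinary helper function. This is a sketch under stated assumptions: PHP never calls __autocast on its own, so the hypothetical autocast_assign() function plays the role of the engine, and the Borg and Queen classes here are illustrative stand-ins rather than the original listings.

```php
<?php
class Borg
{
    public $assimilated = array();

    // Hypothetical hook from the proposal: receives any non-Borg value.
    public function __autocast($value)
    {
        $this->assimilated[] = $value;
    }
}

class Queen extends Borg
{
}

// Stand-in for the engine: what "$:target = $value" might do.
function autocast_assign($target, $value)
{
    if (is_object($target) && !($value instanceof $target)) {
        if (method_exists($target, '__autocast')) {
            // Different datatype: let the object absorb the value.
            $target->__autocast($value);
        } else {
            // No hook defined: drop the value and warn.
            trigger_error('Assigned value dropped: no __autocast defined', E_USER_WARNING);
        }
        return $target;
    }
    // Same class or a derived child class: a normal overwrite.
    return $value;
}

$borg = new Borg();
$borg = autocast_assign($borg, 'Captain Picard');
var_dump($borg instanceof Borg);  // bool(true): the object survived
var_dump($borg->assimilated[0]);  // string(14) "Captain Picard"

$borg = autocast_assign($borg, new Queen());
var_dump($borg instanceof Queen); // bool(true): child class overwrote it
```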

Because of the dollar-colon marker, autocast variables won’t introduce “invisible bugs.” Callers will be very aware that they are dealing with autocast variables.

Converting Objects to Other Types with __castTo

Another feature that should be possible is for objects to define how they are cast into other datatypes. For example, imagine the following code:

The problem with this situation is that I have no control in how the Status class is converted into a Boolean. In this case, it would probably just convert to true. Wouldn’t it be nice if I had control over that?

The return value of __castTo becomes the value that is presented to the casting operation. Thus in the example above, $:isReady would only see the value of $this->status (which is 0) because $:isReady wants a Boolean. For any other data type conversion, Status would return $this (itself). This is why the second cast operation behaves totally differently and ends up equaling 1. So in terms of order of operations, __castTo is called before any attempts at casting an object, giving the object a chance to define how it should be converted.
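A rough emulation of that order of operations in today’s PHP might look like the following. It is a sketch, not the original listing: the Status class is reconstructed from the description above, and cast_with_hook() is a hypothetical function standing in for the engine, since PHP does not call __castTo itself. Only the Boolean path is exercised.

```php
<?php
class Status
{
    private $status = 0;

    // Hypothetical hook: decide what the caster gets to see.
    public function __castTo($type)
    {
        return $type === 'boolean' ? $this->status : $this;
    }
}

// Stand-in for the engine: consult __castTo first, then cast natively.
function cast_with_hook($value, $type)
{
    if (is_object($value) && method_exists($value, '__castTo')) {
        $value = $value->__castTo($type);
    }
    settype($value, $type);
    return $value;
}

var_dump((bool) new Status());                     // bool(true): today's behavior
var_dump(cast_with_hook(new Status(), 'boolean')); // bool(false): the hook exposed 0
```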

I did want to state that the __castTo concept is 100% possible without autocast. I think it might be a cool feature all on its own that just so happened to work very well with the autocast idea.

Global autocast magic function

Just like the object magic method __autocast, there should also be a global __autocast function. This function would allow a developer to override native autocast behavior. Note that if the __autocast magic function fails to cast a variable, then the native behavior should be triggered. Returning false will suppress the native assignment (so you must make it!):

Function Argument Autocasting to Enhance Type Hinting

Autocasting functionality should be used to augment function method declarations as well:

Now your code that expects a string doesn’t need to check if the data is actually a string (which is just spaghetti code anyway). Why would you want to ensure a string? How many times have you tried to echo a variable and it printed “Array” because an array snuck in and replaced your variable?
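Without autocast arguments, the same guarantee has to be written by hand as casts on the first lines of the function. The sketch below shows that manual equivalent for scalar arguments; repeatLabel() is a hypothetical function used only for illustration.

```php
<?php
// Proposed signature this imitates (not valid PHP):
//   function repeatLabel($:label, $:times)
function repeatLabel($label, $times)
{
    // What autocast arguments would do implicitly on entry.
    $label = (string) $label;
    $times = (int) $times;

    return str_repeat($label, $times);
}

var_dump(repeatLabel(7, "3")); // string(3) "777"
```

From this point on, the function body can rely on $label being a string and $times being an integer, no matter what the caller passed in.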

Here’s another example of function argument autocasting:

What does this do? The idea is that if you pass in something I’m not expecting, the regular autocasting behavior is triggered right there. Now my method’s code can worry about how to parse that data rather than whether the parser is actually an instance of XmlReader. Key point: if the caller passes an autocast variable into an autocast argument (and the types match up), all regular pass-by-ref/value logic is used. If there is a mismatch, a copy is made instead.

Dynamic Autocasting

Inline autocasting should also be possible for variables that aren’t necessarily autocast. This functionality is important where you are method chaining (prevents fatal errors). For example:

Behind the scenes, if getBorg() returns something that is not a Borg, an in-memory Borg conversion takes place. The result is then used to make the toString() call. If we took the same example, but took away the chaining, we would see another side effect of autocasting:

Since autocast behavior is associated to the declared variable and not the contents, autocast functionality would NOT be inherited by the $borg variable. This way, if something crazy happens inside the getBorg() method that we aren’t expecting, we can still be sure that we get back a datatype that we expect. If the goal is to always return Borg types from getBorg(), the author could prepend the dynamic autocast before the return call:

Note that in the event the $borg variable is autocast to another type (i.e., if $borg is declared as autocast to a string), the Borg instance would be converted again to the type $borg wants (a string). Note that each time an autocast is assigned into a non-autocast variable, a copy is made. Thus the best thing to do in the second example would be to declare $borg as an autocast ($:borg).

Autocast Return Types

The alternate approach to the dynamic autocasting problem on methods is to allow autocast return types in the function declarations. The idea is that in the declaration, the method author can force a dynamic autocast on all return values from the current function. This way, if a function has many exit points, the return type can be guaranteed to be consistent.

In this example, a Borg instance is passed back in an autocast container. If the caller is assigning the return value to an autocast variable, it is then passed-by-reference. If the caller is using a regular variable, a copy is assigned in. This way, the functionality can be introduced without breaking legacy code.
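For scalar types only, return type declarations in PHP 7 and later give a partial analogue of this: in the default coercive mode (no strict_types declaration), every exit point is coerced to the declared type. The sketch below uses a hypothetical droneCount() function. Unlike the proposal, PHP throws a TypeError for values it cannot coerce, and it never converts one object type into another.

```php
<?php
// Requires PHP 7 or later; relies on the default coercive typing mode.
function droneCount($raw): int
{
    if ($raw === null) {
        return "0"; // a string exit point, coerced to int on return
    }
    return $raw;    // a numeric string is coerced to int as well
}

var_dump(droneCount(null)); // int(0)
var_dump(droneCount("12")); // int(12)
```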

Practical Example: Models

So what’s a practical use for this aside from lessening code and cleaning up mundane “do I have what I’m expecting” code? Here’s a very simple example:

What’s the above accomplish? Check out the sexy things I can do:

The following accomplished the EXACT SAME THING because of the __autocast magic method.

Not only that, but we also squashed the unintended non-zero bug on the amount column! It means the future PHP models that represent database data will finally have properties that mirror the datatypes of the database, rather than just being the string representation.
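The typed-property effect can be approximated today with the real __set and __get magic methods. This is a sketch and not the original model listing: the Order class and its columns are hypothetical, and the column types are declared by hand through default values.

```php
<?php
class Order
{
    // Default values double as the declared type of each column.
    private $data = array('id' => 0, 'amount' => 0.0, 'note' => '');

    public function __set($name, $value)
    {
        if (!array_key_exists($name, $this->data)) {
            throw new InvalidArgumentException("Unknown column: $name");
        }
        // Cast the incoming value to the type the column already holds.
        settype($value, gettype($this->data[$name]));
        $this->data[$name] = $value;
    }

    public function __get($name)
    {
        return $this->data[$name];
    }
}

$order = new Order();
$order->id = "17";       // string from the database driver
$order->amount = "0.00"; // string from the database driver

var_dump($order->id);            // int(17)
var_dump((bool) $order->amount); // bool(false): the "0.00" bug is gone
```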

Practical Example: Refactoring for Code Scaling

PHP’s greatest weakness is how poorly its code base “scales.” As the code gets larger and poor coding practices are used, it becomes very difficult to go back and fix things without completely gutting everything (see my article about this). Autocasting fixes this.

For example, nobody thinks twice when they see code like this:

How do you know $query is a string, let alone a query? How do you know $db is an object? Do you realize that if $db isn’t an object, PHP quits with a fatal error saying some method can’t be called on a non-object? This is a serious problem! And yet it’s just business as usual in the PHP world. Type hinting is NOT the full solution here, and it is worthless when you consider refactoring. Type hinting ultimately triggers a fatal error that the developer is powerless to stop during run-time. Yes, type hinting lets you control what your function deals with, but the answer is NOT to take your toys and go home when you get something you didn’t intend. Let’s illustrate; imagine this code:

And the author later realizes, “Wait, I want to make $data a class so I can do more to it.” So the author changes it:

But the problem is that now if somebody passes in a string/array/integer/etc., they get a FATAL ERROR! So then the function caller ends up doing crazy spaghetti that looks like this (actually 90% of the time, the caller won’t do this until after the bug hits production and a fatal error happens 🙁 ):

That’s no good! In virtually every language, this kind of refactor is not possible without causing serious problems to the outside developers. In statically typed languages, the compiler catches these types of things, and then everybody does a mass re-write. But in dynamic languages, you can’t find these issues until you run the code. So how does autocasting solve the problem?

So if a caller passes in a string into processData, it gets assigned into :payload, and the code keeps on working. One thing that’s neat is that we don’t need to expose a public setter method just to make things backwards compatible. Additionally, if we want to do any special processing or data conversions, we can do that in the magic method. Lastly, if we upgrade things again later, we can create a new logic fork inside the autocast magic method to help convert the legacy object type to the new one.

In short, autocasting allows library writers to hide complexity from implementing developers. And, as a super-added bonus, it makes changing/deprecating method signatures actually possible!

Implications

There are a number of substantial implications with this feature. Summary of points:

  • Might make things messier since autocasting is “automagical”
  • IDEs can do even cooler things since types can be known
  • True function overloading is within reach
  • Native dependency injection is also within reach
  • Possible speed improvements, but possible speed issues

First of all, this could make PHP even messier than before. But that’s the case for any new feature that is poorly used. But I do admit that introducing “magic” and “shortcuts” can eventually lead to code that looks like the nightmare we all know as Perl (zing!). That said, I see overwhelmingly cool things that become possible with this. Most importantly, PHP becomes “type safer.” Think about it: today, I will bet you good money that almost every code base has a function where the method author wrote code checking if the caller passed the right data types in — or vice versa. The reality is that even in a loosely typed language, datatypes are important. So while some portion of the population might use this to write some really crazy Perl-like code, I think the benefits outweigh the costs. These changes make it easier for library authors to maintain and understand their code (which I believe is a more important battle to win). Autocasting allows authors to put up a moat around their libraries/classes where they can absolutely control the types of variables they are dealing with — without forcing fatal errors everywhere.

It also means IDEs can do even more error checking and type hinting. Imagine if the IDE warned you when you set up a situation that, without autocast, would have triggered a fatal error! It means that when a caller tries to use the return value of a declared autocast function return type, the IDE can warn the developer.

This feature might also open the door to true polymorphic PHP. For example, a class could have two constructors: one with an autocast string argument and one with a generic argument. At run time, using some simple rules, PHP might use different versions of the same function name depending on the variable types. Voila! If a method signature can state what TYPES of arguments it wants, and we can explicitly state what we are passing in, isn’t that the first step in setting up true function overloading? While this is beyond the scope of my idea, I thought it was an interesting secondary benefit that somebody smarter than I could explore.

You might notice that autocasting behaves a lot like typical dependency injection patterns. Since constructors are automatically called for uninitialized variables, it would be possible to simulate dependency injection in functions very easily (a boon this is for testing!). In earlier examples, I showed you cases where a $parser or $db variable was passed in. Imagine if in those examples, such a variable was passed in as NULL (not provided). Now, PHP would automatically construct them from scratch, leaving the function implementer free from the burden of constructing the object. If you think about it, this puts us within striking distance of some kind of dependency injection in PHP. Then, somebody smarter than I can suggest a static __inject() magic method that gets called during automatic object construction… 🙂

Finally, while I’m not a C programmer, I wanted to take a moment to say that it’s possible that autocasting could provide memory/speed optimizations for PHP since certain variable memory spaces wouldn’t constantly change. Again, I’m not a C programmer and I don’t know how PHP’s memory allocation is designed, but I thought I’d throw that out there. On the same note, all of the casting logic could prove to be quite taxing. Thus, it could negate any perceived performance gains.

Spread the Idea

This is something I’ve been internalizing for no less than half a decade. I would genuinely love to see it in a future version of PHP, but I’m too busy to evangelize the idea. I’ve also never met somebody else who fully understood my idea. I’m finally releasing the idea into the wild and hoping for the best. It’ll probably sit in the Internet Idea Junk Yard. 🙂 Please feel free to share this idea with your peers and pass it along to other PHP developers.


A PHP/MySQL Bug Most People Have But Don’t Realize

I’ve seen this over and over in my career and thought I should save others from the horror. Part of me feels like I blogged about this years ago, but I couldn’t find a post referencing it (EDIT: found it!). The bug is simple:

  1. Create a database table with a decimal value, such as order_total
  2. Write some code that retrieves the row
  3. Do an implicit boolean check on order_total to see if it has a value

Here’s some actual code:

This code has a serious bug in it. The problem is the line pertaining to checking if the order_total has been set. Pop quiz:

What is the value of the following:
(bool) “0.00”

The answer is TRUE! 0.00 may evaluate to zero, but “0.00” is not the same thing! As soon as PHP sees more than just a single “0” in a string, it assumes it’s a regular string and treats it as a non-zero string. A more obvious way to ask the same question:

What is the value of the following:
(bool) “0.0000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000”

Or what about:

What is the value of the following: (bool) “0.”

The point is that as soon as you go beyond a single zero, PHP just assumes the rest is real data and will not discard it. Thus:
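These conversions are easy to confirm directly. The short script below is a plain check of PHP’s string-to-boolean rule: only the empty string and the exact string "0" are false.

```php
<?php
var_dump((bool) "");     // bool(false)
var_dump((bool) "0");    // bool(false)
var_dump((bool) "0.00"); // bool(true)
var_dump((bool) "0.");   // bool(true)
var_dump((bool) "00");   // bool(true)
var_dump((bool) 0.00);   // bool(false): a real float zero
```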

So, going back to the original problem, the way to solve it is to either explicitly typecast the variable or use a “greater than” check:
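Both fixes can be sketched side by side against the buggy check. The row here is a hand-built array standing in for a fetched database row, on the assumption that the decimal column arrives as a string; the function names are hypothetical.

```php
<?php
// Simulated row: the decimal column is assumed to arrive as a string.
$row = array('order_total' => '0.00');

// Buggy: an implicit boolean check on the raw string.
function hasTotalBuggy(array $row)
{
    return (bool) $row['order_total'];
}

// Fix 1: explicitly cast to a number before checking.
function hasTotalCast(array $row)
{
    return (bool) (float) $row['order_total'];
}

// Fix 2: use a "greater than" comparison, which compares numerically.
function hasTotalGreaterThan(array $row)
{
    return $row['order_total'] > 0;
}

var_dump(hasTotalBuggy($row));       // bool(true): wrongly reports a total
var_dump(hasTotalCast($row));        // bool(false)
var_dump(hasTotalGreaterThan($row)); // bool(false)
```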

If you don’t do either of these things, I’d suggest you go and double check some of your code.