Autocast Variables Whitepaper: What I Want to See in PHP 6

Introduction to Autocasting

This is a white paper on a feature that does not exist in PHP. It is an idea I came up with and hashed out here in this article.

Autocast variables. An autocast variable is like a container for data — everything going into an autocast variable type will always be converted to the current type of that variable. As in, if you assign a string into an integer variable, the variable will become the integer representation of the string (via implicit and immediate typecasting).

The idea is a hybrid of limited type safety, where only some variables are type safe, and operator overloading of the equals sign on native and complex datatypes. To help explain the idea: it would act almost like somebody following around your cursor and typing (int), (string), etc. all over your code before variable assignments EXCEPT that it can also be done with non-native datatypes like classes.

The goal is to allow a developer to be – when desired – 100% certain they are working with a specific data type.

To declare a variable as an autocast, simply place a colon after the dollar sign in the variable name. From then on, everything assigned to that variable is automatically typecast to the datatype of the variable. For example:

NOTE: Why the new syntax? I toyed with the idea of an autocast keyword, but the paradigm broke down when you started assigning objects. The problem is that objects are pass-by-reference. This meant a programmer could change the datatype of an autocast variable by altering its reference. The other problem was that by not having a visual marker, it would make things very confusing since one could never tell if they were working with an autocast until runtime. Lastly, why the dollar-colon? I would have preferred a straight colon, but most of the good single-character syntax would conflict with existing PHP systems (# is a comment, : is used in ternary operators, % is modulus, ^ is a bitwise operator, etc.). A dollar sign is universally understood as a variable, so I thought the next best thing was to alter the variable in a way that today’s PHP would recognize as invalid (and thus introducing the syntax would not conflict with legacy code).

The concept is simple, but gets more complicated as you introduce objects, magic methods, and method signatures into the equation. Don’t worry, I’ve thought about all of those scenarios. Key summary of benefits:

  • New coding paradigms allow for simpler interaction between different data types (see first Practical Example)
  • Refactoring can be done in a way never before possible (see second Practical Example)
  • Code is now more “reliable” because unintended data types aren’t used (such as during boolean checks)
  • Many fatal errors can now be avoided
  • Potential use in the realm of dependency injection
  • Possibilities for true function overloading since expected datatypes are known (although, this is possible today, to be honest)

Read on to learn more!

Autocasting: Defined

You can skip this section and review it later. Note that the rest of this article will review the specifics of autocasting. Here are the basic rules:

  1. Declaration: Autocast variables are set at declaration, but the actual data type is optional and is inferred during the first variable assignment
  2. Declaration: Once an autocast variable is explicitly cast or declared as a certain data type, it can no longer change data types
  3. Usage: The colon is part of the variable name. A variable with the same name without the colon is a different variable.
  4. Null: Null always counts as a different datatype and assigning it will always trigger autocasting behavior
  5. Arrays: For arrays, non-array values are inserted as the 0th index of the array and all other values are truncated
  6. Classes: For uninitialized objects, the constructor is automatically called prior to autocasting behavior (no arguments)
  7. Classes: For initialized objects, if no autocast magic method is defined, the assigned value is dropped and a warning is thrown
  8. Magic Method: __autocast is only called when a different datatype is being assigned
  9. Scope: Autocast behavior is linked to the declared variable, NOT its contents — think of it as a container. Assigning an autocast variable into a non-autocast variable creates a copy. This MUST be so because any other implementation would allow a developer to change the contents of the autocast variable by using a reference.

Forced Native Datatype Conversions

PHP would work exactly the same as before except that certain variables could be declared as autocast. When a variable is declared as a specific type, all data going into it is automatically cast to that type. For example:
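Since the `$:` syntax does not exist, the closest I can get in today's PHP is a small container object. What follows is only a rough sketch, assuming PHP 7.4 or later: the `AutocastBox` class and its `set()`/`get()` methods are names I made up for illustration, and it only handles scalars and arrays (objects and null need the extra rules described later).

```php
<?php
// Hypothetical stand-in for a "$:count" autocast variable (PHP 7.4+).
final class AutocastBox
{
    /** Type name as reported by gettype(), fixed at construction. */
    private string $type;

    /** @var mixed Current value, always of the type named in $type. */
    private $value;

    public function __construct($initial)
    {
        // Rule 1: the type is inferred from the first assignment.
        $this->type = gettype($initial);
        $this->value = $initial;
    }

    public function set($value): void
    {
        // Rule 2: the type never changes, so every later value is cast to it.
        settype($value, $this->type);
        $this->value = $value;
    }

    public function get()
    {
        return $this->value;
    }
}

$count = new AutocastBox(0);   // behaves like an integer from now on
$count->set('42');             // the numeric string is cast to an int
$asInt = $count->get();

$tags = new AutocastBox([]);   // behaves like an array from now on
$tags->set('php');             // rule 5: a non-array value lands at index 0
$asArray = $tags->get();
```

The obvious downside is the `->set()` and `->get()` noise, which is exactly what a native syntax would remove.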

Another example:

Where did I get this idea? ActionScript 2, when I played with it years ago. In AS2, they had just introduced optional compile-time type checking. The goal was to allow developers to optionally set a variable’s type to trigger compile errors. I have been thinking about this solution for literally years. I didn’t like the notion of breaking existing PHP code to introduce strong typing. Besides, since PHP isn’t compiled ahead of time, doing “compile time type checking” is fruitless. Thus, the solution is to encourage more thoughtful OOP by allowing developers to “declare” variable types.

What happens if you assign an autocast into a regular variable?

Answer: The assignment from an autocast to a regular variable makes a copy of the variable sans the autocast behavior.

Converting Things to Objects with __autocast

The idea does not stop at native types. I want to take this a step further and introduce magic methods that specifically deal with the autocasting behavior!

The following example assigns a string into the object, yet in the next line, the object remains intact — and taken over by the Borg!

The following example shows how objects can also be converted using the magic method:

Before we get too far, I also want to clarify that autocast objects CAN be overwritten so long as the assignment datatype is the same or of a derived child class.
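To make the object rules concrete, here is a sketch of how they might be emulated today, assuming PHP 7.4 or later. Everything here is hypothetical: `AutocastObject` plays the role of a `$:borg` variable, and a plain `autocast()` method stands in for the proposed `__autocast` magic method.

```php
<?php
class Borg
{
    /** @var array Values this Borg has absorbed so far. */
    public array $assimilated = [];

    // Stand-in for the proposed __autocast magic method.
    public function autocast($value): void
    {
        $this->assimilated[] = $value;
    }
}

// Hypothetical container playing the role of a "$:borg" variable.
final class AutocastObject
{
    private object $held;

    public function __construct(object $held)
    {
        $this->held = $held;
    }

    public function set($value): void
    {
        $class = get_class($this->held);
        if ($value instanceof $class) {
            // Same class or a child class: a plain overwrite is allowed.
            $this->held = $value;
            return;
        }
        if (method_exists($this->held, 'autocast')) {
            // Different type: the held object decides what to do with it.
            $this->held->autocast($value);
            return;
        }
        // No hook defined: drop the value and warn, as rule 7 describes.
        trigger_error("Cannot autocast value into $class", E_USER_WARNING);
    }

    public function get(): object
    {
        return $this->held;
    }
}

$borg = new AutocastObject(new Borg());
$borg->set('Captain Picard');   // a string goes in...
$stillBorg = $borg->get();      // ...but a Borg comes out
```

Assigning another `Borg` (or a subclass) simply replaces the held object, which matches the overwrite rule above.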

Because of the dollar-colon marker, autocast variables won’t introduce “invisible bugs.” Callers will be very aware that they are dealing with autocast variables.

Converting Objects to Other Types with __castTo

Another feature that should be possible is for objects to define how they are cast into other datatypes. For example, imagine the following code:

The problem with this situation is that I have no control in how the Status class is converted into a Boolean. In this case, it would probably just convert to true. Wouldn’t it be nice if I had control over that?

The return value of __castTo becomes the value that is presented to the casting operation. Thus in the example above, $:isReady would only see the value of $this->status (which is 0) because $:isReady wants a Boolean. For any other data type conversion, Status would return $this (itself). This is why the second cast operation behaves totally differently and ends up equaling 1. So in terms of order of operations, __castTo is called before any attempts at casting an object, giving the object a chance to define how it should be converted.
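The order of operations can be approximated in current PHP with a helper function. This is a sketch only, assuming PHP 7.4 or later: `castTo()` stands in for the proposed `__castTo`, and `cast_with_hook()` is a helper name invented for this illustration.

```php
<?php
final class Status
{
    private int $status;

    public function __construct(int $status)
    {
        $this->status = $status;
    }

    // Stand-in for the proposed __castTo($type) magic method.
    public function castTo(string $type)
    {
        // Only Boolean casts see the raw status; everything else sees the object.
        return $type === 'boolean' ? $this->status : $this;
    }
}

// Hypothetical helper: the object gets first say, then the native cast runs.
function cast_with_hook($value, string $type)
{
    if (is_object($value) && method_exists($value, 'castTo')) {
        $value = $value->castTo($type);
    }
    settype($value, $type);
    return $value;
}

$notReady = new Status(0);
$native = (bool) $notReady;                      // true: a plain object is truthy
$hooked = cast_with_hook($notReady, 'boolean');  // false: the hook exposed status 0
```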

I did want to state that the __castTo concept is 100% possible without autocast. I think it might be a cool feature all on its own that just so happened to work very well with the autocast idea.

Global autocast magic function

Just like the object magic method __autocast, there should also be a global __autocast function. This function would allow a developer to override native autocast behavior. Note that if the __autocast magic function fails to cast a variable, then the native behavior should be triggered. Returning false will suppress the native assignment (so you must make it!):

Function Argument Autocasting to Enhance Type Hinting

Autocasting functionality should be used to augment function method declarations as well:

Now your code that expects a string doesn’t need to check if the data is actually a string (which is just spaghetti code anyway). Why would you want to ensure a string? How many times have you tried to echo a variable and it printed “Array” because an array snuck in and replaced your variable?
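For comparison, PHP 7 and later already do a limited version of this for scalar type declarations, as long as the file does not enable strict_types. The sketch below uses a made-up `shout()` function to show where that built-in coercion stops: scalars are converted, but an array is rejected with a TypeError rather than converted.

```php
<?php
// No declare(strict_types=1) here, so PHP's default coercive mode applies.
function shout(string $text): string
{
    return strtoupper($text) . '!';
}

$fromInt = shout(42);   // the int is coerced to the string "42"

$rejected = false;
try {
    shout(['not', 'a', 'string']);
} catch (TypeError $e) {
    // Arrays are never coerced to strings; PHP 7+ throws instead.
    $rejected = true;
}
```

Autocast arguments as proposed here would go further by converting the mismatched value instead of throwing.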

Here’s another example of function argument autocasting:

What’s this do? The idea is that if you pass something in I’m not expecting, the regular autocasting behavior is triggered right there. Now I can write my method’s code to worry about how to parse that data rather than whether the parser is actually an instance of XmlReader. Key point: if the caller passes an autocast variable into an autocast argument (and the types match up), all regular pass-by-ref/value logic is used. If there is a mismatch, a copy is made instead.

Dynamic Autocasting

Inline autocasting should also be possible for variables that aren’t necessarily autocast. This functionality is important where you are method chaining (prevents fatal errors). For example:

Behind the scenes, if getBorg() returns something that is not a Borg, an in-memory Borg conversion takes place. The result is then used to make the toString() call. If we took the same example, but took away the chaining, we would see another side effect of autocasting:

Since autocast behavior is associated to the declared variable and not the contents, autocast functionality would NOT be inherited by the $borg variable. This way, if something crazy happens inside the getBorg() method that we aren’t expecting, we can still be sure that we get back a datatype that we expect. If the goal is to always return Borg types from getBorg(), the author could prepend the dynamic autocast before the return call:

Note that in the event the $borg variable is autocast to another type (e.g., if $borg is declared as autocast to a string), the Borg instance would be converted again to the type $borg wants (a string). Note that each time an autocast is assigned into a non-autocast variable, a copy is made. Thus the best thing to do in the second example would be to declare $borg as an autocast ($:borg).
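Without new syntax, the nearest equivalent to a dynamic autocast is a static conversion method wrapped around the call. The following is a sketch under that assumption (PHP 7.4 or later): `Borg::from()` and the standalone `getBorg()` function are hypothetical names used only to show the chaining case.

```php
<?php
final class Borg
{
    private string $designation;

    public function __construct(string $designation = 'Unassigned drone')
    {
        $this->designation = $designation;
    }

    // Hypothetical replacement for an inline autocast to Borg.
    public static function from($value): self
    {
        if ($value instanceof self) {
            return $value;             // already the right type
        }
        if (is_string($value) && $value !== '') {
            return new self($value);   // build a Borg from a name
        }
        return new self();             // anything else: a default Borg
    }

    public function toString(): string
    {
        return 'Borg: ' . $this->designation;
    }
}

// Simulates a method that sometimes fails to return a Borg.
function getBorg(bool $broken)
{
    return $broken ? null : new Borg('Seven of Nine');
}

$ok = Borg::from(getBorg(false))->toString();
$rescued = Borg::from(getBorg(true))->toString();   // no fatal error on null
```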

Autocast Return Types

The alternate approach to the dynamic autocasting problem on methods is to allow autocast return types in the function declarations. The idea is that in the declaration, the method author can force a dynamic autocast on all return values from the current function. This way, if a function has many exit points, the return type can be guaranteed to be consistent.

In this example, a Borg instance is passed back in an autocast container. If the caller is assigning the return value to an autocast variable, it is then passed-by-reference. If the caller is using a regular variable, a copy is assigned in. This way, the functionality can be introduced without breaking legacy code.
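PHP 7 return type declarations cover the scalar half of this idea: in the default coercive mode, every exit point is converted to the declared type. Here is a small sketch with a made-up `parsePort()` function that has several exit points returning different types.

```php
<?php
// Coercive mode: no declare(strict_types=1) in this file.
function parsePort($input): int
{
    if ($input === null) {
        return 80;              // already an int
    }
    if (is_string($input)) {
        return trim($input);    // numeric string, coerced to int on return
    }
    return $input;              // anything else must also be coercible to int
}

$default = parsePort(null);
$fromString = parsePort(' 8080 ');
```

A non-numeric string would raise a TypeError here, whereas the autocast return type described above would convert it.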

Practical Example: Models

So what’s a practical use for this aside from lessening code and cleaning up mundane “do I have what I’m expecting” code? Here’s a very simple example:

What’s the above accomplish? Check out the sexy things I can do:

The following accomplished the EXACT SAME THING because of the __autocast magic method.

Not only that, but we also squashed the unintended non-zero bug on the amount column! It means the future PHP models that represent database data will finally have properties that mirror the datatypes of the database, rather than just being the string representation.
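Typed properties, added in PHP 7.4, get part of the way there for scalar columns. The sketch below assumes a hypothetical `Payment` model hydrated from a database row in which every value arrives as a string, and it relies on the default coercive mode (no strict_types).

```php
<?php
final class Payment
{
    public int $id;
    public float $amount;
    public bool $refunded;

    // Hypothetical hydrator: the row holds strings, the properties hold real types.
    public static function fromRow(array $row): self
    {
        $payment = new self();
        $payment->id = $row['id'];
        $payment->amount = $row['amount'];
        $payment->refunded = $row['refunded'];
        return $payment;
    }
}

$row = ['id' => '7', 'amount' => '0.00', 'refunded' => '0'];
$payment = Payment::fromRow($row);

$stringIsTruthy = (bool) $row['amount'];     // true: "0.00" is a non-empty string
$floatIsTruthy = (bool) $payment->amount;    // false: 0.0 is a real zero
```

The last two lines are the non-zero bug in miniature: the raw string "0.00" passes an `if` check, while the typed float does not.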

Practical Example: Refactoring for Code Scaling

PHP’s greatest weakness is its inability to “scale” the code base. As the code gets larger and poor coding practices are used, it becomes very difficult to go back and fix things without completely gutting everything (see my article about this). Autocasting fixes this.

For example, nobody thinks twice when they see code like this:

How do you know $query is a string, let alone a query? How do you know $db is an object? Do you realize that if $db isn’t an object, PHP quits with a fatal error saying some method can’t be called on a non-object? This is a serious problem! And yet it’s just business as usual in the PHP world. Type hinting is NOT the full solution here, and it is worthless when you consider refactoring. Type hinting ultimately triggers a fatal error that the developer is powerless to stop during run-time. Yes, type hinting lets you control what your function deals with, but the answer is NOT to take your toys and go home when you get something you didn’t intend. Let’s illustrate; imagine this code:

And the author later realizes, “Wait, I want to make $data a class so I can do more to it.” So the author changes it:

But the problem is that now if somebody passes in a string/array/integer/etc., they get a FATAL ERROR! So then the function caller ends up doing crazy spaghetti that looks like this (actually 90% of the time, the caller won’t do this until after the bug hits production and a fatal error happens 🙁 ):

That’s no good! In virtually every language, this kind of refactor is not possible without causing serious problems to the outside developers. In statically typed languages, the compiler catches these types of things, and then everybody does a mass re-write. But in dynamic languages, you can’t find these issues until you run the code. So how does autocasting solve the problem?

So if a caller passes in a string into processData, it gets assigned into :payload, and the code keeps on working. One thing that’s neat is that we don’t need to expose a public setter method just to make things backwards compatible. Additionally, if we want to do any special processing or data conversions, we can do that in the magic method. Lastly, if we upgrade things again later, we can create a new logic fork inside the autocast magic method to help convert the legacy object type to the new one.
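The same backwards-compatible refactor can be approximated today by funnelling every legacy input shape through one conversion method. This is a sketch only, assuming PHP 7.4 or later: the `Payload` class, its `from()` method, and the body of `processData()` are my own illustration, with `from()` playing the part of the autocast magic method.

```php
<?php
final class Payload
{
    private array $fields;

    private function __construct(array $fields)
    {
        $this->fields = $fields;
    }

    // Stand-in for the autocast hook: one place that knows every legacy shape.
    public static function from($data): self
    {
        if ($data instanceof self) {
            return $data;                                  // new-style callers
        }
        if (is_array($data)) {
            return new self($data);                        // legacy array callers
        }
        if (is_scalar($data)) {
            return new self(['body' => (string) $data]);   // legacy string callers
        }
        throw new InvalidArgumentException('Cannot build a Payload from ' . gettype($data));
    }

    public function body(): string
    {
        return (string) ($this->fields['body'] ?? '');
    }
}

function processData($data): string
{
    $payload = Payload::from($data);   // old and new callers both land here
    return strlen($payload->body()) . ' bytes';
}

$legacy = processData('hello');
$modern = processData(Payload::from(['body' => 'hi']));
```

The difference from real autocasting is that the conversion call has to be written by hand at the top of every function that wants it.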

In short, autocasting allows library writers to hide complexity from implementing developers. And, as a super-added bonus, it makes changing/deprecating method signatures actually possible!

Implications

There are a number of substantial implications with this feature. Summary of points:

  • Might make things messier since autocasting is “automagical”
  • IDEs can do even cooler things since types can be known
  • True function overloading is within reach
  • Native dependency injection is also within reach
  • Possible speed improvements, but possible speed issues

First of all, this could make PHP even messier than before. But that’s the case for any new feature that is poorly used. But I do admit that introducing “magic” and “shortcuts” can eventually lead to code that looks like the nightmare we all know as Perl (zing!). That said, I see overwhelmingly cool things that become possible with this. Most importantly, PHP becomes “type safer.” Think about it: today, I will bet you good money that almost every code base has a function where the method author wrote code checking if the caller passed the right data types in — or vice versa. The reality is that even in a loosely typed language, datatypes are important. So while some portion of the population might use this to write some really crazy Perl-like code, I think the benefits outweigh the costs. These changes make it easier for library authors to maintain and understand their code (which I believe is a more important battle to win). Autocasting allows authors to put up a moat around their libraries/classes where they can absolutely control the types of variables they are dealing with — without forcing fatal errors everywhere.

It also means IDEs can do even more error checking and type hinting. Imagine if the IDE warned you when you set up a situation that, without autocast, would have triggered a fatal error! It means that when a caller tries to use the return value of a declared autocast function return type, the IDE can warn the developer.

This feature might also open the door to true polymorphic PHP. For example, a class could have two constructors: one with an autocast string argument and one with a generic argument. At run time, using some simple rules, PHP might use different versions of the same function name depending on the variable types. Voila! If a method signature can state what TYPES of arguments it wants, and we can explicitly state what we are passing in, isn’t that the first step in setting up true function overloading? While this is beyond the scope of my idea, I thought it was an interesting secondary benefit that somebody smarter than I could explore.

You might notice that autocasting behaves a lot like typical dependency injection patterns. Since constructors are automatically called for uninitialized variables, it would be possible to simulate dependency injection in functions very easily (a boon this is for testing!). In earlier examples, I showed you cases where a $parser or $db variable was passed in. Imagine if in those examples, such a variable was passed in as NULL (not provided). Now, PHP would automatically construct them from scratch, leaving the function implementer free from the burden of constructing the object. If you think about it, this puts us within striking distance of some kind of dependency injection in PHP. Then, somebody smarter than I can suggest a static __inject() magic method that gets called during automatic object construction… 🙂

Finally, while I’m not a C programmer, I wanted to take a moment to say that it’s possible that autocasting could provide memory/speed optimizations for PHP since certain variable memory spaces wouldn’t constantly change. Again, I’m not a C programmer and I don’t know how PHP’s memory allocation is designed, but I thought I’d throw that out there. On the same note, all of the casting logic could prove to be quite taxing. Thus, it could negate any perceived performance gains.

Spread the Idea

This is something I’ve been internalizing for no less than half a decade. I would genuinely love to see it in a future version of PHP, but I’m too busy to evangelize the idea. I’ve also never met somebody else who fully understood my idea. I’m finally releasing the idea into the wild and hoping for the best. It’ll probably sit in the Internet Idea Junk Yard. 🙂 Please feel free to share this idea with your peers and pass it along to other PHP developers.


Is PHP here to stay?

As a LAMP developer, I am starting to question the long term viability of PHP. PHP was born during an era when knowing HTML was a valid and valuable resume bullet. Because of this, most of the “advanced” aspects of PHP — which relate to the OOP functionality — were introduced only after PHP 4/5, and weakly at that. Additionally, new languages have since become popularized that show the weakness of PHP. Don’t get me wrong, I am very supportive of PHP. I just believe that it’s important that people understand both the strengths and weaknesses of the tools they use.  There are two main points I want to cover:

  1. PHP thread support is weak
  2. PHP OOP = Broken

The second point is rather technical, but it closely relates to another strength and weakness of PHP: it is loosely typed. More on that later.

Thread Support is Weak

True threading support in PHP does not exist. The closest thing is the pcntl_fork function, which copies the current process rather than creating a thread. This means asynchronous processing within a single process is not supported. Threading is useful in event-driven architectures (common in JavaScript) or when doing blocking operations such as network calls.

Because the forked process is a clone of the original, it shares all of the original resources, including database and file resources. This means that the forked process must be self-aware of whether it is a child or not, and must be careful not to modify or close these resources. This encourages spaghetti code that contains large logic forks (“if I am not a clone, else…”). Because of this, forking is messy and error prone. This gets further complicated when PHP is executed by Apache in a web environment. In fact, the PHP manual advises avoiding forking with web servers:

Process Control should not be enabled within a web server environment and unexpected results may happen if any Process Control functions are used within a web server environment.

Not to mention the method is incredibly C-like in that it is very “raw” (unlike other native PHP methods/classes). This increases the barrier to entry significantly, which ultimately serves to have the feature ignored by most shops.
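To show how raw the API is, here is a minimal sketch of the parent/child branching described above. It assumes the pcntl extension is loaded (CLI only, not available on Windows); the `runInChild()` wrapper is a name I invented, and it simply returns null when forking is not possible.

```php
<?php
// Hypothetical wrapper around pcntl_fork(); returns the child's exit code.
function runInChild(callable $work): ?int
{
    if (!function_exists('pcntl_fork')) {
        return null;    // extension missing: forking is not possible
    }

    $pid = pcntl_fork();
    if ($pid === -1) {
        return null;    // the fork itself failed
    }

    if ($pid === 0) {
        // Child branch: do the work, then exit so the child never
        // falls through into code meant for the parent.
        exit((int) $work());
    }

    // Parent branch: block until the child finishes and read its exit code.
    pcntl_waitpid($pid, $status);
    return pcntl_wifexited($status) ? pcntl_wexitstatus($status) : null;
}

$exitCode = runInChild(function (): int {
    return 3;   // pretend this was a slow network call
});
```

Even this tiny example needs the "am I the child?" fork in the logic, and it blocks the parent while waiting, so it is still a long way from a thread pool.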

Why is all of this important? Well, at most companies, one language is selected for all in-house development. This is because cross training and hiring is simplified if everybody speaks the same language. There are a few common tasks that are unnecessarily difficult to do in PHP:

  • Asynchronous work: handing off work, such as connecting to a remote server, to a child and waiting for a response
  • Managing thread pools: this sort of work requires significant “by-hand” management of any processes spawned by the parent via pcntl_fork

The threading issue is only a pain point that impacts processes that need to become parallelized. It is a pain most big shops live with, or, alternatively, introduce other languages to help solve.

PHP OOP = Broken

Because of the loosely typed nature of PHP, true, well-formed object oriented programming is broken. I know that for many PHP programmers, “Object Oriented” means putting together classes and reusing code as objects. However, that is truly, sincerely, only a portion of the point of OOP. Some of the most powerful aspects of OOP are lost in PHP’s implementation of the concept. Don’t get me wrong: these decisions were probably the right fit for the niche PHP was filling, but I don’t believe most PHP programmers are fully aware of what they are missing.

While the language, thankfully, has interfaces and abstract classes, they are woefully underused. This is, in part, due to the developer community being largely self-taught. This creates a misconception about the nature of OOP, which ultimately leads to the devaluation of the most important feature of OOP: interfaces.

I can go into why they are so important in another article, but the point is: without interfaces, true polymorphic code is impossible. Or, rather, extremely susceptible to spaghetti code and fatal errors.

In other languages (Java), code might look like this:

The interface in this example defines a uniform way to access a class through a standardized API (thus the name, application programming interface). In a strongly typed language where all variables must have a type, the cat variable is defined as an implementation of Animal. This enforces and allows the method call makeSound(). If cat has a meow() and dog has a woof() method, they can not be called here without a compiler error. This is because in this function call, the parrot variable is defined as being an instance of Animal (versus being a Dog, Cat, or Parrot). As such, only Animal methods work here.

More importantly, because the compiler does this type checking, any invalid calls, such as the last one, would error and never compile. Even if the Parrot class has a moveAround() method, it can not be called in the code above. This is an extremely important aspect of OOP since, as a definer of the Animal class, I want to make it very specific how Animals should be treated (you can only makeSound!). If a programmer tries to do something to an Animal that I haven’t defined, they get an error. If they wanted to make that last line work, they would need to use object typecasting:

Or by changing the function definition:

But note that in this case, the user had to make an explicit choice to stop using Animal’s interface. Yes, parrot is still an Animal, but it doesn’t have to be. This, in short, helps prevent spaghetti code because it forces the developers to think about whether or not they want to deviate from a particular interface. Realistically, if presented with these alternatives, a Java programmer would probably use other types of abstraction techniques (e.g., dependency injection) to keep this method from needing to be used. However, this example was necessary to illustrate how things are done in PHP.

So how would this look in PHP? Why isn’t this the same there? Well, take a look at the following code that, unlike the Java example, works perfectly fine and raises no red flags.

This code works great. We have three arguments all forced to use the Animal interface. Great. As a casual observer, there is really, truly, nothing wrong with this code. It’s a little strange, but if it’s commonly known that Birds can moveAround(), there is no problem. In fact, in most PHP shops, I will bet money that type hinting is NOT used. This will further illustrate how bad the spaghetti is about to get (read on).

Now imagine in six months if we decide we wanted to group up this code so that it uses a single array/collection as an argument. This is where things would look like traditional polymorphic code. I mentioned spaghetti above. Let me show you why:

Wow, look at what we just did. A harmless piece of code in PHP six months ago completely breaks when you try to refactor it to use a fairly typical design pattern. More importantly, unless I put in even MORE code to do type checking, there’s a chance that the makeSound() line will actually die in a fatal error if, for example, a string is passed in as an element of the argument array! See example without Parrots:

PHP is extremely flexible when it comes to hacking out a page, but when it comes to OOP, it’s about as brittle as you get. Refactoring is painful and error prone, and elegant design patterns like the ones you might see in a message-passing language such as Objective-C, Scala, or Erlang don’t work. Remember that by using functions such as method_exists() and is_object(), I can emulate the desired behavior; however, the extra code means more places for bugs and less time spent making the program do what you want it to do. The point is that the OOP constructs in PHP don’t fully work. As a result, certain very important aspects of OOP don’t translate very well to PHP.
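To show what that extra defensive code looks like, here is a sketch of the collection-based version with the checks written out. The `Animal`, `Cat`, and `Parrot` definitions and the `makeSounds()` function are my reconstruction for illustration, not the exact code from the earlier examples.

```php
<?php
interface Animal
{
    public function makeSound(): string;
}

final class Cat implements Animal
{
    public function makeSound(): string
    {
        return 'Meow';
    }
}

final class Parrot implements Animal
{
    public function makeSound(): string
    {
        return 'Squawk';
    }

    public function moveAround(): string
    {
        return 'Flap';
    }
}

function makeSounds(array $animals): array
{
    $sounds = [];
    foreach ($animals as $animal) {
        // Guard 1: nothing stops a string or null from sitting in the array.
        if (!$animal instanceof Animal) {
            continue;
        }
        $sounds[] = $animal->makeSound();

        // Guard 2: moveAround() is not part of the interface, so probe for it.
        if (method_exists($animal, 'moveAround')) {
            $sounds[] = $animal->moveAround();
        }
    }
    return $sounds;
}

$sounds = makeSounds([new Cat(), 'not an animal', new Parrot()]);
```

Two of the handful of lines in the loop body exist only to guard against bad input, and nothing in the language reminds you to write either of them.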

Some people may still cling on to the notion that “ultimately, you can still do it, it just requires more code!” But I argue that preventing “more code” is the exact reason why OOP was invented. By writing more boilerplate error checking code, we are wasting time. The issue is exacerbated by the fact that the error checking code isn’t required, unlike say, if you were throwing exceptions. It isn’t immediately obvious in that last example that you need to do error checking for is_object() on the $animal variable. It’s these types of oversights that really damage PHP as the code base gets larger.

Conclusion

What I’m realizing is that PHP isn’t meant to scale. Yes, it can take a lot of web traffic, but that’s not what I mean. I’m talking about scaling in the sense of growing team size and code base. The design of the language promotes coding paradigms that ultimately damage the code base. This is because PHP makes it harder to use good OOP practices on legacy code. To illustrate:

  • PHP became popular because it is easy to hack things out, even if that something required doing it the “wrong” way. These problems come back and bite you when the code base grows.
  • PHP can’t support a large development team as effectively because its weak typing allows for sidestepping certain core OOP principles (see above)
  • PHP allows for invisible future-bugs (see above) to be inserted without any immediate cause for alarm
  • As applications get complex and require threading or distributing of processes, PHP fails to keep up (so other languages get used)
  • Because PHP does not use dynamic dispatching (message passing), calling a method can cause runtime FATAL ERRORS (unacceptable and very hard to debug!)

All of this makes me rethink the popularity of PHP. There are some new languages, still in their infancy, that pose a threat to PHP’s current dominance. I believe that in the next few years, as today’s systems become “legacy,” today’s newcomers will finally be production ready. At that point, we might see companies adopt the newer languages, which will support more modern programming paradigms. We are seeing this today with Ruby, for example.

Of course, I could be wrong. I once told people that PHP was “C of the web.” It’s possible it’s here to stay forever, despite all of its flaws. And, for the record: I do not believe Python or Ruby will be the language that will overtake PHP, but that’s for another post.

I just want everybody to know that I am a PHP developer, so I speak from experience. We should recognize that technology changes and evolves, and it is important that we constantly update our skills to ensure they don’t become obsolete. I’m just pointing out that perhaps PHP isn’t as timeless as C (or, possibly, Java).

Lastly, I will plug my personal belief that being “religious” about a language because it is “the best” is short sighted. New languages are born, literally, every week. It’s only a matter of time before a language comes along that does what your language does more elegantly, faster, and with less code.

Only time will tell. 🙂