PHP Tip: Always Put Constants on the Left in Boolean Comparisons

This was a standard I enforced at my last company:

Whenever you are doing a boolean check (such as an IF or WHILE), always place constants on the left side of the comparison.

The following is BAD:

// BAD
if($user == LOGGED_IN) {

The following is good:

// GOOD
if(LOGGED_IN == $user) {

Why is this such a big deal? Imagine the typo where you forget the second equals sign:

// Oops! This assigns LOGGED_IN to $user, so it is true whenever LOGGED_IN is truthy!
if($user = LOGGED_IN) {

This sort of bug is fairly common. C# went as far as to say boolean conditions must always have boolean return values, thereby eliminating the possibility of accidental assignments. Well, since PHP can’t do that, this is the next best thing. Notice how this convention will save your butt:

// Parse error (fatal). Bug caught immediately.
if(LOGGED_IN = $user) {
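To see the failure mode concretely, here is a small sketch that can be run as-is. The constant's value and the starting values of the variables are made up for illustration. The point is that the typo silently overwrites the variable, and the condition then tests whatever value was assigned.

```php
<?php
// Hypothetical status value, defined only for this illustration.
define('LOGGED_IN', 1);

$user = 0; // some other status: not logged in

// The typo: a single "=" assigns LOGGED_IN to $user, and the condition
// then tests the assigned value (1, which is truthy).
if ($user = LOGGED_IN) {
    $typoBranchTaken = true;
} else {
    $typoBranchTaken = false;
}

$other = 0;
// Constant on the left: the comparison works as intended.
$yodaBranchTaken = (LOGGED_IN == $other);

var_dump($typoBranchTaken); // bool(true), even though $user started as 0
var_dump($user);            // int(1): the original value was overwritten
var_dump($yodaBranchTaken); // bool(false)
```

Dropping one "=" from the last comparison would stop the script from compiling at all, which is the whole appeal of the convention.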

Think about it. :)

Is Your Blog Not Receiving Pingbacks? I Fixed Mine.

I recently noticed that my blog was no longer registering pingbacks (the automatic in-comment notification that occurs when somebody else blogs about your post). I like these because they help me understand which of my articles are gaining traction.

My symptoms

  • My other blogs hosted on the same server seem to be pinging fine; however, those have far fewer posts and plugins
  • I am able to send pingbacks, apparently
  • But pingbacks TO my content were dropped (even when I was self-pinging)

The fix

I figured the issue was somehow related to my recent upgrades of WordPress. After scouring the web, I found that the issue was due to a poorly designed timeout setting in WordPress.

  1. Open wp-includes/cron.php in your blog folder
  2. Go to the line that starts with: wp_remote_post( in the spawn_cron function
  3. Change 'timeout' => 0.01 to 'timeout' => 1 (or any other far more reasonable value)

This will fix blogs that are plagued by this bug.

Autocast Variables Whitepaper: What I Want to See in PHP 6

Introduction to Autocasting

This is a white paper on a feature that does not exist in PHP. It is an idea I came up with and hashed out here in this article.

Autocast variables. An autocast variable is like a container for data — everything going into an autocast variable type will always be converted to the current type of that variable. As in, if you assign a string into an integer variable, the variable will become the integer representation of the string (via implicit and immediate typecasting).

The idea is a hybrid of limited type safety, where only some variables are type safe, and operator overloading of the equals sign on native and complex datatypes. To help explain the idea: it would act almost like somebody following around your cursor and typing (int), (string), etc. all over your code before variable assignments, EXCEPT that it can also be done with non-native datatypes like classes.

The goal is to allow a developer to be – when desired – 100% certain they are working with a specific data type.

To declare a variable as an autocast, simply place a colon after the dollar sign in a variable name. From then on, everything assigned to that variable is automatically typecast to the datatype of the variable. For example:

// This variable is now a container for integers
$:orderTotal = 0;
// assign a float value
$:orderTotal = 1.01;
// outputs 1; 1.01 was typecast to an integer
echo $:orderTotal;

NOTE: Why the new syntax? I toyed with the idea of an autocast keyword, but the paradigm broke down when you started assigning objects. The problem is that objects are pass-by-reference. This meant a programmer could change the datatype of an autocast variable by altering its reference. The other problem was that, without a visual marker, things would get very confusing since one could never tell if they were working with an autocast until runtime. Lastly, why the dollar-colon? I would have preferred a straight colon, but most of the good single-character syntax would conflict with existing PHP systems (# is a comment, : is used in ternary operators, % is modulus, ^ is a bitwise operator, etc.). A dollar sign is universally understood as a variable, so I thought the next best thing was to alter the variable in a way that today’s PHP would recognize as invalid (and thus introducing the syntax would not conflict with legacy code).

The concept is simple, but gets more complicated as you introduce objects, magic methods, and method signatures into the equation. Don’t worry, I’ve thought about all of those scenarios. Key summary of benefits:

  • New coding paradigms allow for simpler interaction between different data types (see first Practical Example)
  • Refactoring can be done in a way never before possible (see second Practical Example)
  • Code is now more “reliable” because unintended data types aren’t used (such as during boolean checks)
  • Many fatal errors can now be avoided
  • Potential use in the realm of dependency injection
  • Possibilities for true function overloading since expected datatypes are known (although, this is possible today, to be honest)

Read on to learn more!

Autocasting: Defined

You can skip this section and review it later. Note that the rest of this article will review the specifics of autocasting. Here are the basic rules:

  1. Declaration: Autocast variables are set at declaration, but the actual data type is optional and is inferred during the first variable assignment
  2. Declaration: Once an autocast variable is explicitly cast or declared as a certain data type, it can no longer change data types
  3. Usage: The colon is part of the variable name. A variable with the same name without the colon is a different variable.
  4. Null: Null always counts as a different datatype and assigning it will always trigger autocasting behavior
  5. Arrays: For arrays, non-array values are inserted as the 0th index of the array and all other values are truncated
  6. Classes: For uninitialized objects, the constructor is automatically called prior to autocasting behavior (no arguments)
  7. Classes: For initialized objects, if no autocast magic method is defined, the assigned value is dropped and a warning is thrown
  8. Magic Method: __autocast is only called when a different datatype is being assigned
  9. Scope: Autocast behavior is linked to the declared variable, NOT its contents — think of it as a container. Assigning an autocast variable into a non-autocast variable creates a copy. This MUST be so because any other implementation would allow a developer to change the contents of the autocast variable by using a reference.

Forced Native Datatype Conversions

PHP would work exactly the same as before except that certain variables could be declared as autocast. When a variable is declared as a specific type, all data going into it is automatically cast to that type. For example:

$:counter = 0; // OK, this is now an INTEGER
$:counter = 1.01; // attempting to assign a float
echo $:counter; // outputs 1, not 1.01

Another example:

$:orderTotal = 0.00; // OK, this is now a FLOATING POINT NUMBER
$:orderTotal = "0.00"; // THIS IS A STRING BEING ASSIGNED
print_r($:orderTotal); // outputs 0.00 (as a float, not a string)

Where did I get this idea? Actionscript 2, when I played with it years ago. In AS2, they had just introduced optional compile-time type checking. The goal was to allow developers to optionally set a variable’s type to trigger compile errors. I have been thinking about this solution for literally years. I didn’t like the notion of breaking existing PHP code and introducing strong typing. Besides, since PHP isn’t compiled, doing “compile time type checking” is fruitless. Thus, the solution is to encourage more thoughtful OOP by allowing developers to “declare” variable types.

What happens if you assign an autocast into a regular variable?

$:counter = 0; // OK, this is now an INTEGER
$counter = $:counter; // create a new counter variable
$counter = 1.01; // attempting to assign a float
$:counter = 1.01; // attempting to assign a float
echo $counter; // outputs 1.01
echo $:counter; // outputs 1

Answer: The assignment from an autocast to a regular variable makes a copy of the variable sans the autocast behavior.
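The container behavior can be roughly approximated in today's PHP with an ordinary class. The sketch below is only an emulation under stated assumptions: AutocastBox and its set()/get() methods are hypothetical names invented for this illustration, and it handles native types only.

```php
<?php
// Hypothetical helper: emulates an autocast container with a plain class.
final class AutocastBox
{
    private $type;
    private $value;

    public function __construct($initial)
    {
        // The type is inferred from the first assignment and then locked in.
        $this->type = gettype($initial);
        $this->value = $initial;
    }

    public function set($value)
    {
        settype($value, $this->type); // cast to the locked-in type
        $this->value = $value;
    }

    public function get()
    {
        return $this->value; // returns a plain copy without the casting behavior
    }
}

$counter = new AutocastBox(0); // locked to integer
$counter->set(1.01);           // the float is cast on the way in

$copy = $counter->get();       // plain variable, no longer "autocast"
$copy = 1.01;

var_dump($counter->get()); // int(1)
var_dump($copy);           // float(1.01)
```

The explicit set() call is exactly the noise the proposed dollar-colon syntax is meant to remove.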

Converting Things to Objects with __autocast

The idea does not stop at native types. I want to take this a step further and introduce magic methods that specifically deal with the autocasting behavior!

/**
 * This base64encodes data
 */
Class Borg {
    public $:slaves = "";
    /*
     * This is called during assignments of the wrong types
     *
     * @param $type the class name or native data type of the assigned value
     * @param $assignment the variable being assigned into the autocast
     */
    function __autocast($type, $assignment) {
        switch($type) {
            // note that I want to use namespaces here, but not everybody
            // has seen those yet in 5.3 and I don't want to distract from
            // the example... I made these constants up.
            case PHP_CONSTANTS_DATATYPE_STRING:
            case PHP_CONSTANTS_DATATYPE_INTEGER:
            case PHP_CONSTANTS_DATATYPE_DOUBLE:
            case PHP_CONSTANTS_DATATYPE_BOOLEAN:
                // for these native data types, just encode
                $this->:slaves = base64_encode($assignment);
                break;
            default:
                // non native data type! Borg defenses activate!
                $this->:slaves = base64_encode(serialize($assignment));
                break;
        }
    }
}

The following example assigns a string into the object, yet in the next line, the object remains intact — and taken over by the Borg!

// the variable is currently uninitialized, but is declared as type autocast Borg
Borg $:borg; // autocast variables can be declared differently
$:borg = $_POST['slave_names']; // ASSIGNING a string to the variable
echo $:borg->:slaves; // this works!

The following example shows how objects can also be converted using the magic method:

$:borg = new Borg; // the variable is now Borg
$:borg = new EnterpriseFodder(); // Did the $:borg become EnterpriseFodder?
// $:borg is still of type Borg and this outputs serialized(EnterpriseFodder)
echo $:borg->:slaves;

Before we get too far, I also want to clarify that autocast objects CAN be overwritten so long as the assignment datatype is the same or of a derived child class.

$:borg = new Borg; // the variable is now Borg
$:borg->:slaves = 'I CHANGED IT'; // changes registered correctly
$:borg = new Borg(); // same type, so the old instance is simply replaced
// empty because we overwrote the old Borg instance
echo $:borg->:slaves;

Because of the dollar-colon marker, autocast variables won’t introduce “invisible bugs.” Callers will be very aware that they are dealing with autocast variables.

Converting Objects to Other Types with __castTo

Another feature that should be possible is for objects to define how they are cast into other datatypes. For example, imagine the following code:

$status = new Status();
Boolean $:isReady;
// $:isReady = true
$:isReady = $status;

The problem with this situation is that I have no control in how the Status class is converted into a Boolean. In this case, it would probably just convert to true. Wouldn’t it be nice if I had control over that?

class Status {
    public $status = 0;
    function __castTo($type) {
        // The below uses namespaces, which is my preferred way PHP
        // would do things. I made these up.
        // We are checking if the attempt is to cast to a boolean
        if(\PHP\CONSTANTS\DATATYPE::BOOLEAN == $type) {
            // return 0 to boolean attempts
            return $this->status;
        }
        else {
            // otherwise just settle with default behavior
            return $this;
        }
    }
}

$status = new Status();
$:isReady = true; // boolean

// $:isReady = 0 = false
$:isReady = $status;

// $check = non-empty object = 1
$check = (int) $status;
echo (bool) $check; // 1 = true

The return value of __castTo becomes the value that is presented to the casting operation. Thus in the example above, $:isReady would only see the value of $this->status (which is 0) because $:isReady wants a Boolean. For any other data type conversion, Status would return $this (itself). This is why the second cast operation behaves totally differently and ends up equaling 1. So in terms of order of operations, __castTo is called before any attempts at casting an object, giving the object a chance to define how it should be converted.

I did want to state that the __castTo concept is 100% possible without autocast. I think it might be a cool feature all on its own that just so happened to work very well with the autocast idea.
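For comparison, the closest hook that exists in PHP today is __toString(), which covers string conversion only. The sketch below uses a cut-down Status class, assumed for this illustration, to show that a boolean cast ignores the object's contents while a string cast consults the magic method.

```php
<?php
class Status
{
    public $status = 0;

    // Existing magic method: controls conversion to string only.
    public function __toString()
    {
        return (string) $this->status;
    }
}

$status = new Status();

var_dump((bool) $status);          // bool(true): a userland object casts to true
var_dump((string) $status);        // string(1) "0": __toString() was consulted
var_dump((bool) (string) $status); // bool(false): the string "0" is falsy
```

A __castTo hook as proposed would generalize this one special case to every target type.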

Global autocast magic function

Just like the object magic method __autocast, there should also be a global __autocast function. This function would allow a developer to override native autocast behavior. Note that if the __autocast magic function fails to cast a variable, then the native behavior should be triggered. Returning false will suppress the native assignment (so you must make it!):

function __autocast($assignee, $assigneeType, $assigner, $assignerType) {
    // for those of you that HATE autocasting, you can make it throw exceptions
    if($assigneeType != 'Borg') {
        throw new Exception('Autocast behavior was triggered');
        // alternatively returning false here would prevent an assignment
        // from happening
    }
    // return true so that regular autocast behavior is retained
    return true;
}

Function Argument Autocasting to Enhance Type Hinting

Autocasting functionality should be used to augment function method declarations as well:

function sortData(string $:data) {

Now your code that expects a string doesn’t need to check if the data is actually a string (which is just spaghetti code anyway). Why would you want to ensure a string? How many times have you tried to echo a variable and it printed “Array” because an array snuck in and replaced your variable?
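A partial version of this exists in later PHP releases: since PHP 7.0, scalar type declarations in the default (non-strict) mode convert compatible scalars, while incompatible values raise a TypeError instead of being autocast. A minimal sketch, where describeData is a made-up function name:

```php
<?php
// No declare(strict_types=1) here, so PHP uses its default coercive mode.
function describeData(string $data): string
{
    return gettype($data) . ':' . $data;
}

$fromInt = describeData(42); // the int is converted to the string "42"

try {
    describeData(array(1, 2)); // arrays cannot be coerced to string
    $arrayRejected = false;
} catch (TypeError $e) {
    $arrayRejected = true;
}

var_dump($fromInt);       // string(9) "string:42"
var_dump($arrayRejected); // bool(true)
```

Unlike the TypeError here, the autocast proposal would let the receiving type decide what to do with the array.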

Here’s another example of function argument autocasting:

function myXMLParsingFunction(XmlReader $:parser, $data) {

What’s this do? The idea is that if you pass in something I’m not expecting, the regular autocasting behavior is triggered right there. Now my method’s code can worry about how to parse that data rather than whether the parser is actually an instance of XmlReader. Key point: if the caller passes an autocast variable into an autocast argument (and the types match up), all regular pass-by-ref/value logic is used. If there is a mismatch, a copy is made instead.

Dynamic Autocasting

Inline autocasting should also be possible for variables that aren’t necessarily autocast. This functionality is important where you are method chaining (prevents fatal errors). For example:

$culprit = ((autocast Borg) getBorg())->toString();

Behind the scenes, if getBorg() returns something that is not a Borg, an in-memory Borg conversion takes place. The result is then used to make the toString() call. If we took the same example, but took away the chaining, we would see another side effect of autocasting:

$borg = (autocast Borg) getBorg();
$culprit = $borg->toString();

Since autocast behavior is associated to the declared variable and not the contents, autocast functionality would NOT be inherited by the $borg variable. This way, if something crazy happens inside the getBorg() method that we aren’t expecting, we can still be sure that we get back a datatype that we expect. If the goal is to always return Borg types from getBorg(), the author could prepend the dynamic autocast before the return call:

function getBorg() {
    return (autocast Borg) "Enterprise Fodder";
}

Note that in the event the $borg variable is autocast to another type (i.e., if $borg is declared as autocast to a string), the Borg instance would be converted again to the type $borg wants (a string). Note that each time an autocast is assigned into a non-autocast variable, a copy is made. Thus the best thing to do in the second example would be to declare $borg as an autocast ($:borg).

Autocast Return Types

The alternate approach to the dynamic autocasting problem on methods is to allow autocast return types in the function declarations. The idea is that in the declaration, the method author can force a dynamic autocast on all return values from the current function. This way, if a function has many exit points, the return type can be guaranteed to be consistent.

function autocast Borg getBorg() {
    return "Enterprise Fodder";
}

In this example, a Borg instance is passed back in an autocast container. If the caller is assigning the return value to an autocast variable, it is then passed-by-reference. If the caller is using a regular variable, a copy is assigned in. This way, the functionality can be introduced without breaking legacy code.

Practical Example: Models

So what’s a practical use for this aside from lessening code and cleaning up mundane “do I have what I’m expecting” code? Here’s a very simple example:

class Model {
    public $:amount = 0.00; // float!
    public $:name = ""; // string!
    public $:id = 0; // integer!
    function __autocast($type, $assignment) {
        // we are checking for if an array was assigned into this class
        if(\PHP\CONSTANTS\DATATYPE::ARRAY == $type) {
            $this->:amount = $assignment['amount'];
            $this->:name = $assignment['name'];
            $this->:id = $assignment['id'];
        }
        else {
            trigger_error('Error! Only arrays can be autocast.', E_USER_WARNING);
        }
    }
}

What’s the above accomplish? Check out the sexy things I can do:

$row = array('amount' => '0.00', 'name' => 'Michi', 'id' => '1');
$:model = new Model;
$:model = $row;
echo $:model->:amount; // outputs a FLOAT (not a string) value: 0.00

The following accomplished the EXACT SAME THING because of the __autocast magic method.

$row = array('amount' => '0.00', 'name' => 'Michi', 'id' => '1');
$:model = $row;
echo $:model->:amount; // outputs a FLOAT (not a string) value: 0.00

Not only that, but we also squashed the unintended non-zero bug on the amount column! It means the future PHP models that represent database data will finally have properties that mirror the datatypes of the database, rather than just being the string representation.
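Until something like __autocast exists, the same effect takes an explicit step. The sketch below is one possible stand-in, assuming a hypothetical static fromArray() factory that casts each column by hand. It also shows the truthiness difference between the raw string and the converted float.

```php
<?php
class Model
{
    public $amount = 0.00;
    public $name = '';
    public $id = 0;

    // Hypothetical factory standing in for the proposed __autocast hook.
    public static function fromArray(array $row)
    {
        $model = new self();
        $model->amount = (float) $row['amount'];
        $model->name = (string) $row['name'];
        $model->id = (int) $row['id'];
        return $model;
    }
}

$row = array('amount' => '0.00', 'name' => 'Michi', 'id' => '1');
$model = Model::fromArray($row);

var_dump($model->amount);        // float(0)
var_dump((bool) $row['amount']); // bool(true): the raw string is truthy
var_dump((bool) $model->amount); // bool(false): the float behaves as zero
```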

Practical Example: Refactoring for Code Scaling

PHP’s greatest weakness is its ability to “scale” the code base. As the code gets larger and poor coding practices are used, it becomes very difficult to go back and fix things without completely gutting everything (see my article about this). Autocasting fixes this.

For example, nobody thinks twice when they see code like this:

$someObject->processQuery($db, $query); // drives Michi crazy

How do you know $query is a string, let alone a query? How do you know $db is an object? Do you realize that if $db isn’t an object, PHP quits with a fatal error saying some method can’t be called on a non-object? This is a serious problem! And yet it’s just business as usual in the PHP world. Type hinting is NOT the full solution here, and it is worthless when you consider refactoring. Type hinting ultimately triggers a fatal error that the developer is powerless to stop during run-time. Yes, type hinting lets you control what your function deals with, but the answer is NOT to take your toys and go home when you get something you didn’t intend. Let’s illustrate; imagine this code:

function processData($data) { // implied string (bad!)

And the author later realizes, “Wait, I want to make $data a class so I can do more to it.” So the author changes it:

function processData(Data $data) {
    $data->process();
}

But the problem is that now if somebody passes in a string/array/integer/etc., they get a FATAL ERROR! So then the function caller ends up doing crazy spaghetti that looks like this (actually 90% of the time, the caller won’t do this until after the bug hits production and a fatal error happens :( ):

if(!($data instanceof Data)) {
    $dataObject = new Data();
    $dataObject->setData($data); // ugh, exposed public setter method needed!
    $data = $dataObject;
}
processData($data);

That’s no good! In virtually every language, this kind of refactor is not possible without causing serious problems to the outside developers. In statically typed languages, the compiler catches these types of things, and then everybody does a mass re-write. But in dynamic languages, you can’t find these issues until you run the code. So how does autocasting solve the problem?

function processData(Data $:data) {
    $:data->process();
}
// leave the complicated stuff to the __autocast magic method!
class Data {
    private $:payload = "";
    function __autocast($type, $assignment) {
        // for now, we only worry about strings, but in the future we could do
        // a check for LegacyData types and convert those too!
        $this->:payload = $assignment;
    }
    function getData() {
        return $this->:payload;
    }
    function process() {
        return "data: " . base64_encode($this->getData());
    }
}

So if a caller passes in a string into processData, it gets assigned into :payload, and the code keeps on working. One thing that’s neat is that we don’t need to expose a public setter method just to make things backwards compatible. Additionally, if we want to do any special processing or data conversions, we can do that in the magic method. Lastly, if we upgrade things again later, we can create a new logic fork inside the autocast magic method to help convert the legacy object type to the new one.

// Oh no! Changed the argument again!!
function processData(XmlBlob $:data) {
    $:data->process();
}

class XmlBlob {
    private $:payload = "";
    function __autocast($type, $assignment) {
        // If it's of type Data, convert it over
        // otherwise, roll back to the uber legacy behavior
        // I'd really love it if this sort of comparison is legal
        if(Data == $type) {
            // convert the data to the format we want
            // we could use a magic method here too if
            // $this->:payload was a class instead of a native
            $this->:payload = self::toXML($assignment->getData());
        }
        else {
            // note this is autocast to string
            $this->:payload = $assignment;
        }
    }
    static function toXML(string $:data) {
        // do some XML conversion magic here
        return $:data;
    }
    function process() {
        return "data: " . base64_encode($this->:payload);
    }
}

In short, autocasting allows library writers to hide complexity from implementing developers. And, as a super-added bonus, it makes changing/deprecating method signatures actually possible!
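In current PHP, a similar backwards-compatible refactor can be approximated by leaving the argument untyped and normalizing it inside the function. This is only a sketch under stated assumptions: the static Data::from() factory is a hypothetical name, and the conversion it performs is the simplest one that keeps legacy string callers working.

```php
<?php
final class Data
{
    private $payload;

    // Private constructor: instances are only created through from().
    private function __construct($payload)
    {
        $this->payload = $payload;
    }

    // Hypothetical factory: accepts either a Data instance or a legacy scalar.
    public static function from($value)
    {
        if ($value instanceof self) {
            return $value;
        }
        return new self((string) $value);
    }

    public function process()
    {
        return 'data: ' . base64_encode($this->payload);
    }
}

// The signature stays untyped so that legacy callers keep working.
function processData($data)
{
    return Data::from($data)->process();
}

echo processData('abc'), "\n";             // data: YWJj
echo processData(Data::from('abc')), "\n"; // data: YWJj
```

The conversion lives inside the library and no public setter is needed, but every function that accepts Data has to remember to call the factory, which is the step autocasting would make implicit.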

Implications

There are a number of substantial implications with this feature. Summary of points:

  • Might make things messier since autocasting is “automagical”
  • IDEs can do even cooler things since types can be known
  • True function overloading is within reach
  • Native dependency injection is also within reach
  • Possible speed improvements, but possible speed issues

First of all, this could make PHP even messier than before. But that’s the case for any new feature that is poorly used. But I do admit that introducing “magic” and “shortcuts” can eventually lead to code that looks like the nightmare we all know as Perl (zing!). That said, I see overwhelmingly cool things that become possible with this. Most importantly, PHP becomes “type safer.” Think about it: today, I will bet you good money that almost every code base has a function where the method author wrote code checking if the caller passed the right data types in — or vice versa. The reality is that even in a loosely typed language, datatypes are important. So while some portion of the population might use this to write some really crazy Perl-like code, I think the benefits outweigh the costs. These changes make it easier for library authors to maintain and understand their code (which I believe is a more important battle to win). Autocasting allows authors to put up a moat around their libraries/classes where they can absolutely control the types of variables they are dealing with — without forcing fatal errors everywhere.

It also means IDEs can do even more error checking and type hinting. Imagine if the IDE warned you when you set up a situation that, without autocast, would have triggered a fatal error! It means that when a caller tries to use the return value of a declared autocast function return type, the IDE can warn the developer.

This feature might also open the door to true polymorphic PHP. For example, a class could have two constructors: one with an autocast string argument and one with a generic argument. At run time, using some simple rules, PHP might use different versions of the same function name depending on the variable types. Voila! If a method signature can state what TYPES of arguments it wants, and we can explicitly state what we are passing in, isn’t that the first step in setting up true function overloading? While this is beyond the scope of my idea, I thought it was an interesting secondary benefit that somebody smarter than I could explore.

You might notice that autocasting behaves a lot like typical dependency injection patterns. Since constructors are automatically called for uninitialized variables, it would be possible to simulate dependency injection in functions very easily (a boon this is for testing!). In earlier examples, I showed you cases where a $parser or $db variable was passed in. Imagine if in those examples, such a variable was passed in as NULL (not provided). Now, PHP would automatically construct them from scratch, leaving the function implementer free from the burden of constructing the object. If you think about it, this puts us within striking distance of some kind of dependency injection in PHP. Then, somebody smarter than I can suggest a static __inject() magic method that gets called during automatic object construction… :)

Finally, while I’m not a C programmer, I wanted to take a moment to say that it’s possible that autocasting could provide memory/speed optimizations for PHP since certain variable memory spaces wouldn’t constantly change. Again, I’m not a C programmer and I don’t know how PHP’s memory allocation is designed, but I thought I’d throw that out there. On the same note, all of the casting logic could prove to be quite taxing. Thus, it could negate any perceived performance gains.

Spread the Idea

This is something I’ve been internalizing for no less than half a decade. I would genuinely love to see it in a future version of PHP, but I’m too busy to evangelize the idea. I’ve also never met somebody else who fully understood my idea. I’m finally releasing the idea into the wild and hoping for the best. It’ll probably sit in the Internet Idea Junk Yard. :) Please feel free to share this idea with your peers and pass it along to other PHP developers.


A PHP/MySQL Bug Most People Have But Don’t Realize

I’ve seen this over and over in my career and thought I should save others from the horror. Part of me feels like I blogged about this years ago, but I couldn’t find a post referencing it (EDIT: found it!). The bug is simple:

  1. Create a database table with a decimal value, such as order_total
  2. Write some code that retrieves the row
  3. Do an implicit boolean check on order_total to see if it has a value

Here’s some actual code:

$results = mysql_query("SELECT * FROM orders");
while($row = mysql_fetch_assoc($results)) {
    if($row['order_total']) {
        echo "Order total is clearly not zero!";
    }
    else {
        log_bad_order($row['id']);
    }
}

This code has a serious bug in it. The problem is the line pertaining to checking if the order_total has been set. Pop quiz:

What is the value of the following:
(bool) "0.00"

The answer is TRUE! 0.00 may evaluate to zero, but “0.00” is not the same thing! As soon as PHP sees more than just a single “0” in a string, it assumes it’s a regular string and treats it as a non-zero string. A more obvious way to ask the same question:

What is the value of the following:
(bool) "0.0000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000"

Or what about:

What is the value of the following: (bool) "0."

The point is that as soon as you go beyond a single zero, PHP just assumes the rest is real data and will not discard it. Thus:

if(0.00) {
    echo "THIS NEVER EXECUTES";
}
if("0.00") {
    echo "THIS ALWAYS EVALUATES TO TRUE";
}
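The rule can be checked directly. The short script below is a sketch whose labels and list of cases were chosen for this illustration. It casts several representations of zero to boolean and then shows that an explicit float cast restores the expected result.

```php
<?php
$cases = array(
    'float 0.00'    => 0.00,
    'string "0"'    => "0",
    'string "0.00"' => "0.00",
    'string "0."'   => "0.",
    'string ""'     => "",
);

$results = array();
foreach ($cases as $label => $value) {
    $results[$label] = (bool) $value;
    echo $label, ' => ', var_export($results[$label], true), "\n";
}

// Casting to float first makes the decimal string behave like a number.
$fixed = (bool) (float) "0.00";
var_dump($fixed); // bool(false)
```

Only the empty string and the exact string "0" are falsy; every other string, including "0.00" and "0.", is truthy.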

So, going back to the original problem, the way to solve it is to fix the code to either explicitly typecast the variable or use a “greater than” check:

$results = mysql_query("SELECT * FROM orders");
while($row = mysql_fetch_assoc($results)) {
    if((float) $row['order_total'] > 0) {
        echo "Order total is clearly not zero!";
    }
    else {
        log_bad_order($row['id']);
    }
}

If you don’t do either of these things, I’d suggest you go and double check some of your code.

Is PHP here to stay?

As a LAMP developer, I am starting to question the long term viability of PHP. PHP was born during an era when knowing HTML was a valid and valuable resume bullet. Because of this, most of the “advanced” aspects of PHP — which relate to the OOP functionality — were introduced only after PHP 4/5, and weakly at that. Additionally, new languages have since become popularized that show the weakness of PHP. Don’t get me wrong, I am very supportive of PHP. I just believe that it’s important that people understand both the strengths and weaknesses of the tools they use.  There are two main points I want to cover:

  1. PHP thread support is weak
  2. PHP OOP = Broken

The second point is rather technical, but it closely relates to another strength and weakness of PHP: it is loosely typed. More on that later.

Thread Support is Weak

True threading support in PHP does not exist. The closest thing is the pcntl_fork function, which copies the current process rather than creating a thread. This means asynchronous processing within a single process is not supported. Threading is useful in event-driven architectures (common in JavaScript) or when doing blocking operations such as network calls.

Because the forked process is a clone of the original, it shares all of the original resources, including database and file resources. This means that the forked process must be self-aware of whether it is a child or not, and must be careful not to modify or close these resources. This encourages spaghetti code that contains large logic forks (“if I am not a clone, else…”). Because of this, forking is messy and error prone. This gets further complicated when PHP is executed by Apache in a web environment. In fact, the PHP manual advises avoiding forking with web servers:

Process Control should not be enabled within a web server environment and unexpected results may happen if any Process Control functions are used within a web server environment.

Not to mention the method is incredibly C-like in that it is very “raw” (unlike other native PHP methods/classes). This increases the barrier to entry significantly, which ultimately serves to have the feature ignored by most shops.
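To make the "am I the child?" branching concrete, here is a minimal sketch of the fork-and-wait pattern. It assumes a command-line PHP build with the pcntl extension; runInChild and the exit code 7 are made up for this illustration, and the function returns null when pcntl is not available.

```php
<?php
// Returns the child's exit code, or null when the pcntl extension is missing.
function runInChild()
{
    if (!function_exists('pcntl_fork')) {
        return null; // pcntl is not compiled into every PHP build
    }

    $pid = pcntl_fork();
    if ($pid === -1) {
        throw new RuntimeException('fork failed');
    }

    if ($pid === 0) {
        // Child branch: do the work, then exit without touching
        // resources the parent still needs.
        exit(7);
    }

    // Parent branch: block until the child finishes.
    pcntl_waitpid($pid, $status);
    return pcntl_wexitstatus($status);
}

$exitCode = runInChild();
var_dump($exitCode);
```

Even in this tiny case the same function body runs in two processes and has to branch on the return value of pcntl_fork, which is the logic fork described above.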

Why is all of this important? Well, at most companies, one language is selected for all in-house development. This is because cross training and hiring is simplified if everybody speaks the same language. There are a few common tasks that are unnecessarily difficult to do in PHP:

  • Asynchronous work: handing off work, such as connecting to a remote server, to a child and waiting for a response
  • Managing thread pools: this sort of work requires significant “by-hand” management of any processes spawned by the parent via pcntl_fork

The threading issue is only a pain point that impacts processes that need to become parallelized. It is a pain most big shops live with, or, alternatively, introduce other languages to help solve.

PHP OOP = Broken

Because of the loosely typed nature of PHP, true, well-formed object oriented programming is broken. I know that for many PHP programmers, “Object Oriented” means putting together classes and reusing code as objects. However, that is truly, sincerely, only a portion of the point of OOP. Some of the most powerful aspects of OOP are lost in PHP’s implementation of the concept. Don’t get me wrong: these decisions were probably the right fit for the niche PHP was filling, but I don’t believe most PHP programmers are fully aware of what they are missing.

While the language, thankfully, has interfaces and abstract classes, they are woefully underused. This is, in part, due to the developer community being largely self-taught. This creates a misconception about the nature of OOP, which ultimately leads to the devaluation of the most important feature of OOP: interfaces.

I can go into why they are so important in another article, but the point is: without interfaces, true polymorphic code is impossible. Or, rather, extremely susceptible to spaghetti code and fatal errors.

In other languages (Java), code might look like this:

interface Animal { void makeSound(); }
void farm(Animal cat, Animal dog, Animal parrot) {
  cat.makeSound();
  dog.makeSound();
  // Note: Parrot class *DOES* have a method called moveAround()
  parrot.moveAround(); // ERROR!
}

The interface in this example defines a uniform way to access a class through a standardized API (thus the name, application programming interface). In a strongly typed language where all variables must have a type, the cat variable is declared as an implementation of Animal. This both allows and enforces the method call makeSound(). If cat has a meow() method and dog has a woof() method, they cannot be called here without a compiler error. This is because, in this function, each variable is declared as an Animal (rather than as a Cat, Dog, or Parrot). As such, only Animal methods work here.

More importantly, because the compiler does this type checking, any invalid calls, such as the last one, would error and never compile. Even if the Parrot class has a moveAround() method, it cannot be called in the code above. This is an extremely important aspect of OOP since, as the definer of the Animal interface, I want to make it very specific how Animals should be treated (you can only makeSound!). If a programmer tries to do something to an Animal that I haven’t defined, they get an error. If they wanted to make that last line work, they would need to use object typecasting:

void farm(Animal cat, Animal dog, Animal parrot) {
  ...
  ((Parrot) parrot).moveAround();
}

Or by changing the function definition:

void farm(Animal cat, Animal dog, Parrot parrot) {
  ...
  parrot.moveAround();
}

But note that in this case, the user had to make an explicit choice to stop using Animal’s interface. Yes, parrot is still an Animal, but it doesn’t have to be. This, in short, helps prevent spaghetti code because it forces the developers to think about whether or not they want to deviate from a particular interface. Realistically, if presented with these alternatives, a Java programmer would probably use other types of abstraction techniques (e.g., dependency injection) to keep this method from needing to be used. However, this example was necessary to illustrate how things are done in PHP.

So how would this look in PHP? Why isn’t this the same there? Well, take a look at the following code that, unlike the Java example, works perfectly fine and raises no red flags.

interface Animal { function makeSound(); }
function farm(Animal $cat, Animal $dog, Animal $parrot) {
  $cat->makeSound();
  $dog->makeSound();
  $parrot->moveAround(); // WORKS FINE
}

This code works great. We have three arguments all forced to use the Animal interface. Great. To a casual observer, there is really, truly, nothing wrong with this code. It’s a little strange, but if it’s commonly known that Parrots can moveAround(), there is no problem. In fact, in most PHP shops, I will bet money that type hinting is NOT used. This will further illustrate how bad the spaghetti is about to get (read on).
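Here is a runnable sketch of that behavior. The Cat and Parrot classes and their return values are invented for the example. Note that on PHP 7 and later the failed call surfaces as a thrown Error rather than a classic fatal error, which is why it can be caught here.

```php
<?php
interface Animal
{
    public function makeSound(): string;
}

class Cat implements Animal
{
    public function makeSound(): string { return 'Meow'; }
}

class Parrot implements Animal
{
    public function makeSound(): string { return 'Squawk'; }

    // Not part of the Animal interface.
    public function moveAround(): string { return 'Flap'; }
}

function farm(Animal $cat, Animal $parrot): array
{
    return [
        $cat->makeSound(),
        // Allowed: PHP only checks that $parrot implements Animal when the
        // function is entered; the method itself is looked up at runtime.
        $parrot->moveAround(),
    ];
}

$result = farm(new Cat(), new Parrot());
print_r($result);

// Passing a second Cat satisfies the type hint but fails at runtime.
$failed = false;
try {
    farm(new Cat(), new Cat());
} catch (Error $e) {
    $failed = true;
    echo $e->getMessage(), "\n";
}
```

Nothing in the signature of farm() warns the caller that the second argument must really be a Parrot.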

Now imagine in six months if we decide we wanted to group up this code so that it uses a single array/collection as an argument. This is where things would look like traditional polymorphic code. I mentioned spaghetti above. Let me show you why:

interface Animal { function makeSound(); }
function farm(array $animals) { // note, we can't guarantee what's inside of this array
  foreach($animals as $animal) {
    if($animal instanceof Parrot) { // or maybe a method_exists() call?
      $animal->moveAround(); // SPAGHETTI
    }
    else {
      $animal->makeSound(); // Hope for no fatal errors!
    }
  }
}

Wow, look at what we just did. A harmless piece of code in PHP six months ago completely breaks when you try to refactor it to use a fairly typical design pattern. More importantly, unless I put in even MORE code to do type checking, there’s a chance that the makeSound() line will actually die in a fatal error if, for example, a string is passed in as an element of the argument array! See example without Parrots:

interface Animal { function makeSound(); }
function farm(array $animals) { // note, we can't guarantee what's inside of this array
  foreach($animals as $animal) {
    $animal->makeSound(); // Hope for no fatal errors!?
  }
}

PHP is extremely flexible when it comes to hacking out a page, but when it comes to OOP, it’s about as brittle as you get. Refactoring is painful and error prone, and elegant design patterns like the ones you might see in a message-passing language such as Objective-C, Scala, or Erlang don’t work. Remember that by using functions such as method_exists() and is_object(), I can emulate the desired behavior; however, the extra code means more places for bugs and less time spent making the program do what you want it to do. The point is that the OOP constructs in PHP don’t fully work. As a result, certain very important aspects of OOP don’t translate very well to PHP.

Some people may still cling to the notion that “ultimately, you can still do it, it just requires more code!” But I argue that preventing “more code” is the exact reason why OOP was invented. By writing more boilerplate error checking code, we are wasting time. The issue is exacerbated by the fact that the error checking code isn’t required, unlike, say, if you were throwing exceptions. It isn’t immediately obvious in that last example that you need to do error checking for is_object() on the $animal variable. It’s these types of oversights that really damage PHP as the code base gets larger.
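For comparison, this is roughly what the “more code” version looks like once the checks are written by hand. It is only a sketch: the Dog class and the shape of the returned report are my own choices for the example.

```php
<?php
interface Animal
{
    public function makeSound(): string;
}

class Dog implements Animal
{
    public function makeSound(): string { return 'Woof'; }
}

/**
 * Collects sounds from every element that really is an Animal and
 * reports the ones that are not, instead of dying mid-loop.
 */
function farm(array $animals): array
{
    $sounds = [];
    $rejected = [];

    foreach ($animals as $key => $animal) {
        // The array type hint says nothing about the elements, so each
        // one has to be checked by hand before calling a method on it.
        if ($animal instanceof Animal) {
            $sounds[] = $animal->makeSound();
        } else {
            $rejected[] = $key;
        }
    }

    return ['sounds' => $sounds, 'rejected' => $rejected];
}

$report = farm([new Dog(), 'not an animal', new Dog()]);
print_r($report);
```

Half of the function body is now bookkeeping that exists only because the language will not check the contents of the array for me.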

Conclusion

What I’m realizing is that PHP isn’t meant to scale. Yes, it can take a lot of web traffic, but that’s not what I mean. I’m talking about scaling in the sense of growing team size and code base. The design of the language promotes coding paradigms that ultimately damage the code base. This is because PHP makes it harder to use good OOP practices on legacy code. To illustrate:

  • PHP became popular because it is easy to hack things out, even if that something required doing it the “wrong” way. These problems come back and bite you when the code base grows.
  • PHP can’t support a large development team as effectively because its weak typing allows for sidestepping certain core OOP principles (see above)
  • PHP allows for invisible future-bugs (see above) to be inserted without any immediate cause for alarm
  • As applications get complex and require threading or distributing of processes, PHP fails to keep up (so other languages get used)
  • Because PHP does not check method calls until runtime, calling a method that does not exist causes a runtime FATAL ERROR (unacceptable and very hard to debug!)

All of this makes me rethink the popularity of PHP. There are some new languages, still in their infancy, that pose a threat to PHP’s current dominance. I believe that in the next few years, as today’s systems become “legacy,” today’s newcomers will finally be production ready. At that point, we might see companies adopt the newer languages, which will support more modern programming paradigms. We are seeing this today with Ruby, for example.

Of course, I could be wrong. I once told people that PHP was “C of the web.” It’s possible it’s here to stay forever, despite all of its flaws. And, for the record: I do not believe Python or Ruby will be the language that will overtake PHP, but that’s for another post.

I just want everybody to know that I am a PHP developer, so I speak from experience. We should recognize that technology changes and evolves, and it is important that we constantly update our skills to ensure they don’t become obsolete. I’m just pointing out that perhaps PHP isn’t as timeless as C (or, possibly, Java).

Lastly, I will plug my personal belief that being “religious” about a language because it is “the best” is short sighted. New languages are born, literally, every week. It’s only a matter of time before a language comes along that does what your language does more elegantly, faster, and with less code.

Only time will tell. :)

Q: Hiding JS Files? A: Impossible

In my popular post about hiding your WordPress folder, a reader asked:

Hi Michi, can you help me with this, in the head section i wrote this:

<script src="/style/js/somescripts3.js" type="text/javascript" charset="utf-8">

and when we go to the webpage then right click, it will show:

<script src="content/themes/exampletheme/exampletheme/style/js/somescripts3.js"
  type="text/javascript" charset="utf-8">

can you teach me or show me how to do that, any help highly appreciated, And im so sorry if my english not good.

This question was complicated enough that I thought a new post might make sense.

For clarification, I believe he is asking if it’s possible to put one thing in the source and have the browser see another. This is impossible. Anything that the browser can see, the user can see. There is no way to “show” something different in the source of an HTML file versus what the browser sees (except through obfuscation); however, you can map one path to another behind the scenes. You want to create an .htaccess rule that rewrites your requests internally.

RewriteEngine On
RewriteRule ^path/to/thejsfileyouwanttoshow\.js$ /path/to/real/js/file.js [L]

Let me reiterate that you *cannot* hide the content of the JS file. However, you *can* hide the true folder structure of the web server. If you desire to hide your JS contents, the better solution is a JS minifier.

Alternatively, if your goal is to somehow make it harder for somebody to steal your code and you don’t want to use a JS minifier, you could write the JavaScript tag dynamically using another piece of JavaScript. However, ultimately, that level of weak obfuscation won’t protect you from anything since Firebug will quickly expose what’s really going on.

I hope that answers your question.

The Basics on Using Models and Controllers in PHP

Today I want to talk about passing in objects as arguments in PHP methods. Many PHP developers do not have the patience for this, which is obvious when comparing libraries written for Java with those written for PHP. It is a horridly underused programming style in PHP, and since PHP supports type hinting for method arguments, I thought it would be good to go over this particular style of development.

First of all, let’s start with a few “PHP” ways of doing a mundane task (this will seem extremely familiar):

function login($username, $password, $remember);

function processMail($to, $from, $subject, $body);

function editPost($id, $title, $body, $newTime);

Of course, good developers would make sure these types of examples are part of a class:

$user = new User;
$user->username = $_POST['username'];
$user->password = $_POST['password'];
$user->login();

The examples can continue, and I hope that you good developers use this type of basic OOP development when using PHP. :) But there are examples where the standard “newbie” OOP model seemingly falls apart. The system quickly breaks down when new requirements are added to the system. For example, what if we want to do a “remember me” option in the login? What about logging in as an administrator versus a regular user? Okay, now what if we have different login session lengths depending on the user type? How long do you think that login function will be? Think about how it will accommodate IP bans, login logging, banned users, suspended users, max failed attempt lockouts, etc. The list goes on, and depending on your implementation, things can get very ugly. Your login function might become a huge bloated monster sitting in your User class.

The problem is that what you’re actually doing is mixing a data model [user data] with controller logic [how to login using the user data]. The solution is to separate these two entities into two classes, which is what you would see in most modern MVC frameworks.

Here is the seemingly more complex, but far more elegant solution:

$authenticationController = new UserAuthenticationController;
$user = new User;
$user->username = $_POST['username'];
$user->password = $_POST['password'];
$user->rememberMe = true;
$authenticationController->login($user);

The prototype for the login method would look like this:

function login(User $userObject);

In this example, I am hoping to show you possibilities. First, notice that the arguments for login() are down to one. But the more interesting part of the implementation comes with the proper abstraction between the User data and the authentication process. In my example, I just logged in a regular user. So if I had to map out my class structure, it might look like this:

(Abstract class) AuthenticationController
=> GuestAuthenticationController
  => UserAuthenticationController
    => AdminAuthenticationController

The old way of doing things would look something like this:

function adminLogin($username, $password, $remember);

$user->adminLogin();

But using the Java-esque model, we’d end up with something like this:

$authenticationController = new AdminAuthenticationController;
$user = new User;
$user->username = $_POST['username'];
$user->password = $_POST['password'];
$user->rememberMe = true;
$authenticationController->login($user);

This means the login method is likely broken up into a few pieces inside the AuthenticationController. The GuestAuthenticationController’s login() method would always return false. The UserAuthenticationController would piggyback on AuthenticationController::login(), looking at the User::rememberMe variable and taking it into account. But the AdminAuthenticationController doesn’t allow people’s logins to be remembered for security reasons, so it ignores that variable. And in that crazy case where there is nothing different about the admin login, the method would remain untouched (inherited from the parent), but any other changes (such as session length) would still kick in for the admin user with no additional coding.

All of this is done without modifying the core User class. The user class remains clean for its own further abstraction possibilities. New fields such as profile, name, DOB, etc., could be added with no modifications to the controller.
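To show how the pieces might fit together, here is a minimal sketch of that hierarchy. Everything beyond the class names and the rememberMe property is an assumption made for illustration: the in-memory account list (with a plain-text password, purely for brevity), the sessionSeconds property, the sessionLength() method, and the specific durations.

```php
<?php
// Plain data model: no login logic lives here.
class User
{
    public $username = '';
    public $password = '';
    public $rememberMe = false;
}

abstract class AuthenticationController
{
    // Hypothetical stand-in for a database lookup.
    protected $accounts = ['alice' => 'secret'];

    // Session length granted by the last successful login, in seconds.
    public $sessionSeconds = 0;

    public function login(User $user): bool
    {
        $known = $this->accounts[$user->username] ?? null;
        return $known !== null && hash_equals($known, $user->password);
    }
}

class GuestAuthenticationController extends AuthenticationController
{
    public function login(User $user): bool
    {
        return false; // guests can never log in
    }
}

class UserAuthenticationController extends GuestAuthenticationController
{
    public function login(User $user): bool
    {
        // Skip the guest override and reuse the shared base check.
        if (!AuthenticationController::login($user)) {
            return false;
        }
        $this->sessionSeconds = $this->sessionLength($user);
        return true;
    }

    protected function sessionLength(User $user): int
    {
        return $user->rememberMe ? 1209600 : 3600; // two weeks or one hour
    }
}

class AdminAuthenticationController extends UserAuthenticationController
{
    protected function sessionLength(User $user): int
    {
        return 900; // rememberMe is deliberately ignored for admins
    }
}

$user = new User();
$user->username = 'alice';
$user->password = 'secret';
$user->rememberMe = true;

$guestAuth = new GuestAuthenticationController();
$userAuth = new UserAuthenticationController();
$adminAuth = new AdminAuthenticationController();

var_dump($guestAuth->login($user));
var_dump($userAuth->login($user), $userAuth->sessionSeconds);
var_dump($adminAuth->login($user), $adminAuth->sessionSeconds);
```

The admin controller overrides one small method and inherits the rest, and the User class never had to change.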

Yes, my version requires the most lines of code, but it is also the easiest to maintain and understand. Why? Because it isn’t cluttering up the User data class with methods that have nothing to do with the user. If you’ve ever written a generic “user” class, you know how large and cluttered such a class can become when you start piling in the methods for login, logout, preferences, session management, lookups, and other needs. I haven’t even talked about the fact that virtually all “operations” that involve a user also involve the database, which adds its own headaches. Being able to keep the hard work in other more data-manipulation-oriented classes is for your own good.

If down the road, it is determined that logging in should also require pre-approved IP addresses, what will your code need? Will your login method need an IP address passed in too? Or will the IP address be generated inside the login method? In my example, I would update the core Authentication class and be done. What happens when a new requirement is added that requires that the login also passes a CAPTCHA test? What about when logins need to be logged in a flat file? What happens when we change logins to use the email field instead of the user field?

Today, I only talked about logins. The ideas I propose here are not new; they are simply good ideas that get ignored by web developers. Remember, you’re application developers that happen to work in a browser. Don’t think that regular application design principles don’t apply to you: they do, more than ever.

Neat Idea: Creating Alphanumeric IDs

UPDATE: For those of you looking for a great way to generate a highly unique ID that is shorter than what you might get using a hex number, try this (it will generate a ~17 character ID):

list($hex, $dec) = explode('.', uniqid('', true));
$id = (base_convert($hex, 16, 36) . base_convert($dec, 10, 36));

Ever needed to create an ID hash that looks something like f39a2xm91? You might not have, but some day you’ll want to. The easy way out is to use the native md5() function, but that creates a long 32 character hash which may be a total waste of (database) space. These types of IDs are often used to mask integer IDs so that your users can’t just type in user_id=10000, user_id=10001, user_id=10002, and so forth to look at your records. Some might even call it security through obscurity. Well, let’s be clear: this sort of masking does not add security, but it does make “browsing” behavior more difficult.

Either way, if you desire to move away from the classic integer format IDs, I have a different solution for you:

base_convert($someId, 10, 36);

This will convert the number 10,001 into 7pt, 10,002 into 7pu, and 100,000,000 into 1njchs. As you can see, you can store a heck of a lot of numeric information in a very tiny amount of (character) space. I am not saying this will save you database storage space, but it will make your URLs shorter.
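As a quick sketch, the conversion can be wrapped in a pair of helpers so the short form goes in the URL and the integer comes back out on the way in. The function names encodeId() and decodeId() are my own.

```php
<?php
// Encode an integer record ID as a short base 36 string.
function encodeId(int $id): string
{
    return base_convert((string) $id, 10, 36);
}

// Decode the short string back to the integer ID.
function decodeId(string $code): int
{
    return (int) base_convert($code, 36, 10);
}

foreach ([10001, 10002, 100000000] as $id) {
    $code = encodeId($id);
    printf("%d -> %s -> %d\n", $id, $code, decodeId($code));
}
```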

One of the main benefits of this is that you can store more data in a smaller human-readable space, thereby allowing you to create shorter unique IDs. For example, in our logging system at work, I use this method to generate issue IDs that end users can send to us when they have a problem. This issue ID is based on the last eight digits of the current time with microseconds, concatenated (using “.”) with a random seven digit number in front. I then base-36 encode the resulting number (after stripping out the decimal point).

Note: my solution is specific to the problem I was facing. It’s not necessarily a foolproof way to generate unique values, but it’s what you would call “good enough”. Do not use it if your application hinges on absolutely unique values.
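The issue ID scheme could be sketched roughly like this. It is a reconstruction from the description, not the production code: the makeIssueId() name, the exact ordering of the two parts, and the use of random_int() are assumptions.

```php
<?php
function makeIssueId(): string
{
    // Current time in whole microseconds as a digit string.
    $micros = sprintf('%.0f', microtime(true) * 1000000);
    $timePart = substr($micros, -8);

    // Seven random digits; the lower bound keeps the first digit non-zero.
    $randomPart = (string) random_int(1000000, 9999999);

    // 15 decimal digits stay below 2^53, so base_convert() is exact here.
    return base_convert($randomPart . $timePart, 10, 36);
}

$issueId = makeIssueId();
echo $issueId, "\n";
```

With these inputs the result is nine or ten characters long, which is short enough to read over the phone.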

A warning about base_convert: it breaks down on large numbers in PHP, so be careful (we’re talking very large numbers). This means pasting together the current timestamp, the user ID, the session ID, and a fourteen digit number into one 50 character long number will probably result in some precision errors (not a huge deal for most implementations, but be warned). From the PHP manual:

This is related to the fact that it is impossible to exactly express some fractions in decimal notation with a finite number of digits. For instance, 1/3 in decimal form becomes 0.3333333. . ..

So never trust floating number results to the last digit and never compare floating point numbers for equality.
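A small sketch makes the precision problem visible: convert a number to base 36 and back, then compare it to the original. The roundTrip() helper is hypothetical, and the two test values are simply a 15 digit and a 20 digit string of nines.

```php
<?php
// Round-trip a decimal string through base 36 and back.
function roundTrip(string $decimal): string
{
    return base_convert(base_convert($decimal, 10, 36), 36, 10);
}

$small = '999999999999999';      // 15 digits: small enough to stay exact
$large = '99999999999999999999'; // 20 digits: handled internally as a float

var_dump(roundTrip($small) === $small); // bool(true)
var_dump(roundTrip($large) === $large); // bool(false)
```

The second comparison fails because the 20 digit value cannot be represented exactly, so the trailing digits that come back are not the ones that went in.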

The “base” refers to the numbering system used to represent the number. In a base 11 system, the digits run from 0 to 9, then the letter A, and then the count rolls over to 10. In other words: 1, 2, 3, 4, 5, 6, 7, 8, 9, A, 10, 11, 12… In a base 16 system, you would go all the way to F before rolling over. So the larger the base, the more “compressed” a number can become.

I used a base 36 scheme (0 – 9, A – Z), but you can use smaller bases to come up with longer conversions. For example, a base 21 conversion (0 – 9, A – K) will convert 10,001 into 11e5, 10,002 into 11e6, and 100,000,000 into 13a3k7g. So in short, if you have a database where your record IDs run past 7 or 8 digits, maybe you can think about base encoding them into shorter IDs.
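If you want to see how a given base behaves, the easiest check is to ask base_convert directly. This sketch counts across the rollover point in base 11 and then prints the same number in three different bases.

```php
<?php
// Count from 8 to 12 in base 11: the digit after 9 is "a", then it rolls over.
$base11 = array_map(function (int $n): string {
    return base_convert((string) $n, 10, 11);
}, range(8, 12));
echo implode(', ', $base11), "\n"; // 8, 9, a, 10, 11

// The same number gets shorter as the base grows.
foreach ([16, 21, 36] as $base) {
    printf("base %2d: %s\n", $base, base_convert('100000000', 10, $base));
}
```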

Just a neat idea I wanted to share.

Debugging Tips for Database Abstraction

Today I want to talk about database script debugging in large systems. The main problem is that in large applications, it becomes difficult to find the source of rogue queries that, for example, broke in a recent system update. This may not readily apply to most of you, but bear with me: some day it will.

Pretend for a moment you have a database architecture where you have 2 masters (dual replication) and 2 read-only slaves. Now pretend that you have a large application with 100 different pages/scripts. You have 5 web servers with mirror copies of the application. This would be a fairly typical setup for a small, but growing company.

One day, you come into work and find out that you had a bad transaction lock that caused your system to hang all weekend. So you look at the process list and you know what query is causing the problem (because it’s still stuck). The problem is that it looks suspiciously like the queries you’d find on virtually every page in your application. How do you fix this problem? A different (but related) problem is when an update initially executed on one master database server replicated to a slave and got stuck on the slave but executed fine elsewhere. What happened? Which master server got the initial query? This sort of debugging is very difficult to track down without more information such as where the query was initially sent and from what page it originated.

The primary challenge is figuring out which query came from what page in your application. The solution is to add logging straight into your queries. The implementation looks something like this:

//Get the current page or script file (REQUEST_URI is not set on the command line)
$source = !empty($_SERVER['REQUEST_URI']) ? $_SERVER['REQUEST_URI'] : $_SERVER['SCRIPT_FILENAME'];
//Replace out any comment tags and add in the database being connected to
$metaData = str_replace(array('/*', '*/'), array('/ *', '* /'), $source) . " ($databaseHost)";
//Escape the metadata so the URI can't be used to inject data
$metaData = mysql_real_escape_string($metaData);
//Execute the query (the mysql_* functions were removed in PHP 7; mysqli offers equivalents)
$result = mysql_query("/* $metaData */ " . $query, $connection);

This solution inserts a comment into your query that gives you useful information that can be seen when looking at the raw query. MySQL supports C-style comment blocks (the /* */), which are ignored by the parsing engine. This means you can pass data to the engine which can be useful for debugging. These comments are also replicated down to the slaves, which can be useful when you find a slave having problems with a query that came from a master server. For those of you unaware, the “URI” here refers to the path and query string of the URL that was requested to access the page.

But make sure that you correctly sanitize the URI so that somebody can’t arbitrarily end your comment block (with a */) and inject their own nonsense into your query. Also, considering issues like multi-byte character attacks, I don’t even want to take the risk of not further escaping the data with a call to mysql_real_escape_string.
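The sanitizing step can be isolated into a small function that is easy to test without a database connection. This is a sketch under my own naming (tagQuery() is hypothetical), and it leaves out the extra database-level escaping, which needs a live connection.

```php
<?php
/**
 * Hypothetical helper: prefix a query with a comment naming its origin.
 */
function tagQuery(string $query, string $source, string $databaseHost): string
{
    // Break up comment delimiters so the metadata cannot close the comment.
    $metaData = str_replace(['/*', '*/'], ['/ *', '* /'], "$source ($databaseHost)");

    // Collapse whitespace so the tag stays on one line in the process list.
    $metaData = preg_replace('/\s+/', ' ', $metaData);

    return "/* $metaData */ " . $query;
}

// A hostile URI that tries to close the comment and smuggle in a statement.
$tagged = tagQuery(
    'SELECT id FROM posts WHERE id = 1',
    '/blog/view.php?id=1*/ DROP TABLE posts; /*',
    'db-master-1'
);
echo $tagged, "\n";
```

In the output, the only closing delimiter left is the one the function wrote itself, so the injected text stays inside the comment.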

The solution we use at my work logs the web server IP, database server IP, and script path/URI. Other potential ideas are local timestamps, version information, user IDs, and session IDs.

In conclusion, this solution will help you identify the source (and sometimes the destination) of queries that are causing problems. This has been used in our production environment at work often when trying to determine what pages are producing extremely slow queries. This solution should work with any database, although my example is written for MySQL.

Happy debugging!

PHP Best Practice: Don’t use INC extensions

I have been bad about updating, and this goes back to an old habit that probably has to do with human nature: as time between updates increases, there’s a desire to write a “big” update, which is increasingly difficult as news-worthy events happen and are ignored. There are so many things I could touch on, such as the iPhone SDK update, news on IE8 passing the Acid2 test, my predictions from a year ago that were spot on (until about a week ago when all stocks tanked), and Sun buying MySQL. But I won’t. Perhaps next time. So this post is small, but serves as a feeler post to help me get back into the routine. The truth is that I have several programming post drafts sitting on my machine that could have been posted a long time ago if I had given them a final read-over. Those things take a lot longer than they look to the casual observer.

Today’s post is a best practices post. The tip is simple: When creating a naming convention, never rely on the .inc extension. The .inc is used in some shops to denote files that serve as libraries. This is a terrible practice for a number of reasons.

First, it means deploying your library ANYWHERE requires adding the extension to your server’s configuration so that it knows to execute these files as PHP. This isn’t a deal breaker in most cases, but beware that if you use shared hosting environments, this sort of thing can be annoying and stall development.

The second, far more practical reason is security. When these library files are moved to a new server which has yet to be configured, they are wide open for public viewing. Because the server doesn’t know they are PHP files, they are served up as text files, essentially exposing your code base for the world to see. I’ve seen this issue pop up in production environments where a new web server was brought online without being fully configured, causing pages to become exposed. This is the sort of business that helps cause source code leaks (remember the Facebook code leak late last year?).

Of course, this points to the greater issue that library files shouldn’t be web accessible, but I have also seen this paradigm used in common CMS applications where you have a .php file include a .inc file that contains the bulk of the page logic. Here, again, you would be exposing highly sensitive application logic to the world.

If you really want to denote files differently, I prefer to use file prefixes. As in, classes might get a prefix like “class.[rest-of-filename]”. Or perhaps “function.[rest-of-filename]”. There’s even “include.[rest-of-filename]”. The point is, a prefix can’t kill you because the files retain the .php extension. :) Happy coding!
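As one possible way to put the prefix convention to work, here is a sketch of an autoloader that looks for files named class.[ClassName].php. The temporary directory and the Mailer class exist only to make the example self-contained.

```php
<?php
// Sketch: autoload classes from files named "class.<ClassName>.php".
// The directory and class below are hypothetical.
$libDir = sys_get_temp_dir() . '/lib_' . uniqid();
mkdir($libDir);
file_put_contents(
    $libDir . '/class.Mailer.php',
    '<?php class Mailer { public function send(): string { return "sent"; } }'
);

spl_autoload_register(function (string $class) use ($libDir): void {
    // basename() blocks path tricks such as "../" in the class name.
    $file = $libDir . '/class.' . basename($class) . '.php';
    if (is_file($file)) {
        require $file;
    }
});

$mailer = new Mailer();
echo $mailer->send(), "\n";
```

Because every file still ends in .php, a misconfigured server would execute it rather than print its source.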