Wireshark and Stealing Data

I saw a pretty interesting video series today about sniffing network data. It really helps illustrate the importance of using HTTPS for any remotely sensitive transaction. I’ve created a playlist of the series.

Did Digg Miss the Boat Again?

So Digg released a new layout the other day, and I feel like another boat was missed. They made a big splash about this, and it was covered in numerous places (for example, TC and Mashable). The new layout is noticeably faster to load, which is a huge plus. They tout that this new version emphasizes a “My news” approach to Digg, where the site is personalized based on what you dugg in the past. In practice, unless you’re a power user, your news ends up flooded with stories from one or two blogs you frequent.

I feel they are addressing their threats in the wrong order. Their website wasn’t perfect, but it wasn’t their weakness either. Consider:

  1. The Like button is dominating right now. Virtually every blog has it.
  2. Facebook is a HUGE news traffic driver. Way bigger than Digg.
  3. A lack of personalization was never Digg’s problem. Plenty of news sites on the web are popular with no personalization functionality.

First, Digg needs to figure out a way to make article submission “fair” for the little guy (read: the long tail of users). They should have fixed the fundamentally flawed “democracy” where certain users effectively had 1,000 votes. Personalization is one approach to the problem, but it ultimately doesn’t stop popular individuals from heavily influencing all of their followers’ feeds (and thus accumulating votes). The main complaint was that only power users could effectively get articles to the front page. Perhaps the algorithm should better incorporate Digg-to-viewer ratios, or give more weight to Diggs from non-followers. The point is, until this is fixed, Digg will never fully engage its non-power users, because they have no incentive to participate. And those users are the vast majority of its user base.
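To make the ranking idea concrete, here is a minimal PHP sketch. It is not Digg’s actual algorithm; the function name story_score() and the weights (a follower’s Digg counting a quarter as much as a stranger’s) are assumptions picked purely for illustration:

```php
<?php
// Hypothetical ranking sketch, NOT Digg's real algorithm.
// The default weights are illustrative assumptions.
function story_score(int $followerDiggs, int $nonFollowerDiggs, int $views,
                     float $followerWeight = 0.25, float $nonFollowerWeight = 1.0): float
{
    if ($views <= 0) {
        return 0.0; // no exposure yet, so nothing to rank on
    }
    // Diggs from the submitter's own followers count for less than Diggs from strangers.
    $weighted = $followerWeight * $followerDiggs + $nonFollowerWeight * $nonFollowerDiggs;
    // Dividing by views turns raw Digg counts into a Digg-to-viewer ratio.
    return $weighted / $views;
}

$powerUser = story_score(800, 50, 10000); // many follower Diggs, huge audience
$newcomer  = story_score(0, 40, 400);     // few Diggs, all from strangers
printf("power user: %.3f, newcomer: %.3f\n", $powerUser, $newcomer);
```

With these made-up numbers, the newcomer’s story scores higher than the power user’s, which is the kind of shift argued for above. Real weights would have to be tuned against real voting data.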

Second, Digg needs to increase reader engagement. They should take a hard look at their Digg button and see how it compares against the infamous Like button. It needs to be as brainless as the Like button. Clicking “Digg This” should instantly submit the article to Digg. No windows; no dialogs. This is how the Like button works, and its pervasiveness shows how simplicity can trump everything else. Of course, doing this might mean changing how article submissions work on Digg. No problem: let power users check a setting where they ARE prompted for a custom title or description. The point is, the process needs as few places as possible where a user can change their mind about participating in the Digging process.

The website, I think, was never the problem.

PHP Tip: Always Put Constants on the Left in Boolean Comparisons

This was a standard I enforced at my last company:

Whenever you are doing a boolean check (such as an IF or WHILE), always place constants on the left side of the comparison.

The following is BAD:

// BAD
if($user == LOGGED_IN) {
    // ...
}

The following is good:

// GOOD
if(LOGGED_IN == $user) {
    // ...
}

Why is this such a big deal? Imagine the typo where you forget the second equals sign:

// Oops! This assigns instead of comparing, so it is true whenever LOGGED_IN is truthy!
if($user = LOGGED_IN) {
    // ...
}
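Here is a minimal runnable sketch of that typo, assuming LOGGED_IN is a string constant and $user holds a username (both are stand-ins for illustration). Note that the typo doesn’t just pass the check; it also silently overwrites $user:

```php
<?php
// Illustrative stand-ins for the snippet above.
const LOGGED_IN = 'logged_in';
$user = 'guest';

// Intended check: a comparison, which is false for a guest.
$compared = ($user == LOGGED_IN);

// The typo: an assignment. The condition is the assigned value ('logged_in',
// which is truthy), and $user itself has been overwritten.
$typo = false;
if ($user = LOGGED_IN) {
    $typo = true;
}

var_dump($compared, $typo, $user);
```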

This sort of bug is fairly common. C# went as far as requiring every condition to be a boolean expression, which rules out most accidental assignments. Well, since PHP can’t do that, this is the next best thing. Notice how this convention will save your butt:

// Parse error (fatal). Bug caught immediately.
if(LOGGED_IN = $user) {
    // ...
}

Think about it. :)

Is Your Blog Not Receiving Pingbacks? I Fixed Mine.

I recently noticed that my blog was no longer registering pingbacks (the automatic in-comment notification that occurs when somebody else blogs about your post). I like these because they help me understand which of my articles are gaining traction.

My symptoms

  • My other blogs hosted on the same server seem to be pinging fine; however, those have far fewer posts and plugins
  • I am able to send pingbacks, apparently
  • But pingbacks TO my content were dropped (even when I was self-pinging)

The fix

I figured the issue was somehow related to my recent WordPress upgrades. After scouring the web, I found that it was caused by a poorly chosen timeout setting in WordPress.

  1. Open wp-includes/cron.php in your blog folder
  2. Go to the line that starts with wp_remote_post( in the spawn_cron() function
  3. Change 'timeout' => 0.01 to 'timeout' => 1 (or any other, far more reasonable value)

This will fix blogs that are plagued by this bug.

Autocast Variables Whitepaper: What I Want to See in PHP 6

Edit: I’ve moved the autocast white paper to its own page. Let me know what you think!

How would you answer the following question:

Imagine you are now in charge of PHP. What do you cut/add/change in PHP 6.0?

I ask this during interviews, and as you can imagine, I get all sorts of answers. The best answers pull in features from other languages, particularly OOP concepts. These answers aren’t bad, but they almost always try to “fix” PHP’s broken OOP while also crippling the strengths of a loosely typed language. I have my own unique answer to this question, and I wanted to share it.

Maybe somebody else has already thought of this, but if not, I’m going to coin it right here:

Autocast variables. An autocast variable is like a container for data: everything assigned to an autocast variable is always converted to that variable’s current type. For example, if you assign a string to an integer variable, the variable will hold the integer representation of the string (via implicit and immediate typecasting).

The idea is a hybrid of limited type safety (where only some variables are type safe) and operator overloading of the equals sign on native datatypes. To help explain the idea: it would act almost like somebody following your cursor around and typing (int), (string), etc. before every variable assignment in your code.

The goal is to allow a developer to be 100% certain, when desired, that they are working with a specific data type.

Introduction to Autocasting

To declare a variable as an autocast, simply place a colon after the dollar sign in the variable name. From then on, everything assigned to that variable is automatically typecast to the variable’s datatype. For example:

// This variable is now a container for integers
$:orderTotal = 0;
// assign a float value
$:orderTotal = 1.01;
// outputs 1; 1.01 was typecast to an integer
echo $:orderTotal;

NOTE: Why the new syntax? I toyed with the idea of an autocast keyword, but the paradigm broke down once you started assigning objects. The problem is that objects are effectively passed by reference, which meant a programmer could change the datatype of an autocast variable by altering its reference. The other problem was that without a visual marker, things would get very confusing, since one could never tell whether they were working with an autocast until runtime. Lastly, why the dollar-colon? I would have preferred a plain colon, but most of the good single-character syntax would conflict with existing PHP syntax (# is a comment, : is used in ternary operators, % is modulus, ^ is a bitwise operator, etc.). A dollar sign is universally understood as a variable, so I thought the next best thing was to alter the variable in a way that today’s PHP recognizes as invalid (so introducing the syntax would not conflict with legacy code).
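The $: syntax doesn’t exist in today’s PHP, but the casting behavior can be roughly approximated with a wrapper object. The AutoCast class below is a hypothetical sketch limited to scalars: the first value fixes the type (via gettype()), and every later value is cast with settype():

```php
<?php
// Hypothetical AutoCast container approximating the proposed $:var syntax.
// Scalars only: the first value fixes the type; later values are cast to it.
final class AutoCast
{
    /** @var string Type name as reported by gettype(), e.g. "integer" or "string". */
    private $type;
    /** @var mixed The current (already cast) value. */
    private $value;

    public function __construct($initial)
    {
        $this->type = gettype($initial);
        $this->value = $initial;
    }

    public function set($value): void
    {
        settype($value, $this->type); // implicit, immediate typecast
        $this->value = $value;
    }

    public function get()
    {
        return $this->value;
    }
}

$orderTotal = new AutoCast(0);   // roughly: $:orderTotal = 0;
$orderTotal->set(1.01);          // roughly: $:orderTotal = 1.01;
echo $orderTotal->get(), "\n";   // 1.01 was cast to an integer
```

The explicit set() call is exactly the ceremony the proposed syntax would remove. And because the type lives inside an object rather than on the variable itself, a plain $orderTotal = 5; still replaces the whole container, which is the same kind of loophole the NOTE describes for a keyword-based approach.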

The concept is simple, but it gets more complicated as you introduce objects, magic methods, and method signatures into the equation. Don’t worry, I’ve thought about all of those scenarios. A summary of the key benefits:

  • New coding paradigms allow for simpler interaction between different data types (see first Practical Example)
  • Refactoring can be done in a way never before possible (see second Practical Example)
  • Code is now more “reliable” because unintended data types aren’t used (such as during boolean checks)
  • Many fatal errors can now be avoided
  • Potential use in the realm of dependency injection
  • Possibilities for true function overloading since expected datatypes are known (although, this is possible today, to be honest)

Edit: read the rest here.

A PHP/MySQL Bug Most People Have But Don’t Realize

I’ve seen this over and over in my career and thought I should save others from the horror. Part of me feels like I blogged about this years ago, but I couldn’t find a post referencing it (EDIT: found it!). The bug is simple:

  1. Create a database table with a decimal column, such as order_total
  2. Write some code that retrieves the row
  3. Do an implicit boolean check on order_total to see if it has a value

Here’s some actual code:

$results = mysql_query("SELECT * FROM orders");
while($row = mysql_fetch_assoc($results)) {
    if($row['order_total']) {
        echo "Order total is clearly not zero!";
    }
    else {
        log_bad_order($row['id']);
    }
}

This code has a serious bug in it. The problem is the line that checks whether order_total has a value: mysql_fetch_assoc() returns column values as strings, so a zero total arrives as “0.00”, not as the number 0. Pop quiz:

What is the value of the following:
(bool) “0.00”

The answer is TRUE! The number 0.00 is zero, but the string “0.00” is not the same thing! When converting a string to a boolean, PHP treats only the empty string and the single character “0” as false; any other string, including “0.00”, is true. A more obvious way to ask the same question:

What is the value of the following:
(bool) “0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000”

Or what about:

What is the value of the following: (bool) “0.”

The point is that as soon as you go beyond a single zero, PHP just assumes the rest is real data and will not discard it. Thus:

if(0.00) {
    echo "THIS NEVER EXECUTES";
}
if("0.00") {
    echo "THIS ALWAYS EVALUATES TO TRUE";
}

So, going back to the original problem, the way to solve it is to explicitly typecast the variable, use a “greater than” check, or (as below) both:

$results = mysql_query("SELECT * FROM orders");
while($row = mysql_fetch_assoc($results)) {
    if((float) $row['order_total'] > 0) {
        echo "Order total is clearly not zero!";
    }
    else {
        log_bad_order($row['id']);
    }
}

If you don’t do either of these things, I’d suggest you go and double check some of your code.
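A quick way to see both checks side by side is to run a few sample strings through them. The values below are made up, standing in for what mysql_fetch_assoc() would return for order_total:

```php
<?php
// Sample order_total values as strings, the way mysql_fetch_assoc() returns them.
$samples = ['0', '0.00', '0.', '0.0000', '', '12.50'];

$report = [];
foreach ($samples as $raw) {
    $report[$raw] = [
        'implicit' => (bool) $raw,        // what if($row['order_total']) sees
        'positive' => (float) $raw > 0,   // the fixed check
    ];
}
var_dump($report);
```

Only “0” and the empty string are falsy under the implicit check, while the cast-and-compare check treats every zero spelling the same way.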

Representing hierarchical data in MySQL

I’ve always wondered if there was a better way to manage nested data structures (such as product categories) in MySQL. Today I stumbled across a solution called the Nested Set Model.

The only change I made to the solution was to rename what they call “category_id” to “sort_id” and add a primary key called “id” to the table. This way, I have an immutable ID I can use in the application (such as for URL deep linking). For example:

CREATE TABLE nested_category (
  id INT AUTO_INCREMENT PRIMARY KEY,
  sort_id INT NOT NULL,
  name VARCHAR(20) NOT NULL,
  left_sort INT NOT NULL,
  right_sort INT NOT NULL
);
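The core of the Nested Set Model is how left_sort and right_sort get assigned: a depth-first walk numbers each node on the way down and again on the way back up, so every descendant’s numbers fall strictly between its ancestor’s. The PHP sketch below uses a made-up category tree and hypothetical helper names; descendants() mirrors the subtree query (node.left_sort BETWEEN parent.left_sort AND parent.right_sort), done in memory and minus the parent itself:

```php
<?php
// Made-up sample tree: each key is a category, each value its children.
$tree = [
    'Electronics' => [
        'Televisions' => ['LCD' => [], 'Plasma' => []],
        'Portable'    => ['MP3 Players' => []],
    ],
];

// Depth-first walk: left_sort is assigned on the way down and right_sort on
// the way back up, so every descendant's numbers fall between them.
function assign_nested_set(array $nodes, int &$counter, array &$rows): void
{
    foreach ($nodes as $name => $children) {
        $left = $counter++;
        assign_nested_set($children, $counter, $rows);
        $rows[$name] = ['left_sort' => $left, 'right_sort' => $counter++];
    }
}

$counter = 1;
$rows = [];
assign_nested_set($tree, $counter, $rows);

// In-memory equivalent of the BETWEEN subtree query, excluding the parent.
function descendants(array $rows, string $parent): array
{
    $p = $rows[$parent];
    $out = [];
    foreach ($rows as $name => $r) {
        if ($r['left_sort'] > $p['left_sort'] && $r['right_sort'] < $p['right_sort']) {
            $out[] = $name;
        }
    }
    return $out;
}

print_r($rows);
print_r(descendants($rows, 'Televisions'));
```

The payoff is that fetching a whole subtree becomes a single range comparison instead of a recursive series of parent lookups; the cost is that inserting a node means renumbering everything to its right.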

Incentives and the Related Dangers

Incentives are just as dangerous as they are powerful. I have a running theory that most incentives, when executed poorly, can achieve the exact opposite of their intended goal.

Let’s start with an example to illustrate. You’re in charge of a small company that picks up garbage after events like street fairs and parades. However, you just got an angry call from your customer (the city) saying that your company has been doing an increasingly poor job, and they are threatening to cut your contract.

You can fire and hire people, but ultimately, you or some new manager will need to fix the culture of the team. Aside from the obvious choice of talking to your staff about goals and values, let’s assume that incentivizing performance ends up being the option you go with. It’s time to play with fire.

What are some obvious ways to incentivize good cleaning efforts? There are many, but I’ll focus on a really obvious one for this post: Tie bonuses to volume/weight of trash picked up.

Is this a bad incentive? Not necessarily. But if executed poorly, it can be disastrous.

Which is more important to pick up: the pile of 50 napkins or the four empty soda bottles? Under this scheme, people would be incentivized to ignore napkins, cigarettes, and plastic bags while chasing after bottles (bonus points for liquid content) and discarded food. In fact, once employees catch on, they might even start picking up rocks and dirt instead of actually cleaning, in effect making the situation worse.

The situation above is universal across all industries. In software, the oft-cited “Dilbert” situation is when performance gets tied to lines of code written. The point is, any system that attempts to incentivize a certain type of behavior can cause employees to focus on the wrong thing. If you tie bonuses to closing bug tickets, your entire team will focus on that metric like a laser. This will look good at first, until you realize that everybody is spam-fixing the “misspelled text” bug tickets and nobody is bothering with the REAL problems.

Is PHP here to stay?

As a LAMP developer, I am starting to question the long-term viability of PHP. PHP was born during an era when knowing HTML was a valid and valuable resume bullet. Because of this, most of PHP’s “advanced” features, namely its OOP functionality, only arrived with PHP 4 and 5, and weakly at that. Additionally, newer languages have since become popular that expose PHP’s weaknesses. Don’t get me wrong, I am very supportive of PHP. I just believe that it’s important for people to understand both the strengths and weaknesses of the tools they use. There are two main points I want to cover:

  1. PHP thread support is weak
  2. PHP OOP = Broken

The second point is rather technical, but it closely relates to another strength and weakness of PHP: it is loosely typed. More on that later.

Thread Support is Weak

True threading support in PHP does not exist. The closest thing is the pcntl_fork() function, which copies the current process rather than creating a thread. This means asynchronous processing within a single process is not supported. Asynchronous processing is useful in event-driven architectures (common in JavaScript) or when doing blocking operations such as network calls.

Because the forked process is a clone of the original, it shares all of the original’s resources, including database connections and file handles. This means that the forked process must know whether or not it is the child, and must be careful not to modify or close these resources. This encourages spaghetti code that contains large logic forks (“if I am not a clone, else…”). Because of this, forking is messy and error prone. It gets further complicated when PHP is executed by Apache in a web environment. In fact, the PHP manual advises against forking within web servers:

Process Control should not be enabled within a web server environment and unexpected results may happen if any Process Control functions are used within a web server environment.

Not to mention that the function is incredibly C-like, in that it is very “raw” (unlike other native PHP functions and classes). This raises the barrier to entry significantly, which ultimately means most shops ignore the feature.
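The child/parent branching described above looks roughly like this. run_in_child() is a hypothetical helper; it assumes the CLI with the pcntl extension enabled (it returns null otherwise), and the usleep() call stands in for real blocking work such as a network request:

```php
<?php
// Hypothetical helper sketching the fork-and-branch pattern (CLI + ext-pcntl).
function run_in_child(callable $work): ?int
{
    if (!function_exists('pcntl_fork')) {
        return null; // pcntl is not available in this build
    }
    $pid = pcntl_fork();
    if ($pid === -1) {
        throw new RuntimeException('fork failed');
    }
    if ($pid === 0) {
        // Child: a full copy of the parent, including any open DB/file handles.
        // Caution: exit() still runs destructors and shutdown functions, which is
        // exactly how a child can accidentally close a handle the parent needs.
        exit($work() ? 0 : 1);
    }
    // Parent: block until the child finishes, then read its exit code.
    pcntl_waitpid($pid, $status);
    return pcntl_wexitstatus($status);
}

$status = run_in_child(function (): bool {
    usleep(100000); // stand-in for a blocking call to a remote server
    return true;
});
var_dump($status); // 0 if the child succeeded, NULL without pcntl
```

Even in this tiny example, the code has to check which copy it is running in, and the parent has to wait on and reap the child by hand; a pool of workers multiplies that bookkeeping.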

Why is all of this important? Well, at most companies, one language is selected for all in-house development, because cross-training and hiring are simpler if everybody speaks the same language. There are a few common tasks that are unnecessarily difficult to do in PHP:

  • Asynchronous work: handing work, such as connecting to a remote server, off to a child process and waiting for a response
  • Managing thread pools: this requires significant “by-hand” management of every process the parent spawns via pcntl_fork()

The threading issue is only a pain point for processes that need to be parallelized. It is a pain most big shops either live with or solve by introducing other languages.

PHP OOP = Broken

Because of the loosely typed nature of PHP, true, well-formed object oriented programming is broken. I know that for many PHP programmers, “Object Oriented” means putting together classes and reusing code as objects. However, that is truly, sincerely, only a portion of the point of OOP. Some of the most powerful aspects of OOP are lost in PHP’s implementation of the concept. Don’t get me wrong: these decisions were probably the right fit for the niche PHP was filling, but I don’t believe most PHP programmers are fully aware of what they are missing.

While the language, thankfully, has interfaces and abstract classes, they are woefully underused. This is, in part, due to the developer community being largely self-taught. This creates a misconception about the nature of OOP, which ultimately leads to the devaluation of OOP’s most important feature: interfaces.

I can go into why they are so important in another article, but the point is: without interfaces, true polymorphic code is impossible. Or, rather, extremely susceptible to spaghetti code and fatal errors.

In other languages, such as Java, code might look like this:

interface Animal { void makeSound(); }

void farm(Animal cat, Animal dog, Animal parrot) {
  cat.makeSound();
  dog.makeSound();
  // Note: the Parrot class *DOES* have a method called moveAround()
  parrot.moveAround(); // COMPILE ERROR: Animal has no moveAround()
}

The interface in this example defines a uniform way to access a class through a standardized API (thus the name, application programming interface). In a strongly typed language where all variables must have a type, the cat parameter is declared as an Animal. This both enforces and allows the method call makeSound(). If cat has a meow() method and dog has a woof() method, neither can be called here without a compiler error. This is because, in this function, every parameter (including parrot) is declared as an Animal rather than as a Dog, Cat, or Parrot. As such, only Animal methods work here.

More importantly, because the compiler does this type checking, any invalid call, such as the last one, produces an error and never compiles. Even though the Parrot class has a moveAround() method, it cannot be called in the code above. This is an extremely important aspect of OOP since, as the author of the Animal interface, I want to make it very specific how Animals should be treated (you can only makeSound!). If a programmer tries to do something to an Animal that I haven’t defined, they get an error. If they wanted to make that last line work, they would need an explicit object cast:

void farm(Animal cat, Animal dog, Animal parrot) {
  ...
  ((Parrot) parrot).moveAround();
}

Or by changing the function definition:

void farm(Animal cat, Animal dog, Parrot parrot) {
  ...
  parrot.moveAround();
}

But note that in either case, the programmer had to make an explicit choice to stop using Animal’s interface. Yes, a Parrot is still an Animal, but the code no longer has to treat it as one. This, in short, helps prevent spaghetti code because it forces developers to think about whether or not they want to deviate from a particular interface. Realistically, if presented with these alternatives, a Java programmer would probably use other abstraction techniques (e.g., dependency injection) to avoid needing either one. However, this example was necessary to illustrate how things are done in PHP.

So how would this look in PHP? Why isn’t this the same there? Well, take a look at the following code that, unlike the Java example, works perfectly fine and raises no red flags.

interface Animal { function makeSound(); }
function farm(Animal $cat, Animal $dog, Animal $parrot) {
  $cat->makeSound();
  $dog->makeSound();
  $parrot->moveAround(); // WORKS FINE
}

This code works great. We have three arguments all forced to use the Animal interface. Great. As a casual observer, there is really, truly, nothing wrong with this code. It’s a little strange, but if it’s commonly known that Birds can moveAround(), there is no problem. In fact, in most PHP shops, I will bet money that type hinting is NOT used. This will further illustrate how bad the spaghetti is about to get (read on).

Now imagine that six months from now we decide to refactor this code so that it takes a single array/collection as its argument. This is where things would start to look like traditional polymorphic code. I mentioned spaghetti above. Let me show you why:

interface Animal { function makeSound(); }
function farm(array $animals) { // note, we can't guarantee what's inside of this array
  foreach($animals as $animal) {
    if($animal instanceof Parrot) { // or maybe a method_exists() call?
      $animal->moveAround(); // SPAGHETTI
    }
    else {
      $animal->makeSound(); // Hope for no fatal errors!
    }
  }
}

Wow, look at what we just did. A piece of PHP code that was harmless six months ago completely breaks down when you try to refactor it to use a fairly typical design pattern. More importantly, unless I add even MORE type-checking code, there’s a chance that the makeSound() line will die with a fatal error if, for example, a string is passed in as an element of the argument array! Here is the example without Parrots:

interface Animal { function makeSound(); }
function farm(array $animals) { // note, we can't guarantee what's inside of this array
  foreach($animals as $animal) {
    $animal->makeSound(); // Hope for no fatal errors!?
  }
}
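To see the failure mode concretely, here is a runnable version of the array example with made-up Cat and Dog classes and one stray string in the array, plus a guarded variant showing the extra instanceof boilerplate. It assumes PHP 7 or later, where calling a method on a string throws a catchable Error (in PHP 5 it was an outright fatal error):

```php
<?php
interface Animal { public function makeSound(): string; }

class Cat implements Animal { public function makeSound(): string { return 'meow'; } }
class Dog implements Animal { public function makeSound(): string { return 'woof'; } }

// Unguarded, as in the example above: nothing checks what is inside $animals.
function farm(array $animals): array
{
    $sounds = [];
    foreach ($animals as $animal) {
        $sounds[] = $animal->makeSound(); // throws Error if $animal is not an object
    }
    return $sounds;
}

// Guarded: the extra check that has to be remembered by hand.
function farm_checked(array $animals): array
{
    $sounds = [];
    foreach ($animals as $animal) {
        if ($animal instanceof Animal) {
            $sounds[] = $animal->makeSound();
        }
    }
    return $sounds;
}

$herd = [new Cat(), new Dog(), 'parrot']; // a string slips in unnoticed

$crashed = false;
try {
    farm($herd);
} catch (Error $e) {
    $crashed = true; // left uncaught, this Error ends the script
    echo 'Unguarded: ', $e->getMessage(), "\n";
}
print_r(farm_checked($herd));
```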

PHP is extremely flexible when it comes to hacking out a page, but when it comes to OOP, it’s about as brittle as you get. Refactoring is painful and error prone, and elegant design patterns like the ones you might see in a message-passing language such as Objective-C, Scala, or Erlang don’t work. Remember that by using functions such as method_exists() and is_object(), I can emulate the desired behavior; however, the extra code means more places for bugs and less time spent making the program do what you want it to do. The point is that the OOP constructs in PHP don’t fully work. As a result, certain very important aspects of OOP don’t translate very well to PHP.

Some people may still cling to the notion that “ultimately, you can still do it, it just requires more code!” But I argue that preventing “more code” is the exact reason OOP was invented. By writing more boilerplate error-checking code, we are wasting time. The issue is exacerbated by the fact that the error-checking code isn’t required, unlike, say, an exception the compiler forces you to handle. It isn’t immediately obvious in that last example that you need an is_object() check on the $animal variable. It’s these types of oversights that really damage PHP as the code base gets larger.

Conclusion

What I’m realizing is that PHP isn’t meant to scale. Yes, it can handle a lot of web traffic, but that’s not what I mean. I’m talking about scaling in the sense of growing team size and code base. The design of the language promotes coding paradigms that ultimately damage the code base, because PHP makes it harder to use good OOP practices on legacy code. To illustrate:

  • PHP became popular because it is easy to hack things out, even if that something required doing it the “wrong” way. These problems come back and bite you when the code base grows.
  • PHP can’t support a large development team as effectively because its weak typing allows for sidestepping certain core OOP principles (see above)
  • PHP allows invisible future bugs (see above) to be introduced without any immediate cause for alarm
  • As applications get complex and require threading or distributed processes, PHP fails to keep up (so other languages get used)
  • Because PHP does not use message passing, calling a method can cause runtime FATAL ERRORS (unacceptable and very hard to debug!)

All of this makes me rethink the popularity of PHP. There are some new languages, still in their infancy, that pose a threat to PHP’s current dominance. I believe that in the next few years, as today’s systems become “legacy,” today’s newcomers will finally be production ready. At that point, we might see companies adopt the newer languages, which will support more modern programming paradigms. We are seeing this today with Ruby, for example.

Of course, I could be wrong. I once told people that PHP was “C of the web.” It’s possible it’s here to stay forever, despite all of its flaws. And, for the record: I do not believe Python or Ruby will be the language that will overtake PHP, but that’s for another post.

I just want everybody to know that I am a PHP developer, so I speak from experience. We should recognize that technology changes and evolves, and it is important that we constantly update our skills to ensure they don’t become obsolete. I’m just pointing out that perhaps PHP isn’t as timeless as C (or, possibly, Java).

Lastly, I will plug my personal belief that being “religious” about a language because it is “the best” is shortsighted. New languages are born, literally, every week. It’s only a matter of time before a language comes along that does what your language does more elegantly, faster, and with less code.

Only time will tell. :)

The Destruction of the Head Hunting Industry

This is a random thought that just popped in my head.

With information becoming increasingly available, I’ve been thinking that the head hunting business will go through a major destructive phase in the next few years. There are two things the Internet has changed:

  • Better distribution of information on job openings
  • Better distribution of information on candidates

Definition: For those of you who are unaware, head hunters are professionals who search for employees and pair them up with open positions in companies. In a typical scenario, a company will pay a recruiter (head hunter) a fee equal to 2-3 months of the new employee’s salary. Companies pay this because recruiting employees is expensive. I’ve done a lot of hiring in the last few years, and I know how time consuming it is to review hundreds of resumes and then interview. A head hunter is basically an outsourced HR department. Additionally, candidates often approach head hunters, who re-post job openings on various job boards.

And a third trend will follow from the increasing amount of information available to the public:

  • Automation of job and candidate pairing

A long time ago, I was business partners with a man who was formerly a head hunter. I remember him telling me how wonderful the Internet had made his job. He told me that when he was my age, recruiting meant shaking a lot of hands, memorizing every face and name you ever met, and storing large piles of business cards. For him, recruiting had become about posting jobs on Craigslist and Monster and referring the candidates. In his eyes, he was still the gatekeeper. These days, anybody can be a head hunter with a little Internet know-how.

Figure: head hunter productivity goes up first, then down (we are in the middle stage now).

However, sites like LinkedIn can change all that. The one true value proposition that head hunters provide is that they serve as matchmakers. But as more information becomes available and technology improves, this process should become more and more automated. For example, right now, LinkedIn has job postings. On its own, that’s just a new competitor to Craigslist, but what makes things interesting is that LinkedIn also has the data points to find all of the candidates out there who might fit the job requirements, without anybody lifting a finger.

Right now, the information stream is one-directional: job postings (and recruiters) broadcast information. The goal is a bi-directional system where seekers fill out their requirements (a.k.a. their resumes) and both sides let the system do the matching. This can only work if each side has maximum information about the other. Think of it like a dating site for job seekers. It’s a hard problem to solve given the time-sensitive nature of job searches, but it’s an inevitable outcome as more and more information centralizes onto the Internet.
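The bi-directional idea can be sketched in a few lines: a job and a candidate match only if each side satisfies the other’s requirements. The PHP below is a hypothetical illustration with made-up jobs, candidates, and field names (needs/offers/skills/wants), not how LinkedIn or any real site does matching:

```php
<?php
// True when every required item appears among what the other side offers.
function meets(array $required, array $offered): bool
{
    return array_diff($required, $offered) === [];
}

// Two-sided match: the candidate covers the job's needs AND the job
// covers the candidate's wants.
function mutual_matches(array $jobs, array $candidates): array
{
    $pairs = [];
    foreach ($jobs as $jobId => $job) {
        foreach ($candidates as $name => $cand) {
            if (meets($job['needs'], $cand['skills']) && meets($cand['wants'], $job['offers'])) {
                $pairs[] = [$jobId, $name];
            }
        }
    }
    return $pairs;
}

$jobs = [
    'php-dev'  => ['needs' => ['php', 'mysql'], 'offers' => ['remote', 'full-time']],
    'java-dev' => ['needs' => ['java'], 'offers' => ['onsite', 'full-time']],
];
$candidates = [
    'alice' => ['skills' => ['php', 'mysql', 'javascript'], 'wants' => ['remote']],
    'bob'   => ['skills' => ['php', 'mysql'], 'wants' => ['onsite']],
    'carol' => ['skills' => ['java'], 'wants' => ['full-time']],
];
print_r(mutual_matches($jobs, $candidates));
```

Bob is qualified for the PHP job but is filtered out because it doesn’t offer what he wants, which is the half of the match a one-directional job board never sees.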

5AM thought of the day.