A Great Web Developer is a Great Application Developer

After being a part of the developer hiring process for a while now, I have begun to see what distinguishes exceptional web developers (the ones who get hired faster than we can even make an offer) from everybody else. Things have changed since 1995 – web development requires actual knowledge of programming!

The problem is that people think web development is somehow easier than other types of development. Nothing is inherently easier about developing an application in PHP than in C++. While I could accuse people who say this of being naive, the truth is that web developers themselves seem to believe it. In order to excel at being a web developer, one must understand how to develop an application. In other words, programming something like Facebook or Digg requires very different skills than updating Mom’s store website.

First, you must actually understand object-oriented programming (OOP). Everybody these days says they “know” OOP. I don’t mean “you read about it in a book” or that you can use a class when necessary; I mean being able to write extensible, modular libraries. This takes time and practice. When people first begin writing classes, they mistake the process for “grouping up functions under one class name in one file.” This is dead wrong and actually counterproductive in some cases.

Classes are about automation and simplification. The classic analogy for OOP is a car, because it has many parts that make up a whole. But just as the driver has no idea what a crankshaft, piston, or fuel injector does, people using your class should not need to know about its inner workings. All they need to know is that calling one of the few public methods (turn_ignition(), step_on_pedal(), steer()) does a whole lot of cool stuff behind the scenes that makes their life easier. And just as you wouldn’t want the entire car engine welded together (instead of using bolts and screws), you shouldn’t write humongous class methods that do 1000 things. Break them up into digestible parts so that other people can enhance just the pieces they want (see “overloading” below).

When you design a class, you start by listing out things that the class will be used for. Then, you figure out the lowest common denominators between these actions to remove “special, one time cases.” You now have a list of your primary public functions. Everything else in your class should pretty much remain protected or private (discretion is learned with experience). The entire point is that classes are not a group of similar functions, but rather a single conceptual unit.
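As a rough sketch of that idea (the Car and SportsCar classes and their method names are hypothetical, borrowed from the car analogy), a class might expose a couple of public methods, keep its moving parts private, and leave one small protected piece that a subclass can enhance:

```php
<?php
// Hypothetical example: a small public surface hiding the inner workings.
class Car
{
    private $engineRunning = false;
    private $speed = 0;

    public function turnIgnition()
    {
        $this->primeFuelInjector();
        $this->engineRunning = true;
    }

    public function stepOnPedal($amount)
    {
        if (!$this->engineRunning) {
            throw new RuntimeException('Start the engine first.');
        }
        $this->speed += $this->accelerationFor($amount);
        return $this->speed;
    }

    // Protected so a subclass can enhance just this one piece.
    protected function accelerationFor($amount)
    {
        return $amount * 10;
    }

    private function primeFuelInjector()
    {
        // Details the driver never needs to know about.
    }
}

class SportsCar extends Car
{
    protected function accelerationFor($amount)
    {
        return $amount * 25;
    }
}

$car = new SportsCar();
$car->turnIgnition();
echo $car->stepOnPedal(2), "\n"; // prints 50
```

The person using SportsCar only ever touches turnIgnition() and stepOnPedal(); everything else is free to change later without breaking their code.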

And once you understand why object-oriented programming exists, you must then learn to use inheritance, overloading (in some languages), and interfaces.

Second, you must understand and demonstrate competency with design patterns. At the very least, you should be aware of the Singleton and Factory patterns, which are very commonly used, and for (arguably) good reasons. Knowledge of basic design patterns goes a long way because a time will come when you may need to borrow from one of these concepts when developing your web application. It also means you have a wider exposure to the different types of architectures available to someone writing a library or application.
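For readers who have not met these two patterns yet, here is a minimal sketch of each. All class names (Config, LoggerFactory, and so on) are made up for illustration; real projects will shape these differently.

```php
<?php
// Hypothetical names throughout; a sketch of the two patterns, not a library.

// Singleton: one shared instance, created on first use.
class Config
{
    private static $instance = null;
    private $values = array();

    private function __construct() {}

    public static function getInstance()
    {
        if (self::$instance === null) {
            self::$instance = new self();
        }
        return self::$instance;
    }

    public function set($key, $value) { $this->values[$key] = $value; }
    public function get($key) { return isset($this->values[$key]) ? $this->values[$key] : null; }
}

// Factory: callers ask for "a logger" by name without knowing the concrete class.
interface Logger { public function format($message); }
class PlainLogger implements Logger { public function format($message) { return $message; } }
class TimestampLogger implements Logger {
    public function format($message) { return date('c') . ' ' . $message; }
}

class LoggerFactory
{
    public static function create($type)
    {
        switch ($type) {
            case 'plain': return new PlainLogger();
            case 'timestamp': return new TimestampLogger();
            default: throw new InvalidArgumentException("Unknown logger type: $type");
        }
    }
}

Config::getInstance()->set('env', 'dev');
echo Config::getInstance()->get('env'), "\n";            // prints dev
echo LoggerFactory::create('plain')->format('hi'), "\n"; // prints hi
```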

Last, you must understand scaling issues in growing web applications and how to resolve them. This is probably the rarest trait, but if you have it, you will shine like a rock star. It is important to realize that certain operations are much more costly than others. In today’s world, the CPU is rarely the bottleneck, but there are still bottlenecks. A common instance of this problem is when a WordPress blog goes offline because it is slammed by a spike in traffic. That said, the most common type of bottleneck involves the database and often manifests in one of three ways:

  • Each query sent to the database involves some small amount of lag between the web server and the database server. Hundreds of queries on a page will slow down your page loads. On high-traffic pages, removing one query per page load can mean thousands fewer queries per second. Make sure you learn how to write JOIN statements.
  • Each query should use an index in the database. Without an index, a query against a large table will bottleneck the hard drive (because it has to scan large portions of the table on disk). Learn how to use the MySQL EXPLAIN command.
  • Database tables using MyISAM (instead of InnoDB) risk facing table-locking issues. When you run a big or complex update on a MyISAM table, no other updates can occur on that table until the first one is done. If that table is very large, this may mean someone’s page sits there and loads for a long time (or times out). Typically, one should use a transactional storage engine such as InnoDB.

Other types of common scaling issues are:

  • Managing sessions in the database instead of letting PHP handle them (by default, PHP stores sessions in files on the local server). This becomes necessary because any large web application has more than one web server, which means a person might land on a different web server each time they load a page.
  • Managing files generated on the server – such as when a user uploads a file. The issue here is that if a file is created on one server (in a cluster of 100), how do the other servers know it exists? Obviously the solution is to somehow centralize this with a synchronization process or store all of the files somewhere else.
  • How queries can start behaving differently (much slower) when your tables reach 1M+ rows without proper precautions. Larger tables change things because indexes sometimes stop getting used, or slowness that wasn’t apparent before becomes very noticeable. (A book could be written here…)

Merely reading and understanding these issues has put you ahead of a bunch of people out there. 🙂

In conclusion, you must re-think the “web developer” as an application developer that happens to work in a browser. I often hear the excuse that these types of advanced concepts can’t be readily applied in small-time applications (such as a simple store-front), but I believe this is simply not true. Virtually every site requires a database connection or a template system (for easy inclusion of header/footer files). Database abstraction and template systems are probably the single best place to use OOP concepts. It simply makes no sense that one would use the raw mysql_query() function in any website, even if it’s just Mom’s baby pictures. Putting these ideas in classes reduces security vulnerabilities and makes development much faster. If the notion of abstracting templates or databases is totally foreign to you, I suggest you start by looking at Smarty or EzSQL.

Happy studying. 🙂

PHP/MySQL: The Escape Method Done Right

The issue is that PHP has several built-in methods for escaping data, and they are not all equal. No, addslashes() is insufficient to protect you from SQL injection attacks (read: these get you fired). Here’s the solution for an escape function that does everything you could hope for. The @ symbols suppress PHP warnings so that I can use them to my advantage (newbies, please don’t try it at home). This goes inside a Database class.

/**
 * Escapes the passed value so it is ready to be inserted into the database. Takes magic quotes into
 * consideration as well.
 *
 * @param    string    parameter
 * @return    string    escaped parameter
 */
public function escape($value) {
    /*
     * stripslashes only if necessary
     */
    if (get_magic_quotes_gpc()) {
        $value = stripslashes($value);
    }
    /*
     * if this fails ($newValue is false), we know we need to fall back on the PHP4 way
     */
    $newValue = @mysql_real_escape_string($value);
    /*
     * if no connection handler can be found use this instead
     */
    if(FALSE === $newValue) {
        $newValue = @mysql_escape_string($value);
    }
    return $newValue;
}

Feel free to post suggestions.

Five Tips for (total) PHP Beginners

Someone asked me to write this a long time ago, so here’s my list. I make the basic assumption that you at least know some SQL.

1. Use EzSQL.

It is an open source library that does database abstraction. “Database Abstraction” is a fancy term for “making the database more developer-friendly.” It will help you grasp concepts like rows vs columns, how to manage multiple results, and escaping data. See their examples. While it is (in my opinion) a lacking solution for experts, it is awesome for newbies. I used this way back when I first started. Use it if you’re new to PHP.

2. DON’T Rely on Magic Quotes.

Magic Quotes is a misguided feature that tries to sanitize your data in an automated fashion. “Sanitize” is a programming term for “making data safe.” As in, without sanitizing data, hackers can do mean things to your database.

Some may argue turning this off is *unsafe* for beginners, but leaving it on also trains beginners to be less cautious about sanitizing data. This is a problem since the majority of corporate PHP servers (as in, the real world) have Magic Quotes off. For example, let’s assume $name is equal to michi. Some programmers might try:

DELETE FROM users WHERE name='$name'; 
-- thus: DELETE FROM users WHERE name='michi'

But what if a hacker managed to make $name equal to michi' OR '1'='1 (note the unbalanced quotes, which pair up with the ones already in the query)? What does the query end up looking like? If Magic Quotes is relied on, you would have been protected:

DELETE FROM users WHERE name='michi\' OR \'1\'=\'1'; 
-- result: no such user; no harm, no foul

So it tries to find a user literally called michi' OR '1'='1, which is clearly not going to be found. In this case, Magic Quotes looks like a good thing. However, if you got used to Magic Quotes and then got a job, you might forget that it isn’t enabled in the Real World, and this is what happens:

DELETE FROM users WHERE name='michi' OR '1'='1'; 
-- (hint: it deletes all your users!)

Fired!

Just pretend that Magic Quotes is always off and use EzSQL’s escape() method or my solution (both work regardless of whether Magic Quotes is on or off).
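To see the difference without touching a real database, you can build the query string and simply print it. This is only a sketch: naiveDeleteQuery() is a made-up helper, the payload is an illustrative one chosen so that the quotes balance, and addslashes() stands in for what Magic Quotes did to incoming data.

```php
<?php
// Build the query string the naive way so we can inspect what the database would receive.
function naiveDeleteQuery($name)
{
    return "DELETE FROM users WHERE name='$name'";
}

$attack = "michi' OR '1'='1";

// Magic Quotes off: the quote in the input closes the string literal early.
echo naiveDeleteQuery($attack), "\n";
// DELETE FROM users WHERE name='michi' OR '1'='1'

// What Magic Quotes effectively did: run addslashes() on incoming data.
echo naiveDeleteQuery(addslashes($attack)), "\n";
// DELETE FROM users WHERE name='michi\' OR \'1\'=\'1'
```

Printing the final query like this is a cheap habit for beginners: if the output is not the query you meant to write, the database will not see the query you meant to write either.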

3. Always use require_once.

You will eventually see when include, require, or include_once is the better choice over require_once, but that time will come much later. As a rule of thumb for beginners, the last thing you probably expect is a silent “oops we couldn’t find the file so we’ll just continue on as if nothing happened…” bug.

4. Don’t display (echo) errors when a function has an error. 

Instead, store the errors in a class/global variable or return the error (neither of these are good practices for novice developers who should be throwing exceptions). The reason is that you never know in what context a function gets called; displaying errors could be disastrous in some situations. For example, if you were in the middle of writing a CSV file (comma delimited data file), you don’t want random unformatted error text appearing inside it saying “Error, invalid name!”. As in (notice the extra comma that got inserted in):

Charlie, Abigail, 31, California
Bob, Smith, 29, New York
Patrick, O'NeilError, invalid name!, 22, Washington
#check it out, a random extra comma got tossed in

Even if the script terminated at that point (which I assumed), the file may have been written to. Frankly, as a beginner, neither you nor I can expect you to remember to *always* check if there was an error, so it’s better to be defensive and just not randomly display stuff (which is why throwing exceptions is the ideal solution).
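Here is a small sketch of the exception approach applied to the CSV example. The function names (validateName, buildCsv) and the validation rule are invented for illustration; the point is only that the function reports the problem to its caller instead of printing it.

```php
<?php
// Hypothetical validator: reports problems by throwing, never by echoing.
function validateName($name)
{
    if (!preg_match("/^[A-Za-z' -]+$/", $name)) {
        throw new InvalidArgumentException("Invalid name: $name");
    }
    return $name;
}

// Build each CSV line fully before emitting it, so an error cannot land mid-row.
function buildCsv(array $rows, array &$errors)
{
    $lines = array();
    foreach ($rows as $row) {
        try {
            validateName($row[0]);
            validateName($row[1]);
            $lines[] = implode(', ', $row);
        } catch (InvalidArgumentException $e) {
            $errors[] = $e->getMessage(); // collected for the caller, not printed
        }
    }
    return implode("\n", $lines) . "\n";
}

$errors = array();
echo buildCsv(array(
    array('Charlie', 'Abigail', 31, 'California'),
    array('Patrick', 'O#Neil', 22, 'Washington'),
), $errors);
// Only the Charlie row is printed; $errors holds "Invalid name: O#Neil".
```

The caller decides what to do with $errors (log them, show them on a separate page, abort the export), which is exactly the decision the function itself is in no position to make.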

For those of you wondering why an echo statement might modify a file, note that for things like downloads in a browser, you will output the file in the browser and then change the header file-type to reflect what the user should do (i.e., download it). Thus, in this context, yes, you would echo out the entire file whether it’s XML, a Word document, a zip, a PDF, or HTML and then the user would be prompted to save it.

5. Use ob_start().

Put ob_start() at the beginning of your script before everything else.

<?php
ob_start();
// the rest of your script
?>

This will save you lots of time when you get advanced enough to do redirection or understand what it does. The syntax for browser redirection is:

// this is a side note on redirection; don’t put this in your script 😛
header("location: another_file.php");
die;
// always call die directly after a redirect!

Redirection means that the page should stop (if the redirection was successful) and the user never sees any output from that page. They are then immediately redirected to another page. Since they aren’t supposed to see any output, showing them output before a redirection ruins the redirect. This is often very confusing and difficult to fix for a beginner. ob_start() fixes that issue (it’s magical like that).

Note that if ob_start() is used, you don’t need to do anything else special. There is no need to use the other functions like ob_flush(), ob_end_clean(), etc. You’ll get to those later (probably a few months) once you fully understand what ob_start() does.
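For the curious, here is a peek at why it works. This sketch does use ob_end_clean() and ob_get_clean(), which a beginner does not need yet, and runPage() is a made-up function standing in for "the rest of your script". The header() call is left as a comment so the example can be run from the command line.

```php
<?php
// Hypothetical page body: prints something, then decides it needs to redirect.
function runPage($needsRedirect)
{
    ob_start();                       // echo now fills a buffer instead of going to the browser
    echo "<p>Welcome back!</p>";

    if ($needsRedirect) {
        ob_end_clean();               // discard the buffered output; nothing was sent yet
        // In a real request: header("Location: another_file.php"); die;
        return '';
    }
    return ob_get_clean();            // no redirect: hand back the buffered page
}

echo runPage(false), "\n";            // prints <p>Welcome back!</p>
var_dump(runPage(true));              // string(0) ""
```

Because nothing reaches the browser until the buffer is flushed, the echo that happened before the redirect decision does no harm.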

Max File Size Causes *Silent* PHP Errors

Today, at work, I hit the strangest PHP error. I thought I’d share. 

I wrote a logging script that tracks whenever something goes wrong. First, it tries to write everything to a raw file and then to the database. The thinking was that if something bad happened to the database, at least I would have a log entry in a regular file. Unfortunately, I didn’t take into consideration what happens when something bad happens to the file, which turned out to be far worse.

We noticed something was wrong when no errors were being logged, and yet the script was clearly failing to do its job. There were no PHP errors, no database records logged, and nothing changed in the log file. After careful scrutiny, I confirmed there was indeed an error happening, and it had to be logging. So I looked at the log file, and then I noticed this:

2147483647 ClientException.1.log

That huge number is the number of bytes in that file. 2,147,483,647 bytes (that is 2^31 - 1) = 2 GIGABYTES. The file was too big for PHP to want to open. I ran the PHP script by hand and it simply said the following message where I called fopen():

File size limit exceeded

When this error was encountered, PHP died with no warnings, errors, or notices. There was nothing in the PHP logs to show something went wrong — that was some kind of operating system level error message. The code — no, PHP — simply halted! No destructors, no cleanup — nothing. Even the output buffer was destroyed, which means if there was ob_start() anywhere in the code above, all of the previous output (echo) was lost.

Scary.

So next time your script is dying without an explanation, make sure you check how big the logs are.
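One way to keep a log from ever getting that big is to check its size before writing and move it aside once it passes a cap. The following is a minimal sketch under my own assumptions: appendToLog() is a made-up helper, the 1 GB default is arbitrary, and the rotated file name is just the original name plus a timestamp.

```php
<?php
// Hypothetical helper: rotate the log before it grows past $maxBytes, then append.
function appendToLog($path, $line, $maxBytes = 1073741824) // 1 GB, an arbitrary cap
{
    clearstatcache(true, $path);      // filesize() results are cached within a request
    if (file_exists($path) && filesize($path) >= $maxBytes) {
        // Move the full file aside; a fresh one is created by the write below.
        rename($path, $path . '.' . date('YmdHis'));
    }
    return file_put_contents($path, $line . "\n", FILE_APPEND | LOCK_EX) !== false;
}

$log = sys_get_temp_dir() . '/ClientException.demo.log';
appendToLog($log, 'something went wrong', 64);
```

A production setup would more likely lean on a system tool for rotation, but even a check like this keeps the file well away from the 2 GB cliff.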

Is Your PHP Web-form Hacker Proof?

Several months ago, my friend informed me that he was seeing a large volume of email spam coming from one of my legacy sites. After investigating, we found that my “contact” page was the source. A hacker was spamming people through the contact form on my website! The page in question was a simple PHP script where people could provide an email address and a short message that would be emailed to me. So here’s the question of the day: How did they make it email other people with their own custom email messages?

Most people may not be aware of this, but the culprit is the PHP mail() function. The relevant portion of the definition is as follows:

bool mail ( string $to, string $subject, string $message [, string $additional_headers] )

So most people would probably do something like this:

mail($to, $subject, $message);

But by default, this will create an email with some ugly “from” address like root@localhost. What if you want to have a “from” email address, such as the person who filled out the form?

mail($myEmailAddress, $subject, $message, "From: $fromAddress");

But don’t you need to escape it? Well, a lot of beginners (like me when I started out) don’t realize that “addslashes” does nothing here. A hacker could inject additional headers into your email by typing the following into your web-form:

me@fakeemail.com \r\n

Followed directly by another header:

Bcc : otherpeople@hotmail.com [ignore spaces]

The above code would send an email to otherpeople@hotmail.com through the BCC field. That alone doesn’t sound too threatening. However, it is actually possible to inject a message body, attachments, other recipients, a new subject, etc. at this point. A hacker can completely re-format your original email. In essence, they can post data to your form to make it do something completely different from what you had intended.

And judging from the fact that I wasn’t getting emails, I would assume that they can suppress the original “to” email address as well, making the breach completely unnoticeable.

To fix this, make sure you filter arguments that get passed in as headers by removing the \r and \n characters.

$fromAddress = str_replace(array("\r", "\n"), '', $fromAddress);

As always, if you copy my code examples from the page, check the quotes: curly quotes are not something PHP will recognize.
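If you want to go one step further than stripping line breaks, you can also refuse anything that does not look like a single email address. This is a sketch, not a complete defense: safeFromAddress() is a made-up helper, and it relies on PHP’s filter_var() with FILTER_VALIDATE_EMAIL.

```php
<?php
// Hypothetical helper: returns a safe "From" header value, or null if the input is unusable.
function safeFromAddress($input)
{
    // Strip the characters that let an attacker start a new header line.
    $clean = str_replace(array("\r", "\n"), '', trim($input));

    // Then insist that what is left is a single, plausible email address.
    if (filter_var($clean, FILTER_VALIDATE_EMAIL) === false) {
        return null;
    }
    return $clean;
}

var_dump(safeFromAddress("me@fakeemail.com"));
// string(16) "me@fakeemail.com"
var_dump(safeFromAddress("me@fakeemail.com\r\nBcc: otherpeople@hotmail.com"));
// NULL
```

When the helper returns null, the safest move is to not send the email at all and show the visitor a validation message instead.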

So, is your form hacker-proof?

Note: I had a horrible time trying to make this post. Apparently, WordPress doesn’t like the word “Bcc[colon]” appearing anywhere in the post! How lame!?

MySQL 5 and Condition

The word “condition” is a reserved name in MySQL 5, apparently. It was not in MySQL 4. Thus, if you have a query where you do something like this:

SELECT the_field_name AS condition FROM the_table

That causes problems. Quoting the alias with backticks (AS `condition`) or picking a different alias avoids the error.

PHP Nuisance: How to Stream PDF Files

If you’ve ever tried to stream a PDF file in PHP, you know how incredibly annoying it is to get it working across all browsers. In fact, the maintainer of HtmlToPdf didn’t have an official response on the topic of streaming PDF data to the browser. It only worked most of the time. When it fails, the page loads the data, but nothing actually shows up and your PDF reader will not fire.

The very short explanation is that IE ignores certain content type headers and tries to guess what a file is based on its content. This is a problem in some situations when IE guesses wrong. God, IE is so dumb because it tries to be too smart. Don’t you hate people like that?

Anyway, to get it to work, you have to hack at it. There are added complications to this when you are using HTTPS instead of HTTP.

After we spent a few hours today at work trying to solve this problem, I remembered I had done this before. I dug up some old code and found the solution:

$filename = 'yourfilename.pdf';
header("Pragma: public");
header("Expires: 0");
header("Cache-Control: must-revalidate, post-check=0, pre-check=0");
header("Cache-Control: public");
header("Content-Description: File Transfer");
header("Content-Type: application/pdf");
header("Content-Disposition: attachment; filename=$filename");
header("Content-Transfer-Encoding: binary");
// echo the pdf raw binary data here

As far as I know, this works in every browser. All you Googlers: Enjoy! 😉
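If you stream PDFs from more than one place, it may be worth wrapping the header list in a function. The sketch below is my own packaging of the same headers: pdfDownloadHeaders() and streamPdf() are made-up names, the filename clean-up and the Content-Length line are additions that were not in the old code, and I have not re-tested it across browsers.

```php
<?php
// Hypothetical wrapper around the header list above; $pdfData is the raw PDF bytes.
function pdfDownloadHeaders($filename, $pdfData)
{
    // Keep the filename to a conservative character set so it cannot break the header.
    $safeName = preg_replace('/[^A-Za-z0-9._-]/', '_', basename($filename));

    return array(
        'Pragma: public',
        'Expires: 0',
        'Cache-Control: must-revalidate, post-check=0, pre-check=0',
        // header() replaces an earlier header of the same name by default,
        // so this second Cache-Control line is the one that actually gets sent.
        'Cache-Control: public',
        'Content-Description: File Transfer',
        'Content-Type: application/pdf',
        'Content-Disposition: attachment; filename="' . $safeName . '"',
        'Content-Transfer-Encoding: binary',
        'Content-Length: ' . strlen($pdfData),   // an addition not in the original list
    );
}

function streamPdf($filename, $pdfData)
{
    foreach (pdfDownloadHeaders($filename, $pdfData) as $line) {
        header($line);
    }
    echo $pdfData;
}
```

Calling streamPdf('report.pdf', $bytes) before any other output has been sent is the intended usage.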

Leave a comment if this worked! 🙂

Variable Assignment, Exception Handling, and Destructors in PHP

I had the craziest set of “bugs” the other day — well, that was until I realized they weren’t bugs. It turns out, at least in PHP, if you try to assign an object to a variable, but during the assignment, an exception is thrown, the variable keeps its original value. Oh, and the destructor fires immediately before anything else. I am not sure if this is true in other languages, but if it is, please let me know. 🙂

Not sure what the hell I’m talking about, or want to learn more about exception handling? If I get any requests in my comments, I’ll follow up on a walk-through about the topic. Same goes for destructors. Speak up or forever hold your peace.

Exceptions and Variable Assignment

So back to the problem at hand:

$connection = 'hello';
try {
    $connection = new Database($parameters);
    $connection->doStuff();
} catch (Exception $exception) {
    echo $exception->getMessage();
} // continue as if nothing happened
var_dump($connection);

So what’s the result when I dump $connection?

  1. If no exception is thrown anywhere, the value is the Database object.
  2. If an exception is thrown during doStuff(), the value is the Database object (as expected).
  3. If an exception is thrown in the constructor of Database, the value of $connection is “hello”.

This makes sense. During the assignment, the code is short circuited, and the object is never assigned.

In other words:

  1. new Database() is processed.
  2. During processing (in the constructor), an exception is thrown.
  3. The code exits out of the constructor and looks for a catch statement.
  4. The first catch statement found is below the assignment line.
  5. Thus not only does doStuff() never get executed, but the Database object is never assigned.
  6. The $connection variable is never modified.

The lesson I take away from this is to not throw exceptions in constructors. This is largely why I only do property assignments in constructors, and any potentially error-causing stuff is tossed in an initialize() function that I call later, during the first operation that might require more than simple assignments. But that’s just me.
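Here is a self-contained version you can run to see the assignment behavior for yourself, followed by the initialize() style I prefer. FlakyDatabase, LazyDatabase, and connectOrKeep() are stand-in names invented for this demo; they are not the real Database class from above.

```php
<?php
// Hypothetical stand-in for the Database class above; the constructor fails on bad input.
class FlakyDatabase
{
    public function __construct($host)
    {
        if ($host === '') {
            throw new RuntimeException('No host given');
        }
    }
}

// Alternative in the spirit of initialize(): the constructor only assigns properties.
class LazyDatabase
{
    private $host;
    private $ready = false;

    public function __construct($host) { $this->host = $host; }

    public function initialize()
    {
        if ($this->host === '') {
            throw new RuntimeException('No host given');
        }
        $this->ready = true;
    }
}

function connectOrKeep($default, $host)
{
    $connection = $default;
    try {
        $connection = new FlakyDatabase($host); // throws before the assignment happens
    } catch (RuntimeException $exception) {
        // $connection still holds $default here
    }
    return $connection;
}

var_dump(connectOrKeep('hello', ''));   // string(5) "hello"

$lazy = new LazyDatabase('');           // always succeeds, so $lazy is always an object
try {
    $lazy->initialize();
} catch (RuntimeException $exception) {
    echo $exception->getMessage(), "\n"; // prints No host given
}
```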

Destructors on Exceptions During Constructors

For those of you still reading, and understanding what I’m talking about, I have another point to add: destructors fire before the exception is thrown out! As in, as soon as the program sees the throw keyword and we leave the constructor, the destructor fires! I imagine this is because since the variable was never assigned, it’s just a dangling variable in memory, and thus gets cleaned up immediately.

This is probably nothing unexpected, but surely something to keep in mind if you use PHP destructors (as I was). This is relevant because I was trying to log some stuff during the destructor, even on an error. Exceptions cause text to get dumped, so I figured I would just collect the text in the destructor (from the output buffer) and then toss it in a file. Unfortunately, that didn’t work because the destructor fired way before the exception had a chance to display anything. My (broken) code looked something like this:

ob_start();
$framework = NULL;
try {
    $framework = new Framework();
    $framework->initialize();
} catch (Exception $exception) {
    echo $exception->getMessage();
}
unset($framework); // make the destructor go off

During the destructor, anything written to the output buffer (error or no error) was tossed in a file. This worked unless an exception was tossed in the Framework constructor.

Output Buffers and Destructors (tricky!)

Lastly, as a golden bonus to those still reading, the reason I manually unset the variable is because through trial and error, I discovered that at the end of a program’s execution, the output buffer is flushed (and cleaned) before variables are erased. This meant that by the time my destructor fired (when $framework got erased), the output buffer was already empty and crap was already on the client’s screen. Thus, I had to unset it manually. 🙂

I hope this helped somebody.

Fatal error: Call to undefined function mysql_connect()

Hah. It seems the crazy errors are plaguing me today. The latest one:

Fatal error: Call to undefined function mysql_connect() …

Sweet…

We moved to a new data center, and this error seems to be related to using PHP on Fedora or Red Hat. It seems their default installation doesn’t include the MySQL extension.

First, confirm that there is no MySQL section in the output when you execute the following PHP:

phpinfo();

If MySQL isn’t there, it’s time to get your admin in the loop.

Fatal error: Call to undefined function preg_match()

Today’s crazy error of the day is:

Fatal error: Call to undefined function preg_match() …

Wahoo! A built-in function being undefined!

So how’s this possible? It seems that if you reinstall PHP without compiling in PCRE (Perl Compatible Regular Expressions), you can get this crazy error. It’s a flag you may need to manually enable.
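Both this error and the mysql_connect() one come down to an extension that is missing from the PHP build, and you can check for that from code instead of scrolling through phpinfo(). A small sketch (missingExtensions() is a made-up helper; extension_loaded() is the real PHP function doing the work). Note that on PHP 7 and later the old mysql extension no longer exists at all, so it will always show up as missing there.

```php
<?php
// Report which of the given extensions are missing from this PHP build.
function missingExtensions(array $names)
{
    $missing = array();
    foreach ($names as $name) {
        if (!extension_loaded($name)) {
            $missing[] = $name;
        }
    }
    return $missing;
}

// 'mysql' and 'pcre' are the extension names behind mysql_connect() and preg_match().
$missing = missingExtensions(array('mysql', 'pcre'));
echo $missing ? 'Missing: ' . implode(', ', $missing) : 'All present', "\n";
```

Running a check like this at the top of an install or deploy script turns a confusing "undefined function" fatal error into a readable message.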