Archive for the ‘geeky’ Category.

Neat Idea: Creating Alphanumeric IDs

UPDATE: For those of you looking for a great way to generate a highly unique ID that is shorter than what you would get using a hex number, try this (it will generate a ~17 character ID):

list($hex, $dec) = explode('.', uniqid('', true));
$id = (base_convert($hex, 16, 36) . base_convert($dec, 10, 36));

Ever needed to create an ID hash that looks something like f39a2xm91? You might not have, but some day you’ll want to. The easy way out is to use the native md5() function, but that creates a long 32-character hash, which may be a total waste of (database) space. These types of IDs are often used to mask integer IDs so that your users can’t just type in user_id=10000, user_id=10002, user_id=10003, and so forth to look at your records. Some might even call it security through obscurity. Well, let’s be clear: this sort of activity does not add security, but it does make “browsing” behavior more difficult.

Either way, if you desire to move away from the classic integer format IDs, I have a different solution for you:

base_convert($someId, 10, 36);

This will convert the number 10,001 into 7pt, 10,002 into 7pu, and 100,000,000 into 1njchs. As you can see, you can store a heck of a lot of numeric information in a very tiny amount of (character) space. I am not saying this will save you database storage space, but it will make your URLs shorter.
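As a quick check of those figures, here is a minimal sketch that converts a few IDs in both directions. The variable names are mine and only for illustration; note that base_convert() takes and returns strings.

```php
<?php
// Encode a few integer IDs as base 36 and decode them again.
// base_convert() works on strings, so the decoded value is cast back to int.
$ids = array(10001, 10002, 100000000);
$encoded = array();
$decoded = array();
foreach ($ids as $id) {
    $short = base_convert((string) $id, 10, 36);
    $encoded[$id] = $short;
    $decoded[$short] = (int) base_convert($short, 36, 10);
    echo $id . ' => ' . $short . "\n";
}
// Prints:
// 10001 => 7pt
// 10002 => 7pu
// 100000000 => 1njchs
```

Decoding is the same call with the bases swapped, so no lookup table is needed to get back to the integer ID.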

One of the main benefits of this is that you can store more data in a smaller human-readable space, thereby allowing you to create shorter unique IDs. For example, in our logging system at work, I use this method to generate issue IDs that end users can send to us when they have a problem. The issue ID is built from the last eight digits of the current time with microseconds, with a random seven-digit number in front (the two joined with a “.”). I then strip out the decimal point and base-36 encode the resulting number.
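A rough sketch of that kind of issue ID might look like the following. The function name makeIssueId() is hypothetical, and I am assuming that "time with microseconds" means microtime(true) with its decimal point removed; the real logging code may differ.

```php
<?php
// Hypothetical helper: builds an issue ID roughly as described in the post.
// Assumption: the time portion is microtime(true) with the decimal point removed.
function makeIssueId()
{
    // Seven random digits go in front (1000000..9999999).
    $random = (string) mt_rand(1000000, 9999999);
    // Current time with four decimals, e.g. "1700000000.1234" -> "17000000001234".
    $time = str_replace('.', '', sprintf('%.4F', microtime(true)));
    // Keep only the last eight digits of the time portion.
    $timeDigits = substr($time, -8);
    // 15 decimal digits stay below 2^53, so base_convert() keeps full precision.
    return base_convert($random . $timeDigits, 10, 36);
}

echo makeIssueId() . "\n";
```

The 15-digit number comes out as a 9 or 10 character ID, which is short enough for an end user to read over the phone.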

Note: my solution is specific to the problem I was facing. It’s not necessarily a foolproof way to generate unique values, but it’s what you would call “good enough”. Do not use it if your application hinges on absolutely unique values.

A warning about base_convert: it breaks down for very large numbers in PHP, so be careful. Pasting together the current timestamp, the user ID, the session ID, and a fourteen-digit number into one 50-character-long number will probably result in precision errors (not a huge deal for most implementations, but be warned). From the PHP manual:

This is related to the fact that it is impossible to exactly express some fractions in decimal notation with a finite number of digits. For instance, 1/3 in decimal form becomes 0.3333333...

So never trust floating number results to the last digit and never compare floating point numbers for equality.
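To see the precision problem in practice, here is a small sketch (the values are arbitrary ones I picked). It encodes a 20-digit number, decodes it again, and checks whether the original came back.

```php
<?php
// A 20-digit decimal string is beyond what a 64-bit integer or a float's
// 53-bit mantissa can hold exactly, so the round trip does not return the input.
$original = '99999999999999999999';
$encoded = base_convert($original, 10, 36);
$roundTrip = base_convert($encoded, 36, 10);

var_dump($roundTrip === $original); // bool(false)

// A 15-digit value is still small enough to survive the round trip
// (checked here with 64-bit PHP in mind).
$safe = '123456789012345';
$safeRoundTrip = base_convert(base_convert($safe, 10, 36), 36, 10);
var_dump($safeRoundTrip === $safe);
```

If you need IDs that can always be decoded back to the exact input, keep the number you feed into base_convert() well under 16 digits.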

The “base” refers to the numbering system used to write the number, that is, how many distinct digits it has. A base 11 system has eleven digits: 0 through 9, then the letter A for ten. Once you run out of digits, you carry over into the next position, so counting goes 1, 2, 3, 4, 5, 6, 7, 8, 9, A, 10, 11, 12… In a base 16 system, you would go all the way to F before carrying over. So the larger the base, the more “compressed” a number can become.

I used a base 36 scheme (0 – 9, A – Z), but you can use smaller bases to come up with longer conversions. For example, a base 21 conversion (0 – 9, A – K) will convert 10,001 into 11e5, 10,002 into 11e6, and 100,000,000 into 13a3k7g. So in short, if you have a database where your record IDs run to 7 or 8 digits or more, maybe you can think about base encoding them into shorter IDs.
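If you are curious what a base conversion does under the hood, this is a minimal sketch of the idea. The toBase() helper is hypothetical (it is not part of PHP, and base_convert() remains the practical choice); it just divides repeatedly and collects the remainders as digits.

```php
<?php
// Hypothetical helper, not part of PHP: converts a non-negative integer to the
// given base (2 to 36) by repeatedly dividing and collecting remainders.
function toBase($number, $base)
{
    $digits = '0123456789abcdefghijklmnopqrstuvwxyz';
    if ($number === 0) {
        return '0';
    }
    $result = '';
    while ($number > 0) {
        // The remainder is the lowest digit; prepend it to what we have so far.
        $result = $digits[$number % $base] . $result;
        $number = (int) floor($number / $base);
    }
    return $result;
}

echo toBase(10, 11) . "\n";        // a
echo toBase(11, 11) . "\n";        // 10
echo toBase(10001, 21) . "\n";     // 11e5
echo toBase(100000000, 36) . "\n"; // 1njchs
```

Each extra digit multiplies the range by the base, which is why a bigger base needs fewer characters for the same number.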

Just a neat idea I wanted to share.

Debugging Tips for Database Abstraction

Today I want to talk about database script debugging in large systems. The main problem is that in large applications, it becomes difficult to find the source of rogue queries that, for example, broke in a recent system update. This may not readily apply to most of you, but bear with me: some day it will.

Pretend for a moment you have a database architecture where you have 2 masters (dual replication) and 2 read-only slaves. Now pretend that you have a large application with 100 different pages/scripts. You have 5 web servers with mirror copies of the application. This would be a fairly typical setup for a small, but growing company.

One day, you come into work and find out that you had a bad transaction lock that caused your system to hang all weekend. So you look at the process list and you know what query is causing the problem (because it’s still stuck). The problem is that it looks suspiciously like the queries you’d find on virtually every page in your application. How do you fix this problem? A different (but related) problem is when an update initially executed on one master database server replicated to a slave and got stuck on the slave but executed fine elsewhere. What happened? Which master server got the initial query? This sort of problem is very difficult to track down without more information, such as where the query was initially sent and from what page it originated.

The primary challenge is figuring out which query came from what page in your application. The solution is to add logging straight into your queries. The implementation looks something like this:

//Get the current page or script file (REQUEST_URI is not set on the command line)
$source = !empty($_SERVER['REQUEST_URI']) ? $_SERVER['REQUEST_URI'] : $_SERVER['SCRIPT_FILENAME'];
//Replace out any comment tags and add in the database being connected to
$metaData = str_replace(array('/*', '*/'), array('/ *', '* /'), $source) . " ($databaseHost)";
//Escape the comment text so the URI can't be used to inject data
$metaData = mysql_real_escape_string($metaData);
//Execute the query with the comment in front
$result = mysql_query("/* $metaData */ " . $query, $connection);

This solution inserts a comment into your query that gives you useful information that can be seen when looking at the raw query. MySQL supports C-style comment blocks (the /* */), which are ignored by the parsing engine. This means you can pass data to the engine which can be useful for debugging. These comments are also replicated down to the slaves, which can be useful when you find a slave having problems with a query that came from a master server. For those of you unaware, the “URI” here refers to the path and query string of the address that was requested (everything after the host name).

But make sure that you correctly sanitize the URI so that somebody can’t arbitrarily end your comment block (with a */) and inject their own nonsense into your query. Also, considering issues like multi-byte character attacks, I don’t even want to take the risk of not further escaping the data with a call to mysql_real_escape_string.

The solution we use at my work logs the web server IP, database server IP, and script path/URI. Other potential ideas are local timestamps, version information, user IDs, and session IDs.
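Here is a minimal sketch of how several pieces of metadata could be folded into one comment. The buildQueryTag() helper, the separator, and the sample values are all hypothetical; it only covers building the comment text, so you would still escape it and run the query as in the snippet above.

```php
<?php
// Hypothetical helper: builds the leading comment for a query from a list of
// metadata values (script path, database host, web server IP, and so on).
function buildQueryTag(array $meta)
{
    $text = implode(' | ', $meta);
    // Drop control characters so the tag always stays on one line.
    $text = preg_replace('/[\x00-\x1F\x7F]/', '', $text);
    // Break up comment delimiters so the metadata cannot close the comment early.
    $text = str_replace(array('/*', '*/'), array('/ *', '* /'), $text);
    return '/* ' . $text . ' */ ';
}

// Sample values, including a URI that tries to end the comment block.
$tag = buildQueryTag(array(
    '/report.php?x=1*/ DROP TABLE users; /*',
    'db1.example.internal',
    '10.0.0.5',
));
echo $tag . "SELECT 1\n";
```

Keeping the tag construction in one function means every query path gets the same sanitizing, rather than each call site remembering to do it.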

In conclusion, this solution will help you identify the source (and sometimes the destination) of queries that are causing problems. This has been used in our production environment at work often when trying to determine what pages are producing extremely slow queries. This solution should work with any database, although my example is written for MySQL.

Happy debugging!

PHP Best Practice: Don’t use INC extensions

I have been bad about updating, and this goes back to an old habit that probably has to do with human nature: as time between updates increases, there’s a desire to write a “big” update, which is increasingly difficult as news-worthy events happen and are ignored. There are so many things I could touch on, such as the iPhone SDK update, news on IE8 passing the Acid2 test, my predictions from a year ago that were spot on (until about a week ago when all stocks tanked), and Sun buying MySQL. But I won’t. Perhaps next time. So this post is small, but serves as a feeler post to help me get back into the routine. The truth is that I have several programming post drafts sitting on my machine that could have been posted a long time ago if I had given them a final read-over. Those things take a lot longer than they look to the casual observer.

Today’s post is a best practices post. The tip is simple: When creating a naming convention, never rely on the .inc extension. The .inc is used in some shops to denote files that serve as libraries. This is a terrible practice for a number of reasons.

First, it means deploying your library ANYWHERE requires adding the extension to your server’s configuration so that it knows to run these files through PHP. This isn’t a deal breaker in most cases, but beware that in shared hosting environments, this sort of thing can be annoying and stall development.

The second, far more important reason is security. When these library files are moved to a new server that has yet to be configured, they are wide open for public viewing. Because the server doesn’t know they are PHP files, they are served up as text files, essentially exposing your code base for the world to see. I’ve seen this issue pop up in production environments where a new web server was brought online without being fully configured, causing pages to become exposed. This is the sort of business that helps cause source code leaks (remember the Facebook code leak late last year?).

Of course, this points to the greater issue that library files shouldn’t be web accessible, but I have also seen this paradigm used in common CMS applications where you have a .php file include a .inc file that contains the bulk of the page logic. Here, again, you would be exposing highly sensitive application logic to the world.

If you really want to denote files differently, I prefer to use file prefixes. As in, classes might get a prefix like “class.[rest-of-filename]”. Or perhaps “function.[rest-of-filename]”. There’s even “include.[rest-of-filename]”. The point is, a prefix can’t kill you because the files retain the .php extension. :) Happy coding!

BUG: Constructors, Interfaces, and Abstracts Don’t Mix Well

I just discovered a bug today in PHP 5.1 (haven’t confirmed if it was fixed in newer versions). When trying to enforce interface arguments on constructors, PHP behaves unexpectedly. Normally, interfaces allow you to enforce argument counts or types in child class methods, but not with the constructor (and probably destructor).

Crash course on interfaces: An interface lets you as a developer dictate a standard for a class. For example, you might write an interface for interacting with your class. Then other people who want to interact with your class would “implement” your interface. This would force their classes to have a certain set of methods, of which you dictate the names and argument counts (and types). This way, your class is always guaranteed that these implementer classes have certain key methods. As a real-life example, it’s like saying an interface for a Car would have methods like brake($amount), gas($amount), steer($direction), etc., and the User class would have a guaranteed way of interacting with the Car object (i.e., $user->getCar('Ferrari')->steer('left')). Abstract methods exist in abstract classes and are essentially the same thing. The PHP manual covers both topics in more detail.

First, here is an example of a typical interface:

class ExampleClass {}

interface TestInterface {
	public function output(ExampleClass $var);
}

class Test implements TestInterface {
	// error, no output() method was defined
}

The following fails too:

class ExampleClass {}

interface TestInterface {
	public function output(ExampleClass $var);
}

class Test implements TestInterface {
	public function output($var) {} // error, wrong argument type
}

Here is the same example but with the __construct method instead:

class ExampleClass {}

interface TestInterface {
	public function __construct(ExampleClass $var);
}

class Test implements TestInterface {
	// error, no __construct() method was defined
}

Up to here, it works as expected. However, if you define the constructor, the __construct method argument datatype/count checks go out the window:

class ExampleClass {}

interface TestInterface {
	public function __construct(ExampleClass $var);
}

class Test implements TestInterface {
	public function __construct() {} // NO ERROR
}

Despite the data types and argument count being off, PHP doesn’t care. Even if I define an argument in the constructor, the datatype check is ignored. So the best you can do is force a __construct() definition to be required, but you can’t dictate its arguments (i.e., interfaces for constructor methods are useless). And finally, for those of you really astute readers:

class ExampleClass {}

abstract class AbstractTest {
	abstract public function __construct(ExampleClass $var);
}

class Test extends AbstractTest {
	public function __construct() {} // NO ERROR
}

This problem produces the SAME results if instead of an interface, abstract methods in an abstract parent class are used.

Improving Your JavaScript Load Time

On our production website at work, I noticed that there was considerable lag time when loading the page due to a high number of JavaScript files. For those of you who don’t know, when a JavaScript file is loaded into a page, the rest of the page will hang until that file is completely downloaded. So unlike an image on a page, a slow JavaScript file can completely bog down your page. This is similar to an issue I noticed many months ago with FeedBurner. Each JavaScript file that is pulled requires the full overhead of firing up Apache and serving an HTTP request. This can be slower for you and painful for the web server if you have a high traffic website.

Additionally, once the JavaScript files load, if the code is full of asynchronous snippets (AJAX, event handlers, etc), the pieces can load in the wrong order! This has caused me headaches when unexplainable and random JavaScript errors began popping up (undefined variables and functions that are clearly defined in another file that should have loaded prior). This issue became increasingly common as the number of files being loaded increased. While I admit I don’t understand browser physiology enough to explain why this problem is more common with more files, I concluded there is some correlation that likely is attributed to the rendering order.

So after some thought, I came up with a solution. The goal was simple: decrease the number of web calls and try to make the JavaScript code render in a 100% reliable and linear manner. Additionally, the hack would need to be easy to implement and take issues such as caching into account. The solution is a PHP file that looks like this:

/*
 * This file compiles a collection of JS files and then dumps them collectively
 * to the page, thereby reducing overall request overhead
 * Copyright 2007 Michi Kono (www.michikono.com)
 * Feel free to modify this however you want.
 */
header("content-type: application/x-javascript");
$files = isset($_GET["files"]) ? $_GET["files"] : "";
foreach(explode(",", $files) as $filename) {
   /*
    * prevent malicious attacks, only allow JS files from this folder
    */
   $filename = basename(trim($filename), ".js") . ".js";
   if($filename != ".js" && is_file($filename)) {
       /*
        * readfile() streams the file straight to the output
        */
       readfile($filename);
       /*
        * in case there is no trailing ; (the newline keeps a trailing
        * single-line comment from swallowing the next file)
        */
       echo "\n;\n";
   }
}

Put this file in your JavaScript folder next to the rest of your JavaScript files. I called mine render.php.

Then, you put this in your HTML:

<script type="text/javascript" src="/javascript/render.php?files=firstfile.js, secondfile.js, thirdfile.js, etc.js"></script>

Ta-da! Faster JavaScript loading for everybody. :) Oh, and the issue of caching? Just put a timestamp on the end of the JavaScript URL string (like, "&<?php echo substr(time(), 0, -2) ?>" — cache changes every 100 seconds)
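To see why that substr() trick changes only every 100 seconds, here is a tiny sketch with fixed timestamps (the cacheBuster() wrapper and the sample timestamps are mine, just for illustration):

```php
<?php
// Dropping the last two digits of a Unix timestamp gives a value that only
// changes once every 100 seconds.
function cacheBuster($timestamp)
{
    return substr((string) $timestamp, 0, -2);
}

echo cacheBuster(1200000000) . "\n"; // 12000000
echo cacheBuster(1200000099) . "\n"; // 12000000
echo cacheBuster(1200000100) . "\n"; // 12000001
```

Dropping three digits instead of two would stretch the cache window to 1000 seconds, and so on.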

Other useful ideas: JS code could be compressed during this step, with comments and extra spaces being removed. Because code is being run through PHP, server side macros are now possible, rather than relying on cryptic JavaScript functions (such as for date management or database integration).
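One more note on the "only allow JS files" line in render.php: the sketch below pulls that same basename() expression into a hypothetical sanitizeJsName() function so you can see what it does to a few hostile inputs.

```php
<?php
// Same expression render.php uses: keep only the file name part and force a
// .js suffix, so requests cannot reach other folders or other file types.
function sanitizeJsName($requested)
{
    return basename(trim($requested), '.js') . '.js';
}

echo sanitizeJsName(' secondfile.js') . "\n";   // secondfile.js
echo sanitizeJsName('../../etc/passwd') . "\n"; // passwd.js
echo sanitizeJsName('../config.php') . "\n";    // config.php.js
```

Directory parts are thrown away and everything ends in .js, so the worst a crafted request can do is ask for a JavaScript file that does not exist in that folder.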

The Secret of SQL_CALC_FOUND_ROWS

Today, I wanted to go over a relatively simple MySQL feature that a lot of people don’t understand: SQL_CALC_FOUND_ROWS. To use this mystical keyword, simply put it in your query right after the SELECT keyword. For example:

SELECT * FROM USER WHERE id > 10 LIMIT 1,1 -- skip one row, return one: just the second record

Becomes

SELECT SQL_CALC_FOUND_ROWS * FROM USER WHERE id > 10 LIMIT 1,1

This won’t change your results. It may, however, make your query run slower than when you select just one row the regular way. What this statement does is tell MySQL to find out just how many total records exist that match your criteria (in this case, where id is bigger than 10). For example, let’s assume that the user table has 100 records that have an id bigger than 10, then the query will take as long as it would have taken for the engine to find those 100 records.

The returned result will still be the record you are expecting (in this case, the second record it found). But here is where the magic starts: If the very next query you run is a special select statement, you will have access to the total that was found. As in:

SELECT FOUND_ROWS(); -- returns 100

The MySQL documentation on this subject says:

[SELECT FOUND_ROWS()] returns a number indicating how many rows the first SELECT would have returned had it been written without the LIMIT clause. In the absence of the SQL_CALC_FOUND_ROWS option in the most recent SELECT statement, FOUND_ROWS() returns the number of rows in the result set returned by that statement.

No matter what your LIMIT clause looks like (such as LIMIT 10, 1), this second query will still return the same number (in this example, 100). Why is this useful? Pagination. Often times, beginners (including me a few years ago) are stuck doing something like this:

SELECT count(*) FROM USER WHERE id > 10; -- figure out how many total records there are
SELECT * FROM USER WHERE id > 10 LIMIT 49, 1; -- skip 49 rows to get record #50

People do this because you need the total to know if other matching results exist or what the last page number is.

This requires the engine to run the same query twice. This can be disastrous in cases where that query already takes a very long time to run. By including SQL_CALC_FOUND_ROWS, the overhead of running that count is grouped up with the process of actually retrieving the row of interest. So while the initial query might take a little longer to run than if you hadn’t tried to do a count, it is definitely faster than running the same query twice.

To take this to the next level, your pagination code should omit the use of SQL_CALC_FOUND_ROWS in subsequent page loads by caching the total count in the URL or session.
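Once you have the total (from FOUND_ROWS() or from your cached copy), the rest of pagination is plain arithmetic. Here is a minimal sketch; the pageInfo() helper and its array keys are hypothetical names of my own.

```php
<?php
// Hypothetical helper: turns a total row count into the values a pager needs.
function pageInfo($totalRows, $perPage, $page)
{
    $lastPage = max(1, (int) ceil($totalRows / $perPage));
    $page = min(max(1, $page), $lastPage); // clamp to a valid page number
    return array(
        'page'     => $page,
        'lastPage' => $lastPage,
        'offset'   => ($page - 1) * $perPage, // first argument of LIMIT offset, count
        'hasNext'  => $page < $lastPage,
    );
}

$info = pageInfo(100, 10, 3);
echo "LIMIT {$info['offset']}, 10\n";            // LIMIT 20, 10
echo "Page {$info['page']} of {$info['lastPage']}\n"; // Page 3 of 10
```

Because the offset and last page depend only on the total and the page size, a cached total is all that later page loads need.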

Happy hunting!

A Great Web Developer is a Great Application Developer

After being a part of the developer hiring process for a while now, I have begun to see what distinguishes an exceptional web developer (the ones that get hired faster than we can even make an offer) from everybody else. Things have changed since 1995 – web development requires actual knowledge of programming!

The problem is that people think web development is somehow easier than other types of development. Nothing is inherently easier about developing an application in PHP than in C++. While I could accuse people who say this of being naive, the truth is that web developers themselves seem to hold this as the truth. In order to excel at being a web developer, one must understand how to develop an application. In other words, there are very different skills required in programming something like Facebook or Digg than for updating Mom’s store website.

First, you must actually understand object oriented programming (OOP). Everybody these days says they “know” OOP. I don’t mean “you read about it in a book” or that you can use a class when necessary; I mean being able to write extensible, modular libraries. This takes time and practice. When people first begin writing classes, they mistake the process for “grouping up functions under one class name in one file.” This is dead wrong and actually counterproductive in some cases.

Classes are about automation and simplification. The classic analogy for OOP is a car because it has many parts that make it a whole. But just like the driver has no idea what a crank shaft, piston, or fuel injector does, people using your class should have no idea (nor need to learn why) about the inner workings of your class. All they need to know is that calling one of the few public methods (turn_ignition(), step_on_pedal(), steer()) does a whole lot of cool stuff behind the scenes that makes their life easier. And just like how you wouldn’t want the entire car engine welded together (instead of using bolts and screws), you shouldn’t write humongous class methods that do 1000 things — break it up into digestible parts so that other people can enhance just the pieces they want (see “overloading” below).

When you design a class, you start by listing out things that the class will be used for. Then, you figure out the lowest common denominators between these actions to remove “special, one time cases.” You now have a list of your primary public functions. Everything else in your class should pretty much remain protected or private (discretion is learned with experience). The entire point is that classes are not a group of similar functions, but rather a single conceptual unit.

And when you’re all done with understanding why object oriented programming exists, you must then learn to use Inheritance and Overloading.

I would link to a Wikipedia article on overloading, but the topic is poorly explained. [EDIT: This section applies more to PHP] In short, overloading as I use the term here (strictly speaking, method overriding, which is one form of polymorphism) is the practice of “overwriting” a function of a parent class in a child class. Let’s look at a metaphor where Inheritance and Overloading are used:

A Mammal would be a parent level concept (class). All Mammals sleep() and eat(). A Cow is a Mammal, so it inherits (extends) the Mammal concept (class) and has all attributes of a Mammal, including sleep() and eat(). But its eat() function is different because eat() does what all other mammals do, but also calls regurgitate(). A Bat is a Mammal, but its sleep() requires hanging upside down in addition to doing whatever all other Mammals do (such as recovering health). Overloading allows us to change a behavior (function) to suit the child’s specific behavior that is different. It is easiest to think of it as “overwriting.”
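In PHP, the metaphor could be sketched like this. The step names returned by each method are made up for the example; the point is the parent:: call, which lets the child reuse what every Mammal does and add its own twist.

```php
<?php
class Mammal
{
    public function eat()
    {
        return array('chew', 'swallow');
    }

    public function sleep()
    {
        return array('recover health');
    }
}

class Cow extends Mammal
{
    // Does what every Mammal does, then adds its own step.
    public function eat()
    {
        $steps = parent::eat();
        $steps[] = 'regurgitate';
        return $steps;
    }
}

class Bat extends Mammal
{
    // Hangs upside down first, then sleeps like any other Mammal.
    public function sleep()
    {
        return array_merge(array('hang upside down'), parent::sleep());
    }
}

echo implode(', ', (new Cow())->eat()) . "\n";   // chew, swallow, regurgitate
echo implode(', ', (new Bat())->sleep()) . "\n"; // hang upside down, recover health
echo implode(', ', (new Bat())->eat()) . "\n";   // chew, swallow
```

Note that Bat never mentions eat() at all; it simply inherits the Mammal version unchanged.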

Second, you must understand and demonstrate competency with design patterns. At the very least, you should be aware of the Singleton and Factory patterns, which are very commonly used, and for (arguably) good reasons. Knowledge of basic design patterns goes a long way because a time will come when you may need to borrow from one of these concepts when developing your web application. It also means you have a wider exposure to the different types of architectures available to someone writing a library or application.

Last, you must understand scaling issues in growing web applications and how to resolve them. This is probably the rarest trait, but if you have it, you will shine like a rock star. It is important to realize that certain operations are much more costly than others. In today’s world, the CPU is rarely the bottleneck, but there are still bottlenecks. A common instance of this problem is when WordPress blogs go offline because they are slammed by a spike in traffic. That said, the most common type of bottleneck involves the database and often manifests in one of three ways:

  • Each query sent to the database involves some small amount of lag between the web server and the database server. Hundreds of queries on a page will slow down your page loads. On high traffic pages, one fewer query per page means thousands fewer queries per second. Make sure you learn how to do JOIN statements.
  • Each query should use an index in the database. Without an index, a query looking at a large table will bottleneck the hard drive (because it has to search large portions of the disk). Learn how to use the MySQL command EXPLAIN.
  • Database tables using MyISAM (instead of InnoDB) risk facing table locking issues. When you run a big or complex update on MyISAM, no other updates can occur on that table until the first one is done. If that table is very large, this may mean someone’s page sits there and loads for a long time (or times out). Typically, one should use a transactional storage engine such as InnoDB.

Other types of common scaling issues are:

  • Session management: sessions have to live in the database (or another shared store) instead of letting PHP handle them on local disk. This is an issue because any large web application has more than one web server, which means a person might be hitting a different web server on each page load.
  • Managing files generated on the server – such as when a user uploads a file. The issue here is that if a file is created on one server (in a cluster of 100), how do the other servers know it exists? Obviously the solution is to somehow centralize this with a synchronization process or store all of the files on the database using blob fields. :)
  • How queries can start acting different (much slower) when your tables reach 1M+ rows without proper precautions. Larger tables change things because indexes sometimes stop getting used or slowness that wasn’t apparent before is now very noticeable. (A book could be written here…)

Merely reading and understanding these issues has put you ahead of a bunch of people out there. :)

In conclusion, you must re-think the “web developer” as an application developer that happens to work in a browser. I often hear the excuse that these types of advanced concepts can’t be readily applied in small-time applications (such as a simple store-front), but I believe this is simply not true. Virtually every site requires a database connection or a template system (for easy inclusion of header/footer files). Database abstraction and template systems are probably the single best place to use OOP concepts. It simply makes no sense that one would use the raw mysql_query() function in any website, even if it’s just Mom’s baby pictures. Putting these ideas in classes reduces security vulnerabilities and makes development much faster. If the notion of abstracting templates or databases is totally foreign to you, I suggest you start by looking at Smarty or EzSQL.

Happy studying. :)

PHP/MySQL: The Escape Method Done Right

The issue is that PHP has several built-in functions for escaping data, and not all of them are up to the job. No, addslashes() is insufficient to protect you from SQL injection attacks (read: these get you fired). Here’s the solution for an escape function that does everything you could hope for. The @ symbols suppress PHP warnings so that I can use the failed call’s return value to my advantage (newbies, please don’t try it at home). This goes inside a Database class.

/**
 * Escapes the passed value so it is ready to be inserted into the database. Takes magic quotes into
 * consideration as well.
 *
 * @param    string    parameter
 * @return    string    escaped parameter
 */
public function escape($value) {
    /*
     * stripslashes only if necessary
     */
    if (get_magic_quotes_gpc()) {
        $value = stripslashes($value);
    }
    /*
     * if this fails ($newValue is false), we know we need to fall back on the PHP4 way
     */
    $newValue = @mysql_real_escape_string($value);
    /*
     * if no connection handler can be found use this instead
     */
    if(FALSE === $newValue) {
        $newValue = @mysql_escape_string($value);
    }
    return $newValue;
}

Feel free to post suggestions.

Five Tips for (total) PHP Beginners

Someone asked me to write this a long time ago, so here’s my list. I make the basic assumption that you at least know some SQL.

1. Use EzSQL.

It is an open source library that does database abstraction. “Database Abstraction” is a fancy term for “making the database more developer-friendly.” It will help you grasp concepts like rows vs columns, how to manage multiple results, and escaping data. See their examples. While it is (in my opinion) a lacking solution for experts, it is awesome for newbies. I used this way back when I first started. Use it if you’re new to PHP.

2. DON’T Rely on Magic Quotes.

Magic Quotes is a misguided feature that tries to sanitize your data in an automated fashion. “Sanitize” is a programming term for “making data safe.” As in, without sanitizing data, hackers can do mean things to your database.

Some may argue turning this off is *unsafe* for beginners, but it also trains beginners to be less cautious about sanitizing data. This is a problem since the majority of corporate PHP servers (as in, the real world) have Magic Quotes off.  For example, let’s assume $name is equal to michi. Some programmers might try:

DELETE FROM users WHERE name='$name'; 
-- thus: DELETE FROM users WHERE name='michi'

But what if a hacker managed to make $name equal to michi' OR '1'='1? What does the query end up looking like? If Magic Quotes is relied on, you would have been protected:

DELETE FROM users WHERE name='michi\' OR \'1\'=\'1'; 
-- result: no such user; no harm, no foul

So it tries to find a user called michi' OR '1'='1, which is clearly not going to be found. In this case, Magic Quotes looks like a good thing. However, if you got used to Magic Quotes and then got a job, you might forget that it isn’t enabled in the Real World, and this is what happens:

DELETE FROM users WHERE name='michi' OR '1'='1'; 
-- (hint: it deletes all your users!)

Fired!

Just pretend like Magic Quotes is always off and use EzSQL’s escape() method or my solution (both work regardless of whether Magic Quotes is on or off).
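If you want to see the two outcomes side by side without touching a database, here is a small string-only sketch. The attacker input is a sample value I chose, and addslashes() appears only to imitate what Magic Quotes did to incoming data; it is not a recommendation for real escaping.

```php
<?php
// Hypothetical input: what an attacker might submit as the "name" field.
$name = "michi' OR '1'='1";

// Without any escaping, the quote in the input ends the string literal early.
$unsafe = "DELETE FROM users WHERE name='$name'";
echo $unsafe . "\n";
// DELETE FROM users WHERE name='michi' OR '1'='1'

// addslashes() is used here only to imitate what Magic Quotes did.
$quoted = addslashes($name);
$withMagicQuotes = "DELETE FROM users WHERE name='$quoted'";
echo $withMagicQuotes . "\n";
// DELETE FROM users WHERE name='michi\' OR \'1\'=\'1'
```

The first query matches every row because '1'='1' is always true; the second looks for one oddly named user and finds nothing.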

3. Always use require_once.

You will eventually see when include, require, or include_once is the better choice over require_once, but that time will come much later. As a rule of thumb for beginners, the last thing you want is a silent “oops we couldn’t find the file so we’ll just continue on as if nothing happened…” bug.

4. Don’t display (echo) errors when a function has an error. 

Instead, store the errors in a class/global variable or return the error (neither of these is a good practice in the long run; once you are past the novice stage, you should be throwing exceptions). The reason is that you never know in what context a function gets called; displaying errors could be disastrous in some situations. For example, if you were in the middle of writing a CSV file (comma delimited data file), you don’t want random unformatted error text appearing inside it saying “Error, invalid name!”. As in (notice the error text that got inserted, bringing an extra comma with it):

Charlie, Abigail, 31, California
Bob, Smith, 29, New York
Patrick, O'NeilError, invalid name!, 22, Washington
#check it out, a random extra comma got tossed in

Even if the script terminated at that point (which I assumed), the file may have been written to. Frankly, as a beginner, neither you nor I can expect you to remember to *always* check if there was an error, so it’s better to be defensive and just not randomly display stuff (which is why throwing exceptions is the ideal solution).
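For when you do get to exceptions, here is a minimal sketch of the same CSV scenario. The validateName() function, its rule for what counts as a valid name, and the sample rows are all hypothetical; what matters is that the error travels to the caller instead of into the output.

```php
<?php
// Hypothetical validator: reports a problem by throwing instead of echoing.
function validateName($name)
{
    if (!preg_match("/^[A-Za-z' -]+$/", $name)) {
        throw new InvalidArgumentException("Invalid name: $name");
    }
    return $name;
}

$rows = array(
    array('Charlie', 'Abigail', 31, 'California'),
    array('Patrick', "O'Neil", 22, 'Washington'),
    array('Bob', 'Sm1th', 29, 'New York'),
);

$lines = array();
$errors = array();
foreach ($rows as $row) {
    try {
        validateName($row[0]);
        validateName($row[1]);
        $lines[] = implode(', ', $row);
    } catch (InvalidArgumentException $e) {
        // The caller decides what to do; nothing leaks into the CSV output.
        $errors[] = $e->getMessage();
    }
}

echo implode("\n", $lines) . "\n";
```

The bad row is skipped and its message ends up in $errors, where you can log it or show it on a proper error page.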

For those of you wondering why an echo statement might modify a file, note that for things like downloads in a browser, you set the headers to reflect what the browser should do with the content (i.e., download it) and then output the file itself. Thus, in this context, yes, you would echo out the entire file, whether it’s XML, a Word document, a zip, a PDF, or HTML, and then the user would be prompted to save it.

5. Use ob_start().

Put ob_start() at the beginning of your script before everything else.

<?php
ob_start();
// the rest of your script
?>

This will save you lots of time when you get advanced enough to do redirection or understand what it does. The syntax for browser redirection is:

// this is a side note on redirection; don’t put this in your script :P
header("location: another_file.php");
die;
// always call die directly after a redirect!

Redirection means that the page should stop (if the redirection was successful) and the user never sees any output from that page. They are then immediately redirected to another page. A redirect is sent as an HTTP header, and headers can no longer be sent once any output has gone to the browser, so showing output before a redirection ruins the redirect. This is often very confusing and difficult to fix for a beginner. ob_start() fixes that issue by holding output back until the script finishes (it’s magical like that).

Note that if ob_start() is used, you don’t need to do anything else special. There is no need to use the other functions like ob_flush(), ob_end_clean(), etc. You’ll get to those later (probably a few months) once you fully understand what ob_start() does.
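If you are curious anyway, this small sketch shows what the buffer is doing. It uses ob_get_clean() purely to peek at the buffered text for demonstration; in a normal page you would not need that call.

```php
<?php
ob_start();                 // from here on, output is held in a buffer
echo "Some early output\n"; // not sent to the browser yet

// Because nothing has actually been sent, headers can still be changed here.
$canStillRedirect = !headers_sent();

// For this demonstration only: grab what was buffered and discard the buffer.
$buffered = ob_get_clean();
```

In a real page the buffer is simply flushed to the browser when the script ends, or thrown away with the rest of the page when you redirect and die.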

Max File Size Causes *Silent* PHP Errors

Today, at work, I hit the strangest PHP error. I thought I’d share. 

I wrote a logging script that tracks whenever something goes wrong. First, it tries to write everything to a raw file and then to the database. The thinking was that if something bad happened to the database, at least I would have a log entry in a regular file. Unfortunately, I didn’t take into consideration what happens when something bad happens to the file, which turned out to be far worse.

We noticed something was wrong when no errors were being logged, and yet the script was clearly failing to do its job. There were no PHP errors, no database records logged, and nothing changed in the log file. After careful scrutiny, I confirmed there was indeed an error happening, and it had to be logging. So I looked at the log file, and then I noticed this:

2147483647 ClientException.1.log

That huge number is the number of bytes in that file. 2,147,483,647 bytes (2^31 - 1, the largest signed 32-bit integer) = 2 GIGABYTES. The file was too big for PHP to want to open. I ran the PHP script by hand, and it simply printed the following message where I called fopen():

File size limit exceeded

When this error was encountered, PHP died with no warnings, errors, or notices. There was nothing in the PHP logs to show something went wrong — that was some kind of operating system level error message. The code — no, PHP — simply halted! No destructors, no cleanup – nothing. Even the output buffer was destroyed, which means if there was ob_start() anywhere in the code above, all of the previous output (echo) was lost.

Scary.

So next time your script is dying without an explanation, make sure you check how big the logs are.
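One defensive option is to check the size before every write and move the file aside once it gets too big. This is only a sketch under my own assumptions: the appendToLog() helper, the ".old" suffix, and the tiny 40-byte limit are hypothetical and chosen so the example is easy to follow.

```php
<?php
// Hypothetical helper: appends to a log, but first moves the file aside once it
// reaches $maxBytes so it never grows toward the file size limit.
function appendToLog($path, $line, $maxBytes)
{
    clearstatcache(); // filesize() results are cached within a request
    if (is_file($path) && filesize($path) >= $maxBytes) {
        rename($path, $path . '.old'); // assumption: one rotated copy is enough
    }
    return file_put_contents($path, $line . "\n", FILE_APPEND | LOCK_EX) !== false;
}

$log = tempnam(sys_get_temp_dir(), 'log');
appendToLog($log, str_repeat('x', 50), 40); // file is empty, so this is just appended
appendToLog($log, 'second entry', 40);      // file is now 51 bytes, so it is rotated first
echo file_get_contents($log);               // second entry
```

In production you would set the limit far below 2 GB and check the return value, so that a failed log write becomes something you can notice rather than a silent halt.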