Archive for 26th March 2007

How I Integrated Dugg Mirror When I Got Dugg

My regular readers probably didn’t realize, but I was recently on the front page of Digg, Slashdot, and Reddit. Digg and Slashdot are notorious for killing servers that get linked in their stories. If you want to see the stats for the days in question, you should see my post about it. This post is a little geeky, so beware.

Here’s the quick and dirty list for keeping your server up during a Digg Crisis:

  1. Download and enable wp-cache.
  2. Use the .htaccess rule I explain below. It keeps the rest of your site operational and even lets people post comments directly from the mirror!
  3. Turn off miscellaneous plugins. The biggest suspect was my related posts plugin (see below).

CPU Load

When I got on the front page of Reddit, my traffic spiked immediately. My server load went up to about 2.00. A “2.00” is high, but it is manageable and won’t cause any real problems. On a single-CPU machine, a load of “1.00” roughly equates to “100% of CPU used,” so a “2.00” means about twice as much work is waiting as the CPU can handle. Realizing Digg was coming next, I knew I had to start making preparations, since Digg sends far more traffic than Reddit does.
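Load figures like these are Unix load averages, roughly the number of processes running or waiting for a CPU. Because the “1.00 = 100%” rule of thumb only holds for a single CPU, a fairer reading divides by the core count. Here is a minimal Python sketch, assuming a Unix-like host where `os.getloadavg()` is available; the function names and the 0.7 “busy” threshold are my own illustrative choices, not a standard.

```python
import os

def load_per_cpu(load: float, cpus: int) -> float:
    """Normalize a load average by CPU count; about 1.0 means every CPU is busy."""
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return load / cpus

def describe(load: float, cpus: int) -> str:
    """Rule-of-thumb reading of a load average (the 0.7 cut-off is an arbitrary choice)."""
    ratio = load_per_cpu(load, cpus)
    if ratio < 0.7:
        return "comfortable"
    if ratio <= 1.0:
        return "busy"
    return f"overloaded ({ratio:.1f} processes per CPU)"

if __name__ == "__main__" and hasattr(os, "getloadavg"):
    # os.getloadavg() (Unix only) returns the 1-, 5- and 15-minute load averages.
    one_minute, _, _ = os.getloadavg()
    cpus = os.cpu_count() or 1
    print(f"load {one_minute:.2f} on {cpus} CPU(s): {describe(one_minute, cpus)}")
```

By this reading, a 2.00 is alarming on one core but comfortable on four, and a load past 500 is hopeless on any server of the era.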

I expected my hits to increase 100x, so I had to shut down as much unnecessary stuff as possible. First, I wanted to turn off every plugin I had on WordPress and then activate wp-cache, which would significantly reduce database overhead. Unfortunately, before I finished configuring the cache settings, I hit the front page of Digg in record time. The server became unresponsive and I couldn’t even hit the “enable” button. My server load skyrocketed past 500.00. At 500, things stop working.

Integrating Dugg Mirror

Dugg Mirror is a service that creates copies of articles that hit the main page of Digg. Its goal is to serve as a backup in case the original site goes down (as it often does). As soon as my server was Dugg, my objective was to forward all traffic to the mirror.

When my server died, I was racing against time to redirect the traffic, because until I did, I couldn’t do anything else to mitigate the problem (such as disabling plugins). It took me about five minutes (due to incredible lag) to connect to the server, go to the correct directory, and add these lines at the very top of the .htaccess file:

RewriteCond %{HTTP_REFERER} digg\.com [NC]
RewriteRule maybe-google-wanted-to-be-sued-youtube-and-plan-b http://duggmirror.com/tech_news/Why_Google_wanted_YouTube_to_be_Sued/ [R,L,NC]

The first line says, “If this visitor’s referrer contains digg.com.” The second line says, “then redirect requests for my Google article to Dugg Mirror.” Together, they sent everybody who came from Digg to read my article to the mirror instead. Note that the rest of my site worked completely fine, and anybody trying to post a comment directly from the mirror was able to do so.
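To make the two directives concrete, here is a rough Python model of the decision they encode. It is not Apache itself: it assumes the .htaccess sits in the document root (so mod_rewrite matches the path without its leading slash) and models the [NC] flag with `re.IGNORECASE`. The function name `mirror_redirect` is hypothetical.

```python
import re
from typing import Optional

# Patterns copied from the .htaccess rule above; re.IGNORECASE stands in for [NC].
RULE_PATTERN = re.compile(r"maybe-google-wanted-to-be-sued-youtube-and-plan-b", re.IGNORECASE)
REFERER_PATTERN = re.compile(r"digg\.com", re.IGNORECASE)
MIRROR_URL = "http://duggmirror.com/tech_news/Why_Google_wanted_YouTube_to_be_Sued/"

def mirror_redirect(path: str, referer: Optional[str]) -> Optional[str]:
    """Return the redirect target for a request, or None to serve it normally."""
    # In a document-root .htaccess, mod_rewrite matches the path without its leading slash.
    relative = path.lstrip("/")
    # mod_rewrite tests the RewriteRule pattern first, then the RewriteCond above it.
    if not RULE_PATTERN.search(relative):
        return None  # any other page on the site is served as usual
    if not REFERER_PATTERN.search(referer or ""):
        return None  # not a Digg visitor (or no Referer header at all)
    return MIRROR_URL  # [R] issues an external redirect; [L] stops processing further rules

if __name__ == "__main__":
    article = "/maybe-google-wanted-to-be-sued-youtube-and-plan-b/"
    print(mirror_redirect(article, "http://digg.com/tech_news/"))
    print(mirror_redirect(article, "http://www.google.com/"))
    print(mirror_redirect("/about/", "http://digg.com/"))
```

Only a request that passes both tests is redirected. Everything else falls through to WordPress as usual, which is why the rest of the site, including comment posting, kept working.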

Upon applying this fix, my server load dropped to 4.00.

To do the same for your own Dugg articles, use the following template:

RewriteCond %{HTTP_REFERER} digg\.com [NC]
RewriteRule [article URL without leading/trailing slash or domain] [Dugg Mirror URL] [R,L,NC]
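If you are in a hurry, a small script can fill in the template from the article’s full URL. This is a hedged sketch: the function names are hypothetical, `example.com` is a placeholder for your own domain, and it assumes the .htaccess lives in the document root. It escapes regex metacharacters because RewriteCond and RewriteRule treat their patterns as regular expressions, so an unescaped “.” would match any character.

```python
from urllib.parse import urlparse

# Characters that carry special meaning in a RewriteRule/RewriteCond regular expression.
REGEX_SPECIAL = set(".^$*+?()[]{}|\\")

def escape_pattern(text: str) -> str:
    """Backslash-escape regex metacharacters so the text is matched literally."""
    return "".join("\\" + ch if ch in REGEX_SPECIAL else ch for ch in text)

def mirror_rules(article_url: str, mirror_url: str, referer_host: str = "digg.com") -> str:
    """Fill in the two-line template for one Dugg article."""
    # Drop the domain and any leading/trailing slashes, as the template asks.
    path = urlparse(article_url).path.strip("/")
    if not path:
        raise ValueError("article URL has no path to match")
    return (
        f"RewriteCond %{{HTTP_REFERER}} {escape_pattern(referer_host)} [NC]\n"
        f"RewriteRule {escape_pattern(path)} {mirror_url} [R,L,NC]"
    )

if __name__ == "__main__":
    # example.com is a placeholder for your own domain.
    print(mirror_rules(
        "http://example.com/maybe-google-wanted-to-be-sued-youtube-and-plan-b/",
        "http://duggmirror.com/tech_news/Why_Google_wanted_YouTube_to_be_Sued/",
    ))
```

Paste the two printed lines at the top of your .htaccess, above WordPress’s own rewrite block.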

The Finishing Touches

I made sure my most CPU- and MySQL-intensive plugins were skipped whenever a Digg user arrived, since they were the ones causing the problems. I wrapped my related posts plugin in a snippet of code like this:

<?php $referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
if (FALSE === stristr($referer, 'digg.com')) { // not from Digg: run the plugin
/* Intensive plugin-code */
} ?>

This snippet basically says, “If this visitor is from Digg, skip the burdensome plugins.” The thinking is that a visitor coming from Digg means I am likely Dugg, so even if I don’t see it coming, I will at least mitigate some of the problems.
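The same guard carries over to other stacks. Below is a rough Python analogue written as a decorator that returns a cheap fallback instead of running an expensive function when the referrer mentions Digg. The names `skip_for_referers` and `related_posts` are hypothetical, and the referrer string would come from whatever request object your framework provides.

```python
import functools
from typing import Callable, Iterable, Optional

def skip_for_referers(hosts: Iterable[str], fallback: str = ""):
    """Decorator: return `fallback` instead of running the wrapped function
    when the referer mentions any of `hosts` (case-insensitive, like stristr)."""
    needles = [h.lower() for h in hosts]

    def decorator(func: Callable[..., str]) -> Callable[..., str]:
        @functools.wraps(func)
        def wrapper(referer: Optional[str], *args, **kwargs) -> str:
            ref = (referer or "").lower()  # a missing Referer header counts as empty
            if any(needle in ref for needle in needles):
                return fallback  # probably Dugg: skip the expensive work
            return func(referer, *args, **kwargs)
        return wrapper
    return decorator

@skip_for_referers(["digg.com"])
def related_posts(referer: Optional[str], post_id: int) -> str:
    # Hypothetical stand-in for a plugin that runs several heavy SQL queries.
    return f"<ul><!-- related posts for {post_id} --></ul>"
```

Keeping the check in one decorator means you can wrap every heavy widget the same way instead of pasting the test around each one.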

I also disabled my anti-comment-spam plugins and kept only Akismet running (since I hear TechCrunch uses it). I initially disabled Spam Karma 2 as well, but I eventually turned it back on (I prefer it over Akismet).

As an example horror story of how a plugin can kill you: I once had the Bad Behavior plugin installed (late 2005). It completely locked up my database when several search bots hit the site, because it attempted to log every single thing the bots did. It took me days to figure out that my blog was taking down my entire server because of this plugin! (There is a new version out, but I am too scared to try it.)

Anyway, my point is: be careful with plugins that use the database, and use only the ones you absolutely need, especially when getting Dugg.

I then finished enabling my blog cache, and my server load fell to 1.00. My server was down for about 5 minutes total. Minutes later, I pointed the traffic back onto my site with WordPress caching enabled, and the load sat around 4.00. I can’t know for sure, but I think my server would have survived had I enabled the cache plugin from the beginning (which I tried to do!! :( ).

The next day, I was on Slashdot and my server never went down. This is why I conclude that not having the cache plugin enabled had more to do with my server going down than any other factor.

Non-WordPress Administrators

Note that the web server was fine; MySQL is what failed. MySQL is much less robust than the web server under this sort of load and requires significantly more babysitting, especially behind applications that weren’t designed to scale, such as WordPress. This is why the cache plugin is so powerful: it serves saved copies of pages, so most page views can be answered without querying the database.
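Why does a page cache help so much? Once a page has been rendered, later requests can be answered from the stored copy without running any SQL. The sketch below is a tiny in-memory, time-to-live page cache in Python that illustrates the idea. It is not wp-cache’s actual implementation, and the names `PageCache` and `render` are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple

class PageCache:
    """Serve each rendered page from memory for `ttl` seconds before re-rendering it."""

    def __init__(self, render: Callable[[str], str], ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._render = render      # the expensive part: builds the page from the database
        self._ttl = ttl
        self._clock = clock        # injectable so expiry can be tested
        self._pages: Dict[str, Tuple[float, str]] = {}  # path -> (expires_at, html)
        self.renders = 0           # how many requests actually reached the database

    def get(self, path: str) -> str:
        now = self._clock()
        cached = self._pages.get(path)
        if cached is not None and cached[0] > now:
            return cached[1]       # cache hit: no SQL runs at all
        html = self._render(path)  # miss or expired: rebuild once, then reuse
        self.renders += 1
        self._pages[path] = (now + self._ttl, html)
        return html

if __name__ == "__main__":
    cache = PageCache(lambda path: f"<html>{path}</html>", ttl=300)
    for _ in range(10_000):        # a burst of Digg visitors hitting one article
        cache.get("/popular-post/")
    print(cache.renders)           # 1: one render served all 10,000 requests
```

A production cache also has to deal with concurrent requests and with invalidating pages when a comment is posted; this sketch ignores both to keep the core idea visible.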

If you have a MySQL-intensive web application that is NOT WordPress, the steps you need to take to keep your site up are different:

  • Minimize the SQL running on the landing page that is getting Dugg. This may involve turning off things like session logging. For example, I turned off a user tracker that inserted a record into the database every time a visitor came to the site. Disabling this sped things up quite a bit.
  • Create as much static content as possible, at least for the first two hours. After the initial surge, traffic will drop to manageable levels (see hourly graphs near bottom). Your best bet is to “fake” part of your application with static content and a disclaimer to come back later.
  • Increase the memory MySQL can use for its caches and buffers (for example, key_buffer_size for MyISAM tables or innodb_buffer_pool_size for InnoDB).
  • Increase the maximum number of allowed connections to MySQL (the max_connections setting).
  • Make sure your queries use indexes. The quick and very dirty explanation: if you have a SQL statement “… WHERE blah = ‘some value’”, make sure there is an index on the column blah if that table has more than 500 records and that column has many unique values (so ignore columns like status, gender, or active/inactive). No, that’s not the ideal answer (this is why DBAs make the big bucks), but it is the quick and dirty explanation of why many WordPress plugins tend to contribute to a server dying when it gets Dugg. Perhaps I will cover this in more depth another day.
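To see what an index changes, you can ask the database for its query plan. This sketch uses Python’s built-in sqlite3 module so it runs anywhere. MySQL’s equivalent is EXPLAIN, and its output looks different. The table and column names are made up for illustration, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, slug TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO posts (slug, status) VALUES (?, ?)",
    [(f"post-{i}", "publish") for i in range(5000)],  # many unique slugs, one status value
)

def plan(sql: str, params: tuple = ()) -> str:
    """Return SQLite's query-plan description(s) for `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)  # the last column is the description

query = "SELECT id FROM posts WHERE slug = ?"
before_plan = plan(query, ("post-42",))
conn.execute("CREATE INDEX idx_posts_slug ON posts (slug)")
after_plan = plan(query, ("post-42",))

print("before:", before_plan)  # a full SCAN of posts
print("after: ", after_plan)   # a SEARCH using idx_posts_slug
```

Before the index, every lookup reads all 5,000 rows. Afterward, the database jumps straight to the matching slug. An index on the status column would help far less, because every row shares the same value.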

I hope this helps!

Charts of Digg, Slashdot, and Reddit Traffic

Here are some numbers on the visitors brought in after I published my popular Google theory. Some of the most interesting stats:

  • Reddit and Slashdot generated nearly the same amount of traffic.
  • Only 74.8% of visitors use Windows (geeky crowd, part 1).
  • Firefox dominates with 65.5% (geeky crowd, part 2).
  • At the peak, I was using 3.0 Mbps when I hit the front page of Digg and Slashdot. I had about 2.0 Mbps sustained from each incident while I was at the top of the main pages.
  • The Reddit traffic surge is very gradual. It is difficult to assess exactly when I hit the front page of Reddit. I think this has to do with the fact that their site customizes the front page for each user. My money is on 4:15PM 03/20/2007 (see chart at bottom).

Visitors by site:

  • Digg: 36,582 (of which 22,581 were redirected to Dugg mirror)
  • Slashdot: 14,961
  • Reddit: 9,296
  • Stumbleupon: 1,270

Visitors by day:

  • March 19, 2007: 332 (regular day)
  • March 20, 2007: 13,771 (Reddit front page)
  • March 21, 2007: 17,512 (Digg front page)
  • March 22, 2007: 17,321 (Slashdot front page)
  • March 23, 2007: 2,131
  • March 24, 2007: 1,404

Operating system:

  • 74.8% Windows
  • 13.4% Mac
  • 9% Linux

Bandwidth by day:

  • March 19, 2007: 26 MB (regular day)
  • March 20, 2007: 3,800 MB (Reddit front page)
  • March 21, 2007: 6,340 MB (Digg front page)
  • March 22, 2007: 6,460 MB (Slashdot front page)
  • March 23, 2007: 741 MB
  • March 24, 2007: 462 MB

Browser:

  • 65.5% Firefox
  • 19.7% IE
  • 5.3% Safari
  • 4.2% Opera

Bandwidth usage graphs:

Note: technically, I made the front page of Digg at 10PM on 03/20/2007.

Traffic seems to correspond heavily with work hours. ;)

Take note: this ordeal cost me almost 15 gigs of bandwidth for a text post. For those of you hoping to make a buck off of Digg, make sure your host won’t hit you with bandwidth overage charges. Also, make sure you can actually handle the traffic. The spike will bring down your entire server if you aren’t prepared.

Digital Transmission Rights: Revisiting with Internet Radio

Last week, digital music hit the headlines again when news came that a new royalty ruling had put Internet Radio at risk. Increasingly popular online radio services, such as Pandora, have stated that the new fees would put them out of business. In short, a company that broadcasts music over the radio only has to compensate the songwriter, while that same company must pay twice (songwriter and label) if it streams the music over the Internet:

Internet radio royalties have become a thorny issue in part because conventional stations do not pay [labels] to use recordings. Both online and regular stations pay royalties to songwriters. But under a 1995 law, companies that transmit music using the Internet … must compensate both. … The $500 minimum for each channel is among the ruling’s more difficult aspects. Many Web radio sites offer thousands of channels, a strategy that would be impossible with this rate structure.

I cannot understand why these labels are pushing so hard to destroy their own business at a time when CD sales are seeing record declines. The simple answer is that they don’t understand the Internet as a new medium, but the true answer is that they understand it all too well. They recognize that in an age when content is distributed digitally, the labels will be the first to die.

Bennett Lincoff, recently famous for his article on the Digital Transmission Right, followed up on it last month and addressed this exact issue. In the follow-up, he argues that Internet Radio could thrive, to the benefit of content owners, if the entire right to distribute were sold as a package rather than as a per-play fee. This right would give the buyer nearly unlimited rights to give away, stream, or sell the content over the Internet to anybody who wants it. Legal versions of services such as Kazaa could resurface, so long as they pay for the transmission right. The cost of this right could be offset with ads and subscriber fees, and the benefit to consumers is clear. Lincoff goes on to predict:

Moreover, new businesses may arise to displace record labels as the source of funds to underwrite concert tours … And, as the digital music marketplace matures, the network itself will become the primary channel of “distribution” and licensed transmissions will displace sales … These circumstances suggest that the relative importance of the roles played by the major record labels … may diminish over time.

While the Transmission Right sounds great, it has a very long uphill battle.

One of the main obstacles to the uptake of this Transmission Right is who it needs to win over. Because it could sideline the major labels, anybody who supports it could be seen as hostile to them. Music producers may therefore be unwilling to support it for fear of upsetting their label, which means most mainstream artists will not join this movement. The labels themselves would do everything in their power to crush the idea of a transmission right that encourages activities like file sharing. They would push for higher Internet Radio fees faster than ever, in a short-sighted attempt to kill Internet Radio in its infancy and thereby destroy general demand for rights like the one Lincoff suggests.

Right now, the world is probably still not ready for this right. Not because it isn’t a good idea, but simply because of the politics involved: there are too many people with very deep pockets who want digital distribution to fail. But with CD sales hitting all-time lows, the tides are shifting.

For something to change, it will take a smart and savvy entrepreneur to build a company that negotiates these rights individually with the artists. This would probably need to start with indie bands and slowly expand. In short, a band would need to understand how these rights benefit them. Here are the five closing points that every artist should know about the Digital Transmission Right:

  1. The rights to the music remain in the artist’s control.
  2. Income is only limited by how many deals the artist makes. The transmission right can be resold to any number of distributors for any price the artist can get, which could scale with the size of a particular medium. For example, the artist could strike separate distribution deals with Yahoo Music, iTunes, and Myspace.
  3. The end user gets a DRM-free copy of the music. They may have paid for this song, perhaps through a service like iTunes, but it is 100% DRM free.
  4. Music discovery explodes. A consumer can listen to full tracks with no legal repercussions. They can find new artists they like and share these songs with their friends and family.
  5. No middlemen. The artists make money from selling this right and from concert proceeds. They keep most, possibly all, of the revenue.

Everybody wins. Well, everybody but the RIAA.