Next blog:

Daily Photo: Happy Summer Solstice

Previous blog:

My Parents Should Get a Dog

June 21, 2011 at 12:52 PM

It Was Time to Bake the Blog


It's been said that on the Internet, performance is a feature. If your site takes too long to load and has too many interactive widgets and whatnots, users will be confused, turned off and simply keep moving. One of the reasons Google won the early search wars, aside from its eerily spot-on results, was its speed.

I've been looking at this for a while on this site. Last summer I undertook a significant rewrite to remove much of the feature bloat, cruft and clutter, paring down the number of technically interesting but little-used "features" that were, in sum, dramatically slowing load times.

The result of that effort is essentially the site you see now. However, in the past two weeks, I took another significant step to both speed up the performance of the blog and to improve its appearance in search results, hopefully expanding its readership and generating those coveted page views.

Previously, this website didn't exist - it was a series of templates that were dynamically populated by the server with information stored in a database. This is the hallmark of a content management system, or CMS. The merit of this approach is that each blog page only exists once, and if I make a change to that template, all of my blog pages immediately benefit from that change without any further effort or resource expenditure. It also allowed me to make blog creation mobile and available anywhere, such as from an iPad, iPhone or any email client.

Having a dynamic blog page also allowed me to easily update each page with the number of views and the number of comments, and even to let users post comments in real time and have those comments immediately visible to other visitors.

The downside to this was that when a new blog was published, the server had to dynamically generate the same blog page for each reader, creating a significant load on the system. If too many read the same post at once, the server might start serving errors or simply grind to a halt. A year ago, as few as 100 concurrent readers would overload the server (those damn unpaid interns, you get what you pay for).

Further, because these blog posts didn't actually exist, they weren't as readily available to be searched and indexed by search engines. I also wasn't very smart about tagging headlines, including correct keyword tags and following good HTML practices that the Googlebot and other tools were trained to look for. This made the posts difficult to find and limited the potential audience. Search engines have become smarter about following dynamic pages, but their forte remains reading and indexing text.

Finally, the links generated by this system were both unattractive and uninformative. The URLs of your website are part of its design, and they should convey meaning to the reader.

For example, consider the following legacy URL from the old system:

The system was able to make sense of this URL as follows: blog.cfm is the template file, and it was populated from the database with the content related to blog post number 1656. As these numbers were generated sequentially, all you could infer from this link was that this was the 1,656th post on this and related sites, but nothing else. If you didn't know and trust this source, you might not be inclined to click on this link, as it was anyone's guess what lay on the other end.

Consider the URL for the same post under the new system:

It may not look much neater at first glance, but you can now see that this is a blog post, that it was created on September 23, 2010, and that the subject appears to be Travels to China: Three days in Shanghai (I recommend you read that post, it's a good one!).
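
To make the idea concrete, here is a minimal sketch of how a readable link could be built from a post's date and title. The /blog/year/month/day/title layout and the post_path name are my assumptions for illustration, not necessarily the exact scheme this site uses.

```python
import re
from datetime import date


def post_path(published: date, title: str) -> str:
    """Build a readable path from a date and a title.

    The /blog/YYYY/MM/DD/slug layout is an assumed convention.
    """
    # Lowercase the title, then collapse every run of characters that is
    # not a letter or digit into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"/blog/{published:%Y/%m/%d}/{slug}"


print(post_path(date(2010, 9, 23), "Travels to China: Three days in Shanghai"))
# prints /blog/2010/09/23/travels-to-china-three-days-in-shanghai
```

The date and the title both end up in the link, so a reader can tell what is on the other end before clicking.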

The new system does not rely on a blog page template for each post. Instead, the server has canvassed the archives and written out a file with the content of each of those posts. Yes, this means that at some point, the server was asked to write out 1,853 text files containing the content of each blog. The first time I tried this it crashed the server, so now it processes them in manageable batches. But since I seldom revise the template or make changes to old posts, those files can rest comfortably on the server until viewed, no database required. Here's a side note: each text file is about 10-12K (kilobytes) in size, so the total size of the published works is about 20,000K, or about 20 megabytes of storage (not including the photos).
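
For the curious, the batch approach can be sketched roughly like this. It is an illustration under my own assumptions, not the code running on this server: the posts are plain dictionaries with hypothetical "id" and "html" keys, and the batch size and pause are arbitrary.

```python
import time
from pathlib import Path
from typing import Iterator


def batches(items: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def bake(posts: list, out_dir: Path, batch_size: int = 100, pause: float = 0.0) -> int:
    """Write each post to its own static file, one batch at a time.

    Returns the number of files written.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = 0
    for batch in batches(posts, batch_size):
        for post in batch:
            target = out_dir / f"{post['id']}.html"
            target.write_text(post["html"], encoding="utf-8")
            written += 1
        # Optional breather between batches so the server can keep
        # answering other requests.
        time.sleep(pause)
    return written
```

Working through the archive in slices keeps any single pass small, which is the whole point after the first attempt fell over.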

When I create a new blog post, the content is stored in the database until I hit publish, at which time the contents are written to a file and saved on the server. The same process is used when editing. I now have a text file and a copy in the database, which bolsters my backup in the unlikely event of a major disaster. This also allows me to create posts remotely, say from my iPhone or email client, and easily edit and update posts at a later date, then have the server write out the file on a regular schedule.
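
The publish step might look something like the sketch below. I'm using SQLite purely as a stand-in database, and the publish function, the posts table and its columns are hypothetical names chosen for the example.

```python
import sqlite3
from pathlib import Path


def publish(db: sqlite3.Connection, out_dir: Path, slug: str, html: str) -> Path:
    """Save a post to the database, then write the same content to a file."""
    # The database keeps the editable copy (table layout is assumed).
    db.execute("CREATE TABLE IF NOT EXISTS posts (slug TEXT PRIMARY KEY, html TEXT)")
    db.execute("INSERT OR REPLACE INTO posts (slug, html) VALUES (?, ?)", (slug, html))
    db.commit()

    # The static file is what readers are actually served.
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"{slug}.html"
    target.write_text(html, encoding="utf-8")
    return target
```

Calling publish again with the same slug overwrites both copies, so editing a post follows the same path as creating one.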

This change has realized a number of benefits. A typical web server on a modest machine can serve static text files far faster than it can build pages dynamically, since no database query or template rendering is needed for each request. Do I get that kind of traffic on this site? No, but I now could, and the load time for each page is significantly faster.

Another benefit to this approach is that these text files are more easily indexed by Google and other search engines (unless the blog is marked private, in which case the files are tagged for no index, a request which Google and other respectable search engines uphold).
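
The no-index request is typically made with a standard robots meta tag in the page head. Here is a small sketch of how the baking step could add it for private posts; the render_head function and its private flag are hypothetical names for illustration.

```python
from html import escape


def render_head(title: str, private: bool) -> str:
    """Return a page head, adding a robots noindex tag for private posts."""
    # <meta name="robots" content="noindex"> asks crawlers not to index the page.
    robots = '<meta name="robots" content="noindex">\n' if private else ""
    return f"<head>\n<title>{escape(title)}</title>\n{robots}</head>"


print(render_head("A private post", private=True))
```

Public posts simply omit the tag, which leaves them open to indexing by default.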

Back to the title of this post: a baked blog is one like I described, where the content is written out into static text files rather than retrieved dynamically from a database on each view. Converting my system to this approach was a bit tricky, and I had to write a dynamic plug-in to load the comments as a separate module and perform other behind-the-scenes tasks like counting page views, comments and the number of words in a post. The next step would be to convert the rest of the site, but those pages are updated on the fly based on new content generated by me and others, so I'm not sure that's needed.

Whether this conversion was "worth" it or not, it's too soon to tell. My page views have gone up significantly in the past 6 weeks as I'm on the front page of Google search results for a number of odd searches like "Three Days in Shanghai" and "Find my iPhone". Results aside, it was an interesting technical and intellectual challenge and one I was happy to have undertaken.
