PHP – An Annoying Tryst

It has been a few months since I started working primarily in PHP, and I am actually liking the language a lot! I like the simplicity and symmetry of the C style of solving problems, and PHP has liberal amounts of that kind of resemblance. I have even come to terms with the clutter of all its unorganized functions, considering all the positive aspects of PHP programming.

But an annoying incident happened today with PHP. I have been testing and developing my PHP app for the past few weeks. Everything worked wonderfully, until I deployed the code on the client's staging server, that is. One of the scripts started throwing errors, and after some debugging, I found that these two lines were the culprits:

$result = $result->fetchAll()[0][0];  //doesn't work in PHP 5.3, only in 5.4!
//$result = $result->fetchAll();$result=$result[0][0];  //works in both 5.3 and 5.4!
$states = ['QLD', 'SA', 'NT', 'WA'];    //doesn't work in PHP 5.3!
//$states = array('QLD', 'SA', 'NT', 'WA'); //works in both 5.3 and 5.4!

The comments are the after-effects of my analysis, of course. The reason my code broke was that I was running PHP 5.4 while the client had PHP 5.3. I generally prefer programming shortcuts, so in the first instance I had used one for fetching the results of an SQLite table query.
It turned out that this way of directly dereferencing an array is not allowed in PHP 5.3; you need another placeholder variable to hold the intermediate value.
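One lesson learned: before deploying, check which PHP version the target server is actually running. A quick way to do that from the shell:

# print the full PHP version banner
php -v
# or just the version string
php -r 'echo PHP_VERSION . PHP_EOL;'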

Now, there is nothing wrong in breaking compatibility between PHP versions if it is REALLY crucial for implementing a feature. But for basic things like array declarations? It's a different matter that almost every other language nowadays allows dereferencing of arrays, while PHP (before 5.4) doesn't. But why break compatibility all of a sudden? I was quite annoyed by this. Other languages like Python do their utmost to keep existing code from breaking, and upgrades to newer versions are rolled out in a phased manner.

Again, in the second case, the array declaration syntax ([x, y, z, ...]) works only in 5.4 and not in 5.3. Now I can understand the frustration of all the developers who were happy to move to Python and Python-based frameworks. For a language already suffering from defects such as the lack of organization in its functions and class libraries, the designers' strategy of breaking code compatibility between PHP versions purely for aesthetic reasons (array(a, b, c) vs [a, b, c]) is bound to lose the confidence of its users sooner or later.

9 Optimizations to make your Linux Desktop fly like a Rocket!

This article is the result of notes I have prepared while tweaking, twisting and optimizing Ubuntu variants over the last few years. If you use any other distro, some of these settings may not be applicable to you. For best results, these changes should be made on top of a fresh installation; otherwise the chances of something breaking increase a bit. Each step is optional. In the case of software removals, do it only if you are not going to use the software concerned. Be careful before making any change, and know exactly what you are doing and why.

#1 Optimize disk access with noatime:

Each file and folder on your Linux system has a creation timestamp and a modification timestamp. Apart from those, Linux also tries to keep track of the "access time" of each file. Keeping track of access times has a performance cost, and if you want to remove that cost, you need to add the "noatime" option to the relevant partition entries in your /etc/fstab file. Edit this file in your text editor and set noatime as follows:

UUID=97102801-14e3-9752-7412-d9c57e30981w /    ext4    errors=remount-ro,noatime    0    1
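The new mount options take effect on the next reboot; on most systems you can also apply them immediately by remounting the partition (shown here for the root filesystem):

# remount / using the options now listed in /etc/fstab
sudo mount -o remount /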

#2 Optimize Swappiness:

Swappiness is the tendency of the Linux kernel to swap pages out to disk rather than keep them in physical memory. The default swappiness value of 60 was chosen with server installations in mind. If you are a desktop user with a machine that has a good amount of RAM, you will normally want swapping to be minimal. You can safely reduce this value to 10. To do so, edit the file /etc/sysctl.conf and add the following:

vm.swappiness=10

(Just change the entry if it already exists, don’t make a duplicate!)
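To verify the change and apply it right away without a reboot, you can do something like:

# check the value currently in effect
cat /proc/sys/vm/swappiness
# reload the settings from /etc/sysctl.conf
sudo sysctl -p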

#3 Install preload:

If you use the same set of programs regularly, preload will help you by keeping the programs you use most frequently loaded in memory. To install it on Ubuntu:

sudo apt-get install preload
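preload needs no configuration; once installed, it runs as a background daemon and learns your usage patterns over time. If you want to confirm that the daemon is actually running:

# list any running preload process
pgrep -l preload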

#4 Place your mission-critical apps in /dev/shm:

A few weeks back, I was having performance issues running Eclipse on Ubuntu. After tweaking and optimizing various JVM settings in vain, the thing that really made the difference was placing the entire JDK folder in a ramdisk. The /dev/shm folder is a tmpfs mount (on Ubuntu and derivatives) that acts like a virtual ramdisk, where you can place temporary, high-priority stuff to run it in "best performance" mode.
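As a rough sketch (the JDK path below is only an assumption; adjust it and the eclipse.ini entries to your own setup), the whole trick boils down to copying the JDK into /dev/shm and pointing Eclipse at it. Keep in mind that /dev/shm lives in RAM, so its contents vanish on reboot and the copy has to be redone:

# copy the JDK into the ramdisk (source path assumed, adjust to your installation)
cp -r /opt/jdk1.7.0 /dev/shm/jdk1.7.0

# then point Eclipse to this JVM by adding these two lines to eclipse.ini,
# just above the -vmargs line:
#   -vm
#   /dev/shm/jdk1.7.0/bin/java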

#5 Remove unwanted programs from startup:

Many Linux distros such as Ubuntu come loaded with a ton of baggage, and if you are anything like me, you will want to take some of that burden off your system by removing or disabling unwanted software and daemons. You can do this by going to "Startup Applications" in the System menu, but Ubuntu hides the pre-installed entries by default. To overcome this limitation, open a terminal and issue the command below:

sudo sed -i 's/NoDisplay=true/NoDisplay=false/g' /etc/xdg/autostart/*.desktop

(Screenshot: the Ubuntu Startup Applications list)

Now you can go through the list of startup programs and disable the unwanted ones. Common sense will tell you that if you don't use Bluetooth on your machine, you can get rid of "Bluetooth Manager"; the same goes for "Backup Monitor" if you don't need to sync your backups in real time. Here is the list of services that I have safely disabled without causing any issues:

  • Backup Monitor
  • Bluetooth Manager
  • Chat
  • Desktop sharing
  • Gwibber
  • Orca screen reader
  • Personal file sharing
  • Screen saver
  • Ubuntu One
  • Update Notifier

#6 Uninstall software that you don’t use:

The next step is to remove the software that you don't use at all. Again, some common sense, along with some caution, is needed here. Some programs (like Empathy) form a core part of Ubuntu, so apt won't let you "apt-get remove" them without removing Unity itself. In such cases, we will disable those programs from starting up as services (see the next step). Some of the programs that you may safely remove are:

sudo apt-get remove samba-common
sudo apt-get remove cups
sudo apt-get remove avahi-daemon avahi-autoipd

I typically uninstall all three after a fresh installation. The first one is only needed for file sharing on a local network, if you have one. The second is the print daemon, and the third is used to broadcast common network services across the local network and to find local hosts by friendly names like "workstation.local".
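After removing packages, it is usually worth letting apt clean out the dependencies that are no longer needed:

# remove packages that were installed only as dependencies and are now orphaned
sudo apt-get autoremove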

#7 Disable unwanted daemons:

In case you don't want to remove the cups package because you might need printing in the future, you can disable it for the time being instead. To do so, issue the command below:

sudo sh -c "echo 'manual' >> /etc/init/cups.override"

You can disable any daemon in this manner by doing a manual override; just replace "cups.override" with the name of the daemon you want disabled, for example:

sudo sh -c "echo 'manual' >> /etc/init/bluetooth.override"

Later, if you want to enable that daemon again, all you have to do is delete the .override file.
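For example, to bring CUPS back:

# deleting the override re-enables the daemon on the next boot
sudo rm /etc/init/cups.override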

#8 Optimize Nautilus to behave in a speedy manner:

This one is totally optional. Nautilus, by default, tries to show thumbnails of each and every file in a directory. If the directory contains a lot of files, this causes a noticeable delay. If you are in the habit of regularly previewing thumbnails of your images, don't apply this optimization. Otherwise, if previewing thumbnails doesn't matter to you and all you are interested in is speed (like me), you can go to Edit -> Preferences -> Preview tab and set the preview settings to Never.
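If your Nautilus version exposes its preferences through gsettings (newer GNOME 3 releases do, though the exact key name can vary between versions), the same change can be made from the terminal along these lines:

# turn off image thumbnailing in Nautilus (key name may differ on older versions)
gsettings set org.gnome.nautilus.preferences show-image-thumbnails never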

#9 Disable translation downloads in aptitude:

This setting speeds up downloads from the apt repositories rather than your machine itself. By default, Ubuntu fetches additional translation indexes when you issue the "apt-get update" command to refresh your repository data. If you only need English, you can disable translation downloads by editing /etc/apt/apt.conf.d/00aptitude and adding this line to it:

Acquire::Languages "none";
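If translation indexes were already downloaded before this change, you can also clear them out of the local package lists (the standard Ubuntu path is assumed here):

# delete previously downloaded translation indexes, then refresh the lists
sudo rm /var/lib/apt/lists/*Translation*
sudo apt-get update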

References:
http://askubuntu.com/questions/74653/how-can-i-remove-the-translation-entries-in-apt
http://askubuntu.com/questions/2194/how-can-i-improve-overall-system-performance
http://askubuntu.com/questions/173094/how-can-i-use-ram-storage-for-the-tmp-directory-and-how-to-set-a-maximum-amount

PHP-FPM vs node.js – The REAL Performance Battle

(Figure: benchmark results chart)

Even after all the boos and howls from PHP fanboys when I criticized their favorite language in my last article, my search for the holy grail of performance truth continues. That was only to be expected, considering how widely PHP is used across the web, from small businesses to the Googles and Facebooks of the world, to power their websites.

However, I do understand now that pitting PHP running on Apache against a stand-alone node.js server was a bit unfair to PHP, since it was limited by what the Apache configuration could handle.

So, this time I went with nginx, a light, performance-oriented server designed from the ground up to tackle the C10K problem. And who better to take on node.js than PHP-FPM, the enhanced FastCGI process manager that implements asynchronous features (at least in theory)? node.js, for its part, is the server that implements all its features primarily using callbacks in JavaScript, thereby drastically improving performance by leveraging the benefits of functional programming (again, in theory).

I used the same code I had used earlier, but made a small improvement so that the random filenames generated for the I/O are unique:

<?php
//asyncdemo.php
$s=""; //generate a random string of 108KB and a random filename
$filename="";
//generate a random filename
do {
	$fname = rand(1,99999).'.txt';
} while(file_exists($fname));
 
 
//generate a random string of 108kb
for($i=0;$i<108000;$i++)
{
	$n=rand(0,57)+65;
	$s.=chr($n);
}
 
//write the string to disk
file_put_contents($fname,$s);
 
//read the string back from disk
$result = file_get_contents($fname);
 
//write the string back on the response stream
echo $result;
And here is the node.js version, server.js:

//server.js
var http = require('http');	
var server = http.createServer(handler);
var fs = require('fs');
 
function handler(request, response) {
    response.writeHead(200, {'Content-Type': 'text/plain'});
 
    //generate a unique random filename
    var fname;
    do {
        fname = (1 + Math.floor(Math.random()*99999999)) + '.txt';
    } while (fs.existsSync(fname));
 
    //generate a random string of 108kb
    var payload="";
    for (var i = 0; i < 108000; i++) {
        var n = Math.floor(65 + (Math.random()*(122-65)) );
        payload += String.fromCharCode(n);
    }
 
    //write the string to disk in async manner
	fs.writeFile(fname, payload, function(err) {
			if (err) console.log(err);
 
			//read the string back from disk in async manner
			fs.readFile(fname, function (err, data) {
				if (err) console.log(err);
				response.end(data); //write the string back on the response stream
			});	
		}
	);
}
 
server.listen(8080);
console.log('Running on localhost:8080');
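Running the node.js side of the test is simply (assuming the script above is saved as server.js):

node server.js

The PHP side sits behind nginx and PHP-FPM, serving asyncdemo.php at the URL used in the ab command shown in the detailed results below.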

So, what happens when we run a piece of web application code performing async I/O two thousand times (with two hundred concurrent requests) using a tool like ApacheBench (ab)? Which is faster, PHP-FPM or node.js? Here is the answer.

Summarized results:

PHP-FPM: 64.447 seconds
node.js: 42.441 seconds

The Machine: Intel Pentium Dual-Core 2.30GHz running Linux 3.2.0

The Configurations:

PHP-FPM: PHP 5.4.23 (fpm-fcgi) (built: Jun 22 2014 14:51:15)
node.js: node v0.10.28

Detailed Results:

PHP-FPM:

ab -c 200 -n 2000 http://localhost:8080/asyncdemo/asyncdemo.php

Concurrency Level:      200
Time taken for tests:   64.447 seconds
Complete requests:      2000
Failed requests:        6
   (Connect: 0, Receive: 0, Length: 6, Exceptions: 0)
Write errors:           0
Non-2xx responses:      6
Total transferred:      215649378 bytes
HTML transferred:       215355222 bytes
Requests per second:    31.03 [#/sec] (mean)
Time per request:       6444.742 [ms] (mean)
Time per request:       32.224 [ms] (mean, across all concurrent requests)
Transfer rate:          3267.70 [Kbytes/sec] received

node.js:

ab -c 200 -n 2000 http://localhost:8080/

Concurrency Level:      200
Time taken for tests:   42.441 seconds
Complete requests:      2000
Failed requests:        1
   (Connect: 0, Receive: 0, Length: 1, Exceptions: 0)
Write errors:           0
Total transferred:      216155440 bytes
HTML transferred:       215953440 bytes
Requests per second:    47.12 [#/sec] (mean)
Time per request:       4244.115 [ms] (mean)
Time per request:       21.221 [ms] (mean, across all concurrent requests)
Transfer rate:          4973.69 [Kbytes/sec] received

So, the moral of the story is that even the latest and greatest of the PHP world falls behind node.js (though by a much smaller margin than before). Now, I do understand that PHP's market is very large, and with so many open-source CMSes like WordPress, MediaWiki and Drupal already powered by PHP, it will be quite difficult to shake PHP's market share in the near future.

On the other hand, with the performance advantage that node.js offers, it is a very lucrative option for startups and small businesses that don't have the funding to develop high-end enterprise apps in, say, Java or SAP. More importantly, if tomorrow I were given the task of developing a performance-driven app, is there one reason why I should not write it in node.js and go for PHP-FPM instead? Some food for thought. Comments are welcome!

PHP vs node.js: The REAL statistics

When it comes to web programming, I have always coded in ASP.NET or the LAMP technologies for most of my career. Now, the new buzz in the city is node.js. It is a lightweight platform that runs JavaScript code on the server side and is said to improve performance by using async I/O.

The theory suggests that the synchronous, or blocking, model of I/O works something like this:

(Diagram: the blocking I/O model)

I/O is typically the costliest part of a web transaction. When a request arrives at the Apache web server, it is passed on to the PHP interpreter for scripting any dynamic content. Now comes the tricky part: if the PHP script wants to read something from the disk or database, or write to it, that is the slowest link in the chain. When you call the PHP function file_get_contents(), the entire thread is blocked until the contents are retrieved! The server can't do anything else until your script gets the file contents. Consider what happens when multiple simultaneous requests are issued by different users to your server: they get queued, because no thread is available to do the job; they are all blocked on I/O!

Here comes the unique selling point of node.js. Since node.js implements async I/O in almost all of its functions, the server thread in the above scenario is freed as soon as the file retrieval function (fs.readFile) is called. Then, once the I/O completes, node invokes the callback function that was passed to fs.readFile, handing it the resulting data. In the meantime, that valuable thread can be used to serve some other request.

So that's the theory, anyway. But I'm not someone who accepts every new fad in town just because it is hyped and everyone uses it. Nope, I want to get under the covers and verify it for myself. I wanted to see whether this theory holds up in actual practice or not.

So I took upon myself the job of writing two simple scripts to benchmark this: one in PHP (hosted on Apache 2) and the other in JavaScript (hosted on node.js). The test itself was very simple. The script would:

1. Accept the request.
2. Generate a random string of 108 kilobytes.
3. Write the string to a file on the disk.
4. Read the contents back from disk.
5. Return the string back on the response stream.

This is the first script, index.php:

<?php
//index.php
$s=""; //generate a random string of 108KB and a random filename
$fname = chr(rand(0,57)+65).chr(rand(0,57)+65).chr(rand(0,57)+65).chr(rand(0,57)+65).'.txt';
for($i=0;$i<108000;$i++)
{
	$n=rand(0,57)+65;
	$s = $s.chr($n);
}
 
//write s to a file
file_put_contents($fname,$s);
$result = file_get_contents($fname);
echo $result;

And here is the second script, server.js:

//server.js
var http = require('http');
var fs = require('fs');
var server = http.createServer(handler);
 
function handler(request, response) {
	//console.log('request received!');
	response.writeHead(200, {'Content-Type': 'text/plain'});
 
	s=""; //generate a random string of 108KB and a random filename
	fname = String.fromCharCode(Math.floor(65 + (Math.random()*(122-65)) )) +
		String.fromCharCode(Math.floor(65 + (Math.random()*(122-65)) )) +
		String.fromCharCode(Math.floor(65 + (Math.random()*(122-65)) )) + 
		String.fromCharCode(Math.floor(65 + (Math.random()*(122-65)) )) + ".txt";
 
	for (var i = 0; i < 108000; i++)
	{
		var n = Math.floor(65 + (Math.random()*(122-65)) );
		s += String.fromCharCode(n);
	}
 
	//write s to a file
	fs.writeFile(fname, s, function(err) {
			if (err) throw err;
			//console.log("The file was saved!");
			//read back from the file
			fs.readFile(fname, function (err, data) {
				if (err) throw err;
				var result = data;
				response.end(result);
			});	
		}
	);
}
 
server.listen(8124);
console.log('Server running at http://127.0.0.1:8124/');

And then I ran the Apache benchmarking tool (ab) on both of them with 2000 requests (200 concurrent). When I saw the time stats of the results, I was astounded:

#PHP:
Concurrency Level:      200
Time taken for tests:   574.796 seconds
Complete requests:      2000
 
#node.js:
Concurrency Level:      200
Time taken for tests:   41.887 seconds
Complete requests:      2000
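For reference, the benchmark invocations looked roughly like this (the PHP URL is an assumption that depends on where index.php is hosted under Apache; the node.js port comes from server.js above):

# PHP on Apache (path assumed)
ab -n 2000 -c 200 http://localhost/index.php
# node.js server from server.js
ab -n 2000 -c 200 http://127.0.0.1:8124/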

The truth is out. node.js was faster than PHP by almost 14 times! These results are astonishing. It simply means that node.js IS going to be THE de facto standard for writing performance-driven apps in the near future; there is no doubt about it!

Agreed, the node.js ecosystem isn't that widely developed yet, and most node modules for things like DB connectivity, network access, utilities, etc. are still being actively developed. But still, after seeing these results, it's a no-brainer. Any extra effort spent in developing node.js apps is more than worth it. PHP may still hold the "king of the web" status, but with node.js in town, I don't see that status lasting very long!

Update

After reading some comments in the section below, I felt obliged to create a C#/Mono version too. This, unfortunately, has turned out to be the slowest of the bunch (~40 secs for one request). Either the Task library in Mono is terribly implemented, or there is something terribly wrong with my code. I'll fix it once I get some time and be back with my next post (maybe ASP.NET vs node.js vs PHP!).

Second Update

As for C#/ASP.NET, this is the best version that I could manage. It still lags behind both PHP and node.js, and most of the issued requests simply get dropped. (And yes, I have tested it on both Linux/Mono and Windows Server 2012/IIS environments.) Maybe ASP.NET is inherently slower, so I will have to change the terms of this benchmark to bring it into the comparison:

using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;
using System.Web;
 
public class Handler : System.Web.IHttpHandler
{
    private StringBuilder payload = null;
 
    private async void processAsync()
    {
        var r = new Random ();
 
        //generate a random string of 108kb
        payload=new StringBuilder();
        for (var i = 0; i < 54000; i++)
            payload.Append( (char)(r.Next(65,90)));
 
        //create a unique file
        var fname = "";
        do{fname = @"c:\source\csharp\asyncdemo\" + r.Next (1, 99999999).ToString () + ".txt";
        } while(File.Exists(fname));            
 
        //write the string to disk in async manner
        using(FileStream fs = File.Open(fname,FileMode.CreateNew,FileAccess.ReadWrite))
        {
            var bytes=(new System.Text.ASCIIEncoding ()).GetBytes (payload.ToString());
            await fs.WriteAsync (bytes,0,bytes.Length);
            fs.Close ();
        }
 
        //read the string back from disk in async manner
        payload = new StringBuilder ();
        StreamReader sr = new StreamReader (fname);
        payload.Append(await sr.ReadToEndAsync ());
        sr.Close ();
        //File.Delete (fname); //remove the file
    }
 
    public void ProcessRequest (HttpContext context)
    {
        Task task = new Task(processAsync);
        task.Start ();
        task.Wait ();
 
        //write the string back on the response stream
        context.Response.ContentType = "text/plain";
        context.Response.Write (payload.ToString());
    }
 
 
    public bool IsReusable 
    {
        get {
            return false;
        }
    }
}

References

  1. https://en.wikipedia.org/wiki/Node.js
  2. http://notes.ericjiang.com/posts/751
  3. http://nodejs.org
  4. https://code.google.com/p/node-js-vs-apache-php-benchmark/wiki/Tests